llama-cpp
Here are 26 public repositories matching this topic...
Community benchmark database for running LLMs on Apple Silicon Macs
Shell · Updated Apr 9, 2026
Bare-metal AI platform for AMD Strix Halo. One script. Everything works. Lego blocks: snap in what you need.
Shell · Updated Apr 12, 2026
The definitive Strix Halo LLM guide: 65 t/s on a $2,999 mini PC. Live benchmarks, tested optimizations, and everything that doesn't work.
Shell · Updated Mar 21, 2026
GPU-tuned Docker images for LLM inference on consumer hardware. Auto-detects your GPU, downloads the model, serves an OpenAI-compatible API.
Shell · Updated Mar 2, 2026
Runpod-LLM provides ready-to-use container scripts for running large language models (LLMs) on RunPod.
Shell · Updated May 20, 2025
🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.
Shell · Updated Mar 27, 2025
Lightweight web UI for llama.cpp with dynamic model switching, chat history, and markdown support. No GPU required. Perfect for local AI development.
Shell · Updated Dec 10, 2025
Deploy AgentZero on a Raspberry Pi 4 (ARM64) with a local LLM. Includes automated setup script, ARM64 compatibility fixes, Tailscale remote access, systemd auto-start, and Telegram bridge. Fully private, no cloud APIs.
Shell · Updated Feb 27, 2026
Complete guide to deploying private, on-premise AI and LLMs: hardware selection, model comparison (Ollama vs vLLM vs llama.cpp), security hardening, and AI governance policy templates. By Petronella Technology Group.
Shell · Updated Mar 6, 2026
A comprehensive collection of the 101 most useful commands for various programming languages, tools, and technologies.
Shell · Updated Jan 4, 2026
Auto-configure opencode to use a local llama-swap instance, with model and context detection.
Shell · Updated Mar 30, 2026
Privacy-first local AI coding agent setup: dual-model (Qwen + Gemma), 4 language profiles, 15+ skills, zero cloud dependency.
Shell · Updated Apr 12, 2026
The Ultimate Offline AI & Deployment Suite for Raspberry Pi 3B
Shell · Updated Feb 22, 2026
LLM inference with 7x KV cache compression. Combines llama.cpp (production inference engine) with TurboQuant (KV quantization). Run 131K-token contexts on 16GB of VRAM. OpenAI-compatible API server. Supports 100+ model architectures.
Shell · Updated Apr 9, 2026
Repository to download, save, and run quantised LLM models with llama.cpp and benchmark the results (for private use).
Shell · Updated Feb 28, 2024
Deploy Nemotron 3 Nano 30B with a 1M context window on NVIDIA DGX Spark using llama.cpp (Blackwell sm_121, Q4_0 KV cache quantization).
Shell · Updated Mar 22, 2026