llama-cpp
Here are 26 public repositories matching this topic...
Community benchmark database for running LLMs on Apple Silicon Macs
Shell · Updated Apr 9, 2026
Bare-metal AI platform for AMD Strix Halo. One script. Everything works. Lego blocks: snap in what you need.
Shell · Updated Apr 12, 2026
The definitive Strix Halo LLM guide: 65 t/s on a $2,999 mini PC. Live benchmarks, tested optimizations, and everything that doesn't work.
Shell · Updated Mar 21, 2026
GPU-tuned Docker images for LLM inference on consumer hardware. Auto-detects your GPU, downloads the model, serves an OpenAI-compatible API.
Shell · Updated Mar 2, 2026
Runpod-LLM provides ready-to-use container scripts for running large language models (LLMs) on RunPod.
Shell · Updated May 20, 2025
🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.
Shell · Updated Mar 27, 2025
Lightweight web UI for llama.cpp with dynamic model switching, chat history, and markdown support. No GPU required. Perfect for local AI development.
Shell · Updated Dec 10, 2025
Deploy AgentZero on a Raspberry Pi 4 (ARM64) with a local LLM. Includes automated setup script, ARM64 compatibility fixes, Tailscale remote access, systemd auto-start, and Telegram bridge. Fully private, no cloud APIs.
Shell · Updated Feb 27, 2026
Complete guide to deploying private, on-premise AI and LLMs: hardware selection, model comparison (Ollama vs vLLM vs llama.cpp), security hardening, and AI governance policy templates. By Petronella Technology Group.
Shell · Updated Mar 6, 2026
A comprehensive collection of the 101 most useful commands for various programming languages, tools, and technologies.
Shell · Updated Jan 4, 2026
Auto-configure opencode to use a local llama-swap instance, with model and context detection.
Shell · Updated Mar 30, 2026
Privacy-first local AI coding agent setup: dual-model (Qwen + Gemma), 4 language profiles, 15+ skills, zero cloud dependency.
Shell · Updated Apr 12, 2026
The Ultimate Offline AI & Deployment Suite for Raspberry Pi 3B
Shell · Updated Feb 22, 2026
LLM inference with 7x KV cache compression. Combines llama.cpp (production inference engine) with TurboQuant (KV quantization). Run 131K-token contexts on 16GB of VRAM. OpenAI-compatible API server. Supports 100+ model architectures.
Shell · Updated Apr 9, 2026
Repository to download, save, and run quantised LLM models with llama.cpp and benchmark the results (for private use).
Shell · Updated Feb 28, 2024
Deploy Nemotron 3 Nano 30B with a 1M context window on NVIDIA DGX Spark using llama.cpp (Blackwell sm_121, Q4_0 KV cache quantization).
Shell · Updated Mar 22, 2026