Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
macOS menu bar utility to adjust Apple Silicon GPU VRAM allocation
Social AI Agent Blueprint. Powered by vram.ai
First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.
Rust block device in userspace
TITAN — autonomous AI agent framework. ~270 tools, 37 LLM providers, 19 channels, GPU VRAM orchestration, mesh networking, LiveKit voice, mission canvas UI, homeostatic drive layer. Open-source, TypeScript, MIT. npm i -g titan-agent
Estimate whether a Hugging Face model fits and fine-tunes on your local GPU.
Find which LLMs actually fit on your hardware. Client-side GPU detection, quantization-aware memory estimation, and speed predictions.
A simple tool to estimate the GPU VRAM requirements for running LLMs (a rough version of that estimate is sketched below).
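
The last few estimators above all reduce to the same arithmetic: weight memory is roughly parameter count times bytes per parameter at the chosen quantization, plus a KV cache that grows with context length. Here is a minimal Python sketch of that back-of-the-envelope calculation; the function name, the fixed overhead allowance, and the example model figures (layer count, head counts, head dimension) are illustrative assumptions, not values taken from any of the listed tools.

```python
def estimate_vram_gb(
    n_params_b: float,         # model size in billions of parameters
    bits_per_weight: int,      # e.g. 16 (fp16), 8, or 4 (quantized)
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bits: int = 16,         # KV cache precision
    overhead_gb: float = 1.0,  # rough allowance for activations/runtime buffers (assumption)
) -> float:
    """Back-of-the-envelope VRAM estimate for LLM inference.

    weights  ~= params * bits / 8
    kv cache ~= 2 (K and V) * layers * context * kv_heads * head_dim * bytes
    """
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    kv_bytes = 2 * n_layers * context_len * n_kv_heads * head_dim * kv_bits / 8
    return (weight_bytes + kv_bytes) / 1024**3 + overhead_gb


# Example: a 7B-class model quantized to 4 bits at an 8k context
# (32 layers, 8 KV heads, head_dim 128 are typical for this size, assumed here).
print(round(estimate_vram_gb(7, 4, n_layers=32, n_kv_heads=8,
                             head_dim=128, context_len=8192), 1), "GB")
```

The same formula also shows why KV-cache compression and quantization matter: at long contexts the cache term can rival or exceed the weight term, so shrinking either directly lowers the VRAM a model needs to fit.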