Distribute and run LLMs with a single file.
Updated Apr 23, 2026 - C++
High-speed Large Language Model Serving for Local Deployment
Mano-P: Open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs inference locally on an Apple M4 Mac mini/MacBook or via a compute stick — no data leaves your device. Purely vision-driven, cross-platform GUI automation, with planning and execution of complex multi-step tasks handled entirely on-device.
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
InferrLM - On-device AI for iOS & Android
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
A fully browser-native RAG application for document Q&A, powered by Rust and WebAssembly with local vector search, embeddings, and in-browser LLM inference.
Notolog Markdown Editor
Local AI music generator with smart lyrics: Gradio web UI for HeartMuLa + Ollama/OpenAI, tags, history, and high-fidelity audio.
Modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle (HF)) for communicating with AI models that runs completely locally on your computer. No subscriptions, no data sent to the internet — just you and your personal AI assistant.
Edge Agent Lab is an Android testing platform for evaluating small language model (SLM) agents directly on mobile devices.
Desktop AI tutoring app with local inference using Ollama for privacy-focused education.
An opinionated distribution of pi, the coding agent. Extensions for project memory, spec-driven development, local LLM inference, parallel task decomposition, and more.
MindSpark: ThoughtForge — A rune-forged conversation engine by RuneForgeAI, built for tiny GPT-Nothing-class minds. Through guided memory, lean cognition, and relentless refinement, it gives small local models depth, presence, and will—bringing powerful AI to edge devices, low-power hardware, and the Third Path beyond bloated machine empires.
MCP server that runs local LLMs (with full access to MCP tools included). Callable by Python to chain MCP tools with local intelligence.
AI infrastructure for people who ship. Production agent setups, recovery kits, and ops consulting for AU/NZ businesses.
A lightweight CUDA-based local inference platform built around Z-Image Turbo by Tongyi
Docker-first, local-first AI workload toolkit for macOS Apple Silicon using Ollama, llama.cpp, LiteLLM, and Claude-compatible local endpoints.
Verify claims using AI agents that debate using scraped evidence and local language models.
Calculate your AI agent infrastructure costs. Compare cloud-only vs hybrid local+cloud inference. Real numbers from production.
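The cloud-only vs hybrid comparison above can be sketched as a small cost model. This is a minimal illustrative sketch, not that project's actual calculator: the function names, token prices, and amortized hardware cost below are all hypothetical assumptions.

```python
# Hypothetical cost model: cloud-only vs hybrid local+cloud inference.
# All prices and parameters are illustrative assumptions, not real numbers.

def cloud_only_cost(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    """Monthly cost when every token is served by a metered cloud API."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def hybrid_cost(tokens_per_month: float, price_per_1k_tokens: float,
                local_fraction: float, local_hw_monthly: float) -> float:
    """Monthly cost when local_fraction of tokens run on owned hardware,
    whose purchase price is amortized into a flat monthly figure."""
    cloud_tokens = tokens_per_month * (1 - local_fraction)
    return cloud_tokens / 1000 * price_per_1k_tokens + local_hw_monthly

# Example: 50M tokens/month at an assumed $0.002 per 1k tokens,
# with 80% served locally on hardware amortized at $60/month.
print(cloud_only_cost(50_000_000, 0.002))        # cloud-only: 100.0
print(hybrid_cost(50_000_000, 0.002, 0.8, 60.0))  # hybrid: 80.0
```

The crossover point depends on volume: at low token counts the flat hardware amortization dominates and cloud-only wins, while at high volume the hybrid setup pulls ahead.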