High-speed Large Language Model Serving for Local Deployment
Updated Jan 24, 2026 - C++
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
A modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle (HF)) for communicating with AI models, running completely locally on your computer. No subscriptions, no data sent to the internet: just you and your personal AI assistant.
Mano-P: an open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs inference locally on an Apple M4 Mac mini/MacBook (or via a compute stick) with no data leaving your device: purely vision-driven, cross-platform GUI automation, with all data processed locally and support for planning and executing complex multi-step tasks.
On-device AI for iOS & Android
Notolog Markdown Editor
A fully browser-native RAG application for document Q&A, powered by Rust and WebAssembly with local vector search, embeddings, and in-browser LLM inference.
Tool for testing different large language models without code.
Local AI music generator with smart lyrics: Gradio web UI for HeartMuLa + Ollama/OpenAI, tags, history, and high-fidelity audio.
Desktop AI tutoring app with local inference using Ollama for privacy-focused education.
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
An overfitted SD prompt engine with severe "aesthetic snobbery," forcibly correcting mundane ideas into industrial-grade rendering instructions with extreme physical texture.
A lightweight CUDA-based local inference platform built around Z-Image Turbo by Tongyi
Llama.cpp, but for Kindles.
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
Local embeddings server for Apple Silicon using MLX, providing OpenAI-compatible API endpoints
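To illustrate what "OpenAI-compatible API endpoints" means for a server like this, the sketch below builds a request body for the standard `/v1/embeddings` route and posts it. The base URL, port, and model id are placeholder assumptions for a hypothetical local server, not values taken from the project.

```python
import json
from urllib import request

# Hypothetical local endpoint and model id; the real values depend on
# how the server is configured.
BASE_URL = "http://localhost:8080/v1/embeddings"
MODEL = "text-embedding-model"

def build_embeddings_request(texts, model=MODEL):
    """Build the JSON body an OpenAI-compatible embeddings endpoint expects:
    a model id plus a string or list of strings as input."""
    return {"model": model, "input": texts}

def post_embeddings(texts):
    """POST the request; an OpenAI-compatible response carries the vectors
    under data[i]["embedding"]."""
    body = json.dumps(build_embeddings_request(texts)).encode()
    req = request.Request(
        BASE_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Build (without sending) a request for a single document.
payload = build_embeddings_request(["local inference on Apple Silicon"])
print(payload["model"], len(payload["input"]))
```

Because the request and response shapes match the OpenAI embeddings schema, existing OpenAI client libraries can typically be pointed at such a server just by overriding the base URL.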
Edge Agent Lab is an Android testing platform for evaluating small language model (SLM) agents directly on mobile devices.
Privacy‑first, real‑time speech‑to‑text dictation. 100% local inference in Rust; hotkey to dictate anywhere (macOS, Linux, Windows).
Verify claims using AI agents that debate using scraped evidence and local language models.
Windows-based, high-performance, full stack, local LLM chat application written in C# .NET 10 using the Lethe AI library.