High-speed Large Language Model Serving for Local Deployment
Updated Jan 24, 2026 - C++
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
A modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle (HF)) for communicating with AI models, running completely locally on your computer. No subscriptions, no data sent to the internet: just you and your personal AI assistant.
Mano-P: an open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs inference locally on an Apple M4 Mac mini/MacBook (or via a compute stick) with no data leaving your device: purely vision-driven, cross-platform GUI automation, with all data processed locally and support for planning and executing complex multi-step tasks.
On-device AI for iOS & Android
Notolog Markdown Editor
A fully browser-native RAG application for document Q&A, powered by Rust and WebAssembly with local vector search, embeddings, and in-browser LLM inference.
Tool for testing different large language models without code.
Local AI music generator with smart lyrics: Gradio web UI for HeartMuLa + Ollama/OpenAI, tags, history, and high-fidelity audio.
Desktop AI tutoring app with local inference using Ollama for privacy-focused education.
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
An overfitted SD prompt engine with severe "aesthetic snobbery," forcibly correcting mundane ideas into industrial-grade rendering instructions with extreme physical texture.
A lightweight CUDA-based local inference platform built around Z-Image Turbo by Tongyi
Llama.cpp, but for Kindles.
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
Local embeddings server for Apple Silicon using MLX, providing OpenAI-compatible API endpoints
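To illustrate what "OpenAI-compatible API endpoints" means for a server like this, the sketch below builds a request body for the standard `/v1/embeddings` route and posts it. The base URL, port, and model id are placeholder assumptions for a hypothetical local server, not values taken from the project.

```python
import json
from urllib import request

# Hypothetical local endpoint and model id; the real values depend on
# how the server is configured.
BASE_URL = "http://localhost:8080/v1/embeddings"
MODEL = "text-embedding-model"

def build_embeddings_request(texts, model=MODEL):
    """Build the JSON body an OpenAI-compatible embeddings endpoint expects:
    a model id plus a string or list of strings as input."""
    return {"model": model, "input": texts}

def post_embeddings(texts):
    """POST the request; an OpenAI-compatible response carries the vectors
    under data[i]["embedding"]."""
    body = json.dumps(build_embeddings_request(texts)).encode()
    req = request.Request(
        BASE_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Build (without sending) a request for a single document.
payload = build_embeddings_request(["local inference on Apple Silicon"])
print(payload["model"], len(payload["input"]))
```

Because the request and response shapes match the OpenAI embeddings schema, existing OpenAI client libraries can typically be pointed at such a server just by overriding the base URL.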
Edge Agent Lab is an Android testing platform for evaluating small language model (SLM) agents directly on mobile devices.
Privacy‑first, real‑time speech‑to‑text dictation. 100% local inference in Rust; hotkey to dictate anywhere (macOS, Linux, Windows).
Verify claims using AI agents that debate using scraped evidence and local language models.
Windows-based, high-performance, full stack, local LLM chat application written in C# .NET 10 using the Lethe AI library.