Distribute and run LLMs with a single file.
Updated Apr 23, 2026 - C++
High-speed Large Language Model Serving for Local Deployment
Mano-P: Open-source GUI-VLA agent for edge devices. #1 on OSWorld (specialized, 58.2%). Runs inference locally on an Apple M4 Mac mini/MacBook or via a compute stick — no data leaves your device. Purely vision-driven, cross-platform GUI automation, with planning and execution of complex multi-step tasks handled entirely on-device.
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
InferrLM - On-device AI for iOS & Android
LLM chatbot example using OpenVINO with RAG (Retrieval Augmented Generation).
A fully browser-native RAG application for document Q&A, powered by Rust and WebAssembly with local vector search, embeddings, and in-browser LLM inference.
Notolog Markdown Editor
Local AI music generator with smart lyrics: Gradio web UI for HeartMuLa + Ollama/OpenAI, tags, history, and high-fidelity audio.
Modern desktop application (Rust + Tauri v2 + Svelte 5 + Candle (HF)) for communicating with AI models that runs completely locally on your computer. No subscriptions, no data sent to the internet — just you and your personal AI assistant.
Edge Agent Lab is an Android testing platform for evaluating small language model (SLM) agents directly on mobile devices.
Desktop AI tutoring app with local inference using Ollama for privacy-focused education.
An opinionated distribution of pi, the coding agent. Extensions for project memory, spec-driven development, local LLM inference, parallel task decomposition, and more.
MindSpark: ThoughtForge — A rune-forged conversation engine by RuneForgeAI, built for tiny GPT-Nothing-class minds. Through guided memory, lean cognition, and relentless refinement, it gives small local models depth, presence, and will—bringing powerful AI to edge devices, low-power hardware, and the Third Path beyond bloated machine empires.
MCP server that runs local LLMs (with full access to MCP tools included). Callable by Python to chain MCP tools with local intelligence.
AI infrastructure for people who ship. Production agent setups, recovery kits, and ops consulting for AU/NZ businesses.
A lightweight CUDA-based local inference platform built around Z-Image Turbo by Tongyi
Docker-first, local-first AI workload toolkit for macOS Apple Silicon using Ollama, llama.cpp, LiteLLM, and Claude-compatible local endpoints.
Verify claims using AI agents that debate using scraped evidence and local language models.
Calculate your AI agent infrastructure costs. Compare cloud-only vs hybrid local+cloud inference. Real numbers from production.
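The cloud-only vs hybrid comparison above can be sketched as a small cost model. This is a minimal illustrative sketch, not that project's actual calculator: the function names, token prices, and amortized hardware cost below are all hypothetical assumptions.

```python
# Hypothetical cost model: cloud-only vs hybrid local+cloud inference.
# All prices and parameters are illustrative assumptions, not real numbers.

def cloud_only_cost(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    """Monthly cost when every token is served by a metered cloud API."""
    return tokens_per_month / 1000 * price_per_1k_tokens

def hybrid_cost(tokens_per_month: float, price_per_1k_tokens: float,
                local_fraction: float, local_hw_monthly: float) -> float:
    """Monthly cost when local_fraction of tokens run on owned hardware,
    whose purchase price is amortized into a flat monthly figure."""
    cloud_tokens = tokens_per_month * (1 - local_fraction)
    return cloud_tokens / 1000 * price_per_1k_tokens + local_hw_monthly

# Example: 50M tokens/month at an assumed $0.002 per 1k tokens,
# with 80% served locally on hardware amortized at $60/month.
print(cloud_only_cost(50_000_000, 0.002))        # cloud-only: 100.0
print(hybrid_cost(50_000_000, 0.002, 0.8, 60.0))  # hybrid: 80.0
```

The crossover point depends on volume: at low token counts the flat hardware amortization dominates and cloud-only wins, while at high volume the hybrid setup pulls ahead.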