Local LLM inference engine written from scratch in Rust — hand-written AVX-512 assembly kernels, Metal & Vulkan compute shaders. Supports Qwen3, Mistral3, ... Q4/INT8/BF16 quantization.
Updated Mar 18, 2026 - Rust
Real-time LLM interp framework in Rust
Rust API wrapper for chatllm.cpp LLM inference.
Rust-native AI inference & training engine. Block AttnRes linear-time transformer with profile-driven GPU dispatch, TurboQuant, GGUF/SafeTensors, LoRA, multimodal (vision/audio/video), FerrisRes Armor security. 39K LOC, 951 tests. Adapts to any GPU via wgpu — Intel iGPU to H100. No Python.
Gemma 4 on a 24GB MacBook: measured recipes, runtimes, and fallback paths.
Offline multimodal evidence capture that emits a signed, locally verifiable .witness bundle. Tauri + Rust + Gemma 4 + Ed25519. Static HTML verifier runs with no server.
Local-first AI document intelligence for CPA firms. Extracts W-2s, 1099s, and financial statements on-device — no data leaves your machine.
🖥️ Explore CPU-SLM, a Rust-based SLM/LLM project that runs on CPU, offering efficient inference and chat with minimal dependencies.
Translate EPUB files using Gemma
Free, local-first AI dictation for macOS and Windows. Auditable, offline-first, privacy by design. [Pre-PoC]