- llmfit – Right-size LLM models to your system hardware. Interactive TUI and CLI to match models against available RAM, CPU, and GPU.
- llama-cpp-4 – llama.cpp bindings for Rust
- gguf-utils – handling GGUF files
- oxibonsai-cli – Pure Rust 1-bit LLM inference engine for PrismML Bonsai models – CLI
- ggus – GGUF in Rust 🦀
- forgellm-cli – CLI tool for the ForgeLLM compiler
- forgellm-codegen-metal – Metal GPU code generation for Apple Silicon inference in ForgeLLM
- forgellm-frontend – Model parsing (GGUF, SafeTensors) and IR construction for ForgeLLM
- forgellm-codegen-cpu – CPU code generation (x86 AVX2/512, ARM NEON) for ForgeLLM
- rust-hf-downloader – TUI and CLI for searching and downloading HuggingFace models
- ztensor – Unified, zero-copy, and safe I/O for deep learning formats
- oxillama-gguf – GGUF v3 parser and tensor loader for OxiLLaMa
- llama-mcp-server – Local LLM inference MCP server powered by llama.cpp
- oxillama-arch – Model architecture implementations – LLaMA, Qwen3, Mistral, Gemma, Phi
- rig-llama-cpp – Rig completion provider for local GGUF models via llama.cpp, with streaming, tool calling, reasoning, and multimodal (mtmd) support
- llama-gguf – A high-performance Rust implementation of llama.cpp – LLM inference engine with full GGUF support
- large – Rust LLM inference implementation
- gguf-rs-lib – reading and writing GGUF (GGML Universal Format) files
- llama-rs – A high-performance Rust implementation of llama.cpp – LLM inference engine with full GGUF support
- forgellm-codegen-gpu – GPU code generation via wgpu/WGSL for ForgeLLM
- onde-cli – Terminal UI for signing up, signing in, and managing your Onde Inference account
- qlora-rs – 4-bit quantized LoRA (QLoRA) implementation with dual GGUF and Candle native export for Rust
- yule – Local AI inference runtime – verified models, sandboxed execution, signed audit logs
- forgellm-codegen-wasm – WASM + WebGPU code generation for ForgeLLM
- llm_hunter – forensic research of LLM GGUF files and more
- voxtral-micro – Voxtral Micro – minimal text-to-speech with Q4 GGUF quantization
- forgellm-optimizer – Graph-level optimizations for ForgeLLM (fusion, layout, quantization, memory planning)
- sonr – High-performance semantic search tool for local codebases
- wax-llm – Command-line LLM inference with Candle, safetensors, GGUF, and Metal support
- forgellm-runtime – Minimal runtime for ForgeLLM (KV cache, sampling, tokenizer, API server)
- ferrum-quantization – Weight-format abstraction (Dense / GPTQ / AWQ / GGUF) for Ferrum models
- shimmytok – Pure Rust tokenizer for GGUF models with llama.cpp compatibility (SentencePiece + BPE + WPM + UGM + RWKV)
- a3s-power – A3S Power – privacy-preserving LLM inference for TEE environments
- neutts – Rust port of NeuTTS – on-device voice-cloning TTS with GGUF backbone and NeuCodec decoder
- apr-qa-runner – Playbook executor for APR model qualification testing
- oxillama-cli – Pure Rust LLM inference engine CLI – the sovereign alternative to llama.cpp
- apr-qa-report – Popperian report generator and MQS scoring for APR model qualification
- rage-quant – High-performance quantized GEMV kernels for CPU-only LLM inference. Direct dot product on Q8_0/Q6_K/Q4_K GGUF blocks with AVX2+FMA SIMD – 3.0x decode speedup.
- inferno-ai – Enterprise AI/ML model runner with automatic updates, real-time monitoring, and multi-interface support
- hot-loop – Runs GGUF chat models in pure Rust using the Candle backend
- apr-qa-certify – Model certification tools and README synchronization
- wax-bench – Benchmark types and helpers for wax
- qts – Qwen3 TTS inference (GGUF + GGML); Rust API for host apps and gdext
- kyro – A high-performance ML inference engine
- qts_cli – Command-line tools for Qwen3 TTS synthesis and WAV output
- llm_client – easiest Rust interface for local LLMs
- modelc – Compile model weight files to standalone executable binaries
- llama-cpp-sys-4 – Low-level bindings to llama.cpp
- oxillama – Pure Rust LLM inference engine – the sovereign alternative to llama.cpp (meta crate)
- tensor-man – A small utility to inspect and validate safetensors and ONNX files
- wax-core – Core inference engine for wax, a small Candle-based local LLM runner
- safetensors_explorer – CLI utility to inspect and explore .safetensors and .gguf files (a sketch of the safetensors header these tools read appears after this list)
- apr-qa-gen – Property-based scenario generator for APR model qualification
- oxide-rs – AI inference library and CLI in Rust, llama.cpp style
- candelabra – Desktop-friendly GGUF LLaMA inference wrapper for Candle and Hugging Face Hub
- ggufy – Unified GGUF wrapper for llama.cpp and Ollama
- mlmf – Machine Learning Model Files – loading, saving, and dynamic mapping for ML models
- tibet-oomllama – OomLlama – sovereign LLM runtime with .oom format, Q2/Q4/Q8 quantization, and lazy-loading inference
- inspector-gguf – A powerful GGUF file inspection tool with a graphical and command-line interface
- kapsl-llm – Large language model inference with GGUF and ONNX backend support for Kapsl
- apr-qa-cli – CLI for APR model qualification testing
- clat – Command-line assistance tool: describe what you want in plain English; clat generates a shell script and runs it.
- oxillama-wasm – WebAssembly bindings for OxiLLaMa GGUF parsing and quantization
- oxillama-py – Python bindings for OxiLLaMa LLM inference engine
- bare-metal-gguf – GGUF binary format parser for bare-metal LLM inference – zero-copy mmap, all quantization types
- kwaai-inference – Inference engine for KwaaiNet – Candle-based ML runtime
- swink-agent-local-llm – Local on-device LLM inference for swink-agent using llama.cpp
- llama-cpp-v3 – Safe and ergonomic Rust wrapper for llama.cpp with dynamic loading
- rusty-genius-cortex – Inference engine interaction layer for rusty-genius
- ggml-quants – GGML-defined quantized data types and their quant/dequant algorithms (see the Q8_0 dequantization sketch after this list)
- oxillama-bench – Benchmark suite for OxiLLaMa inference engine
- offline_intelligence_cpp – C++ bindings for Offline Intelligence Library
- oxbitnet – Run BitNet b1.58 ternary LLMs with wgpu
- pllm – Portable LLM
- yule-gpu – GPU compute backends: Vulkan, CUDA, Metal, and CPU SIMD fallback
- gguf-llms – parsing GGUF (GGML Universal Format) files
- tensorsafe2gguf – convert a tensorsafe model to a GGUF model
- qts_ggml – Thin safe wrappers over qts_ggml_sys for qts
- oxibonsai – Pure Rust 1-bit LLM inference engine for PrismML Bonsai models – umbrella crate
- yule-registry – Model registry: pull, cache, and manage verified model artifacts
- yule-infer – Inference engine: attention, KV cache, sampling, quantization, token generation
- llmfit-core – Core library for llmfit – hardware detection, model fitting, and provider integration
- yule-sandbox – Cross-platform process sandboxing: seccomp, AppContainer, seatbelt
- yule-api – Local API server: capability-token auth, streaming inference, OpenAI-compatible endpoints
- alith-models – Load and download LLM models, metadata, and tokenizers
- yule-verify – Cryptographic integrity verification: Merkle trees, signatures, and model manifests
- llm_prompt – Low-level prompt system for API LLMs and local LLMs
- yule-attest – Cryptographic attestation: signed inference logs, audit trails
- offline_intelligence_java – Java bindings for Offline Intelligence Library
- gguf – A small utility to parse GGUF files (a std-only header-reading sketch appears after this list)
- qts_ggml_sys – Low-level FFI bindings to ggml-org/ggml (built from vendored sources)
- alith-prompt – LLM prompting
- offline_intelligence_js – JavaScript bindings for Offline Intelligence Library
- oxibonsai-core – GGUF Q1_0_g128 loader, tensor types, and configuration for OxiBonsai
- alith-client – The easiest Rust interface for local LLMs, and an interface for deterministic signals from probabilistic LLM vibes
- vil_quantized – D13 – model quantization runtime for VIL
- llm_models – Load and download LLM models, metadata, and tokenizers
- yule-core – Core types, tensor abstractions, and model metadata for the Yule inference runtime
- aprender-serve – Pure Rust ML inference engine built from scratch – model serving for GGUF and safetensors
- mlx-io – Tensor serialization: safetensors, GGUF, mmap loading
- aprender-quant – K-quantization formats (Q4_K, Q5_K, Q6_K) for GGUF/APR model weights
- localgpt-core – Core library for LocalGPT – agent, memory, config, security
- aprender-train-inspect – SafeTensors model inspection and format conversion
- ggufscan – Easily find and delete GGUF model files from your HDD
- spn-native – Native model inference and storage for SuperNovae ecosystem
- aprender-train-distill – End-to-end knowledge distillation CLI
- entrenar-inspect – SafeTensors model inspection and format conversion
- airframe – FP32-first inference core for Llama-family models. Pure Rust physics engine.
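Several of the parsers above (gguf, gguf-rs-lib, gguf-llms, bare-metal-gguf, oxillama-gguf) read the same container layout. The GGUF header is fixed-size and small, so a std-only reader makes a reasonable smoke test before reaching for a crate. A minimal sketch, assuming only the public GGUF spec (magic bytes "GGUF", then little-endian u32 version, u64 tensor count, u64 metadata key/value count); "model.gguf" is a placeholder path.

```rust
// Minimal GGUF header reader using only the Rust standard library.
// Layout per the public GGUF spec: 4-byte magic "GGUF", then
// little-endian u32 version, u64 tensor count, u64 metadata KV count.
use std::fs::File;
use std::io::Read;

fn read_u32_le(r: &mut impl Read) -> std::io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_u64_le(r: &mut impl Read) -> std::io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn main() -> std::io::Result<()> {
    let mut f = File::open("model.gguf")?; // placeholder: any local GGUF file

    let mut magic = [0u8; 4];
    f.read_exact(&mut magic)?;
    assert_eq!(&magic, b"GGUF", "not a GGUF file");

    let version = read_u32_le(&mut f)?;           // 3 for current files
    let tensor_count = read_u64_le(&mut f)?;      // tensor-info records that follow
    let metadata_kv_count = read_u64_le(&mut f)?; // metadata key/value pairs

    println!("GGUF v{version}: {tensor_count} tensors, {metadata_kv_count} metadata keys");
    Ok(())
}
```

The typed metadata key/value section that follows the header is where the parsers above diverge in effort; the header alone is already enough to tell GGUF apart from safetensors or ONNX when triaging a models folder.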
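For the quantization-focused crates (ggml-quants, rage-quant, aprender-quant), the simplest GGUF block format to illustrate is Q8_0: blocks of 32 weights, each stored as a little-endian f16 scale followed by 32 signed bytes (34 bytes per block), dequantized as weight = scale × quant. A pure-Rust sketch mirroring ggml's block_q8_0 layout; the f16 decoder is hand-rolled because stable std has no half-precision type.

```rust
const QK8_0: usize = 32; // weights per Q8_0 block, as in ggml

/// Decode an IEEE 754 half-precision value from its raw bits.
fn f16_to_f32(h: u16) -> f32 {
    let sign = (h >> 15) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x3ff) as u32;
    let bits = match exp {
        // Subnormal or zero: magnitude = frac * 2^-24.
        0 => {
            let mag = frac as f32 * 2.0f32.powi(-24);
            return if sign == 1 { -mag } else { mag };
        }
        // Infinity / NaN: keep the payload.
        31 => (sign << 31) | (0xff << 23) | (frac << 13),
        // Normal: rebias the exponent from 15 to 127.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

/// Dequantize one Q8_0 block: [f16 scale][32 x i8], 34 bytes total.
fn dequant_q8_0(block: &[u8; 34]) -> [f32; QK8_0] {
    let d = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
    let mut out = [0.0f32; QK8_0];
    for i in 0..QK8_0 {
        out[i] = d * (block[2 + i] as i8) as f32;
    }
    out
}

fn main() {
    // Demo block: scale = 1.0 (f16 bits 0x3C00), quants = 0..=31.
    let mut block = [0u8; 34];
    block[..2].copy_from_slice(&0x3C00u16.to_le_bytes());
    for i in 0..QK8_0 {
        block[2 + i] = i as u8;
    }
    let w = dequant_q8_0(&block);
    assert_eq!(w[5], 5.0);
    println!("first weights: {:?}", &w[..4]);
}
```

A GEMV kernel of the kind rage-quant describes is this same loop fused into a dot product against the activation vector, vectorized with SIMD instead of materializing the f32 weights.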
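The safetensors files handled by tensor-man, safetensors_explorer, entrenar-inspect, and ztensor use an even simpler framing: an 8-byte little-endian u64 giving the JSON header length, then that many bytes of JSON metadata (tensor names mapped to dtype, shape, and byte offsets), then the raw tensor data. A std-only sketch that dumps the header; "model.safetensors" is a placeholder path.

```rust
// Print the JSON header of a safetensors file using only std.
// Framing per the safetensors spec: u64 LE header length, then
// that many bytes of JSON, then raw tensor data.
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    let mut f = File::open("model.safetensors")?; // placeholder path

    let mut len_buf = [0u8; 8];
    f.read_exact(&mut len_buf)?;
    let header_len = u64::from_le_bytes(len_buf) as usize;

    let mut header = vec![0u8; header_len];
    f.read_exact(&mut header)?;

    // The header maps tensor names to {"dtype", "shape", "data_offsets"}.
    println!("{}", String::from_utf8_lossy(&header));
    Ok(())
}
```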