#gguf

  1. llmfit

    Right-size LLMs to your system hardware. Interactive TUI and CLI to match models against available RAM, CPU, and GPU.

    v0.9.22 850 #gguf #llm-inference #model #llm #tui
  2. llama-cpp-4

    llama.cpp bindings for Rust

    v0.2.54 1.1K #gguf #llama #inference #ggml #llm
  3. gguf-utils

    Handling GGUF files

    v0.4.1 600 #gguf #llama-cpp #ggml
  4. oxibonsai-cli

    Pure Rust 1-bit LLM inference engine for PrismML Bonsai models — CLI

    v0.1.3 #gguf #inference-engine #llm-inference #bonsai #ternary #metal #apple-silicon #1-bit #llama #llama-cpp
  5. ggus

    GGUF in Rust 🦀

    v0.5.1 11K #gguf #llama-cpp #ggml
  6. forgellm-cli

    CLI tool for the ForgeLLM compiler

    v0.7.6 #gguf #llm-inference #compiler #aot #llm
  7. forgellm-codegen-metal

    Metal GPU code generation for Apple Silicon inference in ForgeLLM

    v0.7.6 #gguf #llm-inference #compiler #aot #llm
  8. forgellm-frontend

    Model parsing (GGUF, SafeTensors) and IR construction for ForgeLLM

    v0.7.6 #gguf #llm-inference #compiler #aot #llm
  9. forgellm-codegen-cpu

    CPU code generation (x86 AVX2/512, ARM NEON) for ForgeLLM

    v0.7.6 #gguf #llm-inference #compiler #aot #llm
  10. rust-hf-downloader

    TUI and CLI for searching and downloading HuggingFace models

    v1.4.0 #tui #downloader #hugging-face #model #rate-limiting #incomplete-download #gguf #authentication #download-speed #sha-256
  11. ztensor

    Unified, zero-copy, and safe I/O for deep learning formats

    v1.2.3 #safetensors #onnx #gguf #deep-learning #zero-copy #numpy #mmap #npz #zt #pickle
  12. oxillama-gguf

    GGUF v3 parser and tensor loader for OxiLLaMa

    v0.1.3 #gguf #llm-inference #llm #model-format
  13. llama-mcp-server

    Local LLM inference MCP server powered by llama.cpp

    v0.1.1 #gguf #llama #llm-inference #mcp #mcp-server #json-rpc #local-llm #web-server #mcp-model #llama-cpp
  14. oxillama-arch

    Model architecture implementations — LLaMA, Qwen3, Mistral, Gemma, Phi

    v0.1.3 #gguf #llama #llm-inference #neural-network #llm
  15. rig-llama-cpp

    Rig completion provider for local GGUF models via llama.cpp, with streaming, tool calling, reasoning, and multimodal (mtmd) support

    v0.1.4 #gguf #inference #llama-cpp #llm #rig
  16. llama-gguf

    A high-performance Rust implementation of llama.cpp - LLM inference engine with full GGUF support

    v0.14.0 #gguf #llama #llm-inference #llm
  17. large

    Rust LLM inference implementation

    v0.2.0 #gguf #llm-inference #qwen3 #top-p #bpe #dot-product #gpt-2 #byte-level #memory-map #nucleus
  18. gguf-rs-lib

    Reading and writing GGUF (GGML Universal Format) files

    v0.2.5 2.8K #gguf #ggml #machine-learning #parser
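Several of the crates above (gguf-rs-lib, ggus, gguf-llms, oxillama-gguf) parse the GGUF container format. As a rough illustration of what that involves, and not the API of any crate listed here, the fixed 24-byte GGUF header (magic `GGUF`, little-endian `u32` version, `u64` tensor count, `u64` metadata key-value count, per the GGUF spec) can be decoded with plain std:

```rust
use std::convert::TryInto;

/// The fixed-size fields at the start of every GGUF file.
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

/// Decode the 24-byte GGUF header; returns None on bad magic or truncation.
fn parse_gguf_header(buf: &[u8]) -> Option<GgufHeader> {
    if buf.len() < 24 || &buf[0..4] != b"GGUF" {
        return None;
    }
    Some(GgufHeader {
        version: u32::from_le_bytes(buf[4..8].try_into().ok()?),
        tensor_count: u64::from_le_bytes(buf[8..16].try_into().ok()?),
        metadata_kv_count: u64::from_le_bytes(buf[16..24].try_into().ok()?),
    })
}

fn main() {
    // Build a synthetic header: magic, version 3, 2 tensors, 5 metadata KVs.
    let mut buf = Vec::new();
    buf.extend_from_slice(b"GGUF");
    buf.extend_from_slice(&3u32.to_le_bytes());
    buf.extend_from_slice(&2u64.to_le_bytes());
    buf.extend_from_slice(&5u64.to_le_bytes());
    let h = parse_gguf_header(&buf).unwrap();
    println!("version={} tensors={} kv={}", h.version, h.tensor_count, h.metadata_kv_count);
}
```

After this header come the metadata key-value pairs and tensor descriptors, which is where the crates above differ (zero-copy mmap, typed metadata, quantized tensor views).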
  19. llama-rs

    A high-performance Rust implementation of llama.cpp - LLM inference engine with full GGUF support

    v0.15.1 #gguf #llm-inference #llm
  20. forgellm-codegen-gpu

    GPU code generation via wgpu/WGSL for ForgeLLM

    v0.7.6 #gguf #llm-inference #compiler #aot #llm
  21. onde-cli

    Terminal UI for signing up, signing in, and managing your Onde Inference account

    v0.3.1 #gguf #tui #inference
  22. qlora-rs

    4-bit quantized LoRA (QLoRA) implementation with dual GGUF and Candle native export for Rust

    v1.0.5 #gguf #lora #4bit
  23. yule

    Local AI inference runtime — verified models, sandboxed execution, signed audit logs

    v0.1.0 #gguf #artificial-intelligence #inference #vulkan #audit #model-inference #sandboxed
  24. forgellm-codegen-wasm

    WASM + WebGPU code generation for ForgeLLM

    v0.7.6 #gguf #llm-inference #compiler #aot #llm
  25. llm_hunter

    Forensic research of LLM GGUF files and more

    v0.3.5 #gguf #binary-analysis #llm #forensic-analysis #llm-forensics
  26. voxtral-micro

    Voxtral Micro - Minimal text-to-speech with Q4 GGUF quantization

    v1.0.0 #gguf #text-to-speech #model #voxtral #q4 #euler #gb #tts-engine #audio-buffer #quantization
  27. forgellm-optimizer

    Graph-level optimizations for ForgeLLM (fusion, layout, quantization, memory planning)

    v0.7.6 #gguf #llm-inference #compiler #aot #llm
  28. sonr

    High-performance semantic search tool for local codebases

    v0.1.7 #semantic-search #tool-for-local #codebase #semantic-search-tool #mcp #gguf #model-context-protocol #llama
  29. wax-llm

    Command-line LLM inference with Candle, safetensors, GGUF, and Metal support

    v0.1.0 #gguf #safetensors #llm-inference #model #metal #candle #wax #top-p #top-k #cuda
  30. forgellm-runtime

    Minimal runtime for ForgeLLM (KV cache, sampling, tokenizer, API server)

    v0.7.6 #gguf #llm-inference #compiler #aot #llm
  31. ferrum-quantization

    Weight-format abstraction (Dense / GPTQ / AWQ / GGUF) for Ferrum models

    v0.7.3 #gguf #llama #ferrum #apple-silicon #inference-engine #open-ai-compatible #llm-inference #metal #dense #moe
  32. shimmytok

    Pure Rust tokenizer for GGUF models with llama.cpp compatibility (SentencePiece + BPE + WPM + UGM + RWKV)

    v0.7.0 500 #gguf #llama #sentence-piece #llm
  33. a3s-power

    A3S Power — Privacy-preserving LLM inference for TEE environments

    v0.4.2 #gguf #llm-inference #tee #llm
  34. neutts

    Rust port of NeuTTS — on-device voice-cloning TTS with GGUF backbone and NeuCodec decoder

    v0.1.1 130 #gguf #text-to-speech #voice-cloning #speech-synthesis
  35. apr-qa-runner

    Playbook executor for APR model qualification testing

    v0.1.0 #gguf #safetensors #apr #model-format #playbook #parallel-execution #ci #model-family #qualification #format-conversion
  36. oxillama-cli

    Pure Rust LLM inference engine CLI — the sovereign alternative to llama.cpp

    v0.1.3 #gguf #llama #llm-inference #llm
  37. apr-qa-report

    Popperian report generator and MQS scoring for APR model qualification

    v0.1.0 #gguf #safetensors #apr #model #report-generator #qualification #model-family #playbook #certification #ci-cd
  38. rage-quant

    High-performance quantized GEMV kernels for CPU-only LLM inference. Direct dot product on Q8_0/Q6_K/Q4_K GGUF blocks with AVX2+FMA SIMD — 3.0x decode speedup.

    v0.1.0 #gguf #llm-inference #llm #inference
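The "direct dot product on quantized blocks" idea that rage-quant (and ggml-quants further down) describe can be shown with a scalar sketch. This is a simplification under stated assumptions, not rage-quant's code: a Q8_0 block holds 32 signed 8-bit weights plus one per-block scale (stored as f16 on disk; f32 here for brevity), and the kernel accumulates the integer products before applying the scale. Real kernels do the same sum with AVX2/NEON lanes:

```rust
/// Q8_0 block size as defined by ggml: 32 weights per scale.
const QK8_0: usize = 32;

/// One Q8_0 block: per-block scale plus 32 quantized weights.
/// (ggml stores `d` as f16; we use f32 to keep the sketch dependency-free.)
struct BlockQ80 {
    d: f32,
    qs: [i8; QK8_0],
}

/// Scalar reference dot product: d * sum_i(qs[i] * x[i]).
/// Accumulating the raw int8 products first, then scaling once,
/// is what makes the SIMD versions cheap.
fn dot_q8_0(block: &BlockQ80, x: &[f32; QK8_0]) -> f32 {
    let mut acc = 0.0f32;
    for i in 0..QK8_0 {
        acc += block.qs[i] as f32 * x[i];
    }
    block.d * acc
}

fn main() {
    let block = BlockQ80 { d: 0.5, qs: [2i8; QK8_0] };
    let x = [1.0f32; QK8_0];
    // 32 weights of 2, scale 0.5, activations of 1.0 -> 0.5 * 64 = 32
    println!("{}", dot_q8_0(&block, &x));
}
```

A full GEMV row is just this per-block dot product summed across all blocks in the row, which is why decode speed tracks memory bandwidth more than compute.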
  39. inferno-ai

    Enterprise AI/ML model runner with automatic updates, real-time monitoring, and multi-interface support

    v0.10.3 #gguf #onnx #ml #onnx-inference #inference
  40. hot-loop

    Run GGUF chat models in pure Rust using the Candle backend

    v0.5.3 #gguf #inference #llm #model #session #model-session
  41. apr-qa-certify

    Model certification tools and README synchronization

    v0.1.0 #gguf #safetensors #certification #model #csv #apr #grade #mvp #playbook #model-family
  42. wax-bench

    Benchmark types and helpers for wax

    v0.1.0 #benchmark #wax #gguf #safetensors #helper #llama #llm #config-json #mlx
  43. qts

    Qwen3 TTS inference (GGUF + GGML); Rust API for host apps and gdext

    v0.1.0 #gguf #onnx #text-to-speech #qwen3 #ggml #sample-rate #vocoder #tts-engine #gdext #synthesize
  44. kyro

    A high-performance ML inference engine

    v0.1.1 #inference-engine #ml #model #prefix #token #gguf #paged-attention #metrics #distributed #gpu
  45. qts_cli

    Command-line tools for Qwen3 TTS synthesis and WAV output

    v0.1.0 #text-to-speech #qwen3 #wav #model #vocoder #ggml #tui #onnx #gguf
  46. llm_client

    The easiest Rust interface for local LLMs

    v0.0.7 330 #gguf #llama-cpp #openai #anthropic #llm
  47. modelc

    Compile model weight files to standalone executable binaries

    v0.1.1 #safetensors #gguf #onnx #executable #weight #on-disk
  48. llama-cpp-sys-4

    Low Level Bindings to llama.cpp

    v0.2.54 800 #gguf #inference #ggml #llama #llm
  49. oxillama

    Pure Rust LLM inference engine — the sovereign alternative to llama.cpp (meta crate)

    v0.1.3 #gguf #llama #llm-inference #llm #pure-rust
  50. tensor-man

    A small utility to inspect and validate safetensors and ONNX files

    v0.4.2 290 #safetensors #gguf #onnx #machine-learning-models #sign-verify #docker #pytorch
  51. wax-core

    Core inference engine for wax, a small Candle-based local LLM runner

    v0.1.0 #gguf #safetensors #llama #inference-engine #llm #local-llm #llm-inference #mlx #top-p #candle
  52. safetensors_explorer

    CLI utility to inspect and explore .safetensors and .gguf files

    v0.2.0 360 #safetensors #gguf #tensor #machine-learning-models #file-explorer #explore #ggml
  53. apr-qa-gen

    Property-based scenario generator for APR model qualification

    v0.1.0 #gguf #safetensors #apr #generator #model #qualification #property-testing #playbook #qa #qwen
  54. oxide-rs

    AI Inference library and CLI in Rust - llama.cpp style

    v0.1.16 #gguf #llm-inference #llm #candle #machine-learning
  55. candelabra

    Desktop-friendly GGUF LLaMA inference wrapper for Candle and Hugging Face Hub

    v0.2.0 #gguf #llama #candle #llm
  56. ggufy

    Unified GGUF wrapper for llama.cpp and Ollama

    v0.2.0 #gguf #llama #ollama #llama-cpp #unified #cache-directory #multimodal
  57. mlmf

    Machine Learning Model Files - Loading, saving, and dynamic mapping for ML models

    v0.2.0 #gguf #safetensors #transformer-model #machine-learning #model-files
  58. tibet-oomllama

    OomLlama — Sovereign LLM runtime with .oom format, Q2/Q4/Q8 quantization, and lazy-loading inference

    v0.1.0 #gguf #llm-inference #oom #llm #oomllama
  59. inspector-gguf

    A powerful GGUF file inspection tool with a graphical and command-line interface

    v0.3.1 #gguf #llm #llm-model #candle
  60. kapsl-llm

    Large language model inference with GGUF and ONNX backend support for Kapsl

    v0.1.0 #gguf #onnx #inference-engine #back-end #model-inference #shared-memory #kapsl
  61. apr-qa-cli

    CLI for APR model qualification testing

    v0.1.0 #gguf #safetensors #apr #model #ci #playbook #html-reports #qualification #property-testing #parallel-execution
  62. clat

    Command line assistance tool. Describe what you want in plain English; clat generates a shell script and runs it.

    v0.1.2 #generator #model #script #plain-english #shell #command-line-tool #assistance #lm-studio #gguf #git
  63. oxillama-wasm

    WebAssembly bindings for OxiLLaMa GGUF parsing and quantization

    v0.1.3 #gguf #llama #llm #wasm
  64. oxillama-py

    Python bindings for OxiLLaMa LLM inference engine

    v0.1.3 #gguf #llama #llm-inference #llm
  65. bare-metal-gguf

    GGUF binary format parser for bare-metal LLM inference — zero-copy mmap, all quantization types

    v0.7.1 #gguf #llm-inference #llm
  66. kwaai-inference

    Inference engine for KwaaiNet - Candle-based ML runtime

    v0.4.63 #gguf #ml #candle #transformer
  67. swink-agent-local-llm

    Local on-device LLM inference for swink-agent using llama.cpp

    v0.9.0 #gguf #llm-inference #local-llm #llm #embedding
  68. llama-cpp-v3

    Safe and ergonomic Rust wrapper for llama.cpp with dynamic loading

    v0.1.7 #llama #llama-cpp #gguf #loading #back-end #dynamic-loading #sampler #dll #cache-back-end #github-api
  69. rusty-genius-cortex

    Inference engine interaction layer for rusty-genius

    v0.1.6 #gguf #llama-cpp #llm #inference
  70. ggml-quants

    GGML-defined quantized data types and their quant/dequant algorithms

    v0.1.0 34K #ggml #gguf #llama-cpp
  71. oxillama-bench

    Benchmark suite for OxiLLaMa inference engine

    v0.1.3 #gguf #benchmark #llm-inference #llm #performance
  72. offline_intelligence_cpp

    C++ bindings for Offline Intelligence Library

    v0.1.2 #bindings-for-offline #context #onnx #gguf #safetensors #memory-optimization #memory-search #conversation #artificial-intelligence #cross-platform
  73. oxbitnet

    Run BitNet b1.58 ternary LLMs with wgpu

    v0.5.2 #gguf #bit-net #ternary #llm #wgpu #cache #wgsl #bitnet #chat-template #top-k
  74. pllm

    Portable LLM

    v0.4.1 500 #gguf #inference #llm-inference #portable #transformer #llama2
  75. yule-gpu

    GPU compute backends: Vulkan, CUDA, Metal, and CPU SIMD fallback

    v0.1.0 #gguf #artificial-intelligence #yule #vulkan #gpu #gpu-compute #metal #model-inference #vulkan-back-end
  76. gguf-llms

    Parsing GGUF (GGML Universal Format) files

    v0.0.2 #gguf #ggml #parser #machine-learning
  77. tensorsafe2gguf

    Convert a tensorsafe model to a GGUF model

    v0.1.0 #gguf #model
  78. qts_ggml

    Thin safe wrappers over qts_ggml_sys for qts

    v0.1.0 #qts #ggml #onnx #gguf #wrapper #safe-wrapper #text-to-speech #python-uv #godot #qwen3
  79. oxibonsai

    Pure Rust 1-bit LLM inference engine for PrismML Bonsai models — umbrella crate

    v0.1.3 #gguf #llm-inference #1-bit #llm #bonsai
  80. yule-registry

    Model registry: pull, cache, and manage verified model artifacts

    v0.1.0 #gguf #artificial-intelligence #yule #vulkan #inference #llama #model-inference
  81. yule-infer

    Inference engine: attention, KV cache, sampling, quantization, token generation

    v0.1.0 #gguf #inference-engine #artificial-intelligence #llama #yule #vulkan #token-generation #gpu #model-inference #vulkan-back-end
  82. llmfit-core

    Core library for llmfit — hardware detection, model fitting, and provider integration

    v0.9.22 1.6K #gguf #model-fitting #llm-inference #gpu #llm
  83. yule-sandbox

    Cross-platform process sandboxing: seccomp, AppContainer, seatbelt

    v0.1.0 #gguf #sandbox #artificial-intelligence #yule #seccomp #vulkan #seatbelt #cross-platform #gpu #model-inference
  84. yule-api

    Local API server: capability-token auth, streaming inference, OpenAI-compatible endpoints

    v0.1.0 #gguf #artificial-intelligence #yule #inference #server-api #open-ai-compatible #vulkan #model-inference #local-api #authentication
  85. alith-models

    Load and Download LLM Models, Metadata, and Tokenizers

    v0.4.3 #gguf #model #tokenize #hugging-face #metadata #llm #embedding #artificial-intelligence
  86. yule-verify

    Cryptographic integrity verification: Merkle trees, signatures, and model manifests

    v0.1.0 #gguf #artificial-intelligence #signature-verification #yule #merkle-tree #integrity-verification #vulkan #model-inference
  87. llm_prompt

    Low Level Prompt System for API LLMs and local LLMs

    v0.0.3 #gguf #local-llm #chat-template #llm-client #llm-token #prompt-tokens #llm-interface #llm-model #token-count
  88. yule-attest

    Cryptographic attestation: signed inference logs, audit trails

    v0.1.0 #gguf #artificial-intelligence #yule #inference #attestation #vulkan #model-inference #audit #cryptography #trails
  89. offline_intelligence_java

    Java bindings for Offline Intelligence Library

    v0.1.2 #java #jni #java-bindings #bindings-for-offline #context #onnx #gguf #safetensors #memory-optimization #memory-search
  90. gguf

    A small utility to parse GGUF files

    v0.1.2 1.1K #artificial-intelligence #ai-model #metadata-parser #parser #metadata
  91. qts_ggml_sys

    Low-level FFI bindings to ggml-org/ggml (built from vendored sources)

    v0.1.0 #ggml #qts #source #onnx #gguf #text-to-speech #godot #python-uv #qwen3 #vocoder
  92. alith-prompt

    LLM Prompting

    v0.4.3 #llm-prompt #prompting #openai #prompt-tokens #prompt-generation #prompt-context #llm-token #hash-map #gguf #artificial-intelligence
  93. offline_intelligence_js

    JavaScript bindings for Offline Intelligence Library

    v0.1.2 #javascript #offline-intelligence #bindings-for-offline #js-bindings #javascript-bindings #onnx #gguf #safetensors #napi #artificial-intelligence
  94. oxibonsai-core

    GGUF Q1_0_g128 loader, tensor types, and configuration for OxiBonsai

    v0.1.3 #gguf #llm-inference #1-bit #llm #inference
  95. alith-client

    The Easiest Rust Interface for Local LLMs, and an Interface for Deterministic Signals from Probabilistic LLM Vibes

    v0.4.3 #gguf #anthropic #openai #llm #api-bindings
  96. vil_quantized

    D13 - Model Quantization Runtime for VIL

    v0.4.0 #gguf #vil #quantization #model #quantized #distributed-systems #model-loading #candle #ggml #zero-copy
  97. llm_models

    Load and download LLM models, metadata, and tokenizers

    v0.0.3 310 #gguf #model #tokenize #metadata #llm #artificial-intelligence #candle
  99. yule-core

    Core types, tensor abstractions, and model metadata for the Yule inference runtime

    v0.1.0 #gguf #artificial-intelligence #inference #model-inference #vulkan
  100. aprender-serve

    Pure Rust ML inference engine built from scratch - model serving for GGUF and safetensors

    v0.33.0 1.0K #gguf #model-serving #machine-learning
  101. mlx-io

    Tensor serialization: safetensors, GGUF, mmap loading

    v0.1.0 #safetensors #tensor #mmap #gguf #save #serialization
  102. aprender-quant

    K-quantization formats (Q4_K, Q5_K, Q6_K) for GGUF/APR model weights

    v0.31.2 850 #gguf #quantization #llm #neural-network #machine-learning
  103. localgpt-core

    Core library for LocalGPT — agent, memory, config, security

    v0.3.6 #artificial-intelligence #config #gguf #agent #openai #sandbox #bevy #logging #embedding #building-block
  104. aprender-train-inspect

    SafeTensors model inspection and format conversion

    v0.31.2 #gguf #format-conversion #inspection #aprender #safe-tensors #apr
  105. ggufscan

    Easily find and delete GGUF model files from your HDD

    v0.1.1 #gguf #scan #cleanup #utilities
  106. spn-native

    Native model inference and storage for SuperNovae ecosystem

    v0.2.0 #gguf #hugging-face #inference #llm-inference #llm #supernovae
  107. aprender-train-distill

    End-to-end knowledge distillation CLI

    v0.31.2 #gguf #distillation #end-to-end #aprender #knowledge #progressive #attention #student #teacher #hugging-face
  108. entrenar-inspect

    SafeTensors model inspection and format conversion

    v0.1.0 #gguf #model-format #format-conversion #entrenar #inspection #safe-tensors #apr #quantization #aprender
  109. airframe

    FP32-first inference core for Llama-family models. Pure Rust physics engine.

    v0.0.1 #gguf #llama #llm #rust #llm-inference