moe
Here are 10 public repositories matching this topic...
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
Updated Mar 15, 2024 - C++
🚀 Run any LLM on any hardware. 130% faster MoE inference with ExpertFlow + TurboQuant KV compression. Ollama-compatible API. Built on llama.cpp.
Updated Apr 1, 2026 - C++
Production-grade LLM inference in Rust. Single binary, OpenAI-compatible, runs on Apple Silicon and CUDA.
Updated Jun 11, 2026 - C++
Downstream llama.cpp TurboQuant CUDA fork with adaptive KV layout selection for long-context inference on consumer Blackwell GPUs.
Updated May 1, 2026 - C++