Tutel MoE: Optimized Mixture-of-Experts library, supporting GptOss/DeepSeek/Kimi-K2/Qwen3 with FP8/NVFP4/MXFP4
Updated Jun 4, 2026 - C
MOE is an event-driven OS for 8/16/32-bit MCUs. MOE stands for "Minds Of Embedded system"; it is also the name of my lovely baby daughter 😎
Minimal, zero-dependency LLM inference in pure C11. CPU-first with NEON/AVX2 SIMD. Flash MoE (pread + LRU expert cache). TurboQuant 3-bit KV compression (8.9x less memory per session). 20+ GGUF quant formats. Compiles to WASM.
Cross‑platform inference engine for huge AI models (1B–397B). Runs on any CPU (x86_64/ARM64) with AVX2/NEON, supports dense & MoE models (Qwen, Llama, Mistral…). GPU backends (Metal, OpenCL, CUDA) coming soon. No Python, no frameworks – pure C with optional PyQt5 GUI.
Expert streaming inference engine for MoE models larger than VRAM — run 235B+ models on consumer GPUs