moe

Here are 10 public repositories matching this topic...

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and expert parallelism (EP, e.g., GPU-driven).

ai networking hpc amd gpu collective cuda p2p nvidia broadcom moe rdma allreduce llm kvcache

Updated Jun 10, 2026
C++

inferflow / inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

bloom falcon moe gemma mistral mixture-of-experts model-quantization multi-gpu-inference m2m100 llamacpp llm-inference internlm llama2 qwen baichuan2 mixtral phi-2 deepseek minicpm

Updated Mar 15, 2024
C++

yvonwin / qwen2.cpp

C++ implementation of Qwen2 and Llama 3

nlp moe large-language-models qwen qwen2 qwen1-5

Updated Jun 7, 2024
C++

Harry-Chen / InfMoE

Inference framework for MoE layers based on TensorRT with Python binding

inference moe tensorrt

Updated May 31, 2021
C++

tobychui / Weather-Pet-Display

A simple weather display with a cute interactive desktop pet (❛◡❛✿)

weather arduino esp8266 anime display pet moe diy maker cute uart-hmi

Updated May 24, 2022
C++

MartinCrespoC / QuantumLeap---Llama.cpp-TurboQuant

🚀 Run any LLM on any hardware. 130% faster MoE inference with ExpertFlow + TurboQuant KV compression. Ollama-compatible API. Built on llama.cpp.

machine-learning ai cpp gpu optimization cuda inference moe quantization rocm nvidia-gpu amd-gpu mixture-of-experts openai-api llm llama-cpp local-llm llm-inference ollama

Updated Apr 1, 2026
C++

sizzlecar / ferrum-infer-rs

Production-grade LLM inference in Rust. Single binary, OpenAI-compatible, runs on Apple Silicon and CUDA.

rust metal cuda inference moe llama inference-engine mixture-of-experts apple-silicon openai-api llm qwen

Updated Jun 11, 2026
C++

kair998 / Musuka

Musuka・むすか・娘化 is an application that replaces your shortcuts with any image on a new desktop, especially moe girl character images.

windows wallpaper anime moe

Updated Jun 7, 2026
C++

craftogrammer / llama.cpp-adaptive-turboquant

Downstream llama.cpp TurboQuant CUDA fork with adaptive KV layout selection for long-context inference on consumer Blackwell GPUs.

cuda inference moe quantization blackwell kv-cache long-context llama-cpp local-llm rtx-5080 sm120 turboquant

Updated May 1, 2026
C++

m1kron / flash-distributed-moe

Inspired by the flashDmoe paper: https://arxiv.org/abs/2506.04667. Implementation for AMD's GPU ecosystem.

amd gpu moe rocm rocshmem

Updated May 6, 2026
C++
