hpc-ops Public
Forked from Tencent/hpc-ops
High Performance LLM Inference Operator Library
C++ · Other license · Updated Jun 11, 2026
OSCAR Public
Forked from FutureMLS-Lab/OSCAR
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
Python · Updated Jun 3, 2026
mKernel Public
Forked from uccl-project/mKernel
mKernel: fast multi-node, multi-GPU fused kernels
Cuda · MIT License · Updated Jun 2, 2026
uccl Public
Forked from uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven).
C++ · Apache License 2.0 · Updated Jun 1, 2026
slime Public
Forked from THUDM/slime
slime is an LLM post-training framework for RL scaling.
Python · Apache License 2.0 · Updated May 30, 2026
miles Public
Forked from radixark/miles
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
Python · Apache License 2.0 · Updated May 27, 2026
AI-Infra-Auto-Driven-SKILLS Public
Forked from BBuf/AI-Infra-Auto-Driven-SKILLS
Python · Updated May 26, 2026
RLinf Public
Forked from RLinf/RLinf
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
Python · Apache License 2.0 · Updated May 26, 2026
AReaL Public
Forked from areal-project/AReaL
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
Python · Apache License 2.0 · Updated May 26, 2026
dynamo Public
Forked from ai-dynamo/dynamo
A Datacenter Scale Distributed Inference Serving Framework
Rust · Other license · Updated May 21, 2026
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Python · Apache License 2.0 · Updated May 19, 2026
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python · Apache License 2.0 · Updated May 19, 2026
vllm-omni Public
Forked from vllm-project/vllm-omni
A framework for efficient model inference with omni-modality models
Python · Apache License 2.0 · Updated May 13, 2026
InferenceX Public
Forked from SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3
Python · Apache License 2.0 · Updated May 12, 2026
mlx Public
Forked from ml-explore/mlx
MLX: An array framework for Apple silicon
C++ · MIT License · Updated May 9, 2026
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python · Apache License 2.0 · Updated May 8, 2026
ds4 Public
Forked from antirez/ds4
DeepSeek 4 Flash local inference engine for Metal
C · MIT License · Updated May 8, 2026
tokenspeed Public
Forked from lightseekorg/tokenspeed
TokenSpeed is a speed-of-light LLM inference engine.
Python · MIT License · Updated May 7, 2026
lucebox-hub Public
Forked from Luce-Org/lucebox-hub
Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware.
C++ · MIT License · Updated May 3, 2026
quant_kernel_benchmarks Public
Forked from neuralmagic/quant_kernel_benchmarks
Benchmarking code for running quantized kernels from vLLM and other libraries
Python · Updated May 2, 2026
marlin Public
Forked from IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Python · Apache License 2.0 · Updated May 2, 2026
FlashQLA Public
Forked from QwenLM/FlashQLA
High-performance linear attention kernel library built on TileLang
Python · MIT License · Updated Apr 29, 2026
le-wm Public
Forked from lucas-maes/le-wm
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
Python · MIT License · Updated Apr 27, 2026
tilelang Public
Forked from tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Python · Other license · Updated Apr 26, 2026
dflash Public
Forked from z-lab/dflash
DFlash: Block Diffusion for Flash Speculative Decoding
Python · MIT License · Updated Apr 26, 2026
SpecForge Public
Forked from sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Python · MIT License · Updated Apr 23, 2026