Highlights
- hpc-ops (Public) · forked from Tencent/hpc-ops · High Performance LLM Inference Operator Library · C++ · Other license · Updated Jan 27, 2026
- minions (Public) · forked from HazyResearch/minions · Big & Small LLMs working together · Python · MIT License · Updated Jan 27, 2026
- FastVideo (Public) · forked from hao-ai-lab/FastVideo · A unified inference and post-training framework for accelerated video generation · Python · Apache License 2.0 · Updated Jan 16, 2026
- Engram (Public) · forked from deepseek-ai/Engram · Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models · Python · Apache License 2.0 · Updated Jan 12, 2026
- ThunderKittens (Public) · forked from HazyResearch/ThunderKittens · Tile primitives for speedy kernels · CUDA · MIT License · Updated Jan 12, 2026
- cutlass (Public) · forked from NVIDIA/cutlass · CUDA Templates and Python DSLs for High-Performance Linear Algebra · C++ · Other license · Updated Jan 9, 2026
- CUDA-L2 (Public) · forked from deepreinforce-ai/CUDA-L2 · Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning · CUDA · MIT License · Updated Jan 8, 2026
- verl (Public) · forked from verl-project/verl · Volcano Engine Reinforcement Learning for LLMs · Python · Apache License 2.0 · Updated Jan 6, 2026
- DeepGEMM (Public) · forked from deepseek-ai/DeepGEMM · Clean and efficient FP8 GEMM kernels with fine-grained scaling · CUDA · MIT License · Updated Jan 6, 2026
- dflash (Public) · forked from z-lab/dflash · Block Diffusion for Ultra-Fast Speculative Decoding · Python · MIT License · Updated Jan 5, 2026
- ArcticInference (Public) · forked from snowflakedb/ArcticInference · vLLM plugin for high-throughput, low-latency inference · Python · Apache License 2.0 · Updated Dec 30, 2025
- DeepEP (Public) · forked from deepseek-ai/DeepEP · An efficient expert-parallel communication library · CUDA · MIT License · Updated Dec 29, 2025
- Megatron-LM (Public) · forked from NVIDIA/Megatron-LM · Ongoing research training transformer models at scale · Python · Other license · Updated Dec 28, 2025
- mini-sglang (Public) · forked from sgl-project/mini-sglang · A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems · Python · Updated Dec 26, 2025
- DeepSpeed (Public) · forked from deepspeedai/DeepSpeed · A deep learning optimization library that makes distributed training and inference easy, efficient, and effective · Python · Apache License 2.0 · Updated Dec 24, 2025 (usage sketch after this list)
- sionna-rk (Public) · forked from NVlabs/sionna-rk · Sionna Research Kit: A GPU-Accelerated Research Platform for AI-RAN · Jupyter Notebook · Other license · Updated Dec 19, 2025
- FlashMLA (Public) · forked from deepseek-ai/FlashMLA · Efficient Multi-head Latent Attention kernels · C++ · MIT License · Updated Dec 15, 2025
- nanoGPT (Public) · forked from karpathy/nanoGPT · The simplest, fastest repository for training/finetuning medium-sized GPTs · Python · MIT License · Updated Nov 12, 2025 (usage sketch after this list)
- nano-vllm (Public) · forked from GeeeekExplorer/nano-vllm · Nano vLLM · Python · MIT License · Updated Nov 3, 2025 (usage sketch after this list)
- Advanced-Progress-Bars (Public) · forked from cactuzhead/Advanced-Progress-Bars · Obsidian plugin to create custom progress bars · TypeScript · MIT License · Updated Oct 3, 2025
- DistServe (Public) · forked from LLMServe/DistServe · Disaggregated serving system for Large Language Models (LLMs) · Jupyter Notebook · Apache License 2.0 · Updated Apr 6, 2025
- marlin (Public) · forked from IST-DASLab/marlin · FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens · Python · Apache License 2.0 · Updated Sep 4, 2024
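
For orientation on the DeepSpeed entry above: a minimal sketch of wrapping a toy PyTorch module with the library's standard deepspeed.initialize entry point and a ZeRO stage-2 config. The toy model, batch size, and config values are illustrative assumptions, not taken from this fork.

```python
# Minimal sketch (illustrative, not from this fork): training a toy module under
# DeepSpeed with ZeRO stage-2 optimizer/gradient sharding and bf16.
# Run with the DeepSpeed launcher, e.g. `deepspeed toy.py`.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,              # illustrative values
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},                # shard optimizer states + gradients
    "bf16": {"enabled": True},
}

# deepspeed.initialize wraps the model and returns (engine, optimizer, dataloader, scheduler).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(8, 1024, device=engine.device, dtype=torch.bfloat16)
loss = engine(x).float().pow(2).mean()                # toy loss
engine.backward(loss)                                 # DeepSpeed-managed backward
engine.step()                                         # optimizer step + ZeRO bookkeeping
```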
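
For the nanoGPT entry: a minimal sketch against the repo's single-file model.py (GPTConfig, GPT, and the built-in generate helper). The config values and random token ids below are illustrative stand-ins; real training and sampling live in the repo's train.py and sample.py.

```python
# Minimal sketch (illustrative config, random data): one forward/backward pass and
# sampling with nanoGPT's model.py. Assumes the nanoGPT repo root is on PYTHONPATH.
import torch
from model import GPT, GPTConfig   # nanoGPT's single-file model definition

config = GPTConfig(
    block_size=256,   # context length
    vocab_size=65,    # e.g. a character-level vocabulary
    n_layer=6,
    n_head=6,
    n_embd=384,
    dropout=0.0,
)
model = GPT(config)

# Forward pass with targets returns (logits, loss); the real training loop is in train.py.
idx = torch.randint(0, config.vocab_size, (4, config.block_size))
logits, loss = model(idx, targets=idx)
loss.backward()

# Autoregressive sampling via the built-in helper (see sample.py for the full script).
out = model.generate(idx[:, :1], max_new_tokens=20, temperature=0.8, top_k=50)
```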
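
For the nano-vllm entry: a sketch written under the assumption that nano-vllm mirrors vLLM's offline LLM / SamplingParams interface. The model path and the layout of the returned outputs are assumptions here; check the repo's README for the exact constructor arguments and output fields.

```python
# Sketch assuming nano-vllm mirrors vLLM's LLM / SamplingParams interface
# (the model path and output field names are assumptions; see the repo's README).
from nanovllm import LLM, SamplingParams

llm = LLM("/path/to/a/hf/model", enforce_eager=True, tensor_parallel_size=1)

sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Explain speculative decoding in one sentence."]

outputs = llm.generate(prompts, sampling_params)
print(outputs[0]["text"])   # assumed output layout
```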