Stars
MoonPalace (月宫) is an API debugging tool provided by Moonshot AI (月之暗面).
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
ROCm / Megatron-LM
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer models at scale
verl: Volcano Engine Reinforcement Learning for LLMs
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
HabanaAI / vllm-fork
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
Run compilers interactively from your web browser and interact with the assembly
Unified KV Cache Compression Methods for Auto-Regressive Models
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Open-source Linux performance suite for engineers: profiling and tuning workloads and system configurations.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
calflops is designed to calculate FLOPs, MACs, and parameters for a wide range of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models).
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.