Stars
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Supercharge Your LLM with the Fastest KV Cache Layer
LLM serving cluster simulator
Simulator for LLM inference on an abstract 3D AIMC-based accelerator
A large-scale simulation framework for LLM inference
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
Awesome LLM compression research papers and tools.
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
Official Repository of Absolute Zero Reasoner
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Live-streamed development of RL tuning for LLM agents
DeepEP: an efficient expert-parallel communication library
A very simple GRPO implementation for reproducing R1-like LLM thinking.
Curated collection of papers in machine learning systems
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
Fully open data curation for reasoning models
Democratizing Reinforcement Learning for LLMs
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Scalable data preprocessing and curation toolkit for LLMs
A series of technical reports on Slow Thinking with LLMs
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Performance Estimates for Transformer AI Models in Science
A recipe for online RLHF and online iterative DPO.