Curated collection of papers in machine learning systems
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd. Built in Rust using oh-my-codex.
A curated survey of database systems, design patterns, and architectural practices in modern AI systems including multi-agent frameworks, RAG pipelines, and LLM applications.
This repository contains the code for the ICLR 2026 paper “DASH: Deterministic Attention Scheduling for High-Throughput Reproducible LLM Training”, developed on top of the FlashAttention codebase.
A minimal yet professional single agent demo project that showcases the core execution pipeline and production-grade features of agents.
Accelerating MoE with IO and Tile-aware Optimizations
Paper2Slides: From Paper to Presentation in One Click
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven).
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
Distributed Compiler based on Triton for Parallel Systems
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
[NeurIPS 2025] A simple extension of vLLM that helps you speed up reasoning models without additional training.
Puzzles for learning Triton
A unified inference and post-training framework for accelerated video generation.
Large Language Model (LLM) Systems Paper List