Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Efficient Triton Kernels for LLM Training
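微舆: a multi-agent public-opinion analysis assistant that anyone can use. It breaks information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no framework dependencies.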
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Userspace eBPF runtime for Observability, Network, GPU & General Extensions Framework
A Survey of Reinforcement Learning for Large Reasoning Models
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
MoonPalace (月宫) is an API debugging tool provided by Moonshot AI (月之暗面).
Video+code lecture on building nanoGPT from scratch
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
PKU-DAIR / Hetu
Forked from Hsword/Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Ring attention implementation with flash attention
Principles and Methodologies for Serial Performance Optimization (OSDI '25)
Visualize and post-hoc analyze RL training for debugging and understanding
Allow torch tensor memory to be released and resumed later
SkyRL: A Modular Full-stack RL Library for LLMs