
Curated collection of papers in machine learning systems

542 36 Updated Feb 7, 2026

The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.

Rust 185,699 108,605 Updated Apr 17, 2026

A curated survey of database systems, design patterns, and architectural practices in modern AI systems including multi-agent frameworks, RAG pipelines, and LLM applications.

2 Updated Mar 9, 2026

This repository contains the code for the ICLR 2026 paper “DASH: Deterministic Attention Scheduling for High-Throughput Reproducible LLM Training”, developed on top of the FlashAttention codebase.

Python 8 Updated Jan 31, 2026

A minimal yet professional single-agent demo project that showcases the core execution pipeline and production-grade features of agents.

Python 2,439 354 Updated Feb 14, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 635 73 Updated Apr 17, 2026

"Paper2Slides: From Paper to Presentation in One Click"

Python 3,311 434 Updated Mar 15, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 1,093 157 Updated Apr 17, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,306 137 Updated Apr 17, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,200 195 Updated Apr 17, 2026

Kernels, of the mega variety :)

Python 707 55 Updated Apr 17, 2026

CUDA Graph aware nvtx

C++ 2 Updated Jun 6, 2025
Python 166 18 Updated Dec 27, 2024
Python 30 3 Updated Mar 24, 2025
Python 10 3 Updated May 11, 2025

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

428 27 Updated Mar 3, 2025
Python 4 Updated Sep 30, 2024

Distributed Compiler based on Triton for Parallel Systems

Python 1,408 138 Updated Apr 17, 2026

[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

Jupyter Notebook 51 2 Updated Jun 17, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 4,016 320 Updated Apr 17, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 9,134 1,152 Updated Apr 16, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,559 1,008 Updated Apr 7, 2026

[NeurIPS 2025] A simple extension on vLLM to help you speed up reasoning models without training.

Python 227 31 Updated May 31, 2025

Puzzles for learning Triton

Jupyter Notebook 2,378 217 Updated Apr 1, 2026

A unified inference and post-training framework for accelerated video generation.

Python 3,399 318 Updated Apr 17, 2026

Large Language Model (LLM) Systems Paper List

1,927 99 Updated Apr 17, 2026