Stars
🚀🚀 Efficient implementations of Native Sparse Attention
A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models.
Development repository for the Triton language and compiler
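For a taste of what Triton programs look like, here is a minimal vector-add kernel in the style of the official tutorials (a sketch, not code taken from the repository itself):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```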
Proposed solutions to the exercises from Terence Tao's textbooks, Analysis I & II. Mirrored from https://gitlab.com/f-santos/taoanalysissolutions
ComputeEval: a framework for generating and evaluating CUDA code from Large Language Models.
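Benchmarks of this kind usually report functional correctness as pass@k. A minimal sketch of the standard unbiased estimator from the HumanEval paper (assumed here; ComputeEval's exact metric may differ), where n samples were generated and c of them passed the tests:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Probability that at least one of k samples drawn from n (c correct) passes.
    if n - c < k:
        return 1.0  # fewer than k incorrect samples, so any k-draw hits a correct one
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

print(pass_at_k(n=20, c=3, k=5))  # ≈ 0.60
```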
NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer
MSCCL++: A GPU-driven communication stack for scalable AI applications
slime is an LLM post-training framework for RL Scaling.
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
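For context, GRPO scores each sampled trajectory relative to the other samples drawn for the same prompt rather than using a learned value function. A minimal sketch of that group-relative advantage (illustrative names; not code from the Flow-GRPO repository):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # rewards: (num_prompts, group_size), one row of sampled rewards per prompt.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # standardize within each group
```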
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Qwen-Image-Lightning: Speed up the Qwen-Image model with distillation.
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
A repository tracking the latest autoregressive visual generation papers.
The missing star history graph of GitHub repos - https://star-history.com
Distributed query engine providing simple and reliable data processing for any modality and scale
A compiler for the SYSY language (a subset of C). My homework for the course "compiler principles"