Starred repositories
Vortex: Programmable Sparse Attention for Agents as Algorithm Designers
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
mKernel: fast multi-node, multi-GPU fused kernels
TokenSpeed is a speed-of-light LLM inference engine.
A lightweight inference engine supporting speculative speculative decoding (SSD).
MiroThinker is a deep research agent optimized for complex research and prediction tasks. The latest model, MiroThinker-1.7, achieves 74.0 and 75.3 on BrowseComp and BrowseComp Zh, respectively.
Distributed MoE in a Single Kernel [NeurIPS '25]
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Efficient Long-context Language Model Training by Core Attention Disaggregation
Accelerating MoE with IO and Tile-aware Optimizations
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
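An AI-native slides (PPT) generator based on nano banana pro🍌, moving toward "Vibe PPT". It supports uploading any template image, uploading arbitrary source materials with intelligent parsing, generating a PPT automatically from a single sentence, an outline, or page descriptions, editing specified regions by spoken instruction, and one-click export to an editable PPT.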
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding
Real-Time VLAs via Future-state-aware Asynchronous Inference.
A framework for efficient model inference with omni-modality models
[ICLR 2026] QeRL enables RL for 32B LLMs on a single H100 GPU.
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
An early-research-stage expert-parallel load balancer for MoE models, based on linear programming.
A framework for few-shot evaluation of language models.
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive