Stars
FlashInfer: Kernel Library for LLM Serving
AI agents that autonomously run research on single-GPU nanochat training
LLM-powered intelligent analyzer for A-share/H-share/US stocks: multi-source market data + real-time news + LLM decision dashboard + multi-channel notifications; runs on a schedule at zero cost, entirely on free tiers. LLM-powered stock analysis system for A/H/US markets.
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Wan: Open and Advanced Large-Scale Video Generative Models
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
DFlash: Block Diffusion for Flash Speculative Decoding
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
WeDLM: The fastest diffusion language model with standard causal attention and native KV cache compatibility, delivering real speedups over vLLM-optimized baselines.
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized Attention achieves a 2–5× speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
SGLang is a high-performance serving framework for large language models and multimodal models.
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
[ICLR 2026] Taming large-scale few-step training with self-adversarial flows! 👏🏻
(arXiv) MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
A lightweight inference framework for image and video generation
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
Dimple, the first Discrete Diffusion Multimodal Large Language Model
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
dInfer: An Efficient Inference Framework for Diffusion Language Models
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation (CVPR 2026 Highlight)'