Stars
SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles
The official implementation of T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization
[ICML 2026] Code & model for the arXiv paper "Autoregressive Image Generation with Masked Bit Modeling"
[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Tile primitives for speedy kernels
DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
🔥 How to efficiently and effectively compress CoTs, or directly generate concise CoTs during inference, while maintaining reasoning performance is an important topic!
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
Paper list for Efficient Reasoning.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Strong and Open Vision Language Assistant for Mobile Devices
Sky-T1: Train your own O1 preview model within $450
📚 Collection of token-level model compression resources.
[ICLR2025] Accelerating Diffusion Transformers with Token-wise Feature Caching
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Official inference repo for FLUX.1 models
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.