Stars
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
[SIGGRAPH'24] CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization
Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV2025)
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
Wan: Open and Advanced Large-Scale Video Generative Models
Official PyTorch implementation for "Large Language Diffusion Models"
[NeurIPS 2025 Spotlight] FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
The author's implementation for the ICML 2024 paper.
[CVPR 2025] Implementation of "Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models"
[ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning"
Official implementation of ICML 2025 Oral 🏆 paper "Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection".
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[TMLR 2025🔥] A survey of autoregressive models in vision.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official repository of In-Context LoRA for Diffusion Transformers
This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
[ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
[ECCV 2024] Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion