Starred repositories
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
[ICCV 2023] Consistent Image Synthesis and Editing
[TOG 2024] StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
Tiny AutoEncoder for Hunyuan Video (and other video models)
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
A collection of awesome text-to-image generation studies.
MoviiGen 1.1: Towards Cinematic-Quality Video Generative Models
Official PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
Code for the project "MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos"
LLM/VLM gaming agents and model evaluation through games.
TransNet V2: Shot Boundary Detection Neural Network
Wan: Open and Advanced Large-Scale Video Generative Models
Official implementation for "RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers" (ICML 2025) and "UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers"
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer" (ICCV2025)
Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025]
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models