Stars
MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator
[ICCV 2025] FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
Let's make video diffusion practical!
TradingAgents: Multi-Agents LLM Financial Trading Framework
[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation [SIGGRAPH Asia 2025]
ObjectClear: Complete Object Removal via Object-Effect Attention
MAGI-1: Autoregressive Video Generation at Scale
Official implementation of the paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
The official implementation of CVPR'25 Oral paper "Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise"
[NeurIPS 2024] Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
A unified inference and post-training framework for accelerated video generation.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[CVPR 2025] 3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
Wan: Open and Advanced Large-Scale Video Generative Models
SkyReels V1: The first and most advanced open-source human-centric video foundation model
[CVPR 2025] Official implementation of the paper "Generative Inbetweening through Frame-wise Conditions-Driven Video Generation"
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"