Stars
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer
Code of the paper: A Recipe for Watermarking Diffusion Models
Official Implementation of ReCo: Region-Constraint In-Context Generation for Instructional Video Editing
RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing
We present FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while achieving up to 6× acceleration in inference speed.
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Official Implementation of "MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives"
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"
Official Implementation of SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
[NeurIPS 2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
Finetune HunyuanImage 3.0, an 80B unified understanding and generation model
Context-Aware Diffusion for Character-Consistent Text-to-Video Generation
Official repository for the paper "MICo-150K: A Comprehensive Dataset for Multi-Image Composition".
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
This project is the official implementation of "UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation"
[AAAI2026] Bring Your Dreams to Life: Continual Text-to-Video Customization