- Ant Group
- Hangzhou, China
- https://scholar.google.com/citations?hl=en&user=VRsy9v8AAAAJ
Starred repositories
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
WorldPlay: Interactive World Modeling with Real-Time Latency and Geometric Consistency
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Official implementation of "MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues"
HunyuanVideo-1.5: A leading lightweight video generation model
A comprehensive list of papers on the definition of World Models and on using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, code, and related websites
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
MotionStream: Real-Time Video Generation with Interactive Motion Controls
SGLang is a fast serving framework for large language models and vision language models.
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Krea Realtime 14B. An open-source realtime AI video model.
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"
Native Multimodal Models are World Learners
Official repo for the paper "Video-As-Prompt: Unified Semantic Control for Video Generation"
Official implementation of "HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives"
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets