- Fudan University
- Shanghai
- https://wdrink.github.io/
Stars
PICABench: How Far Are We from Physically Realistic Image Editing?
This is an early exploration that introduces Interleaving Reasoning to the Text-to-image Generation field and achieves SoTA benchmark performance. It also significantly improves the quality, fine-grain…
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
Post-training with Tinker
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
LongLive: Real-time Interactive Long Video Generation
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".
verl: Volcano Engine Reinforcement Learning for LLMs
Fully Open Framework for Democratized Multimodal Training
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Official PyTorch implementation of "Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use"
Voyager is an interactive RGBD video generation model that is conditioned on camera input and supports real-time 3D reconstruction.
[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
[ICCV 2025] GameFactory: Creating New Games with Generative Interactive Videos
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
[ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
[ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling
Reference PyTorch implementation and models for DINOv3
DeepVerse: 4D Autoregressive Video Generation as a World Model
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.