The Chinese University of Hong Kong, Shenzhen
- Shenzhen, China
- https://robbinw.github.io/
- https://scholar.google.com/citations?user=u2_lz64AAAAJ&hl=zh-CN
Stars
WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models
Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination?
[CVPR'26] Semi-Supervised Conformal Prediction With Unlabeled Nonconformity Score
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards
PAct: Part-Decomposed Single-View Articulated Object Generation
Causal video-action world model for generalist robot control
(arXiv) MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Reference PyTorch implementation and models for DINOv3
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"
AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation
Official repo for vidar and vidarc: video foundation model for robotics.
WoW (World-Omniscient World Model) is a generative world model trained on 2 million robotic interaction trajectories, designed to imagine, reason, and act in the physical world. Unlike passive vide…
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
A general fine-tuning kit geared toward image/video/audio diffusion models.
A pipeline parallel training script for diffusion models.
Enjoy the magic of Diffusion models!
Wan: Open and Advanced Large-Scale Video Generative Models
[ICRA 2026] VITRA: Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Pusa: Thousands Timesteps Video Diffusion Model
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
An end-to-end, GPU-accelerated, and modular platform for building generalized Embodied Intelligence.
Official code of Motus: A Unified Latent Action World Model
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model