Visual Imitation Enables Contextual Humanoid Control
CoRL 2025, Best Student Paper Award
[project page] [arxiv] [proceedings]
- Sep 30, 2025: VideoMimic won the Best Student Paper Award at CoRL 2025.
- Sep 15, 2025: Simulation code and preliminary sim2real code released.
- Jul 6, 2025: Initial real-to-sim pipeline release.
VideoMimic’s real-to-sim pipeline reconstructs 3D environments and human motion from single-camera videos and retargets the motion to humanoid robots for imitation learning. It extracts human poses in world coordinates, maps them to robot configurations, and reconstructs the environment as a point cloud that is later converted to a mesh.
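To make that data flow concrete, here is a minimal, hypothetical sketch of the three steps above (pose extraction, retargeting, point-cloud-to-mesh conversion). The class and function names are illustrative placeholders, not the repository's actual API:

```python
# Hypothetical sketch of the real-to-sim data flow; names are illustrative.
from dataclasses import dataclass

import numpy as np


@dataclass
class ReconstructionResult:
    world_poses: np.ndarray   # per-frame human poses in world coordinates, (T, J, 3)
    scene_points: np.ndarray  # reconstructed environment point cloud, (N, 3)


def retarget_to_robot(world_poses: np.ndarray, scale: float = 0.9) -> np.ndarray:
    """Toy retargeting: map human joint targets into the robot's frame.

    A real retargeter solves IK against the robot's kinematic model; this
    only illustrates the shape of the mapping (world poses -> robot poses).
    """
    return world_poses * scale


def points_to_mesh(scene_points: np.ndarray) -> dict:
    """Stand-in for point-cloud-to-mesh conversion; a real pipeline would run
    surface reconstruction here and return vertices and faces for simulation."""
    return {"vertices": scene_points, "faces": np.empty((0, 3), dtype=np.int64)}


if __name__ == "__main__":
    recon = ReconstructionResult(
        world_poses=np.zeros((120, 24, 3)),    # 120 frames, 24 joints (dummy)
        scene_points=np.random.rand(1000, 3),  # dummy reconstructed scene
    )
    robot_traj = retarget_to_robot(recon.world_poses)
    mesh = points_to_mesh(recon.scene_points)
    print(robot_traj.shape, mesh["vertices"].shape)
```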
Provides the simulation training pipeline; see its README for details. Training proceeds in four stages: motion-capture pretraining, scene-conditioned tracking, distillation, and RL fine-tuning.
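As a rough illustration of how the four stages chain together, here is a hypothetical sketch; the stage names follow the list above, but run_pipeline and the checkpoint hand-off are assumptions about the structure, not the pipeline's real interface:

```python
# Hypothetical sketch of the four-stage schedule; stage names follow the
# README, but run_pipeline and the checkpoint hand-off are assumptions.
STAGES = [
    "mocap_pretraining",           # stage 1
    "scene_conditioned_tracking",  # stage 2
    "distillation",                # stage 3
    "rl_finetuning",               # stage 4
]


def run_pipeline(train_stage) -> None:
    """Run the stages in order, feeding each stage's checkpoint to the next."""
    checkpoint = None
    for stage in STAGES:
        checkpoint = train_stage(stage, init_from=checkpoint)


if __name__ == "__main__":
    def dummy_stage(name, init_from=None):  # placeholder stage runner
        print(f"stage={name} init_from={init_from}")
        return f"{name}.ckpt"

    run_pipeline(dummy_stage)
```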
Provides the real-world deployment pipeline; see its README for details. We provide a C++ file that you can compile into a binary and run on your real robot with TorchScript-exported checkpoints.
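The Python side of that hand-off might look like the following sketch, which traces a placeholder policy network and saves it as a TorchScript module that a C++ binary can load via torch::jit::load. PolicyMLP and its dimensions are stand-ins, not the actual network:

```python
# Sketch of exporting a policy checkpoint to TorchScript; the C++ deployment
# binary can then load the result with torch::jit::load("policy.pt").
# PolicyMLP and its dimensions are placeholders, not the actual network.
import torch
import torch.nn as nn


class PolicyMLP(nn.Module):
    def __init__(self, obs_dim: int = 93, act_dim: int = 23):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


if __name__ == "__main__":
    policy = PolicyMLP().eval()
    example_obs = torch.zeros(1, 93)    # dummy observation batch
    scripted = torch.jit.trace(policy, example_obs)
    scripted.save("policy.pt")          # consumed by the C++ binary
```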
The dataset is uploaded here. Note that individual videos are provided as sequences of JPEGs rather than encoded MP4s.
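Because each clip is a directory of JPEG frames rather than an MP4, loading a video is just a sorted read. Below is a minimal sketch using Pillow; the "data/clip_0001" layout and "*.jpg" pattern are assumed examples, not the dataset's documented structure:

```python
# Minimal loader for a clip stored as a JPEG sequence. The directory layout
# ("data/clip_0001", "*.jpg") is an assumed example, not the documented one.
from pathlib import Path

import numpy as np
from PIL import Image


def load_clip(frames_dir: str) -> np.ndarray:
    """Read all JPEG frames in a clip directory into a (T, H, W, 3) array."""
    frame_paths = sorted(Path(frames_dir).glob("*.jpg"))
    if not frame_paths:
        raise FileNotFoundError(f"no .jpg frames found in {frames_dir}")
    return np.stack([np.asarray(Image.open(p).convert("RGB")) for p in frame_paths])


if __name__ == "__main__":
    frames = load_clip("data/clip_0001")
    print(frames.shape)  # (num_frames, height, width, 3)
```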
[BibTeX]
@inproceedings{allshire2025visual,
  title={Visual Imitation Enables Contextual Humanoid Control},
  author={Allshire, Arthur and Choi, Hongsuk and Zhang, Junyi and McAllister, David and Zhang, Anthony and Kim, Chung Min and Darrell, Trevor and Abbeel, Pieter and Malik, Jitendra and Kanazawa, Angjoo},
  booktitle={Proceedings of The Conference on Robot Learning},
  series={Proceedings of Machine Learning Research},
  year={2025}
}