[ICCV'25] MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
[ Project Page ] [ Paper ] [ SupMat ] [ ArXiv ] [ Video ]
Authors: Shibo Wang, Haonan He, Maria Parelli, Christoph Gebhardt, Zicong Fan, Jie Song
- 2025.11.04: MagicHOI v1.0.0 is released!
- 2025.10.18: MagicHOI beta is released!
- 2025.6.26: MagicHOI is accepted to ICCV'25!
This repository accompanies MagicHOI, a method for reconstructing hands and objects from short monocular videos by leveraging novel-view synthesis priors to regularize occluded object regions.
- Download resources: Instructions for obtaining in-the-wild videos from MagicHOI and the corresponding preprocessed datasets.
- Data preparation: Scripts for preprocessing and training models on your own custom videos.
- Interactive viewer: A tool to visualize and interact with the model’s predictions.
- Evaluation tools: Code to evaluate performance and compare results between MagicHOI and SOTA methods on the HO3D dataset.
- Reconstruction framework: A complete framework for reconstructing dynamic hand–object interactions using novel view synthesis priors.
- Object model training code
- Hand-object alignment code
- Evaluation code
- Result visualization
- Custom dataset
- In-the-wild dataset
- Get a copy of the code
    git clone git@github.com:byran-wang/MagicHOI.git
    cd MagicHOI
    git submodule update --init --recursive
- Set up environments
  - We recommend at least 24GB of system RAM for training.
  - Follow the instructions in docs/setup.md.
- Download
  - Follow the instructions in docs/download.md.
- Train the object model on a preprocessed sequence
  Let's use the sequence hold_MC1_ho3d.0 as an example. The available sequences for --seq_list are defined in the all_sequences list in run.py.
    seq_name=hold_MC1_ho3d.0  # set seq_name to all to run every sequence
    python run.py --execute_list only_3d --process_list rm train export --seq_list $seq_name
    python run.py --mute --execute_list only_3d --process_list validate gen_cond_depth align save_align --seq_list $seq_name
    python run.py --mute --execute_list only_ref --process_list rm train export --seq_list $seq_name
    python run.py --mute --execute_list 3d_ref --process_list rm train export --seq_list $seq_name
    python run.py --mute --execute_list 3d_ref --process_list validate --seq_list $seq_name
    python run.py --mute --execute_list 3d_ref_weight --process_list rm train export --seq_list $seq_name
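The object-model training stages listed in this step always run in the same order, so it can help to generate them from a table rather than retyping them. The following is a minimal sketch, not part of the repository: the helper names `run_py_command`, `OBJECT_TRAINING_STAGES`, and `object_training_commands` are hypothetical, and only the flags and values already shown in this step are used. It prints the commands as a dry run instead of executing them.

```python
import shlex


def run_py_command(execute_list, process_list, seq_name, mute=False):
    """Build one `python run.py` invocation as an argument list (hypothetical helper)."""
    cmd = ["python", "run.py"]
    if mute:
        cmd.append("--mute")
    cmd += ["--execute_list", execute_list]
    cmd += ["--process_list", *process_list]
    cmd += ["--seq_list", seq_name]
    return cmd


# (execute_list, process_list, mute) for each stage, copied from the
# commands in this step, in the order they are listed.
OBJECT_TRAINING_STAGES = [
    ("only_3d", ["rm", "train", "export"], False),
    ("only_3d", ["validate", "gen_cond_depth", "align", "save_align"], True),
    ("only_ref", ["rm", "train", "export"], True),
    ("3d_ref", ["rm", "train", "export"], True),
    ("3d_ref", ["validate"], True),
    ("3d_ref_weight", ["rm", "train", "export"], True),
]


def object_training_commands(seq_name):
    """Return the stage commands, in order, for one sequence name."""
    return [
        run_py_command(execute_list, process_list, seq_name, mute=mute)
        for execute_list, process_list, mute in OBJECT_TRAINING_STAGES
    ]


if __name__ == "__main__":
    # Dry run: print each command. To execute for real, replace the print
    # with subprocess.run(cmd, check=True) from the repository root.
    for cmd in object_training_commands("hold_MC1_ho3d.0"):
        print(shlex.join(cmd))
```

Keeping the stages in a list makes the ordering explicit and lets a failed stage stop the run when the commands are executed with `check=True`.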
- Align the object to the hand
    seq_name=hold_MC1_ho3d.0  # set seq_name to all to run every sequence
    python run.py --execute_list 3d_ref_weight --process_list align_hand_object_h align_hand_object_r align_hand_object_o align_hand_object_ho --seq_list $seq_name --rebuild
- Visualize the reconstruction result
  - After reconstructing the object and aligning the hand to the object, visualize the hand–object pair with AITViewer.
    seq_name=hold_MC1_ho3d.0  # set seq_name to all to run every sequence
    python run.py --execute_list 3d_ref_weight --process_list vis_ait --seq_list $seq_name
- Evaluate the reconstruction result
  - Evaluate results for all sequences against ground truth:
    seq_name=all
    python run.py --execute_list 3d_ref_weight --process_list eval_step_ho_pose_refine --seq_list $seq_name --rebuild
  - Merge the per-sequence evaluation results:
    python run.py --execute_list 3d_ref_weight --process_list eval_summary_ho --seq_list hold_MC1_ho3d.0 --rebuild
  - The merged metrics are written to <project_dir>/outputs/metrics_summary/metrics_ho_pose_refine_results.txt.
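To check the merged metrics from a script, the summary file can be located relative to the project directory. This is a small sketch under stated assumptions: the helper names `metrics_summary_path` and `read_metrics_summary` are hypothetical, the file is treated as opaque UTF-8 text because its internal format is not described here, and only the output path given in this step is relied on.

```python
from pathlib import Path


def metrics_summary_path(project_dir):
    """Path of the merged metrics file under the given project directory."""
    return (
        Path(project_dir)
        / "outputs"
        / "metrics_summary"
        / "metrics_ho_pose_refine_results.txt"
    )


def read_metrics_summary(project_dir):
    """Return the summary text, or None if the file has not been written yet."""
    path = metrics_summary_path(project_dir)
    if not path.is_file():
        return None
    # Assumption: the file is plain text; no particular layout is parsed.
    return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    text = read_metrics_summary(".")
    if text is None:
        print("No metrics summary found; run the evaluation steps first.")
    else:
        print(text)
```

Returning `None` for a missing file distinguishes "evaluation not run yet" from an empty summary.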
- Prepare custom data
  - You can capture an RGB video with your phone and follow docs/custom.md to obtain segmentations and poses for the hand and object.
@inproceedings{wang2025magichoi,
title={{MagicHOI}: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips},
author={Wang, Shibo and He, Haonan and Parelli, Maria and Gebhardt, Christoph and Fan, Zicong and Song, Jie},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={5957--5968},
year={2025}
}
For technical questions, please create an issue. For other questions, please contact the first author.
The authors would like to thank Muhammed Kocabas, Xu Chen, and Bonan Liu for detailed discussions and insightful feedback, Handi Yin for support, and the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Maria Parelli.
Our code builds on threestudio, hold, aitviewer, and hloc. If you find our work useful, consider checking out their work.