This is the codebase for the vision toolkit of our paper Dex4D. This repository contains code for RGBD frame capture, video depth estimation, offline and online point tracking, and visualization of the point tracks in 4D.
[Paper] [Project Page] [Code (Simulation)] [Code (Hardware)]
[NOTE]: We will soon update the codebase to be an installable package for easier use. Please stay tuned!
overview.mp4
Please follow the steps below to perform the installation:
conda create -n dex4d-vision python==3.11.0
conda activate dex4d-vision
pip install -r requirements.txt
Download the checkpoints and put them under the checkpoints directory.
bash get_weights.sh
pip install 'imageio[ffmpeg]'
cd <PATH_FOR_COTRACKER>/
git clone git@github.com:facebookresearch/co-tracker.git
cd co-tracker/ && pip install -e .
pip install pyrealsense2
Then follow the instructions here to install the SDK.
cd <PATH_FOR_SAM2>/
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
# Download checkpoints
cd checkpoints && \
./download_ckpts.sh && \
cd ..
The following instructions walk you through the whole process: first RGBD frame capture -> video generation -> video depth estimation -> offline point tracking, as well as real-time online point tracking. You can also run each module separately.
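As a quick orientation, the offline stages can be laid out as a dry run. The sketch below only assembles and prints the commands given in this README; it does not execute them. The function name `offline_commands` and the `video_name` argument are hypothetical, and the working directory for each command (for example, where `run.py` lives) is not specified here and is left to you. Video generation itself is a manual step between capture and resizing.

```python
import shlex


def offline_commands(exp_name: str, video_name: str) -> list[list[str]]:
    """Assemble the offline pipeline commands from this README (hypothetical helper)."""
    video_dir = f"outputs/video_gen/{exp_name}"
    # Assumption: the later steps in this README read gen_video_resized.mp4 from this folder.
    resized = f"{video_dir}/gen_video_resized.mp4"
    return [
        # 1. Capture the first RGBD frame.
        ["python", "realsense.py", "--exp_name", exp_name],
        # 2. (Manual) generate a video, then resize it to match the RGB frame.
        ["python", "resize_video.py", "--video_path", f"{video_dir}/{video_name}.mp4"],
        # 3. Video depth estimation.
        ["python3", "run.py", "--input_video", resized, "--encoder", "vitl", "--save_npz"],
        # 4. Offline point tracking.
        ["python", "-m", "cotracker3", "--filename", resized, "--grid_size", "100"],
        # 5. 4D visualization of the tracks.
        ["python", "visualize_track_4d.py", "--data_path", "outputs/", "--exp_name", exp_name],
    ]


if __name__ == "__main__":
    # Print only; nothing is run.
    for cmd in offline_commands("YOUR_EXP_NAME", "YOUR_VIDEO_NAME"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run on a machine without a camera or GPU.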
First, specify the experiment name:
export exp_name="YOUR_EXP_NAME"
Then run the realsense capture script:
python realsense.py --exp_name $exp_name
You should find the captured RGBD frames in outputs/realsense/$exp_name/.
After capturing the first frame, use your preferred video generation model to generate a video and save it as outputs/video_gen/$exp_name/<YOUR_VIDEO_NAME>.mp4. You can also resize the video to match the size of the original RGB frame using resize_video.py:
python resize_video.py --video_path outputs/video_gen/$exp_name/<YOUR_VIDEO_NAME>.mp4
python3 run.py --input_video outputs/video_gen/$exp_name/gen_video_resized.mp4 --encoder vitl --save_npz # --metric
For more options, see Video-Depth-Anything.
python -m cotracker3 --filename outputs/video_gen/$exp_name/gen_video_resized.mp4 --grid_size 100 # --use-mask
python visualize_track_4d.py --data_path outputs/ --exp_name $exp_name # --share
After these steps, you should be able to see:
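Before moving on to tracking, it can help to confirm that the files from the earlier steps are where the later commands expect them. The sketch below is a hypothetical pre-flight check, not part of this repository: the function name `check_offline_inputs` is invented for illustration, and the only paths it assumes are the ones named in this README (`outputs/realsense/$exp_name/` and `outputs/video_gen/$exp_name/gen_video_resized.mp4`).

```python
import os
from pathlib import Path


def check_offline_inputs(exp_name: str, root: str = "outputs") -> dict:
    """Report whether the capture folder and resized video exist (hypothetical helper)."""
    capture_dir = Path(root) / "realsense" / exp_name
    resized_video = Path(root) / "video_gen" / exp_name / "gen_video_resized.mp4"
    return {
        "capture_dir": capture_dir,
        # True only if the folder exists and contains at least one entry.
        "has_capture": capture_dir.is_dir() and any(capture_dir.iterdir()),
        "resized_video": resized_video,
        "has_resized_video": resized_video.is_file(),
    }


if __name__ == "__main__":
    # Reads the same exp_name variable exported earlier, if it is set.
    report = check_offline_inputs(os.environ.get("exp_name", "YOUR_EXP_NAME"))
    for key, value in report.items():
        print(f"{key}: {value}")
```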
video-to-4d.mp4
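For intuition about how 2D point tracks and per-frame depth can combine into 3D tracks over time, here is a minimal sketch of pinhole back-projection. It is an illustration under stated assumptions, not the code in visualize_track_4d.py: the function names, the intrinsics values (`fx`, `fy`, `cx`, `cy`), and the assumption that depth is metric are all hypothetical. With relative rather than metric depth, the result would only be correct up to scale.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth into camera-frame (x, y, z), pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_track(track_uv, depths, fx, fy, cx, cy):
    """Lift one 2D track (one (u, v) per frame) to 3D using the depth sampled at each point."""
    return [unproject(u, v, z, fx, fy, cx, cy) for (u, v), z in zip(track_uv, depths)]


if __name__ == "__main__":
    # Hypothetical intrinsics for a 640x480 image; real values come from the camera.
    fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
    track = [(320.0, 240.0), (620.0, 240.0)]  # pixel positions in two frames
    depths = [1.0, 2.0]                       # assumed metric depth at those pixels
    for point in lift_track(track, depths, fx, fy, cx, cy):
        print(point)
```

A point at the principal point maps to the optical axis, so the first frame yields (0.0, 0.0, 1.0); the second frame, 300 pixels to the right at depth 2.0, yields (1.0, 0.0, 2.0).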
python real_time_tracking.py --exp_name $exp_name
The real-time tracking process should look like this:
real-time-tracking.mp4
This project is built upon the following open-source projects:
We thank the authors for their open-source contributions.
If you find this work helpful, please consider citing:
@article{kuang2026dex4d,
title={Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation},
author={Kuang, Yuxuan and Park, Sungjae and Fragkiadaki, Katerina and Tulsiani, Shubham},
journal={arXiv preprint arXiv:2602.15828},
year={2026}
}