


Dex4D-Vision

This repository contains the vision toolkit for our paper Dex4D, with code for RGBD frame capture, video depth estimation, offline and online point tracking, and 4D visualization of the point tracks.

[Paper] [Project Page] [Code (Simulation)] [Code (Hardware)]

[NOTE]: We will soon turn the codebase into an installable package for easier use. Stay tuned!

[Video: overview.mp4]

Installation

Follow the steps below to install the dependencies:

1. Create virtual environment

conda create -n dex4d-vision python==3.11.0
conda activate dex4d-vision

2. Video Depth Anything

pip install -r requirements.txt

Download the checkpoints into the checkpoints directory:

bash get_weights.sh

3. CoTracker

pip install 'imageio[ffmpeg]'
cd <PATH_FOR_COTRACKER>/
git clone https://github.com/facebookresearch/co-tracker.git
cd co-tracker/ && pip install -e .

4. RealSense

pip install pyrealsense2

Then install the Intel RealSense SDK (librealsense) by following its official installation instructions.

5. SAM2

cd <PATH_FOR_SAM2>/
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
# Download checkpoints
cd checkpoints && \
./download_ckpts.sh && \
cd ..
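After finishing the steps above, a quick way to confirm the environment is to check that the installed packages are importable. The following is a minimal sketch, not part of the repository; the top-level module names (`imageio`, `cotracker`, `sam2`, `pyrealsense2`) are our assumption of what each install step provides, and `find_missing` is a hypothetical helper.

```python
"""Sanity-check that the dependencies installed above are importable (sketch)."""
import importlib.util

# Assumed top-level module names for each install step; adjust if your setup differs.
EXPECTED_MODULES = {
    "imageio": "step 3 (imageio[ffmpeg])",
    "cotracker": "step 3 (CoTracker)",
    "pyrealsense2": "step 4 (RealSense)",
    "sam2": "step 5 (SAM2)",
}


def find_missing(modules):
    """Return the names in `modules` that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    for name in missing:
        print(f"missing: {name} -> revisit {EXPECTED_MODULES[name]}")
    if not missing:
        print("all expected modules found")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages.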

Usage

The following instructions guide you through the whole pipeline: first-frame RGBD capture -> video generation -> video depth estimation -> offline point tracking, as well as real-time online point tracking. You can also run each module separately.
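Since video generation happens outside this repository, the offline pipeline cannot run end to end in one command. Below is a minimal sketch that chains the remaining offline commands once the generated video is in place, using the exact commands from this README. It assumes you run it from the repository root and that resize_video.py writes gen_video_resized.mp4 (the filename the later steps read); `offline_commands` and `run_all` are hypothetical helpers, and the experiment and video names are placeholders.

```python
"""Chain the offline steps of this README with subprocess (sketch)."""
import shlex
import subprocess


def offline_commands(exp_name, video_name):
    """Build the offline-pipeline commands in order.

    Assumes resize_video.py writes gen_video_resized.mp4 next to the input
    video, since the depth and tracking commands read that file.
    """
    video_dir = f"outputs/video_gen/{exp_name}"
    resized = f"{video_dir}/gen_video_resized.mp4"
    return [
        ["python", "resize_video.py", "--video_path", f"{video_dir}/{video_name}"],
        ["python", "run.py", "--input_video", resized, "--encoder", "vitl", "--save_npz"],
        ["python", "-m", "cotracker3", "--filename", resized, "--grid_size", "100"],
        ["python", "visualize_track_4d.py", "--data_path", "outputs/", "--exp_name", exp_name],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it unless dry_run, stopping at the first failure."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names; replace with your own experiment and video file.
    run_all(offline_commands("my_exp", "my_video.mp4"), dry_run=True)
```

Passing argument lists (rather than shell strings) to `subprocess.run` avoids quoting issues with paths, and `check=True` stops the chain as soon as one step fails.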

RGBD Frame Capture

First, specify the experiment name:

export exp_name="YOUR_EXP_NAME"

Then run the RealSense capture script:

python realsense.py --exp_name $exp_name

You should find the captured RGBD frames in outputs/realsense/$exp_name/.
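For reference, capturing an RGBD frame with pyrealsense2 typically means aligning the depth stream to the color camera and converting raw depth units to meters. The sketch below shows that pattern; it is not the repository's realsense.py, and the stream resolution, frame rate, and the helpers `capture_aligned_rgbd` and `intrinsics_matrix` are assumptions for illustration. It needs a connected RealSense camera.

```python
"""Grab one depth frame aligned to the color frame with pyrealsense2 (sketch)."""
import numpy as np


def intrinsics_matrix(fx, fy, ppx, ppy):
    """3x3 pinhole intrinsics matrix K from RealSense-style parameters."""
    return np.array([[fx, 0.0, ppx], [0.0, fy, ppy], [0.0, 0.0, 1.0]])


def capture_aligned_rgbd(width=640, height=480, fps=30):
    """Return (color_bgr uint8 HxWx3, depth_m float32 HxW, K 3x3).

    Requires a connected camera. Stream settings are assumptions, not
    necessarily what realsense.py uses.
    """
    import pyrealsense2 as rs  # imported here so intrinsics_matrix works without it

    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.color, width, height, rs.format.bgr8, fps)
    config.enable_stream(rs.stream.depth, width, height, rs.format.z16, fps)
    profile = pipeline.start(config)
    try:
        depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
        align = rs.align(rs.stream.color)  # reproject depth into the color camera
        frames = align.process(pipeline.wait_for_frames())
        color = frames.get_color_frame()
        depth = frames.get_depth_frame()
        intr = color.profile.as_video_stream_profile().get_intrinsics()
        color_img = np.asanyarray(color.get_data()).copy()
        # Raw z16 values times the sensor's depth scale give meters.
        depth_m = np.asanyarray(depth.get_data()).astype(np.float32) * depth_scale
        return color_img, depth_m, intrinsics_matrix(intr.fx, intr.fy, intr.ppx, intr.ppy)
    finally:
        pipeline.stop()
```

Aligning to the color stream makes depth pixel (u, v) correspond to color pixel (u, v), so the color intrinsics apply to both.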

Video Generation

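After capturing the first frame, use a video generation model of your choice to generate a video starting from it, and save it as outputs/video_gen/$exp_name/<YOUR_VIDEO_NAME>.mp4. You can also resize the video to the resolution of the original RGB frame using resize_video.py; note that the commands in the following steps read the resized video, outputs/video_gen/$exp_name/gen_video_resized.mp4: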

python resize_video.py --video_path outputs/video_gen/$exp_name/<YOUR_VIDEO_NAME>.mp4
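To illustrate what resizing to the RGB frame size involves, here is a minimal sketch that resizes every frame and rewrites the video at the source frame rate. It is not the repository's resize_video.py; bilinear filtering via Pillow and the imageio ffmpeg backend (installed with imageio[ffmpeg] above) are our choices, and `resize_frames` / `resize_video` are hypothetical helpers.

```python
"""Resize every frame of a video to a target (width, height) (sketch)."""
import numpy as np
from PIL import Image


def resize_frames(frames, size):
    """Resize HxWx3 uint8 frames to size=(width, height) with bilinear filtering."""
    return [np.asarray(Image.fromarray(f).resize(size, Image.BILINEAR)) for f in frames]


def resize_video(src, dst, size):
    """Read src, resize each frame, and write dst at the source frame rate."""
    import imageio.v2 as imageio  # ffmpeg backend from imageio[ffmpeg]

    reader = imageio.get_reader(src)
    fps = reader.get_meta_data().get("fps", 30)
    with imageio.get_writer(dst, fps=fps) as writer:
        for frame in reader:
            writer.append_data(resize_frames([frame], size)[0])
    reader.close()
```

For example, `resize_video("gen.mp4", "gen_video_resized.mp4", (640, 480))` would match a 640x480 capture. Note that Pillow takes sizes as (width, height), while NumPy image arrays are (height, width, channels).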

Video Depth Estimation

python3 run.py --input_video outputs/video_gen/$exp_name/gen_video_resized.mp4 --encoder vitl --save_npz # --metric

For more options, including the optional --metric flag, see the Video-Depth-Anything repository.
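Depth estimated from a generated video is generally not guaranteed to share the scale of the depth captured by the RealSense. One common way to reconcile the two, shown here as an assumption rather than what this repository does, is a least-squares scale-and-shift fit between the first estimated depth map and the captured first-frame depth over valid pixels. The sketch assumes the estimate is depth (not inverse depth), that the .npz stores it under the key "depths" (falling back to the first array), and that both maps share a resolution; `load_depths` and `fit_scale_shift` are hypothetical helpers.

```python
"""Load estimated video depth and align it to captured depth (sketch)."""
import numpy as np


def load_depths(npz_path):
    """Return the (T, H, W) depth array stored in a .npz file.

    Assumes the key "depths"; otherwise falls back to the first stored array.
    """
    with np.load(npz_path) as data:
        key = "depths" if "depths" in data.files else data.files[0]
        return data[key]


def fit_scale_shift(pred, target, valid=None):
    """Least-squares s, t minimizing ||s * pred + t - target|| over valid pixels.

    By default, pixels with non-finite or non-positive target depth are ignored.
    """
    if valid is None:
        valid = np.isfinite(target) & (target > 0)
    x = pred[valid].astype(np.float64)
    y = target[valid].astype(np.float64)
    A = np.stack([x, np.ones_like(x)], axis=1)  # columns: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)
```

Usage would look like `s, t = fit_scale_shift(depths[0], captured_depth_m)` followed by `aligned = s * depths + t`, applying the first-frame fit to the whole clip.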

Offline Point Tracking

python -m cotracker3 --filename outputs/video_gen/$exp_name/gen_video_resized.mp4 --grid_size 100 # --use-mask
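A grid size of N corresponds to an N x N grid of query points, so --grid_size 100 tracks up to 10,000 points. The sketch below illustrates such a grid and how a boolean mask could restrict it, and shows the public CoTracker3 offline model loaded via torch.hub as documented in the CoTracker repository. It is not the repository's cotracker3 module; `grid_queries` and `track_offline` are hypothetical helpers, and CoTracker builds its own grid internally, so the exact point placement differs.

```python
"""Grid queries with an optional mask, plus offline CoTracker3 via torch.hub (sketch)."""
import numpy as np


def grid_queries(height, width, grid_size, mask=None):
    """Return (N, 2) pixel coordinates (x, y) of a grid_size x grid_size grid.

    If a boolean HxW mask is given, keep only points that fall inside it.
    Illustrative only; CoTracker places its grid differently.
    """
    ys = np.linspace(0, height - 1, grid_size)
    xs = np.linspace(0, width - 1, grid_size)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack([gx.ravel(), gy.ravel()], axis=1)
    if mask is not None:
        keep = mask[np.round(pts[:, 1]).astype(int), np.round(pts[:, 0]).astype(int)]
        pts = pts[keep]
    return pts


def track_offline(frames, grid_size=100):
    """Track a grid of points with the public CoTracker3 offline model.

    frames: (T, H, W, 3) uint8 array. Returns per-frame tracks (T, N, 2) and
    visibility as NumPy arrays. Downloads weights on first use.
    """
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.hub.load("facebookresearch/co-tracker", "cotracker3_offline").to(device)
    video = torch.from_numpy(frames).permute(0, 3, 1, 2)[None].float().to(device)  # B T C H W
    with torch.no_grad():
        tracks, visibility = model(video, grid_size=grid_size)
    return tracks[0].cpu().numpy(), visibility[0].cpu().numpy()
```

Masking the query grid concentrates the point budget on the region of interest (for example, an object segmented with SAM2) instead of the background.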

Visualize the point tracks in 4D

python visualize_track_4d.py --data_path outputs/ --exp_name $exp_name # --share

After these steps, you should be able to see:

[Video: video-to-4d.mp4]
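Conceptually, showing point tracks in 4D means lifting each 2D track to 3D in every frame using depth and camera intrinsics, via the pinhole model X = (u - c_x) Z / f_x, Y = (v - c_y) Z / f_y. The sketch below assumes metric depth at the tracked video's resolution, a known intrinsics matrix K (for example, the color-camera intrinsics of the RealSense), and nearest-pixel depth lookup; visualize_track_4d.py may do this differently, and `lift_tracks` is a hypothetical helper.

```python
"""Lift 2D point tracks to 3D with per-frame depth and pinhole intrinsics (sketch)."""
import numpy as np


def lift_tracks(tracks, depths, K):
    """Back-project tracks into camera-frame 3D points.

    tracks: (T, N, 2) pixel coordinates (x, y)
    depths: (T, H, W) metric depth at the same resolution as the tracked video
    K:      (3, 3) intrinsics
    Returns (T, N, 3); points outside the image are NaN.
    """
    T, N, _ = tracks.shape
    H, W = depths.shape[1:]
    u = np.round(tracks[..., 0]).astype(int)
    v = np.round(tracks[..., 1]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    z = np.full((T, N), np.nan)
    t_idx = np.broadcast_to(np.arange(T)[:, None], (T, N))  # frame index per point
    z[inside] = depths[t_idx[inside], v[inside], u[inside]]  # nearest-pixel depth
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (tracks[..., 0] - cx) * z / fx
    y = (tracks[..., 1] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```

The result is a (T, N, 3) array of 3D trajectories, i.e. N points moving through 3D space over T frames.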

Real-Time Online Point Tracking (with Apriltag Calibration)

python real_time_tracking.py --exp_name $exp_name

The real-time tracking process should look like this:

[Video: real-time-tracking.mp4]
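The README does not detail the AprilTag calibration. One plausible role of a tag is to define a fixed world frame: given the tag's pose in the camera frame from a tag detector, camera-frame track points can be re-expressed in the tag frame by inverting that rigid transform. The sketch below assumes the convention T_cam_tag (tag-to-camera transform); the detector is not specified here, and `make_pose`, `invert_pose`, and `camera_to_tag` are hypothetical helpers rather than code from real_time_tracking.py.

```python
"""Express camera-frame points in an AprilTag-defined frame (sketch)."""
import numpy as np


def make_pose(R, t):
    """4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.ravel(t)
    return T


def invert_pose(T):
    """Inverse of a rigid transform: [R | t]^-1 = [R^T | -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)


def camera_to_tag(points_cam, T_cam_tag):
    """Map (..., 3) camera-frame points into the tag frame.

    T_cam_tag maps tag-frame points to the camera frame (the tag pose as seen
    by the camera); its inverse maps camera-frame points to the tag frame.
    """
    T_tag_cam = invert_pose(T_cam_tag)
    pts = np.asarray(points_cam, dtype=float)
    return pts @ T_tag_cam[:3, :3].T + T_tag_cam[:3, 3]
```

Using the closed-form inverse rather than `np.linalg.inv` keeps the rotation block exactly orthonormal and makes the rigid-body structure explicit.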

Acknowledgement

This project is built upon open-source projects including Video Depth Anything, CoTracker, and SAM2.

We thank the authors for their open-source contributions.

Citation

If you find this work helpful, please consider citing:

@article{kuang2026dex4d,
  title={Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation},
  author={Kuang, Yuxuan and Park, Sungjae and Fragkiadaki, Katerina and Tulsiani, Shubham},
  journal={arXiv preprint arXiv:2602.15828},
  year={2026}
}
