Yu Hu1* Chong Cheng1,2* Sicheng Yu1*
Xiaoyang Guo2 Hao Wang1†
1The Hong Kong University of Science and Technology (Guangzhou)
2Horizon Robotics
* Equal contribution. † Corresponding author.
This section will guide you through setting up the environment and running VGGT4D on your own data.
We recommend using pyenv together with Python's built-in venv to ensure a clean and reproducible Python environment.
# Select Python version
pyenv shell 3.12
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install core dependencies
pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu118
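# (Optional sanity check, not part of the original instructions) Verify that
# PyTorch imports correctly and sees a CUDA device before proceeding
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"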
# Install remaining project requirements
pip install -r requirements.txt

Download the pre-trained model checkpoint:
mkdir -p ckpts/
wget -c "https://huggingface.co/facebook/VGGT_tracker_fixed/resolve/main/model_tracker_fixed_e20.pt?download=true" -O ckpts/model_tracker_fixed_e20.ptRun the VGGT4D demo script to process your scene data:
Run the VGGT4D demo script to process your scene data:

python demo_vggt4d.py --input_dir <path_to_input_dir> --output_dir <path_to_output_dir>

Input Directory Structure:
The input directory should follow this structure:
input_dir/
├── scene1/
│ ├── image001.jpg
│ ├── image002.jpg
│ └── ...
└── scene2/
├── image001.png
├── image002.png
└── ...
Each scene subdirectory should contain image files in .jpg or .png format.
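If your source data is a video, a minimal sketch for producing such a scene folder is shown below. This assumes opencv-python is installed; the video filename and scene path are placeholders:

import os
import cv2  # pip install opencv-python

video_path = "my_video.mp4"      # placeholder input video
scene_dir = "input_dir/scene1"   # matches the layout shown above
os.makedirs(scene_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
idx = 1
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Write frames as image001.jpg, image002.jpg, ...
    cv2.imwrite(os.path.join(scene_dir, f"image{idx:03d}.jpg"), frame)
    idx += 1
cap.release()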
Example Usage:
python demo_vggt4d.py --input_dir ./datasets/input_dir --output_dir ./outputs

Output Files:
The script processes each scene and generates the following outputs in the output directory:
- Depth maps (frame_%04d.npy format)
- Depth confidence maps (conf_%04d.npy format)
- Camera intrinsics (pred_intrinsics.txt)
- Camera poses in TUM format (pred_traj.txt)
- Refined dynamic masks (dynamic_mask_%04d.png format)
- RGB images (frame_%04d.png format)
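A minimal sketch for inspecting these outputs with numpy. The per-scene folder name and the exact file layout are assumptions; adjust the paths to your run:

import numpy as np
from pathlib import Path

scene = Path("./outputs/scene1")  # assumed: one output folder per scene

# Per-frame depth and confidence maps, saved as .npy arrays
depth = np.load(scene / "frame_0000.npy")
conf = np.load(scene / "conf_0000.npy")

# Camera intrinsics, one line per frame (exact layout may vary)
intrinsics = np.loadtxt(scene / "pred_intrinsics.txt")

# TUM-format trajectory: each row is "timestamp tx ty tz qx qy qz qw"
traj = np.loadtxt(scene / "pred_traj.txt")
timestamps, translations, quats = traj[:, 0], traj[:, 1:4], traj[:, 4:8]

print(depth.shape, conf.shape, intrinsics.shape, translations.shape)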
TODO:
- Release code
- Data preprocessing scripts
- Evaluation scripts
- Visualization scripts
- Long sequence implementation
We thank the authors of VGGT, DUSt3R, and Easi3R for releasing their models and code. Their contributions to geometric learning and dynamic reconstruction, together with many other inspiring works in the community, provided essential foundations for this work.
This project is licensed under the MIT License.
You are free to use, modify, and distribute this software for both academic and commercial purposes, provided that proper attribution is given.
See the LICENSE file for details.
If you find VGGT4D useful for your research, please cite our paper:
@misc{hu2025vggt4d,
title={VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction},
author={Yu Hu and Chong Cheng and Sicheng Yu and Xiaoyang Guo and Hao Wang},
year={2025},
eprint={2511.19971},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.19971},
}