VGG-T³: Offline Feed-Forward 3D Reconstruction at Scale

NVIDIA, University of Toronto, Vector Institute

Sven Elflein, Ruilong Li, Sérgio Agostinho, Zan Gojcic, Laura Leal-Taixé, Qunjie Zhou, Aljosa Osep

Overview

VGG-T³ processes large image collections significantly faster than other feed-forward methods (1k images in <1 minute vs. 10 minutes for VGGT) by replacing the quadratic-scaling softmax attention in the global attention layers with a linear alternative based on test-time training.
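
To make the scaling argument concrete, here is a minimal NumPy sketch of a generic test-time-training layer. It is an illustration only, not the code in vggttt/nets/ttt.py: the function names, the single linear fast-weight matrix, the squared-error objective, and the learning rate are all assumptions chosen for brevity. The point it shows is that fitting a small weight matrix on the keys and values costs time linear in the number of tokens, whereas softmax attention builds a tokens-by-tokens score matrix.

```python
import numpy as np


def regression_loss(w, k, v):
    """0.5 * mean over tokens of the squared error between k @ w and v."""
    return 0.5 * np.mean(np.sum((k @ w - v) ** 2, axis=1))


def ttt_linear_readout(q, k, v, lr=0.1, steps=1):
    """Fit fast weights on (k, v) at inference time, then read them out with q.

    q, k: arrays of shape [tokens, dim]; v: array of shape [tokens, dim_v].
    Each step costs O(tokens * dim * dim_v), i.e. linear in the token count.
    Hypothetical sketch: names and hyperparameters are not from the repository.
    """
    num_tokens, dim = k.shape
    w = np.zeros((dim, v.shape[1]), dtype=k.dtype)  # fast weights, start from zero
    for _ in range(steps):
        residual = k @ w - v                # [tokens, dim_v]
        grad = k.T @ residual / num_tokens  # gradient of regression_loss w.r.t. w
        w = w - lr * grad                   # one gradient-descent step
    return q @ w, w


rng = np.random.default_rng(0)
k = rng.standard_normal((64, 8))
v = rng.standard_normal((64, 4))
q = rng.standard_normal((64, 8))
out, w = ttt_linear_readout(q, k, v, lr=0.1, steps=5)
print(out.shape, w.shape)
```

Doubling the number of tokens doubles the work in each step of this sketch, while the size of the fast-weight matrix stays fixed.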

Quick Start

Clone this repo and then install (preferably in a conda environment):

pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu126
pip install .

VGG-T³ is compatible with the VGGT API and can be used in a similar way:

from vggttt.nets.vggt.models.vggt import VGGT
from vggttt.nets.vggt.img import load_and_preprocess_images

vggttt = VGGT.from_pretrained("nvidia/vgg-ttt").eval().cuda()

image_names = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = load_and_preprocess_images(image_names).to("cuda")

preds = vggttt.infer(images)
# Dict containing the predicted outputs with the following keys:
#  - 'pose':        [#images, 4, 4]  Camera-to-world transformation
#  - 'intrinsics':  [#images, 3, 3]  Pinhole camera matrix
#  - 'pts3d':       [#images, height, width, 3]  Per-pixel points in world coordinates
#  - 'conf':        [#images, height, width]  Per-pixel confidence in range ]1, inf[
#  - 'depth':       [#images, height, width, 1]  Per-pixel depth
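
As a possible next step, the sketch below shows one way to post-process the outputs listed above: keeping only confident points and reading camera centers off the camera-to-world poses. The helper names and the threshold of 2.0 are hypothetical, and the commented usage assumes the values in preds are torch tensors; only the shapes and key names come from the listing above.

```python
import numpy as np


def confident_points(pts3d, conf, min_conf=2.0):
    """Return an [M, 3] array of world points with confidence above min_conf.

    pts3d: [#images, height, width, 3]; conf: [#images, height, width].
    min_conf is an arbitrary illustrative threshold, not a recommended value.
    """
    pts3d = np.asarray(pts3d)
    conf = np.asarray(conf)
    mask = conf > min_conf  # boolean mask of shape [#images, height, width]
    return pts3d[mask]      # boolean indexing flattens the masked dims to [M, 3]


def camera_centers(pose):
    """Camera centers in world coordinates from [#images, 4, 4] camera-to-world poses."""
    pose = np.asarray(pose)
    return pose[:, :3, 3]   # the translation column of a camera-to-world matrix


# Hypothetical usage with the Quick Start output, assuming torch tensors:
# points = confident_points(preds["pts3d"].cpu().numpy(), preds["conf"].cpu().numpy())
# centers = camera_centers(preds["pose"].cpu().numpy())
```

Since the confidence values lie in ]1, inf[, any threshold at or below 1 keeps every pixel.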

Demo

We provide an interactive web interface to perform 3D reconstruction of images and videos and visualize the result.

python vggttt/demo.py

Note: When running on a remote server, you need to forward both the viser and Gradio ports. See the CLI output for details.

Evaluation

Find details on how to reproduce the results in the paper here.

Training

We release the training harness; however, the dataset implementations and preprocessing code are not included. We are currently checking the feasibility of releasing the relevant code.

Acknowledgments

We are also grateful to several other open-source repositories that we drew inspiration from or built upon during the development of our pipeline:

Citation

If you find this work useful, please cite:

@inproceedings{elflein2026vggttt,
  title     = {VGG-T\textsuperscript{3}: Offline Feed-Forward 3D Reconstruction at Scale},
  author    = {Elflein, Sven and Li, Ruilong and Agostinho, S{\'e}rgio and Gojcic, Zan and Leal-Taix{\'e}, Laura and Zhou, Qunjie and Osep, Aljosa},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}

License

The code and model are released under the NVIDIA OneWay Noncommercial License, with the following exceptions:

vggttt/nets/vggt/ — released under the VGGT license.
vggttt/nets/ttt.py — adapted from LaCT and released under the MIT License.
vggttt/evaluation/pointmaps/utils.py — adapted from CUT3R and released under CC BY-NC-SA 4.0.

See THIRD_PARTY_LICENSES.md for the full license texts of all third-party components.
