Bowei Zhang1,2*, Lei Ke1*, Adam W. Harley3, Katerina Fragkiadaki1
1Carnegie Mellon University 2Peking University 3Stanford University
NeurIPS 2025
* Equal Contribution
TAPIP3D is a method for long-term feed-forward 3D point tracking in monocular RGB and RGB-D video sequences. It introduces a 3D feature cloud representation that lifts image features into a persistent world coordinate space, canceling out camera motion and enabling accurate trajectory estimation across frames.
We provide a detailed video illustration of TAPIP3D.
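To make the lifting idea concrete, the sketch below back-projects a depth map into world coordinates using per-frame intrinsics and extrinsics. This is an illustrative example rather than TAPIP3D's actual code; the function name is hypothetical, and it assumes the extrinsics are world-to-camera matrices (the repository may use the opposite convention).

```python
import numpy as np

def backproject_to_world(depth, intrinsics, extrinsics):
    """Lift an (H, W) depth map into world-coordinate 3D points.

    Illustrative only -- not the TAPIP3D API. Assumes `extrinsics` is a
    4x4 world-to-camera matrix, so its inverse maps camera points to world.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]

    # Pinhole back-projection into the camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1)  # (H, W, 4)

    # Cancel camera motion: camera frame -> world frame.
    cam_to_world = np.linalg.inv(extrinsics)
    pts_world = pts_cam @ cam_to_world.T
    return pts_world[..., :3]
```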
- Prepare the environment
conda create -n tapip3d python=3.10
conda activate tapip3d
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 "xformers>=0.0.27" --index-url https://download.pytorch.org/whl/cu124
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.4.1+cu124.html
pip install -r requirements.txt
- Compile pointops2
cd third_party/pointops2
LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH python setup.py install
cd ../..
- Compile megasam
cd third_party/megasam/base
LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH python setup.py install
cd ../../..
Download our TAPIP3D model checkpoint here and place it at checkpoints/tapip3d_final.pth
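Optionally, a quick sanity check (not part of the official setup steps) can confirm that the CUDA-enabled PyTorch build is working and that the checkpoint landed in the expected location:

```python
# Optional sanity check (not part of the official setup steps).
import os
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("checkpoint present:", os.path.isfile("checkpoints/tapip3d_final.pth"))
```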
If you want to run TAPIP3D on monocular videos, you need to prepare the following checkpoints manually to run MegaSAM:
- Download the DepthAnything V1 checkpoint from here and put it at third_party/megasam/Depth-Anything/checkpoints/depth_anything_vitl14.pth
- Download the RAFT checkpoint from here and put it at third_party/megasam/cvd_opt/raft-things.pth
Additionally, the MoGe and UniDepth checkpoints are downloaded automatically when running the demo, so please make sure you have a working network connection.
We provide a simple demo script inference.py, along with sample input data located in the demo_inputs/ directory.
The script accepts either an .mp4 video file or an .npz file as input. If providing an .npz file, it should follow this format:
- video: array of shape (T, H, W, 3), dtype: uint8
- depths (optional): array of shape (T, H, W), dtype: float32
- intrinsics (optional): array of shape (T, 3, 3), dtype: float32
- extrinsics (optional): array of shape (T, 4, 4), dtype: float32
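As an illustration, a compatible .npz file could be assembled as follows. This is a minimal sketch with placeholder data and a hypothetical file name; only the key names, shapes, and dtypes follow the format above.

```python
import numpy as np

# Placeholder dimensions and dummy data, purely for illustration --
# replace with real frames, depth maps, and camera parameters.
T, H, W = 24, 480, 640
video = np.zeros((T, H, W, 3), dtype=np.uint8)
depths = np.ones((T, H, W), dtype=np.float32)
intrinsics = np.tile(np.array([[500.0, 0.0, W / 2],
                               [0.0, 500.0, H / 2],
                               [0.0, 0.0, 1.0]], dtype=np.float32), (T, 1, 1))
extrinsics = np.tile(np.eye(4, dtype=np.float32), (T, 1, 1))

# depths / intrinsics / extrinsics are optional; omit them to let the demo
# estimate geometry with MegaSAM + MoGe instead.
np.savez("demo_inputs/my_clip.npz",
         video=video, depths=depths,
         intrinsics=intrinsics, extrinsics=extrinsics)
```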
For demonstration purposes, the script uses a 32x32 grid of points on the first frame as queries.
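For intuition, such a first-frame query grid could be constructed roughly as below. This is an illustrative sketch, not the exact code in inference.py, and the (t, x, y) layout of each query is an assumption.

```python
import numpy as np

# Illustrative 32x32 query grid on the first frame (t = 0).
H, W = 480, 640  # placeholder frame size
ys = np.linspace(0, H - 1, 32)
xs = np.linspace(0, W - 1, 32)
grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")

# Each query is (t, x, y) with t = 0, i.e. all points start on frame 0.
queries = np.stack([np.zeros(32 * 32), grid_x.ravel(), grid_y.ravel()], axis=-1)
print(queries.shape)  # (1024, 3)
```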
When a video is provided as --input_path, the script first runs MegaSAM with MoGe to estimate depth maps and camera parameters. The model then processes these inputs in the global frame.
Demo 1
To run inference:
python inference.py --input_path demo_inputs/sheep.mp4 --checkpoint checkpoints/tapip3d_final.pth --resolution_factor 2
An npz file will be saved to outputs/inference/. To visualize the results:
python visualize.py <result_npz_path>
Demo 2
python inference.py --input_path demo_inputs/pstudio.mp4 --checkpoint checkpoints/tapip3d_final.pth --resolution_factor 2
Inference with Known Depths and Camera Parameters
If an .npz file containing all four keys (video, depths, intrinsics, extrinsics) is provided, the model operates in an aligned global frame and generates point trajectories in world coordinates.
We provide an example .npz file here; please place it in the demo_inputs/ directory.
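Before running, it may be worth checking that the file actually contains all four keys with the expected shapes; a quick check along these lines (key names follow the format listed above):

```python
import numpy as np

# Verify the example file has all four keys with the expected shapes.
data = np.load("demo_inputs/dexycb.npz")
for key in ("video", "depths", "intrinsics", "extrinsics"):
    assert key in data, f"missing key: {key}"
    print(key, data[key].shape, data[key].dtype)
```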
Demo 3
python inference.py --input_path demo_inputs/dexycb.npz --checkpoint checkpoints/tapip3d_final.pth --resolution_factor 2
If you find this project useful, please consider citing:
@article{tapip3d,
title={TAPIP3D: Tracking Any Point in Persistent 3D Geometry},
author={Zhang, Bowei and Ke, Lei and Harley, Adam W and Fragkiadaki, Katerina},
journal={arXiv preprint arXiv:2504.14717},
year={2025}
}