This is the official PyTorch implementation of St4RTrack (pronounced “Star Trek”).
We propose a unified feed-forward framework that simultaneously reconstructs and tracks dynamic video content in a world coordinate frame from RGB inputs. By predicting two appropriately defined pointmaps for each frame pair, our method naturally combines 3D reconstruction with 3D tracking through video sequences.
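To make the pair-wise interface concrete, here is a minimal, hypothetical sketch; the wrapper name, dictionary keys, and tensor shapes are illustrative assumptions, not the actual model API (see the model code for the real interface):

```python
import torch

def st4rtrack_pair(model, frame_anchor, frame_t):
    """Hypothetical wrapper around a pair-wise forward pass.

    frame_anchor, frame_t: (B, 3, H, W) RGB tensors.
    Returns two pointmaps expressed in a shared world coordinate frame:
      pts_track (B, H, W, 3): the anchor frame's pixels located at time t (tracking head)
      pts_recon (B, H, W, 3): the geometry of frame_t (reconstruction head)
    plus per-pixel confidences conf_track, conf_recon of shape (B, H, W).
    """
    out = model(frame_anchor, frame_t)  # assumed to return a dict of predictions
    return out["pts_track"], out["pts_recon"], out["conf_track"], out["conf_recon"]
```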
Key Features:
- Unified Representation: Simultaneous 4D reconstruction and tracking in world coordinates
- Feed-Forward: Efficient processing without post-hoc optimization
- In-The-Wild Adaptation: Test-time adaptation using reprojection loss on unlabeled data
- Comprehensive Evaluation: New WorldTrack benchmark for world coordinate tracking
Please refer to the arXiv paper for more technical details and the Project Page for interactive video results.
- Clean code for dataset and loss computation
- Add configurable track evaluation data root argument
- Remove unused training arguments for cleaner codebase
- Remove abs path in scripts
- Release pre-trained model weights: Hugging Face and Google Drive
- More unit tests
- Add dataset/benchmark download and preprocess instructions
- Add pre-trained models download scripts
- Check requirements.txt
- Clone St4RTrack with submodules:
git clone --recursive https://github.com/HavenFeng/St4RTrack.git
cd St4RTrack
- Create the environment: we use torch 2.5.1 with CUDA 12.1 in our implementation; you can set up the environment with
conda env create -f environment.yml
conda activate st4rtrack
Optionally, you can also use
conda create -n st4rtrack python=3.12 cmake=3.14.0
conda activate st4rtrack
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
# Optional dependencies for training and evaluation:
pip install -r requirements_optional.txt
- Optional: Compile CUDA kernels for RoPE (faster runtime):
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
We currently provide fine-tuned model weights for St4RTrack, which can be downloaded locally via Google Drive. We recommend St4RTrack_Seqmode_reweightMax5.pth by default.
Optionally, you can also load the checkpoint from Hugging Face by adding
--hf_model "yupengchengg147/St4RTrack" \
--hf_variant seq \
--hf_force_download  # optional
to the training and inference commands.
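If you prefer to fetch the checkpoint yourself, a minimal sketch using huggingface_hub is below; the exact filename inside the Hugging Face repo is an assumption (taken from the Google Drive release name), so check the repo file listing first:

```python
from huggingface_hub import hf_hub_download

# Assumed filename; verify against the files listed at
# https://huggingface.co/yupengchengg147/St4RTrack
ckpt_path = hf_hub_download(
    repo_id="yupengchengg147/St4RTrack",
    filename="St4RTrack_Seqmode_reweightMax5.pth",
)
print("checkpoint downloaded to:", ckpt_path)
```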
Download the dataset for PointOdyssey and DynamicReplica:
bash data/download_po.sh
bash data/download_dynamic_replica.sh
For Kubric, please download and prepare the dataset based on the official instructions.
mkdir -p checkpoints/
wget https://download.europe.naverlabs.com/ComputerVision/MASt3R/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth -P ./checkpoints/
Train with pair-wise image reconstruction and dynamic pixel reweighting:
# Remember to replace the dataset path to your own path
# the script has been tested on a 4xA100(80G) machine
bash scripts_run/train_pair_reweight.sh
Train with sequence-based tracking:
# Remember to replace the dataset path to your own path
# the script has been tested on a 4xA100(80G) machine
bash scripts_run/train_seq_reweight.sh
Fine-tune on specific sequences:
# Remember to replace the dataset path and checkpoint path to your own path
# the script has been tested on a 4xA100(80G) machine
bash scripts_run/train_tta.sh
Post-Submission Training and Adaptation Improvements (led by Pengcheng Yu)
After the initial submission, we identified and addressed training stability issues related to confidence estimation for dynamic pixels. The original confidence-based loss function is:
L = conf1 * l1 - alpha * log_conf1 + conf2 * l2 - alpha * log_conf2
We observed that the confidence-based loss formulation used in prior works (e.g., DUSt3R, MonST3R, MASt3R) is suboptimal for St4RTrack, as it often leads the model to neglect pixels belonging to moving objects. Since DUSt3R-based models are mainly trained on static scenes, they tend to assign low confidence to dynamic or translucent pixels. As a result, during training, the model lowers conf1 (the confidence) rather than minimizing the actual error l1, which undermines learning on dynamic content. Additionally, simply removing the confidence term without introducing alternative constraints degrades performance, as previous approaches rely on confidence weighting for effective pointmap regression.
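For reference, a minimal sketch of this original confidence-weighted objective; variable names are illustrative, not the repository's actual code:

```python
import torch

def conf_weighted_loss(l1, conf1, l2, conf2, alpha):
    """l1, l2: per-pixel regression errors of the two heads.
    conf1, conf2: per-pixel confidences (typically kept >= 1 so log(conf) >= 0)."""
    loss = (conf1 * l1 - alpha * torch.log(conf1)
            + conf2 * l2 - alpha * torch.log(conf2))
    return loss.mean()
```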
Our Quick Solution - Dynamic Pixel Reweighting for Head1: We reweight the confidence values of dynamic pixels in head1 by replacing them with a scaled static-pixel confidence (a code sketch follows the threshold list below):
w = reweight_scale * conf1_static.max() # or conf1_static.mean()
L = conf1_static * l1_static - alpha * log_conf1_static + w * l1_dynamic + conf2 * l2 - alpha * log_conf2
Dynamic vs Static Pixel Classification: During training, pixels are classified based on ground-truth trajectory displacement:
- Dynamic pixels: Displacement > dataset-specific threshold
- Static pixels: Displacement ≤ threshold
Threshold Selection Strategy:
- Pair mode training:
- PointOdyssey and Dynamic Replica: max(0.75 quantile displacement, mean displacement)
- Kubric: median displacement
- Sequence mode training: mean displacement across all datasets
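A minimal sketch of the reweighted loss together with the displacement-based classification and threshold choices described above; the tensor names, dataset keys, `reweight_scale` default, and masking details are illustrative assumptions, not the repository's actual implementation:

```python
import torch

def select_threshold(gt_disp, mode, dataset):
    """Dataset-specific displacement threshold, following the strategy above."""
    disp = gt_disp.flatten()
    if mode == "pair":
        if dataset in ("pointodyssey", "dynamic_replica"):
            return torch.maximum(torch.quantile(disp, 0.75), disp.mean())
        return disp.median()          # Kubric
    return disp.mean()                # sequence mode, all datasets

def reweighted_loss(l1, conf1, l2, conf2, gt_disp, thresh, alpha, reweight_scale=5.0):
    """l1, l2: per-pixel errors of head1/head2; conf1, conf2: per-pixel confidences;
    gt_disp: ground-truth trajectory displacement per pixel."""
    dynamic = gt_disp > thresh        # dynamic pixels: displacement above the threshold
    static = ~dynamic                 # static pixels: displacement at or below the threshold

    # Head1: keep the confidence-weighted term for static pixels; dynamic pixels use a
    # fixed weight derived from the static-pixel confidences instead of their own confidence.
    w = reweight_scale * conf1[static].max()      # or conf1[static].mean()
    loss_head1 = (conf1[static] * l1[static] - alpha * torch.log(conf1[static])).sum()
    loss_head1 = loss_head1 + (w * l1[dynamic]).sum()

    # Head2: unchanged confidence-weighted loss.
    loss_head2 = (conf2 * l2 - alpha * torch.log(conf2)).sum()
    return (loss_head1 + loss_head2) / l1.numel()
```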
This improvement significantly enhances training stability and final performance, especially for scenes with substantial dynamic content.
Here we show a quantitative comparison of tracking performance between models trained with and without Dynamic Pixel Reweighting for Head1. Please refer to Table 1 and Table 3 in the paper for reference.
| Method | PO (all) | DR (all) | ADT (all) | PStudio (all) | PO (dyn.) | DR (dyn.) | ADT (dyn.) | PStudio (dyn.) |
|---|---|---|---|---|---|---|---|---|
| pair_mode with reweight | 67.29 | 71.28 | 68.97 | 67.59 | 69.76 | 74.65 | 76.22 | 67.59 |
| sequence with reweight | 67.34 | 74.34 | 73.03 | 70.67 | 72.04 | 76.82 | 78.01 | 70.67 |
| St4RTrack w/o reweight | 67.95 | 73.74 | 76.00 | 69.67 | 68.71 | 68.13 | 75.34 | 69.67 |
| Method | PO (all) | DR (all) | ADT (all) | PStudio (all) | PO (dyn.) | DR (dyn.) | ADT (dyn.) | PStudio (dyn.) |
|---|---|---|---|---|---|---|---|---|
| pair_mode with reweight | 0.3163 | 0.3016 | 0.3324 | 0.2850 | 0.2612 | 0.2180 | 0.1158 | 0.2850 |
| sequence with reweight | 0.3169 | 0.2605 | 0.2946 | 0.2489 | 0.2367 | 0.1978 | 0.1087 | 0.2489 |
| St4RTrack w/o reweight | 0.3140 | 0.2682 | 0.2680 | 0.2637 | 0.2970 | 0.2961 | 0.1212 | 0.2637 |
Run inference on your data:
python infer.py \
--batch_size 128 \
--input_dir /path/to/your/data \
--weights checkpoints/your_model.pth \
--output_dir results/your_path
Visualize the results:
python visualizer_st4rtrack.py --traj_path results/your_path
Evaluate on standard benchmarks:
bash scripts_run/eval.sh
Download the WorldTrack dataset from Google Drive and place it in ./data/worldtrack_release; the structure should look like this:
./data/worldtrack_release/
├── adt_mini/
├── pstudio_mini/
├── po_mini/
├── ds_mini/
└── tum/
We provide the following evaluation datasets:
- Trajectory evaluation datasets: adt_mini, pstudio_mini, po_mini, ds_mini
- Reconstruction evaluation datasets: po_mini, tum
If you find our work useful, please cite:
@inproceedings{st4rtrack2025,
title={St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World},
author={Feng*, Haiwen and Zhang*, Junyi and Wang, Qianqian and Ye, Yufei and Yu, Pengcheng and Black, Michael J. and Darrell, Trevor and Kanazawa, Angjoo},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2025}
}
This code and model are available for non-commercial scientific research purposes.
Shout-out to other concurrent efforts on pointmap-based dense 3D tracking:
ZeroMSF - https://research.nvidia.com/labs/lpr/zero_msf/
DynaDUSt3R - https://stereo4d.github.io/
Dynamic Point Maps - https://www.robots.ox.ac.uk/~vgg/research/dynamic-point-maps/
We would like to thank the authors of DUSt3R and MonST3R for their foundational work in stereo matching and 3D reconstruction. We also thank the contributors of PointOdyssey, TUM-Dynamics, Dynamic Replica, TAPVid-3D and DAVIS datasets for enabling comprehensive evaluation.
This work was supported by UC Berkeley and the Max Planck Institute for Intelligent Systems.