Official repo for the paper "Multi-view Pyramid Transformer: Look Coarser to See Broader"
```bash
# create conda environment
conda create -n mvp python=3.11 -y
conda activate mvp

# install PyTorch and dependencies (adjust the CUDA version according to your system)
pip install -r requirements.txt
pip install git+https://github.com/nerfstudio-project/gsplat.git
```

The model checkpoints are hosted on HuggingFace (mvp_960x540).
For training and evaluation, we used the DL3DV dataset after applying undistortion preprocessing with this script, originally introduced in Long-LRM.
Download the DL3DV benchmark dataset from here, and apply undistortion preprocessing.
Update the inference.ckpt_path field in configs/inference.yaml with the pretrained model.
Update the entries in data/dl3dv_eval.txt to point to the correct processed dataset path.
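Before launching inference, it can help to sanity-check that every scene path listed in the `.txt` file actually exists on disk. A minimal sketch (the one-scene-directory-per-line format is an assumption about the file layout):

```python
from pathlib import Path

def check_scene_list(list_file):
    """Return scene paths listed in `list_file` that are missing on disk.

    Assumes one scene directory per line (hypothetical format);
    blank lines and `#` comments are skipped.
    """
    missing = []
    for line in Path(list_file).read_text().splitlines():
        scene = line.strip()
        if not scene or scene.startswith("#"):
            continue
        if not Path(scene).is_dir():
            missing.append(scene)
    return missing
```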
```bash
# inference
CUDA_VISIBLE_DEVICES=0 python inference.py --config configs/inference.yaml
```

Update configs/api_keys.yaml with your personal wandb API key.
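The exact layout of configs/api_keys.yaml is repo-specific; assuming it is a flat `key: value` file, a dependency-free reader might look like this (the `wandb` key name is an assumption):

```python
import os
from pathlib import Path

def load_api_keys(path):
    """Parse a flat `key: value` YAML-style file (hypothetical format).

    Skips blank lines and `#` comments; strips surrounding quotes
    from values. Not a full YAML parser.
    """
    keys = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        k, _, v = line.partition(":")
        keys[k.strip()] = v.strip().strip("'\"")
    return keys

# hypothetical usage: export the key so wandb picks it up
# os.environ["WANDB_API_KEY"] = load_api_keys("configs/api_keys.yaml")["wandb"]
```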
Update the entries in data/dl3dv_train.txt to point to the correct processed dataset path.
```bash
# Example for single-GPU training
CUDA_VISIBLE_DEVICES=0 python train_single.py --config configs/train_stage1.yaml

# Example for multi-GPU training
torchrun --nproc_per_node 8 --nnodes 1 \
  --rdzv_id 1234 --rdzv_endpoint localhost:8888 \
  train.py --config configs/train_stage1.yaml
```

- Training code (Stage 3)
- Preprocessed Tanks&Temples and Mip-NeRF360 datasets
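With the multi-GPU launch above, torchrun starts one process per GPU, and each rank typically trains on a disjoint shard of the data. A dependency-free sketch of the round-robin split that `torch.utils.data.DistributedSampler` performs (simplified: no shuffling and no padding of the last incomplete round):

```python
def shard_indices(num_samples, world_size, rank):
    """Round-robin split of dataset indices across `world_size` ranks.

    Simplified version of torch.utils.data.DistributedSampler:
    rank r gets indices r, r + world_size, r + 2 * world_size, ...
    (no shuffling, no padding so shard sizes may differ by one).
    """
    return list(range(rank, num_samples, world_size))
```

Each of the 8 processes launched by the torchrun command above would call this with `world_size=8` and its own rank (read from the `RANK` environment variable that torchrun sets).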
```bibtex
@article{kang2025multi,
  title={Multi-view Pyramid Transformer: Look Coarser to See Broader},
  author={Kang, Gyeongjin and Yang, Seungkwon and Nam, Seungtae and Lee, Younggeun and Kim, Jungwoo and Park, Eunbyung},
  journal={arXiv preprint arXiv:2512.07806},
  year={2025}
}
```
This project is built on many amazing research works, thanks a lot to all the authors for sharing!