Gynjn/MVP

Multi-view Pyramid Transformer: Look Coarser to See Broader

Gyeongjin Kang, Seungkwon Yang, Seungtae Nam, Younggeun Lee, Jungwoo Kim, Eunbyung Park

Official repo for the paper "Multi-view Pyramid Transformer: Look Coarser to See Broader"

Installation

# create conda environment
conda create -n mvp python=3.11 -y
conda activate mvp

# install PyTorch and the remaining dependencies (adjust the CUDA version to match your system)
pip install -r requirements.txt
pip install git+https://github.com/nerfstudio-project/gsplat.git
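After installation, a quick sanity check can confirm that the key packages are visible to the active environment. The snippet below is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and it only checks that `torch` and `gsplat` can be located, not that they were built against a matching CUDA version.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: `torch` and `gsplat` are the import names of the packages installed above.
    missing = missing_packages(["torch", "gsplat"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the `mvp` conda environment; an empty result means both packages can be located.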

Checkpoints

The model checkpoints are hosted on HuggingFace (mvp_960x540).

For training and evaluation, we used the DL3DV dataset after applying undistortion preprocessing with this script, originally introduced in Long-LRM.

Download the DL3DV benchmark dataset from here and apply the same undistortion preprocessing.

Inference

Update the inference.ckpt_path field in configs/inference.yaml with the path to the pretrained model checkpoint.

Update the entries in data/dl3dv_eval.txt to point to the correct processed dataset path.
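If the list file holds many entries, rewriting them by hand is tedious. The following is a minimal sketch for swapping a common path prefix. It assumes that `data/dl3dv_eval.txt` contains one path per line and that all entries share a root directory, which may not match the actual file layout. The helper name `repoint_entries` and the example roots are hypothetical.

```python
def repoint_entries(lines, old_root, new_root):
    """Replace a leading `old_root` directory with `new_root` on every non-empty line.

    Entries that do not start with `old_root` are returned unchanged.
    """
    # Normalize both roots to end with exactly one slash so that
    # "/data/dl3dv" does not accidentally match "/data/dl3dv2/...".
    old = old_root.rstrip("/") + "/"
    new = new_root.rstrip("/") + "/"
    updated = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # drop blank lines
        if entry.startswith(old):
            entry = new + entry[len(old):]
        updated.append(entry)
    return updated


def rewrite_list_file(path, old_root, new_root):
    """Apply `repoint_entries` to a list file in place (assumed one path per line)."""
    with open(path, encoding="utf-8") as f:
        updated = repoint_entries(f.readlines(), old_root, new_root)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(updated) + "\n")
```

For example, `rewrite_list_file("data/dl3dv_eval.txt", "/old/root", "/your/processed/dl3dv")` would update the file in place; both roots here are placeholders.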

# inference
CUDA_VISIBLE_DEVICES=0 python inference.py --config configs/inference.yaml

Train

Update configs/api_keys.yaml with your own wandb API key.

Update the entries in data/dl3dv_train.txt to point to the correct processed dataset path.
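Before starting a long training run, it can help to confirm that every listed entry exists on disk. The sketch below assumes that `data/dl3dv_train.txt` lists one filesystem path per line. The helper name `find_missing_entries` is hypothetical and not part of the repository.

```python
import os


def find_missing_entries(list_file):
    """Return the entries in `list_file` that do not exist on disk.

    Assumes one path per line; blank lines are ignored.
    """
    missing = []
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            entry = line.strip()
            if entry and not os.path.exists(entry):
                missing.append(entry)
    return missing
```

Calling `find_missing_entries("data/dl3dv_train.txt")` returns an empty list when every entry resolves to an existing path.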

# Example for single GPU training
CUDA_VISIBLE_DEVICES=0 python train_single.py --config configs/train_stage1.yaml

# Example for multi GPU training
torchrun --nproc_per_node 8 --nnodes 1 \
         --rdzv_id 1234 --rdzv_endpoint localhost:8888 \
         train.py --config configs/train_stage1.yaml
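The multi-GPU command above is written for 8 GPUs on a single node. As a rough sketch, the helper below assembles the same command for a different GPU count by changing only `--nproc_per_node`. The function name `build_torchrun_command` is hypothetical, and whether other settings in the config (such as batch size) also need adjusting is not covered here.

```python
import shlex


def build_torchrun_command(num_gpus, config="configs/train_stage1.yaml",
                           rdzv_id=1234, rdzv_endpoint="localhost:8888"):
    """Build the single-node torchrun command from the README for `num_gpus` GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    args = [
        "torchrun",
        "--nproc_per_node", str(num_gpus),  # one process per GPU
        "--nnodes", "1",                    # single machine
        "--rdzv_id", str(rdzv_id),
        "--rdzv_endpoint", rdzv_endpoint,
        "train.py",
        "--config", config,
    ]
    # shlex.join quotes arguments where needed so the result is safe to paste into a shell.
    return shlex.join(args)


if __name__ == "__main__":
    print(build_torchrun_command(4))
```

The printed string can be pasted into a shell, or the same argument list can be passed to `subprocess.run` instead of being joined.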

TODO List

  • Training code (Stage 3)
  • Preprocessed Tanks and Temples and Mip-NeRF360 datasets

Citation

@article{kang2025multi,
  title={Multi-view Pyramid Transformer: Look Coarser to See Broader},
  author={Kang, Gyeongjin and Yang, Seungkwon and Nam, Seungtae and Lee, Younggeun and Kim, Jungwoo and Park, Eunbyung},
  journal={arXiv preprint arXiv:2512.07806},
  year={2025}
}

Acknowledgements

This project builds on many amazing research works. Thanks a lot to all the authors for sharing!
