
RayZer: A Self-supervised Large View Synthesis Model

ICCV 2025 (Oral, Best Student Paper Honorable Mention)

Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, Georgios Pavlakos



0. Clarification

This is the official repository for the paper "RayZer: A Self-supervised Large View Synthesis Model".

The code here is a re-implementation and differs from the original version developed at Adobe. However, the provided checkpoints are from the original Adobe implementation and were trained inside Adobe. This codebase is built on top of LVSM.

We have verified that the re-implemented version matches the performance of the original. For any questions or issues, please contact Hanwen Jiang at hwjiang1510@gmail.com.


1. Preparation

Environment

conda create -n rayzer python=3.11
conda activate rayzer
pip install -r requirements.txt

Because we use xformers' memory_efficient_attention, your GPU needs a compute capability of at least 8.0; otherwise, an error will be raised. Check your GPU's compute capability on the CUDA GPUs page.
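
If you are unsure whether your GPU qualifies, here is a quick check from Python (a minimal sketch using PyTorch; it is not part of this repo):

import torch

# Report the compute capability of the first visible GPU.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")

# Per the note above, the memory_efficient_attention kernels used here
# need compute capability >= 8.0 (Ampere or newer).
assert (major, minor) >= (8, 0), "GPU too old for memory_efficient_attention"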

Data

We provide preprocessed DL3DV benchmark data for evaluation purposes. You can find the preprocessed data here, then place it at ./dl3dv_benchmark.

Note that for training, you will need to preprocess the training set (10K scenes not included in the benchmark) into the same data format.

Checkpoints

Data  | Model               | View Sampling | PSNR  | SSIM  | LPIPS | Results
------|---------------------|---------------|-------|-------|-------|--------
DL3DV | RayZer-8-12-12-100K | Even          | 25.59 | 0.795 | 0.183 | link
DL3DV | RayZer-8-12-12-100K | Random        | 25.47 | 0.795 | 0.181 | link
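
After downloading, you can sanity-check a checkpoint before running inference (a minimal sketch; the internal layout of the .pt file is not documented in this README):

import torch

# Load on CPU so this works on a machine without a GPU.
ckpt = torch.load("./model_checkpoints/rayzer_dl3dv_8_12_12_96k.pt", map_location="cpu")

# Inspect the top-level structure; a plain state dict prints parameter names.
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])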

2. Training

Before training, you need to follow the instructions here to generate a wandb API key file for logging and save it in the configs folder as api_keys.yaml. You can use configs/api_keys_example.yaml as a template.
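
For reference, the key file is a small YAML file read at startup. A minimal sketch of how such a file could be loaded (the "wandb" field name is hypothetical; check configs/api_keys_example.yaml for the actual schema):

import yaml
import wandb

# Read the API key file; the "wandb" field name below is hypothetical --
# use configs/api_keys_example.yaml for the real field names.
with open("configs/api_keys.yaml") as f:
    keys = yaml.safe_load(f)

wandb.login(key=keys["wandb"])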


The original training command:

torchrun --nproc_per_node 8 --nnodes 4 \
    --rdzv_id 18635 --rdzv_backend c10d --rdzv_endpoint localhost:29502 \
    train.py --config configs/rayzer_dl3dv.yaml

The training is distributed over 8 GPUs per node across 4 nodes (32 GPUs in total) with a total batch size of 256. rayzer_dl3dv.yaml is the config file for the RayZer-DL3DV model. You can also use LVSM_dl3dv.yaml to train LVSM, which assumes known poses. Note that for efficiency, we use a patch size of 16, unlike the patch size of 8 used in the original LVSM paper.
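
The global batch size follows directly from the launch parameters (a sketch; the per-GPU batch size of 8 is inferred from the stated total of 256, not given explicitly in this README):

# 8 processes per node x 4 nodes = 32 GPUs in total.
nproc_per_node = 8
nnodes = 4
batch_size_per_gpu = 8  # inferred so that the total comes out to 256

global_batch_size = nproc_per_node * nnodes * batch_size_per_gpu
print(global_batch_size)  # 256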

3. Inference

torchrun --nproc_per_node 8 --nnodes 1 \
    --rdzv_id 18635 --rdzv_backend c10d --rdzv_endpoint localhost:29506 \
    inference.py --config "configs/rayzer_dl3dv.yaml" \
    training.dataset_path="./data/dl3dv10k_benchmark.txt" \
    training.batch_size_per_gpu=4 \
    training.target_has_input=false \
    training.num_views=24 \
    training.num_input_views=16 \
    training.num_target_views=8 \
    inference.if_inference=true \
    inference.compute_metrics=true \
    inference.render_video=false \
    inference.view_idx_file_path="./data/rayzer_evaluation_index_dl3dv_even.json" \
    inference.model_path=./model_checkpoints/rayzer_dl3dv_8_12_12_96k.pt \
    inference_out_root=./experiments/evaluation/test

We use ./data/rayzer_evaluation_index_dl3dv_even.json and ./data/rayzer_evaluation_index_dl3dv_random.json to specify the view indices. The two files correspond to the even-sampling and random-sampling settings in the paper.

After inference, the code will generate an HTML file in the inference_out_root folder. You can open it to view the results.

4. Citation

If you find this work useful in your research, please consider citing:

@article{jiang2025rayzer,
  title={RayZer: A Self-supervised Large View Synthesis Model},
  author={Jiang, Hanwen and Tan, Hao and Wang, Peng and Jin, Haian and Zhao, Yue and Bi, Sai and Zhang, Kai and Luan, Fujun and Sunkavalli, Kalyan and Huang, Qixing and others},
  journal={arXiv preprint arXiv:2505.00702},
  year={2025}
}

5. TODO

[ ] Prepare evaluation and training scripts on RE10K.

6. Known Issues

The model can be sensitive to the number of views because it uses image index embeddings. Make sure the number of views is the same during training and testing.
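
To see why a mismatched view count fails, here is a minimal illustration of a learned per-view index embedding (a sketch, not RayZer's actual module):

import torch
import torch.nn as nn

# An index embedding sized for the number of views seen during training.
num_train_views = 24   # e.g. training.num_views in the inference command above
view_embed = nn.Embedding(num_train_views, 256)

view_embed(torch.arange(24))    # fine: indices 0..23 were seen in training
# view_embed(torch.arange(32))  # IndexError: indices 24..31 are out of range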
