Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, Georgios Pavlakos
This is the official repository for the paper "RayZer: A Self-supervised Large View Synthesis Model".
The code here is a re-implementation and differs from the original version developed at Adobe. However, the provided checkpoints come from the original Adobe implementation and were trained inside Adobe. This codebase is built on top of LVSM.
We have verified that the re-implemented version matches the performance of the original. For any questions or issues, please contact Hanwen Jiang at hwjiang1510@gmail.com.
```bash
conda create -n rayzer python=3.11
conda activate rayzer
pip install -r requirements.txt
```
Because we use xformers' memory_efficient_attention, your GPU's compute capability needs to be 8.0 or higher; otherwise xformers will raise an error. You can look up your GPU's compute capability on the CUDA GPUs page.
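You can also check this programmatically with PyTorch (a minimal sketch):

```python
import torch

# Query the compute capability of the current CUDA device.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")

# The xformers memory_efficient_attention kernels used here expect >= 8.0 (Ampere or newer).
assert (major, minor) >= (8, 0), "GPU compute capability is too low for this codebase"
```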
We provide preprocessed DL3DV benchmark data for evaluation purposes. You can find the preprocessed data here, then place it at ./dl3dv_benchmark.
Note that for training, you will need to preprocess the training set (the 10K scenes not included in the benchmark) into the same data format.
| Data | Model | View Sampling | PSNR | SSIM | LPIPS | Results |
|---|---|---|---|---|---|---|
| DL3DV | RayZer-8-12-12-100K | Even | 25.59 | 0.795 | 0.183 | link |
| DL3DV | RayZer-8-12-12-100K | Random | 25.47 | 0.795 | 0.181 | link |
Before training, follow the instructions here to generate a wandb API key file for logging and save it in the configs folder as api_keys.yaml. You can use configs/api_keys_example.yaml as a template.
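If you prefer to load the key programmatically, here is a minimal sketch (the field name `wandb_key` is an assumption; match it to whatever configs/api_keys_example.yaml defines):

```python
import yaml
import wandb

# Read the API key from the config file and authenticate with wandb.
# NOTE: the field name "wandb_key" is an assumption; use the field name
# defined in configs/api_keys_example.yaml.
with open("configs/api_keys.yaml") as f:
    keys = yaml.safe_load(f)
wandb.login(key=keys["wandb_key"])
```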
The original training command:
```bash
torchrun --nproc_per_node 8 --nnodes 4 \
--rdzv_id 18635 --rdzv_backend c10d --rdzv_endpoint localhost:29502 \
train.py --config configs/rayzer_dl3dv.yaml
```

The training is distributed across 4 nodes with 8 GPUs each (32 GPUs in total), with a total batch size of 256 (i.e., 8 samples per GPU).
rayzer_dl3dv.yaml is the config file for the RayZer-DL3DV model. You can also use LVSM_dl3dv.yaml to train LVSM, which assumes known poses.
Note that, for efficiency, we use a patch size of 16, which differs from the patch size of 8 used in the original LVSM paper.
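To see where the efficiency comes from, here is an illustrative sketch (not the repo's tokenizer; the 256×256 resolution is an example value): doubling the patch size from 8 to 16 cuts the number of transformer tokens per image by 4x.

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 3, 256, 256)  # (B, C, H, W); 256x256 is an example resolution

def num_tokens(h, w, patch):
    # Each non-overlapping patch becomes one transformer token.
    return (h // patch) * (w // patch)

print(num_tokens(256, 256, 8))   # 1024 tokens per image (LVSM paper's patch size)
print(num_tokens(256, 256, 16))  # 256 tokens per image (this repo's patch size)

# Patchify with unfold: (B, C, H, W) -> (B, N, C * patch * patch)
patch = 16
tokens = F.unfold(image, kernel_size=patch, stride=patch).transpose(1, 2)
print(tokens.shape)  # torch.Size([1, 256, 768])
```

Since self-attention cost scales quadratically with the token count, the 4x token reduction substantially lowers training and inference compute.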
```bash
torchrun --nproc_per_node 8 --nnodes 1 \
--rdzv_id 18635 --rdzv_backend c10d --rdzv_endpoint localhost:29506 \
inference.py --config "configs/rayzer_dl3dv.yaml" \
training.dataset_path = "./data/dl3dv10k_benchmark.txt" \
training.batch_size_per_gpu = 4 \
training.target_has_input = false \
training.num_views = 24 \
training.num_input_views = 16 \
training.num_target_views = 8 \
inference.if_inference = true \
inference.compute_metrics = true \
inference.render_video = false \
inference.view_idx_file_path = "./data/rayzer_evaluation_index_dl3dv_even.json" \
inference.model_path = ./model_checkpoints/rayzer_dl3dv_8_12_12_96k.pt \
inference_out_root = ./experiments/evaluation/test
```

We use ./data/rayzer_evaluation_index_dl3dv_even.json and ./data/rayzer_evaluation_index_dl3dv_random.json to specify the evaluation view indices. The two files correspond to the even-sampling and random-sampling settings in the paper.
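For reference, the two sampling strategies can be sketched in a few lines (illustrative only; the actual index files are provided in ./data, and their exact schema is defined by the repo):

```python
import numpy as np

num_frames = 100  # frames available in a scene (example value)
num_views = 24    # total views per scene, matching training.num_views above

# Even sampling: view indices spaced uniformly across the sequence.
even_idx = np.linspace(0, num_frames - 1, num_views).round().astype(int)

# Random sampling: view indices drawn uniformly at random, without replacement.
rng = np.random.default_rng(seed=0)
random_idx = np.sort(rng.choice(num_frames, size=num_views, replace=False))
```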
After inference, the code generates an HTML file in the output folder (inference_out_root above). You can open it in a browser to view the results.
If you find this work useful in your research, please consider citing:
```bibtex
@article{jiang2025rayzer,
  title={RayZer: A Self-supervised Large View Synthesis Model},
  author={Jiang, Hanwen and Tan, Hao and Wang, Peng and Jin, Haian and Zhao, Yue and Bi, Sai and Zhang, Kai and Luan, Fujun and Sunkavalli, Kalyan and Huang, Qixing and others},
  journal={arXiv preprint arXiv:2505.00702},
  year={2025}
}
```

- [ ] Prepare evaluation and training scripts on RE10K.
The model can be sensitive to the number of views because it uses image index embeddings. Make sure the number of views is the same during training and testing.
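To see why, here is a minimal illustrative sketch (not the repo's module): each input image receives a learned embedding keyed by its index, so an embedding table sized for N training views has no entries for indices >= N at test time.

```python
import torch
import torch.nn as nn

num_views, dim = 24, 768  # 24 matches the setup above; dim is an example value
index_embedding = nn.Embedding(num_views, dim)

# Same number of views as training: fine.
ok = index_embedding(torch.arange(24))   # shape (24, 768)

# More views than training: index 24 is out of range for the table.
# index_embedding(torch.arange(25))      # raises IndexError
```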