Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, Georgios Pavlakos
This is the official repository for the paper "RayZer: A Self-supervised Large View Synthesis Model".
The code here is a re-implementation and differs from the original version developed at Adobe. However, the provided checkpoints come from the original Adobe implementation and were trained inside Adobe. This codebase is built on top of LVSM.
We have verified that the re-implemented version matches the performance of the original. For any questions or issues, please contact Hanwen Jiang at hwjiang1510@gmail.com.
```bash
conda create -n rayzer python=3.11
conda activate rayzer
pip install -r requirements.txt
```
Because we use xformers' memory_efficient_attention, your GPU's compute capability needs to be 8.0 or higher; otherwise xformers will raise an error. You can look up your GPU's compute capability on the CUDA GPUs page.
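You can also check this programmatically with PyTorch (a minimal sketch):

```python
import torch

# Query the compute capability of the current CUDA device.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")

# The xformers memory_efficient_attention kernels used here expect >= 8.0 (Ampere or newer).
assert (major, minor) >= (8, 0), "GPU compute capability is too low for this codebase"
```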
We provide preprocessed DL3DV benchmark data for evaluation purposes. You can find the preprocessed data here, then place it at ./dl3dv_benchmark.
Note that for training, you will need to preprocess the training set (the 10K scenes not included in the benchmark) into the same data format.
| Data | Model | View Sampling | PSNR | SSIM | LPIPS | Results |
|---|---|---|---|---|---|---|
| DL3DV | RayZer-8-12-12-100K | Even | 25.59 | 0.795 | 0.183 | link |
| DL3DV | RayZer-8-12-12-100K | Random | 25.47 | 0.795 | 0.181 | link |
Before training, follow the instructions here to generate a wandb API key file for logging and save it in the configs folder as api_keys.yaml. You can use configs/api_keys_example.yaml as a template.
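If you prefer to load the key programmatically, here is a minimal sketch (the field name `wandb_key` is an assumption; match it to whatever configs/api_keys_example.yaml defines):

```python
import yaml
import wandb

# Read the API key from the config file and authenticate with wandb.
# NOTE: the field name "wandb_key" is an assumption; use the field name
# defined in configs/api_keys_example.yaml.
with open("configs/api_keys.yaml") as f:
    keys = yaml.safe_load(f)
wandb.login(key=keys["wandb_key"])
```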
The original training command:
```bash
torchrun --nproc_per_node 8 --nnodes 4 \
--rdzv_id 18635 --rdzv_backend c10d --rdzv_endpoint localhost:29502 \
train.py --config configs/rayzer_dl3dv.yaml
```

The training is distributed across 4 nodes with 8 GPUs each (32 GPUs in total), with a total batch size of 256 (i.e., 8 samples per GPU).
rayzer_dl3dv.yaml is the config file for the RayZer-DL3DV model. You can also use LVSM_dl3dv.yaml to train LVSM, which assumes known poses.
Note that, for efficiency, we use a patch size of 16, which differs from the patch size of 8 used in the original LVSM paper.
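To see where the efficiency comes from, here is an illustrative sketch (not the repo's tokenizer; the 256×256 resolution is an example value): doubling the patch size from 8 to 16 cuts the number of transformer tokens per image by 4x.

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 3, 256, 256)  # (B, C, H, W); 256x256 is an example resolution

def num_tokens(h, w, patch):
    # Each non-overlapping patch becomes one transformer token.
    return (h // patch) * (w // patch)

print(num_tokens(256, 256, 8))   # 1024 tokens per image (LVSM paper's patch size)
print(num_tokens(256, 256, 16))  # 256 tokens per image (this repo's patch size)

# Patchify with unfold: (B, C, H, W) -> (B, N, C * patch * patch)
patch = 16
tokens = F.unfold(image, kernel_size=patch, stride=patch).transpose(1, 2)
print(tokens.shape)  # torch.Size([1, 256, 768])
```

Since self-attention cost scales quadratically with the token count, the 4x token reduction substantially lowers training and inference compute.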
```bash
torchrun --nproc_per_node 8 --nnodes 1 \
--rdzv_id 18635 --rdzv_backend c10d --rdzv_endpoint localhost:29506 \
inference.py --config "configs/rayzer_dl3dv.yaml" \
training.dataset_path = "./data/dl3dv10k_benchmark.txt" \
training.batch_size_per_gpu = 4 \
training.target_has_input = false \
training.num_views = 24 \
training.num_input_views = 16 \
training.num_target_views = 8 \
inference.if_inference = true \
inference.compute_metrics = true \
inference.render_video = false \
inference.view_idx_file_path = "./data/rayzer_evaluation_index_dl3dv_even.json" \
inference.model_path = ./model_checkpoints/rayzer_dl3dv_8_12_12_96k.pt \
inference_out_root = ./experiments/evaluation/test
```

We use ./data/rayzer_evaluation_index_dl3dv_even.json and ./data/rayzer_evaluation_index_dl3dv_random.json to specify the evaluation view indices. The two files correspond to the even-sampling and random-sampling settings in the paper.
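For reference, the two sampling strategies can be sketched in a few lines (illustrative only; the actual index files are provided in ./data, and their exact schema is defined by the repo):

```python
import numpy as np

num_frames = 100  # frames available in a scene (example value)
num_views = 24    # total views per scene, matching training.num_views above

# Even sampling: view indices spaced uniformly across the sequence.
even_idx = np.linspace(0, num_frames - 1, num_views).round().astype(int)

# Random sampling: view indices drawn uniformly at random, without replacement.
rng = np.random.default_rng(seed=0)
random_idx = np.sort(rng.choice(num_frames, size=num_views, replace=False))
```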
After inference, the code generates an HTML file in the output folder (inference_out_root above). You can open it in a browser to view the results.
If you find this work useful in your research, please consider citing:
```bibtex
@article{jiang2025rayzer,
  title={RayZer: A Self-supervised Large View Synthesis Model},
  author={Jiang, Hanwen and Tan, Hao and Wang, Peng and Jin, Haian and Zhao, Yue and Bi, Sai and Zhang, Kai and Luan, Fujun and Sunkavalli, Kalyan and Huang, Qixing and others},
  journal={arXiv preprint arXiv:2505.00702},
  year={2025}
}
```

- [ ] Prepare evaluation and training scripts on RE10K.
The model can be sensitive to the number of views because it uses image index embeddings. Make sure the number of views is the same during training and testing.
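To see why, here is a minimal illustrative sketch (not the repo's module): each input image receives a learned embedding keyed by its index, so an embedding table sized for N training views has no entries for indices >= N at test time.

```python
import torch
import torch.nn as nn

num_views, dim = 24, 768  # 24 matches the setup above; dim is an example value
index_embedding = nn.Embedding(num_views, dim)

# Same number of views as training: fine.
ok = index_embedding(torch.arange(24))   # shape (24, 768)

# More views than training: index 24 is out of range for the table.
# index_embedding(torch.arange(25))      # raises IndexError
```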