Minkyung Kwon*1, Jinhyeok Choi*1, Jiho Park*1, Seonghu Jeon1,
Jinhyuk Jang1, Junyoung Seo1, Minseop Kwak1, Jin-Hwa Kim†2,3, Seungryong Kim†1
1KAIST AI, 2NAVER AI Lab, 3SNU AIIS
* Equal contribution † Co-corresponding author
Multi-view diffusion models have recently emerged as a powerful paradigm for novel view synthesis, yet the underlying mechanism that enables their view consistency remains unclear. In this work, we first verify that the attention maps of these models acquire geometric correspondence throughout training, attending to the geometrically corresponding regions across reference and target views for view-consistent generation. However, this correspondence signal remains incomplete, with its accuracy degrading under large viewpoint changes. Building on these findings, we introduce CAMEO, a simple yet effective training technique that directly supervises attention maps using geometric correspondence to enhance both the training efficiency and generation quality of multi-view diffusion models. Notably, supervising a single attention layer is sufficient to guide the model toward learning precise correspondences, thereby preserving the geometry and structure of reference images, accelerating convergence, and improving novel view synthesis performance. CAMEO reduces the number of training iterations required for convergence by half while achieving superior performance at the same iteration counts. We further demonstrate that CAMEO is model-agnostic and can be applied to any multi-view diffusion model.
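The paper's exact objective is not reproduced here; as a rough, hypothetical illustration of the idea (the cross-entropy form, the tensor names, and `correspondence_attention_loss` are assumptions, not the released implementation), supervising a single attention layer with geometric correspondence might look like:

```python
import torch

def correspondence_attention_loss(attn, corr_index, valid_mask):
    """Hypothetical sketch: supervise one attention layer with geometric
    correspondence (not the released CAMEO implementation).

    attn:       [B, Q, K] softmax attention from target-view query tokens
                to reference-view key tokens (one selected layer).
    corr_index: [B, Q]    long tensor; index of the reference token that
                geometrically corresponds to each target token.
    valid_mask: [B, Q]    bool; True where a valid correspondence exists
                (e.g., the point is visible in the reference view).
    """
    # Negative log-likelihood of the geometrically corresponding token
    # under the attention distribution (one-hot cross-entropy).
    log_attn = torch.log(attn.clamp_min(1e-8))
    nll = -log_attn.gather(-1, corr_index.unsqueeze(-1)).squeeze(-1)  # [B, Q]
    return (nll * valid_mask).sum() / valid_mask.sum().clamp_min(1)
```

In training, a term like this would be added to the standard diffusion loss with a weighting coefficient; per the abstract, applying it to a single attention layer already suffices.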
- Release Code
```
batch (dict)
├─ image:     [B, F, 3, H, W]
├─ intrinsic: [B, F, 3, 3]
├─ extrinsic: [B, F, 3 (or 4), 4]
└─ point_map (optional): [B, F, 3, H, W]

Frame order: reference -> target
```
- Before forwarding to the model, the frame sequence must be ordered as [reference_frames, target_frames].
- In train.py, provide the data in its original sequence; the ordering is handled automatically. A minimal example of the expected batch layout is sketched below.
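For illustration, here is a minimal sketch of a batch in this layout (the shapes and reference-first frame order follow the spec above; the sizes and random tensors are placeholders):

```python
import torch

B = 2                  # batch size
F_ref, F_tgt = 2, 4    # number of reference / target frames
F = F_ref + F_tgt      # frames ordered as [reference_frames, target_frames]
H = W = 256

batch = {
    "image":     torch.randn(B, F, 3, H, W),                  # RGB frames
    "intrinsic": torch.eye(3).expand(B, F, -1, -1).clone(),   # per-frame intrinsics
    "extrinsic": torch.eye(4).expand(B, F, -1, -1).clone(),   # 3x4 also accepted
    "point_map": torch.randn(B, F, 3, H, W),                  # optional per-pixel 3D points
}

# Reference frames come first, then target frames:
ref_frames = batch["image"][:, :F_ref]
tgt_frames = batch["image"][:, F_ref:]
```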
```bash
export WANDB_API_KEY='your_wandb_key'
WANDB_PROJECT_NAME=your_project_name
RUN_NAME=your_run_name
CONFIG_PATH="configs/your_config.yaml"
OUTPUT_DIR="check_points/${RUN_NAME}"

accelerate launch --mixed_precision="bf16" \
    --num_processes=2 --num_machines 1 --main_process_port 12312 \
    --config_file configs/deepspeed/acc_zero2_bf16.yaml train.py \
    --tracker_project_name $WANDB_PROJECT_NAME \
    --output_dir=$OUTPUT_DIR \
    --config_file=$CONFIG_PATH \
    --train_log_interval=10000 \
    --val_interval=40000 \
    --val_cfg=2.0 \
    --min_decay=0.5 \
    --log_every 10 \
    --seed 0 \
    --run_name $RUN_NAME \
    --num_workers_per_gpu 2 \
    --checkpointing_last_steps 5000 \
    --autocast_fp32_on_distill
    # To resume from a checkpoint, replace --config_file with the saved
    # config and add --resume_from_last:
    # --config_file="check_points/${RUN_NAME}/config.yaml" \
    # --resume_from_last
```

This code is based on the work of MVGenMaster. Many thanks to them for making their project available.
If you find our work useful in your research, please consider citing:
```bibtex
@article{kwon2025cameo,
  title={CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models},
  author={Kwon, Minkyung and Choi, Jinhyeok and Park, Jiho and Jeon, Seonghu and Jang, Jinhyuk and Seo, Junyoung and Kwak, Min-Seop and Kim, Jin-Hwa and Kim, Seungryong},
  journal={arXiv preprint arXiv:2512.03045},
  year={2025}
}
```