Prajnan Goswami¹*,
Tianye Ding¹*,
Feng Liu²,
Huaizu Jiang¹
¹ Visual Intelligence Lab, Northeastern University, ² Adobe Research
* Equal Contribution
conda create --name unicorrn python=3.11
conda activate unicorrn

or

conda create --prefix /environment_path/unicorrn python=3.11
conda activate /environment_path/unicorrn

pip install -r requirements.txt
pip install --no-build-isolation --no-deps git+https://github.com/Silverster98/pointops
pip install --no-build-isolation --no-deps git+https://github.com/qinzheng93/vision3d.git

Install unicorrn:

pip install -e .
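To confirm the editable install is visible to your environment, a quick check from a Python shell may help (a minimal sanity test, not part of the official setup):

import torch
import unicorrn

print(torch.__version__, torch.cuda.is_available())
print(unicorrn.__file__)  # should point into this repository for an editable (-e) install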
OPTIONAL

Install the CUDA Rotary Position Embedding (cuRoPE) extension:
cd unicorrn/model/embedder/curope
python setup.py build_ext --inplace
cd ../../../../

For the visual localization benchmark on InLoc:

pip install -r requirements_optional.txt

mkdir -p pretrained_models/

Download the UniCorrn Stage 1 and Stage 2 pre-trained weights:
wget https://huggingface.co/prajnan/unicorrn/resolve/main/UniCorrn_Large_Stage1.pth -P pretrained_models/
wget https://huggingface.co/prajnan/unicorrn/resolve/main/UniCorrn_Large_Stage2.pth -P pretrained_models/

2D-2D
import torch
import numpy as np
from PIL import Image
from unicorrn.model import build_model
from unicorrn.utils import safe_load_weights, plot_correspondences
from unicorrn.utils.config import read_yaml_config
from unicorrn.inference import (
    init_query_points,
    coarse_to_fine,
    cycle_uniform_grid_inference,
)
MODEL_CONFIG_PATH = "/your_project_path/configs/models/unicorrn_large_stage2.yml"
CKPT_PATH = "/your_project_path/pretrained_models/UniCorrn_Large_Stage2.pth"
GRID_SIZE = 4
CONFIDENCE_THRESHOLD = 3.8  # minimum match confidence kept in Usage 1
MATCHING_RADIUS_PX = 1.0  # cycle-consistency radius (pixels) used in Usage 2
# ---- Shared setup: load model and input images ----
model_cfg = read_yaml_config(MODEL_CONFIG_PATH)
model = build_model(model_cfg.NAME, cfg=model_cfg)
weights = torch.load(CKPT_PATH, map_location="cpu", weights_only=False)
safe_load_weights(model, weights["model"])
model = model.eval().cuda()
img1_path = "assets/image_a.png"
img2_path = "assets/image_b.png"
img1 = np.array(Image.open(img1_path).convert("RGB"))
img2 = np.array(Image.open(img2_path).convert("RGB"))
H, W = img1.shape[:2]
# ---- Usage 1: User-specified keypoints with confidence filter ----
queries = init_query_points(H, W, grid_size=GRID_SIZE).view(-1, 2).numpy()
kpts1, kpts2, confidence, _ = coarse_to_fine(
    img1,
    img2,
    queries,
    model,
    unified_model=True,
)
mask = confidence.squeeze() >= CONFIDENCE_THRESHOLD
kpts1 = kpts1[mask]
kpts2 = kpts2[mask]
plot_correspondences(img1, img2, kpts1, kpts2, marker_size=1, plot_line=False, save_path="example_confidence.jpg")
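# ---- Optional variant of Usage 1: pass your own keypoints (illustrative sketch) ----
# The queries above are just an (N, 2) array of pixel coordinates, so the uniform
# grid can be swapped for hand-picked points. The (x, y) ordering and float dtype
# used here are assumptions of this sketch; check `init_query_points` for the
# exact convention the model expects.
custom_queries = np.array(
    [[0.25 * W, 0.25 * H], [0.75 * W, 0.5 * H]], dtype=np.float32
)
custom_kpts1, custom_kpts2, custom_conf, _ = coarse_to_fine(
    img1,
    img2,
    custom_queries,
    model,
    unified_model=True,
)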
# ---- Usage 2: Cycle-consistency correspondence extraction ----
# Builds a uniform grid on img1 and keeps only matches whose backward
# prediction lands within `MATCHING_RADIUS_PX` of the original query.
cycle_kpts1, cycle_kpts2, _, _ = cycle_uniform_grid_inference(
    img1,
    img2,
    model,
    grid_size=GRID_SIZE,
    matching_radius_px=MATCHING_RADIUS_PX,
    unified_model=True,
)
plot_correspondences(img1, img2, cycle_kpts1, cycle_kpts2, marker_size=1, plot_line=False, save_path="example_cycle.jpg")

2D-3D and 3D-3D examples are included in notebooks/2d3d.ipynb and notebooks/3d3d.ipynb.
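For intuition, the filtering that cycle_uniform_grid_inference performs can be sketched with two coarse_to_fine passes. This is a conceptual illustration only, not the library's implementation; it assumes coarse_to_fine accepts arbitrary query coordinates and returns NumPy arrays, as in Usage 1 above.

# forward pass: img1 -> img2, then backward pass: img2 -> img1 using the predicted points
fwd_kpts1, fwd_kpts2, _, _ = coarse_to_fine(img1, img2, queries, model, unified_model=True)
bwd_kpts2, bwd_kpts1, _, _ = coarse_to_fine(img2, img1, fwd_kpts2, model, unified_model=True)
# keep matches whose round trip lands within MATCHING_RADIUS_PX of where it started
cycle_error = np.linalg.norm(np.asarray(bwd_kpts1) - np.asarray(fwd_kpts1), axis=-1)
consistent = cycle_error <= MATCHING_RADIUS_PX
print(f"{int(consistent.sum())} / {consistent.size} matches survive the cycle check")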
2D-2D
We use the datasets listed below, following DUSt3R's preprocessing steps. See the Datasets section of the DUSt3R repository for details.
- ARKitScenes - Creative Commons Attribution-NonCommercial-ShareAlike 4.0
- BlendedMVS - Creative Commons Attribution 4.0 International License
- CO3Dv2 - Creative Commons Attribution-NonCommercial 4.0 International
- ScanNet++ - non-commercial research and educational purposes
- Waymo Open Dataset - Non-Commercial Use
- MegaDepth
- StaticThings3D
Download the datasets into data/Datasets/:

data/Datasets/
├── blendedmvs_processed/
├── megadepth_processed/
├── ...
└── waymo_processed/
2D-3D
We follow 2D3DMATR to prepare the 2D-3D datasets.
The 7Scenes dataset can be downloaded from BaiduYun (extraction code: m7mc). Place it under data/Datasets/ and organize as follows:
data/Datasets/
└── 7Scenes/
├── metadata/
└── data/
├── chess/
├── fire/
├── heads/
├── office/
├── pumpkin/
├── redkitchen/
└── stairs/
The RGBD-ScenesV2 dataset can be downloaded from BaiduYun (extraction code: 2dc7). Place it under data/Datasets/ and organize as follows:
data/Datasets/
└── RGBDScenesV2/
├── metadata/
└── data/
├── rgbd-scenes-v2-scene_01/
├── ...
└── rgbd-scenes-v2-scene_14/
3D-3D
We use 3DMatch and ModelNet for the 3D-3D experiments. Download the datasets into data/Datasets/ using:

bash scripts/download_3d3d_data.sh

The resulting layout should be:
data/Datasets/
├── indoor/
└── modelnet40_ply_hdf5_2048/
We provide download links for the pre-trained CroCoV2 weights from the original repository, as well as an additional decoder weights file aligned with our feature fusion module.
wget https://download.europe.naverlabs.com/ComputerVision/CroCo/CroCo_V2_ViTLarge_BaseDecoder.pth -P pretrained_models/
wget https://huggingface.co/prajnan/unicorrn/resolve/main/CroCoV2_Large_BaseDecoder.pth -P pretrained_models/

Stage 1:
bash scripts/train_stage1.sh

Stage 2:
bash scripts/train_stage2.sh

Download the precomputed query points (generated with RoMa) for MegaDepth-1500 and ScanNet-1500:
wget https://huggingface.co/prajnan/unicorrn/resolve/main/megadepth1500_query_points.json -P benchmarks
wget https://huggingface.co/prajnan/unicorrn/resolve/main/scannet1500_query_points.json -P benchmarks

MegaDepth-1500:
bash scripts/benchmark_2d2d_megadepth1500.sh

ScanNet-1500:
bash scripts/benchmark_2d2d_scannet1500.sh

InLoc visual localization
Prepare the InLoc dataset following the steps in the DUSt3R visloc guide and place it under data/Datasets/InLoc/.
Then run:
bash scripts/benchmark_2d2d_inloc.sh

After completion, upload the results_filename_ltvl.txt file to https://www.visuallocalization.net/.
7Scenes:
bash scripts/benchmark_2d3d_7Scenes.sh

RGBD-ScenesV2:
bash scripts/benchmark_2d3d_rgbdscenesv2.sh

3DMatch and 3DLoMatch:
bash scripts/benchmark_3d3d_3dmatch.sh

ModelNet and ModelLoNet:
bash scripts/benchmark_3d3d_modelnet.sh

If you find this repository useful in your research, please consider giving it a star ⭐ and a citation:
@inproceedings{goswami2026unicorrn,
  title={UniCorrn: Unified Correspondence Transformer Across 2D and 3D},
  author={Goswami, Prajnan and Ding, Tianye and Liu, Feng and Jiang, Huaizu},
  booktitle={CVPR},
  year={2026}
}

We would like to thank the authors of MASt3R, RoMa, PointTransformerV3, 2D3DMATR, PREDATOR, Vision3D, and many other repositories for open-sourcing their code.