Unified Geo-Localization, VPR, and Retrieval Research Toolbox
Composable GeoEncoder pipelines for backbone exploration, descriptor aggregation, metric learning, evaluation, and reproducible geometric retrieval experiments.
GeoMetricLab is a research-oriented toolbox for visual geo-localization, visual place recognition (VPR), and image retrieval. It centers on a unified GeoEncoder abstraction that composes modern CNN / ViT backbones with retrieval-oriented aggregation heads, making it easy to train, evaluate, and compare geometry-aware global descriptors in a single codebase.
- Unified encoder design for backbone + aggregator composition
- Support for both CNN and transformer-style feature pipelines
- Ready-to-run training engines for GL3D and University-1652 workflows
- Lightweight evaluation scripts with optional feature-cache loading
- Clean model hubs for canonical presets, transforms, losses, and datasets
- Third-party methods integrated with explicit provenance and submodule management
GeoMetricLab targets the representation-learning layer of geo-localization and retrieval pipelines:
- Extract dense or token features with a configurable backbone
- Aggregate local descriptors into a compact global representation
- Optionally apply BN, whitening, and final normalization
- Train with task-specific pipelines and evaluate with retrieval metrics
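The four stages above form a chain of callables. Below is a minimal NumPy sketch of that chain. It is not the repository's GeoEncoder. `GeoEncoderSketch`, `toy_backbone`, and `avg_aggregator` are hypothetical names, and the backbone is a random 1x1 channel projection that stands in for a real CNN or ViT.

```python
import numpy as np


def toy_backbone(images: np.ndarray, out_channels: int = 8, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in backbone: (N, 3, H, W) -> (N, out_channels, H, W)."""
    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((out_channels, images.shape[1]))
    # Apply the same 1x1 channel projection at every spatial location, then ReLU.
    feats = np.einsum("oc,nchw->nohw", weight, images)
    return np.maximum(feats, 0.0)


def avg_aggregator(feats: np.ndarray) -> np.ndarray:
    """Average-pool a dense feature map (N, C, H, W) into global vectors (N, C)."""
    return feats.mean(axis=(2, 3))


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


class GeoEncoderSketch:
    """Backbone -> aggregator -> optional post-processing -> L2 normalization."""

    def __init__(self, backbone, aggregator, post=None):
        self.backbone = backbone
        self.aggregator = aggregator
        self.post = post  # e.g. a whitening projection; None skips this stage

    def __call__(self, images: np.ndarray) -> np.ndarray:
        desc = self.aggregator(self.backbone(images))
        if self.post is not None:
            desc = self.post(desc)
        return l2_normalize(desc)


images = np.random.default_rng(1).random((4, 3, 16, 16))
encoder = GeoEncoderSketch(toy_backbone, avg_aggregator)
descriptors = encoder(images)  # one unit-norm row per image
```

Any other function with the same (N, C, H, W) -> (N, D) contract can replace `avg_aggregator` without changes to the backbone. This is the kind of recomposition a backbone + aggregator abstraction is meant to make cheap.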
The repository is designed for fast iteration on:
- backbone selection
- aggregation design
- descriptor dimensionality
- metric-learning objectives and schedules
- cached-feature evaluation and deployment-style inference
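Descriptor dimensionality and the optional whitening step are often handled together by learned PCA whitening. Below is a minimal NumPy sketch. It assumes whitening is fit on a held-out set of raw global descriptors. `fit_pca_whitening` and `apply_whitening` are hypothetical helpers, not repository functions.

```python
import numpy as np


def fit_pca_whitening(train_desc: np.ndarray, out_dim: int, eps: float = 1e-6):
    """Learn a mean and a (D, out_dim) projection that decorrelates and rescales."""
    mean = train_desc.mean(axis=0)
    centered = train_desc - mean
    cov = centered.T @ centered / (len(train_desc) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:out_dim]         # keep the largest-variance directions
    proj = eigvecs[:, top] / np.sqrt(eigvals[top] + eps)
    return mean, proj


def apply_whitening(desc: np.ndarray, mean: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project to out_dim, whiten, then re-apply L2 normalization."""
    out = (desc - mean) @ proj
    return out / np.maximum(np.linalg.norm(out, axis=1, keepdims=True), 1e-12)


rng = np.random.default_rng(0)
train = rng.standard_normal((500, 64)) @ rng.standard_normal((64, 64))  # correlated toy data
mean, proj = fit_pca_whitening(train, out_dim=16)
whitened = (train - mean) @ proj               # before normalization: covariance ~ identity
compact = apply_whitening(train, mean, proj)   # (500, 16) unit-norm descriptors
```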
Current backbone registry includes:
- ResNet / ResNeXt
- DINOv2
- PEFT-DINOv2
- DINOv3
- Swin Transformer V2
- ConvNeXt
The maintained aggregator set in this repository includes:
- Avg
- CLS
- GeM
- BoQ
- CosPlace
- EigenPlace
- NetVLAD
- GhostVLAD
- SuperVLAD
- SALAD
- CricaVPR
Several implementations are adapted from their original public repositories, with per-file attribution placed near the corresponding class or function definitions.
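GeM (generalized mean pooling) shows what a simple aggregator in the list above does. It raises each activation to a power p, averages over spatial positions, and takes the p-th root. With p = 1 it reduces to average pooling, and as p grows it approaches max pooling. Below is a minimal NumPy sketch with a fixed p. GeM implementations often make p learnable, and this one does not. `gem` is a hypothetical name.

```python
import numpy as np


def gem(feats: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized-mean pooling of (N, C, H, W) feature maps into (N, C) descriptors.

    Per channel c: f_c = (mean over positions of x_c ** p) ** (1 / p).
    Clamping at eps keeps the power well-defined for non-positive activations.
    """
    clamped = np.clip(feats, eps, None)
    return np.mean(clamped ** p, axis=(2, 3)) ** (1.0 / p)


feats = np.random.default_rng(0).random((2, 4, 4, 4)) + 0.1  # strictly positive toy maps
avg_pooled = feats.mean(axis=(2, 3))
max_pooled = feats.max(axis=(2, 3))
gem1 = gem(feats, p=1.0)      # matches average pooling
gem3 = gem(feats, p=3.0)      # lies between average and max pooling
gem100 = gem(feats, p=100.0)  # close to max pooling
```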
GeoMetricLab currently supports the following workflows:
- training instance-level retrieval models
- training supervised scene-level retrieval models
- evaluating GL3D and University-1652 retrieval pipelines
- loading descriptors from cache for fast offline benchmarking
- exporting or reusing trained encoder weights
- experimenting with LoRA-style PEFT adaptation on DINOv2 backbones
- initializing VLAD-style aggregators with FAISS-based clustering utilities
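VLAD-style heads need an initial codebook of cluster centers, which is where clustering utilities come in. Below is a minimal sketch. It assumes local descriptors have already been extracted as an (M, D) array. It uses `faiss.Kmeans` when FAISS can be imported and otherwise falls back to plain NumPy Lloyd's k-means. The aggregation step is classic hard-assignment VLAD, not NetVLAD's learnable soft assignment. Both helper names are hypothetical.

```python
import numpy as np


def kmeans_centroids(local_desc: np.ndarray, k: int, niter: int = 20, seed: int = 0) -> np.ndarray:
    """Return (k, D) cluster centers, preferring FAISS when it is installed."""
    try:
        import faiss

        km = faiss.Kmeans(local_desc.shape[1], k, niter=niter, seed=seed, verbose=False)
        km.train(np.ascontiguousarray(local_desc, dtype=np.float32))
        return km.centroids.astype(np.float64)
    except ImportError:
        # Fallback: Lloyd's algorithm seeded with k distinct random samples.
        rng = np.random.default_rng(seed)
        centroids = local_desc[rng.choice(len(local_desc), k, replace=False)].copy()
        for _ in range(niter):
            dists = ((local_desc[:, None, :] - centroids[None]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for j in range(k):
                members = local_desc[assign == j]
                if len(members):
                    centroids[j] = members.mean(0)
        return centroids


def vlad(local_desc: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Hard-assignment VLAD: sum residuals per cluster, intra-normalize, then L2-normalize."""
    k, d = centroids.shape
    dists = ((local_desc[:, None, :] - centroids[None]) ** 2).sum(-1)
    assign = dists.argmin(1)
    v = np.zeros((k, d))
    for j in range(k):
        members = local_desc[assign == j]
        if len(members):
            v[j] = (members - centroids[j]).sum(0)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    v = np.where(norms > 0, v / np.maximum(norms, 1e-12), 0.0)  # per-cluster normalization
    v = v.ravel()
    return v / max(np.linalg.norm(v), 1e-12)


rng = np.random.default_rng(0)
local = rng.standard_normal((200, 8))         # toy pool of local descriptors
centroids = kmeans_centroids(local, k=4)
desc = vlad(local[:50], centroids)            # (k * D,) global descriptor for one "image"
```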
The main training entrypoints live in engine/:
- engine/train_instance_gl3d_engine.py
- engine/train_instance_u1652_engine.py
- engine/train_supscene_gl3d_engine.py
The reusable data / framework components in src/pipeline/ currently cover:
- instance metric learning
- supervised scene training
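Instance metric learning pulls descriptors of the same instance together and pushes descriptors of other instances apart. Below is a minimal NumPy sketch of a batch-hard triplet margin loss, which is one common choice for this objective. It is an illustration and not necessarily the loss these pipelines use. In a PyTorch setup, the equivalent would more likely come from PyTorch Metric Learning, for example `losses.TripletMarginLoss` paired with a miner. `batch_hard_triplet_loss` is a hypothetical name.

```python
import numpy as np


def batch_hard_triplet_loss(emb: np.ndarray, labels: np.ndarray, margin: float = 0.1) -> float:
    """emb: (B, D) L2-normalized descriptors; labels: (B,) instance ids."""
    # For unit vectors, squared Euclidean distance = 2 - 2 * cosine similarity.
    dist = np.sqrt(np.maximum(2.0 - 2.0 * emb @ emb.T, 0.0))
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)
    neg_mask = ~same
    losses = []
    for i in range(len(labels)):
        if not pos_mask[i].any() or not neg_mask[i].any():
            continue  # anchor needs at least one positive and one negative
        hardest_pos = dist[i, pos_mask[i]].max()   # farthest same-instance sample
        hardest_neg = dist[i, neg_mask[i]].min()   # closest other-instance sample
        losses.append(max(hardest_pos - hardest_neg + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0


labels = np.array([0, 0, 1, 1])
emb_separated = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
emb_collapsed = np.tile([[1.0, 0.0]], (4, 1))
loss_separated = batch_hard_triplet_loss(emb_separated, labels)  # zero: margin already satisfied
loss_collapsed = batch_hard_triplet_loss(emb_collapsed, labels)  # equals the margin
```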
Tracked evaluation scripts live in scripts/:
- scripts/eval_gl3d.py
- scripts/eval_university1652.py
These scripts support two common evaluation modes:
- loading pre-extracted descriptors from cache
- building a GeoEncoder and extracting features directly from weights
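The two evaluation modes reduce to one decision: reuse cached descriptors if they exist, and otherwise extract and store them. Below is a minimal sketch of that logic. It uses NumPy `.npz` files as a stand-in for the repository's h5py-based cache. `load_or_extract`, the `descriptors` key, and the cache path are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_or_extract(cache_path, extract_fn, images):
    """Return (descriptors, cache_hit). Extract and save only when no cache exists."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with np.load(cache_path) as data:
            return data["descriptors"], True
    desc = extract_fn(images)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    np.savez(cache_path, descriptors=desc)  # path already ends in .npz, so it is used as-is
    return desc, False


with tempfile.TemporaryDirectory() as tmp:
    calls = []

    def extract(imgs):
        calls.append(1)  # count how often the (expensive) extractor runs
        return np.ones((len(imgs), 4))

    path = Path(tmp) / "cache" / "gl3d_test.npz"
    first, hit1 = load_or_extract(path, extract, [0, 1, 2])
    second, hit2 = load_or_extract(path, extract, [0, 1, 2])
```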
Datasets used or referenced in this repository:
- GL3D (BlendedMVS)
- University-1652
- ROxford
- RParis
- SfM-120k
- GSVCities
- MSLS
Some of these datasets serve as local evaluation resources or experiment assets. The maintained training engines in this repository currently prioritize GL3D and University-1652.
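Retrieval benchmarks like these are commonly reported with Recall@K, the fraction of queries whose top-K retrieved database items contain at least one correct match. Below is a minimal NumPy sketch that uses brute-force cosine similarity on L2-normalized descriptors. `recall_at_k` is a hypothetical helper. Which database items count as positives comes from each benchmark's own ground truth, and at scale the brute-force search would typically be replaced by a FAISS index.

```python
import numpy as np


def recall_at_k(query_desc, db_desc, positives, ks=(1, 5, 10)):
    """positives[i] is the set of database indices that correctly match query i."""
    sims = query_desc @ db_desc.T          # cosine similarity for unit-norm descriptors
    ranking = np.argsort(-sims, axis=1)    # most similar database items first
    results = {}
    for k in ks:
        hits = [bool(set(ranking[i, :k].tolist()) & positives[i]) for i in range(len(query_desc))]
        results[k] = float(np.mean(hits))
    return results


db = np.eye(3)
queries = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
positives = [{0}, {2}]  # query 1's true match is not its nearest neighbour
scores = recall_at_k(queries, db, positives, ks=(1, 3))
```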
GeoMetricLab is built around a practical research stack for modern retrieval experiments:
- PyTorch for model definition and tensor computation
- PyTorch Lightning / Lightning for training orchestration
- torchvision for transforms and model utilities
- PyTorch Metric Learning for retrieval losses, miners, and distance functions
- FAISS for fast nearest-neighbor search and VLAD-related clustering utilities
- h5py for descriptor cache storage and feature IO
- Weights & Biases and TensorBoard for experiment logging
- PEFT for LoRA-style efficient adaptation of vision foundation model backbones
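LoRA-style adaptation keeps a pretrained weight W frozen and learns a low-rank update, so the effective weight becomes W + (alpha / r) * B @ A. Below is a minimal NumPy sketch of one adapted linear layer. It follows the usual LoRA conventions: B starts at zero, so training begins from the pretrained behavior, and the update is scaled by alpha / r. It is not PEFT's implementation, and `LoRALinear` is a hypothetical name.

```python
import numpy as np


class LoRALinear:
    """y = x @ W.T + (alpha / r) * x @ A.T @ B.T, with W frozen and A, B trainable."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                       # frozen pretrained weight
        self.A = rng.standard_normal((rank, in_features)) * 0.01   # down-projection
        self.B = np.zeros((out_features, rank))                    # up-projection, zero at init
        self.scaling = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        """Fold the low-rank update into a single matrix for deployment."""
        return self.weight + self.scaling * self.B @ self.A


rng = np.random.default_rng(1)
W = rng.standard_normal((16, 32))
x = rng.standard_normal((5, 32))
layer = LoRALinear(W, rank=4, alpha=8.0)
out_init = layer(x)                                   # identical to the frozen layer at init
layer.B = rng.standard_normal(layer.B.shape) * 0.01   # stand-in for a few training steps
out_tuned = layer(x)
trainable = layer.A.size + layer.B.size               # 4 * 32 + 16 * 4 = 192, vs. 512 in W
```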
External projects under third_party/ are tracked as submodules rather than vendored into the main repository. The current repository layout includes method ecosystems such as:
- DINOv2
- DINOv3
- BoQ
- CosPlace
- CricaVPR
- SALAD
This keeps method provenance explicit while making it easier to update or compare upstream implementations.
GeoMetricLab/
├── config/ # canonical model / transform / loss configs
├── data/ # dataset roots (often local symlinks or external mounts)
├── engine/ # train entrypoints
├── scripts/ # tracked evaluation entrypoints
├── src/
│   ├── datasets/ # dataset implementations
│   ├── models/ # backbones, aggregators, encoder modules
│   ├── pipeline/ # datamodules + training frameworks
│   └── utils/ # metrics, IO, logging, callbacks
├── third_party/ # external dependencies as submodules
├── weights/ # local model weights
└── notebooks/ # analysis and visualization notebooks
git clone https://github.com/Suxilan/GeoMetricLab.git
cd GeoMetricLab
git submodule update --init --recursive
pip install -e .
Install your PyTorch stack separately if you need a CUDA-specific build.
This repository expects datasets to be placed under data/. In practice, large datasets are often mounted or linked from external storage.
Typical strategy:
- keep raw datasets outside the repo
- create symlinks into data/
- keep experiment caches and checkpoints local
Refer to data/README.md for dataset-specific layout notes.
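The symlink strategy above can be scripted. Below is a minimal sketch that uses only the Python standard library. It assumes a POSIX-like filesystem where unprivileged symlinks are allowed. `link_dataset` and the example paths are hypothetical, and data/README.md describes the directory names each dataset actually expects.

```python
import tempfile
from pathlib import Path


def link_dataset(external_root, repo_data_dir, name):
    """Symlink an externally stored dataset into the repo's data/ directory (idempotent)."""
    src = Path(external_root).resolve()
    dst = Path(repo_data_dir) / name
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.is_symlink() or dst.exists():
        if dst.resolve() == src:
            return dst  # already linked to the same location
        raise FileExistsError(f"{dst} already exists and points elsewhere")
    dst.symlink_to(src, target_is_directory=True)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    external = Path(tmp) / "storage" / "GL3D"
    external.mkdir(parents=True)
    (external / "marker.txt").write_text("ok")
    link = link_dataset(external, Path(tmp) / "repo" / "data", "GL3D")
    again = link_dataset(external, Path(tmp) / "repo" / "data", "GL3D")  # second call is a no-op
    ok = link.is_symlink() and (link / "marker.txt").read_text() == "ok" and again == link
```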
Training examples:
python engine/train_instance_gl3d_engine.py \
  --config config/train_instance_gl3d/default.yaml
python engine/train_supscene_gl3d_engine.py \
  --config config/train_supscene_gl3d/default.yaml
python engine/train_instance_u1652_engine.py \
  --config config/train_instance_u1652/default.yaml
Evaluation examples:
python scripts/eval_gl3d.py \
  --model dino_salad \
  --root ./data/GL3D/test
python scripts/eval_university1652.py \
  --model resnet50_cosplace \
  --task u1652_drone2satellite \
  --root ./data/university1652/test
Canonical model presets are defined in config/model_hubs.py. These entries provide a stable way to reference common backbone / aggregator combinations such as:
- resnet50_gem
- resnet50_cosplace
- resnet50_boq_16384
- dinov2_boq_12288
- dino_salad
This makes it easier to keep evaluation and weight loading consistent across experiments.
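The preset names above appear to follow a backbone_aggregator[_dim] pattern. The helper below assumes that convention. The pattern is inferred from the names alone, and config/model_hubs.py remains the authoritative source. `PresetSpec` and `parse_preset` are hypothetical. They split a preset name into its parts:

```python
import re
from typing import NamedTuple, Optional


class PresetSpec(NamedTuple):
    backbone: str
    aggregator: str
    dim: Optional[int]  # only present when the name encodes it, e.g. "_16384"


# Assumed convention: lowercase backbone id, aggregator id, optional numeric output dim.
_PRESET_RE = re.compile(r"^(?P<backbone>[a-z0-9]+)_(?P<aggregator>[a-z]+)(?:_(?P<dim>\d+))?$")


def parse_preset(name: str) -> PresetSpec:
    match = _PRESET_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized preset name: {name!r}")
    dim = match.group("dim")
    return PresetSpec(match.group("backbone"), match.group("aggregator"), int(dim) if dim else None)


specs = [parse_preset(n) for n in ("resnet50_gem", "resnet50_boq_16384", "dino_salad")]
```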
Design principles:
- Keep the encoder modular and easy to recompose
- Separate experimental weights and caches from source code
- Preserve third-party method provenance explicitly
- Prefer simple training and evaluation entrypoints over deeply coupled runners
GeoMetricLab is an active research codebase. The repository prioritizes reproducible experimentation, modular method integration, and fast empirical iteration over packaging polish. Interfaces may evolve as new backbones, aggregators, and benchmarks are added.
GeoMetricLab is built on top of a strong open-source ecosystem in visual retrieval, VPR, and modern vision foundation models. Special thanks to the communities and projects behind:
- OpenVPRLab, for helping shape open and reproducible VPR research practices
- PyTorch and PyTorch Lightning, for the core training and experimentation framework
- PyTorch Metric Learning, for robust retrieval-oriented losses, miners, and metric utilities
- FAISS, for efficient large-scale nearest-neighbor search and clustering primitives
- DINOv2 and DINOv3, for strong vision foundation backbones
- public method repositories such as SALAD, CosPlace, BoQ, and CricaVPR, which make comparative retrieval research much easier
If GeoMetricLab contributes to your research workflow, please cite the original papers of the backbone and aggregation methods you use.