
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization


Paper | GeoVistaBench | GeoVista-RL-6k-7B | Webpage
* This repository is intended solely for research purposes.

Quick Start

  1. Set up the environment:
conda create -n geo-vista python=3.10 -y
conda activate geo-vista

bash setup.sh
  2. Set up the web search API key

We use Tavily for web search during inference and training. Sign up for a free account, obtain your Tavily API key, and set the TAVILY_API_KEY variable in the .env file. You can run bash examples/search_test.sh to verify that your API key is working.
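If you want to sanity-check the key outside the repo's scripts, here is a minimal sketch using the tavily-python client (the package and its API are an assumption about your environment, not part of this repo):

# pip install tavily-python  -- assumed client, not shipped with this repo
import os
from tavily import TavilyClient

# Reads the same TAVILY_API_KEY variable that the .env file defines.
client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
response = client.search("Eiffel Tower location", max_results=3)
for result in response["results"]:
    print(result["title"], result["url"])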

  3. Download the tuned GeoVista model and deploy it with vLLM

Download the checkpoint from HuggingFace and place it in the ./.temp/checkpoints/LibraTree/GeoVista-RL-6k-7B directory:

python3 scripts/download_hf.py \
--model LibraTree/GeoVista-RL-6k-7B \
--local_model_dir .temp/checkpoints/
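If you prefer to skip the helper script, here is a minimal sketch that fetches the checkpoint with huggingface_hub directly (an assumption about tooling; scripts/download_hf.py may differ in details):

# pip install huggingface_hub
from huggingface_hub import snapshot_download

# Download the full checkpoint into the directory the deploy script expects.
snapshot_download(
    repo_id="LibraTree/GeoVista-RL-6k-7B",
    local_dir=".temp/checkpoints/LibraTree/GeoVista-RL-6k-7B",
)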

Then deploy the GeoVista model with vLLM:

bash inference/vllm_deploy_geovista_rl_6k.sh
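To confirm the server is up before running inference, a quick check (assuming the default localhost:8000 endpoint used in the next step):

# pip install requests
import requests

# vLLM's OpenAI-compatible server lists the served model under /v1/models.
print(requests.get("http://localhost:8000/v1/models").json())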

  4. Run an example inference:
export VLLM_PORT=8000
export VLLM_HOST="localhost"
# apply env variables
set -a; source .env; set +a;
python examples/infer_example.py \
--multimodal_input examples/geobench-example.png \
--question "Please analyze where is the place."

You will see the model's thinking trajectory and final answer in the console output.
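The example script presumably drives the full agentic loop; if you only want to see how the model is served, here is a minimal single-turn sketch against the vLLM OpenAI-compatible endpoint (the served model name is an assumption — check the deploy script for the actual value):

# pip install openai
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode the example image as a base64 data URL.
with open("examples/geobench-example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="GeoVista-RL-6k-7B",  # assumption: must match the name vLLM serves
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Please analyze where is the place."},
        ],
    }],
)
print(response.choices[0].message.content)

Note that this bypasses the web-search tools, so expect weaker answers than the agentic pipeline produces.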

Benchmark

  • We have released the GeoVista-Bench (GeoBench) dataset on HuggingFace 🤗, a benchmark that includes photos and panoramas from around the world, along with a subset of satellite images of different cities, to rigorously evaluate the geolocalization ability of agentic models.

GeoBench is the first high-resolution, multi-source, globally annotated dataset to evaluate agentic models' general geolocalization ability.

  • We compare other geolocalization benchmarks against ours along five axes:
      • Global Coverage (GC): whether images span diverse regions worldwide.
      • Reasonable Localizability (RC): whether non-localizable or trivially localizable images are filtered out to preserve meaningful difficulty.
      • High Resolution (HR): all images must have at least $1~\mathrm{M}$ pixels for reliable visual clue extraction.
      • Data Variety (DV): whether multiple image modalities or sources are included to test generalization.
      • Nuanced Evaluation (NE): whether precise coordinates are available to enable fine-grained distance-based metrics such as the haversine distance gap (see the sketch after the table below).
Benchmark            Year  GC  RC  HR  DV  NE
Im2GPS               2008  ✓
YFCC4k               2017  ✓
Google Landmarks v2  2020  ✓
VIGOR                2022  ✓
OSV-5M               2024  ✓   ✓   ✓
GeoComp              2025  ✓   ✓   ✓
GeoBench (ours)      2025  ✓   ✓   ✓   ✓   ✓
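The haversine distance gap behind NE is the great-circle distance between predicted and ground-truth coordinates; a standard implementation looks like this (generic formula, not code from this repo):

import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# e.g., predicting Berlin for a ground truth in Paris is ~878 km off
print(haversine_km(48.8566, 2.3522, 52.5200, 13.4050))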

Inference and Evaluation on GeoBench

We provide the complete inference and evaluation pipeline for GeoVista on GeoBench.

Inference

  • Download the GeoBench dataset from HuggingFace and place it in the ./.temp/datasets directory.
python3 scripts/download_hf.py \
--dataset LibraTree/GeoVistaBench \
--local_dataset_dir ./.temp/datasets
  • Download the pre-trained model from HuggingFace and place it in the ./.temp/checkpoints directory.
python3 scripts/download_hf.py \
--model LibraTree/GeoVista-RL-12k-7B \
--local_model_dir .temp/checkpoints/
  • Deploy the GeoVista model with vLLM:
bash inference/vllm_deploy.sh
  • Configure the settings (including the output directory) and run the inference script:
bash inference/run_inference.sh

After running the above commands, the results appear in the specified output directory, e.g. ./.temp/outputs/geobench/geovista-rl-12k-7b/, as an inference_<timestamp>.jsonl file.
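A minimal sketch for inspecting the newest results file (it assumes only the inference_<timestamp>.jsonl naming shown above; field names vary, so we just print the schema):

import glob, json

# Pick the most recent inference_<timestamp>.jsonl in the output directory.
paths = sorted(glob.glob(".temp/outputs/geobench/geovista-rl-12k-7b/inference_*.jsonl"))
with open(paths[-1]) as f:
    records = [json.loads(line) for line in f]

print(f"{len(records)} samples")
print(sorted(records[0].keys()))  # inspect the fields of one record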

Evaluation

  • After obtaining the inference results, you can evaluate the geolocalization performance using the evaluation script:
MODEL_NAME=geovista-rl-12k-7b
BENCHMARK=geobench
EVALUATION_RESULT=".temp/outputs/${BENCHMARK}/${MODEL_NAME}/evaluation.jsonl"

python3 eval/eval_infer_geolocation.py \
  --pred_jsonl <The inference file path> \
  --out_jsonl ${EVALUATION_RESULT} \
  --dataset_dir .temp/datasets/${BENCHMARK} \
  --num_samples 1500 \
  --model_verifier \
  --no_eval_accurate_dist \
  --timeout 120 --debug 2>&1 | tee .temp/outputs/${BENCHMARK}/${MODEL_NAME}/evaluation.log

You can accelerate the evaluation by increasing the --workers argument in the above command (default: 1):

  --workers 8 \
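Once evaluation.jsonl is written, per-threshold accuracies can be tallied with a short script like the one below. The distance_km field name is an assumption about the output schema (adjust to whatever eval_infer_geolocation.py actually emits), and the thresholds are the ones commonly used in geolocalization work (street/city/region/country/continent):

import json

THRESHOLDS_KM = [1, 25, 200, 750, 2500]

with open(".temp/outputs/geobench/geovista-rl-12k-7b/evaluation.jsonl") as f:
    dists = [json.loads(line)["distance_km"] for line in f]  # field name assumed

for t in THRESHOLDS_KM:
    acc = sum(d <= t for d in dists) / len(dists)
    print(f"acc @ {t:>4} km: {acc:.3f}")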

Nuanced Evaluation

  • To run nuanced evaluation on GeoBench, please refer to evaluation.md for guidance.

Training Pipeline

  • To be released soon.

BibTeX

Please consider citing our paper and starring this repo if you find them helpful. Thank you!

@misc{wang2025geovistawebaugmentedagenticvisual,
      title={GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization}, 
      author={Yikun Wang and Zuyan Liu and Ziyi Wang and Pengfei Liu and Han Hu and Yongming Rao},
      year={2025},
      eprint={2511.15705},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.15705}, 
}


Acknowledgements

  • We thank Tavily and Google Cloud for providing a reliable web search API and geocoding services for research use. We also thank Mapillary for providing high-quality street-level imagery from around the world.
  • We thank the contributors to the VeRL, TRL, gpt-researcher, and DeepEyes repositories for their open-source frameworks and research.
