- Set up the environment:

```bash
conda create -n geo-vista python==3.10 -y
conda activate geo-vista
bash setup.sh
```

- Set up the web search API key
We use Tavily for web search during inference and training. Sign up for a free account to get a Tavily API key, then set the `TAVILY_API_KEY` variable in the `.env` file. You can run `bash examples/search_test.sh` to verify that your API key is working.
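If you prefer to sanity-check the key from Python directly, here is a minimal sketch using the `tavily-python` client (an assumption: if the package is not pulled in by `setup.sh`, install it with `pip install tavily-python`):

```python
import os

from tavily import TavilyClient  # pip install tavily-python

# Uses the same TAVILY_API_KEY that the repo scripts read from .env
client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
response = client.search("What city is the Eiffel Tower in?", max_results=3)
for item in response.get("results", []):
    print(item["title"], "->", item["url"])
```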
- Download the tuned GeoVista model and deploy it with vLLM

Download the model from HuggingFace and place it in the `./.temp/checkpoints/LibraTree/GeoVista-RL-6k-7B` directory:

```bash
python3 scripts/download_hf.py \
    --model LibraTree/GeoVista-RL-6k-7B \
    --local_model_dir .temp/checkpoints/
```

Then deploy the GeoVista model with vLLM:

```bash
bash inference/vllm_deploy_geovista_rl_6k.sh
```

- Run an example inference
```bash
export VLLM_PORT=8000
export VLLM_HOST="localhost"
# apply env variables
set -a; source .env; set +a;

python examples/infer_example.py \
    --multimodal_input examples/geobench-example.png \
    --question "Please analyze where is the place."
```

You will see the model's thinking trajectory and final answer in the console output.
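If you want to query the deployed endpoint yourself, here is a minimal sketch using the OpenAI-compatible API that vLLM exposes. The served model name below is an assumption and should match whatever name `inference/vllm_deploy_geovista_rl_6k.sh` registers; note this only returns a single raw model response and does not reproduce any tool-use handling that `examples/infer_example.py` performs.

```python
import base64

from openai import OpenAI  # pip install openai

# Assumes the vLLM server from the deploy script listens on localhost:8000
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("examples/geobench-example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="LibraTree/GeoVista-RL-6k-7B",  # assumed served model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Please analyze where is the place."},
        ],
    }],
)
print(response.choices[0].message.content)
```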
- We have released the GeoVista-Bench (GeoBench) dataset on HuggingFace 🤗, a benchmark that includes photos and panoramas from around the world, along with a subset of satellite images of different cities, to rigorously evaluate the geolocalization ability of agentic models.
GeoBench is the first high-resolution, multi-source, globally annotated dataset to evaluate agentic models' general geolocalization ability.
- We compare GeoBench with other geolocalization benchmarks along five axes: Global Coverage (GC), indicating whether images span diverse regions worldwide; Reasonable Localizability (RC), measuring whether non-localizable or trivially localizable images are filtered out to preserve meaningful difficulty; High Resolution (HR), requiring all images to have at least 1M pixels for reliable visual clue extraction; Data Variety (DV), capturing whether multiple image modalities or sources are included to test generalization; and Nuanced Evaluation (NE), which checks whether precise coordinates are available to enable fine-grained distance-based metrics such as the haversine distance gap.
| Benchmark | Year | GC | RC | HR | DV | NE |
|---|---|---|---|---|---|---|
| Im2GPS | 2008 | ✓ | | | | |
| YFCC4k | 2017 | ✓ | | | | |
| Google Landmarks v2 | 2020 | ✓ | | | | |
| VIGOR | 2022 | ✓ | | | | |
| OSV-5M | 2024 | ✓ | ✓ | ✓ | | |
| GeoComp | 2025 | ✓ | ✓ | ✓ | | |
| GeoBench (ours) | 2025 | ✓ | ✓ | ✓ | ✓ | ✓ |
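The NE axis hinges on precise coordinates so that prediction error can be measured as a great-circle distance. For reference, here is a minimal haversine distance helper; this is a sketch only, the repository's own metric is computed by `eval/eval_infer_geolocation.py` and may differ in details.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))

# Distance gap between a predicted and a ground-truth location,
# e.g. Paris vs. New York is roughly 5.8 thousand kilometers.
print(haversine_km(48.8584, 2.2945, 40.6892, -74.0445))
```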
We provide the whole inference and evaluation pipeline for GeoVista on GeoBench.
- Download the GeoBench dataset from HuggingFace and place it in the `./.temp/datasets` directory:

```bash
python3 scripts/download_hf.py \
    --dataset LibraTree/GeoVistaBench \
    --local_dataset_dir ./.temp/datasets
```

- Download the pre-trained model from HuggingFace and place it in the `./.temp/checkpoints` directory:
```bash
python3 scripts/download_hf.py \
    --model LibraTree/GeoVista-RL-12k-7B \
    --local_model_dir .temp/checkpoints/
```

- Deploy the GeoVista model with vLLM:

```bash
bash inference/vllm_deploy.sh
```

- Configure the settings, including the output directory, then run the inference script:

```bash
bash inference/run_inference.sh
```

After running the above commands, you should see the results in the specified output directory, e.g., `./.temp/outputs/geobench/geovista-rl-12k-7b/`, which contains an `inference_<timestamp>.jsonl` file with the inference results.
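Each line of that file is one JSON record; the exact fields depend on the inference script, so the sketch below only loads the most recent file and peeks at the first record without assuming a schema.

```python
import glob
import json

# Output directory from the run above (adjust the benchmark/model names as needed)
out_dir = ".temp/outputs/geobench/geovista-rl-12k-7b"
latest = sorted(glob.glob(f"{out_dir}/inference_*.jsonl"))[-1]

with open(latest) as f:
    records = [json.loads(line) for line in f]

print(f"{len(records)} records in {latest}")
print(json.dumps(records[0], indent=2)[:500])  # inspect the first record's fields
```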
- After obtaining the inference results, you can evaluate the geolocalization performance using the evaluation script:
```bash
MODEL_NAME=geovista-rl-12k-7b
BENCHMARK=geobench
EVALUATION_RESULT=".temp/outputs/${BENCHMARK}/${MODEL_NAME}/evaluation.jsonl"

python3 eval/eval_infer_geolocation.py \
    --pred_jsonl <The inference file path> \
    --out_jsonl ${EVALUATION_RESULT} \
    --dataset_dir .temp/datasets/${BENCHMARK} \
    --num_samples 1500 \
    --model_verifier \
    --no_eval_accurate_dist \
    --timeout 120 --debug 2>&1 | tee .temp/outputs/${BENCHMARK}/${MODEL_NAME}/evaluation.log
```

You can accelerate the evaluation by changing the `--workers` argument in the above command (default is 1):
```bash
    --workers 8 \
```

- To run nuanced evaluation on GeoBench, please refer to `evaluation.md` for guidance.
- To be released soon.
Please consider citing our paper and starring this repo if you find them helpful. Thank you!
```bibtex
@misc{wang2025geovistawebaugmentedagenticvisual,
      title={GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization},
      author={Yikun Wang and Zuyan Liu and Ziyi Wang and Pengfei Liu and Han Hu and Yongming Rao},
      year={2025},
      eprint={2511.15705},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.15705},
}
```

- We thank Tavily and Google Cloud for providing reliable web search and geocoding services for research use. We also thank Mapillary for providing high-quality street-level images from around the world.
- We would like to thank the contributors to the VeRL, TRL, gpt-researcher, and DeepEyes repositories for their open-sourced frameworks and research.