Vero: An Open RL Recipe for General Visual Reasoning

Vero is a fully open reinforcement learning recipe for training and evaluating multi-task visual reasoning with vision-language models.

The released project combines an RL training stack (vero-rl) and an evaluation harness (vero-eval).

Highlights

600K curated RL samples from 59 datasets across 6 visual reasoning task categories: STEM, Chart & OCR, Spatial & Action, Knowledge & Recognition, Grounding, Counting & Search, & Captioning & Instruction Following
Single-stage RL recipe for visual reasoning with task-routed reward functions
VeroEvalSuite with 30 benchmarks spanning the 6 multimodal reasoning task categories
Support for many base models: Qwen3.5, Qwen2.5-VL, Qwen3-VL, MiMo-VL, Bee, Molmo2
Fully open codebase for training and evaluation

Installation

Clone Repository

git clone https://github.com/zlab-princeton/vero.git
cd vero

Environment Setup

bash scripts/setup_env.sh

This installs PyTorch, vLLM, Transformers, FlashAttention, and both project packages (vero-rl, vero-eval) in editable mode. See scripts/setup_env.sh for the full setup flow.

Data Setup

For Vero RL training, the model-run scripts use formatted local data under vero-rl/data by default. Prepare it once with:

python scripts/download_and_format_vero_600k.py

This script downloads or reuses cached data from zlab-princeton/Vero-600k, exports images into vero-rl/data/images/, and writes:

vero-rl/data/vero_600k_train.verl.jsonl
vero-rl/data/vero_600k_val.verl.jsonl

All bash launchers in vero-rl/examples/model_runs/ will pick up those files automatically once they exist.

For custom data, Vero expects a specific data format for RL training.

For dataset format, curation details, and reward routing metadata, see docs/DATA.md.

Vero Reward

We open source our runtime reward stack in vero-rl/vero_reward. Its main entrypoint, math_verify_reward_type_boxed.py, routes scoring by reward_type and combines strict <think>/<answer> format checks with task-specific accuracy. The package covers boxed/numeric/string-match style rewards, grounding rewards based on bbox matching in grounding_reward.py, clicking rewards based on point-in-box checks in click_reward.py, and instruction-following checks in instructions.py.

During Vero RL training, these rule-based rewards are combined with an LLM-judge path implemented in vero_vllm_judge.py. The shared model-run config gspo_llmjudge_shared.yaml enables the vero_vllm_judge reward manager, points the custom reward function at vero_reward/math_verify_reward_type_boxed.py, and configures judge parameters such as the local API endpoint, sampling settings, sleep mode, and the instruction-following blend weight.

The LLM judge itself uses the prompt in llm_judge_reference.txt, which asks the judge model to compare the rollout answer against a reference answer and return a structured 1-10 score. In the standard training scripts such as run_gspo_qwen3vl_instruct_mix_all_llmjudge.sh, the judge server is started automatically by sourcing llm_judge_server.sh, which launches a local vllm serve process, waits for readiness, and prepares the server for training-time reward calls.

Model Checkpoints

Pretrained Huggingface checkpoints are available via the following links:

Model	Base Model	Parameters	HF Link
`Vero-Qwen25-7B`	Qwen2.5-VL-7B-Instruct	7B	zlab-princeton/Vero-Qwen25-7B
`Vero-Qwen3I-8B`	Qwen3-VL-8B-Instruct	8B	zlab-princeton/Vero-Qwen3I-8B
`Vero-Qwen3T-8B`	Qwen3-VL-8B-Thinking	8B	zlab-princeton/Vero-Qwen3T-8B
`Vero-MiMo-7B`	MiMo-VL-7B-SFT	7B	zlab-princeton/Vero-MiMo-7B

See docs/MODELS.md for the documented model families, training settings, and inference format.

Supported Training Launch Scripts

Script	Model Family	Base Model
Train Vero-Qwen25-7B	`Vero-Qwen25-7B`	Qwen2.5-VL-7B-Instruct
Train Vero-Qwen3I-8B	`Vero-Qwen3I-8B`	Qwen3-VL-8B-Instruct
Train Vero-MiMo-7B	`Vero-MiMo-7B`	MiMo-VL-7B-SFT

Quick Start

First prepare the repo-local training data:

python scripts/download_and_format_vero_600k.py

Then launch a training run. TRAIN_FILES, VAL_FILES, and IMAGE_ROOT are optional overrides if you want to point at different formatted data.

export ROOT_PATH="/path/to/data_root"  # for datasets and checkpoints
cd vero-rl
bash examples/model_runs/run_gspo_qwen3vl_instruct_mix_all_llmjudge.sh

Optional dataset overrides:

export TRAIN_FILES="/path/to/train.verl.jsonl"
export VAL_FILES="/path/to/val.verl.jsonl"
export IMAGE_ROOT="/path/to/data_root"

The training scripts auto-detect REPO_ROOT from their location, manage the LLM judge server automatically, and use Hydra-based configs from vero-rl/examples/model_runs/config/.

Evaluation

Vero is evaluated with vero-eval, an evaluation harness built on lmms-eval which houses VeroEvalSuite, a 30-benchmark suite spanning:

Chart and OCR
STEM reasoning
Spatial reasoning and action
Knowledge and recognition
Grounding, counting, and visual search
Captioning and instruction following

Evaluation Benchmarks

Task Category	Benchmarks
Chart & OCR	ChartQA-Pro, ChartQA, InfoVQA, CharXiv, ChartMuseum, EvoChart
STEM	MMMU-PRO Standard, MMMU-PRO Vision, MathVision, MathVista
Spatial & Action	Blink, ERQA, GameQA, EmbSpatial, CVBench
Knowledge & Recognition	RealWorldQA, SimpleVQA (English), FVQA, MM-Vet V2
Grounding, Counting & Visual Search	CountBenchQA, CountQA, MMERealWorld, VStarBench, AerialVG, VisualProbe, ScreenSpot, ScreenSpotPro
Captioning & Instruction Following	MM-MTBench, MIABench, MMIFEval

Quick Start

cd vero-eval

# Evaluate on a single task
bash examples/eval.sh \
    --model-path zlab-princeton/Vero-Qwen3I-8B \
    --tasks chartqa_reasoning

# Evaluate on a full domain
bash examples/eval_domain.sh \
    --model-path zlab-princeton/Vero-Qwen3I-8B \
    --domain chart_ocr \
    --variant reasoning

For direct lmms_eval usage:

cd vero-eval

python -m lmms_eval \
    --model vllm \
    --model_args model=zlab-princeton/Vero-Qwen3I-8B,tensor_parallel_size=1 \
    --tasks chartqa_reasoning \
    --batch_size 2048 \
    --output_path ./eval_results/

See docs/EVALUATION.md for benchmark coverage, judge configuration, and evaluation workflows.

Repository Structure

Vero/
|-- docs/          Data, training, evaluation, and model documentation
|-- scripts/       Environment setup and data filtering scripts
|-- vero-eval/     Evaluation harness built around lmms-eval
`-- vero-rl/       RL training framework built around veRL

Documentation

Citation

If you use this repository, please cite:

@article{sarch2026vero,
    title   = {Vero: An Open RL Recipe for General Visual Reasoning},
    author  = {Sarch, Gabriel and Cai, Linrong and Wang, Qunzhong and Wu, Haoyang and Chen, Danqi and Liu, Zhuang},
    year    = {2026},
    journal = {arXiv preprint arXiv:2604.04917},
  }

Acknowledgements

This project builds on several strong open-source foundations:

veRL for distributed RL training infrastructure
lmms-eval for multimodal evaluation

License

This project is licensed under the Apache License 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
docs		docs
examples		examples
scripts		scripts
vero-eval		vero-eval
vero-rl		vero-rl
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vero: An Open RL Recipe for General Visual Reasoning

Highlights

Installation

Clone Repository

Environment Setup

Data Setup

Vero Reward

Model Checkpoints

Supported Training Launch Scripts

Quick Start

Evaluation

Evaluation Benchmarks

Quick Start

Repository Structure

Documentation

Citation

Acknowledgements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Vero: An Open RL Recipe for General Visual Reasoning

Highlights

Installation

Clone Repository

Environment Setup

Data Setup

Vero Reward

Model Checkpoints

Supported Training Launch Scripts

Quick Start

Evaluation

Evaluation Benchmarks

Quick Start

Repository Structure

Documentation

Citation

Acknowledgements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages