🔭 Seeing with You: Perception-Reasoning Co-evolution for Multimodal Reasoning

Official implementation of Seeing with You: Perception-Reasoning Co-evolution for Multimodal Reasoning

If you find our project helpful, please consider giving us a star ⭐ on GitHub!

Overview

PRCO is a dual-role reinforcement learning with verifiable rewards (RLVR) framework for multimodal reasoning. A single shared policy plays two roles:

Observer: extracts question-relevant visual facts from the image and produces a question-conditioned evidence caption.
Solver: predicts the final answer from the caption, optionally consulting the image when needed.
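The two roles above can be read as a two-turn inference loop over one shared model. The following is a minimal sketch of that flow, not the released implementation: the `generate` callable, the prompt wording, and the `solver_sees_image` flag are all hypothetical names introduced here for illustration.

```python
from typing import Callable, Optional

# Hypothetical interface: (prompt, image or None) -> generated text.
Generate = Callable[[str, Optional[object]], str]


def observe_then_solve(
    generate: Generate,
    question: str,
    image: Optional[object],
    solver_sees_image: bool = True,
) -> dict:
    """Run an Observer turn, then a Solver turn, with the same model."""
    # Observer turn: ask for a caption conditioned on the question.
    # The prompt text is an assumption, not the prompt used by PRCO.
    observer_prompt = f"Describe only the visual facts needed to answer: {question}"
    caption = generate(observer_prompt, image)

    # Solver turn: answer from the caption; the image is passed only if allowed.
    solver_prompt = f"Evidence: {caption}\nQuestion: {question}\nAnswer:"
    answer = generate(solver_prompt, image if solver_sees_image else None)
    return {"caption": caption, "answer": answer}


def toy_generate(prompt: str, image: Optional[object]) -> str:
    """Stand-in for a real vision-language model, used only to run the sketch."""
    if prompt.startswith("Describe"):
        return "a triangle with sides 3, 4, 5"
    return "12"


result = observe_then_solve(toy_generate, "What is the perimeter?", image=None)
print(result)
```

In practice `generate` would wrap a real model call; the stub is there only so the control flow can be run end to end.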

Figure: Overview of PRCO.

🚀 News

[2026/03/25] We've released the model checkpoints and evaluation code for PRCO.

TODO List

Release PRCO checkpoints (3B / 7B / 8B)
Release evaluation code
Release paper
Release training code

Highlights

Dual-role RLVR framework for multimodal reasoning with a shared policy.
Observer/Solver decomposition for explicit separation of perception and reasoning.
Reliable role-specific rewards for better gradient-level credit assignment.
Consistent gains across model scales, including strong improvements on both 3B and 7B backbones.
Broad benchmark coverage across visual math, geometry, logic, and multidisciplinary reasoning.

Benchmark Results

Main Results on 8 Benchmarks (7B)

Model	MathVerse	MathVision	MathVista	WeMath	DynaMath	LogicVista	MMMU-Pro	MMStar	Avg.
Qwen2.5-VL-7B	43.02	25.46	70.20	35.43	20.35	45.41	35.49	64.26	42.45
DAPO	48.73	29.30	74.80	45.62	26.14	47.87	41.38	65.40	47.41
PRCO-7B	49.49	30.86	77.10	50.29	29.74	49.66	42.08	67.80	49.63
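As a quick sanity check, the Avg. column can be recomputed from the eight per-benchmark scores. The sketch below copies the numbers from the table above and assumes Avg. is the unweighted mean over the eight benchmarks; the variable names are illustrative only.

```python
# Per-benchmark scores copied from the table above, in column order:
# MathVerse, MathVision, MathVista, WeMath, DynaMath, LogicVista, MMMU-Pro, MMStar
SCORES = {
    "Qwen2.5-VL-7B": [43.02, 25.46, 70.20, 35.43, 20.35, 45.41, 35.49, 64.26],
    "DAPO": [48.73, 29.30, 74.80, 45.62, 26.14, 47.87, 41.38, 65.40],
    "PRCO-7B": [49.49, 30.86, 77.10, 50.29, 29.74, 49.66, 42.08, 67.80],
}

# Avg. column as reported in the table.
REPORTED_AVG = {"Qwen2.5-VL-7B": 42.45, "DAPO": 47.41, "PRCO-7B": 49.63}

# Assumption: Avg. is the unweighted mean over the eight benchmarks.
averages = {model: sum(s) / len(s) for model, s in SCORES.items()}

for model, avg in averages.items():
    print(f"{model}: computed {avg:.4f}, reported {REPORTED_AVG[model]:.2f}")

# Average-point gains of PRCO-7B over the two baselines.
gain_over_base = averages["PRCO-7B"] - averages["Qwen2.5-VL-7B"]
gain_over_dapo = averages["PRCO-7B"] - averages["DAPO"]
print(f"PRCO-7B vs Qwen2.5-VL-7B: +{gain_over_base:.2f}")
print(f"PRCO-7B vs DAPO: +{gain_over_dapo:.2f}")
```

Under that assumption the recomputed means agree with the reported column to two decimals.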

Model Zoo

Model	Backbone	Status	Link
PRCO-3B	Qwen2.5-VL-3B	Released	Checkpoint
PRCO-7B	Qwen2.5-VL-7B	Released	Checkpoint
PRCO-8B	Qwen3-VL-8B	Released	Checkpoint

Usage

1. Installation

git clone https://github.com/Dtc7w3PQ/PRCO.git
cd PRCO
conda create -n prco python=3.12 -y
conda activate prco
pip install -r requirements.txt

2. Evaluation

PRCO-3B / PRCO-7B / PRCO-8B use the same evaluation workflow.

Fill in the environment variables in VLMEvalKit/.env:

LMUData="<PATH_TO_LMUDATA>"
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
OPENAI_API_BASE="<YOUR_OPENAI_API_BASE>"  # optional

Then load them in your shell:

set -a
source VLMEvalKit/.env
set +a
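Before launching a long run, it can help to confirm that the variables were actually exported. The helper below is a small sketch, not part of the repository: `missing_env_vars` is a hypothetical name, and treating `LMUData` and `OPENAI_API_KEY` as required is an assumption based on only `OPENAI_API_BASE` being marked optional above.

```python
import os
from typing import Mapping, Optional

# Assumption: everything not marked "optional" in .env is required.
REQUIRED_VARS = ("LMUData", "OPENAI_API_KEY")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list:
    """Return required variables that are unset, empty, or still a <PLACEHOLDER>."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        # Template values such as "<PATH_TO_LMUDATA>" count as not filled in.
        is_placeholder = value.startswith("<") and value.endswith(">")
        if not value or is_placeholder:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_env_vars()
    if problems:
        print("Missing or unfilled variables:", ", ".join(problems))
    else:
        print("Environment looks complete.")
```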

Set the correct local checkpoint path in each model config (both observer.model_path and solver.model_path):

VLMEvalKit/scripts/prco_3b/config.json
VLMEvalKit/scripts/prco_7b/config.json
VLMEvalKit/scripts/prco_8b/config.json
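Editing three JSON files by hand is easy to get wrong, so the update can also be scripted. This sketch assumes each config is a JSON object with nested `observer` and `solver` objects that each hold a `model_path` key, which is inferred from the dotted names above and not verified against the actual files; `set_checkpoint_path` and `update_config_file` are hypothetical helpers.

```python
import copy
import json
from pathlib import Path


def set_checkpoint_path(config: dict, checkpoint: str) -> dict:
    """Return a copy of config with both roles pointing at the same checkpoint."""
    updated = copy.deepcopy(config)
    for role in ("observer", "solver"):
        # Assumed layout: {"observer": {"model_path": ...}, "solver": {"model_path": ...}}
        updated.setdefault(role, {})["model_path"] = checkpoint
    return updated


def update_config_file(config_file: str, checkpoint: str) -> None:
    """Rewrite one config.json in place, leaving all other keys untouched."""
    path = Path(config_file)
    config = json.loads(path.read_text(encoding="utf-8"))
    new_config = set_checkpoint_path(config, checkpoint)
    path.write_text(json.dumps(new_config, indent=2) + "\n", encoding="utf-8")


# Example call (the checkpoint directory is a placeholder):
# update_config_file("VLMEvalKit/scripts/prco_7b/config.json", "/path/to/PRCO-7B")
```

Both roles receive the same path here because PRCO uses a shared policy; if the real config layout differs, only `set_checkpoint_path` needs to change.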

Run inference and evaluation scripts.

Example for one model (prco_7b):

cd VLMEvalKit
bash scripts/prco_7b/infer.sh
bash scripts/prco_7b/eval.sh

Run all three models with the same pipeline:

cd VLMEvalKit
for m in prco_3b prco_7b prco_8b; do
  # Run evaluation only if inference for this model succeeded.
  bash "scripts/$m/infer.sh" && bash "scripts/$m/eval.sh"
done

Logs are written to:

VLMEvalKit/scripts/<model_name>/infer.log
VLMEvalKit/scripts/<model_name>/eval.log

Predictions and evaluation outputs are written under:

VLMEvalKit/outputs/<model_name>/
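To see what a run produced, the output directory can be listed per model. The snippet below is a sketch that relies only on the directory layout stated above; the file names inside each model directory are not documented here, so it lists whatever it finds. `list_output_files` is a hypothetical helper.

```python
from pathlib import Path


def list_output_files(outputs_root: str, model_name: str) -> list:
    """Return sorted relative paths of all files under <outputs_root>/<model_name>/."""
    model_dir = Path(outputs_root) / model_name
    if not model_dir.is_dir():
        # Nothing has been written yet for this model.
        return []
    return sorted(
        p.relative_to(model_dir).as_posix()
        for p in model_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    for name in ("prco_3b", "prco_7b", "prco_8b"):
        files = list_output_files("VLMEvalKit/outputs", name)
        print(f"{name}: {len(files)} file(s)")
        for rel in files:
            print("  ", rel)
```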

3. Training

Train PRCO with the dual-role Observer/Solver framework:

Coming soon...

Citation

If you find this project helpful, please cite our paper:

@article{miao2026seeing,
  title={Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning},
  author={Miao, Ziqi and Jia, Haonan and Li, Lijun and Qian, Chen and Xiong, Yuan and Yan, Wenting and Shao, Jing},
  journal={arXiv preprint arXiv:2603.28618},
  year={2026}
}

Acknowledgement

This project is built around open multimodal reasoning research. We especially thank the open-source communities behind vLLM, EasyR1, and verl, which made this work possible.
