Official implementation of Seeing with You: Perception-Reasoning Co-evolution for Multimodal Reasoning
If you find our project helpful, please consider giving us a star ⭐ on GitHub!
PRCO is a dual-role reinforcement learning with verifiable rewards (RLVR) framework for multimodal reasoning.
- Observer: extracts question-relevant visual facts from the image and produces a question-conditioned evidence caption.
- Solver: predicts the final answer from the caption, optionally consulting the image when needed.
Figure: Overview of PRCO.
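The two roles above can be sketched as a two-stage call on one shared policy. This is a minimal illustration, not the released implementation: the `Policy` callable, the `run_prco` function, the prompt templates, and the `solver_sees_image` flag are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: any callable mapping (prompt, optional image bytes) to text.
# It stands in for the single policy that plays both roles.
Policy = Callable[[str, Optional[bytes]], str]


@dataclass
class PRCOResult:
    caption: str  # question-conditioned evidence caption from the Observer
    answer: str   # final answer from the Solver


def run_prco(policy: Policy, image: bytes, question: str,
             solver_sees_image: bool = True) -> PRCOResult:
    # Observer role: extract the visual facts relevant to this question.
    # The prompt wording is an assumption, not the project's template.
    observer_prompt = f"[Observer] List the visual facts needed to answer: {question}"
    caption = policy(observer_prompt, image)

    # Solver role: answer from the caption, optionally consulting the image.
    solver_prompt = f"[Solver] Question: {question}\nEvidence: {caption}\nAnswer:"
    answer = policy(solver_prompt, image if solver_sees_image else None)
    return PRCOResult(caption=caption, answer=answer)


def stub_policy(prompt: str, image: Optional[bytes]) -> str:
    # Toy stand-in for a vision-language model, used only to exercise the flow.
    if prompt.startswith("[Observer]"):
        return "a triangle with side lengths 3, 4 and 5"
    return "12"


result = run_prco(stub_policy, image=b"", question="What is the perimeter?")
```

Swapping `stub_policy` for a real model call would keep the control flow unchanged, which is the point of sharing one policy across both roles.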
- [2026/03/25] We've released the model checkpoints and evaluation code for PRCO.
- Release PRCO checkpoints (3B / 7B / 8B)
- Release evaluation code
- Release paper
- Release training code
- Dual-role RLVR framework for multimodal reasoning with a shared policy.
- Observer/Solver decomposition for explicit separation of perception and reasoning.
- Reliable role-specific rewards for better gradient-level credit assignment.
- Consistent gains across model scales, including strong improvements on both 3B and 7B backbones.
- Broad benchmark coverage across visual math, geometry, logic, and multidisciplinary reasoning.
| Model | MathVerse | MathVision | MathVista | WeMath | DynaMath | LogicVista | MMMU-Pro | MMStar | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Qwen2.5-VL-7B | 43.02 | 25.46 | 70.20 | 35.43 | 20.35 | 45.41 | 35.49 | 64.26 | 42.45 |
| DAPO | 48.73 | 29.30 | 74.80 | 45.62 | 26.14 | 47.87 | 41.38 | 65.40 | 47.41 |
| PRCO-7B | 49.49 | 30.86 | 77.10 | 50.29 | 29.74 | 49.66 | 42.08 | 67.80 | 49.63 |
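The Avg. column appears to be the unweighted mean of the eight benchmark scores. The short check below recomputes it from the table rows and derives the average gain of PRCO-7B over each baseline; the variable names are illustrative.

```python
from statistics import mean

# Per-benchmark scores copied from the table, in column order:
# MathVerse, MathVision, MathVista, WeMath, DynaMath, LogicVista, MMMU-Pro, MMStar
scores = {
    "Qwen2.5-VL-7B": [43.02, 25.46, 70.20, 35.43, 20.35, 45.41, 35.49, 64.26],
    "DAPO": [48.73, 29.30, 74.80, 45.62, 26.14, 47.87, 41.38, 65.40],
    "PRCO-7B": [49.49, 30.86, 77.10, 50.29, 29.74, 49.66, 42.08, 67.80],
}
reported_avg = {"Qwen2.5-VL-7B": 42.45, "DAPO": 47.41, "PRCO-7B": 49.63}

# Recompute the unweighted mean for each row and compare with the reported value.
averages = {name: mean(values) for name, values in scores.items()}
for name, avg in averages.items():
    print(f"{name}: recomputed={avg:.4f} reported={reported_avg[name]:.2f}")

# Average gain of PRCO-7B over the base model and over DAPO, in points.
gain_over_base = averages["PRCO-7B"] - averages["Qwen2.5-VL-7B"]
gain_over_dapo = averages["PRCO-7B"] - averages["DAPO"]
print(f"gain over base: {gain_over_base:.2f}, gain over DAPO: {gain_over_dapo:.2f}")
```

Under that reading, PRCO-7B improves the average by roughly 7.2 points over Qwen2.5-VL-7B and roughly 2.2 points over DAPO.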
| Model | Backbone | Status | Link |
|---|---|---|---|
| PRCO-3B | Qwen2.5-VL-3B | Released | Checkpoint |
| PRCO-7B | Qwen2.5-VL-7B | Released | Checkpoint |
| PRCO-8B | Qwen3-VL-8B | Released | Checkpoint |
git clone https://github.com/Dtc7w3PQ/PRCO.git
cd PRCO
conda create -n prco python=3.12 -y
conda activate prco
pip install -r requirements.txt
PRCO-3B / PRCO-7B / PRCO-8B use the same evaluation workflow.
- Fill in the environment variables in VLMEvalKit/.env:
LMUData="<PATH_TO_LMUDATA>"
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
OPENAI_API_BASE="<YOUR_OPENAI_API_BASE>" # optional
Then load them in your shell:
set -a
source VLMEvalKit/.env
set +a
- Set the correct local checkpoint path in each model config (both observer.model_path and solver.model_path):
  - VLMEvalKit/scripts/prco_3b/config.json
  - VLMEvalKit/scripts/prco_7b/config.json
  - VLMEvalKit/scripts/prco_8b/config.json
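If you prefer not to edit each config by hand, the checkpoint path can be set programmatically. The sketch below assumes that `observer.model_path` and `solver.model_path` correspond to nested JSON objects (`{"observer": {"model_path": ...}, "solver": {"model_path": ...}}`); that layout is a guess from the dotted names, so check it against the actual config files first. The function name `set_checkpoint_paths` is hypothetical.

```python
import json
from pathlib import Path


def set_checkpoint_paths(config_path: str, checkpoint_dir: str) -> dict:
    """Point both roles in one config.json at the same local checkpoint."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    for role in ("observer", "solver"):
        # Assumption: each role is a JSON object holding a "model_path" key.
        # Other keys in the config are left untouched.
        config.setdefault(role, {})["model_path"] = str(checkpoint_dir)

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `set_checkpoint_paths("VLMEvalKit/scripts/prco_7b/config.json", "<PATH_TO_PRCO_7B>")` would update the 7B config; both roles receive the same path because the Observer and Solver share one policy.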
- Run inference and evaluation scripts.
Example for one model (prco_7b):
cd VLMEvalKit
bash scripts/prco_7b/infer.sh
bash scripts/prco_7b/eval.sh
Run all three models with the same pipeline:
cd VLMEvalKit
for m in prco_3b prco_7b prco_8b; do
  bash scripts/$m/infer.sh
  bash scripts/$m/eval.sh
done
Logs are written to:
  - VLMEvalKit/scripts/<model_name>/infer.log
  - VLMEvalKit/scripts/<model_name>/eval.log
Predictions and evaluation outputs are written under:
VLMEvalKit/outputs/<model_name>/
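The shell loop above keeps going when a script fails, so `eval.sh` can run on incomplete predictions. As one possible alternative, the sketch below drives the same scripts from Python, checks the environment first, and stops at the first failure. It treats the two variables not marked optional in `.env` as required, which is an assumption; `missing_env` and `run_pipeline` are hypothetical helper names, and the default runner expects to be launched from the repository root.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional, Sequence

REQUIRED_ENV = ("LMUData", "OPENAI_API_KEY")  # assumed required; OPENAI_API_BASE is optional
MODELS = ("prco_3b", "prco_7b", "prco_8b")


def missing_env(env: Mapping[str, str],
                required: Sequence[str] = REQUIRED_ENV) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


def run_pipeline(models: Sequence[str] = MODELS,
                 runner: Optional[Callable[[list[str]], object]] = None,
                 env: Optional[Mapping[str, str]] = None) -> list[list[str]]:
    """Run infer.sh then eval.sh for each model, stopping at the first failure."""
    env = os.environ if env is None else env
    missing = missing_env(env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    if runner is None:
        def runner(cmd: list[str]) -> object:
            # check=True raises CalledProcessError on a non-zero exit code.
            return subprocess.run(cmd, cwd="VLMEvalKit", check=True)

    executed = []
    for model in models:
        for stage in ("infer", "eval"):
            cmd = ["bash", f"scripts/{model}/{stage}.sh"]
            runner(cmd)
            executed.append(cmd)
    return executed
```

Passing a custom `runner` (for example `print`) gives a dry run that lists the commands without executing them.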
Train PRCO with the dual-role Observer/Solver framework:
Coming soon...
If you find this project helpful, please cite our paper:
@article{miao2026seeing,
  title={Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning},
  author={Miao, Ziqi and Jia, Haonan and Li, Lijun and Qian, Chen and Xiong, Yuan and Yan, Wenting and Shao, Jing},
  journal={arXiv preprint arXiv:2603.28618},
  year={2026}
}
This project is built around open multimodal reasoning research. We especially thank the open-source communities behind vLLM, EasyR1, and verl, which made this work possible.