
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning

📖 Paper 🤗 SpaceR 📊 SpaceR-151k

📅 News

🚀 [07/06/2025] SpaceR-Eval now supports more models (e.g., Qwen2.5VL, InternVL, KimiVL, MiniCPM-V, VideoLLaMA3) and benchmarks (e.g., VSI-Bench, STI-Bench, SPAR-Bench, Video-MME, LongVideoBench, TempCompass, Video-Holmes). SpatialScore also supports SpaceR evaluation.

🚀 [06/04/2025] Our SpaceR model achieves 37.28% accuracy on VGBench and 53.72% accuracy on SpatialScore, state-of-the-art performance among all 7B/8B models to date.

🚀 [05/29/2025] Our SpaceR achieves 35.2% accuracy on the new video reasoning benchmark Video-Holmes, beating the commercial models o4-mini (29.9%) and Gemini-2.0-Flash (30.6%).

🚀 [05/19/2025] We release the SpaceR-151k dataset.

🚀 [05/10/2025] We release the SpaceR checkpoint.

🚀 [04/29/2025] We release the SR-91k dataset.

🚀 [04/10/2025] We update the training framework of SpaceR.

🚀 [04/02/2025] We share the SpaceR paper on arXiv.

🚀 [03/31/2025] We release the evaluation and training code.

SpaceR

The first MLLM empowered by SG-RLVR for video spatial reasoning

🏆 Performance Comparison

Data Statistics of SpaceR-151k

QA Examples of SR-91k

We curate the SpaceR-151k dataset and propose SpaceR, which achieves promising gains on VSI-Bench, SPAR-Bench, and STI-Bench. NOTE: We have excluded videos used in VSI-Bench from the training data to prevent data leakage.

Training

git clone https://github.com/OuyangKun10/SpaceR.git
cd SpaceR/SpaceR

# build environment
conda create -n SpaceR python=3.11 
conda activate SpaceR
bash setup.sh

# configure Qwen video-extraction settings (e.g., max frames, resolution)
# and install with the [decord] feature to improve video decoding speed
cd src/qwen-vl-utils
pip install -e .[decord]
cd ../..
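To illustrate what the "max frames" setting controls, here is a minimal, self-contained sketch of uniform video frame sampling under an fps target and a frame cap. This is an illustration only, not the qwen-vl-utils API; the function name and defaults are hypothetical.

```python
# Illustrative sketch (NOT the qwen-vl-utils API): choose which frame
# indices to sample from a video given a target sampling fps and a cap
# on the total number of frames, mirroring the settings mentioned above.

def sample_frame_indices(total_frames: int, video_fps: float,
                         target_fps: float = 2.0, max_frames: int = 64) -> list[int]:
    """Uniformly sample frame indices at ~target_fps, capped at max_frames."""
    duration = total_frames / video_fps
    n = min(max_frames, max(1, round(duration * target_fps)))
    # Spread n indices evenly across [0, total_frames - 1].
    step = (total_frames - 1) / (n - 1) if n > 1 else 0
    return [round(i * step) for i in range(n)]

print(len(sample_frame_indices(300, 30.0)))    # 10 s clip at 2 fps -> 20 frames
print(len(sample_frame_indices(9000, 30.0)))   # 5 min clip hits the 64-frame cap
```

Capping the frame count bounds the number of visual tokens per video, which keeps memory and latency predictable during training and evaluation.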

Data Preparation:

  1. Download the SpaceR-151k dataset.

  2. Decompress it:

bash decompress.sh

Run the training script for SpaceR:

bash ./src/scripts/run_SpaceR_SG_RLVR.sh
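RLVR-style training optimizes against rule-based verifiable rewards rather than a learned reward model. The sketch below shows a generic format-plus-accuracy reward in that spirit; it is a hypothetical illustration, and SpaceR's actual SG-RLVR reward (defined in the training code above) may differ.

```python
import re

# Hypothetical rule-based verifiable rewards in the spirit of RLVR
# training (not SpaceR's exact SG-RLVR reward). Assumes the model is
# prompted to respond in <think>...</think><answer>...</answer> format.

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think/answer template, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted answer matches the ground truth, else 0.0."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip().lower() == ground_truth.strip().lower() else 0.0

completion = "<think>The sofa is left of the table.</think><answer>B</answer>"
print(format_reward(completion) + accuracy_reward(completion, "B"))  # 2.0
```

Because both rewards are computed by deterministic rules, they can verify thousands of rollouts cheaply, which is what makes GRPO-style policy optimization over sampled completions practical.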

Evaluation

SpaceR-Eval

Setup

  1. Environment: Python 3.8+, CUDA-enabled GPUs.
  2. Install Libraries:
    pip install torch pandas numpy pillow accelerate transformers sentencepiece decord flash-attn --no-build-isolation
  3. Datasets: VSI-Bench, STI-Bench, SPAR-Bench, Video-MME, TempCompass, LongVideoBench

Usage

python evaluate.py
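For intuition on the scoring involved, here is a sketch of the two metric types commonly used by these benchmarks (e.g., VSI-Bench): exact match for multiple-choice questions and Mean Relative Accuracy (MRA) for numerical answers, where MRA averages a relative-error indicator over confidence thresholds θ ∈ {0.50, 0.55, …, 0.95}. The function names are illustrative, not necessarily those in evaluate.py.

```python
# Sketch of benchmark scoring: exact match for multiple-choice questions
# and Mean Relative Accuracy (MRA) for numerical-answer questions.
# Function names are hypothetical; evaluate.py may organize this differently.

def mc_accuracy(pred: str, gt: str) -> float:
    """Exact-match accuracy for a single multiple-choice answer."""
    return 1.0 if pred.strip().upper() == gt.strip().upper() else 0.0

def mean_relative_accuracy(pred: float, gt: float) -> float:
    """Average the indicator |pred - gt| / |gt| < 1 - theta over thresholds."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]   # 0.50, 0.55, ..., 0.95
    rel_err = abs(pred - gt) / abs(gt)
    return sum(1.0 for th in thresholds if rel_err < 1 - th) / len(thresholds)

print(mc_accuracy("b", "B"))               # case-insensitive match -> 1.0
print(mean_relative_accuracy(9.0, 10.0))   # rel. error 0.1 passes thresholds up to 0.85
```

MRA rewards numerical predictions in proportion to how close they are to the ground truth, instead of the all-or-nothing scoring of exact match.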

Citation:

@article{ouyang2025spacer,
  title={SpaceR: Reinforcing MLLMs in Video Spatial Reasoning},
  author={Ouyang, Kun and Liu, Yuanxin and Wu, Haoning and Liu, Yi and Zhou, Hao and Zhou, Jie and Meng, Fandong and Sun, Xu},
  journal={arXiv preprint arXiv:2504.01805},
  year={2025}
}

License

  • The code in this repo is released under the CC BY-NC 4.0 License.
  • Use of the SpaceR-151k dataset and SpaceR model weights must strictly follow the CC BY-NC 4.0 License.
