📖 Paper 🤗 SpaceR 📊 SpaceR-151k
📅 News
🚀 [07/06/2025] SpaceR-Eval now supports more models (e.g., Qwen2.5VL, InternVL, KimiVL, MiniCPM-V, VideoLLaMA3) and benchmarks (e.g., VSI-Bench, STI-Bench, SPAR-Bench, Video-MME, LongVideoBench, TempCompass, Video-Holmes). SpatialScore also supports SpaceR evaluation.
🚀 [06/04/2025] Our SpaceR model achieves 37.28% accuracy on VGBench and 53.72% on SpatialScore, the state of the art among all 7B/8B models to date.
🚀 [05/29/2025] Our SpaceR achieves 35.2% accuracy on the new video reasoning benchmark Video-Holmes, beating the commercial models o4-mini (29.9%) and Gemini-2.0-Flash (30.6%).
🚀 [05/19/2025] We release the SpaceR-151k dataset.
🚀 [05/10/2025] We release the SpaceR checkpoint.
🚀 [04/29/2025] We release the SR-91k dataset.
🚀 [04/10/2025] We update the training framework of SpaceR.
🚀 [04/02/2025] We share the SpaceR paper on arXiv.
🚀 [03/31/2025] We release evaluation and training code.
The first MLLM empowered by SG-RLVR for video spatial reasoning
Data Statistics of SpaceR-151k
QA Examples of SR-91k
We curate the SpaceR-151k dataset and propose SpaceR, which achieves promising gains on VSI-Bench, SPAR-Bench, and STI-Bench. NOTE: We have excluded videos used in VSI-Bench to prevent data leakage.
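To make the leakage note concrete, here is a minimal sketch of this kind of overlap filter; the file names and the "video" field are illustrative assumptions, not the actual curation code:
# Hypothetical leakage filter: drop training samples whose source video
# also appears in VSI-Bench. File names and fields are assumptions.
import json
with open("vsi_bench_video_ids.txt") as f:
    vsi_ids = {line.strip() for line in f}
with open("spacer_151k.jsonl") as f:
    samples = [json.loads(line) for line in f]
kept = [s for s in samples if s["video"] not in vsi_ids]
print(f"kept {len(kept)}/{len(samples)} samples after removing VSI-Bench overlap")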
git clone https://github.com/OuyangKun10/SpaceR.git
cd SpaceR/SpaceR
# build environment
conda create -n SpaceR python=3.11
conda activate SpaceR
bash setup.sh
# Qwen video extraction settings (e.g., max frames, resolution)
# Install with the [decord] extra for faster video decoding
cd src/qwen-vl-utils
pip install -e .[decord]
cd ..
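After installing qwen-vl-utils, frame sampling and resolution can be controlled per video message. A minimal sketch following the qwen-vl-utils message format (the video path and prompt are placeholders; sensible limits depend on your GPU memory):
# Minimal sketch: controlling video extraction via qwen-vl-utils.
from qwen_vl_utils import process_vision_info

messages = [{
    "role": "user",
    "content": [
        {
            "type": "video",
            "video": "file:///path/to/video.mp4",  # placeholder path
            "max_pixels": 360 * 420,  # cap per-frame resolution
            "fps": 1.0,  # sampling rate; use "nframes" to fix the frame count instead
        },
        {"type": "text", "text": "Describe the spatial layout of the room."},
    ],
}]

# Returns preprocessed image/video inputs ready for the Qwen2.5-VL processor.
image_inputs, video_inputs = process_vision_info(messages)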
Data Preparation:
- Download the SpaceR-151k dataset (see the download sketch below).
- Decompress it:
bash decompress.sh
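If you fetch the dataset from the Hugging Face Hub, a minimal download sketch (the repo id below is a placeholder; substitute the actual SpaceR-151k dataset path):
# Minimal sketch: fetch SpaceR-151k from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="<org>/SpaceR-151k",  # placeholder repo id
    repo_type="dataset",
    local_dir="./SpaceR-151k",
)
print(f"downloaded to {local_dir}")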
Training script for SpaceR
bash ./src/scripts/run_SpaceR_SG_RLVR.sh
SpaceR-Eval
Setup
- Environment: Python 3.8+, CUDA-enabled GPUs.
- Install Libraries (a checkpoint-loading sanity check is sketched after this list):
pip install torch pandas numpy pillow accelerate transformers sentencepiece decord flash-attn --no-build-isolation
- Datasets: VSI-Bench, STI-Bench, SPAR-Bench, Video-MME, TempCompass, LongVideoBench
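Before running the full evaluation, you can sanity-check that the libraries and checkpoint load correctly. A minimal sketch, assuming SpaceR loads through the Qwen2.5-VL classes of its base model (the model id is a placeholder):
# Minimal sanity check: load the SpaceR checkpoint with transformers.
# The model id is a placeholder; loading via the Qwen2.5-VL classes is an
# assumption based on SpaceR's base model.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "<org>/SpaceR"  # placeholder checkpoint path
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
print("model and processor loaded")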
Usage
python evaluate.py
Citation:
@article{ouyang2025spacer,
  title={SpaceR: Reinforcing MLLMs in Video Spatial Reasoning},
  author={Ouyang, Kun and Liu, Yuanxin and Wu, Haoning and Liu, Yi and Zhou, Hao and Zhou, Jie and Meng, Fandong and Sun, Xu},
  journal={arXiv preprint arXiv:2504.01805},
  year={2025}
}
- The code in this repo is released under the CC BY-NC 4.0 License.
- The usage of the SpaceR-151k dataset and SpaceR model weights must strictly follow the CC BY-NC 4.0 License.