
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning

📖 Paper 🤗 SpaceR 📊 SpaceR-151k

📅 News

🚀 [07/06/2025] SpaceR-Eval now supports more models (e.g., Qwen2.5VL, InternVL, KimiVL, MiniCPM-V, VideoLLaMA3) and benchmarks (e.g., VSI-Bench, STI-Bench, SPAR-Bench, Video-MME, LongVideoBench, TempCompass, Video-Holmes). SpatialScore also supports SpaceR evaluation.

🚀 [06/04/2025] Our SpaceR model achieves 37.28% accuracy on VGBench and 53.72% accuracy on SpatialScore, state-of-the-art performance among all 7B/8B models to date.

🚀 [05/29/2025] Our SpaceR achieves 35.2% accuracy on the new video reasoning benchmark Video-Holmes, beating the commercial models o4-mini (29.9%) and Gemini-2.0-Flash (30.6%).

🚀 [05/19/2025] We release the SpaceR-151k dataset.

🚀 [05/10/2025] We release the SpaceR checkpoint.

🚀 [04/29/2025] We release the SR-91k dataset.

🚀 [04/10/2025] We update the training framework of SpaceR.

🚀 [04/02/2025] We share the SpaceR paper on arXiv.

🚀 [03/31/2025] We release the evaluation and training code.

SpaceR

The first MLLM empowered by SG-RLVR for video spatial reasoning

🏆 Performance Comparison

Data Statistics of SpaceR-151k

QA Examples of SR-91k

We curate the SpaceR-151k dataset and propose SpaceR, which achieves promising gains on VSI-Bench, SPAR-Bench, and STI-Bench. NOTE: We have excluded videos used in VSI-Bench from the training data to prevent data leakage.

Training

git clone https://github.com/OuyangKun10/SpaceR.git
cd SpaceR/SpaceR

# build environment
conda create -n SpaceR python=3.11 
conda activate SpaceR
bash setup.sh

# configure Qwen video-extraction settings (e.g., max frames, resolution)
# and install with the [decord] feature to improve video decoding speed
cd src/qwen-vl-utils
pip install -e .[decord]
cd ../..
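To illustrate what the "max frames" setting controls, here is a minimal, self-contained sketch of uniform video frame sampling under an fps target and a frame cap. This is an illustration only, not the qwen-vl-utils API; the function name and defaults are hypothetical.

```python
# Illustrative sketch (NOT the qwen-vl-utils API): choose which frame
# indices to sample from a video given a target sampling fps and a cap
# on the total number of frames, mirroring the settings mentioned above.

def sample_frame_indices(total_frames: int, video_fps: float,
                         target_fps: float = 2.0, max_frames: int = 64) -> list[int]:
    """Uniformly sample frame indices at ~target_fps, capped at max_frames."""
    duration = total_frames / video_fps
    n = min(max_frames, max(1, round(duration * target_fps)))
    # Spread n indices evenly across [0, total_frames - 1].
    step = (total_frames - 1) / (n - 1) if n > 1 else 0
    return [round(i * step) for i in range(n)]

print(len(sample_frame_indices(300, 30.0)))    # 10 s clip at 2 fps -> 20 frames
print(len(sample_frame_indices(9000, 30.0)))   # 5 min clip hits the 64-frame cap
```

Capping the frame count bounds the number of visual tokens per video, which keeps memory and latency predictable during training and evaluation.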

Data Preparation:

  1. Download the SpaceR-151k dataset.

  2. Decompress it:

bash decompress.sh

Run the training script for SpaceR:

bash ./src/scripts/run_SpaceR_SG_RLVR.sh
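RLVR-style training optimizes against rule-based verifiable rewards rather than a learned reward model. The sketch below shows a generic format-plus-accuracy reward in that spirit; it is a hypothetical illustration, and SpaceR's actual SG-RLVR reward (defined in the training code above) may differ.

```python
import re

# Hypothetical rule-based verifiable rewards in the spirit of RLVR
# training (not SpaceR's exact SG-RLVR reward). Assumes the model is
# prompted to respond in <think>...</think><answer>...</answer> format.

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think/answer template, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted answer matches the ground truth, else 0.0."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip().lower() == ground_truth.strip().lower() else 0.0

completion = "<think>The sofa is left of the table.</think><answer>B</answer>"
print(format_reward(completion) + accuracy_reward(completion, "B"))  # 2.0
```

Because both rewards are computed by deterministic rules, they can verify thousands of rollouts cheaply, which is what makes GRPO-style policy optimization over sampled completions practical.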

Evaluation

SpaceR-Eval

Setup

  1. Environment: Python 3.8+, CUDA-enabled GPUs.
  2. Install Libraries:
    pip install torch pandas numpy pillow accelerate transformers sentencepiece decord flash-attn --no-build-isolation
  3. Datasets: VSI-Bench, STI-Bench, SPAR-Bench, Video-MME, TempCompass, LongVideoBench

Usage

python evaluate.py
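For intuition on the scoring involved, here is a sketch of the two metric types commonly used by these benchmarks (e.g., VSI-Bench): exact match for multiple-choice questions and Mean Relative Accuracy (MRA) for numerical answers, where MRA averages a relative-error indicator over confidence thresholds θ ∈ {0.50, 0.55, …, 0.95}. The function names are illustrative, not necessarily those in evaluate.py.

```python
# Sketch of benchmark scoring: exact match for multiple-choice questions
# and Mean Relative Accuracy (MRA) for numerical-answer questions.
# Function names are hypothetical; evaluate.py may organize this differently.

def mc_accuracy(pred: str, gt: str) -> float:
    """Exact-match accuracy for a single multiple-choice answer."""
    return 1.0 if pred.strip().upper() == gt.strip().upper() else 0.0

def mean_relative_accuracy(pred: float, gt: float) -> float:
    """Average the indicator |pred - gt| / |gt| < 1 - theta over thresholds."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]   # 0.50, 0.55, ..., 0.95
    rel_err = abs(pred - gt) / abs(gt)
    return sum(1.0 for th in thresholds if rel_err < 1 - th) / len(thresholds)

print(mc_accuracy("b", "B"))               # case-insensitive match -> 1.0
print(mean_relative_accuracy(9.0, 10.0))   # rel. error 0.1 passes thresholds up to 0.85
```

MRA rewards numerical predictions in proportion to how close they are to the ground truth, instead of the all-or-nothing scoring of exact match.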

Citation:

@article{ouyang2025spacer,
  title={SpaceR: Reinforcing MLLMs in Video Spatial Reasoning},
  author={Ouyang, Kun and Liu, Yuanxin and Wu, Haoning and Liu, Yi and Zhou, Hao and Zhou, Jie and Meng, Fandong and Sun, Xu},
  journal={arXiv preprint arXiv:2504.01805},
  year={2025}
}

License

  • The code in this repo is released under the CC BY-NC 4.0 License.
  • Use of the SpaceR-151k dataset and SpaceR model weights must strictly follow the CC BY-NC 4.0 License.
