A shortcut-aware benchmark for spatio-temporal and intuitive physics video understanding (VideoQA) using minimally different video pairs.
To enable reproducible evaluation, we use the lmms-eval library, which is included as a submodule. Clone this repo with the `--recurse-submodules` flag so that the required submodules are set up automatically. Alternatively, run `git submodule update --init` manually after cloning.
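For example (a sketch; replace `<repository-url>` with this repository's clone URL):

```bash
# Clone the repository together with the lmms-eval submodule
git clone --recurse-submodules <repository-url>

# If you already cloned without the flag, fetch the submodule afterwards
git submodule update --init
```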
Next, navigate to the root directory of the repository and create your conda environment:
make .env.init
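Then activate the environment before running the commands below; the environment name is defined by the Makefile, so `<env_name>` here is a placeholder:

```bash
# Activate the conda environment created by `make .env.init`
conda activate <env_name>
```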
The annotations are released at `facebook/minimal_video_pairs` on Huggingface Datasets. We provide scripts for downloading the videos in the Makefile of this repository; make sure to accept the data license requirements for each data source before attempting to download. Next, log into Huggingface (this is needed to download the Vinoground subset):
huggingface-cli login
Now you can download all videos from their original data sources:
make download_videos
This will create a `videos` folder with 9 subfolders, one per data source, which are used to create the following subsets:
| Subset | Data sources |
|---|---|
| Human object interactions | PerceptionTest, SomethingSomethingV2 |
| Robot object interactions | Language Table |
| Intuitive physics and collisions | IntPhys, InfLevel, GRASP, CLEVRER |
| Temporal reasoning | STAR, Vinoground |
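If you want to inspect the annotation files locally, you can also pull them directly from the Hub; this is an optional sketch using the Hugging Face CLI with an arbitrary local directory:

```bash
# Optional: download the annotation files for local inspection
huggingface-cli download facebook/minimal_video_pairs --repo-type dataset --local-dir ./annotations
```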
As previously mentioned, we utilize the lmms-eval library to enable reproducible evaluation.
We provide the task files needed to run `mvp` and `mvp_mini`; `mvp_mini` is a smaller, balanced evaluation set of 9k examples that enables faster evaluations.
To run the evals:
- Copy the task files from the `tasks/mvp` folder to `lmms-eval/lmms_eval/tasks/`
- Ensure the videos are downloaded into the `videos` folder in the root of this repository
- Run the evaluations with the task names `mvp` and `mvp_mini` (see the example invocation below). You can also run individual subsets.
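As a sketch, an evaluation run with lmms-eval typically looks like the following; `<model>` and `<model_args>` are placeholders for the model backend and arguments you want to evaluate (see the lmms-eval documentation for supported values):

```bash
# Sketch of an lmms-eval run on the mvp_mini task; <model> and <model_args>
# are placeholders for the model backend and its arguments.
accelerate launch -m lmms_eval \
    --model <model> \
    --model_args <model_args> \
    --tasks mvp_mini \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```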
We primarily report `paired_accuracy`. An example in `mvp` consists of two QA instances with an identical question and answer options (A or B), but the videos differ and so do the correct options (A is correct for video 1, B for video 2). For `paired_accuracy`, a model scores a point (+1) only if it answers both questions of a pair correctly. For example, a model that answers three of four individual questions correctly, but both questions of only one of the two pairs, gets a paired accuracy of 1/2 even though its per-question accuracy is 3/4.
We have set up a leaderboard on Huggingface as part of FAIR's Physical World Models release: the Physical Reasoning Leaderboard. To submit your model's results to the leaderboard, combine the `mvp_[mini]_{task}.jsonl` files in the `./logs/{model}` folder and upload them along with the specifics of your run:
cat submissions/mvp_*.jsonl > mvp_submission.jsonl
We are grateful to the creators of the many open-source datasets on which our benchmark is built: Perception Test, Something Something v2, CLEVRER, Language Table, IntPhys, InfLevel, GRASP, STAR, Vinoground.
If you find this repository useful in your research, please consider giving it a star ⭐ and a citation, and make sure to also cite the original video data sources referenced above:
@article{krojer2025shortcut,
title={A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs},
author={Benno Krojer and Mojtaba Komeili and Candace Ross and Quentin Garrido and Koustuv Sinha and Nicolas Ballas and Mahmoud Assran},
journal={arXiv},
year={2025}
}

We release this benchmark under the license found in the LICENSE file in the root directory of this source tree.