📄 Paper | 🤗 Checkpoints | 📜 License
- Disentangled Visual Foresight automatically captures the latent actions that delineate the visual trajectory, without overburdening the backbone.
- Progressive Training introduces modalities in stages, preserving the language understanding and reasoning capabilities of the VLM backbone.
- Adaptive Temporal Ensemble dynamically adjusts temporal ensembling strength, reducing inference cost while maintaining stable control.
More demos coming soon...
| Put the cup on the female singer | Put the cup on the Marvel superhero | Put the watch in the basket |
|---|---|---|
| Put the cup on Taylor Swift | Put the cup on Iron Man | Put a thing that can tell the time in the basket |
| Model | Note |
|---|---|
| Mantis-Base | Base Mantis model trained through the 3-stage pretraining pipeline |
| Mantis-SSV2 | Mantis model pretrained on the SSV2 dataset after Stage 1 |
| Mantis-LIBERO | Mantis model fine-tuned on the LIBERO dataset |
| Dataset | Note |
|---|---|
| Something-Something-v2 | The human action video dataset used in Stage 1 pretraining |
| DROID-Lerobot | The robot dataset used in Stage 2 & 3 pretraining |
| LLaVA-OneVision-1.5-Instruct-Data | The multimodal dataset used in Stage 3 pretraining |
| LIBERO-Lerobot | The LIBERO dataset used for fine-tuning |
First, clone the repository and create the conda environment:
```sh
git clone git@github.com:Yysrc/Mantis.git
cd Mantis
conda env create -f configs/environment_libero.yml
conda activate mantis_libero
```
Then clone and install the LIBERO repository:
```sh
git clone git@github.com:Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .
```
Install the other required packages:
```sh
cd ..
pip install -r experiments/libero/libero_requirements.txt
```
Evaluate on the LIBERO benchmark:
```sh
sh experiments/libero/run_libero_eval.sh
```
Modify the `task_suite_name` parameter in the script to evaluate different task suites, and adjust the `eval_mode` parameter to switch between evaluation modes.
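For illustration, the snippet below shows the kind of edit meant here; the exact variable syntax is defined in `run_libero_eval.sh` itself, and the suite identifiers shown are the standard LIBERO ones (treat them as assumptions if your copy differs):

```sh
# Inside experiments/libero/run_libero_eval.sh (illustrative; check the script for the exact syntax)
task_suite_name=libero_spatial   # standard LIBERO suites: libero_spatial, libero_object, libero_goal, libero_10
eval_mode=...                    # choose one of the modes defined in the script
```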
Please first download the LIBERO datasets and the base Mantis model.
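The checkpoints and datasets are linked from the tables above. If you fetch them from the Hugging Face Hub, `huggingface-cli` can be used; the repository IDs below are placeholders, so substitute the actual identifiers from the links:

```sh
# The repo IDs below are placeholders; use the actual identifiers linked in the tables above.
huggingface-cli download YOUR_ORG/Mantis-Base --local-dir checkpoints/Mantis-Base
huggingface-cli download YOUR_ORG/LIBERO-Lerobot --repo-type dataset --local-dir data/LIBERO-Lerobot
```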
First, create the training conda environment:
```sh
conda env create -f configs/environment_lerobot.yml
conda activate mantis_lerobot
```
Then clone and install the Lerobot repository:
```sh
git clone -b paszea/lerobot git@github.com:Yysrc/lerobot.git
cd lerobot
conda install ffmpeg=7.1.1 -c conda-forge
pip install -e .
```
The configuration files are in the `configs` folder. Please update `dataset_root_dir` to point to the LIBERO dataset directory and set `resume_from_checkpoint` to the path of the base Mantis model.
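As a quick sketch (the paths below are placeholders, not the actual config syntax), you can verify that the two directories you wrote into the config exist before launching training:

```sh
# Placeholder paths: these should match what you set for dataset_root_dir and resume_from_checkpoint.
ls /path/to/LIBERO-Lerobot   # dataset_root_dir
ls /path/to/Mantis-Base      # resume_from_checkpoint
```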
Train the Mantis model on the LIBERO dataset:
```sh
sh train.sh
```
Heartfelt thanks to the creators of Metaquery and Lerobot for their open-source work!
If you find our code or models useful in your work, please cite our paper:
```bibtex
@article{yang2025mantis,
  title={Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight},
  author={Yang, Yi and Li, Xueqi and Chen, Yiyang and Song, Jin and Wang, Yihan and Xiao, Zipeng and Su, Jiadi and Qiaoben, You and Liu, Pengfei and Deng, Zhijie},
  journal={arXiv preprint arXiv:2511.16175},
  year={2025}
}
```