
LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

Xueyang Zhou1, Yangming Xu1, Guiyao Tie1, Yongchao Chen2,3, Guowen Zhang1, Duanfeng Chu4, Pan Zhou1, Lichao Sun5

Affiliations: 1Huazhong University of Science and Technology, 2Harvard University, 3Massachusetts Institute of Technology, 4Wuhan University of Technology, 5Lehigh University

[Paper] [Webpage] [Code]



We propose LIBERO-PRO, a plug-and-play benchmark built on LIBERO that provides a more comprehensive and flexible environment for assessing the generalization capabilities of models. LIBERO-PRO enables holistic assessment of robotic capabilities along five core generalization dimensions, with rational combinatorial evaluation rules to ensure meaningful analysis:

  • Object Perturbation: A new asset library for LIBERO’s four original task suites, created by modifying object appearance, size, and color, to test adaptation to object variations.
  • Position Perturbation: Alternative spatial regions for manipulable objects (aligned with physical constraints and task definitions) to evaluate the model’s ability to handle position changes.
  • Semantic Perturbation: Three paraphrased variants per task instruction to verify accuracy in understanding natural-language semantic variations.
  • Task Perturbation: Redesigned feasible task logics, with new object sets and target states, to examine adaptation to task-paradigm changes.
  • Environment Perturbation: Random cross-task substitution of LIBERO’s five built-in environments to test robustness across scenarios.
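
Each of these dimensions is toggled by a flag in evaluation_config.yaml (documented in the LIBERO-PRO Evaluation section below). As a quick reference, the sketch below maps the dimensions to their flags; the dictionary itself is illustrative and not part of the codebase.

# Illustrative mapping from LIBERO-PRO's five perturbation dimensions to the
# flags in evaluation_config.yaml (this dictionary is not part of the codebase).
PERTURBATION_FLAGS = {
    "object": "use_object",            # object appearance, size, and color changes
    "position": "use_swap",            # alternative spatial regions for objects
    "semantic": "use_language",        # paraphrased task instructions
    "task": "use_task",                # redesigned task logic (cannot be combined with others)
    "environment": "use_environment",  # cross-task environment substitution
}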

We do not intend to criticize or compare any specific VLA architectures. Instead, our goal is to call on the community to adopt more challenging and fair evaluation standards that can better promote genuine generalization and understanding in VLA models.

⚠️ Performance collapse

(Results figure: success rates of openvla, pi0, pi0.5, and univla on LIBERO-Goal, LIBERO-Spatial, LIBERO-10, and LIBERO-Object, each paired with its LIBERO-Pro counterpart.)

🟦 Original 🟩 Position perturbation 🟧 Task perturbation
📉 All models collapse from >0.9 to ≈0.0 under the LIBERO-Pro perturbations.

You are welcome to join our WeChat discussion group; we will answer questions in real time and also welcome in-depth academic discussion.


Contents

Installation

Clone the official LIBERO-PRO repository by running:

git clone https://github.com/Zxy-MLlab/LIBERO-PRO/

LIBERO-PRO is built on the original LIBERO benchmark and uses the same runtime environment, so no separate environment configuration is needed for LIBERO-PRO. You only need to install the environment according to LIBERO's official requirements, as shown below:

Please run the following commands in the given order to install the dependencies for LIBERO.

conda create -n libero python=3.8.13
conda activate libero
git clone https://github.com/Zxy-MLlab/LIBERO-PRO/LIBERO.git
cd LIBERO
pip install -r requirements.txt
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113

Then install the libero package:

pip install -e .
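
To verify the installation, you can check that the libero package imports and lists its task suites. This is a minimal sketch assuming LIBERO's standard package layout, where libero.libero.benchmark exposes get_benchmark_dict():

# Quick post-install sanity check (run inside the "libero" conda environment)
from libero.libero import benchmark

benchmark_dict = benchmark.get_benchmark_dict()
print(sorted(benchmark_dict.keys()))  # should include libero_spatial, libero_object, libero_goal, ...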

Datasets

We provide high-quality human teleoperation demonstrations for the four task suites in LIBERO. To download the demonstration dataset, run:

python benchmark_scripts/download_libero_datasets.py

By default, the dataset will be stored under the LIBERO folder and all four datasets will be downloaded. To download a specific dataset, use

python benchmark_scripts/download_libero_datasets.py --datasets DATASET

where DATASET is chosen from [libero_spatial, libero_object, libero_100, libero_goal].

NEW!!!

Alternatively, you can download the dataset from HuggingFace by using:

python benchmark_scripts/download_libero_datasets.py --use-huggingface

This option can also be combined with the specific dataset selection:

python benchmark_scripts/download_libero_datasets.py --datasets DATASET --use-huggingface

The datasets hosted on HuggingFace are available here.
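
Once downloaded, you can inspect a demonstration file to confirm it loads correctly. The sketch below assumes the robomimic-style HDF5 layout used by LIBERO (a top-level data group with one demo_* group per trajectory, holding actions and obs); the file path is a placeholder to replace with an actual file from the downloaded suites.

# Minimal sketch for inspecting a downloaded demonstration file
import h5py

demo_path = "path/to/some_task_demo.hdf5"  # placeholder: point this at a real file
with h5py.File(demo_path, "r") as f:
    demos = sorted(f["data"].keys())
    print(f"{len(demos)} demonstrations")
    first = f["data"][demos[0]]
    print("actions shape:", first["actions"].shape)
    print("observation keys:", list(first["obs"].keys()))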

LIBERO-PRO Evaluation

To specify single-type or combined-type generalization evaluation, you only need to modify the evaluation_config.yaml configuration file in the project directory. The core configuration parameters and their functions are as follows:

Before evaluation, set the path in evaluation_config.yaml to the absolute path of your project. Then adjust the boolean values (true/false) of the following parameters to enable or disable specific generalization evaluation types:

Parameter          Function
use_environment    Enable (true) or disable (false) environment generalization evaluation
use_swap           Enable (true) or disable (false) position generalization evaluation
use_object         Enable (true) or disable (false) object generalization evaluation
use_language       Enable (true) or disable (false) semantic (language) generalization evaluation
use_task           Enable (true) or disable (false) task generalization evaluation

Note: to avoid meaningless evaluation results, task generalization (use_task: true) cannot be combined with any other generalization types.
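
The snippet below is a small sketch (not part of the codebase) that loads evaluation_config.yaml, prints which generalization types are enabled, and enforces the rule above; the flag names follow the table.

# Check which generalization types are enabled and enforce the use_task rule
import yaml

with open("evaluation_config.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

flags = ["use_environment", "use_swap", "use_object", "use_language", "use_task"]
enabled = [k for k in flags if cfg.get(k, False)]
print("Enabled perturbations:", enabled)

if cfg.get("use_task", False) and len(enabled) > 1:
    raise ValueError("use_task cannot be combined with any other generalization type")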

Below is a reference setup for running LIBERO-PRO generalization evaluation with OpenVLA. First, place LIBERO-PRO in the following directory:

# 📁 openvla-oft-main
.
├── .idea/
├── experiments/
│   └── robot/
│       ├── aloha/
│       └── libero/
│           ├── experiments/
│           ├── LIBERO-PRO/ 
│           ├── libero_utils.py
│           ├── regenerate_libero_dataset.py
│           ├── run_libero_eval.py
│           ├── sample_libero_spatial_observation.pkl
│           ├── openvla_utils.py
│           └── robot_utils.py

Before evaluating, modify the run_libero_eval.py code to adapt it to LIBERO-PRO:

# NOTE: the LIBERO-PRO folder name contains a hyphen, so it cannot be imported
# with a plain "from LIBERO-PRO import perturbation"; load it via importlib
# (assumes LIBERO-PRO/ sits next to run_libero_eval.py, as in the layout above).
import importlib
perturbation = importlib.import_module("LIBERO-PRO.perturbation")

# Register the temporary evaluation task suites
class TaskSuite(str, Enum):
  ...
  LIBERO_GOAL_TEMP = "libero_goal_temp"
  LIBERO_SPATIAL_TEMP = "libero_spatial_temp"
  LIBERO_10_TEMP = "libero_10_temp"
  LIBERO_OBJECT_TEMP = "libero_object_temp"

TASK_MAX_STEPS = {
  ...
  TaskSuite.LIBERO_GOAL_TEMP: 300,
  TaskSuite.LIBERO_SPATIAL_TEMP: 220,
  TaskSuite.LIBERO_10_TEMP: 520,
  TaskSuite.LIBERO_OBJECT_TEMP: 280,
}

# Modify check_unnorm_key: take the un-normalization key from the config,
# since the task suite name is given a "_temp" suffix below
def check_unnorm_key(cfg: GenerateConfig, model) -> None:
  ...
  unnorm_key = cfg.unnorm_key
  ...

# Modify eval_libero: load the LIBERO-PRO evaluation config, generate the
# perturbed task suite if it does not exist yet, and switch to the "_temp" suite
def eval_libero(cfg: GenerateConfig) -> float:
  ...
  with open(cfg.evaluation_config_path, "r", encoding="utf-8") as f:
    evaluation_cfg = yaml.safe_load(f)
  
  evaluation_cfg["bddl_files_path"] = evaluation_cfg.get("bddl_files_path", "") + "/" + cfg.task_suite_name
  evaluation_cfg["task_suite_name"] = cfg.task_suite_name
  
  if not os.path.exists(evaluation_cfg.get("init_file_dir", "") + cfg.task_suite_name + "_temp/"):
    perturbation.create_env(
      configs=evaluation_cfg,
    )
  
  cfg.task_suite_name = cfg.task_suite_name + "_temp"
  ...

Note!!!

For reasons we have not yet identified, replacing the environment sometimes causes the objects on the table to move randomly. After many tests, replacing the environment with 'main_table' works reliably, and we are actively in contact with the LIBERO authors to fix this issue.

Citation

If you use LIBERO-PRO in your research, please cite both the original LIBERO benchmark (as LIBERO-PRO is fully built upon it) and the LIBERO-PRO paper:

Cite LIBERO

@article{liu2023libero,
  title={LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning},
  author={Liu, Bo and Zhu, Yifeng and Gao, Chongkai and Feng, Yihao and Liu, Qiang and Zhu, Yuke and Stone, Peter},
  journal={arXiv preprint arXiv:2306.03310},
  year={2023}
}

Cite LIBERO-PRO

@article{2025liberpro,
  title={LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization},
  author={Xueyang Zhou and Yangming Xu and Guiyao Tie and Yongchao Chen and Guowen Zhang and Duanfeng Chu and Pan Zhou and Lichao Sun},
  journal={arXiv preprint arXiv:2510.03827},
  year={2025}
}

License

Component    License
Codebase     MIT License
Datasets     Creative Commons Attribution 4.0 International (CC BY 4.0)
