An elegant and researcher-friendly RL library for Vision-Language-Action (VLA) models.
- Simple and clear implementation: cleanly separated policy, rollout, runner, and model layers with minimal abstraction; easy to read, modify, and extend for research purposes
- Dependency-decoupled architecture: model backends use separate uv projects, while benchmark environments run as independent ZMQ processes; this keeps base-model and simulator dependency conflicts out of the core library
- Async off-policy training: data collection runs asynchronously alongside model updates, so neither blocks the other
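The asynchronous off-policy pattern can be illustrated with a minimal, standard-library-only sketch. This is not VLARLKit's implementation: `collector`, `train`, the queue sizes, and the placeholder transitions are all hypothetical names and values chosen for illustration. It only shows the idea that a background worker keeps producing data while the learner drains it into a replay buffer and updates without waiting for a full rollout.

```python
import queue
import random
import threading
from collections import deque


def collector(transitions, stop_event):
    """Hypothetical rollout worker: keeps producing placeholder transitions."""
    step = 0
    while not stop_event.is_set():
        try:
            # (observation id, reward) stands in for a real transition.
            transitions.put((step, random.random()), timeout=0.01)
            step += 1
        except queue.Full:
            continue  # learner is behind; re-check the stop flag and retry


def train(num_updates=20, batch_size=4, buffer_size=256):
    """Hypothetical learner loop: samples from a replay buffer while data arrives."""
    transitions = queue.Queue(maxsize=64)
    replay = deque(maxlen=buffer_size)
    stop_event = threading.Event()
    worker = threading.Thread(
        target=collector, args=(transitions, stop_event), daemon=True
    )
    worker.start()

    updates = 0
    try:
        while updates < num_updates:
            # Drain everything collected so far without blocking.
            while True:
                try:
                    replay.append(transitions.get_nowait())
                except queue.Empty:
                    break
            if len(replay) < batch_size:
                # Not enough data yet: wait briefly for one more transition.
                try:
                    replay.append(transitions.get(timeout=0.1))
                except queue.Empty:
                    pass
                continue
            batch = random.sample(list(replay), batch_size)
            assert len(batch) == batch_size  # a gradient step would go here
            updates += 1
    finally:
        stop_event.set()
        worker.join(timeout=1.0)
    return updates, len(replay)
```

Calling `train()` returns the number of completed updates and the final replay-buffer size. In a real setup the collector would be a set of rollout workers and the update step a policy or critic gradient step.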
| Category | Type | Supported |
|---|---|---|
| RL Algorithms | On-policy RL | PPO, GRPO |
| RL Algorithms | Off-policy RL | DSRL, RLT |
| RL Algorithms | Model-based RL | VLA-MBPO |
| Base Models | Flow-based VLA | π₀.₅ |
| Base Models | Autoregressive VLA | OpenVLA-OFT |
| Benchmarks | Simulation | LIBERO, ManiSkill, RoboTwin |
We use uv to manage Python dependencies. See the uv installation instructions to set it up. Once uv is installed, run the following to set up the environment:
git clone https://github.com/VLARLKit/VLARLKit.git
cd VLARLKit
uv sync
uv pip install -e .

The core package intentionally does not depend on any base-model repository. Install the model backend you need for each experiment.
Each model backend runs from its own uv project with its own dependencies. This
keeps base-model repositories separate from the core package while still
training the model in the main torchrun process.
Install and prepare the backend you need:
# OpenPI
uv sync --project model_backends/openpi
uv run --project model_backends/openpi \
bash model_backends/openpi/scripts/apply_transformers_patch.sh
# OpenVLA-OFT
uv sync --project model_backends/openvla_oft
uv run --project model_backends/openvla_oft \
bash model_backends/openvla_oft/scripts/install_flash_attn.sh

The environment client runs in a separate Python environment with its own dependencies. This avoids dependency conflicts between the simulator and the training stack.
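The sketch below illustrates the process-separation idea, assuming nothing about VLARLKit's actual interfaces. A toy environment runs in its own Python interpreter and the trainer talks to it through a request/reply protocol. To stay dependency-free it uses JSON lines over a subprocess pipe as a stand-in for the ZMQ sockets the library uses. `ENV_SERVER`, `EnvClient`, and the message fields are hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical toy environment server; it stands in for a simulator process.
ENV_SERVER = r"""
import json, sys
state = 0
for line in sys.stdin:
    req = json.loads(line)
    if req["cmd"] == "reset":
        state = 0
        reply = {"obs": state}
    elif req["cmd"] == "step":
        state += req["action"]
        reply = {"obs": state, "reward": float(state), "done": state >= 3}
    else:  # "close"
        break
    print(json.dumps(reply), flush=True)
"""


class EnvClient:
    """Talks to an environment that runs in another Python interpreter."""

    def __init__(self, python_executable=sys.executable):
        # python_executable could point at a different virtual environment,
        # so simulator dependencies never enter the training process.
        self.proc = subprocess.Popen(
            [python_executable, "-c", ENV_SERVER],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def _request(self, payload):
        # One JSON request per line, one JSON reply per line.
        self.proc.stdin.write(json.dumps(payload) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def reset(self):
        return self._request({"cmd": "reset"})

    def step(self, action):
        return self._request({"cmd": "step", "action": action})

    def close(self):
        self.proc.stdin.write(json.dumps({"cmd": "close"}) + "\n")
        self.proc.stdin.flush()
        self.proc.wait(timeout=5)


def run_episode():
    """Runs one episode with a constant action and returns the total reward."""
    env = EnvClient()
    try:
        env.reset()
        total, done = 0.0, False
        while not done:
            out = env.step(1)
            total += out["reward"]
            done = out["done"]
        return total
    finally:
        env.close()
```

Because the client only exchanges serialized messages, the simulator side can pin whatever package versions it needs without affecting the training stack.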
Install scripts for each benchmark are located in the third_party/ directory. Run the one you need:
# LIBERO
bash third_party/install_libero.sh
# ManiSkill
bash third_party/install_maniskill.sh
# RoboTwin
bash third_party/new_install_robotwin.sh

RL is typically run on top of an SFT model, so you need to download one first. We highly recommend using models from the RLinf community. For the full benchmark-by-benchmark SFT and RL setup, see SFT Checkpoints and RL Settings.
hf download RLinf/RLinf-Pi05-LIBERO-SFT --local-dir <your local path>
# For ManiSkill SFT model:
# hf download RLinf/RLinf-Pi05-ManiSkill-25Main-SFT --local-dir <your local path>
# For RoboTwin SFT model:
# hf download RLinf/RLinf-Pi05-RoboTwin-SFT-adjust_bottle --local-dir <your local path>

Then set model_path in the config file (examples/configs/libero_spatial_ppo_pi05.yaml) to the path of the downloaded model.
For example:
model:
  model_path: "<your download path>/RLinf-Pi05-LIBERO-SFT"

Now you can launch the training script:
bash examples/run_onpolicy_rl.sh

To try our MBRL method (VLA-MBPO), follow BAGEL-WM to set up the environments and artifacts.
- Add ManiSkill benchmark support
- Add RoboTwin benchmark support
- Add GRPO algorithm support
- Add off-policy asynchronous training support
- Add OpenVLA-OFT base model support
- Add offline and model-based VLA methods support
We borrow some good designs from RLinf. The model integration and environment module implementations are primarily adapted from RLinf. We thank the RLinf team for their foundational work.
This project is licensed under the MIT License (see LICENSE file).
Some source files are derived from Apache-2.0 licensed projects. The original copyright notices are preserved in those files.
If you find VLARLKit useful in your research, please consider citing it:
@misc{vlarlkit2026,
title = {VLARLKit: An Elegant PyTorch VLA-RL Library},
author = {Yihao Sun},
year = {2026},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {https://github.com/VLARLKit/VLARLKit}
}