Contains a robotics-friendly implementation of the REPPO algorithm.
A fast and simple implementation of learning algorithms for robotics. For an overview of the library, please see https://arxiv.org/pdf/2509.10771.
Environment repositories using the framework:
- Isaac Lab (built on top of NVIDIA Isaac Sim): https://github.com/isaac-sim/IsaacLab
- Legged Gym (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
- MuJoCo Playground (built on top of MuJoCo MJX and Warp): https://github.com/google-deepmind/mujoco_playground/
- mjlab (built on top of MuJoCo Warp): https://github.com/mujocolab/mjlab
The library currently supports PPO and Student-Teacher Distillation with additional features from our research. These include:
- Random Network Distillation (RND) - Encourages exploration by adding a curiosity-driven intrinsic reward.
- Symmetry-based Augmentation - Makes the learned behaviors more symmetrical.
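To make the RND idea concrete, here is a minimal, self-contained sketch. It is not the library's implementation: the class `LinearRND`, its parameters, and the use of plain linear maps in place of neural networks are all assumptions made for illustration. A fixed random "target" map is imitated by a trained "predictor" map, and the prediction error serves as the intrinsic reward, so observations that have been seen often earn less bonus.

```python
import random


class LinearRND:
    """Toy RND module: linear maps stand in for the target and predictor networks."""

    def __init__(self, obs_dim, embed_dim=4, lr=0.1, seed=0):
        rng = random.Random(seed)

        def random_matrix():
            return [[rng.uniform(-1.0, 1.0) for _ in range(obs_dim)] for _ in range(embed_dim)]

        self.target = random_matrix()     # fixed random map, never trained
        self.predictor = random_matrix()  # trained to imitate the target
        self.lr = lr

    def _error(self, obs):
        """Per-embedding prediction error (predictor output minus target output)."""
        pred = [sum(w * o for w, o in zip(row, obs)) for row in self.predictor]
        targ = [sum(w * o for w, o in zip(row, obs)) for row in self.target]
        return [p - t for p, t in zip(pred, targ)]

    def intrinsic_reward(self, obs):
        """Mean squared prediction error: large for rarely visited observations."""
        err = self._error(obs)
        return sum(e * e for e in err) / len(err)

    def update(self, obs):
        """One gradient step on 0.5 * squared error, applied to the predictor only."""
        err = self._error(obs)
        for row, e in zip(self.predictor, err):
            for j, o in enumerate(obs):
                row[j] -= self.lr * e * o


rnd = LinearRND(obs_dim=2)
obs = [1.0, 0.5]
reward_before = rnd.intrinsic_reward(obs)
for _ in range(20):
    rnd.update(obs)
reward_after = rnd.intrinsic_reward(obs)  # smaller: the observation is no longer novel
```

In a training loop, a bonus of this kind would typically be added to the environment reward with a weighting coefficient; that weighting is left out here.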
We welcome contributions from the community. Please check our contribution guidelines for more information.
Maintainers: Mayank Mittal and Clemens Schwarke
REPPO maintainer: Claas Voelcker
Affiliation: Robotic Systems Lab, ETH Zurich & NVIDIA
Contact: cschwarke@ethz.ch
[2026/02/17] The bugs should now be fixed, and we are releasing some training tips for REPPO with Isaac.
[2026/01/30] We found two major bugs in the implementation, which are currently being fixed.
The package can be installed via PyPI with:

pip install rsl-rl-lib

or by cloning this repository and installing it with:

git clone https://github.com/leggedrobotics/rsl_rl
cd rsl_rl
pip install -e .

The package supports the following logging frameworks, which can be configured through logger:
- Tensorboard: https://www.tensorflow.org/tensorboard/
- Weights & Biases: https://wandb.ai/site
- Neptune: https://docs.neptune.ai/
For a demo configuration of PPO, please check the example_config.yaml file.
Maximum-entropy RL algorithms struggle on some Isaac tasks because the action space is not bounded to [-1, 1]. We provide some simple utilities for scaling the tanh Gaussian to arbitrary lower and upper action bounds per dimension. To use them, set the lower and upper action bounds in the actor. We have found that scaling the actions in the actor, rather than in the environment wrapper, leads to more stable training.
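The rescaling described above can be sketched as follows. This is an illustrative, dependency-free version, not the utility shipped with the library: the function name `sample_scaled_tanh_gaussian`, its arguments, and the `1e-6` stabilizer are assumptions. It samples from a Gaussian, squashes with tanh, maps the result affinely to per-dimension bounds, and corrects the log-probability for both transformations.

```python
import math
import random


def sample_scaled_tanh_gaussian(mean, log_std, low, high, rng=random):
    """Sample one action from a tanh-squashed Gaussian rescaled to [low, high].

    All arguments are per-dimension lists of equal length. Returns the action
    and its log-probability under the squashed, rescaled distribution.
    """
    action, log_prob = [], 0.0
    for mu, ls, lo, hi in zip(mean, log_std, low, high):
        std = math.exp(ls)
        u = rng.gauss(mu, std)           # pre-squash Gaussian sample
        t = math.tanh(u)                 # squashed to (-1, 1)
        half_range = 0.5 * (hi - lo)
        a = lo + half_range * (t + 1.0)  # affine map onto (lo, hi)
        # Gaussian log-density of the pre-squash sample u
        log_prob += -0.5 * ((u - mu) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
        # Change of variables: da/du = half_range * (1 - tanh(u)^2)
        log_prob -= math.log(half_range) + math.log(1.0 - t * t + 1e-6)
        action.append(a)
    return action, log_prob


rng = random.Random(0)
action, log_prob = sample_scaled_tanh_gaussian(
    mean=[0.0, 0.0], log_std=[0.0, -1.0], low=[-2.0, 0.0], high=[3.0, 0.5], rng=rng
)
```

Because the bounds enter the log-probability through the `log(half_range)` term, the entropy estimate reflects the true action range, which is one plausible reason why scaling inside the actor behaves differently from scaling in an environment wrapper.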
It is also possible to use a non-squashed Normal distribution. In this case, the initial temperature alpha needs to be set lower to prevent the entropy bonus term from diverging. In addition, the reward is tuned for significantly smaller step sizes, as the standard RSL-RL PPO implementation constrains the KL deviation per update iteration to 0.01. Setting the desired_kl parameter for REPPO to this value can stabilize learning, especially when using non-squashed distributions.
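To give a sense of scale for a KL target of 0.01, the sketch below computes the closed-form KL divergence between two diagonal Gaussian policies and applies a simple adaptive learning-rate rule of the kind commonly paired with PPO. The function names, the factor of 1.5, the thresholds, and the learning-rate limits are assumptions for illustration; how REPPO uses desired_kl internally is not specified here.

```python
import math


def diag_gaussian_kl(mean_old, std_old, mean_new, std_new):
    """KL(old || new) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for m0, s0, m1, s1 in zip(mean_old, std_old, mean_new, std_new):
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl


def adapt_learning_rate(lr, kl, desired_kl=0.01, factor=1.5, lr_min=1e-5, lr_max=1e-2):
    """Shrink the step size when the policy moved too far, grow it when it barely moved."""
    if kl > 2.0 * desired_kl:
        return max(lr_min, lr / factor)
    if 0.0 < kl < 0.5 * desired_kl:
        return min(lr_max, lr * factor)
    return lr


# With unit standard deviation, shifting one action mean by 0.1 gives a KL of 0.1**2 / 2.
kl = diag_gaussian_kl([0.0], [1.0], [0.1], [1.0])
lr = adapt_learning_rate(1e-3, kl)  # within the tolerance band, so lr is unchanged
```

The example shows that a target of 0.01 permits only small changes to the policy per update, which is consistent with the remark that the reward is tuned for small step sizes.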
Finally, we observed that the curricula used in some Isaac environments have a strong influence on REPPO performance. In general, REPPO learns a strong policy much faster than PPO, but struggles to adapt to the distribution shift introduced by the curriculum.
For documentation, we adopt the Google Style Guide for docstrings. Please make sure that your code is well-documented and follows the guidelines.
We use the following tools for maintaining code quality:
- pre-commit: Runs a list of formatters and linters over the codebase.
- ruff: An extremely fast Python linter and code formatter, written in Rust.
Please check here for instructions to set these up. To run them over the entire repository, please execute the following commands in the terminal:
# for installation (only once)
pre-commit install
# for running
pre-commit run --all-files

If you use this library for your research, please cite the following work:
@article{schwarke2025rslrl,
title={RSL-RL: A Learning Library for Robotics Research},
author={Schwarke, Clemens and Mittal, Mayank and Rudin, Nikita and Hoeller, David and Hutter, Marco},
journal={arXiv preprint arXiv:2509.10771},
year={2025}
}
If you use the library with curiosity-driven exploration (random network distillation), please cite:
@InProceedings{schwarke2023curiosity,
title = {Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks},
author = {Schwarke, Clemens and Klemm, Victor and Boon, Matthijs van der and Bjelonic, Marko and Hutter, Marco},
booktitle = {Proceedings of The 7th Conference on Robot Learning},
pages = {2594--2610},
year = {2023},
volume = {229},
series = {Proceedings of Machine Learning Research},
publisher = {PMLR},
url = {https://proceedings.mlr.press/v229/schwarke23a.html},
}
If you use the library with symmetry augmentation, please cite:
@InProceedings{mittal2024symmetry,
author={Mittal, Mayank and Rudin, Nikita and Klemm, Victor and Allshire, Arthur and Hutter, Marco},
booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
title={Symmetry Considerations for Learning Task Symmetric Robot Policies},
year={2024},
pages={7433-7439},
doi={10.1109/ICRA57147.2024.10611493}
}