
[Animations: multi-discrete training and continuous training]

Blog Post

Discrete RSL RL

Customized version of the original RSL RL project that additionally supports multi-discrete action spaces, providing a fast and simple implementation of the PPO algorithm, designed to run fully on GPU.

Overview

Motivation

RSL-RL is a lightweight, GPU-native RL library designed for fast, high-throughput continuous control, but it originally only supported continuous actions, limiting its applicability. Many real-world tasks, like maintenance scheduling or resource allocation, require multi-discrete action spaces, where the agent makes independent categorical choices per branch. This project extended RSL-RL to handle multi-discrete actions while preserving its continuous control performance, keeping API changes minimal, and validating results against Stable Baselines 3. With this update, RSL-RL can now tackle a broader range of decision-making and optimization problems, all while maintaining GPU-accelerated speed.

Implementation Details

The core of this project was extending the RSL-RL codebase to natively support multi-discrete actions while keeping continuous control fully functional. You can see all the modifications in my commit here.

Here’s a high-level overview of the changes:

  • Actor-Critic Module. The ActorCritic class was refactored to handle both continuous and multi-discrete distributions. For continuous actions, it still outputs mean and standard deviation vectors. For multi-discrete actions, it now outputs concatenated logits per branch and manages sampling, evaluation, and log-probability calculations accordingly. This allows PPO and other algorithms to work seamlessly with either action type (see the sketch after this list).

  • Rollout Storage. Rollout storage was adapted to store logits for multi-discrete actions instead of continuous action vectors. This includes modifications to add_transitions and mini-batch generators to maintain a uniform interface for both types of action spaces. The storage still resides fully on GPU for maximum speed (a buffer sketch appears at the end of this section).

  • PPO Runner. The on-policy runner was updated to correctly initialize the algorithm with the proper shapes for multi-discrete actions. It now dynamically handles action type selection from the environment configuration without breaking the continuous action workflow.
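
The snippet below is a minimal, illustrative sketch of the dual distribution handling described in the Actor-Critic item above. It is not the actual RSL-RL ActorCritic code: the class, method, and parameter names are assumptions chosen for clarity. It only shows the core idea, namely that continuous actions use a Normal distribution, while multi-discrete actions split concatenated logits into one Categorical distribution per branch and sum the per-branch log-probabilities.

# Illustrative sketch only -- names and shapes are assumptions, not the RSL-RL API.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class SimpleActor(nn.Module):
    def __init__(self, obs_dim, action_spec, hidden_dim=128):
        super().__init__()
        # action_spec: int -> continuous action dim, list[int] -> multi-discrete branch sizes
        self.multi_discrete = isinstance(action_spec, (list, tuple))
        out_dim = sum(action_spec) if self.multi_discrete else action_spec
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ELU(), nn.Linear(hidden_dim, out_dim)
        )
        if self.multi_discrete:
            self.branch_sizes = list(action_spec)
        else:
            self.log_std = nn.Parameter(torch.zeros(action_spec))

    def distribution(self, obs):
        out = self.net(obs)
        if self.multi_discrete:
            # Split concatenated logits into one Categorical per branch
            return [Categorical(logits=lg) for lg in torch.split(out, self.branch_sizes, dim=-1)]
        return Normal(out, self.log_std.exp())

    def act(self, obs):
        dist = self.distribution(obs)
        if self.multi_discrete:
            actions = torch.stack([d.sample() for d in dist], dim=-1)
            # Branches are independent, so the joint log-prob is the per-branch sum
            log_prob = torch.stack(
                [d.log_prob(a) for d, a in zip(dist, actions.unbind(-1))], dim=-1
            ).sum(dim=-1)
        else:
            actions = dist.sample()
            log_prob = dist.log_prob(actions).sum(dim=-1)
        return actions, log_prob

For example, SimpleActor(obs_dim=10, action_spec=[3, 3, 3, 3]) would emit four independent 3-way choices per step, while SimpleActor(obs_dim=10, action_spec=6) would emit a 6-dimensional continuous action.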

Key Points:

  • The implementation maintains full GPU execution whether actions are continuous or multi-discrete.
  • Continuous action benchmarks remain unaffected, ensuring no regressions.
  • The API changes are minimal, keeping the library intuitive and easy to use.

These updates make RSL-RL more versatile, opening the door to a broader class of reinforcement learning problems without sacrificing speed or simplicity.
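
To make the rollout-storage point above more concrete, here is a minimal sketch of a GPU-resident buffer whose distribution-parameter field is sized for either concatenated per-branch logits (multi-discrete) or mean and standard deviation vectors (continuous). Field names, shapes, and the CUDA device default are assumptions for illustration, not the library's actual storage layout.

# Hypothetical buffer layout -- not the actual RSL-RL RolloutStorage.
import torch

class TinyRolloutBuffer:
    def __init__(self, num_steps, num_envs, obs_dim, action_spec, device="cuda"):
        multi_discrete = isinstance(action_spec, (list, tuple))
        # Continuous: actions are real vectors and the distribution is (mean, std).
        # Multi-discrete: actions are one integer per branch and the distribution
        # parameters are the concatenated per-branch logits.
        act_dim = len(action_spec) if multi_discrete else action_spec
        param_dim = sum(action_spec) if multi_discrete else 2 * action_spec
        self.observations = torch.zeros(num_steps, num_envs, obs_dim, device=device)
        self.actions = torch.zeros(num_steps, num_envs, act_dim, device=device)
        self.dist_params = torch.zeros(num_steps, num_envs, param_dim, device=device)
        self.log_probs = torch.zeros(num_steps, num_envs, device=device)
        self.rewards = torch.zeros(num_steps, num_envs, device=device)
        self.step = 0

    def add_transition(self, obs, actions, dist_params, log_probs, rewards):
        # All tensors live on the same device, so no host transfers are needed
        self.observations[self.step] = obs
        self.actions[self.step] = actions
        self.dist_params[self.step] = dist_params
        self.log_probs[self.step] = log_probs
        self.rewards[self.step] = rewards
        self.step += 1

Because both action types map onto the same set of tensors, the mini-batch generator and the PPO update can treat them through a single interface.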

Technical Details

Below are instructions to set up the environment, run training, and visualize results.

Environment Configuration

First, create and activate the Conda environment, then install dependencies:

conda create -n rsl_rl python=3.11
conda activate rsl_rl
pip install -r requirements.txt

This ensures that all necessary libraries for RSL-RL, Stable Baselines 3, and GPU execution are available.

Validation

We validate the multi-discrete and continuous implementations through training and evaluation on representative benchmarks, ensuring full GPU execution and no regressions on continuous tasks.

Multi-Discrete Action Space - Maintenance Scheduling Optimization

  • Environment: Custom maintenance scheduling optimization problem, where the agent decides which machine to service at each timestep over a 1-year simulation.
  • Action Space: Multi-discrete choice (a hypothetical space definition is sketched after this list).
  • Goal: Test correctness and efficiency of multi-discrete support in RSL-RL.
  • Validation: Compared performance and training curves with SB3’s multi-discrete PPO implementation.
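
The exact branch layout of the custom environment is not spelled out here, so the snippet below is a purely hypothetical illustration of what a multi-discrete action space for a maintenance-scheduling problem could look like, expressed with Gymnasium spaces (the format the SB3 baseline consumes). The machine count and the per-machine choices are assumptions, not the benchmark's actual definition.

# Hypothetical example -- machine count and per-machine choices are assumptions.
from gymnasium import spaces

NUM_MACHINES = 4         # assumed number of machines
CHOICES_PER_MACHINE = 3  # assumed options per machine: 0 = idle, 1 = inspect, 2 = repair

action_space = spaces.MultiDiscrete([CHOICES_PER_MACHINE] * NUM_MACHINES)

# One independent categorical choice per machine at every timestep,
# e.g. array([2, 0, 1, 0])
print(action_space.sample())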

Train agents:

python training_rsl_multidiscrete.py # RSL Training
python training_sb3_multidiscrete.py # Stable Baselines 3 Training

After training, evaluate the trained models:

python evaluate_multidiscrete.py

Results are visualized below. On the left, you can see the training curves comparing RSL and SB3, while on the right is the evaluation of both trained models on the scheduling task.

[Figures: MultiDiscrete training curve, RSL vs SB3 (left); trained model evaluation (right)]

Continuous Action Space - Robotics Legged Locomotion

  • Environment: Genesis simulator with a Unitree Go2 quadruped robot.
  • Action Space: Fully continuous joint controls.
  • Goal: Evaluate if RSL-RL still matches or exceeds baseline performance in standard continuous locomotion tasks.
  • Validation: Compared training curves with original RSL-RL results and evaluation runs with Stable Baselines 3 (SB3) using PPO.

Train agents:

python training_rsl_continuous.py # RSL Training
python training_sb3_continuous.py # Stable Baselines 3 Training

Evaluate the trained models:

python evaluate_continuous.py

The figures below show the learning curves (left) and the evaluation of the Unitree Go2 locomotion task for both RSL and SB3 (center and right). RSL-RL maintains performance parity while running fully on GPU.

[Figures: continuous training curve, RSL vs SB3 (left); trained model evaluation for RSL (center) and SB3 (right)]

Tensorboard Visualization

For interactive monitoring of training metrics, you can launch TensorBoard:

tensorboard --logdir runs/

This will allow you to explore rewards, losses, and other key statistics in real time.

Citation

@misc{palmas2025discrete_rsl_rl,
  author = {Alessandro Palmas},
  title = {Discrete RSL RL},
  year = {2025},
  url = {https://github.com/alexpalms/discrete_rsl_rl},
  note = {GitHub repository}
}

RSL RL (Original Readme)

A fast and simple implementation of RL algorithms, designed to run fully on GPU. This code is an evolution of rl-pytorch provided with NVIDIA's Isaac Gym.

Environment repositories using the framework:

The main branch supports PPO and Student-Teacher Distillation with additional features from our research. These include:

We welcome contributions from the community. Please check our contribution guidelines for more information.

Maintainers: Mayank Mittal and Clemens Schwarke
Affiliation: Robotic Systems Lab, ETH Zurich & NVIDIA
Contact: cschwarke@ethz.ch

Note: The algorithms branch supports additional algorithms (SAC, DDPG, DSAC, and more). However, it isn't currently actively maintained.

Setup

The package can be installed via PyPI with:

pip install rsl-rl-lib

or by cloning this repository and installing it with:

git clone https://github.com/leggedrobotics/rsl_rl
cd rsl_rl
pip install -e .

The package supports the following logging frameworks which can be configured through logger:

For a demo configuration of PPO, please check the example_config.yaml file.

Contribution Guidelines

For documentation, we adopt the Google Style Guide for docstrings. Please make sure that your code is well-documented and follows the guidelines.

We use the following tools for maintaining code quality:

  • pre-commit: Runs a list of formatters and linters over the codebase.
  • black: The uncompromising code formatter.
  • flake8: A wrapper around PyFlakes, pycodestyle, and McCabe complexity checker.

Please check here for instructions to set these up. To run over the entire repository, please execute the following command in the terminal:

# for installation (only once)
pre-commit install
# for running
pre-commit run --all-files

Citing

We are working on writing a white paper for this library. Until then, please cite the following work if you use this library for your research:

@InProceedings{rudin2022learning,
  title     = {Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning},
  author    = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {91--100},
  year      = {2022},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v164/rudin22a.html},
}

If you use the library with curiosity-driven exploration (random network distillation), please cite:

@InProceedings{schwarke2023curiosity,
  title     = {Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks},
  author    = {Schwarke, Clemens and Klemm, Victor and Boon, Matthijs van der and Bjelonic, Marko and Hutter, Marco},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {2594--2610},
  year      = {2023},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v229/schwarke23a.html},
}

If you use the library with symmetry augmentation, please cite:

@InProceedings{mittal2024symmetry,
  title     = {Symmetry Considerations for Learning Task Symmetric Robot Policies},
  author    = {Mittal, Mayank and Rudin, Nikita and Klemm, Victor and Allshire, Arthur and Hutter, Marco},
  booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
  pages     = {7433--7439},
  year      = {2024},
  doi       = {10.1109/ICRA57147.2024.10611493},
}
