
[Animations: multi-discrete training and continuous training]

Blog Post

Discrete RSL RL

Customized version of the original RSL RL project that additionally supports multi-discrete action spaces, providing a fast and simple implementation of the PPO algorithm, designed to run fully on GPU.

Overview

Motivation

RSL-RL is a lightweight, GPU-native RL library designed for fast, high-throughput continuous control, but it originally only supported continuous actions, limiting its applicability. Many real-world tasks, like maintenance scheduling or resource allocation, require multi-discrete action spaces, where the agent makes independent categorical choices per branch. This project extended RSL-RL to handle multi-discrete actions while preserving its continuous control performance, keeping API changes minimal, and validating results against Stable Baselines 3. With this update, RSL-RL can now tackle a broader range of decision-making and optimization problems, all while maintaining GPU-accelerated speed.

Implementation Details

The core of this project was extending the RSL-RL codebase to natively support multi-discrete actions while keeping continuous control fully functional. You can see all the modifications in my commit here.

Here’s a high-level overview of the changes:

  • Actor-Critic Module. The ActorCritic class was refactored to handle both continuous and multi-discrete distributions. For continuous actions, it still outputs mean and standard deviation vectors. For multi-discrete actions, it now outputs concatenated logits per branch and manages sampling, evaluation, and log-probability calculations accordingly. This allows PPO and other algorithms to work seamlessly with either action type (see the sketch after this list).

  • Rollout Storage. Rollout storage was adapted to store logits for multi-discrete actions instead of continuous action vectors. This includes modifications to add_transitions and mini-batch generators to maintain a uniform interface for both types of action spaces. The storage still resides fully on GPU for maximum speed (a buffer sketch appears at the end of this section).

  • PPO Runner. The on-policy runner was updated to correctly initialize the algorithm with the proper shapes for multi-discrete actions. It now dynamically handles action type selection from the environment configuration without breaking the continuous action workflow.
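
The snippet below is a minimal, illustrative sketch of the dual distribution handling described in the Actor-Critic item above. It is not the actual RSL-RL ActorCritic code: the class, method, and parameter names are assumptions chosen for clarity. It only shows the core idea, namely that continuous actions use a Normal distribution, while multi-discrete actions split concatenated logits into one Categorical distribution per branch and sum the per-branch log-probabilities.

# Illustrative sketch only -- names and shapes are assumptions, not the RSL-RL API.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class SimpleActor(nn.Module):
    def __init__(self, obs_dim, action_spec, hidden_dim=128):
        super().__init__()
        # action_spec: int -> continuous action dim, list[int] -> multi-discrete branch sizes
        self.multi_discrete = isinstance(action_spec, (list, tuple))
        out_dim = sum(action_spec) if self.multi_discrete else action_spec
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ELU(), nn.Linear(hidden_dim, out_dim)
        )
        if self.multi_discrete:
            self.branch_sizes = list(action_spec)
        else:
            self.log_std = nn.Parameter(torch.zeros(action_spec))

    def distribution(self, obs):
        out = self.net(obs)
        if self.multi_discrete:
            # Split concatenated logits into one Categorical per branch
            return [Categorical(logits=lg) for lg in torch.split(out, self.branch_sizes, dim=-1)]
        return Normal(out, self.log_std.exp())

    def act(self, obs):
        dist = self.distribution(obs)
        if self.multi_discrete:
            actions = torch.stack([d.sample() for d in dist], dim=-1)
            # Branches are independent, so the joint log-prob is the per-branch sum
            log_prob = torch.stack(
                [d.log_prob(a) for d, a in zip(dist, actions.unbind(-1))], dim=-1
            ).sum(dim=-1)
        else:
            actions = dist.sample()
            log_prob = dist.log_prob(actions).sum(dim=-1)
        return actions, log_prob

For example, SimpleActor(obs_dim=10, action_spec=[3, 3, 3, 3]) would emit four independent 3-way choices per step, while SimpleActor(obs_dim=10, action_spec=6) would emit a 6-dimensional continuous action.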

Key Points:

  • The implementation maintains full GPU execution whether actions are continuous or multi-discrete.
  • Continuous action benchmarks remain unaffected, ensuring no regressions.
  • The API changes are minimal, keeping the library intuitive and easy to use.

These updates make RSL-RL more versatile, opening the door to a broader class of reinforcement learning problems without sacrificing speed or simplicity.
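
To make the rollout-storage point above more concrete, here is a minimal sketch of a GPU-resident buffer whose distribution-parameter field is sized for either concatenated per-branch logits (multi-discrete) or mean and standard deviation vectors (continuous). Field names, shapes, and the CUDA device default are assumptions for illustration, not the library's actual storage layout.

# Hypothetical buffer layout -- not the actual RSL-RL RolloutStorage.
import torch

class TinyRolloutBuffer:
    def __init__(self, num_steps, num_envs, obs_dim, action_spec, device="cuda"):
        multi_discrete = isinstance(action_spec, (list, tuple))
        # Continuous: actions are real vectors and the distribution is (mean, std).
        # Multi-discrete: actions are one integer per branch and the distribution
        # parameters are the concatenated per-branch logits.
        act_dim = len(action_spec) if multi_discrete else action_spec
        param_dim = sum(action_spec) if multi_discrete else 2 * action_spec
        self.observations = torch.zeros(num_steps, num_envs, obs_dim, device=device)
        self.actions = torch.zeros(num_steps, num_envs, act_dim, device=device)
        self.dist_params = torch.zeros(num_steps, num_envs, param_dim, device=device)
        self.log_probs = torch.zeros(num_steps, num_envs, device=device)
        self.rewards = torch.zeros(num_steps, num_envs, device=device)
        self.step = 0

    def add_transition(self, obs, actions, dist_params, log_probs, rewards):
        # All tensors live on the same device, so no host transfers are needed
        self.observations[self.step] = obs
        self.actions[self.step] = actions
        self.dist_params[self.step] = dist_params
        self.log_probs[self.step] = log_probs
        self.rewards[self.step] = rewards
        self.step += 1

Because both action types map onto the same set of tensors, the mini-batch generator and the PPO update can treat them through a single interface.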

Technical Details

Below are instructions to set up the environment, run training, and visualize results.

Environment Configuration

First, create and activate the Conda environment, then install dependencies:

conda create -n rsl_rl python=3.11
conda activate rsl_rl
pip install -r requirements.txt

This ensures that all necessary libraries for RSL-RL, Stable Baselines 3, and GPU execution are available.

Validation

We validate the multi-discrete and continuous implementations through training and evaluation on representative benchmarks, ensuring full GPU execution and no regressions on continuous tasks.

Multi-Discrete Action Space - Maintenance Scheduling Optimization

  • Environment: Custom maintenance scheduling optimization problem, where the agent decides which machine to service at each timestep over a 1-year simulation.
  • Action Space: Multi-discrete choice (a hypothetical space definition is sketched after this list).
  • Goal: Test correctness and efficiency of multi-discrete support in RSL-RL.
  • Validation: Compared performance and training curves with SB3’s multi-discrete PPO implementation.
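
The exact branch layout of the custom environment is not spelled out here, so the snippet below is a purely hypothetical illustration of what a multi-discrete action space for a maintenance-scheduling problem could look like, expressed with Gymnasium spaces (the format the SB3 baseline consumes). The machine count and the per-machine choices are assumptions, not the benchmark's actual definition.

# Hypothetical example -- machine count and per-machine choices are assumptions.
from gymnasium import spaces

NUM_MACHINES = 4         # assumed number of machines
CHOICES_PER_MACHINE = 3  # assumed options per machine: 0 = idle, 1 = inspect, 2 = repair

action_space = spaces.MultiDiscrete([CHOICES_PER_MACHINE] * NUM_MACHINES)

# One independent categorical choice per machine at every timestep,
# e.g. array([2, 0, 1, 0])
print(action_space.sample())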

Train agents:

python training_rsl_multidiscrete.py # RSL Training
python training_sb3_multidiscrete.py # Stable Baselines 3 Training

After training, evaluate the trained models:

python evaluate_multidiscrete.py

Results are visualized below. On the left, you can see the training curves comparing RSL and SB3, while on the right is the evaluation of both trained models on the scheduling task.

[Figures: MultiDiscrete training curve, RSL vs SB3 (left); trained model evaluation (right)]

Continuous Action Space - Robotics Legged Locomotion

  • Environment: Genesis simulator with a Unitree Go2 quadruped robot.
  • Action Space: Fully continuous joint controls.
  • Goal: Evaluate if RSL-RL still matches or exceeds baseline performance in standard continuous locomotion tasks.
  • Validation: Compared training curves with original RSL-RL results and evaluation runs with Stable Baselines 3 (SB3) using PPO.

Train agents:

python training_rsl_continuous.py # RSL Training
python training_sb3_continuous.py # Stable Baselines 3 Training

Evaluate the trained models:

python evaluate_continuous.py

The figures below show the learning curves (left) and the evaluation of the Unitree Go2 locomotion task for both RSL and SB3 (center and right). RSL-RL maintains performance parity while running fully on GPU.

[Figures: continuous training curve, RSL vs SB3 (left); trained model evaluation for RSL (center) and SB3 (right)]

Tensorboard Visualization

For interactive monitoring of training metrics, you can launch TensorBoard:

tensorboard --logdir runs/

This will allow you to explore rewards, losses, and other key statistics in real time.

Citation

@misc{palmas2025discrete_rsl_rl,
  author = {Alessandro Palmas},
  title = {Discrete RSL RL},
  year = {2025},
  url = {https://github.com/alexpalms/discrete_rsl_rl},
  note = {GitHub repository}
}

RSL RL (Original Readme)

A fast and simple implementation of RL algorithms, designed to run fully on GPU. This code is an evolution of rl-pytorch provided with NVIDIA's Isaac Gym.

Environment repositories using the framework:

The main branch supports PPO and Student-Teacher Distillation with additional features from our research. These include:

We welcome contributions from the community. Please check our contribution guidelines for more information.

Maintainers: Mayank Mittal and Clemens Schwarke
Affiliation: Robotic Systems Lab, ETH Zurich & NVIDIA
Contact: cschwarke@ethz.ch

Note: The algorithms branch supports additional algorithms (SAC, DDPG, DSAC, and more). However, it isn't currently actively maintained.

Setup

The package can be installed via PyPI with:

pip install rsl-rl-lib

or by cloning this repository and installing it with:

git clone https://github.com/leggedrobotics/rsl_rl
cd rsl_rl
pip install -e .

The package supports the following logging frameworks which can be configured through logger:

For a demo configuration of PPO, please check the example_config.yaml file.

Contribution Guidelines

For documentation, we adopt the Google Style Guide for docstrings. Please make sure that your code is well-documented and follows the guidelines.

We use the following tools for maintaining code quality:

  • pre-commit: Runs a list of formatters and linters over the codebase.
  • black: The uncompromising code formatter.
  • flake8: A wrapper around PyFlakes, pycodestyle, and McCabe complexity checker.

Please check here for instructions to set these up. To run over the entire repository, please execute the following command in the terminal:

# for installation (only once)
pre-commit install
# for running
pre-commit run --all-files

Citing

We are working on writing a white paper for this library. Until then, please cite the following work if you use this library for your research:

@InProceedings{rudin2022learning,
  title     = {Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning},
  author    = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco},
  booktitle = {Proceedings of the 5th Conference on Robot Learning},
  pages     = {91--100},
  year      = {2022},
  volume    = {164},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v164/rudin22a.html},
}

If you use the library with curiosity-driven exploration (random network distillation), please cite:

@InProceedings{schwarke2023curiosity,
  title     = {Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks},
  author    = {Schwarke, Clemens and Klemm, Victor and Boon, Matthijs van der and Bjelonic, Marko and Hutter, Marco},
  booktitle = {Proceedings of The 7th Conference on Robot Learning},
  pages     = {2594--2610},
  year      = {2023},
  volume    = {229},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v229/schwarke23a.html},
}

If you use the library with symmetry augmentation, please cite:

@InProceedings{mittal2024symmetry,
  title     = {Symmetry Considerations for Learning Task Symmetric Robot Policies},
  author    = {Mittal, Mayank and Rudin, Nikita and Klemm, Victor and Allshire, Arthur and Hutter, Marco},
  booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
  pages     = {7433--7439},
  year      = {2024},
  doi       = {10.1109/ICRA57147.2024.10611493},
}
