RoTO is a reinforcement learning benchmark environment designed to standardise and promote future research in tactile-based manipulation. We built it because tactile RL is hard! It combines a trifecta of challenges: manipulation, on-policy RL, and ML-unfriendly tactile data. By open-sourcing our environments and robustly tuned baselines, we hope to lower the barrier to entry and let researchers focus on fundamental algorithmic challenges instead of tedious RL tuning. We will continue to add more environments and warmly welcome contributions 🤗
- 5 robot embodiments: 4× hands + 1× arm (Allegro Hand, ORCA Hand, Shadow Dexterous Hand, Shadow Dexterous Hand Lite, Franka)
- 3 tactile-diverse tasks to cover sparse, intermittent, and sustained interactions: find an object, ball bouncing, and Baoding ball rotation
- Integrated hyperparameter optimisation with Optuna, which is essential for tactile agents but often missing ❗
- Well-tuned baselines for each robot-task-agent combo (40-trial sweep) that reach state-of-the-art speeds in sim
roto 1.0 included the Find (Franka), Bounce & Baoding (Shadow Hand) tasks. It was introduced in Enhancing Tactile-based RL for Robotic Control (NeurIPS 2025), which shows that blind superhuman dexterity is possible with sparse binary contacts + self-supervision.
roto 2.0 extends the Bounce & Baoding tasks to the Allegro, ORCA, and Shadow Dexterous Hand Lite robots. We swept hyperparameters for the full-state & blind agents and benchmarked the results in a 2-page paper accepted to the ViTac 2026 workshop at ICRA. See the project page for some speedy agent videos.
The data 📁 for the NeurIPS paper and roto 2.0 ICRA workshop paper (checkpoints, training logs, plot scripts) are available in the roto_paper_results repo.
We split the paper code across two repositories. Think of the typical RL loop: multimodal_rl is the agent, and roto is the environment. We did this for modularity, in case you want to use your own RL repository instead of ours (this requires some integration work, but we are happy to help).
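To make the agent/environment split concrete, here is a minimal plain-Python sketch of that loop. `ToyEnv` and `RandomAgent` are hypothetical stand-ins, not roto or multimodal_rl classes. The five-value `step()` return follows the Gymnasium convention; Isaac Lab's `DirectRLEnv` returns an analogous tuple of batched tensors over many parallel envs.

```python
"""Toy agent-environment loop illustrating the agent/environment split.

ToyEnv (environment side, like roto) and RandomAgent (agent side, like
multimodal_rl) are hypothetical stand-ins, not real classes from either repo.
"""
import random
from typing import Dict, List, Tuple

Obs = Dict[str, List[float]]


class ToyEnv:
    """Hypothetical single environment returning dictionary observations."""

    def __init__(self, episode_length: int = 5) -> None:
        self.episode_length = episode_length
        self.t = 0

    def reset(self) -> Obs:
        self.t = 0
        return self._obs()

    def step(self, action: List[float]) -> Tuple[Obs, float, bool, bool, dict]:
        self.t += 1
        reward = -sum(a * a for a in action)  # toy reward: penalise large actions
        terminated = False  # no task failure/success in this toy
        truncated = self.t >= self.episode_length  # time limit reached
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self) -> Obs:
        return {"prop": [float(self.t)], "tactile": [0.0, 1.0]}


class RandomAgent:
    """Hypothetical agent: anything with an act(obs) method can be swapped in."""

    def __init__(self, action_dim: int, seed: int = 0) -> None:
        self.action_dim = action_dim
        self.rng = random.Random(seed)

    def act(self, obs: Obs) -> List[float]:
        return [self.rng.uniform(-1.0, 1.0) for _ in range(self.action_dim)]


def run_episode(env: ToyEnv, agent: RandomAgent) -> Tuple[int, float]:
    """Roll out one episode and return (number of steps, total reward)."""
    obs = env.reset()
    steps, total_reward = 0, 0.0
    while True:
        action = agent.act(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        steps += 1
        total_reward += reward
        if terminated or truncated:
            return steps, total_reward


if __name__ == "__main__":
    print(run_episode(ToyEnv(), RandomAgent(action_dim=3)))
```

Because the loop only relies on `reset()`, `step()` and `act()`, either side can be replaced independently, which is the modularity the two-repo split aims for.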
multimodal_rl: The motto of this repo is "doing good RL with Isaac Lab as painlessly as possible". We started from the skrl library and made significant changes to better handle multimodal dictionary observations, observation stacking and the associated memory management, and integrated self-supervision. Many existing libraries lack support for robust RL research practices (correct evaluation metrics, separate train/evaluation envs, integrated hyperparameter optimisation). These are well-established norms in the RL research community but are not yet consistently adopted in RL+robotics research, and we want to encourage them 🚀
roto: This repo just contains the robot configurations and task definitions. We take advantage of class inheritance to heavily reduce repeated code. RotoEnv is a child of DirectRLEnv, and sets up basic functions to perform joint position control of a robot and reset it. [Robot]Env is a child of RotoEnv, defining robot-specific functions that do not change task-to-task, e.g. the proprioceptive observation key. Finally, [Task]Env defines task-specific functions such as setting up the environment, rewards, and episode resets.
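The inheritance chain described above can be sketched as follows. This is a minimal illustration under stated assumptions: `DirectRLEnv` here is a local stand-in (not Isaac Lab's class), and `ShadowEnv`, `BaodingEnv` and all method names are hypothetical placeholders for the [Robot]Env and [Task]Env layers.

```python
"""Sketch of the RotoEnv -> [Robot]Env -> [Task]Env inheritance pattern.

DirectRLEnv below is a local stand-in, NOT Isaac Lab's class. ShadowEnv,
BaodingEnv and every method name are hypothetical, for illustration only.
"""
from typing import Dict, List


class DirectRLEnv:
    """Stand-in base class (hypothetical)."""

    def get_observations(self) -> Dict[str, List[float]]:
        raise NotImplementedError


class RotoEnv(DirectRLEnv):
    """Shared by every robot and task: joint position control and resets."""

    def __init__(self, num_joints: int) -> None:
        self.num_joints = num_joints
        self.joint_targets = [0.0] * num_joints

    def apply_joint_targets(self, targets: List[float]) -> None:
        if len(targets) != self.num_joints:
            raise ValueError("expected one target per joint")
        self.joint_targets = list(targets)

    def reset_robot(self) -> None:
        self.joint_targets = [0.0] * self.num_joints


class ShadowEnv(RotoEnv):
    """Robot layer: things that do not change task-to-task (prop, tactile)."""

    def __init__(self) -> None:
        super().__init__(num_joints=20)  # Shadow Hand: 20 actuated joints

    def get_prop(self) -> List[float]:
        return list(self.joint_targets)

    def get_tactile(self) -> List[float]:
        return [0.0] * 5  # placeholder contact readings


class BaodingEnv(ShadowEnv):
    """Task layer: task-specific observations (gt), rewards, resets."""

    def get_observations(self) -> Dict[str, List[float]]:
        return {"prop": self.get_prop(), "tactile": self.get_tactile(), "gt": self.get_ball_states()}

    def get_ball_states(self) -> List[float]:
        return [0.0] * 6  # placeholder: xyz positions of two balls

    def compute_reward(self) -> float:
        return 0.0  # placeholder task reward
```

A new task on the same robot would subclass `ShadowEnv` and only implement its own observations, rewards and resets, which is where the reduction in repeated code comes from.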
All agents use joint position control: Franka has 9 joints, and Shadow has 20 actuated joints.
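One common way to turn policy outputs into joint position targets is to map normalised actions in [-1, 1] linearly onto each joint's limits and optionally smooth the targets over time. The sketch below assumes exactly that; it is a pattern used in some Isaac Lab hand tasks, not a confirmed description of roto's controller, and both function names are hypothetical.

```python
"""Map normalised policy actions to joint position targets (sketch).

Assumptions, not confirmed roto behaviour: actions lie in [-1, 1], each
joint has (lower, upper) limits, and targets are smoothed with an
exponential moving average. Function names are hypothetical.
"""
from typing import List, Sequence, Tuple


def scale_to_limits(actions: Sequence[float], limits: Sequence[Tuple[float, float]]) -> List[float]:
    """Map actions in [-1, 1] linearly onto each joint's [lower, upper] range."""
    if len(actions) != len(limits):
        raise ValueError("need one (lower, upper) pair per action")
    targets = []
    for a, (lower, upper) in zip(actions, limits):
        a = max(-1.0, min(1.0, a))  # clamp out-of-range actions
        targets.append(lower + 0.5 * (a + 1.0) * (upper - lower))
    return targets


def smooth_targets(prev: Sequence[float], new: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average; alpha=1 uses the new targets unchanged."""
    return [alpha * n + (1.0 - alpha) * p for p, n in zip(prev, new)]
```

For a 9-joint Franka or a 20-joint Shadow Hand, `limits` would simply hold 9 or 20 `(lower, upper)` pairs.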
We need to install Isaac Sim, Isaac Lab, multimodal_rl and roto in a conda environment. We recommend using the latest Isaac Sim for maximum performance.
- Create a conda environment and install Isaac Lab and Isaac Sim (the easiest route is to install both as pip packages)
- Install multimodal_rl as a local editable package
```bash
git clone git@github.com:elle-miller/multimodal_rl.git
cd multimodal_rl
pip install -e .
```
- Install roto as a local editable package
```bash
git clone git@github.com:elle-miller/roto.git
cd roto
pip install -e .
```
- Test the installation by playing a trained agent.
```bash
# play in isaac sim viewer
python scripts/play.py --task Baoding --robot Shadow --num_envs 512 --agent_cfg forward_dynamics_memory --checkpoint readme_assets/checkpoints/baoding_memory.pt
# save a video
python scripts/play.py --task Baoding --robot Shadow --num_envs 512 --agent_cfg forward_dynamics_memory --video --video_length 1200 --headless --checkpoint readme_assets/checkpoints/baoding_memory.pt
```
The video should pop up in a ./videos folder and look like this:
You can find more trained checkpoints in the roto_paper_results repository.
Usage is mostly the same as the default Isaac Lab setup. The only breaking change is that a task is not linked to a single agent cfg file: the cfgs must be defined in the task's __init__.py and selected with the --agent_cfg argument.
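The idea of "several agent cfgs per task, chosen at the command line" can be sketched as a simple lookup table. The registry layout, the YAML paths and `resolve_agent_cfg()` below are all hypothetical; the actual registration code lives in each task's `__init__.py` and may look different.

```python
"""Sketch: resolving an --agent_cfg name against per-task registered cfgs.

AGENT_CFGS, its file paths, and resolve_agent_cfg() are hypothetical; they
only illustrate that one task can expose several selectable agent cfgs.
"""
from typing import Dict

# Hypothetical registry, as a task's __init__.py might expose it.
AGENT_CFGS: Dict[str, Dict[str, str]] = {
    "Baoding": {
        "rl_only_pt": "agents/rl_only_pt.yaml",
        "rl_only_ptg": "agents/rl_only_ptg.yaml",
        "forward_dynamics_memory": "agents/forward_dynamics_memory.yaml",
    },
}


def resolve_agent_cfg(task: str, agent_cfg: str) -> str:
    """Return the cfg file registered under `agent_cfg` for `task`."""
    if task not in AGENT_CFGS:
        raise ValueError(f"Unknown task {task!r}; known: {sorted(AGENT_CFGS)}")
    task_cfgs = AGENT_CFGS[task]
    if agent_cfg not in task_cfgs:
        raise ValueError(f"No agent cfg {agent_cfg!r} for {task}; choose from {sorted(task_cfgs)}")
    return task_cfgs[agent_cfg]
```

Failing loudly with the list of valid names makes a typo in `--agent_cfg` easy to spot.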
We use Optuna for integrated hyperparameter optimisation. The command is the same as for train.py, but with an additional --study argument giving the study name. You can specify the pruner, number of trials, number of warm-up steps, etc.
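To illustrate what a pruner with warm-up steps does, here is a simplified plain-Python version of median pruning, the idea behind Optuna's `MedianPruner`: no pruning before `n_warmup_steps` or before `n_startup_trials` trials have finished, then a trial stops if its score is below the median of earlier trials at the same step. This is an illustrative sketch, not roto's sweep code; Optuna's own implementation differs in details (for example, it compares the trial's best value so far).

```python
"""Simplified median pruning with warm-up steps (illustrative only).

Not roto's sweep code and not Optuna's implementation; a plain-Python
sketch of the median-pruning rule for a maximised objective.
"""
from statistics import median
from typing import Dict, List


def should_prune(
    step: int,
    value: float,
    history: List[Dict[int, float]],
    n_warmup_steps: int,
    n_startup_trials: int = 1,
) -> bool:
    """Return True if the current (maximising) trial should stop at `step`.

    `history` holds the intermediate scores of earlier finished trials,
    each as a {step: score} mapping.
    """
    if step < n_warmup_steps or len(history) < n_startup_trials:
        return False  # still warming up, or too few trials to compare against
    previous = [trial[step] for trial in history if step in trial]
    if not previous:
        return False  # no earlier trial reported at this step
    return value < median(previous)
```

Warm-up matters for RL because early returns are noisy; pruning too soon can discard configurations that would have learned well later.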
The basic commands are below. Note that when running for the first time, you will need to update your wandb entity, project etc. in the relevant .yaml file, and possibly some asset paths.
- {TASK} = [Find, Bounce, Baoding]
- {ROBOT} = [shadow, shadowlite, orca, allegro, franka]
- {CFG} = rl_only_pt (blind setting), rl_only_ptg (vision setting), forward_dynamics (blind + self-supervision), etc., or define your own
```bash
# training
python scripts/train.py --task {TASK} --robot {ROBOT} --agent_cfg {CFG} --num_envs 4196 --headless --seed 1234
# sweeping
python scripts/sweep.py --task {TASK} --robot {ROBOT} --agent_cfg {CFG} --num_envs 4196 --headless --seed 1234 --study {YOUR_STUDY_NAME}
# playing - see final step in installation
# examples
python scripts/train.py --task Baoding --robot shadow --agent_cfg rl_only_pt --num_envs 4196 --headless
python scripts/train.py --task Bounce --robot orca --agent_cfg rl_only_ptg --num_envs 4196 --headless
python scripts/train.py --task Find --robot franka --agent_cfg forward_dynamics --num_envs 4196 --headless
python scripts/sweep.py --task Baoding --robot shadowlite --agent_cfg rl_only_pt --num_envs 4196 --headless --study lite_baoding_blind
```
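If you want to launch a batch of runs, the command template can be expanded programmatically. The sketch below only formats strings; `SUPPORTED` encodes the task/robot pairs described in this README (Find with Franka; Bounce and Baoding with the four hands), and `train_command()`/`all_commands()` are hypothetical helpers, so extend the table if more pairs are supported.

```python
"""Build train.py commands for each task/robot pair listed in this README.

SUPPORTED mirrors the pairs described above; the helpers are hypothetical
and only format command strings (they do not launch anything).
"""
import shlex
from typing import Dict, List, Tuple

SUPPORTED: Dict[str, Tuple[str, ...]] = {
    "Find": ("franka",),
    "Bounce": ("shadow", "shadowlite", "orca", "allegro"),
    "Baoding": ("shadow", "shadowlite", "orca", "allegro"),
}


def train_command(task: str, robot: str, cfg: str, num_envs: int = 4196, seed: int = 1234) -> str:
    """Format one training command, rejecting pairs not listed above."""
    if robot not in SUPPORTED.get(task, ()):
        raise ValueError(f"{robot!r} is not listed for task {task!r}")
    args = [
        "python", "scripts/train.py",
        "--task", task, "--robot", robot, "--agent_cfg", cfg,
        "--num_envs", str(num_envs), "--headless", "--seed", str(seed),
    ]
    return shlex.join(args)  # quotes arguments only if they need it


def all_commands(cfg: str) -> List[str]:
    """One command per supported task/robot pair."""
    return [train_command(task, robot, cfg) for task, robots in SUPPORTED.items() for robot in robots]


if __name__ == "__main__":
    for cmd in all_commands("rl_only_pt"):
        print(cmd)
```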
We use dictionary-style observations, categorised into proprioception (prop), tactile, rgb, depth, and gt (ground-truth). The proprioception & tactile methods should be defined in {Robot}Env, whereas gt information is task-dependent. To specify which observations are used, add the keys to obs_list in the agent cfg:
```yaml
observations:
  obs_list:
    - prop
    - tactile
    - rgb
    - depth
    - gt
  obs_stack: 3
  tactile_cfg:
    binary_tactile: true
    binary_threshold: 0.01
  pixel_cfg:
    width: 80
    height: 80
    latent_pixel_dim: 128
    normalise_rgb: true
    max_depth: 2.0 # meters
```
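To show how `obs_list`, `binary_tactile`/`binary_threshold` and `obs_stack` could interact, here is a plain-Python sketch. The exact semantics are assumptions, not confirmed roto behaviour: contacts strictly above the threshold become 1.0, stacking concatenates the last `obs_stack` frames per key (oldest first), and a reset fills the stack with copies of the first frame. `select_and_binarise` and `ObsStacker` are hypothetical names.

```python
"""Sketch of obs_list filtering, binary tactile, and observation stacking.

Assumed semantics (not confirmed roto behaviour): tactile values above
binary_threshold map to 1.0, else 0.0; the stack keeps the last obs_stack
frames per key; reset() fills the stack with copies of the first frame.
"""
from collections import deque
from typing import Deque, Dict, List

Obs = Dict[str, List[float]]


def select_and_binarise(raw: Obs, obs_list: List[str], binary_tactile: bool, threshold: float) -> Obs:
    """Keep only the keys in obs_list and optionally binarise tactile readings."""
    obs = {key: list(raw[key]) for key in obs_list}
    if binary_tactile and "tactile" in obs:
        obs["tactile"] = [1.0 if f > threshold else 0.0 for f in obs["tactile"]]
    return obs


class ObsStacker:
    """Keeps the most recent obs_stack frames and concatenates them per key."""

    def __init__(self, obs_stack: int) -> None:
        self.size = obs_stack
        self.frames: Deque[Obs] = deque(maxlen=obs_stack)

    def reset(self, first: Obs) -> Obs:
        self.frames.clear()
        for _ in range(self.size):
            self.frames.append(first)
        return self.stacked()

    def push(self, obs: Obs) -> Obs:
        self.frames.append(obs)  # the oldest frame drops out automatically
        return self.stacked()

    def stacked(self) -> Obs:
        # Concatenate each key across frames, oldest first.
        return {key: [v for frame in self.frames for v in frame[key]] for key in self.frames[-1]}
```

For example, `obs_list=["prop", "tactile"]` corresponds to the blind setting used in the gallery (proprioception + binary contacts only).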
Here is an example rendering of raw RGB, normalised RGB, and depth for the Shadow Baoding agent.
We highly welcome community contributions and PRs! The most promising research directions we think are:
- Beyond sparse binary contacts to richer forms of tactile information
- ML methodologies with inductive biases for tactile data
- Expanding tasks. Our results indicate that while blind policies can approach privileged performance in simple tasks like bouncing, complex or multi-object manipulation (Baoding) remains an open challenge
If you use this benchmark environment in your academic or professional research, please cite the following work:
```bibtex
@inproceedings{miller2025tactilerl,
  author    = {Miller, Elle and McInroe, Trevor and Abel, David and Mac Aodha, Oisin and Vijayakumar, Sethu},
  title     = {Enhancing Tactile-based Reinforcement Learning for Robotic Control},
  booktitle = {NeurIPS},
  year      = {2025},
}
```
roto 2.0 (ICRA ViTac Workshop paper): original contributors with the addition of:
- Jayaram Reddy — National University of Singapore
- Ayush Deshmukh — University of Edinburgh
roto 1.0 (NeurIPS 2025 paper): see citation above.
For any questions, issues, or collaborations, please feel free to post an issue/start a discussion/reach out.
- Maintainer: Elle Miller
- Project Website: https://elle-miller.github.io/tactile_rl
This project is licensed under the BSD-3 License.
Full gallery: project page.
In the below videos, "Vision" means the agent had access to object states, proprioception, and binary contacts. "Blind" means the agent only had proprioception and binary contacts.
Shadow Hand
| Vision | Blind |
|---|---|
| shadow_baoding_vision.mp4 | shadow_baoding_blind.mp4 |
Allegro Hand
| Vision | Blind |
|---|---|
| allegro_baoding_vision.mp4 | allegro_baoding_blind.mp4 |
ORCA Hand
| Vision | Blind |
|---|---|
| orca_baoding_vision.mp4 | orca_baoding_blind.mp4 |
Shadow Dexterous Hand Lite
| Vision | Blind |
|---|---|
| lite_baoding_vision.mp4 | lite_baoding_blind.mp4 |
Shadow Hand
| Vision | Blind |
|---|---|
| shadow_bounce_vision.mp4 | shadow_bounce_blind.mp4 |
Allegro Hand
| Vision | Blind |
|---|---|
| allegro_bounce_vision.mp4 | allegro_bounce_blind.mp4 |
ORCA Hand
| Vision | Blind |
|---|---|
| orca_bounce_vision.mp4 | orca_bounce_blind.mp4 |
Shadow Dexterous Hand Lite
| Vision | Blind |
|---|---|
| lite_bounce_vision.mp4 | lite_bounce_blind.mp4 |