HIL-SERL: Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
Webpage: https://hil-serl.github.io/
HIL-SERL is a system for training state-of-the-art manipulation policies using reinforcement learning.
We first tele-operate the robot to collect positive and negative samples and train a binary reward classifier. We then collect a small set of demonstrations, which is added to the demo buffer at the start of RL training. During online training, we use the binary classifier as a sparse reward signal and provide human interventions. Initially, we intervene frequently to demonstrate how to solve the task from various states and to keep the robot from performing undesirable behaviors. We gradually reduce the number of interventions as the policy reaches higher success rates and faster cycle times.
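As a rough illustration of this data flow, the sketch below routes every online transition into the RL replay buffer and additionally keeps transitions produced under human intervention in the demo buffer. The buffer objects and their `insert()` method are illustrative assumptions for this sketch, not the exact HIL-SERL code.

```python
# Minimal sketch (illustrative names only) of how human-in-the-loop data can be routed:
# every transition feeds the RL replay buffer, while transitions produced under human
# intervention are also kept in the demo buffer.

def route_transition(transition: dict, intervened: bool, replay_buffer, demo_buffer) -> None:
    """Store one environment transition, preserving human-intervention data separately."""
    replay_buffer.insert(transition)
    if intervened:
        demo_buffer.insert(transition)
```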
Our project extends the hil-serl framework, which is designed for real-world robotic tasks using the Franka robot. The original implementation requires physical hardware and relies on neural networks to evaluate task success, making it challenging for developers to test and adapt the code for their specific needs.
By adapting the SERL codebase, we created a lightweight simulation environment that allows developers to test and debug their implementations without needing physical hardware. Building on this, users can control the Franka robot via the keyboard (rather than a SpaceMouse) to collect small-scale pre-training data. Through iterative human-in-the-loop online reinforcement learning, the system achieves a 100% success rate on vision-based robotic grasping.
This significantly lowers the barrier to entry, accelerates development, and makes the hil-serl framework more accessible to researchers and developers.
After 30,000 steps of training with human intervention (about 1 hour), our policy achieves a 100% success rate in the pick_cube_sim environment. Here are our training curve and final policy outcome.
- Setup Conda Environment: create an environment with

    ```bash
    conda create -n hilserl python=3.10
    conda activate hilserl
    ```
- Install PyTorch

    ```bash
    pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121 -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```
- Install Jax as follows:

    For CPU (not recommended):
    ```bash
    pip install --upgrade "jax[cpu]"
    ```
    For GPU:
    ```bash
    pip install --upgrade "jax[cuda12_pip]==0.4.35" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
    ```
    For TPU:
    ```bash
    pip install --upgrade "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
    ```
    See the Jax Github page for more details on installing Jax.
- Install the serl_launcher

    ```bash
    cd serl_launcher
    pip install -e .
    pip install -r requirements.txt
    cd ..
    ```
- Install the franka_sim

    ```bash
    cd franka_sim
    pip install -e .
    ```

- Install the requirements

    ```bash
    pip install -r requirements.txt
    ```
This section explains how to use HIL-SERL to train the Franka arm in the simulation environment (pick_cube_sim). The goal of this task is to grasp a cube that appears at a random position in the workspace and lift it by 0.1 m along the z-axis.
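The snippet below is a minimal sketch of this success condition, assuming the cube pose and its spawn position are available as 3-D positions; the names are illustrative, not necessarily the franka_sim observation keys.

```python
import numpy as np

# Hedged sketch of the success condition described above: the episode counts as a success
# once the cube has been lifted 0.1 m along z from where it spawned. `cube_pos` and
# `init_cube_pos` are illustrative names, not the exact franka_sim API.
def is_lift_success(cube_pos: np.ndarray, init_cube_pos: np.ndarray, lift: float = 0.1) -> bool:
    return bool(cube_pos[2] - init_cube_pos[2] >= lift)

# Example: a cube raised from z = 0.02 m to z = 0.13 m counts as a success.
assert is_lift_success(np.array([0.5, 0.0, 0.13]), np.array([0.5, 0.0, 0.02]))
```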
- Change directory to the target folder

    ```bash
    cd examples/experiments/pick_cube_sim/
    ```
- Try controlling the robot in the simulation environment. This simple demo lets you familiarize yourself with our simulation environment, e.g. how a human intervenes and controls the robotic arm to complete the task:

    ```bash
    python ../../../env_test.py
    ```

    In this setup we are looking forward from the base of the Franka robot, and the control keys are defined as follows:

    | Key | Action |
    |---|---|
    | W | Move forward |
    | S | Move backward |
    | A | Move left |
    | D | Move right |
    | J | Move up |
    | K | Move down |
    | L | Open/close gripper |
    | ; | Enable/disable human intervention mode |
- Collect demo data. `--successes_needed 20` means we need to collect 20 successful human teleoperation trajectories; the collected trajectories will be saved under `examples/experiments/pick_cube_sim/demo_data`. (A hedged sketch of how these demos can be preloaded into the demo buffer is shown after this walkthrough.)

    ```bash
    python ../../record_demos.py --exp_name pick_cube_sim --successes_needed 20
    ```
- Start human-in-the-loop RL training. Before running the following two bash files, you need to check and update `run_actor.sh` and `run_learner.sh`. For example, `exp_name` must be one of the folder names in `experiments`, and `demo_path` needs to be the absolute path of your `.pkl` file (not a relative path).

    ```bash
    bash run_actor.sh --checkpoint_path first_run
    bash run_learner.sh
    ```
- Evaluate the final policy. You can modify the parameters according to your needs.

    ```bash
    bash run_actor.sh --eval_checkpoint_step=30000 --eval_n_trajs=100 --checkpoint_path=first_run
    ```
You can also download and use our pre-trained models to quickly test the performance and compare the results. To do so, move the downloaded first_run folder to the directory examples/experiments/pick_cube_sim/first_run, and then use the following commands to evaluate the results:

```bash
cd examples/experiments/pick_cube_sim/
bash run_actor.sh --eval_checkpoint_step=30000 --eval_n_trajs=100 --checkpoint_path=first_run
```
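As mentioned in the demo-collection step above, here is a hedged sketch of how the recorded `.pkl` demonstrations could be preloaded into the demo buffer before online training begins. The file layout (a list of transition dicts) and the buffer's `insert()` method are assumptions for illustration, not the exact `record_demos.py` or learner interface.

```python
import pickle

# Hedged sketch: seed the demo buffer with previously recorded teleoperation trajectories.
# Assumes the .pkl file stores a list of transition dicts; adjust to your actual format.
def preload_demos(demo_path: str, demo_buffer) -> int:
    with open(demo_path, "rb") as f:
        transitions = pickle.load(f)
    for transition in transitions:
        demo_buffer.insert(transition)
    return len(transitions)
```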
HIL-SERL provides a set of common libraries for users to train RL policies for robotic manipulation tasks. The main structure of running the RL experiments involves having an actor node and a learner node, both of which interact with the robot gym environment. Both nodes run asynchronously, with data being sent from the actor to the learner node via the network using agentlace. The learner will periodically synchronize the policy with the actor. This design provides flexibility for parallel training and inference.
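To make this asynchronous pattern concrete, below is a simplified, self-contained sketch that uses Python threads and an in-process queue in place of agentlace's networked transport; all names are illustrative, not the actual agentlace API.

```python
import queue
import threading
import time

# Simplified, single-process sketch of the asynchronous actor/learner split described above.
# In HIL-SERL the two nodes are separate processes exchanging data over the network via
# agentlace; here a thread-safe queue stands in for that transport.

data_queue = queue.Queue()          # transitions flow from the actor to the learner
latest_policy = {"version": 0}      # stands in for the policy parameters
stop = threading.Event()

def actor() -> None:
    """Roll out the current policy and stream transitions to the learner."""
    step = 0
    while not stop.is_set():
        transition = {"step": step, "policy_version": latest_policy["version"]}
        data_queue.put(transition)
        step += 1
        time.sleep(0.01)            # stands in for stepping the gym environment

def learner() -> None:
    """Consume transitions, take update steps, and periodically publish the policy."""
    updates = 0
    while not stop.is_set():
        try:
            data_queue.get(timeout=0.1)   # would be sampling a batch and a gradient step
        except queue.Empty:
            continue
        updates += 1
        if updates % 50 == 0:
            latest_policy["version"] = updates   # learner -> actor policy sync

threads = [threading.Thread(target=actor), threading.Thread(target=learner)]
for t in threads:
    t.start()
time.sleep(1.0)                     # let both nodes run briefly
stop.set()
for t in threads:
    t.join()
print("policy version after a short run:", latest_policy["version"])
```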
The table below summarizes the code structure:
| Code Directory | Description |
|---|---|
| examples | Scripts for policy training, demonstration data collection, reward classifier training |
| serl_launcher | Main code for HIL-SERL |
| serl_launcher.agents | Agent Policies (e.g. SAC, BC) |
| serl_launcher.wrappers | Gym env wrappers |
| serl_launcher.data | Replay buffer and data store |
| serl_launcher.vision | Vision related models and utils |
| serl_robot_infra | Robot infra for running with real robots |
| serl_robot_infra.robot_servers | Flask server for sending commands to robot via ROS |
| serl_robot_infra.franka_env | Gym env for Franka robot |
| examples.experiments.pick_cube_sim | Scripts and configuration file for simulation training. |
| franka_sim | Main code to build up simulation environment. |
If you use this code for your research, please cite the following two papers:
hil-serl:
```bibtex
@misc{luo2024hilserl,
    title={Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning},
    author={Jianlan Luo and Charles Xu and Jeffrey Wu and Sergey Levine},
    year={2024},
    eprint={2410.21845},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}
```
and serl:
```bibtex
@misc{luo2024serl,
    title={SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning},
    author={Jianlan Luo and Zheyuan Hu and Charles Xu and You Liang Tan and Jacob Berg and Archit Sharma and Stefan Schaal and Chelsea Finn and Abhishek Gupta and Sergey Levine},
    year={2024},
    eprint={2401.16013},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}
```