This repository contains the source code and results for my thesis. The goal was to implement modern Deep RL algorithms (PPO, TD3, SAC) from scratch and compare them against established libraries like Stable-Baselines3 and CleanRL.
The main focus is on continuous control tasks in MuJoCo and DeepMind Control Suite.
Each algorithm is self-contained in its own directory, and the code is meant to be clean, easy to read, and well documented. The following algorithms are implemented:
- Proximal Policy Optimization (PPO)
- Soft Actor-Critic (SAC)
- Twin Delayed DDPG (TD3)
All algorithms were tested on:
- Gymnasium MuJoCo: Swimmer, Hopper, HalfCheetah, Walker2D
- DeepMind Control Suite: Finger Spin, Reacher Hard, Cartpole Swingup
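For reference, these environments can be created with the standard Gymnasium and dm_control APIs (a minimal sketch; the exact versions and wrappers used in the thesis may differ):

```python
import gymnasium as gym
from dm_control import suite

# Gymnasium MuJoCo task, e.g. Hopper
env = gym.make("Hopper-v5")
obs, info = env.reset(seed=0)

# DeepMind Control Suite task, e.g. Finger Spin
dmc_env = suite.load(domain_name="finger", task_name="spin")
timestep = dmc_env.reset()
```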
In the thesis, the models were evaluated against Stable-Baselines3 and CleanRL; here, only the results of my implementations are shown. All models were trained for 1M steps on 3 seeds, and evaluation used the best model across 50 episodes. Results are reported as mean reward ± standard deviation (a minimal sketch of the evaluation protocol follows the tables).
Gymnasium MuJoCo:

| Environment | PPO | TD3 | SAC |
|---|---|---|---|
| Swimmer-v5 | 344 ± 3.3 | 37 ± 7.6 | 31 ± 9.9 |
| Hopper-v5 | 2625 ± 65.1 | 2021 ± 691.1 | 1669 ± 366.0 |
| HalfCheetah-v5 | 3105 ± 494.7 | 9802 ± 708.4 | 7250 ± 89.0 |
| Walker2d-v5 | 3930 ± 1585.3 | 5030 ± 1711.6 | 3660 ± 581.1 |
DeepMind Control Suite:

| Environment | PPO | TD3 | SAC |
|---|---|---|---|
| Finger Spin | 738 ± 51.1 | 907 ± 10.1 | 988 ± 6.4 |
| Reacher Hard | 552 ± 469.8 | 905 ± 231.5 | 973 ± 17.9 |
| Cartpole Swingup | 341 ± 78.0 | 480 ± 0.3 | 475 ± 0.4 |
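The evaluation protocol can be sketched roughly as follows (illustrative only, not the exact script from the repository; `policy` is assumed to be any callable mapping an observation to an action):

```python
import numpy as np
import gymnasium as gym

def evaluate(policy, env_id="Hopper-v5", episodes=50, seed=0):
    """Roll out `episodes` evaluation episodes and return mean and std of the returns."""
    env = gym.make(env_id)
    returns = []
    for ep in range(episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, ep_return = False, 0.0
        while not done:
            action = policy(obs)  # hypothetical: e.g. the trained actor's deterministic action
            obs, reward, terminated, truncated, _ = env.step(action)
            ep_return += float(reward)
            done = terminated or truncated
        returns.append(ep_return)
    return float(np.mean(returns)), float(np.std(returns))
```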
Models, videos and configs can be found on Hugging Face.
The project is written in Python 3.11. It relies on PyTorch (tested with CUDA 12.8), Gymnasium and MuJoCo.
```bash
pip install -r requirements.txt
```

If you have a different CUDA version, you might need to install PyTorch manually.
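To quickly check that the installed PyTorch build sees your GPU, you can run this optional sanity check:

```python
import torch

# Print the installed PyTorch version and whether a CUDA device is visible.
print(torch.__version__)
print(torch.cuda.is_available())
```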
To train an agent, run the main script with a config file:
```bash
python -m src.main --config <config-path>
```

I included a simple terminal-based tool to visualize trained agents and benchmark their performance:
```bash
python -m src.playground
```

To train and benchmark a Stable-Baselines3 (SB3) agent, run:
```bash
python -m src.benchmark --env <env-name> --alg <TD3/SAC/PPO>
```
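This is roughly equivalent to the standard SB3 training and evaluation loop sketched below (illustrative only; the actual `src.benchmark` script may differ in its details):

```python
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

# Train an SB3 baseline on one of the benchmarked environments.
env = gym.make("HalfCheetah-v5")
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)

# Evaluate with the same protocol as the tables above (mean ± std over 50 episodes).
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=50)
print(f"{mean_reward:.1f} ± {std_reward:.1f}")
```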
To share trained models, use the included utility script. It automatically records a replay video, generates a model card with metadata, and uploads the model and config to the Hub. You need to be logged in to the Hugging Face CLI to use it, and you can pass a regex to select which configs to upload.

```bash
python -m src.utils.upload_to_hf --username <your-username> --collection <collection-name> --select <config-regex>
```

I used black and pylint to keep the code consistent. Sometimes it looks weird, but it is what it is.
```bash
black src
pylint src
```