slimRL provides a concise and customizable implementation of the Deep Q-Network (DQN) and Fitted Q Iteration (FQI) algorithms in Reinforcement Learning ⛳ for the Lunar Lander and Car-On-Hill environments.
It enables you to quickly code and run proof-of-concept experiments in off-policy Deep RL settings.
✅ Easy to read - clears the clutter with minimal lines of code 🧹
✅ Easy to experiment - flexible to play with algorithms and environments 📊
✅ Fast to run - JAX acceleration, support for GPU and multiprocessing ⚡
Let's dive in!
CPU installation:
```bash
python3 -m venv env_cpu
source env_cpu/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e .[dev]
```

GPU installation:
```bash
python3 -m venv env_gpu
source env_gpu/bin/activate
pip install --upgrade pip setuptools wheel
pip install -e .[dev,gpu]
```

To verify the installation, run the tests:
```bash
pytest
```
slimRL supports Car-On-Hill with FQI and Lunar Lander with DQN. However, you can easily extend it to other Gym environments, such as Acrobot, Cart Pole, or Mountain Car, by replicating the Lunar Lander setup, as sketched below.
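For illustration, a minimal Acrobot adapter might look like the following. The interface (constructor, `reset`, `step`) is an assumption made for this example, not slimRL's actual environment API:

```python
# Hypothetical sketch of an Acrobot setup mirroring the Lunar Lander one.
# The wrapper interface here is an assumption, not slimRL's actual API.
import gymnasium as gym
import numpy as np

class AcrobotEnv:
    """Thin adapter exposing the minimal interface an off-policy agent needs."""

    def __init__(self, seed: int = 0):
        self.env = gym.make("Acrobot-v1")
        self.n_actions = self.env.action_space.n
        self.seed = seed

    def reset(self) -> np.ndarray:
        state, _ = self.env.reset(seed=self.seed)
        return np.asarray(state)

    def step(self, action: int):
        state, reward, terminated, truncated, _ = self.env.step(action)
        # Treat truncation as an episode end for replay-buffer purposes.
        return np.asarray(state), reward, terminated or truncated
```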
To train a DQN agent on Lunar Lander on your local system, run (provide the `--gpu` flag if you want to use the GPU):
```bash
launch_job/lunar_lander/launch_local_dqn.sh --experiment_name {experiment_name} --first_seed 0 --last_seed 0 --features 100 100 --learning_rate 3e-4 --n_epochs 100
```
This trains a DQN agent with two hidden layers of size 100, for a single random seed, over 100 epochs.
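Conceptually, each gradient step of such a run regresses the online Q-network towards bootstrapped targets. The sketch below illustrates the standard DQN loss in JAX; `q_net` and the batch layout are assumptions for this example, not slimRL's actual code:

```python
# Minimal sketch of the standard DQN regression loss -- an illustration of
# the algorithm, not slimRL's implementation. `q_net(params, states)` is a
# hypothetical apply function returning Q-values of shape (batch, n_actions).
import jax
import jax.numpy as jnp

def dqn_loss(params, target_params, q_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    # Bootstrapped targets from the frozen target network.
    next_q = q_net(target_params, next_states).max(axis=-1)
    targets = rewards + gamma * (1.0 - dones) * next_q
    # Q-values of the actions that were actually taken.
    q = q_net(params, states)
    chosen_q = jnp.take_along_axis(q, actions[:, None], axis=-1).squeeze(-1)
    # Mean-squared TD error; gradients flow only through the online network.
    return jnp.mean((jax.lax.stop_gradient(targets) - chosen_q) ** 2)
```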
- You can tune the other parameters based on your requirements. Run `lunar_lander_dqn --help` in the terminal to see all available parameters.
- To monitor training progress, check the logs in the `experiments/lunar_lander/logs/{experiment_name}/dqn` folder.
- The models and results are stored in the `experiments/lunar_lander/exp_output/{experiment_name}/dqn` folder.
To train on a cluster:
```bash
launch_job/lunar_lander/launch_cluster_dqn.sh --experiment_name {experiment_name} --first_seed 0 --last_seed 0 --features 100 100 --learning_rate 3e-4 --n_epochs 100
```
Once the training is done, you can generate the Performance Curve (for multiple experiments) by running:
```bash
plot_iqm --experiment_folders "{experiment_name_1}/dqn" "{experiment_name_2}/dqn" --env "lunar_lander"
```
It generates an IQM-based Performance Curve, similar to the one shown above.
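For reference, the interquartile mean (IQM) of Agarwal et al. (2021) is the mean over the middle 50% of scores, which is more robust to outlier seeds than a plain mean. The sketch below illustrates the metric itself, not plot_iqm's implementation:

```python
# Illustrative computation of the interquartile mean (IQM) -- the metric
# behind such curves, not the plot_iqm implementation.
import numpy as np

def iqm(scores: np.ndarray) -> float:
    scores = np.sort(scores.ravel())
    n = len(scores)
    # Discard the bottom and top quartiles, then average what remains.
    return float(scores[n // 4 : n - n // 4].mean())
```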
Generate the metrics required for plotting the Performance Loss, Approximation Error, etc. by running:
```bash
car_on_hill_fqi_eval --experiment_folder "{experiment_name}/fqi" --performance --approximation_error_components
```
Once complete, open the `experiments/car_on_hill/plots.ipynb` Jupyter notebook, set the appropriate experiment name and parameters, and run all cells to generate the plots.
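As a point of reference, these quantities are usually defined as follows in the FQI analysis literature (the notebook's exact norms and weightings may differ):

$$
\underbrace{\lVert Q^* - Q^{\pi_K} \rVert}_{\text{performance loss}}
\qquad \text{and} \qquad
\underbrace{\epsilon_k = \mathcal{T}^* Q_{k-1} - Q_k}_{\text{approximation error at iteration } k},
$$

where $\mathcal{T}^*$ is the optimal Bellman operator and $\pi_K$ is the greedy policy with respect to the final iterate $Q_K$.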
This project is licensed under the MIT License. See the LICENSE file for details.