This repository is a simple and readable implementation of the REDQ algorithm proposed in the ICLR 2021 paper *Randomized Ensembled Double Q-Learning: Learning Fast Without a Model*.
Run the following command to train:

```bash
python src/main.py --env_index 1 --algorithm sac --device cuda:1
```

Each training instance creates a folder in `output/` to store experiment data.
`src/main.py` is the entry point for all training tasks. It includes code for:
- parsing command line arguments
- initializing the Gym MuJoCo task
- creating the experiment folder
- the main training loop (sketched below)
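For orientation, here is a minimal sketch of that loop, assuming the Gymnasium API; `RandomAgent`, `buffer`, and the constants are illustrative stand-ins, not the repository's actual classes or defaults:

```python
import gymnasium as gym
import numpy as np

# Illustrative stand-in for the actor-critic agent in this repository.
class RandomAgent:
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, obs):
        return self.action_space.sample()

    def update(self, batch):
        pass  # gradient steps on the actor and critics would go here

env = gym.make("Hopper-v4")
agent = RandomAgent(env.action_space)
buffer, batch_size, train_num = [], 32, 20  # train_num is the UTD ratio G

obs, _ = env.reset()
for step in range(1000):
    action = agent.act(obs)
    next_obs, reward, terminated, truncated, _ = env.step(action)
    buffer.append((obs, action, reward, next_obs, terminated))
    obs = env.reset()[0] if terminated or truncated else next_obs

    # G gradient updates per environment interaction
    if len(buffer) >= batch_size:
        for _ in range(train_num):
            idx = np.random.randint(len(buffer), size=batch_size)
            agent.update([buffer[i] for i in idx])
```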
`src/redq.py` and `src/sac.py` contain the classes that implement the REDQ and SAC algorithms, respectively. `src/utils.py` contains the replay buffer, actor, and critic classes. `src/plot-train.ipynb` plots a comparison of the learning curves of two experiment instances.
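As a reference point, a replay buffer of this kind is typically a fixed-capacity circular array of transitions. A minimal sketch follows; field and method names are assumptions, not the exact API of `src/utils.py`:

```python
import numpy as np

# Minimal circular replay buffer sketch; the actual implementation
# in src/utils.py may differ in layout and method names.
class ReplayBuffer:
    def __init__(self, capacity, obs_dim, act_dim):
        self.capacity = int(capacity)
        self.obs = np.zeros((self.capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((self.capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(self.capacity, dtype=np.float32)
        self.next_obs = np.zeros((self.capacity, obs_dim), dtype=np.float32)
        self.done = np.zeros(self.capacity, dtype=np.float32)
        self.ptr, self.size = 0, 0

    def push(self, o, a, r, o2, d):
        # Overwrite the oldest transition once the buffer is full
        self.obs[self.ptr], self.act[self.ptr] = o, a
        self.rew[self.ptr], self.next_obs[self.ptr], self.done[self.ptr] = r, o2, d
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(self.size, size=batch_size)
        return (self.obs[idx], self.act[idx], self.rew[idx],
                self.next_obs[idx], self.done[idx])
```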
Command line arguments:

- algorithm: `redq` or `sac`.
  - `--algorithm sac`
- task: the MuJoCo task the agent will be trained on.
  - `--env_index 1` (1 for Hopper, 2 for Walker, 3 for Ant, 4 for Humanoid)
- $N$: ensemble size, i.e. the number of Q functions. Default: 20.
  - `--critic_num 20`
- $M$: number of Q functions selected to compute the MSBE target term. Default: 2.
  - `--critic_num_select 2`
- $G$: UTD (update-to-data) ratio, i.e. how many gradient updates are performed after each interaction with the environment. Default: 20.
  - `--train_num 20`
- $\rho$: target smoothing coefficient. Default: 0.005.
  - `--critic_target_update_ratio 0.005`
- replay buffer size. Default: 1e6.
  - `--replay_buffer_size 1e6`
- maximum number of training steps. Default: 5e6.
  - `--step_train_max 5e6`
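To make the roles of $N$, $M$, and $\rho$ concrete, the following sketch shows how REDQ forms its critic target: sample $M$ of the $N$ target critics uniformly at random, take the minimum of their predictions, and update target networks by Polyak averaging with coefficient $\rho$. Function and variable names are illustrative, and SAC's entropy term is omitted for brevity:

```python
import torch

def redq_target(q_targets, next_obs, next_act, reward, done,
                gamma=0.99, M=2):
    """Sketch of REDQ's in-target minimization over a random subset.

    q_targets is a list of the N target critic networks; SAC's entropy
    bonus is omitted here for brevity.
    """
    idx = torch.randperm(len(q_targets))[:M]          # choose M of the N critics
    qs = torch.stack([q_targets[i](next_obs, next_act) for i in idx])
    min_q = qs.min(dim=0).values                      # pessimistic estimate
    return reward + gamma * (1.0 - done) * min_q      # standard TD target

def soft_update(target_net, net, rho=0.005):
    # Polyak averaging: theta_target <- rho * theta + (1 - rho) * theta_target
    with torch.no_grad():
        for p_t, p in zip(target_net.parameters(), net.parameters()):
            p_t.mul_(1.0 - rho).add_(rho * p)
```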
All experiments are conducted with the default hyperparameter values, which match those used in the REDQ paper.