rl_ppo

This repository contains a custom implementation of a PPO agent, an Atari wrapper, training scripts, experiment results, and trained models.

The PPO agent is able to:

collect data via multiprocessing
compute target values with GAE (Generalized Advantage Estimation)
predict values by head support
explore the state space using a Random Network Distillation module
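
As an illustration of the GAE step listed above, here is a minimal sketch in plain Python. It is not the code from this repository: the function name `compute_gae`, its arguments, and the default `gamma` and `lam` values are assumptions chosen for the example.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, value_targets) for one rollout.

    rewards, values, dones are equal-length sequences for steps 0..T-1.
    last_value is the value estimate of the state after the final step.
    gamma and lam are illustrative defaults, not values from this repository.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the rollout backwards so each step can reuse the next step's estimate.
    for t in reversed(range(len(rewards))):
        # Do not bootstrap across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Targets for the value head are advantage plus the old value estimate.
    value_targets = [a + v for a, v in zip(advantages, values)]
    return advantages, value_targets
```

The value targets returned here are what the critic would be regressed towards, while the advantages weight the policy update.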

I tuned the PPO agent on Atari games, specifically Pong, Breakout, MsPacman and Montezuma's Revenge. My students and colleagues have also used the agent in other domains, such as StarCraft and Doom mini-games, the card game Dominion and the p-median optimization problem. However, most of these experiments have not been made public yet. :)

Results of PPO:

Pong

One iteration consists of 4096 steps
Batch size was set to 512
4 epochs per iteration

Breakout

One iteration consists of 4096 steps
Batch size was set to 512
4 epochs per iteration

MsPacman

One iteration consists of 4096 steps
Batch size was set to 512
4 epochs per iteration

Montezuma's Revenge

One iteration consists of 16384 steps
Batch size was set to 512
4 epochs per iteration
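
To make the settings above concrete, the following sketch counts gradient updates per iteration. It assumes that every epoch makes one full pass over the collected steps, split into minibatches of the stated batch size; the README does not confirm this, and the helper `updates_per_iteration` is hypothetical.

```python
def updates_per_iteration(steps, batch_size, epochs):
    """Count gradient updates in one iteration.

    Assumes each epoch makes one full pass over all collected steps,
    split into minibatches of batch_size (an assumption, see above).
    """
    minibatches, remainder = divmod(steps, batch_size)
    if remainder:
        raise ValueError("steps must be divisible by batch_size")
    return minibatches * epochs


# Pong, Breakout and MsPacman settings: 4096 steps, batch size 512, 4 epochs.
print(updates_per_iteration(4096, 512, 4))
# Montezuma's Revenge settings: 16384 steps, batch size 512, 4 epochs.
print(updates_per_iteration(16384, 512, 4))
```

Under that assumption, the first three games get 32 updates per iteration and Montezuma's Revenge gets 128.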

A few observations from my research:

In the Atari domain, orthogonal initialization of the model works better than Xavier initialization.
The Adam optimizer is a good option... almost every time.
In the Atari domain, a learning rate of 0.00025 is a good choice in most environments.
For GPU-based environments, PPO without multiprocessing is the better option.
I tried adding a reward head to the model as an auxiliary prediction to support learning. However, this modification approximated worse than the baseline. Don't repeat my mistake. :)
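
For readers unfamiliar with orthogonal initialization, here is a minimal NumPy sketch of the usual QR-based construction. It is an illustration only, not the code from this repository: the function `orthogonal_init`, its `gain` parameter and the example shapes are assumptions.

```python
import numpy as np


def orthogonal_init(shape, gain=1.0, rng=None):
    """Return a (rows, cols) matrix with orthonormal rows or columns.

    Hypothetical helper: the name, the gain parameter and its default
    are assumptions made for this example.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = shape
    # Start from a tall Gaussian matrix so that QR yields orthonormal columns.
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    # Fix the column signs so the result does not depend on the QR sign convention.
    q = q * np.sign(np.diag(r))
    if rows < cols:
        q = q.T
    return gain * q


# Example: weights for a layer with 512 inputs and 6 outputs (shapes are illustrative).
w = orthogonal_init((6, 512), gain=np.sqrt(2.0), rng=np.random.default_rng(0))
print(w.shape)
```

With a gain of 1 the rows (or columns, for tall matrices) are orthonormal, so the layer preserves the norm of its input at initialization. Deep learning frameworks usually ship an equivalent initializer.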

Current research offers many improvements and agents that reach SOTA results, and I have tried a lot of them. Although they can reach a better score, don't forget that training will take many times longer. For example, I implemented an efficient MuZero algorithm, but training in the Pong environment takes about one day (PPO reaches the maximum score in one hour).

