ReMax RL

This is the official implementation of the paper Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying.

We argue that exploration matters because we are $\color{#C44E52}{\text{uncertain}}$ about the return and are allowed to $\color{#4678C8}{\textbf{retry}}$.

If there were no $\color{#C44E52}{\text{uncertainty}}$, the problem would reduce to pure optimization.
If there were no chance to $\color{#4678C8}{\textbf{retry}}$, the only rational action would be the current best one.
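To make these two conditions concrete, here is a minimal Monte Carlo sketch in plain Python. It is not code from this repository. It estimates the expected best return over M independent retries for two hypothetical arms: `safe` always returns 0.6, and `risky` returns 1 or 0 with probability 0.5 each.

```python
import random


def best_of_m(sample_return, m, n_trials=100_000, seed=0):
    """Monte Carlo estimate of E[max(R_1, ..., R_m)] for i.i.d. returns R_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Retry m times and keep only the best return.
        total += max(sample_return(rng) for _ in range(m))
    return total / n_trials


def safe(rng):
    # Hypothetical arm with no uncertainty: the return is always 0.6.
    return 0.6


def risky(rng):
    # Hypothetical uncertain arm: Bernoulli(0.5) return.
    return 1.0 if rng.random() < 0.5 else 0.0


if __name__ == "__main__":
    for m in (1, 2):
        print(m, best_of_m(safe, m), best_of_m(risky, m))
```

With M = 1 the safe arm wins (about 0.6 versus 0.5). With M = 2 the risky arm's best-of-2 return is about 0.75, which is 1 - 0.5² in closed form. The safe arm stays at 0.6 for every M because its return carries no uncertainty.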

We turn this intuition into an RL objective, ReMax, in which we assume a $\color{#C44E52}{\text{distribution over the return}}$ and measure the $\color{#4678C8}{\textbf{best of M retries}}$.
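As a toy illustration only (not the paper's experiments or the ReMax implementation), the best-of-M quantity can be evaluated exactly for a stochastic policy over two hypothetical arms. On every retry the policy picks `risky` (return 1 or 0, each with probability 0.5) with probability x and `safe` (return always 0.6) otherwise. For discrete returns with CDF F, the expected maximum uses P(max = v) = F(v)^M - F(v_prev)^M, where v_prev is the next smaller support value. The helper names and the grid search are our own assumptions.

```python
def expected_max(dist, m):
    """Exact E[max of m i.i.d. draws] for a discrete distribution {value: prob}."""
    total, cdf_prev = 0.0, 0.0
    for value in sorted(dist):
        cdf = cdf_prev + dist[value]
        # P(max == value) = F(value)^m - F(previous value)^m
        total += value * (cdf ** m - cdf_prev ** m)
        cdf_prev = cdf
    return total


def mixture(policy, arms):
    """Per-retry return distribution when each retry draws an arm from `policy`."""
    out = {}
    for p_arm, arm in zip(policy, arms):
        for value, prob in arm.items():
            out[value] = out.get(value, 0.0) + p_arm * prob
    return out


SAFE = {0.6: 1.0}               # hypothetical deterministic arm
RISKY = {0.0: 0.5, 1.0: 0.5}    # hypothetical Bernoulli(0.5) arm


def best_risky_prob(m, steps=100):
    """Grid-search the probability x of picking RISKY that maximizes best-of-m."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: expected_max(mixture([1 - x, x], [SAFE, RISKY]), m))


if __name__ == "__main__":
    for m in (1, 2):
        x = best_risky_prob(m)
        print(m, x, expected_max(mixture([1 - x, x], [SAFE, RISKY]), m))
```

For M = 1 the objective is 0.6 - 0.1x, so the search returns x = 0.0 (always pick the safe arm). For M = 2 it is 0.6 + 0.4x - 0.25x², maximized at x = 0.8 with value 0.76. In this toy, a stochastic, exploratory policy becomes optimal once retries are counted.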

Setup

Make sure a GPU-compatible build of JAX is installed in your environment, then install the dependencies:

uv sync
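A quick way to confirm which backend JAX picked up after installation is `jax.default_backend()`, which reports the platform of the default device (for example `"cpu"` or `"gpu"`). The small helper below is our own sketch, not part of this repository. It returns `None` when JAX is not installed.

```python
import importlib.util


def jax_backend():
    """Return JAX's default backend name, or None if JAX is not installed."""
    if importlib.util.find_spec("jax") is None:
        return None
    import jax
    return jax.default_backend()


if __name__ == "__main__":
    backend = jax_backend()
    if backend is None:
        print("JAX is not installed in this environment.")
    elif backend != "gpu":
        print(f"JAX is using '{backend}'; a GPU-enabled jaxlib may be missing.")
    else:
        print("JAX is using the GPU backend.")
```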

For Atari, we recommend building the Docker image from agents/atari/Dockerfile to ensure compatibility with envpool.

Reproduce the results in the paper

Bandit Experiments

In bandit/, we implement the bandit experiments from the paper:

python plot_binary_bandit.py  # Figure 1 (left): binary bandit
python plot_scaled_bernoulli_bandit.py  # Figure 1 (center): Bernoulli bandit
python plot_fixed_binary_bandit.py  # Figure 1 (right): fixed binary bandit

python plot_bandit_with_posterior.py --family beta  # Figure 2 (left): Beta-Bernoulli regret
python plot_bandit_with_posterior.py --family gaussian  # Figure 2 (right): Gaussian-Gaussian regret

RL Experiments

In agents/, we implement the algorithms used in the paper.

minatar/: MinAtar experiments, using the pgx implementation.
atari/: Atari experiments (based on purejaxql).
craftax/: Craftax experiments.

From the sh/ directory, run:

./run_minatar.sh  # for MinAtar
./run_atari.sh  # for Atari
./run_craftax.sh  # for Craftax
