AMPED (Adaptive Multi-objective Projection for balancing Exploration and skill Diversification) is a skill-based reinforcement learning algorithm designed to explicitly balance exploration and skill diversity. AMPED integrates entropy- and RND-based exploration with contrastive skill separation, and resolves conflicting learning signals using gradient surgery.
For more information, please see our project webpage.
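To make the gradient-surgery idea concrete, below is a minimal PCGrad-style sketch of projecting away conflicting gradient components. It is an illustration only, not the exact AMPED implementation; the tensor shapes and variable names are placeholders.

```python
# Illustrative PCGrad-style gradient surgery (not the exact AMPED code):
# if two objective gradients conflict (negative inner product), remove the
# conflicting component of one before combining the update directions.
import torch


def project_conflicting(grad_a: torch.Tensor, grad_b: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Return grad_a with the component that conflicts with grad_b projected out."""
    dot = torch.dot(grad_a.flatten(), grad_b.flatten())
    if dot < 0:  # gradients point in opposing directions
        grad_a = grad_a - (dot / (grad_b.norm() ** 2 + eps)) * grad_b
    return grad_a


# Example: combine an exploration gradient with a skill-diversity gradient.
g_explore, g_skill = torch.randn(8), torch.randn(8)
g_total = project_conflicting(g_explore, g_skill) + g_skill
```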
AMPED Demonstrations
Before getting started, make sure the following requirements are met:
- Conda (for environment management)
- (Optional) A GPU with CUDA 11.1 and cuDNN 8, if you want to run experiments on GPU
To install dependencies, create the Conda environment:

```
conda env create -f conda_env.yml
```

After the installation finishes, activate the environment with:

```
conda activate amped
```

The main implementation of AMPED can be found in `agent/amped.py`.
Implementations of baseline agents such as APT, BeCL, CeSD, CIC, ComSD, DIAYN, and RND are also available in the `agent/` directory. The SAC-based skill selector is implemented in `skill_selector/sac.py`.
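As a rough illustration of the selector's role (the actual interfaces live in `skill_selector/sac.py` and `agent/amped.py`; the dimensions and module names below are made up for the example), a discrete selector scores the current observation, samples a skill index, and the skill-conditioned policy acts on the observation concatenated with the one-hot skill:

```python
# Toy sketch of a discrete skill selector driving a skill-conditioned policy.
# All sizes and module names are illustrative, not taken from the repo.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, NUM_SKILLS, ACT_DIM = 24, 16, 6  # placeholder sizes

selector = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                         nn.Linear(128, NUM_SKILLS))            # skill logits
actor = nn.Sequential(nn.Linear(OBS_DIM + NUM_SKILLS, 128), nn.ReLU(),
                      nn.Linear(128, ACT_DIM))                  # skill-conditioned policy

obs = torch.randn(1, OBS_DIM)
skill_idx = torch.distributions.Categorical(logits=selector(obs)).sample()  # pick a skill
skill = F.one_hot(skill_idx, NUM_SKILLS).float()                            # encode as one-hot
action = actor(torch.cat([obs, skill], dim=-1))                             # act with that skill
```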
AMPED supports training agents across various domains and tasks through a two-stage process:

- Pre-training: Learn diverse and meaningful skills using unsupervised objectives.
  - Run via `pretrain.py`.
  - Pre-training produces several agent snapshots after 100k, 500k, 1M, and 2M frames; snapshots are stored in `./models/states/<domain>/<agent>/<seed>/` (e.g., `./models/states/walker/amped/3/`).
- Fine-tuning: Adapt the pretrained policy to a downstream task using its extrinsic reward.
  - Run via `finetune.py` or `finetunev2.py`.
  - Use `finetunev2.py` to enable the skill selector, which dynamically chooses the best skill to execute at each time step.
  - During fine-tuning, the pretrained agent is initialized from a saved snapshot and continues learning in a reward-driven setting (a minimal loading sketch is shown after this list).
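For reference, here is a minimal sketch of how a saved snapshot could be restored before fine-tuning. The file name `snapshot_2000000.pt` and the `"agent"` key are assumptions for illustration; check the actual contents of your snapshot directory.

```python
# Minimal sketch of restoring a pre-trained snapshot (file name and payload
# keys are assumptions for illustration, not guaranteed by the repo).
from pathlib import Path
import torch

snapshot_dir = Path("./models/states/walker/amped/3")
snapshot_path = snapshot_dir / "snapshot_2000000.pt"  # hypothetical file name

with snapshot_path.open("rb") as f:
    payload = torch.load(f, map_location="cpu")

# URLB-style snapshots often store the agent (plus bookkeeping counters) in a dict.
agent = payload.get("agent", payload)
```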
```
# Pre-train AMPED on walker domain
python pretrain.py agent=amped domain=walker seed=3

# Finetune AMPED on walker_stand task
python finetune.py task=walker_stand obs_type=states agent=amped reward_free=false seed=3 domain=walker snapshot_ts=2000000

# Pre-train APT on jaco domain
python pretrain.py agent=apt domain=jaco seed=100

# Finetune APT on jaco_reach_top_left task
python finetune.py task=jaco_reach_top_left obs_type=states agent=apt reward_free=false seed=100 domain=jaco snapshot_ts=2000000

# Finetune AMPED on walker_stand task with skill selector
python finetunev2.py task=walker_stand obs_type=states agent=amped reward_free=false seed=3 domain=walker snapshot_ts=2000000
```

AMPED supports the following domains and associated tasks:
| Domain | Tasks |
|---|---|
| walker | stand, walk, run, flip |
| quadruped | walk, run, stand, jump |
| jaco | reach_top_left, reach_top_right, reach_bottom_left, reach_bottom_right |
We support the following baseline agents:
| Baseline |
|---|
| APT: Behavior From the Void: Unsupervised Active Pre-Training (NeurIPS 2021) |
| BeCL: Behavior Contrastive Learning for Unsupervised Skill Discovery (ICML 2023) |
| CeSD: Constrained Ensemble Exploration for Unsupervised Skill Discovery (ICML 2024) |
| CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery |
| ComSD: Balancing Behavioral Quality and Diversity in Unsupervised Skill Discovery (IEEE T-Cybernetics 2025) |
| DIAYN: Diversity is All You Need: Learning Skills without a Reward Function |
| RND: Exploration by Random Network Distillation |
Training logs are saved in the `exp_local` directory. To launch TensorBoard, run:

```
tensorboard --logdir exp_local
```

For logging with Weights & Biases (wandb):

- Set `use_wandb: true` and provide your WandB API key via the `wandb_key` field in the `config.yaml` file.
- Alternatively, you can enable logging by passing `--use_wandb true` and `--wandb_key <your_wandb_key>` as command-line arguments.
Console output is also provided in this format:

```
| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42
```

where:

- F: total number of environment frames
- S: total number of agent steps
- E: total number of episodes
- L: episode length
- R: episode return
- FPS: training throughput (frames per second)
- T: total training time
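If you need these console metrics programmatically (e.g., for quick plotting), a line in this format can be split into key/value pairs with a small regex; this is just a convenience sketch, not part of the codebase.

```python
# Parse a console log line of the format shown above into a dict of strings.
import re

line = "| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42"
fields = dict(re.findall(r"(\w+): ([\d:.]+)", line))
print(fields)  # {'F': '6000', 'S': '3000', ..., 'T': '0:00:42'}
```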
- We adopt an agent-specific code structure to avoid unnecessary complexity. Instead of sharing a unified codebase across all agents, we maintain a separate set of files for each agent to improve clarity and modularity.
- Please note that this codebase may not exactly reproduce the results reported in the paper due to potential human errors during code migration. If you observe any discrepancies in performance, feel free to reach out; we would appreciate your feedback.
This codebase is built on top of the Unsupervised Reinforcement Learning Benchmark (URLB) codebase.
The implementation of CeSD is adapted from the CeSD repository, BeCL from the BeCL repository, CIC from the CIC repository, and ComSD from the ComSD repository.
This project is licensed under the MIT License -- see the LICENSE file for details. Note that the repository relies on third-party libraries subject to their respective licenses.
```bibtex
@article{AMPED,
  title={AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification},
  author={Cho, Geonwoo and Lee, Jaemoon and Im, Jaegyun and Lee, Subi and Lee, Jihwan and Kim, Sundong},
  journal={arXiv preprint arXiv:2506.05980},
  year={2025}
}
```