Official implementation of
When Should We Prefer State-to-Visual DAgger Over Visual Reinforcement Learning? by
Tongzhou Mu*, Zhaoyang Li*, Stanisław Wiktor Strzelecki*, Xiu Yuan, Yunchao Yao, Litian Liang, Hao Su (UC San Diego)
Visual RL is a promising approach that directly trains policies from visual observations, although it faces challenges in sample efficiency and computational costs. We conduct an empirical comparison of State-to-Visual DAgger — a two-stage framework that initially trains a state policy before adopting online imitation to learn a visual policy — and Visual RL across a diverse set of tasks.
We evaluate both methods across 16 tasks from three benchmarks, focusing on their asymptotic performance, sample efficiency, and computational costs. Surprisingly, our findings reveal that State-to-Visual DAgger does not universally outperform Visual RL but shows significant advantages in challenging tasks, offering more consistent performance. In contrast, its benefits in sample efficiency are less pronounced, although it often reduces the overall wall-clock time required for training.
Install all dependencies via mamba or conda by running the following commands:

```bash
mamba env create -f environment.yml
mamba activate s2v
```

Note: mamba is a drop-in replacement for conda. Feel free to use conda if you prefer it.
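To verify the installation, the snippet below is a minimal sanity check. It assumes environment.yml installs gymnasium and gymnasium-robotics (the package that registers the Adroit tasks on import); adjust it to whatever the environment actually ships.

```python
import gymnasium as gym
import gymnasium_robotics  # noqa: F401  (side effect: registers the Adroit envs)

# Build one of the Adroit tasks used below and print its spaces.
env = gym.make("AdroitHandHammer-v1")
obs, info = env.reset(seed=0)
print(env.observation_space, env.action_space)
env.close()
```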
State-to-Visual DAgger has two stages:
- Stage 1: Learning a state policy by RL.
- Stage 2: Learning a visual policy by DAgger (sketched below).
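To make Stage 2 concrete, here is a minimal sketch of the state-to-visual DAgger loop. It is not the repo's actual training code: the `env` with dict observations exposing "state" and "image" keys, the `expert` and `student` callables, and the MSE imitation loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def s2v_dagger(env, expert, student, optimizer, num_iters=100, steps_per_iter=1000):
    """Sketch of DAgger from a state expert to a visual student.

    Roll out the *student* visual policy, relabel every visited observation
    with the frozen *expert* state policy, then regress the student's actions
    onto the expert labels over the aggregated dataset.
    """
    images, labels = [], []  # aggregated dataset; grows across iterations
    obs, _ = env.reset()
    for _ in range(num_iters):
        # 1) Collect on-policy data under the current student.
        for _ in range(steps_per_iter):
            with torch.no_grad():
                action = student(obs["image"])  # student picks the action
                label = expert(obs["state"])    # expert provides the target
            images.append(obs["image"])
            labels.append(label)
            obs, _, terminated, truncated, _ = env.step(action.cpu().numpy())
            if terminated or truncated:
                obs, _ = env.reset()
        # 2) Supervised update on all data gathered so far (behavior cloning
        #    on the aggregate, the defining step of DAgger).
        loss = F.mse_loss(student(torch.stack(images)), torch.stack(labels))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student
```

Because the expert is queried on privileged low-dimensional states while the student only ever sees images, the student inherits the expert's behavior without running RL from pixels.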
The following Stage 1 commands should be run from the repo root directory.
```bash
python scripts/state_sac_adroit.py --env-id AdroitHandHammer-v1
python scripts/state_sac_adroit.py --env-id AdroitHandPen-v1
python scripts/state_sac_adroit.py --env-id AdroitHandRelocate-v1 --total-timesteps 80_000_000 --reward-scale 1
```

Note:
- If you want to use Weights and Biases (wandb) to track learning progress, please add `--track` to your commands, as shown below.
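For example, the same Stage 1 command as above with tracking enabled:

```bash
python scripts/state_sac_adroit.py --env-id AdroitHandHammer-v1 --track
```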
The following Stage 2 commands should be run from the repo root directory.
```bash
python scripts/s2v_dagger_adroit.py --env-id AdroitHandHammer-v1 --expert-ckpt STATE_POLICY_CHECKPOINT_FROM_STAGE_1
python scripts/s2v_dagger_adroit.py --env-id AdroitHandPen-v1 --expert-ckpt STATE_POLICY_CHECKPOINT_FROM_STAGE_1
python scripts/s2v_dagger_adroit.py --env-id AdroitHandRelocate-v1 --expert-ckpt STATE_POLICY_CHECKPOINT_FROM_STAGE_1 --total-timesteps 40_000_000
```
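Here, STATE_POLICY_CHECKPOINT_FROM_STAGE_1 is a placeholder for the path to the state policy checkpoint saved in Stage 1. For example (the path below is hypothetical; substitute whatever Stage 1 actually wrote):

```bash
# The checkpoint path here is hypothetical; point --expert-ckpt at your Stage 1 output.
python scripts/s2v_dagger_adroit.py --env-id AdroitHandHammer-v1 --expert-ckpt runs/AdroitHandHammer-v1/state_sac/checkpoint.pt
```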
If you find our work useful, please consider citing our paper as follows:
```bibtex
@inproceedings{mu2025when,
  title={When Should We Prefer State-to-Visual DAgger Over Visual Reinforcement Learning?},
  author={Mu, Tongzhou and Li, Zhaoyang and Strzelecki, Stanisław and Yuan, Xiu and Yao, Yunchao and Liang, Litian and Su, Hao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}
```
This codebase is built upon the CleanRL repository.
This project is licensed under the MIT License; see the LICENSE file for details. Note that the repository relies on third-party components, which are subject to their respective licenses.