This repository contains the reference implementation of the DSUP and DAU+DSUP agents from the paper

[Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning](https://openreview.net/forum?id=BRW0MKJ7Rr)

by Harley Wiltzer\*, Marc G. Bellemare, David Meger, Patrick Shafto, and Yash Jhaveri\*.
This project uses PDM for dependency management. See https://pdm-project.org/latest/#installation for installation instructions.
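For reference, one common way to install PDM is via pipx; the linked documentation lists alternatives such as the official install script.

```sh
# One common way to install PDM (see the PDM docs for alternatives)
pipx install pdm
```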
Once PDM has been installed, execute the following from the project root to sync the dependencies:
```sh
pdm venv create
pdm install
```

Before running any code, be sure to activate the virtual environment (from the project root):
```sh
source .venv/bin/activate
```

Some environments simulate dynamics from datasets. The `download_data.sh` script downloads these datasets. Make the script executable:

```sh
chmod +x download_data.sh
```

Then run the script to download the datasets:

```sh
./download_data.sh
```

This script will create a `data` directory in the project root with the requisite datasets.
The easiest way to run the training scripts is with our `justfile`, using the `just` command runner.
To train agents for risk-neutral option trading, execute

```sh
just writer=[aim | comet] agent=[dsup | qrdqn | dau] option_idx=<int> time_mul=<int> train_options
```

Here, `option_idx` specifies the commodity for the environment, and `time_mul` is the decision frequency multiplier: `time_mul=1` corresponds to the base decision frequency, and `time_mul=n` corresponds to `n` times the base frequency.
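For example, an invocation of the following form trains a DSUP agent at twice the base decision frequency, logging with Aim; the `option_idx` value here is for illustration only.

```sh
# Illustrative example: the option_idx value is a placeholder
just writer=aim agent=dsup option_idx=0 time_mul=2 train_options
```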
To train the DAU+DSUP(1/2) variant, replace `train_options` with `train_options_dsup_shifted`.
To train agents for risk-sensitive option trading with CVaR, execute

```sh
just writer=[aim | comet] agent=[dsup | qrdqn | dau] option_idx=<int> time_mul=<int> risk_param=<float> train_options_risky
```

Here, `risk_param` refers to the CVaR level for the experiment.
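As a concrete (illustrative) example, a command of the following form trains a QR-DQN agent at the base decision frequency with a CVaR level of 0.1, logging with Comet; the `option_idx` and `risk_param` values are placeholders, not recommended settings.

```sh
# Illustrative example: option_idx and risk_param values are placeholders
just writer=comet agent=qrdqn option_idx=0 time_mul=1 risk_param=0.1 train_options_risky
```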
If you build on our work or find it useful, please cite it using the following BibTeX:
```bibtex
@inproceedings{wiltzer2024action,
  title={Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning},
  author={Harley Wiltzer and Marc G. Bellemare and David Meger and Patrick Shafto and Yash Jhaveri},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=BRW0MKJ7Rr}
}
```