This repository contains the official Python implementation of the paper "Deeply-Debiased Off-Policy Interval Estimation" (ICML 2021).
Off-policy evaluation estimates a target policy's value using a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments.
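To make "point estimate plus CI" concrete, here is a minimal sketch of the textbook trajectory-wise importance sampling (IS) estimator with a normal-approximation (Wald) CI. It is a generic baseline for illustration only: it is not the proposed method in `_TRIPLE.py`, and it is not necessarily what `_IS.py` implements. The function name `is_value_ci`, the array layout, and the toy policies in the usage part are all hypothetical.

```python
import math

import numpy as np


def is_value_ci(rewards, target_probs, behavior_probs, gamma=0.99, z=1.96):
    """Trajectory-wise importance sampling (IS) estimate with a Wald-type CI.

    Hypothetical helper. All inputs have shape (n_trajectories, horizon):
      rewards[i, t]        reward at step t of trajectory i
      target_probs[i, t]   pi(A_t | S_t), target-policy probability of the logged action
      behavior_probs[i, t] b(A_t | S_t), behavior-policy probability of the same action
    Returns (estimate, (lower, upper)); z=1.96 gives a nominal 95% interval.
    """
    rewards = np.asarray(rewards, dtype=float)
    ratios = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    weights = np.prod(ratios, axis=1)                # product of ratios over each trajectory
    discounts = gamma ** np.arange(rewards.shape[1])
    returns = rewards @ discounts                    # discounted return of each trajectory
    per_traj = weights * returns                     # i.i.d. terms whose mean is the IS estimate
    estimate = per_traj.mean()
    se = per_traj.std(ddof=1) / math.sqrt(len(per_traj))
    return estimate, (estimate - z * se, estimate + z * se)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, horizon = 500, 10
    behavior = np.full((n, horizon), 0.5)            # hypothetical: b picks each of 2 actions w.p. 0.5
    actions = rng.integers(0, 2, size=(n, horizon))
    target = np.where(actions == 1, 0.7, 0.3)        # hypothetical target policy: pi(1|s) = 0.7
    rewards = actions.astype(float)                  # hypothetical reward: 1 if action 1 was taken
    print(is_value_ci(rewards, target, behavior, gamma=0.9))
```

Because the IS weight is a product of per-step ratios, its variance can grow rapidly with the horizon, so intervals from this kind of baseline may become very wide on long trajectories.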
- Code files in the main folder:
    - Methods:
        - `_TRIPLE.py`: main function for the proposed method
        - `_IS.py`: code to implement the two IS-based competing methods
    - Environment:
        - `_Ohio_Simulator.py`: simulates the *Diabetes* environment
        - `_cartpole.py`: simulates the *Cartpole* environment; forked from OpenAI Gym, with slight modifications
    - `_util.py`: helper functions
    - `_analyze.py`: post-processes simulation results
- Folders:
    - `/density`: functions for estimating the two density ratio functions
    - `/coinDice`: code for the competing method "coinDice", forked from https://github.com/google-research/dice_rl
    - `/target_policies`: checkpoints for the learned target policies
    - `/RL`: some useful RL functions
        - `DQN.py` and `FQI.py`: implementations of the target/behavior policies
        - `FQE.py`: function for estimating the initial Q function
        - `my_gym.py`: helper functions for training
        - `sampler.py`: samplers and replay buffers
    - `/TOY`: code to generate the two plots for the toy examples
        - `TOY_coverage.ipynb`: the plot showing the CI coverage
        - `TOY_TRIPLY.ipynb`: the plot showing the triply robust property
        - `_plot.py`: helper functions for plotting
        - `_discrete.py`: TR method for a discrete state space
    - `/script`: scripts to run the experiments
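`FQE.py` estimates the initial Q function, presumably via fitted Q evaluation (FQE). As a rough illustration of the FQE idea only, here is a tabular sketch for small discrete problems; it is not the code in `FQE.py`, and the names `tabular_fqe` and `initial_state_value` and the `(s, a, r, s_next)` data layout are hypothetical. Each iteration regresses the Bellman target r + γ Σ_a' π(a'|s') Q(s', a') onto (s, a); with a lookup table, that least-squares fit reduces to averaging the targets in each cell.

```python
import numpy as np


def tabular_fqe(transitions, target_policy, n_states, n_actions, gamma=0.9, n_iters=200):
    """Fitted Q evaluation with a lookup-table "regressor" (hypothetical helper).

    transitions: list of (s, a, r, s_next) tuples with integer states and actions,
        collected under some behavior policy.
    target_policy: array of shape (n_states, n_actions); row s holds pi(. | s).
    Returns an (n_states, n_actions) estimate of Q^pi. State-action pairs that
    never appear in the data keep the value 0.
    """
    target_policy = np.asarray(target_policy, dtype=float)
    s = np.array([t[0] for t in transitions], dtype=int)
    a = np.array([t[1] for t in transitions], dtype=int)
    r = np.array([t[2] for t in transitions], dtype=float)
    s_next = np.array([t[3] for t in transitions], dtype=int)

    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Bellman target under the target policy: r + gamma * sum_a' pi(a'|s') Q(s', a').
        v = (target_policy * q).sum(axis=1)
        y = r + gamma * v[s_next]
        # Regression step: with a tabular model, least squares reduces to
        # averaging the targets that fall in each (s, a) cell.
        sums = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions))
        np.add.at(sums, (s, a), y)
        np.add.at(counts, (s, a), 1.0)
        q = sums / np.maximum(counts, 1.0)
    return q


def initial_state_value(q, target_policy, s0):
    """Plug-in value of pi at initial state s0: sum_a pi(a|s0) Q(s0, a)."""
    return float(np.dot(np.asarray(target_policy, dtype=float)[s0], q[s0]))


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action dataset and target policy.
    transitions = [(0, 0, 1.0, 1), (0, 1, 0.0, 0), (1, 0, 0.0, 1), (1, 1, 0.5, 0)]
    pi = np.array([[0.5, 0.5], [1.0, 0.0]])
    q_hat = tabular_fqe(transitions, pi, n_states=2, n_actions=2)
    print(initial_state_value(q_hat, pi, s0=0))
```

A fixed iteration count stands in for a convergence check here; for γ < 1 the iteration contracts, so a few hundred steps are usually ample for a toy problem like this.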
To reproduce our simulation results, follow these steps:
- install the required packages
- change the working directory to the main folder
- open the Jupyter notebook and modify the hyperparameters as needed
- run the notebook and analyze the output results