This repository aims to improve the sample efficiency of pixel-based, model-free RL algorithms by learning a high-level latent representation of the input observations. It combines a reconstruction loss and a consistency loss with contrastive learning to learn efficient representations. All experiments are carried out on DeepMind Control Suite environments.
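As background for the contrastive part of the objective, here is a minimal NumPy sketch (not this repository's implementation, which follows CURL) of an InfoNCE-style contrastive loss: each anchor embedding should score highest against its own augmented positive, with the other batch entries acting as negatives.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE over a batch of (batch, dim) L2-normalized embeddings.

    Positive pairs sit on the diagonal of the similarity matrix; every
    other entry in the same row is treated as a negative.
    """
    logits = anchors @ positives.T / temperature           # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Loss = negative log-likelihood of the diagonal (matched) pairs.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)
# Positives = slightly perturbed copies, mimicking an augmented view.
pos = z + 0.01 * rng.normal(size=z.shape)
pos /= np.linalg.norm(pos, axis=1, keepdims=True)
loss_aligned = info_nce_loss(z, pos)
loss_shuffled = info_nce_loss(z, np.roll(pos, 1, axis=0))
```

Aligned positives yield a much lower loss than mismatched ones, which is what drives the encoder toward augmentation-invariant features.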
All the dependencies are listed in `setup/env.yml`. We assume you have access to a GPU with CUDA >= 9.2 support. The dependencies can be installed with the following commands:

```
conda env create -f setup/env.yml
conda activate crc
sh setup/install_envs.sh
```

To run the code, set the required hyperparameters and environment name in `config.py` and execute:

```
python train.py
```

Note: the DeepMind Control Suite requires MuJoCo as a prerequisite. Please refer to this link for the MuJoCo installation instructions.
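To illustrate what "set the hyperparameters and environment name" means, here is a hypothetical sketch of the kind of settings `config.py` holds. The field names below are assumptions for illustration only; the actual names in this repository may differ.

```python
# Hypothetical config sketch -- field names are illustrative, not the repo's API.
config = {
    "domain_name": "cartpole",   # DeepMind Control Suite domain
    "task_name": "swingup",      # task within the domain
    "image_size": 84,            # rendered pixel-observation resolution
    "frame_stack": 3,            # consecutive frames stacked as one input
    "batch_size": 128,
    "encoder_feature_dim": 50,   # size of the learned latent representation
    "critic_lr": 1e-3,
    "actor_lr": 1e-3,
}
```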
Weights & Biases is used for logging the training and evaluation plots. To enable wandb logging, set the `WB_LOG` flag to `True` in `train.py` and log in to your wandb account.
- We use the CURL architecture as our baseline; all our contributions/modifications are built on top of it.
- Our SAC implementation is based on SAC+AE by Denis Yarats.
- To test the generalization capability of our proposed method, we use the color hard and video easy environments of dm-control-generalization-benchmark.