This repository contains the implementation of the paper *Mitigating over-exploration in latent space optimization using LES*, by Omer Ronen, Ahmed Imtiaz Humayun, Richard Baraniuk, Randall Balestriero, and Bin Yu.
## Citation
If you use LES or any of the resources in this repo in your work, please use the following citation:
```bibtex
@misc{ronen2025mitigatingoverexplorationlatentspace,
  title={Mitigating over-exploration in latent space optimization using LES},
  author={Omer Ronen and Ahmed Imtiaz Humayun and Richard Baraniuk and Randall Balestriero and Bin Yu},
  year={2025},
  eprint={2406.09657},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2406.09657},
}
```

## Table of contents

- [Citation](#citation)
- [Installation](#installation)
- [Data](#data)
- [Pre-trained models](#pre-trained-models)
- [Replicating the results](#replicating-the-results)
- [Calculating LES](#calculating-les)
- [License](#license)
## Installation

Using Anaconda, first clone the repository:

```bash
git clone https://github.com/OmerRonen/les.git
```

Then install the dependencies using:
```bash
conda env create --file environment.yml
conda activate les
```

To use the log-expected improvement acquisition function, you will need to clone and install the BoTorch repository manually:
```bash
git clone https://github.com/pytorch/botorch.git
cd botorch
pip install -e .
```

## Data

This repository uses the expressions and SMILES datasets, both of which can be downloaded from the repository of the [Grammar Variational Autoencoder](https://github.com/mkusner/grammarVAE) paper. Specifically, the `eq2_grammar_dataset.h5` and `250k_rndm_zinc_drugs_clean.smi` files should be downloaded into the `data/grammar` and `data/molecules` directories, respectively.
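As a minimal sketch, assuming the two files were downloaded to your current directory and that the `data` folder lives at the repository root, they can be put in place with:

```bash
# Create the expected data directories (assumed to be relative to the repository root)
mkdir -p data/grammar data/molecules

# Move the downloaded files into place; adjust the source paths to wherever you saved them
mv eq2_grammar_dataset.h5 data/grammar/
mv 250k_rndm_zinc_drugs_clean.smi data/molecules/
```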
## Pre-trained models

All the models used in our work can be found in the `trained_models` directory. The following code loads a pre-trained VAE for the expressions dataset:
```python
from les.nets.utils import get_vae

dataset = "expressions"
architecture = "gru"
beta = "1"
vae, _ = get_vae(dataset=dataset, architecture=architecture, beta=beta)
```

## Replicating the results

For the molecular datasets (SELFIES and SMILES), we recommend using a GPU to avoid long running times.
The results in Table 1 can be replicated using:
```bash
python -m les.analysis.ood <DATASET> <ARCHITECTURE> <BETA>
```

where `<DATASET>` should be replaced with `expressions`, `smiles`, or `selfies`; `<ARCHITECTURE>` with `gru`, `lstm`, or `transformer`; and `<BETA>` with `0.05`, `0.1`, or `1`.
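For example, to run the analysis for the expressions dataset with the GRU architecture and beta of 1 (the same configuration loaded above):

```bash
# Example invocation using one of the dataset/architecture/beta combinations listed above
python -m les.analysis.ood expressions gru 1
```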
The Bayesian Optimization results in Section 4 can be replicated with (see `les/configs/bayes_opt.yaml` for the run configuration):

```bash
python -m les.analysis.bo
```

## Calculating LES

If you are interested in calculating LES with a given pre-trained generative model, you can use the following code:
```python
import torch

from les.nets.utils import get_vae
from les.utils.les import LES

# Load a pre-trained VAE (expressions dataset, GRU architecture, beta = 1)
dataset = "expressions"
architecture = "gru"
beta = "1"
vae, _ = get_vae(dataset=dataset, architecture=architecture, beta=beta)

# Compute the LES score for a batch of latent points
les = LES(vae)
z = torch.randn((5, vae.latent_dim))
les_score = les(z)
```

## License

The code is released under the MIT license; see the `LICENSE` file for details.