This repository contains a generalized variant of BonDNet. It extends the original algorithm with generalized handling of atom mappings, bonding, and descriptors. This variant still works with a minimal set of features. It can use RDKit for standard (vanilla) bond definitions, but it can also take custom bond definitions to handle metals, noncovalent bonds, and so on. It also handles an arbitrary number of species on each side of the reaction.
BonDNet is a graph neural network model for predicting bond dissociation energies (BDEs). It can be applied to both homolytic and heterolytic bond dissociations for molecules of any charge. The model is described in the paper "BonDNet: a graph neural network for the prediction of bond dissociation energies for charged molecules", Chemical Science, 2021.
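For reference, the standard definition of a BDE for a parent $\mathrm{AB}$ of charge $q$ splitting into fragments $\mathrm{A}$ and $\mathrm{B}$ is sketched below, with generic energies $E$ (this section does not state whether electronic or free energies are used). The fragment charges must sum to the parent charge: a homolytic cleavage of a neutral molecule gives $q_A = q_B = 0$, while a heterolytic one gives, for example, $q_A = +1$ and $q_B = -1$.

$$
\mathrm{BDE}\big(\mathrm{AB}^{q} \rightarrow \mathrm{A}^{q_A} + \mathrm{B}^{q_B}\big) = E\big(\mathrm{A}^{q_A}\big) + E\big(\mathrm{B}^{q_B}\big) - E\big(\mathrm{AB}^{q}\big), \qquad q_A + q_B = q
$$

This is the one-reactant, two-product case of a product-minus-reactant energy difference, which the generalized model extends to any number of species on each side.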
Currently, we support installation from source:
- Create a conda environment from the provided YAML file and install the pip dependencies:

  ```bash
  cd ./enviro
  conda env create --file=environment.yml
  pip install -r requirements.txt
  ```

  Activate the new environment (its name is set in `environment.yml`) before running `pip`.

- Install this repo:

  ```bash
  git clone https://github.com/santi921/bondnet/
  cd bondnet
  pip install -e ./
  ```
Data for the generalized model should be provided as a JSON/BSON file with the following columns:
- `bonds_broken`: a list of lists giving the atom pairs of bonds that exist in the reactants but not in the products.
- `bonds_formed`: a list of lists giving the atom pairs of bonds that exist in the products but not in the reactants.
- `reactant_bonds`/`product_bonds`: the full list of lists of all bonds in the reactants/products. This is a single list of lists for each side of the reaction.
- `reactant_bonds_no_metal`/`product_bonds_no_metal`: the full list of lists of all bonds in the reactants/products, EXCLUDING bonds that involve metals. The featurizer uses this separate list; if your reactions contain no metals, simply copy `reactant_bonds`/`product_bonds` here. This is also a single list of lists for each side of the reaction.
- Pymatgen objects under the keys `combined_reactants_graph`/`combined_products_graph` or `reactants_molecular_graph`/`products_molecular_graph`.
- `reactant_id`/`product_id`: strings, ints, or lists, depending on the number of reactants/products on each side.
- `charge`: the charge of the reaction.
- A target variable, which you specify in the `settings.json` file; we use `dG_sp` or `product_energy - reactant_energy`. You can also specify a transfer-learning variable in the settings file to train on before training on the actual target.
- Optionally, extra atom and bond features to use in training, given under `extra_feat_atom_reactant`/`extra_feat_atom_product` (atom features) and `extra_feat_bond_reactant`/`extra_feat_bond_product` (bond features). Order the `extra_feat_atom` features by the same atom indices used in the bond lists, and order the `extra_feat_bond` features in the same order in which the bonds are specified.
- Optionally, extra fields carried through training to track descriptors such as functional groups, for example `functional_group_reacted`. These are not used by the model.
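As an illustration, here is a minimal sketch of one reaction record using the key names listed above, assuming the file holds a list of per-reaction JSON objects. The toy reaction, the placeholder feature values, the one-row-per-atom and one-row-per-bond feature layout, and the `check_record` helper are all hypothetical, not taken from this repo. The pymatgen graph keys are left out because building them requires pymatgen.

```python
import json

# Toy reaction H2O -> OH + H (O-H bond cleavage), total charge 0.
# Atom indices are shared by both sides: 0 = O, 1 = H, 2 = H.
# All numeric feature values below are placeholders, not computed data.
N_ATOMS = 3

reactant_bonds = [[0, 1], [0, 2]]
product_bonds = [[0, 1]]

record = {
    "reactant_id": "h2o",          # one reactant -> a single id
    "product_id": ["oh", "h"],     # two products -> a list of ids
    "charge": 0,
    "bonds_broken": [[0, 2]],      # in the reactants, absent from the products
    "bonds_formed": [],            # no new bonds in this reaction
    "reactant_bonds": reactant_bonds,
    "product_bonds": product_bonds,
    # No metals present, so the *_no_metal lists are plain copies.
    "reactant_bonds_no_metal": [list(b) for b in reactant_bonds],
    "product_bonds_no_metal": [list(b) for b in product_bonds],
    # Target placeholder (serialized as null); fill in your computed value.
    "dG_sp": None,
    # Assumed layout: one feature row per atom (by atom index) and
    # one feature row per bond (in bond-list order).
    "extra_feat_atom_reactant": [[-0.8], [0.4], [0.4]],
    "extra_feat_atom_product": [[-0.5], [0.5], [0.0]],
    "extra_feat_bond_reactant": [[1.0], [1.0]],
    "extra_feat_bond_product": [[1.0]],
    # Bookkeeping field carried through training; not a model input.
    "functional_group_reacted": "hydroxyl",
    # The pymatgen graph keys (e.g. combined_reactants_graph) are omitted here.
}


def check_record(rec, n_atoms):
    """Raise ValueError if the ordering/shape rules above are violated."""
    for side in ("reactant", "product"):
        bonds = rec[f"{side}_bonds"]
        # Every bond must reference a valid (shared) atom index.
        if any(not (0 <= i < n_atoms) for bond in bonds for i in bond):
            raise ValueError(f"{side}_bonds uses an atom index out of range")
        # Bonds without metals can only be a subset of all bonds.
        no_metal = {tuple(b) for b in rec[f"{side}_bonds_no_metal"]}
        if not no_metal <= {tuple(b) for b in bonds}:
            raise ValueError(f"{side}_bonds_no_metal must be a subset of {side}_bonds")
        if len(rec[f"extra_feat_atom_{side}"]) != n_atoms:
            raise ValueError(f"extra_feat_atom_{side} needs one row per atom")
        if len(rec[f"extra_feat_bond_{side}"]) != len(bonds):
            raise ValueError(f"extra_feat_bond_{side} needs one row per bond")
    return True


check_record(record, N_ATOMS)
text = json.dumps([record], indent=2)  # a list of per-reaction records
```

To save it, pass the same list to `json.dump` with an open file handle. Checking shapes before training catches the most common ordering mistakes early, although it cannot detect features that have the right length but the wrong order.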
Note on atom mapping: this variant of BonDNet uses no explicit atom mapping. Instead, we assume the user has numbered the atoms so that, for example, a bond [0, 10] in the products and a bond [0, 10] in the reactants are between the same two atoms. The user must ensure that the atom mapping is consistent between the bonds on each side of the reaction AND that the pymatgen objects and extra features follow the same ordering.
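Because `bonds_broken` and `bonds_formed` are supplied by the user, one way to derive them from consistently mapped bond lists is a set difference over undirected bonds. This is a sketch, not code from this repo: `bond_changes` is a hypothetical helper, and sorting each pair assumes bond direction carries no meaning. The second call shows why consistent numbering matters: relabeling two atoms on one side makes the diff report different bonds, which would no longer line up with the pymatgen objects and extra features.

```python
def _as_set(bonds):
    """Treat [i, j] and [j, i] as the same undirected bond."""
    return {tuple(sorted(b)) for b in bonds}


def bond_changes(reactant_bonds, product_bonds):
    """Return (bonds_broken, bonds_formed) for atom-mapped bond lists.

    Only meaningful if index i refers to the same atom on both sides.
    """
    r, p = _as_set(reactant_bonds), _as_set(product_bonds)
    broken = [list(b) for b in sorted(r - p)]  # in reactants, not products
    formed = [list(b) for b in sorted(p - r)]  # in products, not reactants
    return broken, formed


# Consistent mapping: bond [0, 2] breaks and bond [2, 3] forms.
print(bond_changes([[0, 1], [0, 2]], [[0, 1], [3, 2]]))  # ([[0, 2]], [[2, 3]])

# Same products with atoms 1 and 2 swapped in their numbering:
# the diff now reports a different broken and formed bond.
print(bond_changes([[0, 1], [0, 2]], [[0, 2], [3, 1]]))  # ([[0, 1]], [[1, 3]])
```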