Out-of-the-Box Deep Learning Prediction of Pharmaceutical Properties by Broadly Learned Knowledge-Based Molecular Representations

The reproduction repository is available on Code Ocean: https://codeocean.com/capsule/2307823/tree

For applications to omics data, see AggMap: https://github.com/shenwanxiang/bidd-aggmap

MolMap

MolMap is generated by the following steps:

Step1: Data sampling
Step2: Feature extraction
Step3: Feature pairwise distance calculation --> cosine, correlation, jaccard
Step4: Feature 2D embedding --> umap, tsne, mds
Step5: Feature grid arrangement --> grid, scatter
Step6: Transform --> minmax, standard
Step7: Get MolMap
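
The distance, embedding, grid arrangement and transform steps listed above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the molmap implementation: classical MDS stands in for the embedding step, the sort-based placement is a naive stand-in for the grid arrangement, and the function names and the random feature table are all hypothetical.

```python
import numpy as np

def cosine_distance_matrix(features):
    """Pairwise cosine distance between feature columns (hypothetical helper)."""
    cols = features.T.astype(float)                      # one row per feature
    norms = np.linalg.norm(cols, axis=1, keepdims=True)
    unit = cols / np.clip(norms, 1e-12, None)            # guard against zero vectors
    return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)

def classical_mds(dist, n_components=2):
    """Classical MDS, used here as a simple stand-in for the 2D embedding step."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]       # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def naive_grid_assignment(embedding, n_rows, n_cols):
    """Assign each feature a unique (row, col) cell: sort by x into columns,
    then by y within each column. A naive assumption, not molmap's method."""
    n = embedding.shape[0]
    if n > n_rows * n_cols:
        raise ValueError("grid is too small for the number of features")
    positions = np.zeros((n, 2), dtype=int)
    by_x = np.argsort(embedding[:, 0], kind="stable")
    for col in range(n_cols):
        chunk = by_x[col * n_rows:(col + 1) * n_rows]
        by_y = chunk[np.argsort(embedding[chunk, 1], kind="stable")]
        for row, feature_index in enumerate(by_y):
            positions[feature_index] = (row, col)
    return positions

def minmax_scale(features):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (features - lo) / span

def to_feature_map(sample, positions, n_rows, n_cols):
    """Place one sample's feature values into their grid cells."""
    fmap = np.zeros((n_rows, n_cols))
    fmap[positions[:, 0], positions[:, 1]] = sample
    return fmap

# Hypothetical data: 50 sampled molecules described by 9 features.
rng = np.random.default_rng(0)
features = rng.random((50, 9))

dist = cosine_distance_matrix(features)          # feature pairwise distances
embedding = classical_mds(dist)                  # 2D embedding of the features
positions = naive_grid_assignment(embedding, 3, 3)
scaled = minmax_scale(features)                  # minmax transform
fmap = to_feature_map(scaled[0], positions, 3, 3)
print(fmap.shape)
```

Each feature ends up in exactly one grid cell, so a molecule's feature vector becomes a small 2D image in which features with similar behaviour across the sampled data tend to sit near each other.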

Construction of the MolMap Objects

The MolMapNet Architecture

Installation

Install rdkit and tmap first (create a molmap env):

conda create -c conda-forge -n molmap rdkit
conda activate molmap
conda install -c tmap tmap

In your "molmap" env, install molmap with:

git clone https://github.com/shenwanxiang/bidd-molmap.git
cd bidd-molmap
pip install -r requirements.txt --user

# add molmap to PYTHONPATH
echo export PYTHONPATH="\$PYTHONPATH:`pwd`" >> ~/.bashrc

# init bashrc
source ~/.bashrc

ChemBench (optional, if you wish to use the datasets and the split indices from this paper).
If you run into gcc problems when installing molmap, install g++ first:

sudo apt-get install g++
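
After installation, a quick way to confirm that the environment is set up is to check that the packages can be found on the Python path. The snippet below is a small sketch using only the standard library; the helper name missing_modules is hypothetical, and the module names rdkit, tmap and molmap are assumed to match the packages installed above.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed import names of the packages installed in the steps above.
required = ["rdkit", "tmap", "molmap"]
missing = missing_modules(required)

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If molmap is reported as missing, the PYTHONPATH line in ~/.bashrc is the first thing to check, since the package is added to the path rather than installed.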

Out-of-the-Box Usage

import molmap
# Define your molmap
mp_name = './descriptor.mp'
mp = molmap.MolMap(ftype = 'descriptor', fmap_type = 'grid',
                   split_channels = True,   metric='cosine', var_thr=1e-4)

# Fit your molmap
mp.fit(method = 'umap', verbose = 2)
mp.save(mp_name)

# Visualization of your molmap
mp.plot_scatter()
mp.plot_grid()

# Batch transform 
from molmap import dataset
data = dataset.load_ESOL()
smiles_list = data.x # list of smiles strings
X = mp.batch_transform(smiles_list,  scale = True, 
                       scale_method = 'minmax', n_jobs=8)
Y = data.y 
print(X.shape)

# Train on your data and test on the external test set
from molmap.model import RegressionEstimator
from sklearn.utils import shuffle 
import numpy as np
import pandas as pd
def Rdsplit(df, random_state = 888, split_size = (0.8, 0.1, 0.1)):
    # Shuffle the row indices reproducibly, then slice off the test, validation and train parts
    base_indices = np.arange(len(df))
    base_indices = shuffle(base_indices, random_state = random_state)
    nb_test = int(len(base_indices) * split_size[2])
    nb_val = int(len(base_indices) * split_size[1])
    test_idx = base_indices[0:nb_test]
    valid_idx = base_indices[(nb_test):(nb_test+nb_val)]
    train_idx = base_indices[(nb_test+nb_val):len(base_indices)]
    print(len(train_idx), len(valid_idx), len(test_idx))
    return train_idx, valid_idx, test_idx

# split your data
train_idx, valid_idx, test_idx = Rdsplit(data.x, random_state = 888)
trainX = X[train_idx]
trainY = Y[train_idx]
validX = X[valid_idx]
validY = Y[valid_idx]
testX = X[test_idx]
testY = Y[test_idx]

# fit your model
clf = RegressionEstimator(n_outputs=trainY.shape[1], 
                          fmap_shape1 = trainX.shape[1:], 
                          dense_layers = [128, 64], gpuid = 0) 
clf.fit(trainX, trainY, validX, validY)

# make prediction
testY_pred = clf.predict(testX)
rmse, r2 = clf._performance.evaluate(testX, testY)
print(rmse, r2)
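
The example above reads its metrics from a private attribute of the estimator. As a cross-check, RMSE and R2 can also be computed directly from the predictions. The sketch below assumes R2 means the coefficient of determination; the value returned by clf._performance.evaluate may be defined differently. The function name rmse_and_r2 and the sample numbers are hypothetical.

```python
import numpy as np

def rmse_and_r2(y_true, y_pred):
    """Root-mean-square error and coefficient of determination for one target."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residual_ss = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    total_ss = np.sum((y_true - y_true.mean()) ** 2)      # variation around the mean
    rmse = float(np.sqrt(residual_ss / y_true.size))
    r2 = float(1.0 - residual_ss / total_ss)
    return rmse, r2

# Hypothetical values; in the example above these would be testY and testY_pred.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
rmse, r2 = rmse_and_r2(y_true, y_pred)
print(rmse, r2)
```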


Out-of-the-Box Performances

Dataset	Task Metric	MoleculeNet (GCN Best Model)	Chemprop (D-MPNN model)	MolMapNet (MMNB model)
ESOL	RMSE	0.580 (MPNN)	0.555	0.575
FreeSolv	RMSE	1.150 (MPNN)	1.075	1.155
Lipop	RMSE	0.655 (GC)	0.555	0.625
PDBbind-F	RMSE	1.440 (GC)	1.391	0.721
PDBbind-C	RMSE	1.920 (GC)	2.173	0.931
PDBbind-R	RMSE	1.650 (GC)	1.486	0.889
BACE	ROC_AUC	0.806 (Weave)	N.A.	0.849
HIV	ROC_AUC	0.763 (GC)	0.776	0.777
PCBA	PRC_AUC	0.136 (GC)	0.335	0.276
MUV	PRC_AUC	0.109 (Weave)	0.041	0.096
ChEMBL	ROC_AUC	N.A.	0.739	0.750
Tox21	ROC_AUC	0.829 (GC)	0.851	0.845
SIDER	ROC_AUC	0.638 (GC)	0.676	0.68
ClinTox	ROC_AUC	0.832 (GC)	0.864	0.888
BBBP	ROC_AUC	0.690 (Weave)	0.738	0.739
