DeepResponse: Large Scale Prediction of Cancer Cell Line Drug Response with Deep Learning Based Pharmacogenomic Modelling
Assessing the best treatment option for each patient is the main goal of precision medicine. Patients with the same diagnosis may display varying sensitivity to the applied treatment due to genetic heterogeneity, especially in cancers.
Here, we propose DeepResponse, a machine learning-based system that predicts drug responses (sensitivity) of cancer cells. DeepResponse employs multi-omics profiles of different cancer cell-lines obtained from large-scale screening projects, together with drugs’ molecular features at the input level, and processes them via a hybrid convolutional (cell encoder) and transformer-based (drug encoder) neural network to learn the relationship between tumour multi-omics features and sensitivity to the administered drug.
Both the performance results and in vitro validation experiments indicate that DeepResponse successfully predicts the drug sensitivity of cancer cells. In particular, the multi-omics input benefited the learning process and yielded better performance than state-of-the-art methods. DeepResponse can be used for early-stage discovery of new drug candidates and for repurposing existing ones against resistant tumours.
This repository implements a hybrid architecture consisting of a SELFormer-based drug encoder, an enhanced CNN cell-line encoder, and an MLP prediction head:
Figure 1. Hybrid deep convolutional and graph neural network (HDCGNN) architecture of DeepResponse. Multi-omic features of cell lines are processed via deep convolutional neural networks, whereas graph-represented drug molecules are processed by message passing networks containing transformer encoder layers.
Implementation references: src/dataset/base_dataset_strategy.py, src/model/build/base_model_build_strategy.py, src/model/build/architecture/selformer_architecture.py.
Table 1. Evaluation results (in terms of RMSE) of DeepResponse and other methods on the GDSC dataset (10-fold cross-validation). DeepResponse achieves state-of-the-art performance relative to existing models on all split strategies.
- Miniforge is recommended for compatibility with Apple Silicon devices. For other devices, you can install Anaconda or Miniconda.
- Please check the prefix field in environment files and change it based on your installation type and directory.
- Datasets are stored under dataset/<source>/:
  - Raw inputs: dataset/<source>/raw/
  - Processed artifacts used for training: dataset/<source>/processed/

  Download link (Google Drive): Google Drive folder

  You can either download ready-to-use processed datasets or place the raw files under dataset/<source>/raw/ and generate the processed artifacts locally (see “Dataset Creation”).
- Execute the following commands with the appropriate environment file for your operating system.
- Create a conda environment; all required packages will be installed:
conda env create -f [apple_silicon_env.yml/linux_env.yml]
- This creates an environment named "deep-response", which you then need to activate:
conda activate deep-response
- You can run the model via the terminal:
python3 -m deep_response [--use_comet --data_source --evaluation_source --data_type --split_type --random_state --batch_size --epoch --learning_rate]
You can check the arguments and their default values:
python3 -m deep_response --help
An example invocation with all parameters (--evaluation_source is only accepted together with --split_type cross_domain):
python3 -m deep_response --use_comet True --data_source 'gdsc' --evaluation_source 'ccle' --data_type 'normal' --split_type 'cross_domain' --random_state 28 --batch_size 64 --epoch 50 --learning_rate 0.01
If you have placed the required raw files under dataset/<source>/raw/, you can regenerate the processed artifacts via:
python3 -m dataset.depmap.create_depmap_dataset
python3 -m dataset.ccle.create_ccle_dataset
python3 -m dataset.gdsc.create_gdsc_dataset
Each script writes outputs under dataset/<source>/processed/ (including dataset_records.csv and cell_features_lookup.npz).
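Before training, it can help to confirm that the processed artifacts are actually in place. The following is a minimal sketch using only the Python standard library; `summarize_processed` is a hypothetical helper and not part of this repository. It assumes the dataset/<source>/processed/ layout and the two file names mentioned above, and makes no assumption about column or array names.

```python
import csv
import zipfile
from pathlib import Path


def summarize_processed(source: str, root: str = "dataset") -> dict:
    """Summarize the processed artifacts of one dataset source (hypothetical helper)."""
    processed = Path(root) / source / "processed"
    records = processed / "dataset_records.csv"
    lookup = processed / "cell_features_lookup.npz"

    # Fail early if either artifact named in the README is absent.
    missing = [p.name for p in (records, lookup) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {processed}: {', '.join(missing)}")

    # Read the header and count data rows without assuming any column names.
    with records.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)

    # An .npz file is a zip archive holding one .npy member per stored array.
    with zipfile.ZipFile(lookup) as archive:
        arrays = sorted(
            name[:-4] for name in archive.namelist() if name.endswith(".npy")
        )

    return {"columns": header, "rows": n_rows, "arrays": arrays}
```

Calling `summarize_processed("gdsc")` from the repository root would then return the CSV column names, the number of records, and the names of the arrays stored in the lookup file, or raise `FileNotFoundError` if the creation script has not been run yet.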
DeepResponse supports the following high-level CLI configuration rules:
- --split_type: random, cell_stratified, drug_stratified, drug_cell_stratified, cross_domain
- --data_type: currently normal
- --data_source: any dataset source that has processed artifacts available under dataset/<source>/processed/
- --evaluation_source:
  - Required when --split_type cross_domain
  - Must be omitted (None) for all other split types
In cross_domain, the model is trained on --data_source and evaluated on --evaluation_source, and both sources must have their processed datasets generated/downloaded.
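The configuration rules above can be expressed as a small validation function. This is an illustrative sketch only: `validate_config` is a hypothetical name and the repository's own argument checking may be structured differently. It encodes just the rules stated in this section and does not check whether the processed artifacts exist on disk.

```python
from typing import Optional

# Allowed values, taken from the rules listed in this README.
SPLIT_TYPES = (
    "random",
    "cell_stratified",
    "drug_stratified",
    "drug_cell_stratified",
    "cross_domain",
)
DATA_TYPES = ("normal",)


def validate_config(
    split_type: str,
    data_type: str,
    data_source: str,
    evaluation_source: Optional[str] = None,
) -> None:
    """Raise ValueError if the argument combination breaks the stated rules."""
    if split_type not in SPLIT_TYPES:
        raise ValueError(f"unknown --split_type: {split_type!r}")
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown --data_type: {data_type!r}")
    if not data_source:
        raise ValueError("--data_source must be set")

    if split_type == "cross_domain":
        # Cross-domain runs train on data_source and evaluate on evaluation_source.
        if evaluation_source is None:
            raise ValueError(
                "--evaluation_source is required when --split_type is cross_domain"
            )
    elif evaluation_source is not None:
        raise ValueError(
            "--evaluation_source must be omitted unless --split_type is cross_domain"
        )
```

For example, `validate_config("cross_domain", "normal", "gdsc", "ccle")` passes silently, while `validate_config("random", "normal", "gdsc", "ccle")` raises a `ValueError`.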
You can run DeepResponse with Comet support.
To do so, pass True to the --use_comet argument:
python3 -m deep_response --use_comet True
You need to specify api_key, project_name and workspace. The recommended way is to create a dev.env file at the same level as the .yml files and store these variables there.
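As a rough illustration of what such a file could contain and how it might be read, the sketch below parses a dev.env file with the standard library. The KEY=VALUE line format, the case-insensitive key matching, and the helper name `load_comet_settings` are assumptions made for this example; the way the repository itself loads dev.env is not shown here.

```python
from pathlib import Path

# The three variables named in this README.
REQUIRED_KEYS = ("api_key", "project_name", "workspace")


def load_comet_settings(path: str = "dev.env") -> dict:
    """Parse KEY=VALUE lines from an env file (assumed format) into a dict."""
    settings = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: keys are matched case-insensitively; surrounding quotes are dropped.
        settings[key.strip().lower()] = value.strip().strip("'\"")

    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return {key: settings[key] for key in REQUIRED_KEYS}
```

Under these assumptions, a dev.env with the three lines `api_key=...`, `project_name=...` and `workspace=...` would be returned as a dictionary with exactly those three keys.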
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.