DeepResponse: Large Scale Prediction of Cancer Cell Line Drug Response with Deep Learning Based Pharmacogenomic Modelling

burakcan-izmirli/DeepResponse

Abstract

Assessing the best treatment option for each patient is the main goal of precision medicine. Patients with the same diagnosis may display varying sensitivity to the applied treatment due to genetic heterogeneity, especially in cancers.

Here, we propose DeepResponse, a machine learning-based system that predicts drug responses (sensitivity) of cancer cells. DeepResponse employs multi-omics profiles of different cancer cell-lines obtained from large-scale screening projects, together with drugs’ molecular features at the input level, and processes them via a hybrid convolutional (cell encoder) and transformer-based (drug encoder) neural network to learn the relationship between tumour multi-omics features and sensitivity to the administered drug.

Both the performance results and the in vitro validation experiments indicate that DeepResponse successfully predicts the drug sensitivity of cancer cells. In particular, the multi-omics input benefited the learning process and yielded better performance than state-of-the-art methods. DeepResponse can be used for early-stage discovery of new drug candidates and for repurposing existing ones against resistant tumours.

Architecture

This repository implements a hybrid architecture consisting of a SELFormer-based drug encoder, an enhanced CNN cell-line encoder, and an MLP prediction head:

Architecture of DeepResponse

Figure 1. Hybrid deep convolutional and graph neural network (HDCGNN) architecture of DeepResponse. Multi-omic features of cell lines are processed via deep convolutional neural networks, whereas graph-represented drug molecules are processed by message passing networks containing transformer encoder layers.

Implementation references: src/dataset/base_dataset_strategy.py, src/model/build/base_model_build_strategy.py, src/model/build/architecture/selformer_architecture.py.

Results

Table 1. Evaluation results (in terms of RMSE) of DeepResponse and other methods on the GDSC dataset (10-fold cross-validation). DeepResponse achieves state-of-the-art performance compared to existing models on all split strategies.

Results of DeepResponse
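
Table 1 reports root-mean-square error (RMSE), where lower values are better. As a reminder of what the metric measures, here is a minimal sketch in plain Python. The function name and the sample values are illustrative assumptions and are not taken from the repository or the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (not results from the paper).
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```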

Installation

  1. Miniforge is recommended for compatibility with Apple Silicon devices. On other devices, you can install Anaconda or Miniconda.
    • Check the prefix field in the environment files and change it to match your installation type and directory.
  2. Datasets are stored under dataset/<source>/:

    • Raw inputs: dataset/<source>/raw/
    • Processed artifacts used for training: dataset/<source>/processed/

    Download link (Google Drive): Google Drive folder

    You can either download the ready-to-use processed datasets or place the raw files under dataset/<source>/raw/ and generate the processed artifacts locally (see “Dataset Creation”).

  3. Create a conda environment using the environment file that matches your operating system. All required packages will be installed:

conda env create -f [apple_silicon_env.yml/linux_env.yml]
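
Choosing between the two environment files can be automated. The sketch below is only an illustration: the helper name `pick_env_file` is hypothetical, and the mapping from platform to file name is an assumption based on the file names apple_silicon_env.yml and linux_env.yml mentioned above. It prints the command rather than running it.

```python
import platform


def pick_env_file(system=None, machine=None):
    """Return the environment file name for the given (or current) platform.

    Assumed mapping: macOS on arm64 -> apple_silicon_env.yml, Linux -> linux_env.yml.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "apple_silicon_env.yml"
    if system == "Linux":
        return "linux_env.yml"
    raise RuntimeError(f"No environment file listed for {system}/{machine}")


if __name__ == "__main__":
    try:
        print(f"conda env create -f {pick_env_file()}")
    except RuntimeError as err:
        print(err)
```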

Execution

  1. The installation step creates an environment named "deep-response". Activate it:
conda activate deep-response
  2. Run the model from the terminal:
python3 -m deep_response [--use_comet --data_source --evaluation_source --data_type --split_type --random_state --batch_size --epoch --learning_rate]

You can check the arguments and their default values:

python3 -m deep_response --help

An example invocation with all parameters set. Because --evaluation_source is given, --split_type must be cross_domain (see “Supported Configurations”):

python3 -m deep_response --use_comet True --data_source 'gdsc' --evaluation_source 'ccle' --data_type 'normal' --split_type 'cross_domain' --random_state 28 --batch_size 64 --epoch 50 --learning_rate 0.01
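
When launching several runs from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, assuming every option takes an explicit value as in the example above. The helper `build_command` is hypothetical and is not part of the repository; the flag names come from the usage line above.

```python
import shlex


def build_command(options):
    """Turn a dict of option names and values into an argv list."""
    argv = ["python3", "-m", "deep_response"]
    for name, value in options.items():
        if value is None:
            continue  # omit the flag so the CLI default applies
        argv.extend([f"--{name}", str(value)])
    return argv


# Illustrative configuration; values mirror the README example.
example = {
    "data_source": "gdsc",
    "evaluation_source": "ccle",
    "split_type": "cross_domain",
    "batch_size": 64,
    "epoch": 50,
    "learning_rate": 0.01,
}
print(shlex.join(build_command(example)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` from inside the activated environment.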

Dataset Creation

If you have placed the required raw files under dataset/<source>/raw/, you can regenerate the processed artifacts via:

python3 -m dataset.depmap.create_depmap_dataset
python3 -m dataset.ccle.create_ccle_dataset
python3 -m dataset.gdsc.create_gdsc_dataset

Each script writes outputs under dataset/<source>/processed/ (including dataset_records.csv and cell_features_lookup.npz).
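
Before training, it may be useful to confirm that the processed artifacts exist for a given source. The sketch below checks only the two file names mentioned above; the scripts may write additional files that are not covered here. The helper `missing_artifacts` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Only the two artifacts named in this README; other outputs are not checked.
EXPECTED_ARTIFACTS = ("dataset_records.csv", "cell_features_lookup.npz")


def missing_artifacts(source, root="dataset"):
    """Return the expected artifact names absent from <root>/<source>/processed/."""
    processed_dir = Path(root) / source / "processed"
    return [
        name for name in EXPECTED_ARTIFACTS
        if not (processed_dir / name).is_file()
    ]


if __name__ == "__main__":
    for source in ("depmap", "ccle", "gdsc"):
        missing = missing_artifacts(source)
        status = "ready" if not missing else "missing: " + ", ".join(missing)
        print(f"{source}: {status}")
```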

Supported Configurations

DeepResponse supports the following high-level CLI configuration rules:

  • --split_type: random, cell_stratified, drug_stratified, drug_cell_stratified, cross_domain
  • --data_type: currently normal
  • --data_source: any dataset source that has processed artifacts available under dataset/<source>/processed/
  • --evaluation_source:
    • Required when --split_type cross_domain
    • Must be omitted (None) for all other split types

In cross_domain, the model is trained on --data_source and evaluated on --evaluation_source, and both sources must have their processed datasets generated/downloaded.
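
The rules above can be expressed as a small validation function. This is a sketch of the stated rules only, not the repository's actual argument handling; the function name `validate_config` is hypothetical.

```python
SPLIT_TYPES = (
    "random",
    "cell_stratified",
    "drug_stratified",
    "drug_cell_stratified",
    "cross_domain",
)


def validate_config(split_type, evaluation_source=None):
    """Raise ValueError if the combination violates the rules listed above."""
    if split_type not in SPLIT_TYPES:
        raise ValueError(f"unknown split_type: {split_type!r}")
    if split_type == "cross_domain" and evaluation_source is None:
        raise ValueError("cross_domain requires --evaluation_source")
    if split_type != "cross_domain" and evaluation_source is not None:
        raise ValueError("--evaluation_source is only valid with cross_domain")


validate_config("random")                # valid: no evaluation source
validate_config("cross_domain", "ccle")  # valid: train on one source, evaluate on another
```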

Usage with Comet

You can run DeepResponse with Comet support by passing True to --use_comet:

python3 -m deep_response --use_comet True

You need to specify api_key, project_name and workspace. The recommended way is to create a dev.env file at the same level as the .yml files and store these variables there.
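
To check that a dev.env file contains the three settings before starting a run, something like the following could be used. It is a sketch under assumptions: the file is assumed to hold simple KEY=VALUE lines, and the variable names are taken verbatim from the text above (the actual file may use different casing). Both helper names are hypothetical.

```python
REQUIRED_SETTINGS = ("api_key", "project_name", "workspace")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_comet_settings(values):
    """Return the required setting names that are absent or empty."""
    return [name for name in REQUIRED_SETTINGS if not values.get(name)]


# Placeholder content; replace with the text read from your own dev.env.
sample = "api_key=YOUR_KEY\nproject_name=deep-response\n# workspace not set yet\n"
print(missing_comet_settings(parse_env_text(sample)))
```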

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
