DeepVul is a multi-task transformer-based model designed to jointly predict gene essentiality and drug response using gene expression data. The model uses a shared feature extractor to learn robust biological representations that can be fine-tuned for downstream tasks, such as gene knockout effect prediction or treatment sensitivity profiling.
- 🚀 Features
- 📦 Installation
- 📊 Datasets
- ⚙️ Hyperparameters
- 🏃 Running the Model
- 🧠 Additional Info
- 📄 Citation
## 🚀 Features

- Joint prediction of gene essentiality and drug response
- Shared transformer encoder for multi-task learning (see the sketch below)
- Flexible modes: pre-training only, fine-tuning only, or both
- Compatible with public omics and pharmacogenomic datasets
- Fully configurable via command-line arguments
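To make the shared-encoder idea concrete, here is a minimal PyTorch sketch of the multi-task pattern. It is an illustration under assumed dimensions, not the actual DeepVul implementation (see `src/` for that): the class name, head names, and target counts are hypothetical, while the encoder defaults mirror the hyperparameter table below.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Illustrative shared-encoder multi-task model; NOT the actual DeepVul code."""

    def __init__(self, n_genes, d_model=500, nhead=2, num_layers=2,
                 dim_feedforward=2048, dropout=0.1,
                 n_essentiality_targets=1000, n_drug_targets=500):
        super().__init__()
        # Project a gene-expression profile into the transformer's hidden size
        self.embed = nn.Linear(n_genes, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=dim_feedforward,
            dropout=dropout, batch_first=True)
        # Shared feature extractor reused by both tasks
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Task-specific regression heads
        self.essentiality_head = nn.Linear(d_model, n_essentiality_targets)
        self.drug_head = nn.Linear(d_model, n_drug_targets)

    def forward(self, x):
        # x: (batch, n_genes) -> treat each profile as a length-1 sequence
        h = self.encoder(self.embed(x).unsqueeze(1)).squeeze(1)
        return self.essentiality_head(h), self.drug_head(h)

model = MultiTaskModel(n_genes=19000)
ess_pred, drug_pred = model(torch.randn(4, 19000))
```

Both heads read from the same shared representation, which is what lets fine-tuning reuse the encoder learned during pre-training.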
## 📦 Installation

Make sure you have conda installed. Then run:

```bash
conda env create --file condaenv.yml
conda activate condaenv
```
## 📊 Datasets

To run DeepVul, download the following datasets and place them in the `data/` directory:

| Dataset | Description | Source |
|---|---|---|
| Gene Expression | TPM log-transformed gene expression data | Download |
| Gene Essentiality | CRISPR-Cas9 knockout effect scores | Download |
| Drug Response | PRISM log-fold-change drug response | Download |
| Sanger Essentiality | CERES gene effect data from Sanger | Download |
| Somatic Mutation | Mutation profiles for CCLE lines | Download |
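Before training, the expression matrix and the target matrices need to share a common set of cell lines. A small pandas sketch of that alignment step (the file names below are placeholders, not the actual download names):

```python
import pandas as pd

# Hypothetical file names -- adjust to match the files you downloaded.
expr = pd.read_csv("data/gene_expression.csv", index_col=0)   # cell lines x genes (log TPM)
ess = pd.read_csv("data/gene_essentiality.csv", index_col=0)  # cell lines x genes (CRISPR scores)
drug = pd.read_csv("data/drug_response.csv", index_col=0)     # cell lines x drugs (PRISM LFC)

# Keep only cell lines present in all three matrices so the
# multi-task targets stay aligned with the expression inputs.
shared = expr.index.intersection(ess.index).intersection(drug.index)
expr, ess, drug = expr.loc[shared], ess.loc[shared], drug.loc[shared]
print(f"{len(shared)} cell lines shared across datasets")
```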
## ⚙️ Hyperparameters

DeepVul supports flexible training via CLI arguments:

| Parameter | Default | Description |
|---|---|---|
| `--pretrain_batch_size` | 20 | Batch size during pre-training |
| `--finetuning_batch_size` | 20 | Batch size during fine-tuning |
| `--hidden_state` | 500 | Size of the transformer hidden layers |
| `--pre_train_epochs` | 20 | Pre-training epochs |
| `--fine_tune_epochs` | 20 | Fine-tuning epochs |
| `--opt` | Adam | Optimizer type |
| `--lr` | 0.0001 | Learning rate |
| `--dropout` | 0.1 | Dropout rate |
| `--nhead` | 2 | Number of attention heads |
| `--num_layers` | 2 | Transformer encoder layers |
| `--dim_feedforward` | 2048 | Feedforward network size |
| `--fine_tuning_mode` | freeze-shared | Whether to freeze the shared layers during fine-tuning |
| `--run_mode` | | Execution mode: `pre-train`, `fine-tune`, or `both` |
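With `--fine_tuning_mode freeze-shared`, the shared layers are held fixed and only the task-specific parts are updated. A minimal sketch of that pattern, reusing the illustrative `MultiTaskModel` from the Features section (an assumption about how the mode behaves, not the repo's code):

```python
import torch

model = MultiTaskModel(n_genes=19000)  # illustrative model from the sketch above

# freeze-shared: hold the shared embedding and encoder fixed,
# so fine-tuning only updates the task-specific heads.
for module in (model.embed, model.encoder):
    for p in module.parameters():
        p.requires_grad = False

# Documented defaults: Adam optimizer, lr = 0.0001.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```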
## 🏃 Running the Model

```bash
cd src

# Pre-train the shared encoder only
python run_deepvul.py --run_mode pre-train ...

# Fine-tune on the downstream tasks only
python run_deepvul.py --run_mode fine-tune ...

# Pre-train, then fine-tune
python run_deepvul.py --run_mode both ...
```

Customize the CLI options as needed based on your experiment setup.
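For illustration, one possible full invocation that spells out the documented defaults from the table above (whether every flag applies in every mode is an assumption; the `...` placeholders in the snippets above stand for flags like these):

```bash
python run_deepvul.py \
    --run_mode both \
    --pretrain_batch_size 20 \
    --finetuning_batch_size 20 \
    --pre_train_epochs 20 \
    --fine_tune_epochs 20 \
    --hidden_state 500 \
    --nhead 2 \
    --num_layers 2 \
    --dim_feedforward 2048 \
    --dropout 0.1 \
    --opt Adam \
    --lr 0.0001 \
    --fine_tuning_mode freeze-shared
```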
## 🧠 Additional Info

- Source code for the model architecture, training, and evaluation is located in the `src/` directory.
- Model interpretation and evaluation scripts are included in the repo.
- If you encounter issues or have questions, please open a GitHub Issue or contact the maintainers.
## 📄 Citation

If you use DeepVul in your work, please cite:

```bibtex
@article{Jararweh2024.10.17.618944,
  author = {Jararweh, Ala and Arredondo, David and Macaulay, Oladimeji and Dicome, Mikaela and Tafoya, Luis and Hu, Yue and Virupakshappa, Kushal and Boland, Genevieve and Flaherty, Keith and Sahu, Avinash},
  title = {DeepVul: A Multi-Task Transformer Model for Joint Prediction of Gene Essentiality and Drug Response},
  elocation-id = {2024.10.17.618944},
  year = {2024},
  doi = {10.1101/2024.10.17.618944},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2024/10/21/2024.10.17.618944},
  journal = {bioRxiv}
}
```