The 5’ untranslated region (5’ UTR) is a primary driver of protein expression, yet the complexity of its regulatory code has hindered our ability to decode translation regulation, understand variant effects and engineer effective mRNAs. To address this, we developed UTR-CODE, a deep learning model trained on 1,586 samples with paired RNA-seq and Ribo-seq datasets across six species. UTR-CODE demonstrates strong cross-species generalizability and outperforms existing tools across diverse species and platforms.
We provide a web server for predicting and optimizing sequences.
UTR-CODE was tested on Python 3.10.
git clone https://github.com/UTR-CODE/UTR-CODE
cd UTR-CODE
pip install -r requirements.txt
Download and decompress the raw training data from figshare.
You can then train UTR-CODE on the simple example:
python train.py
To run a prediction for a single mRNA from its 5’ UTR, 3’ UTR and CDS sequences:
cd script
python Pred_single.py --weights epochs/best.epoch \
--utr5 GCTACGATCGATCGATCGACTAG \
--utr3 CCACAACCACTGAGT \
--cds CGTACGCTAGCTAGCAT
To run predictions for a batch of mRNAs listed in a CSV file:
cd script
python Pred_batch.py --file batch_mRNA.csv --output batch_mRNA.result.csv
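The layout expected in batch_mRNA.csv is not described here, so the following is only a sketch of how such a file might be assembled with Python's standard csv module. The column names utr5, cds and utr3 are assumptions borrowed from the Pred_single.py flags, and the name column and both helper functions are hypothetical. Check the example file shipped with the repository for the real header before relying on this.

```python
import csv

VALID_BASES = set("ACGT")


def check_sequence(seq):
    """Upper-case a sequence, convert U to T, and reject any other character.

    Converting U to T is an assumption made for this sketch; the example
    commands in this README use DNA-style (T) sequences.
    """
    seq = seq.strip().upper().replace("U", "T")
    bad = set(seq) - VALID_BASES
    if bad:
        raise ValueError(f"unexpected characters in sequence: {sorted(bad)}")
    return seq


def write_batch_csv(path, records):
    """Write one row per mRNA; the header below is assumed, not documented."""
    fields = ["name", "utr5", "cds", "utr3"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for rec in records:
            row = {"name": rec["name"]}
            for key in ("utr5", "cds", "utr3"):
                row[key] = check_sequence(rec[key])
            writer.writerow(row)


if __name__ == "__main__":
    # Sequences taken from the single-prediction example in this README.
    records = [
        {
            "name": "example_mRNA",
            "utr5": "GCTACGATCGATCGATCGACTAG",
            "cds": "CGTACGCTAGCTAGCAT",
            "utr3": "CCACAACCACTGAGT",
        }
    ]
    write_batch_csv("batch_mRNA.csv", records)
```

Validating sequences before writing the file means a stray character is caught up front rather than partway through a batch run.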
To run the ISM analysis on a single mRNA:
python ISM.py --weights epochs/best.epoch \
--utr5 GCTACGATCGATCGATCGACTAG \
--utr3 CCACAACCACTGAGT \
--cds CGTACGCTAGCTAGCAT
If you want to train UTR-CODE on custom data, we provide a tutorial on preparing the input data.
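As background for the ISM.py command above: assuming ISM stands for in silico mutagenesis, the general idea is to replace each position of a sequence with every alternative nucleotide, re-score each mutant, and report the change relative to the wild type. The sketch below illustrates that loop on a 5’ UTR. It is not the code in ISM.py, and toy_score is a hypothetical stand-in (GC fraction) for a trained model's prediction.

```python
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"


def single_substitutions(seq: str) -> List[Tuple[int, str, str, str]]:
    """Return (position, ref, alt, mutant) for every single-nucleotide change.

    Positions are 0-based.
    """
    seq = seq.upper()
    mutants = []
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                mutants.append((pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]))
    return mutants


def ism_effects(
    seq: str, score: Callable[[str], float]
) -> Dict[Tuple[int, str, str], float]:
    """Map (position, ref, alt) to score(mutant) - score(wild type)."""
    wild_type = score(seq.upper())
    return {
        (pos, ref, alt): score(mutant) - wild_type
        for pos, ref, alt, mutant in single_substitutions(seq)
    }


def toy_score(seq: str) -> float:
    # Hypothetical scorer for illustration only: the GC fraction of the
    # sequence. A real analysis would call the trained model here.
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    utr5 = "GCTACGATCGATCGATCGACTAG"  # 5' UTR from the example command
    effects = ism_effects(utr5, toy_score)
    # Report the substitution with the largest absolute effect.
    pos, ref, alt = max(effects, key=lambda key: abs(effects[key]))
    print(f"{len(effects)} mutants scored")
    print(f"largest effect: {ref}{pos}{alt} ({effects[(pos, ref, alt)]:+.3f})")
```

Passing the scorer as a function keeps the mutation loop independent of any particular model, so the same enumeration could be reused with a different predictor.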
UTR-CODE is maintained by:
* @HeXin
For further help, please open an issue and I will reply as soon as possible.
UTR-CODE is released under the MIT License.