RNA-Protein Complex Refinement via Diffusion
Clone the repo:
```shell
git clone https://github.com/zeqri/STRAND.git
```
Create and activate the conda environment:

```shell
conda create -n STRAND python=3.9.18
conda activate STRAND
pip install -r requirements.txt
```
Download the preprocessed structures used in our experiments:
👉 Download Structures from Google Drive
- Download the `test_data.zip` file.
- Place it in the `datasets` directory.
- Extract the archive:

```shell
unzip datasets/test_data.zip -d datasets/
```

Choose your inference mode based on whether you want to use the confidence model:
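Before picking a mode, a quick sanity check that the data landed where the inference scripts expect it can save a failed run (a minimal sketch; the paths are taken from the download and extract steps above):

```shell
# Sketch: check that the archive and the datasets directory are in place.
# Paths are taken from the download/extract steps above.
for p in datasets/test_data.zip datasets; do
  if [ -e "$p" ]; then echo "ok: $p"; else echo "missing: $p"; fi
done
```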
Run structure refinement with manual selection on any of the three benchmark datasets:

```shell
# RNA-Pro dataset
sh src/inference_manual.sh rnapro

# Non-X-ray dataset
sh src/inference_manual.sh nonxray

# X-ray dataset
sh src/inference_manual.sh xray
```

Run structure refinement using the confidence model for automated structure selection:
```shell
# RNA-Pro dataset
sh src/inference_conf.sh rnapro

# Non-X-ray dataset
sh src/inference_conf.sh nonxray

# X-ray dataset
sh src/inference_conf.sh xray
```

Refined structures and evaluation metrics will be saved in the `results/` directory, organized by dataset and method used.
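To sweep all three benchmarks in one go, the calls above can be wrapped in a small loop (a sketch using the manual-selection script; swap in `inference_conf.sh` for confidence-model selection):

```shell
# Sketch: run manual-selection inference on all three benchmark datasets.
# The dataset keys match the individual commands above; the || branch just
# keeps the loop going if a script is missing in your checkout.
for ds in rnapro nonxray xray; do
  sh src/inference_manual.sh "$ds" || echo "skipped: $ds"
done
```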
Download PDB files containing RNA-protein complexes released before the cutoff date of 30 September 2021 via the website and store them in `datasets/pdb_files`, or run:

```shell
mkdir -p datasets/pdb_files
sh datasets/batch_download.sh -f datasets/list_file.txt -p -o datasets/pdb_files
```
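If the batch-download script is not an option, a plain `curl` loop over the ID list gives a rough equivalent (a sketch; it assumes one PDB ID per line in `list_file.txt` and the standard RCSB download URL, so adjust the parsing if your list is comma-separated):

```shell
# Sketch fallback: fetch each PDB entry directly from RCSB.
# Assumes datasets/list_file.txt holds one PDB ID per line.
mkdir -p datasets/pdb_files
if [ -f datasets/list_file.txt ]; then
  while read -r id; do
    [ -n "$id" ] || continue
    curl -sf "https://files.rcsb.org/download/${id}.pdb" \
      -o "datasets/pdb_files/${id}.pdb" || echo "failed: ${id}"
  done < datasets/list_file.txt
fi
```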
All data must be stored as dill files; to create them, run:

```shell
python src/data/preprocessing/cache_data.py --dir_path datasets/pdb_files --save_path datasets/train/af3_1022P_1022R
```
STRAND-tr+rot uses data augmentation during training; to augment the data, run:

```shell
sh src/data/preprocessing/data_aug.sh
```
By default, STRAND trains with translation + rotation (STRAND-tr+rot).
To train with different spatial transformations, modify the boolean arguments in src/train.sh:
```shell
# Available options:
--translation True   # Enable translation refinement
--rotation True      # Enable rotation refinement
--torsion True       # Enable torsion angle refinement
```

Score model:
- Set the `Data_file` and `Data_path` variables in `src/train.sh`.
- Configure your desired spatial transformations in `src/train.sh`.
- Run the training script:

```shell
sh src/train.sh
```

Note: Training requires preprocessed datasets and sufficient computational resources (GPU recommended).
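Editing the transformation flags can also be scripted; a minimal sketch, assuming the flag appears literally as `--torsion False` in `src/train.sh` (verify against your copy of the script):

```shell
# Sketch: enable torsion-angle refinement by flipping the flag in place.
# Assumes the literal text "--torsion False" appears in src/train.sh;
# the .bak suffix keeps a backup of the original script.
if [ -f src/train.sh ]; then
  sed -i.bak 's/--torsion False/--torsion True/' src/train.sh
fi
```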
Generate samples:
After obtaining an optimised score model, use it to generate samples via:

```shell
sh src/generate_samples.sh
```
Confidence model:
Train the confidence model on the generated samples by running:

```shell
sh src/train_confidence.sh
```
Store the structures to be refined as dill files using `src/data/preprocessing/cache_data.py`.
Specify the path of the stored dataset to be refined and its corresponding csv file in the `Data_path` and `Data_file` variables, respectively, in `src/train_confidence.sh`.
To run inference without the confidence model, set `--run_inference_without_confidence_model` to `True` and run:

```shell
sh src/inference.sh
```

To run inference with the confidence model, set `--run_inference_without_confidence_model` to `False` and run:

```shell
sh src/inference.sh
```
After running inference, visualization directories containing the generated samples are created. The default path is `visualization/STRAND`.
To assess the quality of the refined samples, download the ground-truth files that were refined from the PDB as `.pdb` files and store them in `datasets/gt_dir`.
Manual Selection results:
To display manual selection results, run:

```shell
python src/visualize_inf_manual.py --gt_path datasets/gt_dir --samples_path visualization/STRAND
```
Selection via confidence model:
To display the confidence model selection results, run:

```shell
python src/visualize_inf_conf.py --gt_path datasets/gt_dir --samples_path visualization/STRAND
```