
gusgrazelis/CULLALGO

CULLALGO 1.1

Description

CULLALGO is a script that "culls" large sets of protein sequences in .fasta format down to a manageable number of sequences with the desired traits, and writes the results to a .fasta file. The measurements are based on methods from recent literature.
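As a rough illustration of what culling a .fasta file means in practice, here is a minimal sketch that parses FASTA text, keeps only the records that pass a predicate, and writes the survivors back out as FASTA. The names `parse_fasta`, `cull`, and `to_fasta` are hypothetical and are not taken from the CULLALGO source; the length filter in the usage example stands in for whatever trait thresholds the real script applies.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def cull(records: Iterable[Record], keep: Callable[[str], bool]) -> List[Record]:
    """Keep only the records whose sequence satisfies the predicate."""
    return [(header, seq) for header, seq in records if keep(seq)]


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    out: List[str] = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    text = ">a\nMKV\nLLA\n>b\nMK\n"
    kept = cull(parse_fasta(text.splitlines()), lambda seq: len(seq) >= 3)
    print(to_fasta(kept), end="")
```

In a real run the predicate would combine several of the parameters listed in the next section rather than sequence length alone.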

Determinant Parameters

The script uses the following parameters:

  1. Molecular Weight
  2. Average Surface Accessibility
  3. Isoelectric Point
  4. Cost
  5. Solubility
  6. Thermostability
  7. Alpha Helical Propensity
  8. Shannon's Entropy (Redundancy Measure)
  9. DNA Complexity

CULLALGO uses NETSOLP and TemStaPro to evaluate the solubility and thermostability thresholds, respectively. Both tools must be installed and configured before the script can run.
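To make one of the parameters above concrete, the following sketch computes Shannon's entropy over the residue composition of a single sequence. This is the textbook definition and only an assumption about how a redundancy measure might be computed; the function name `shannon_entropy` is hypothetical, and CULLALGO's own implementation may differ (for example in the logarithm base or in windowing).

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy (in bits) of the residue frequencies in a sequence.

    Low values indicate a repetitive, low-complexity sequence; the maximum
    for the 20 standard amino acids is log2(20), roughly 4.32 bits.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    total = len(seq)
    counts = Counter(seq)
    # Sum p * log2(1/p) over the observed residues.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    for s in ("AAAA", "AACC", "ACDE"):
        print(s, shannon_entropy(s))
```

A sequence made of a single repeated residue scores 0 bits, while a sequence with four equally frequent residues scores 2 bits.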

Dependencies

Automatic installation

Important Notes

⚠️ Still in development ⚠️

  • Permissions: The setup script may require administrative privileges, especially for steps that install system-wide packages or modify system paths.
  • Compatibility: The setup script assumes a Unix-like operating system because it depends on bash commands. On Windows, adjustments may be necessary, particularly in how environments are activated and how paths are handled.

Run

python3 setup.py

Manual Installation

Environment Setup

conda create -n CULLALGO python=3.11
conda activate CULLALGO

CULLALGO Script Preparation

git clone https://github.com/gusgrazelis/CULLALGO.git
cd CULLALGO
pip install -r cull_requirements.txt

NETSOLP Installation

NETSOLP should be installed inside the CULLALGO directory.

wget https://services.healthtech.dtu.dk/services/NetSolP-1.0/netsolp-1.0.ALL.tar.gz
tar -xzvf netsolp-1.0.ALL.tar.gz
pip install -r requirements.txt

Running CULLALGO after TemStaPro has run

python3 CULLALGO1.1.py --config config.yaml

TemStaPro Installation

TemStaPro should live in its own directory, which can be placed anywhere you like.

git clone https://github.com/ievapudz/TemStaPro.git
cd TemStaPro

Environment requirements

Before starting, Anaconda or Miniconda should be installed on the system. Follow the instructions given in Conda's documentation.

Setting up the environment can be done in one of the following ways.

From YML file

The TemStaPro repository contains two YML files: one lists the prerequisites for an environment that uses only the CPU (environment_CPU.yml), and the other for an environment that uses both the CPU and the GPU (environment_GPU.yml).

This approach was tested with Conda versions 4.10.3 and 4.12.0.

Run the following command to create the environment from a YML file:

conda env create -f environment_CPU.yml

Activate the environment:

conda activate temstapro_env_CPU

From scratch

GPU Setup

To set up an environment in which the program can use the GPU, run the following commands:

conda create -n temstapro_env python=3.7
conda activate temstapro_env
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -c conda-forge transformers
conda install -c conda-forge sentencepiece
conda install -c conda-forge matplotlib

To test whether the installed PyTorch package can use CUDA, start the python3 interpreter and run the following lines:

python3
import torch
torch.cuda.is_available()

If the output is 'True', the installation was successful. Otherwise, try adding the CUDA installation to your paths:

export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

If CUDA for PyTorch is still not available, check out the forum.
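The interactive check above can also be wrapped in a small script, which is handy when verifying an environment non-interactively. This is a minimal sketch: the function name `cuda_status` and the three status strings are hypothetical, while `torch.cuda.is_available()` is the same PyTorch call used above.

```python
import importlib.util


def cuda_status() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA device."""
    # Check for the package first so a missing install yields a clear status
    # instead of an ImportError.
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch

    return "cuda-available" if torch.cuda.is_available() else "cpu-only"


if __name__ == "__main__":
    print(cuda_status())
```

Run it inside the activated temstapro_env environment; a result of "cpu-only" in the GPU environment suggests the path settings above are worth revisiting.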

CPU setup

For systems without a GPU, run the following commands to set up a CPU-only environment:

conda create -n temstapro_env python=3.7
conda activate temstapro_env
conda install -c conda-forge transformers
conda install pytorch -c pytorch
conda install -c conda-forge sentencepiece
conda install -c conda-forge matplotlib

Run TemStaPro

./temstapro -f ./tests/data/long_sequence.fasta -d ./ProtTrans/ \
    -e tests/outputs/ --mean-output ./long_sequence_predictions.tsv
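When TemStaPro needs to be run on many input files, the command above can be assembled from Python. The sketch below only uses the flags shown in the example command (-f, -d, -e, --mean-output); the helper names `build_temstapro_command` and `run_temstapro` are hypothetical and are not part of TemStaPro or CULLALGO.

```python
import subprocess
from typing import List, Optional


def build_temstapro_command(
    fasta: str,
    prottrans_dir: str,
    embeddings_dir: str,
    output_tsv: str,
    executable: str = "./temstapro",
) -> List[str]:
    """Assemble the argument list for the TemStaPro call shown above."""
    return [
        executable,
        "-f", fasta,                  # input FASTA file
        "-d", prottrans_dir,          # ProtTrans model directory
        "-e", embeddings_dir,         # directory for embeddings
        "--mean-output", output_tsv,  # TSV file for mean predictions
    ]


def run_temstapro(command: List[str], cwd: Optional[str] = None) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True, cwd=cwd)


if __name__ == "__main__":
    cmd = build_temstapro_command(
        "./tests/data/long_sequence.fasta",
        "./ProtTrans/",
        "tests/outputs/",
        "./long_sequence_predictions.tsv",
    )
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Call `run_temstapro(cmd, cwd=...)` with the TemStaPro directory as the working directory, inside the activated environment.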
