AFCluster, but modular

A modular reimplementation of AF-Cluster for plug-and-play use, incorporation into computational workflows, and HPC/HTC deployment through ColabFold.

Features

Modular Design: Clean separation of MSA generation, clustering, and structure prediction
Slurm & Apptainer Compatibility: Integration for high-performance and high-throughput computing clusters
Modular MSA Clustering: Additional clustering methods can be swapped in easily

Installation

Prerequisites

Python 3.8+
ColabFold (for structure prediction)
MMseqs2 (optional, for local MSA generation)

Setup

Clone the repository:

```
git clone git@github.com:gelnesr/AFCluster-2.git
cd AFCluster-2
```

Set up ColabFold locally:

If you do not already have ColabFold installed, please follow the installation instructions from localcolabfold:
```
# For Linux
wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
bash install_colabbatch_linux.sh
```
After setting up ColabFold, set the path to its bin directory in configs/afcluster.yml (the PATH entry under path_vars). We also recommend setting up a cache directory.

For HPC deployment, run the setup script. It automatically sets up the environment and sets the ColabFold path to your $SCRATCH/tools folder. To use a different directory, edit the corresponding line in scripts/env_setup.sh.
```
bash scripts/env_setup.sh
```
For Apptainer deployment, run the following commands in order.

First, build a .sif image in your $SCRATCH/containers folder and create a cache directory at $SCRATCH/cache. To use different locations, edit the corresponding lines in scripts/build_apptainer.sh.
```
bash scripts/build_apptainer.sh
```
Then run the environment setup script, which automatically sets up the environment and sets the ColabFold path to your $SCRATCH/tools folder. To use a different directory, edit the corresponding line in scripts/env_setup.sh.
```
bash scripts/env_setup.sh
```
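Then run module load apptainer or module load singularity to make Apptainer available on your cluster. Replace INPUT.fasta in the command below with your input file before running. The command assumes that $IMG points to the .sif image and $CACHE to the cache directory: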
```
apptainer exec --nv \
     --bind "AFCluster-2:/w","$SCRATCH:$SCRATCH" \
     --env XDG_CACHE_HOME="$CACHE" \
     --env MPLCONFIGDIR="$CACHE" \
     "$IMG" bash -lc '
     cd /w
     source afc/bin/activate
     python afcluster.py --input INPUT.fasta
     '
```

Usage

Basic Usage

Run the pipeline on a FASTA file:

```
python afcluster.py --input sequences.fasta
```

Parameters

--input: Input FASTA file (required)
--msa: Pre-computed MSA file (optional)
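
The two flags above could be declared with argparse roughly as follows. This is a minimal sketch rather than the parser in afcluster.py: the function name build_parser and the help strings are assumptions, and only the flag names and their required/optional status come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the two documented flags."""
    parser = argparse.ArgumentParser(prog="afcluster.py")
    # --input is documented as required.
    parser.add_argument("--input", required=True, help="Input FASTA file")
    # --msa is documented as optional; None here means "no pre-computed MSA given".
    parser.add_argument("--msa", default=None, help="Pre-computed MSA file")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--input", "sequences.fasta"])
print(args.input, args.msa)
```

Leaving --msa unset yields None, which a pipeline could treat as the signal to generate the MSA itself.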

Configuration

Modify configs/afcluster.yml to adjust clustering parameters:

```
keyword: "MAIN"
gap_cutoff: 0.25
random_seed: 42

cluster_method: "dbscan"

dbscan:
  min_samples: 10
  eps_val: null
  min_eps: 3
  max_eps: 20.0
  eps_step: 0.5

path_vars:
  PATH: "/path/to/colabfold/bin:$PATH"
  XDG_CACHE_HOME: "/path/to/cache"
  MPLCONFIGDIR: "/path/to/cache"
```
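
To make the dbscan and path_vars entries more concrete, here is a hedged sketch of how such values might be interpreted once the YAML has been loaded into a dictionary. Both helper names (eps_candidates, expand_path_vars) are hypothetical. The sketch assumes that eps_val: null requests a scan over an inclusive range from min_eps to max_eps, and that references such as $PATH are expanded against the current environment; the actual behavior of AFCluster-2 may differ.

```python
import os
from string import Template
from typing import Dict, List, Optional


def eps_candidates(dbscan_cfg: Dict) -> List[float]:
    """Return the eps values to try (assumed semantics, not the repository's code)."""
    # Assumption: a non-null eps_val pins a single value and skips the scan.
    if dbscan_cfg.get("eps_val") is not None:
        return [float(dbscan_cfg["eps_val"])]
    lo = float(dbscan_cfg["min_eps"])
    hi = float(dbscan_cfg["max_eps"])
    step = float(dbscan_cfg["eps_step"])
    # Assumption: max_eps is included in the scan.
    n_steps = int(round((hi - lo) / step))
    return [round(lo + i * step, 10) for i in range(n_steps + 1)]


def expand_path_vars(path_vars: Dict[str, str],
                     base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with path_vars applied."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in path_vars.items():
        # safe_substitute leaves unknown $NAME references untouched.
        env[name] = Template(value).safe_substitute(env)
    return env


grid = eps_candidates({"eps_val": None, "min_eps": 3, "max_eps": 20.0, "eps_step": 0.5})
env = expand_path_vars({"PATH": "/path/to/colabfold/bin:$PATH"}, {"PATH": "/usr/bin"})
print(len(grid), grid[0], grid[-1])
print(env["PATH"])
```

The returned environment can be passed to subprocess.run(..., env=env) so that a child process such as colabfold_batch sees the adjusted PATH and cache locations.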

Workflow

Input Processing: Load FASTA sequences
MSA Generation: Create multiple sequence alignments using colabfold_batch or a local MMseqs2 installation
Sequence Filtering: Remove sequences with high gap content
Clustering: Group similar sequences using specified method
Structure Prediction: Run ColabFold on each cluster
Output: Generate cluster-specific A3M files and predicted structures with corresponding json/png files
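
The sequence filtering step can be illustrated with a small, self-contained sketch. It assumes the MSA is in A3M format, that lowercase letters (insertions relative to the query) are dropped before counting gaps, that the first record is the query and is always kept, and that a sequence passes when its gap fraction is strictly below gap_cutoff. The function names are hypothetical and this is not the repository's implementation.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_a3m(text: str) -> List[Record]:
    """Parse A3M/FASTA-style text into (header, sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and A3M comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_gaps(records: List[Record], gap_cutoff: float = 0.25) -> List[Record]:
    """Keep the query plus every sequence whose gap fraction is below gap_cutoff."""
    kept: List[Record] = []
    for i, (header, seq) in enumerate(records):
        # Drop lowercase insertion states so all sequences share the query's columns.
        aligned = "".join(c for c in seq if not c.islower())
        frac_gaps = aligned.count("-") / len(aligned) if aligned else 1.0
        if i == 0 or frac_gaps < gap_cutoff:
            kept.append((header, aligned))
    return kept


example = """>query
MKTAYIAK
>hit1
MK-AYIAK
>hit2
M---YIaAK
"""
kept = filter_by_gaps(read_a3m(example), gap_cutoff=0.25)
print([header for header, _ in kept])
```

In this example hit1 has a gap fraction of 1/8 and is kept, while hit2 has 3/8 after removing its lowercase insertion and is dropped.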

Output Structure

```
output/
├── sequence_id/
│   ├── sequence_id.a3m          # Generated MSA
│   ├── clusters/
│   │   ├── sequence_id_000.a3m  # Cluster 0
│   │   ├── sequence_id_001.a3m  # Cluster 1
│   │   └── ...
│   └── preds/
│       ├── sequence_id_000/
│       │   ├── s0/              # Structure prediction seed 0
│       │   ├── s1/              # Structure prediction seed 1
│       │   └── ...
│       └── ...
```
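
For downstream analysis, the layout above can be traversed with pathlib. The following is a minimal sketch that assumes exactly the directory and file naming shown in the tree; collect_outputs is a hypothetical helper, not part of the repository, and the demo builds a throwaway tree in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def collect_outputs(output_dir: str, sequence_id: str) -> Dict[str, List[Path]]:
    """Map each cluster name to its seed directories under preds/."""
    root = Path(output_dir) / sequence_id
    result: Dict[str, List[Path]] = {}
    for a3m in sorted((root / "clusters").glob(f"{sequence_id}_*.a3m")):
        pred_dir = root / "preds" / a3m.stem
        # A cluster without predictions yet maps to an empty list.
        seeds = sorted(p for p in pred_dir.glob("s*") if p.is_dir()) if pred_dir.is_dir() else []
        result[a3m.stem] = seeds
    return result


# Demo on a temporary tree that imitates the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "protA"
    (base / "clusters").mkdir(parents=True)
    for idx in ("000", "001"):
        (base / "clusters" / f"protA_{idx}.a3m").write_text(">query\nMK\n")
    for seed in ("s0", "s1"):
        (base / "preds" / "protA_000" / seed).mkdir(parents=True)
    summary = {name: [p.name for p in seeds]
               for name, seeds in collect_outputs(tmp, "protA").items()}
    print(summary)
```

Note that the seed directories are sorted lexically, so a run with ten or more seeds would list s10 before s2.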

License

This project is based on the original AF-Cluster implementation.

Citation

If you use AFCluster-2 in your research, please cite the following works and acknowledge this implementation:

```
@article{AFCluster,
  title={Predicting multiple conformations via sequence clustering and AlphaFold2},
  DOI={10.1038/s41586-023-06832-9},
  journal={Nature},
  author={Wayment-Steele, Hannah K. and Ojoawo, Adedolapo and Otten, Renee and Apitz, Julia M. and Pitsawong, Warintra and Hömberger, Marc and Ovchinnikov, Sergey and Colwell, Lucy and Kern, Dorothee},
  year={2023},
}
```

Dependencies

This software builds upon the following tools and methods:

AF-Cluster - Wayment-Steele, H.K., Ojoawo, A., Otten, R. et al. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 625, 832–839 (2024). https://doi.org/10.1038/s41586-023-06832-9
[GitHub]

ColabFold - Mirdita, M., Schütze, K., Moriwaki, Y. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679–682 (2022). https://doi.org/10.1038/s41592-022-01488-1
[GitHub] | [Local Installation]

MMseqs2 - Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35, 1026–1028 (2017). https://www.nature.com/articles/nbt.3988 [GitHub]

AlphaFold - Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2
