This repository provides a reproducible pipeline for phylogenetic analysis using Nextflow, BLASTP, CD-HIT, MAFFT, trimAl, and IQ-TREE. The workflow automates BLAST search, clustering, alignment, trimming, and tree inference for protein sequences. The pipeline is highly configurable: each step can be controlled with Nextflow parameters.
- Automated environment setup with Conda
- BLASTP search against a user-specified protein database
- Limit the number of BLAST hits retrieved (parameterized)
- Sequence clustering with CD-HIT (optional)
- Combine original and clustered sequences (optional)
- Multiple sequence alignment using MAFFT
- Alignment trimming with trimAl
- Phylogenetic tree inference using IQ-TREE (optional)
- Highly configurable workflow via Nextflow parameters
- Organized output directories for results
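Several of the steps above are optional. The hypothetical POSIX shell function below is not part of the repository; it is a minimal sketch of which stages would run for given values of the `--cd_hit`, `--combine` and `--phylogeny` parameters. It assumes that combining only applies when CD-HIT clustering is enabled, which the README does not state explicitly.

```sh
# Hypothetical helper, not part of phylogeny-nf: prints the stages that would
# run for the values of --cd_hit, --combine and --phylogeny (in that order).
# Assumption: --combine only has an effect when --cd_hit is true.
plan_stages() (
  cd_hit=${1:-false} combine=${2:-false} phylogeny=${3:-false}
  stages="blastp"                                 # the BLAST search always runs
  if [ "$cd_hit" = "true" ]; then
    stages="$stages cd-hit"                       # optional CD-HIT clustering
    [ "$combine" = "true" ] && stages="$stages combine"  # original + clustered
  fi
  if [ "$phylogeny" = "true" ]; then
    stages="$stages mafft trimal iqtree"          # alignment, trimming, tree
  fi
  echo "$stages"
)
```

For example, `plan_stages true true true` prints `blastp cd-hit combine mafft trimal iqtree`, while calling it with no arguments (the defaults) prints only `blastp`.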
- Miniconda/Anaconda
- Nextflow (installed through the Conda environment)
Clone the repository:
git clone https://github.com/alejimgon/phylogeny-nf.git
cd phylogeny-nf
Set up the Conda environment:
source env/setup.sh
This creates and activates the phylogeny-nf environment with all dependencies.
Prepare your input files:
- Place your FASTA files in the inputs/ directory.
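Before launching a run, it can help to confirm that inputs/ actually contains FASTA files. The function below is a hypothetical pre-flight check, not part of the repository. It assumes the pipeline picks up files ending in `.fasta`, `.fa` or `.faa`; adjust the patterns to whatever `main.nf` really matches.

```sh
# Hypothetical pre-flight check, not part of phylogeny-nf.
# Assumption: inputs end in .fasta, .fa or .faa; adapt to what main.nf globs.
check_inputs() (
  dir=${1:-inputs}
  found=0
  for f in "$dir"/*.fasta "$dir"/*.fa "$dir"/*.faa; do
    [ -e "$f" ] || continue                    # an unmatched glob stays literal
    found=$((found + 1))
    first_line=
    IFS= read -r first_line < "$f" || true     # tolerate a missing final newline
    case $first_line in
      ">"*) ;;                                  # starts with a FASTA header
      *) echo "not FASTA-like: $f" >&2; exit 1 ;;  # exit leaves the subshell body
    esac
  done
  if [ "$found" -eq 0 ]; then
    echo "no FASTA files in $dir/" >&2
    exit 1
  fi
  echo "$found FASTA file(s) in $dir/"
)
check_inputs inputs || echo "fix inputs/ before running the pipeline"
```

The function body is wrapped in parentheses so it runs in a subshell: its variables do not leak into your shell, and `exit` only ends the check.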
Run the pipeline with default settings:
nextflow run main.nf --blast_db /path/to/your/blastdb
Optional parameters:
- --blast_num_seq 500: Number of BLAST hits to retrieve per query (default: 500)
- --cd_hit true: Enable CD-HIT clustering (default: false)
- --cd_hit_ident 0.9: CD-HIT identity threshold (default: 0.9)
- --combine true: Combine original and clustered sequences before alignment (default: false)
- --phylogeny true: Run alignment, trimming, and tree inference (default: false)
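Instead of repeating these flags on every run, the same values can be kept in a parameters file, because Nextflow's `-params-file` option accepts JSON or YAML. A minimal sketch, assuming a file named params.yaml (the name is arbitrary) and the values of the clustering-and-phylogeny example:

```sh
# Write the pipeline parameters to a YAML file for `nextflow run -params-file`.
# The file name params.yaml is an arbitrary choice; the blast_db value is the
# README's placeholder and must be replaced with your database path.
cat > params.yaml <<'EOF'
blast_db: /path/to/your/blastdb
blast_num_seq: 500
cd_hit: true
cd_hit_ident: 0.95
combine: true
phylogeny: true
EOF
```

The run then becomes `nextflow run main.nf -params-file params.yaml`. Values given as `--flags` on the command line take precedence over the file, so a single setting can still be overridden per run.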
Example with clustering and phylogeny:
nextflow run main.nf --blast_db /path/to/your/blastdb --cd_hit true --cd_hit_ident 0.95 --combine true --phylogeny true
Results are saved in the results/ directory:
- results/blast/ for BLAST outputs and extracted FASTA sequences
- results/cdhit/ for clustered and/or combined FASTA files
- results/mafft/ for alignments
- results/trimal/ for trimmed alignments
- results/iqtree/ for phylogenetic trees
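After a run, a quick overview of which output directories were filled can be obtained with a small helper. The function below is hypothetical and not part of the repository; it only checks the subdirectory names listed above. Directories for disabled steps (for example cdhit/ without `--cd_hit true`) may simply not exist.

```sh
# Hypothetical helper, not part of phylogeny-nf: counts files in each
# documented results/ subdirectory and flags the ones that are absent.
summarize_results() (
  root=${1:-results}
  for step in blast cdhit mafft trimal iqtree; do
    if [ -d "$root/$step" ]; then
      n=$(find "$root/$step" -type f | wc -l | tr -d ' ')  # strip BSD padding
      echo "$step: $n file(s)"
    else
      echo "$step: missing"
    fi
  done
)
summarize_results results
```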
Conda environment not activated:
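Run the setup script with source env/setup.sh rather than bash env/setup.sh. Executing the script starts a separate shell, so the environment it activates disappears when that shell exits and your current shell is left unchanged.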
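Conda stores the name of the active environment in the `CONDA_DEFAULT_ENV` variable, which allows a quick check of the current shell. A minimal sketch, assuming the environment was created under the name phylogeny-nf as described in the setup step; the function name is hypothetical.

```sh
# Hypothetical check: succeeds only if phylogeny-nf is the active Conda
# environment (conda activate exports its name in CONDA_DEFAULT_ENV).
in_phylogeny_env() {
  [ "${CONDA_DEFAULT_ENV:-}" = "phylogeny-nf" ]
}
if in_phylogeny_env; then
  echo "phylogeny-nf is active"
else
  echo "phylogeny-nf is not active; run: source env/setup.sh" >&2
fi
```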
Missing dependencies:
Re-run the setup script or check env/env.yaml.
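To see which tools are actually missing, each executable can be looked up on `PATH`. The function below is a hypothetical sketch, not part of the repository. The executable names are assumptions and should be checked against env/env.yaml; IQ-TREE 2, for example, may be installed as `iqtree2` rather than `iqtree`.

```sh
# Hypothetical check: reports which of the given executables are not on PATH.
# Assumption: these names match what env/env.yaml installs.
missing_tools() (
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    exit 1                                    # exit leaves the subshell body
  fi
)
missing_tools nextflow blastp cd-hit mafft trimal iqtree2 && echo "all tools found"
```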
No results produced:
Check that your FASTA files are in the inputs/ directory and that --blast_db points to an existing BLAST protein database.
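A common cause is passing a file name instead of the database basename. Presumably `--blast_db` is handed to `blastp -db`, which expects the path without extensions (an assumption about `main.nf`). `makeblastdb -dbtype prot` writes files such as `<name>.pin`, `<name>.phr` and `<name>.psq`; large databases are split into volumes (`<name>.00.pin`, ...) tied together by a `<name>.pal` alias file. The function below is a hypothetical sketch built on those file names.

```sh
# Hypothetical check: does a protein BLAST database exist at this basename?
# Looks for the index file of a single-volume database, the first volume of a
# multi-volume database, or an alias file.
has_protein_blastdb() (
  db=$1
  for f in "$db.pin" "$db.00.pin" "$db.pal"; do
    [ -e "$f" ] && exit 0                      # found an index or alias file
  done
  echo "no protein BLAST database at: $db" >&2
  exit 1
)
has_protein_blastdb /path/to/your/blastdb || echo "check the --blast_db value"
```

With BLAST+ on `PATH`, `blastdbcmd -db /path/to/your/blastdb -dbtype prot -info` gives a more thorough answer, printing the database title and sequence count when the path is valid.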
This project is for non-commercial use. See LICENSE for details.
If you use this pipeline, please cite BLAST, CD-HIT, MAFFT, trimAl, IQ-TREE, and Nextflow.
Alejandro Jiménez-González