Phylogenetic Pipeline (phylogeny-nf)

This repository provides a reproducible pipeline for phylogenetic analysis using Nextflow, BLASTP, CD-HIT, MAFFT, trimAl, and IQ-TREE. The workflow automates BLAST search, clustering, alignment, trimming, and tree inference for protein sequences. Each step is controlled through Nextflow parameters.

Features

Automated environment setup with Conda
BLASTP search against a user-specified protein database
Configurable limit on the number of BLAST hits retrieved per query
Sequence clustering with CD-HIT (optional)
Combine original and clustered sequences (optional)
Multiple sequence alignment using MAFFT
Alignment trimming with trimAl
Phylogenetic tree inference using IQ-TREE (optional)
Highly configurable workflow via Nextflow parameters
Organized output directories for results

Requirements

Miniconda/Anaconda
Nextflow (installed as part of the Conda environment)

Installation

Clone the repository:

git clone https://github.com/alejimgon/phylogeny-nf.git
cd phylogeny-nf

Set up the Conda environment:

source env/setup.sh

This will create and activate the phylogeny-nf environment with all dependencies.
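
To confirm that the environment is active, one option is to check that the tools the pipeline relies on are visible on PATH. The sketch below is illustrative only: the helper name missing_tools is hypothetical, and the executable names (in particular cd-hit and iqtree) are assumptions about what the Conda packages install; env/env.yaml remains the authoritative list.

```shell
# Print every name from the argument list that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; adjust them to match env/env.yaml.
missing=$(missing_tools nextflow blastp cd-hit mafft trimal iqtree)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined onto one line.
    echo "Not found on PATH:" $missing
else
    echo "All expected tools found."
fi
```

If any tool is reported as missing, re-running source env/setup.sh in the same shell is the first thing to try.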

Usage

Prepare your input files:

Place your FASTA files in inputs/
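
Before launching a run, it can help to confirm that the query files are where the pipeline will look and that they contain sequences. The following is a minimal sketch: the function name count_fasta_records is hypothetical, and the file extensions it scans (.fasta, .fa, .faa) are an assumption, since the extensions that main.nf accepts are not stated here.

```shell
# Count sequences (header lines starting with '>') in each FASTA file
# of a directory and print "<file name><TAB><count>".
# The extensions below are an assumption; adjust them to what main.nf expects.
count_fasta_records() {
    local dir="${1:-inputs}"
    local f
    for f in "$dir"/*.fasta "$dir"/*.fa "$dir"/*.faa; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
    done
}

count_fasta_records inputs
```

An empty listing suggests that the files are missing or use a different extension; a count of 0 suggests that a file is not in FASTA format.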

Run the pipeline with default settings:

nextflow run main.nf --blast_db /path/to/your/blastdb

Optional parameters

--blast_num_seq 500 : Number of BLAST hits to retrieve per query (default: 500)
--cd_hit true : Enable CD-HIT clustering (default: false)
--cd_hit_ident 0.9 : CD-HIT identity threshold (default: 0.9)
--combine true : Combine original and clustered sequences before alignment (default: false)
--phylogeny true : Run alignment, trimming, and tree inference (default: false)

Example with clustering and phylogeny:

nextflow run main.nf --blast_db /path/to/your/blastdb --cd_hit true --cd_hit_ident 0.95 --combine true --phylogeny true
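
Long command lines are easy to mistype. Nextflow can also read parameters from a YAML file through its standard -params-file option. The sketch below writes the settings from the example above to a file and runs the pipeline with it. The file name params.yaml is arbitrary, and it is an assumption that the pipeline treats values loaded this way the same as values given on the command line.

```shell
# Write the parameters from the example above to a YAML file.
# The database path is the same placeholder used elsewhere in this README.
cat > params.yaml <<'EOF'
blast_db: /path/to/your/blastdb
cd_hit: true
cd_hit_ident: 0.95
combine: true
phylogeny: true
EOF

# Run only from the repository root with Nextflow available on PATH.
if command -v nextflow >/dev/null 2>&1 && [ -f main.nf ]; then
    nextflow run main.nf -params-file params.yaml
else
    echo "Run this from the repository root after: source env/setup.sh" >&2
fi
```

Keeping the parameter file next to the results records which settings produced a given tree.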

Output

Results are saved in the results/ directory:

results/blast/ for BLAST outputs and extracted FASTA
results/cdhit/ for clustered and/or combined FASTA files
results/mafft/ for alignments
results/trimal/ for trimmed alignments
results/iqtree/ for phylogenetic trees
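
Because clustering and phylogeny are optional, not every directory is created on every run. As a rough sketch (the function name summarize_results is hypothetical), the following prints, for each step listed above, how many files its directory contains or notes that the directory is absent.

```shell
# For each pipeline step, print "<step><TAB><number of files>" or
# "<step><TAB>not produced" when the directory does not exist.
summarize_results() {
    local root="${1:-results}"
    local step n
    for step in blast cdhit mafft trimal iqtree; do
        if [ -d "$root/$step" ]; then
            n=$(find "$root/$step" -type f | wc -l | tr -d ' ')
            printf '%s\t%s\n' "$step" "$n"
        else
            printf '%s\tnot produced\n' "$step"
        fi
    done
}

summarize_results results
```

A step shown as "not produced" is expected when the matching option (--cd_hit, --combine, or --phylogeny) was left at its default of false.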

Troubleshooting

Conda environment not activated: Run the setup script with source (source env/setup.sh) rather than executing it, so that the environment is activated in your current shell.

Missing dependencies: Re-run the setup script or check env/env.yaml.

No results produced: Check that your input files are in the correct inputs/ directory and that the BLAST database path is correct.
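
A wrong database path is a common cause of an empty run. The check below is a sketch under two assumptions: that --blast_db takes the database prefix as it would be passed to blastp -db, and that the database was built as a protein database, in which case an index file <prefix>.pin (or an alias file <prefix>.pal for multi-volume databases) should exist. The function name check_blast_db is hypothetical.

```shell
# Report whether a protein BLAST database exists at the given prefix.
# Returns 0 when an index (.pin) or alias (.pal) file is present, 1 otherwise.
check_blast_db() {
    local prefix="$1"
    if [ -e "$prefix.pin" ] || [ -e "$prefix.pal" ]; then
        echo "found protein database: $prefix"
    else
        echo "no .pin or .pal file at prefix: $prefix" >&2
        return 1
    fi
}

# Replace the placeholder with the value you pass to --blast_db.
check_blast_db /path/to/your/blastdb || echo "Check the --blast_db path."
```

Note that the prefix is not itself a file: a database created with the name mydb is referenced as /some/dir/mydb, not /some/dir/mydb.pin.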

License

This project is for non-commercial use. See LICENSE for details.

Citation

If you use this pipeline, please cite BLAST, CD-HIT, MAFFT, trimAl, IQ-TREE, and Nextflow.

Developed by

Alejandro Jiménez-González
