Minipoa: A minimizer-based method for fast and memory-efficient partial order alignment.
Install via conda
conda install -c malab minipoa
Or, build from source:
# Git clone this repository
git clone https://github.com/NCl3-lhd/minipoa.git
cd minipoa && mkdir build && cd build
# Auto-detect SIMD (prefers AVX2, falls back to SSE)
cmake .. && make
# Or specify a backend explicitly:
cmake -DENABLE_AVX512=ON .. && make
cmake -DENABLE_SSE2=ON .. && make
Generate a consensus sequence (Sequencing mode):
minipoa test/mtDNA.fasta > cons.fasta
Perform multiple sequence alignment (MSA mode; here and below, replace thread with the number of threads to use):
minipoa test/mtDNA.fasta -S -r1 -t thread > mtDNA.fasta
Output the sequence graph in GFA format:
minipoa test/mtDNA.fasta -S -r2 -t thread > mtDNA.gfa
View the full list of parameters:
minipoa -h
Minipoa supports input in FASTA, FASTQ, gzipped FASTA (.fa.gz), and gzipped FASTQ (.fq.gz) formats. It incrementally constructs an alignment graph from the input sequences. Optionally, an existing GFA file can be provided via -i to initialize the alignment graph.
minipoa input.fasta -i input.gfa -S -t thread -r2 > output.gfa
Minipoa provides three output modes, selected with the -r parameter. Note that -r is independent of, and not affected by, any other parameter.
-r0: Output the consensus sequence in FASTA format (default)
-r1: Output the multiple sequence alignment
-r2: Output the sequence graph in GFA format
To accommodate different datasets and balance speed with accuracy, minipoa provides the following advanced command-line parameters for fine-tuning:
You can customize the dynamic programming scoring scheme using the following parameters:
-M: Match score
-X: Mismatch penalty
-O: Gap open penalty
-E: Gap extension penalty
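The sketch below shows how these four values combine when scoring one gapped pairwise alignment, using the same values as the example command in this section (-M 2 -X -4 -O -4 -E -2, with penalties given as negative scores). It is an illustration only: the helper affine_score is hypothetical and not part of minipoa, and it assumes the common affine convention in which a gap run of length k scores O + k*E. This README does not say whether minipoa charges O + k*E or O + (k-1)*E, nor how scores are accumulated on the graph.

```shell
#!/bin/sh
# Hypothetical helper (not part of minipoa): score two equal-length aligned
# rows, where '-' marks a gap. ASSUMPTION: a gap run of length k scores O + k*E.
# Assumes no column has a gap in both rows.
# Usage: affine_score ROW_A ROW_B M X O E
affine_score() {
  awk -v a="$1" -v b="$2" -v M="$3" -v X="$4" -v O="$5" -v E="$6" 'BEGIN {
    s = 0; ga = 0; gb = 0            # ga/gb: currently inside a gap in row a / row b
    for (i = 1; i <= length(a); i++) {
      x = substr(a, i, 1); y = substr(b, i, 1)
      if (x == "-") {                # gap in row a
        if (!ga) s += O              # first column of the run pays the open penalty
        s += E; ga = 1; gb = 0
      } else if (y == "-") {         # gap in row b
        if (!gb) s += O
        s += E; gb = 1; ga = 0
      } else {                       # aligned pair: match or mismatch
        s += (x == y) ? M : X
        ga = 0; gb = 0
      }
    }
    print s
  }'
}

affine_score ACGTACGT AC--ACTT 2 -4 -4 -2   # prints -2
```

Under this assumed convention, the two-base gap in the example costs -4 + 2 * -2 = -8, while the single G/T mismatch costs -4.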
minipoa input.fasta -M 2 -X -4 -O -4 -E -2 > output.fasta
Alternatively, you can load a full substitution matrix from an .mtx file:
-m: Path to a scoring matrix file in .mtx format (supports both nucleotide and amino acid matrices)
# Use HOXD70 nucleotide substitution matrix
minipoa input.fasta -m test/HOXD70.mtx -O -105 -E -12 > output.fasta
The .mtx file format is a tab/space-delimited text file. The first non-comment line (not starting with #) defines the alphabet as column headers, and each subsequent line provides the row character followed by scores. See test/HOXD70.mtx for an example.
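To picture the layout just described, the sketch below writes a small made-up nucleotide matrix (not HOXD70; the scores are arbitrary toy values) and looks up a score by row and column character with awk, skipping lines that start with # as the format description does. The file name toy.mtx and the helper mtx_score are hypothetical. This README does not say whether minipoa requires a symmetric matrix or allows comments after the header, so treat this only as an illustration of the format, not of minipoa's parser.

```shell
#!/bin/sh
# Write a toy 4x4 nucleotide matrix (arbitrary illustrative scores).
cat > toy.mtx <<'EOF'
# toy nucleotide matrix: the first non-comment line lists the alphabet
   A   C   G   T
A  5  -4  -2  -4
C -4   5  -4  -2
G -2  -4   5  -4
T -4  -2  -4   5
EOF

# Hypothetical helper: print the score for row char $2 and column char $3 in file $1.
mtx_score() {
  awk -v r="$2" -v c="$3" '
    /^#/ || NF == 0 { next }                                        # skip comments and blank lines
    !hdr { for (i = 1; i <= NF; i++) col[$i] = i; hdr = 1; next }   # header: alphabet -> column index
    $1 == r { print $(col[c] + 1); exit }                           # field 1 is the row char, so shift by one
  ' "$1"
}

mtx_score toy.mtx A G   # prints -2
```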
-B : Enable the adaptive band strategy (static banding is enabled by default).
# Enable adaptive band strategy in Sequencing mode
minipoa input.fasta -B > output.fasta
The band length is calculated automatically from the -b and -f parameters and the query sequence length: len = b + query_length / f.
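As a worked example of the band-length formula (len = b + query_length / f), the sketch below computes the band for the -b 10 -f 100 setting used in this section and an arbitrary 16,000 bp query, and reports full for f = 0, which this README describes as disabling banding. band_len is a hypothetical helper; it uses integer division, whereas minipoa's actual rounding is not documented here.

```shell
#!/bin/sh
# Hypothetical helper: adaptive band length len = b + query_length / f.
# Integer division is an assumption; minipoa's rounding is not documented.
# Usage: band_len B F QUERY_LENGTH
band_len() {
  b=$1; f=$2; qlen=$3
  if [ "$f" -eq 0 ]; then
    echo "full"                   # f = 0: no banding, full DP over the whole matrix
  else
    echo $(( $b + $qlen / $f ))
  fi
}

band_len 10 100 16000   # prints 170 (10 + 16000 / 100)
band_len 10 0 16000     # prints full
```

Because query_length is divided by f, increasing f or decreasing b narrows the band.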
# Narrow the bandwidth under adaptive band strategy
minipoa input.fasta -B -b 10 -f 100 > output.fasta
f = 0 : If you explicitly set f = 0, minipoa will not apply any banding strategy and will perform full dynamic programming across the entire matrix.
minipoa input.fasta -f 0 > output.fasta
-p : Optimize the alignment order by constructing a guide tree. Both the time and space complexity for this process are
-S : Enable anchor chain optimization. When this option is enabled, minipoa dynamically chooses between the static and adaptive band strategies for each delineated alignment region, based on that region's characteristics, to keep the alignment robust.
-W : Adjust the window distance parameter for anchors.
- Recommended Range: Values between 500 and query_length / 24 are generally appropriate.
- Default Behavior: To prevent incorrect alignments caused by false-positive anchors, minipoa conservatively defaults to the maximum value in this range.
- Performance Tuning: To further accelerate alignment, you can manually specify a smaller -W value. This increases speed, though it may cause a very slight loss of accuracy.
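The recommended -W range depends only on the query length, so it can be computed before choosing a value. In this sketch, w_range is a hypothetical helper that prints the lower and upper bounds; the upper bound corresponds to the maximum of the range, which this README says minipoa uses by default. How minipoa behaves when query_length / 24 falls below 500 is not stated, so clamping the upper bound to 500 is an assumption, as is the use of integer division.

```shell
#!/bin/sh
# Hypothetical helper: print the recommended -W range "lower upper" for a query
# length, following the README's "between 500 and query_length / 24".
# ASSUMPTION: integer division, and the upper bound is clamped to 500 for short queries.
w_range() {
  qlen=$1
  upper=$(( $qlen / 24 ))
  if [ "$upper" -lt 500 ]; then
    upper=500
  fi
  echo "500 $upper"
}

w_range 24000   # prints "500 1000"
```

Any value inside the printed range is a candidate for -W; values toward the lower end favor speed, per the tuning note above.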
minipoa input.fasta -S -W 500 -t thread > output.fasta
For any questions or issues, please contact me at ncl3.lhd@gmail.com.