
Robaina/filterSAM



A Python tool to filter SAM/BAM files by percent identity or percent of matched sequence.



Percent identity is computed as:

$$PI = 100 \frac{N_m}{N_m + N_i}$$

where $N_m$ is the number of matches and $N_i$ is the number of mismatches.
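To illustrate the formula, the sketch below counts $N_m$ and $N_i$ from an MD tag string and computes PI. It is a minimal, standalone sketch, not filterSAM's own implementation. The function names are hypothetical. It also assumes, following the formula literally, that deleted reference bases (the ^ runs in an MD tag) count as neither matches nor mismatches.

```python
import re

# MD tag grammar (SAM spec): [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*
# Digit runs count matching bases, a lone letter is a mismatched reference
# base, and '^' followed by letters marks reference bases deleted from the read.
_MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def md_match_mismatch_counts(md: str) -> tuple[int, int]:
    """Return (N_m, N_i): matched and mismatched bases encoded in an MD tag."""
    matches = mismatches = 0
    for digits, _deletion, mismatch in _MD_TOKEN.findall(md):
        if digits:
            matches += int(digits)
        elif mismatch:
            mismatches += 1
        # Assumption: deletions (^...) are ignored, as they are neither N_m nor N_i.
    return matches, mismatches


def percent_identity(md: str) -> float:
    """PI = 100 * N_m / (N_m + N_i), computed from an MD tag."""
    n_m, n_i = md_match_mismatch_counts(md)
    if n_m + n_i == 0:
        raise ValueError("MD tag contains no aligned bases")
    return 100 * n_m / (n_m + n_i)


# "10A5^AC6" encodes 21 matches, 1 mismatch and a 2-base deletion.
print(percent_identity("10A5^AC6"))
```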

Percent of matched sequence is computed as:

$$PM = 100 \frac{N_m}{L}$$

where $L$ is the length of the query sequence.
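A minimal sketch of PM under two stated assumptions: $N_m$ is supplied by the caller (for example, counted from the digit runs of the MD tag), and $L$ is the number of query bases implied by the CIGAR string, so soft-clipped bases count toward $L$ while hard-clipped bases do not. The helper names are hypothetical and not part of filterSAM.

```python
import re

# CIGAR operations that consume query bases (SAM spec): M, I, S, =, X.
# D, N, H and P do not add bases to the stored query sequence.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_QUERY_CONSUMING = set("MIS=X")


def query_length_from_cigar(cigar: str) -> int:
    """Length L of the query sequence implied by a CIGAR string."""
    if cigar == "*":
        raise ValueError("unmapped record: CIGAR is '*'")
    length = sum(
        int(n) for n, op in _CIGAR_OP.findall(cigar) if op in _QUERY_CONSUMING
    )
    if length == 0:
        raise ValueError(f"CIGAR {cigar!r} implies an empty query")
    return length


def percent_matched(n_matches: int, cigar: str) -> float:
    """PM = 100 * N_m / L, with L taken from the CIGAR string."""
    return 100 * n_matches / query_length_from_cigar(cigar)


# 5 soft-clipped bases + 95 aligned bases -> L = 100
print(percent_matched(90, "5S95M"))  # 90.0
```

Whether soft-clipped bases should count toward $L$ is a design choice: including them penalises reads that only partially align, which matches the idea of "percent of the query that matched".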

NOTES

  1. Percent of matched sequence is also used as an alternative definition of percent identity in some tools, for instance, BLAST.

  2. SAM/BAM files must contain MD tags to be filtered by percent identity. Aligners such as BWA add an MD tag to each aligned read; for files that lack them, MD tags can be generated with samtools calmd.
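Reads without an MD tag cannot be scored by percent identity. The hypothetical helper below shows where the tag lives in a SAM text record: after the 11 mandatory tab-separated fields, as an optional field of the form MD:Z:value. It is a plain-Python sketch for SAM text only, not how filterSAM reads files.

```python
from typing import Optional


def get_md_tag(sam_line: str) -> Optional[str]:
    """Return the MD tag value of a SAM alignment line, or None if absent.

    SAM records have 11 mandatory tab-separated fields; optional fields
    follow as TAG:TYPE:VALUE (MD has type Z, a printable string).
    """
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        if tag == "MD":
            return value
    return None


record = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:1\tMD:Z:3A4"
print(get_md_tag(record))  # 3A4
```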

Installation

pip install filtersam

Usage

A Jupyter notebook with usage examples is available here.

Citation

If you use this software, please cite it as below:

Robaina-Estévez, S. (2022). filterSAM: filter sam/bam files by percent identity or percent of matched sequence (Version 0.0.11) [Computer software]. https://doi.org/10.5281/zenodo.7056278.
