micrite-gethuman

micrite-gethuman is a Nextflow pipeline designed to extract high-confidence host (human) reads from clinical sequencing data.

Overview

When searching for microbial sequences in human clinical samples, it is helpful to have a "ground truth" subset of human DNA from those same samples. This subset serves as a baseline against which putative microbial hits can be compared, using metrics such as base qualities (PHRED scores), which helps distinguish true biological signals from sequencing noise or artifacts.

This pipeline processes BAM files (paired-end reads aligned to a human reference) and applies multiple filters to ensure only the most reliable host reads are retained.

Filtering Logic

The pipeline extracts reads that meet the following criteria:

  • Primary Alignments Only: Excludes secondary or supplementary alignments.
  • Proper Pairs: Both reads must be oriented and spaced as expected by the aligner, and cannot be marked as PCR/optical duplicates.
  • Expected Reference Chromosome: Maps specifically to a user-defined set of --hostchroms (e.g., "chr1 chr2") to ensure no decoy contig alignments contaminate the outputs.
  • Quality Thresholds: Exceeds a user-specified Mapping Quality (MAPQ).
  • Length Thresholds: Exceeds a user-specified minimum Query Length.
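
To make the criteria above concrete, here is a minimal Python sketch of the per-read decision. It is an illustration only and not necessarily how the pipeline implements its filters. The function name `passes_filters` and its arguments are hypothetical; the flag bits come from the SAM specification. The thresholds are assumed to be inclusive minimums, since the text does not say whether the comparison is strict.

```python
# SAM flag bits as defined in the SAM specification.
PROPER_PAIR = 0x2
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def passes_filters(flag, rname, mapq, query_length,
                   host_chroms, min_mapq, min_query_length):
    """Return True if one alignment record meets every criterion listed above.

    Hypothetical helper for illustration; not taken from the pipeline source.
    """
    # Primary alignments only: reject secondary and supplementary records.
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False
    # Proper pairs that are not marked as PCR/optical duplicates.
    if not flag & PROPER_PAIR or flag & DUPLICATE:
        return False
    # Must map to one of the expected host chromosomes (no decoy contigs).
    if rname not in host_chroms:
        return False
    # Assumption: thresholds are treated as inclusive minimums.
    return mapq >= min_mapq and query_length >= min_query_length


host = {"chr1", "chr2"}
print(passes_filters(99, "chr1", 60, 150, host, 20, 10))           # primary, proper pair
print(passes_filters(99 | DUPLICATE, "chr1", 60, 150, host, 20, 10))  # marked duplicate
print(passes_filters(99, "chrUn_decoy", 60, 150, host, 20, 10))    # not a host chromosome
```

Only the first call passes; the second is rejected as a duplicate and the third because its reference name is not in the host chromosome set.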

After this initial filter, the pipeline randomly subsamples to the number of reads given by the --nreads argument. Random subsampling uses the seed 111 by default, but this can be changed using the process directive (task.ext.seed = ).
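
The effect of a fixed seed can be sketched in a few lines of Python. This is an illustration of seeded subsampling in general, not the pipeline's actual implementation, which this README does not describe. The function `subsample_reads` is hypothetical, and sampling by read name is an assumption made for the example.

```python
import random


def subsample_reads(read_names, nreads, seed=111):
    """Pick up to `nreads` read names at random, reproducibly for a given seed.

    Hypothetical helper for illustration; not taken from the pipeline source.
    """
    # Deduplicate and sort so the seed alone determines which names are drawn.
    names = sorted(set(read_names))
    if nreads >= len(names):
        return names
    rng = random.Random(seed)  # local generator; global random state is untouched
    return sorted(rng.sample(names, nreads))


names = [f"read{i}" for i in range(100)]
first = subsample_reads(names, 5)
second = subsample_reads(names, 5)
print(len(first))        # number of names selected
print(first == second)   # same seed, same selection
```

Running the function twice with the same seed returns the same five names, which is what makes a subsample reproducible across runs.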


Quick Start

nextflow run selkamand/micrite-gethuman -profile docker \
  --sampleid testsample \
  --hostchroms "chr1 chr2" \
  --min_query_length 10 \
  --min_mapping_quality 20 \
  --nreads 5

Testing

To verify the installation and workflow logic, run the built-in test profile:

nextflow run . -profile docker,test

See the test file README for details on what to expect in the test run output.

About

Fetch a subsample of paired human reads that map well to the reference genome. Used as a comparison set for microbial reads on Phred quality, GC bias, etc.
