1. Introduction

This pipeline automates the preprocessing and analysis of high-throughput sequencing data to evaluate read mappability across the genome. It performs adapter trimming, read merging, genome alignment, and generates the mapping output file. The pipeline supports both single-end and paired-end data, with adaptable scripts based on input format. User-defined adapters and minimum read length thresholds are customizable in the trimming script. Designed for Linux environments, it leverages multi-threading where possible for scalability. The modular script design allows for debugging and isolated execution of any pipeline step. This pipeline provides a comprehensive and reproducible framework for evaluating the mappability of sequencing reads.

2. System Requirements:

- Linux or WSL environment
- ≥4 GB RAM (8 GB recommended)
- ≥50 GB free disk space
- Software requirements: FastQC, MultiQC, Cutadapt, Bowtie2, Samtools

Activate the conda environment before running any scripts:
Command: conda activate mytools

3. Pipeline Steps

3.1 Adapter Trimming and Quality Control

Input: Raw '.fastq.gz' files.
Output: Trimmed FASTQ files, Cutadapt reports, FastQC, and MultiQC summaries.
Operations:
- Removes standard Illumina and custom adapters.
- Trims poly-A, T, C, G sequences.
- Performs quality trimming and filters reads by minimum length.
- Runs `fastqc` on each trimmed file.
- Summarizes QC results with `multiqc`.
Tools: cutadapt, fastqc, multiqc.
Command: bash trim.sh

3.2 Merge (This Step is only for paired-end reads)

Input: Paired-end trimmed files (`*_R1_001.fastq.gz` and `*_R2_001.fastq.gz`).
Output: Concatenated files in `merged_results/`.
Operations:
- Merges each R1 and R2 pair using `cat`.
- Output is named `sample_001.fastq.gz`.
Command: bash merge.sh

3.3 Alignment to the reference genome (sam_bam.sh)

Input: Merged or trimmed single-end FASTQ files.
Output: Sorted and indexed BAM files in `output/`.
Operations:
- Uses `bowtie2` to align reads to the human reference genome (GRCh38).
- Converts SAM to BAM, sorts, and indexes with `samtools`.
- Removes intermediate SAM files.

Key Parameters:
- index_prefix: Path to Bowtie2 index files.
- threads: Number of CPU threads (default: 94).
Command: bash sam_bam.sh

3.4 Alignment Statistics and Mapping (stats.sh)

Input: BAM files from the alignment step.
Output: Per-sample chromosome-level stats (`*_chr_stats.txt`) in `stats/`.
Operations:
- Runs 'samtools idxstats' to extract:
- Chromosome names
- Chromosome lengths
- Number of mapped reads
- Number of unmapped reads
Command: bash stats.sh

3.5 Statistics Compilation (merge_stats.sh)

Input: Individual '*_chr_stats.txt' files.
Output: Summary table `merged_chr_stats.tsv`.
Operations:
- Adds a Sample column to each row.
- Appends all stats into a master TSV file.
Fields: Sample, Chromosome, Chromosome_Length, Mapped_Reads, Unmapped_Reads
Command Example: bash merge_stats.sh

4. Directory Outputs

----------------------------------------------------------------------------------------------------------------I

Directory	Contents
trimmed_results/	Trimmed FASTQ files, Cutadapt, FastQC, MultiQC outputs
merged_results/	Merged R1 and R2 FASTQ files
output/	Sorted and indexed BAM files
stats/	Per-sample and merged mapping statistics
---------------------------------------------------------------------------------------------------------------I

5. Dependencies

Ensure the following tools are installed and accessible from the command line:
-cutadapt
-fastqc
-multiqc
-bowtie2
-samtools
-conda

6. Notes

-Adjust adapter sequences in 'trim.sh' according to your sequencing kit.
-Confirm Bowtie2 index path ('index_prefix') in `sam_bam.sh` matches your system.
-Scripts assume a flat directory structure with all input FASTQ files present.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
LICENSE		LICENSE
README.md		README.md
merge.sh		merge.sh
merge_stats.sh		merge_stats.sh
sam_bam_v1.sh		sam_bam_v1.sh
sam_bam_v2.sh		sam_bam_v2.sh
sam_bam_v3.sh		sam_bam_v3.sh
stats.sh		stats.sh
trim_v1.sh		trim_v1.sh
trim_v2.sh		trim_v2.sh
trim_v3.sh		trim_v3.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

1. Introduction

2. System Requirements:

3. Pipeline Steps

3.1 Adapter Trimming and Quality Control

3.2 Merge (This Step is only for paired-end reads)

3.3 Alignment to the reference genome (sam_bam.sh)

3.4 Alignment Statistics and Mapping (stats.sh)

3.5 Statistics Compilation (merge_stats.sh)

4. Directory Outputs

5. Dependencies

Ensure the following tools are installed and accessible from the command line:
-cutadapt
-fastqc
-multiqc
-bowtie2
-samtools
-conda

6. Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

1. Introduction

2. System Requirements:

3. Pipeline Steps

3.1 Adapter Trimming and Quality Control

3.2 Merge (This Step is only for paired-end reads)

3.3 Alignment to the reference genome (sam_bam.sh)

3.4 Alignment Statistics and Mapping (stats.sh)

3.5 Statistics Compilation (merge_stats.sh)

4. Directory Outputs

5. Dependencies

Ensure the following tools are installed and accessible from the command line: -cutadapt -fastqc -multiqc -bowtie2 -samtools -conda

6. Notes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Ensure the following tools are installed and accessible from the command line:
-cutadapt
-fastqc
-multiqc
-bowtie2
-samtools
-conda

Packages