FastQC-RS

A Rust implementation of FastQC, a quality control tool for high-throughput sequencing data. It is a 1:1 rewrite of FastQC v0.12.1 that keeps the same output format and analysis algorithms.

Features

Built-in Trim Galore support: adapter and quality trimming via `fastqc-rs trim-galore`, a Rust reimplementation that wraps Cutadapt
12 analysis modules with identical algorithms and pass/warn/fail thresholds:
1. Basic Statistics
2. Per Base Sequence Quality
3. Per Tile Sequence Quality
4. Per Sequence Quality Scores
5. Per Base Sequence Content
6. Per Sequence GC Content
7. Per Base N Content
8. Sequence Length Distribution
9. Sequence Duplication Levels
10. Overrepresented Sequences
11. Adapter Content
12. Kmer Content
Input formats: FASTQ (plain, gzip, bzip2), BAM, SAM
Output: HTML report with SVG graphs, ZIP archive, fastqc_data.txt, summary.txt
Output compatibility: Text reports match Java FastQC output (identical PASS/WARN/FAIL, near-identical data values)
Multi-file parallel processing via rayon
Single binary with embedded configuration files

Installation

cargo install --path .

Or build from source:

cargo build --release

Usage

# Basic usage
fastqc-rs input.fastq

# Multiple files with parallel processing
fastqc-rs -t 4 sample1.fastq.gz sample2.fastq.gz

# Specify output directory
fastqc-rs -o results/ input.fastq

# BAM/SAM files
fastqc-rs input.bam
fastqc-rs -f sam_mapped input.sam   # Only mapped reads, with soft-clip removal

# Unzip the output archive after creating it
fastqc-rs --extract input.fastq

# Quiet mode (suppress progress)
fastqc-rs -q input.fastq

CLI Options

Option	Description
`-o, --outdir <DIR>`	Output directory (must exist)
`--extract`	Unzip output after creation
`--noextract`	Don't unzip output (default)
`--delete`	Delete ZIP after extraction
`-f, --format <FMT>`	Force format: `fastq`, `bam`, `sam`, `bam_mapped`, `sam_mapped`
`-c, --contaminants <FILE>`	Custom contaminant list
`-a, --adapters <FILE>`	Custom adapter list
`-l, --limits <FILE>`	Custom pass/warn/fail thresholds
`-t, --threads <N>`	Number of files to process simultaneously (default: 1)
`-k, --kmers <SIZE>`	Kmer length (default: 7)
`-q, --quiet`	Suppress progress messages
`--casava`	CASAVA mode (filter flagged reads)
`--nogroup`	Disable base position grouping
`--expgroup`	Use exponential base grouping
`--min-length <BP>`	Minimum sequence length for grouping
`--dup-length <BP>`	Truncation length for duplication analysis
`--svg`	Output SVG graphs

Output

For each input file (for example `sample.fastq.gz`), fastqc-rs produces:

sample_fastqc.html — Interactive HTML report
sample_fastqc.zip — Archive containing:
- fastqc_report.html
- fastqc_data.txt — Tab-delimited analysis data
- summary.txt — PASS/WARN/FAIL per module
- Icons/ — Status icons
- Images/ — SVG graphs
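Because summary.txt carries one PASS/WARN/FAIL verdict per module, it is convenient to check in scripts or CI. The following is a minimal sketch of a parser, assuming each line has three tab-separated columns (status, module name, input file name) as in Java FastQC; the type and function names (`Status`, `SummaryEntry`, `parse_summary`, `non_passing`) are hypothetical and not part of fastqc-rs.

```rust
/// Status reported for one module in `summary.txt`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pass,
    Warn,
    Fail,
}

/// One parsed row: status, module name, input file name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SummaryEntry {
    pub status: Status,
    pub module: String,
    pub file: String,
}

/// Parses the text of a `summary.txt` file.
/// Assumption: three tab-separated columns per line (STATUS, module, file).
pub fn parse_summary(text: &str) -> Result<Vec<SummaryEntry>, String> {
    let mut entries = Vec::new();
    for (idx, line) in text.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // tolerate blank lines
        }
        let cols: Vec<&str> = line.split('\t').collect();
        if cols.len() != 3 {
            return Err(format!(
                "line {}: expected 3 columns, found {}",
                idx + 1,
                cols.len()
            ));
        }
        let status = match cols[0] {
            "PASS" => Status::Pass,
            "WARN" => Status::Warn,
            "FAIL" => Status::Fail,
            other => return Err(format!("line {}: unknown status {:?}", idx + 1, other)),
        };
        entries.push(SummaryEntry {
            status,
            module: cols[1].to_string(),
            file: cols[2].to_string(),
        });
    }
    Ok(entries)
}

/// Returns the names of modules whose status is WARN or FAIL.
pub fn non_passing(entries: &[SummaryEntry]) -> Vec<&str> {
    entries
        .iter()
        .filter(|e| e.status != Status::Pass)
        .map(|e| e.module.as_str())
        .collect()
}
```

A caller would read the file with `std::fs::read_to_string`, pass the text to `parse_summary`, and fail the job if `non_passing` returns anything it cares about.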

Trim Galore

A built-in Rust reimplementation of Trim Galore (by Felix Krueger), wrapping Cutadapt for adapter and quality trimming. Requires Cutadapt to be installed separately.

# Single-end with adapter auto-detection
fastqc-rs trim-galore reads.fq.gz

# Paired-end with custom cutadapt path
fastqc-rs trim-galore --paired --path-to-cutadapt /opt/bin/cutadapt \
  R1.fq.gz R2.fq.gz

# Illumina adapter, quality 30, min length 50, 4 cores
fastqc-rs trim-galore --illumina -q 30 --length 50 -j 4 -o trimmed/ reads.fq.gz

# Hard-trim to first 75bp
fastqc-rs trim-galore --hardtrim5 75 reads.fq.gz

# See all options
fastqc-rs trim-galore --help

Key options:

Option	Description
`--paired`	Paired-end mode
`-a, --adapter <SEQ>`	Custom adapter sequence
`--illumina / --nextera / --small-rna / --bgiseq`	Adapter presets
`-q, --quality <INT>`	Quality cutoff (default: 20)
`--length <INT>`	Minimum read length (default: 20)
`-j, --cores <N>`	Number of Cutadapt cores
`--path-to-cutadapt <PATH>`	Path to cutadapt executable
`--clip_R1 / --clip_R2 <INT>`	Clip N bp from 5' end
`--three_prime_clip_R1 / --three_prime_clip_R2 <INT>`	Clip N bp from 3' end
`--hardtrim5 / --hardtrim3 <INT>`	Hard-trim to N bp from 5'/3' end
`--rrbs`	RRBS mode (MspI-digested)
`--fastqc`	Run FastQC after trimming
`-o, --output_dir <DIR>`	Output directory
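To make the hard-trim options concrete, here is a small sketch of what they do to a single read. It assumes `--hardtrim5 N` keeps the first N bases and `--hardtrim3 N` keeps the last N bases, with the quality string cut identically; `FastqRead`, `hardtrim5` and `hardtrim3` are illustrative names, not the fastqc-rs internals.

```rust
/// Sequence and quality strings of one FASTQ record, kept as bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FastqRead {
    pub seq: Vec<u8>,
    pub qual: Vec<u8>,
}

/// Keeps the first `n` bases (5' end). Reads shorter than `n` are unchanged.
pub fn hardtrim5(read: &FastqRead, n: usize) -> FastqRead {
    // Guard against malformed records where seq and qual lengths differ.
    let len = read.seq.len().min(read.qual.len());
    let keep = n.min(len);
    FastqRead {
        seq: read.seq[..keep].to_vec(),
        qual: read.qual[..keep].to_vec(),
    }
}

/// Keeps the last `n` bases (3' end). Reads shorter than `n` are unchanged.
pub fn hardtrim3(read: &FastqRead, n: usize) -> FastqRead {
    let len = read.seq.len().min(read.qual.len());
    let start = len - n.min(len);
    FastqRead {
        seq: read.seq[start..len].to_vec(),
        qual: read.qual[start..len].to_vec(),
    }
}
```

Trimming the sequence and quality together keeps the record valid, since FASTQ requires both strings to have the same length.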

Performance

Benchmarked on a paired-end Illumina dataset (~1.15 GB / ~1.20 GB gzipped FASTQ, ~9.9M reads x 150bp):

Baseline (direct Rust rewrite, no optimization)

File	FastQC v0.12.1 (Java)	fastqc-rs (Rust)	Speedup
SPL1E1_raw_1.fastq.gz (1.15 GB)	48.6s	46.8s	1.04x
SPL1E1_raw_2.fastq.gz (1.20 GB)	47.6s	45.7s	1.04x

Optimized v1 (zlib-rs + 2-thread pipeline + ahash + LTO)

Optimizations applied: zlib-rs decompression backend, reader/processor pipeline (overlapping I/O with compute), AHashMap for hot-path modules, in-place ASCII uppercase, 256KB I/O buffer, LTO + codegen-units=1.

File	FastQC v0.12.1 (Java)	fastqc-rs (optimized)	Speedup vs Java	Speedup vs baseline
SPL1E1_raw_1.fastq.gz (1.15 GB)	48.6s	38.8s	1.25x	1.21x
SPL1E1_raw_2.fastq.gz (1.20 GB)	47.6s	39.1s	1.22x	1.17x

Tested on Linux 6.6 (WSL2). Pipeline uses 2 threads (reader + processor). CPU utilization ~117%. Bottleneck is module processing (~85% of wall time).
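The reader/processor split can be illustrated with the standard library alone. This is a rough sketch, not the fastqc-rs implementation: one thread reads lines in batches through a 256 KB buffer and hands them over a bounded channel, while the calling thread does the per-record work (here, just counting bases). The function name `count_bases_pipelined`, the batch size, and the channel depth are assumptions chosen for the example.

```rust
use std::io::{BufRead, BufReader, Read};
use std::sync::mpsc::sync_channel;
use std::thread;

/// Counts bases in uncompressed FASTQ text, overlapping reading with counting.
pub fn count_bases_pipelined<R: Read + Send + 'static>(input: R) -> u64 {
    // Bounded channel: the reader may run at most 4 batches ahead.
    let (tx, rx) = sync_channel::<Vec<String>>(4);

    let reader = thread::spawn(move || {
        let buf = BufReader::with_capacity(256 * 1024, input);
        let mut batch: Vec<String> = Vec::with_capacity(1024);
        for line in buf.lines() {
            let line = match line {
                Ok(l) => l,
                Err(_) => break, // stop on I/O error; a real tool would report it
            };
            batch.push(line);
            if batch.len() == 1024 {
                // Send the full batch and start a fresh one.
                if tx.send(std::mem::take(&mut batch)).is_err() {
                    return; // receiver went away
                }
            }
        }
        if !batch.is_empty() {
            let _ = tx.send(batch);
        }
        // `tx` is dropped here, which ends the receiving loop below.
    });

    // Processor side: the sequence is the second line of each 4-line record.
    let mut line_no: u64 = 0;
    let mut bases: u64 = 0;
    for batch in rx {
        for line in batch {
            if line_no % 4 == 1 {
                bases += line.len() as u64;
            }
            line_no += 1;
        }
    }
    reader.join().expect("reader thread panicked");
    bases
}
```

Sending batches rather than single lines keeps channel overhead low, and the bounded channel stops the reader from buffering an entire file when the processor is the slower side.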

Optimized v2 (data-parallel multi-threaded processing, output bug) `#0458a08`

Added data-parallel architecture: 1 reader thread + 4 worker threads, each with independent module copies. Workers process sequence subsets in parallel, results merged after completion.

File	FastQC v0.12.1 (Java)	fastqc-rs (multi-threaded)	Speedup vs Java	Speedup vs baseline
SPL1E1_raw_1.fastq.gz (1.15 GB)	48.6s	9.4s	5.2x	5.0x
SPL1E1_raw_2.fastq.gz (1.20 GB)	47.6s	9.4s	5.1x	4.9x
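The pattern of independent per-worker module copies that are merged at the end can be sketched as follows. This is a simplified illustration using `std::thread::scope`, with a made-up `GcStats` accumulator standing in for an analysis module; it splits an in-memory slice rather than streaming from a reader thread, and none of the names come from the fastqc-rs source.

```rust
use std::thread;

/// A toy "module": counts G/C bases and total bases.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct GcStats {
    pub gc: u64,
    pub total: u64,
}

impl GcStats {
    /// Updates the counters with one sequence.
    pub fn observe(&mut self, seq: &[u8]) {
        for &b in seq {
            self.total += 1;
            if matches!(b, b'G' | b'C' | b'g' | b'c') {
                self.gc += 1;
            }
        }
    }

    /// Folds another worker's partial result into this one.
    pub fn merge(&mut self, other: &GcStats) {
        self.gc += other.gc;
        self.total += other.total;
    }
}

/// Processes `seqs` on up to `workers` threads and merges the partial results.
pub fn gc_stats_parallel(seqs: &[Vec<u8>], workers: usize) -> GcStats {
    let workers = workers.max(1);
    // Ceiling division so every sequence lands in some chunk.
    let chunk = ((seqs.len() + workers - 1) / workers).max(1);

    let partials: Vec<GcStats> = thread::scope(|s| {
        let handles: Vec<_> = seqs
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    let mut local = GcStats::default(); // independent copy per worker
                    for seq in part {
                        local.observe(seq);
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    let mut merged = GcStats::default();
    for p in &partials {
        merged.merge(p);
    }
    merged
}
```

This works cleanly only when the merge is order-independent, as simple sums are. Statistics that depend on the order in which reads are seen need different handling.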

This version was fast, but not output-compatible. Multi-threaded duplication tracking changed the original FastQC semantics, and per-sequence quality bucketing still used Rust-side rounding instead of Java truncation.
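The bucketing difference is easy to reproduce. The sketch below computes a read's mean Phred quality and assigns it to an integer bucket in two ways: truncation (the Java-compatible behavior described above) and rounding. The function names and the Phred+33 offset used in the example are assumptions for illustration.

```rust
/// Mean Phred quality of one read, given its ASCII quality string and offset.
pub fn mean_quality(qual: &[u8], offset: u8) -> f64 {
    if qual.is_empty() {
        return 0.0;
    }
    let sum: u64 = qual
        .iter()
        .map(|&q| u64::from(q.saturating_sub(offset)))
        .sum();
    sum as f64 / qual.len() as f64
}

/// Bucket by dropping the fractional part, like a Java `(int)` cast.
pub fn bucket_truncated(mean: f64) -> u32 {
    mean as u32
}

/// Bucket by rounding to the nearest integer.
pub fn bucket_rounded(mean: f64) -> u32 {
    mean.round() as u32
}
```

For a quality string such as `IJJJ` with offset 33, the per-base scores are 40, 41, 41, 41 and the mean is 40.75. Truncation puts the read in bucket 40, rounding puts it in bucket 41, so the two approaches produce different per-sequence quality histograms from the same data.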

Optimized v2 fixed (reader-ordered duplication + Java-compatible bucketing) `#3fdd3b9`

Keeps the 1 reader + 4 worker architecture, but moves duplication tracking back to the reader thread so it preserves original file order. Also restores Java-compatible per-sequence quality bucketing and more closely matches Java number formatting.

File	FastQC v0.12.1 (Java)	fastqc-rs (fixed multi-threaded)	Speedup vs Java	Speedup vs baseline
SPL1E1_raw_1.fastq.gz (1.15 GB)	48.6s	13.0s	3.7x	3.6x
SPL1E1_raw_2.fastq.gz (1.20 GB)	47.6s	13.7s	3.5x	3.3x

Tested on Linux 6.6 (WSL2). Uses 5 threads total (1 reader + 4 workers). summary.txt matches Java FastQC exactly, the Per Sequence Quality Scores module matches exactly, and Sequence Duplication Levels now differs only in trailing floating-point digits.
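One way duplication tracking becomes order-dependent is when only a limited number of distinct sequences are tracked: which sequences get a slot then depends on which ones are seen first. The sketch below illustrates that effect with a hypothetical `DupTracker` and a tiny cap. It is an assumption-laden illustration of why reader-order processing matters, not the fastqc-rs data structure.

```rust
use std::collections::HashMap;

/// Counts occurrences, but only for the first `cap` distinct sequences seen.
pub struct DupTracker {
    cap: usize,
    counts: HashMap<Vec<u8>, u64>,
}

impl DupTracker {
    pub fn new(cap: usize) -> Self {
        DupTracker {
            cap,
            counts: HashMap::new(),
        }
    }

    /// Records one sequence. New sequences are ignored once the cap is reached.
    pub fn observe(&mut self, seq: &[u8]) {
        if let Some(c) = self.counts.get_mut(seq) {
            *c += 1;
        } else if self.counts.len() < self.cap {
            self.counts.insert(seq.to_vec(), 1);
        }
    }

    /// Returns the tracked count for `seq`, or 0 if it never got a slot.
    pub fn count(&self, seq: &[u8]) -> u64 {
        self.counts.get(seq).copied().unwrap_or(0)
    }
}
```

With a cap of 2, feeding `AAAA, CCCC, GGGG, GGGG, AAAA` tracks `AAAA` (2) and `CCCC` (1) and never counts `GGGG`, while feeding the same reads as `GGGG, GGGG, AAAA, CCCC, AAAA` tracks `GGGG` (2) and `AAAA` (2) instead. Workers that each see an arbitrary subset of reads cannot reproduce the single-pass result, which is consistent with keeping this step on the reader thread.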

Comprehensive Benchmark (cold cache, Illumina + ONT)

Tested on Intel i9-13900K (8C/16T), Linux 6.6 (WSL2). All runs used a cold disk cache (`echo 3 > /proc/sys/vm/drop_caches`).

Illumina short reads (~9.9M reads x 150bp per file):

Test	FastQC v0.12.1 (Java)	fastqc-rs	Speedup
Single file (1.15 GB gz)	52.3s	24.8s	2.1x
Two files `-t 2` (2.35 GB gz)	50.5s	14.1s	3.6x

ONT long reads (500K reads, variable length 200bp–150kbp, 3.99 GB gz):

Test	FastQC v0.12.1 (Java)	fastqc-rs	Speedup
Single file	154.7s	32.9s	4.7x

Note on Java memory: FastQC (Java) defaults to a 512 MB heap and can run out of memory on large ONT datasets. The benchmark used `--memory 5120` (5 GB) for the ONT test. fastqc-rs has no fixed heap limit; its memory use scales with the input automatically.

Compatibility

Output format is compatible with Java FastQC v0.12.1:

summary.txt — identical (PASS/WARN/FAIL per module match exactly)
fastqc_data.txt — nearly identical with minor differences:
- Per sequence quality scores: Matches Java FastQC exactly after restoring Java-style truncation for per-read mean quality.
- Sequence Duplication Levels: PASS/WARN/FAIL status matches exactly; remaining differences are limited to floating-point tail digits in the deduplicated percentage and related percentages.
- Number formatting: Small and large values are formatted closer to Java style, including scientific notation where applicable.
- Floating-point precision: Some modules still differ only in the last 1-2 digits because the Rust (f64) and Java (double) code paths round slightly differently.
Modules with identical or effectively identical data: Basic Statistics, Per Base Sequence Quality, Per Sequence Quality Scores, Per Sequence GC Content, Sequence Length Distribution, Overrepresented Sequences, Sequence Duplication Levels
Same module ordering and threshold logic
Same CLI flags (with minor naming convention differences for multi-word flags)
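When checking fastqc_data.txt against Java FastQC output, an exact text diff will flag the trailing-digit differences described above. A tolerance-based comparison is one way around that. The sketch below compares two tab-delimited lines field by field, treating fields as equal if they are the same text or if both parse as numbers within a relative tolerance; `lines_equivalent` and the tolerance value are illustrative choices, not part of fastqc-rs.

```rust
/// Compares two tab-delimited lines, allowing small numeric differences.
/// Non-numeric fields must match exactly.
pub fn lines_equivalent(a: &str, b: &str, rel_tol: f64) -> bool {
    let fa: Vec<&str> = a.split('\t').collect();
    let fb: Vec<&str> = b.split('\t').collect();
    if fa.len() != fb.len() {
        return false;
    }
    fa.iter().zip(fb.iter()).all(|(x, y)| {
        if x == y {
            return true; // identical text, numeric or not
        }
        match (x.parse::<f64>(), y.parse::<f64>()) {
            // Relative tolerance scaled by the larger magnitude.
            (Ok(p), Ok(q)) => (p - q).abs() <= rel_tol * p.abs().max(q.abs()),
            _ => false,
        }
    })
}
```

A relative tolerance around `1e-9` accepts last-digit noise in double-precision output while still catching real differences such as a changed count or a different PASS/WARN/FAIL label.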

Citation

If you use this project in academic work, please cite the original FastQC release/publication.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0), consistent with the original FastQC project it rewrites.

Acknowledgements

FastQC by Simon Andrews at the Babraham Institute
Trim Galore by Felix Krueger
