A Rust implementation of FastQC, a quality control tool for high throughput sequence data. This is a 1:1 rewrite of FastQC v0.12.1 with identical output format and analysis algorithms.
- Built-in Trim Galore support: adapter/quality trimming via `fastqc-rs trim-galore`, a Rust reimplementation wrapping Cutadapt
- 12 analysis modules with identical algorithms and pass/warn/fail thresholds:
  - Basic Statistics
  - Per Base Sequence Quality
  - Per Tile Sequence Quality
  - Per Sequence Quality Scores
  - Per Base Sequence Content
  - Per Sequence GC Content
  - Per Base N Content
  - Sequence Length Distribution
  - Sequence Duplication Levels
  - Overrepresented Sequences
  - Adapter Content
  - Kmer Content
- Input formats: FASTQ (plain, gzip, bzip2), BAM, SAM
- Output: HTML report with SVG graphs, ZIP archive, `fastqc_data.txt`, `summary.txt`
- Output compatibility: Text reports match Java FastQC output (identical PASS/WARN/FAIL, near-identical data values)
- Multi-file parallel processing via rayon
- Single binary with embedded configuration files

cargo install --path .

Or build from source:

cargo build --release

# Basic usage
fastqc-rs input.fastq
# Multiple files with parallel processing
fastqc-rs -t 4 sample1.fastq.gz sample2.fastq.gz
# Specify output directory
fastqc-rs -o results/ input.fastq
# BAM/SAM files
fastqc-rs input.bam
fastqc-rs -f sam_mapped input.sam # Only mapped reads, with soft-clip removal
# Extract results from ZIP
fastqc-rs --extract input.fastq
# Quiet mode (suppress progress)
fastqc-rs -q input.fastq

| Option | Description |
|---|---|
| `-o, --outdir <DIR>` | Output directory (must exist) |
| `--extract` | Unzip output after creation |
| `--noextract` | Don't unzip output (default) |
| `--delete` | Delete ZIP after extraction |
| `-f, --format <FMT>` | Force format: fastq, bam, sam, bam_mapped, sam_mapped |
| `-c, --contaminants <FILE>` | Custom contaminant list |
| `-a, --adapters <FILE>` | Custom adapter list |
| `-l, --limits <FILE>` | Custom pass/warn/fail thresholds |
| `-t, --threads <N>` | Number of files to process simultaneously (default: 1) |
| `-k, --kmers <SIZE>` | Kmer length (default: 7) |
| `-q, --quiet` | Suppress progress messages |
| `--casava` | CASAVA mode (filter flagged reads) |
| `--nogroup` | Disable base position grouping |
| `--expgroup` | Use exponential base grouping |
| `--min-length <BP>` | Minimum sequence length for grouping |
| `--dup-length <BP>` | Truncation length for duplication analysis |
| `--svg` | Output SVG graphs |

For each input file sample.fastq.gz, produces:
- `sample_fastqc.html`: Interactive HTML report
- `sample_fastqc.zip`: Archive containing:
  - `fastqc_report.html`
  - `fastqc_data.txt`: Tab-delimited analysis data
  - `summary.txt`: PASS/WARN/FAIL per module
  - `Icons/`: Status icons
  - `Images/`: SVG graphs

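For downstream scripting, `summary.txt` can be read as three tab-separated columns (status, module name, input file name), which is the layout Java FastQC writes. The following is a minimal sketch under that assumption; `SummaryRow` and `parse_summary` are hypothetical names for illustration and are not part of the fastqc-rs API.

```rust
/// One row of summary.txt: status, module name, input file name.
#[derive(Debug, PartialEq)]
struct SummaryRow {
    status: String,
    module: String,
    file: String,
}

/// Parse the text of a summary.txt file (hypothetical helper).
/// Lines that do not have exactly three tab-separated fields are skipped.
fn parse_summary(text: &str) -> Vec<SummaryRow> {
    text.lines()
        .filter_map(|line| {
            let mut fields = line.split('\t');
            match (fields.next(), fields.next(), fields.next(), fields.next()) {
                (Some(status), Some(module), Some(file), None) => Some(SummaryRow {
                    status: status.to_string(),
                    module: module.to_string(),
                    file: file.to_string(),
                }),
                _ => None,
            }
        })
        .collect()
}
```

A caller could, for example, filter the returned rows for `status == "FAIL"` to decide whether a sample needs attention.
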
A built-in Rust reimplementation of Trim Galore (by Felix Krueger), wrapping Cutadapt for adapter and quality trimming. Requires Cutadapt to be installed separately.
# Single-end with adapter auto-detection
fastqc-rs trim-galore reads.fq.gz
# Paired-end with custom cutadapt path
fastqc-rs trim-galore --paired --path-to-cutadapt /opt/bin/cutadapt \
R1.fq.gz R2.fq.gz
# Illumina adapter, quality 30, min length 50, 4 cores
fastqc-rs trim-galore --illumina -q 30 --length 50 -j 4 -o trimmed/ reads.fq.gz
# Hard-trim to first 75bp
fastqc-rs trim-galore --hardtrim5 75 reads.fq.gz
# See all options
fastqc-rs trim-galore --help

Key options:

| Option | Description |
|---|---|
| `--paired` | Paired-end mode |
| `-a, --adapter <SEQ>` | Custom adapter sequence |
| `--illumina` / `--nextera` / `--small-rna` / `--bgiseq` | Adapter presets |
| `-q, --quality <INT>` | Quality cutoff (default: 20) |
| `--length <INT>` | Minimum read length (default: 20) |
| `-j, --cores <N>` | Number of Cutadapt cores |
| `--path-to-cutadapt <PATH>` | Path to cutadapt executable |
| `--clip_R1` / `--clip_R2 <INT>` | Clip N bp from 5' end |
| `--three_prime_clip_R1` / `--three_prime_clip_R2 <INT>` | Clip N bp from 3' end |
| `--hardtrim5` / `--hardtrim3 <INT>` | Hard-trim to N bp from 5'/3' end |
| `--rrbs` | RRBS mode (MspI-digested) |
| `--fastqc` | Run FastQC after trimming |
| `-o, --output_dir <DIR>` | Output directory |

Benchmarked on a paired-end Illumina dataset (~1.15 GB / ~1.20 GB gzipped FASTQ, ~9.9M reads x 150bp):
| File | FastQC v0.12.1 (Java) | fastqc-rs (Rust) | Speedup |
|---|---|---|---|
| SPL1E1_raw_1.fastq.gz (1.15 GB) | 48.6s | 46.8s | 1.04x |
| SPL1E1_raw_2.fastq.gz (1.20 GB) | 47.6s | 45.7s | 1.04x |
Optimizations applied: zlib-rs decompression backend, reader/processor pipeline (overlapping I/O with compute), AHashMap for hot-path modules, in-place ASCII uppercase, 256KB I/O buffer, LTO + codegen-units=1.
| File | FastQC v0.12.1 (Java) | fastqc-rs (optimized) | Speedup vs Java | Speedup vs baseline |
|---|---|---|---|---|
| SPL1E1_raw_1.fastq.gz (1.15 GB) | 48.6s | 38.8s | 1.25x | 1.21x |
| SPL1E1_raw_2.fastq.gz (1.20 GB) | 47.6s | 39.1s | 1.22x | 1.17x |
Tested on Linux 6.6 (WSL2). Pipeline uses 2 threads (reader + processor). CPU utilization ~117%. Bottleneck is module processing (~85% of wall time).
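The reader/processor pipeline mentioned above can be pictured roughly as follows. This is only a sketch using the standard library, not the project's actual code: `pipeline_count`, the batch size of 1024, and the channel depth of 64 are assumptions chosen for illustration, and the "processing" is reduced to counting bases. The 256 KB buffer mirrors the figure quoted in the optimization list.

```rust
use std::io::{BufRead, BufReader, Read};
use std::sync::mpsc::sync_channel;
use std::thread;

/// Overlap I/O with compute: one thread reads, the caller's thread processes.
/// Returns (records seen, total bases) for 4-line FASTQ records.
/// Hypothetical helper, not part of fastqc-rs.
fn pipeline_count<R: Read + Send + 'static>(input: R) -> (u64, u64) {
    // Bounded channel: the reader can run ahead by at most 64 batches.
    let (tx, rx) = sync_channel::<Vec<String>>(64);

    // Reader thread: performs I/O and hands off batches of sequence lines.
    let reader = thread::spawn(move || {
        let buf = BufReader::with_capacity(256 * 1024, input);
        let mut batch = Vec::with_capacity(1024);
        for (i, line) in buf.lines().enumerate() {
            let line = match line {
                Ok(l) => l,
                Err(_) => break, // stop on read errors in this sketch
            };
            // Line 1 of every 4-line record is the sequence.
            if i % 4 == 1 {
                batch.push(line);
                if batch.len() == 1024 {
                    let full = std::mem::replace(&mut batch, Vec::with_capacity(1024));
                    if tx.send(full).is_err() {
                        return; // processor hung up
                    }
                }
            }
        }
        if !batch.is_empty() {
            let _ = tx.send(batch);
        }
        // `tx` is dropped here, which ends the processor's loop.
    });

    // Processor: consumes batches while the reader keeps reading.
    let mut records = 0u64;
    let mut bases = 0u64;
    for batch in rx {
        for seq in &batch {
            records += 1;
            bases += seq.len() as u64;
        }
    }
    reader.join().expect("reader thread panicked");
    (records, bases)
}
```

Sending batches rather than single lines keeps channel overhead small relative to the per-read work.
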
Added data-parallel architecture: 1 reader thread + 4 worker threads, each with independent module copies. Workers process sequence subsets in parallel, results merged after completion.
| File | FastQC v0.12.1 (Java) | fastqc-rs (multi-threaded) | Speedup vs Java | Speedup vs baseline |
|---|---|---|---|---|
| SPL1E1_raw_1.fastq.gz (1.15 GB) | 48.6s | 9.4s | 5.2x | 5.0x |
| SPL1E1_raw_2.fastq.gz (1.20 GB) | 47.6s | 9.4s | 5.1x | 4.9x |
This version was fast, but not output-compatible. Multi-threaded duplication tracking changed the original FastQC semantics, and per-sequence quality bucketing still used Rust-side rounding instead of Java truncation.
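The data-parallel design described above works cleanly for statistics that do not depend on read order, because per-worker results can simply be summed. The sketch below illustrates that idea with a stand-in accumulator; `BaseCounts` and `count_parallel` are hypothetical names, and the real modules are more involved. Order-dependent state, such as the duplication tracking mentioned above, does not merge this way.

```rust
use std::thread;

/// Per-worker accumulator standing in for an independent module copy.
#[derive(Default, Clone, Debug, PartialEq)]
struct BaseCounts {
    a: u64,
    c: u64,
    g: u64,
    t: u64,
    n: u64, // anything that is not A/C/G/T is counted here
}

impl BaseCounts {
    fn add_sequence(&mut self, seq: &[u8]) {
        for b in seq {
            match b.to_ascii_uppercase() {
                b'A' => self.a += 1,
                b'C' => self.c += 1,
                b'G' => self.g += 1,
                b'T' => self.t += 1,
                _ => self.n += 1,
            }
        }
    }

    /// Combine another worker's counts into this one.
    fn merge(&mut self, other: &BaseCounts) {
        self.a += other.a;
        self.c += other.c;
        self.g += other.g;
        self.t += other.t;
        self.n += other.n;
    }
}

/// Split the reads across `workers` threads, then merge the partial results.
fn count_parallel(seqs: &[Vec<u8>], workers: usize) -> BaseCounts {
    let workers = workers.max(1);
    let chunk = ((seqs.len() + workers - 1) / workers).max(1);
    let partials: Vec<BaseCounts> = thread::scope(|s| {
        let handles: Vec<_> = seqs
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    let mut local = BaseCounts::default();
                    for seq in part {
                        local.add_sequence(seq);
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    let mut total = BaseCounts::default();
    for p in &partials {
        total.merge(p);
    }
    total
}
```

Because addition is commutative, the merged totals are the same regardless of how many workers are used or how reads are split.
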
Keeps the 1 reader + 4 worker architecture, but moves duplication tracking back to the reader thread so it preserves original file order. Also restores Java-compatible per-sequence quality bucketing and more closely matches Java number formatting.
| File | FastQC v0.12.1 (Java) | fastqc-rs (fixed multi-threaded) | Speedup vs Java | Speedup vs baseline |
|---|---|---|---|---|
| SPL1E1_raw_1.fastq.gz (1.15 GB) | 48.6s | 13.0s | 3.7x | 3.6x |
| SPL1E1_raw_2.fastq.gz (1.20 GB) | 47.6s | 13.7s | 3.5x | 3.3x |
Tested on Linux 6.6 (WSL2). Uses 5 threads total (1 reader + 4 workers).
`summary.txt` matches Java FastQC exactly, `Per sequence quality scores` matches exactly, and `Sequence Duplication Levels` now differs only in floating-point tail digits.
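The difference between truncation and rounding when bucketing a read's mean quality can be shown with a small example. This is an illustrative sketch, not the project's code: the function names are hypothetical, and a Phred offset of 33 is assumed in the usage.

```rust
/// Mean Phred quality of one read from its ASCII quality string.
/// `offset` is the encoding offset (33 is assumed in the example below).
fn mean_quality(qual: &[u8], offset: u8) -> f64 {
    if qual.is_empty() {
        return 0.0;
    }
    let sum: u64 = qual
        .iter()
        .map(|&q| u64::from(q.saturating_sub(offset)))
        .sum();
    sum as f64 / qual.len() as f64
}

/// Truncating bucket: the fractional part is discarded, as with integer division.
fn bucket_truncate(mean: f64) -> u32 {
    mean as u32
}

/// Rounding bucket: means with a fractional part of 0.5 or more move up one bucket.
fn bucket_round(mean: f64) -> u32 {
    mean.round() as u32
}
```

For the quality string `IJJJ` (Phred 40, 41, 41, 41) the mean is 40.75, so truncation places the read in bucket 40 while rounding places it in bucket 41. Across millions of reads this shifts the whole per-sequence quality histogram.
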
Tested on Intel i9-13900K (8C/16T), Linux 6.6 (WSL2), all runs with cold disk cache (echo 3 > /proc/sys/vm/drop_caches).
Illumina short reads (~9.9M reads x 150bp per file):
| Test | FastQC v0.12.1 (Java) | fastqc-rs | Speedup |
|---|---|---|---|
| Single file (1.15 GB gz) | 52.3s | 24.8s | 2.1x |
| Two files `-t 2` (2.35 GB gz) | 50.5s | 14.1s | 3.6x |
ONT long reads (500K reads, variable length 200bp–150kbp, 3.99 GB gz):
| Test | FastQC v0.12.1 (Java) | fastqc-rs | Speedup |
|---|---|---|---|
| Single file | 154.7s | 32.9s | 4.7x |
Note on Java memory: FastQC (Java) defaults to 512 MB heap and will OOM on large ONT datasets. The benchmark used `--memory 5120` (5 GB) for the ONT test. fastqc-rs has no such limitation; memory usage scales automatically.
Output format is compatible with Java FastQC v0.12.1:
- `summary.txt`: identical (PASS/WARN/FAIL per module match exactly)
- `fastqc_data.txt`: nearly identical with minor differences:
  - Per sequence quality scores: Matches Java FastQC exactly after restoring Java-style truncation for per-read mean quality.
  - Sequence Duplication Levels: PASS/WARN/FAIL status matches exactly; remaining differences are limited to floating-point tail digits in the deduplicated percentage and related percentages.
  - Number formatting: Small and large values are formatted closer to Java style, including scientific notation where applicable.
  - Floating-point precision: Some modules still differ only in the last 1-2 digits due to f64 vs Java double rounding.
  - Modules with identical or effectively identical data: Basic Statistics, Per Base Sequence Quality, Per Sequence Quality Scores, Per Sequence GC Content, Sequence Length Distribution, Overrepresented Sequences, Sequence Duplication Levels
- Same module ordering and threshold logic
- Same CLI flags (with minor naming convention differences for multi-word flags)

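When comparing a `fastqc_data.txt` from fastqc-rs against one from Java FastQC, an exact text diff will flag the floating-point tail-digit differences noted above. One way to check for "effectively identical" output is to compare numeric fields with a relative tolerance and everything else exactly. The helpers below are a hypothetical sketch of that approach, and the tolerance is an arbitrary choice for illustration.

```rust
/// Compare two report fields: numeric fields within a relative tolerance,
/// all other fields by exact string equality. Hypothetical helper.
fn fields_match(a: &str, b: &str, rel_tol: f64) -> bool {
    match (a.trim().parse::<f64>(), b.trim().parse::<f64>()) {
        // NaN never compares equal numerically, so fall back to the text.
        (Ok(x), Ok(y)) if x.is_nan() || y.is_nan() => a.trim() == b.trim(),
        (Ok(x), Ok(y)) => {
            if x == y {
                return true;
            }
            let scale = x.abs().max(y.abs());
            (x - y).abs() <= rel_tol * scale
        }
        _ => a == b,
    }
}

/// Compare two tab-delimited lines field by field.
fn lines_match(a: &str, b: &str, rel_tol: f64) -> bool {
    let fa: Vec<&str> = a.split('\t').collect();
    let fb: Vec<&str> = b.split('\t').collect();
    fa.len() == fb.len()
        && fa
            .iter()
            .zip(&fb)
            .all(|(x, y)| fields_match(x, y, rel_tol))
}
```

With a tolerance such as `1e-12`, values that differ only in the last one or two printed digits are accepted, while a genuine change in a percentage or count is still reported.
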
If you use this project in academic work, please cite the original FastQC release/publication.
This project is licensed under the GNU General Public License v3.0
(GPL-3.0), consistent with the original
FastQC project it rewrites.
- FastQC by Simon Andrews at the Babraham Institute
- Trim Galore by Felix Krueger