speccheck

speccheck is a modular command-line tool for collecting, validating, and summarizing quality control (QC) metrics from genomic analysis pipelines. It automatically detects and processes outputs from multiple bioinformatics tools, validates them against customizable criteria, and generates comprehensive reports with optional interactive visualizations.

Features

🔍 Automatic Module Detection: Supports CheckM, QUAST, Speciator, ARIBA, and Sylph outputs
✅ Flexible QC Validation: Define organism-specific quality criteria with pass/fail checks
📊 Interactive Reports: Generate HTML dashboards with Plotly visualizations
🔗 Metadata Integration: Merge external sample metadata into QC reports
📝 Rich Logging: Formatted console output via the Rich library
🐳 Docker Support: Pre-built Docker images available

Installation

From Source

Clone the repository and install with pip:

git clone https://github.com/happykhan/speccheck.git
cd speccheck
pip install -e .

Development Installation

For development with testing and linting tools:

pip install -e '.[dev]'

Note: This project uses modern Python packaging with pyproject.toml (PEP 517/621). See MIGRATION.md for details on the migration from setup.py.

Docker

A Docker image is available for containerized execution:

docker pull happykhan/speccheck

Quick Start

Collect QC data from analysis outputs:

speccheck collect tests/practice_data/Sample_* --output-file results.csv

Generate a summary report with visualizations:

speccheck summary qc_results/ --plot

Validate a criteria file:

speccheck check --criteria-file criteria.csv

Usage

Command: `collect`

Collect and validate QC metrics from bioinformatics tool outputs.

speccheck collect [OPTIONS] FILEPATHS...

Options

Option	Type	Default	Description
`FILEPATHS`	Positional	Required	File paths (supports wildcards like `data/*/*.tsv`)
`--organism`	String	Auto-detect	Organism name for criteria matching
`--sample`	String	None	Sample identifier
`--criteria-file`	Path	`criteria.csv`	CSV file with QC criteria
`--output-file`	Path	`qc_results/collected_data.csv`	Output CSV path
`--metadata`	Path	None	CSV with additional metadata (requires `sample_id` column)
`-v, --verbose`	Flag	False	Enable debug logging
`--version`	Flag	-	Show version and exit

Examples

Basic collection:

speccheck collect data/sample1/*.tsv --sample sample1

With organism specification:

speccheck collect data/ecoli_* --organism "Escherichia coli" --output-file ecoli_qc.csv

With metadata merging:

speccheck collect data/* --metadata sample_info.csv --output-file merged_results.csv
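
For larger batches, the invocations above can be scripted. The following is a minimal sketch and not part of speccheck: `build_collect_command` and `collect_all` are hypothetical helpers, and the one-directory-per-sample layout is assumed purely for illustration. Only options from the table above are used.

```python
import subprocess
from pathlib import Path


def build_collect_command(sample_dir, output_dir, organism=None):
    """Build a `speccheck collect` argument list for one sample directory."""
    sample_dir = Path(sample_dir)
    # Assumption: each sample has its own directory, named after the sample.
    sample = sample_dir.name
    files = sorted(str(p) for p in sample_dir.glob("*") if p.is_file())
    cmd = [
        "speccheck", "collect", *files,
        "--sample", sample,
        "--output-file", str(Path(output_dir) / f"{sample}.csv"),
    ]
    if organism:
        cmd += ["--organism", organism]
    return cmd


def collect_all(data_dir, output_dir="qc_results"):
    """Run one collect per sample directory; raises if any run fails."""
    for sample_dir in sorted(Path(data_dir).iterdir()):
        if sample_dir.is_dir():
            subprocess.run(build_collect_command(sample_dir, output_dir), check=True)
```

Writing one CSV per sample into a single directory means that directory can then be passed to `speccheck summary`.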

Supported Modules

The collect command automatically detects outputs from:

CheckM: Completeness, contamination, genome metrics
QUAST: Assembly statistics (N50, contigs, GC content)
Speciator: Species identification and confidence
ARIBA: Antimicrobial resistance gene detection
Sylph: Metagenomic profiling and ANI values

Command: `summary`

Generate consolidated reports from multiple collected QC files.

speccheck summary [OPTIONS] DIRECTORY

Options

Option	Type	Default	Description
`DIRECTORY`	Positional	Required	Directory containing CSV QC reports
`--output`	Path	`qc_report`	Output directory for summary
`--species`	String	`Speciator.speciesName`	Column name for species field
`--sample`	String	`sample_id`	Column name for sample identifier
`--templates`	Path	`templates/report.html`	HTML template file
`--plot`	Flag	False	Generate interactive plots
`-v, --verbose`	Flag	False	Enable debug logging
`--version`	Flag	-	Show version and exit

Examples

Basic summary:

speccheck summary qc_results/

With plotting enabled:

speccheck summary qc_results/ --plot --output final_report/

Custom field names:

speccheck summary results/ --sample SampleID --species Species --plot

Output

report.csv: Consolidated QC metrics with sorted columns (sample_id, all_checks_passed, columns ending with .check, then the remaining fields)
report.html: Interactive HTML dashboard (when --plot is enabled)
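
The consolidated report.csv can be post-processed with ordinary CSV tooling. As a minimal sketch, the hypothetical helper `failing_samples` below lists the samples that did not pass every check. It assumes the default `sample_id` column and that booleans are written as `True`/`False`, as in the Example Output section.

```python
import csv
import io


def failing_samples(report_file, sample_col="sample_id"):
    """Return sample IDs whose all_checks_passed value is not 'True'."""
    reader = csv.DictReader(report_file)
    return [
        row[sample_col]
        for row in reader
        if row["all_checks_passed"].strip().lower() != "true"
    ]


# Illustrative data in the shape of report.csv (not real results).
example = io.StringIO(
    "sample_id,all_checks_passed,Checkm.Completeness\n"
    "sample1,True,98.5\n"
    "sample2,False,89.3\n"
)
print(failing_samples(example))  # ['sample2']
```

For a real report, pass an open file instead, for example `open("qc_report/report.csv", newline="")` when the default output directory is used.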

Command: `check`

Validate the structure and content of a criteria file.

speccheck check [OPTIONS]

Options

Option	Type	Default	Description
`--criteria-file`	Path	`criteria.csv`	Path to criteria CSV file
`-v, --verbose`	Flag	False	Enable debug logging
`--version`	Flag	-	Show version and exit

Example

speccheck check --criteria-file config/custom_criteria.csv
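
What `check` verifies internally is not documented here. As a rough illustration of the kind of structural validation involved, the sketch below tests a criteria file against the columns and operators listed under Criteria File Format. `find_criteria_problems` is a hypothetical helper, not speccheck's implementation.

```python
import csv
import io

REQUIRED_COLUMNS = ["organism", "software", "field", "operator", "threshold"]
ALLOWED_OPERATORS = {">=", "<=", "==", ">", "<"}


def find_criteria_problems(criteria_file):
    """Return a list of human-readable problems; an empty list means none found."""
    reader = csv.DictReader(criteria_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["operator"] not in ALLOWED_OPERATORS:
            problems.append(f"line {line_no}: unknown operator {row['operator']!r}")
        try:
            float(row["threshold"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric threshold {row['threshold']!r}")
    return problems


bad = io.StringIO(
    "organism,software,field,operator,threshold\n"
    "Escherichia coli,Checkm,Completeness,=>,95\n"
)
print(find_criteria_problems(bad))  # ["line 2: unknown operator '=>'"]
```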

Criteria File Format

The criteria file defines organism-specific QC thresholds in CSV format:

organism,software,field,operator,threshold
Escherichia coli,Checkm,Completeness,>=,95
Escherichia coli,Checkm,Contamination,<=,5
Escherichia coli,Quast,N50,>=,50000

Columns:

organism: Species or genus name (use "all" for universal criteria)
software: Tool name (CheckM, QUAST, Speciator, ARIBA, Sylph); the example above writes these as Checkm and Quast, matching the output column prefixes
field: Metric name from tool output
operator: Comparison operator (>=, <=, ==, >, <)
threshold: Numeric threshold value
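
To make the semantics of a criteria row concrete, the sketch below applies the example criteria to one sample's metrics. It is an illustration under stated assumptions, not speccheck's code: `evaluate` is a hypothetical function, metrics are keyed as `Software.Field` (as in the Example Output section), organism matching is exact apart from "all", and metrics absent from the sample are skipped.

```python
import csv
import io
import operator

OPERATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    ">": operator.gt, "<": operator.lt,
}


def evaluate(criteria, organism, metrics):
    """Apply every criterion for `organism` (or "all") to a metrics dict."""
    results = {}
    for c in criteria:
        if c["organism"] not in (organism, "all"):
            continue
        key = f"{c['software']}.{c['field']}"
        if key in metrics:  # assumption: missing metrics are simply skipped
            compare = OPERATORS[c["operator"]]
            results[f"{key}.check"] = compare(float(metrics[key]), float(c["threshold"]))
    results["all_checks_passed"] = all(results.values())
    return results


criteria = list(csv.DictReader(io.StringIO(
    "organism,software,field,operator,threshold\n"
    "Escherichia coli,Checkm,Completeness,>=,95\n"
    "Escherichia coli,Checkm,Contamination,<=,5\n"
    "Escherichia coli,Quast,N50,>=,50000\n"
)))
metrics = {"Checkm.Completeness": 89.3, "Checkm.Contamination": 0.8}
print(evaluate(criteria, "Escherichia coli", metrics))
# {'Checkm.Completeness.check': False, 'Checkm.Contamination.check': True, 'all_checks_passed': False}
```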

Metadata Integration

Add external sample metadata using the --metadata option:

metadata.csv:

sample_id,location,sequencing_date,batch
sample1,Lab A,2024-01-15,Batch1
sample2,Lab B,2024-01-16,Batch1

speccheck collect data/* --metadata metadata.csv --output-file results.csv

Metadata columns are automatically merged with QC metrics based on sample_id.
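
Conceptually, the merge is a join on `sample_id`. The standard-library sketch below shows one way such a join can work. `merge_metadata` is a hypothetical helper, and the left-join behaviour (QC rows without matching metadata are kept) is an assumption; this README does not say how speccheck treats unmatched samples.

```python
import csv
import io


def merge_metadata(qc_rows, metadata_file, key="sample_id"):
    """Left-join metadata onto QC rows; QC rows without metadata are kept."""
    metadata = {row[key]: row for row in csv.DictReader(metadata_file)}
    merged = []
    for row in qc_rows:
        extra = metadata.get(row[key], {})
        # QC values win if a metadata column happens to share a name.
        merged.append({**extra, **row})
    return merged


metadata_csv = io.StringIO(
    "sample_id,location,sequencing_date,batch\n"
    "sample1,Lab A,2024-01-15,Batch1\n"
    "sample2,Lab B,2024-01-16,Batch1\n"
)
qc_rows = [{"sample_id": "sample1", "all_checks_passed": "True"}]
print(merge_metadata(qc_rows, metadata_csv)[0]["location"])  # Lab A
```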

Output Format

CSV Column Order

Output files are automatically organized for readability:

Sample identifier (sample_id or Sample)
Overall checks (columns ending with all_checks_passed)
Individual checks (columns ending with .check) - sorted alphabetically
Metrics (remaining columns) - sorted alphabetically

Example Output

sample_id,all_checks_passed,Checkm.all_checks_passed,Checkm.Completeness.check,Checkm.Contamination.check,Checkm.Completeness,Checkm.Contamination
sample1,True,True,True,True,98.5,1.2
sample2,False,False,False,True,89.3,0.8
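
The ordering rules can be expressed in a few lines. The sketch below uses a hypothetical `order_columns` function, not the project's own code, and reproduces the header of the example above from a shuffled column list. Placing the bare `all_checks_passed` column ahead of the per-tool ones is inferred from that example.

```python
def order_columns(columns, sample_col="sample_id"):
    """Order columns: sample ID, overall checks, individual checks, metrics."""
    sample = [c for c in columns if c == sample_col]
    overall = [c for c in columns if c.endswith("all_checks_passed")]
    # Inferred from the example: the global flag precedes per-tool flags.
    overall.sort(key=lambda c: (c != "all_checks_passed", c))
    checks = sorted(c for c in columns if c.endswith(".check"))
    taken = set(sample) | set(overall) | set(checks)
    metrics = sorted(c for c in columns if c not in taken)
    return sample + overall + checks + metrics


shuffled = [
    "Checkm.Contamination", "Checkm.Completeness.check", "sample_id",
    "Checkm.all_checks_passed", "Checkm.Completeness",
    "all_checks_passed", "Checkm.Contamination.check",
]
print(",".join(order_columns(shuffled)))
# sample_id,all_checks_passed,Checkm.all_checks_passed,Checkm.Completeness.check,Checkm.Contamination.check,Checkm.Completeness,Checkm.Contamination
```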

Development

Running Tests

pytest
pytest --cov=speccheck  # With coverage

Code Quality

pylint speccheck/

Project Structure

speccheck/
├── speccheck/
│   ├── __init__.py
│   ├── main.py              # Core logic
│   ├── collect.py           # File collection & writing
│   ├── criteria.py          # Criteria validation
│   ├── report.py            # Report generation
│   ├── modules/             # Tool-specific parsers
│   │   ├── checkm.py
│   │   ├── quast.py
│   │   ├── speciator.py
│   │   ├── ariba.py
│   │   └── sylph.py
│   └── plot_modules/        # Visualization modules
│       ├── plot_checkm.py
│       ├── plot_quast.py
│       └── ...
├── tests/                   # Pytest test suite
├── templates/               # HTML templates
├── speccheck.py            # CLI entry point
└── setup.py                # Package configuration

Dependencies

Core: rich, typer, pandas, jinja2, plotly
Dev: pytest, pytest-cov, pylint, coverage

Version

Check the installed version:

speccheck --version

License

This project is licensed under the GNU General Public License v3.0 (GPLv3). See LICENSE for details.

Contributing

Contributions are welcome! We appreciate bug reports, feature requests, documentation improvements, and code contributions.

Quick Start for Contributors

Fork the repository
Install development dependencies: pip install -e '.[dev]'
Install pre-commit hooks: pre-commit install
Create a feature branch: git checkout -b feature/your-feature
Make your changes and add tests
Run checks: pytest --cov=speccheck && ruff check speccheck/
Submit a pull request

For detailed guidelines, see CONTRIBUTING.md.

Code Quality

This project uses:

Black for code formatting
Ruff for fast linting
Pylint for comprehensive code analysis
pytest with coverage reporting
pre-commit hooks for automated checks

All PRs must pass CI checks including tests on Python 3.10, 3.11, and 3.12 across Ubuntu, macOS, and Windows.

Citation

If you use speccheck in your research, please cite:

[Citation information to be added]

Support

Issues: GitHub Issues
Documentation: This README
Contact: See setup.py for author information
