speccheck is a modular command-line tool for collecting, validating, and summarizing quality control (QC) metrics from genomic analysis pipelines. It automatically detects and processes outputs from multiple bioinformatics tools, validates them against customizable criteria, and generates comprehensive reports with optional interactive visualizations.
- 🔍 Automatic Module Detection: Supports CheckM, QUAST, Speciator, ARIBA, and Sylph outputs
- ✅ Flexible QC Validation: Define organism-specific quality criteria with pass/fail checks
- 📊 Interactive Reports: Generate HTML dashboards with Plotly visualizations
- 🔗 Metadata Integration: Merge external sample metadata into QC reports
- 📝 Rich Logging: Beautiful console output with the Rich library
- 🐳 Docker Support: Pre-built Docker images available
Clone the repository and install with pip:
git clone https://github.com/happykhan/speccheck.git
cd speccheck
pip install -e .
For development with testing and linting tools:
pip install -e '.[dev]'
Note: This project uses modern Python packaging with pyproject.toml (PEP 517/621). See MIGRATION.md for details on the migration from setup.py.
A Docker image is available for containerized execution:
docker pull happykhan/speccheck
- Collect QC data from analysis outputs:
speccheck collect tests/practice_data/Sample_* --output-file results.csv
- Generate summary report with visualizations:
speccheck summary qc_results/ --plot
- Validate criteria file:
speccheck check --criteria-file criteria.csv
Collect and validate QC metrics from bioinformatics tool outputs.
speccheck collect [OPTIONS] FILEPATHS...

| Option | Type | Default | Description |
|---|---|---|---|
| FILEPATHS | Positional | Required | File paths (supports wildcards like data/*/*.tsv) |
| --organism | String | Auto-detect | Organism name for criteria matching |
| --sample | String | None | Sample identifier |
| --criteria-file | Path | criteria.csv | CSV file with QC criteria |
| --output-file | Path | qc_results/collected_data.csv | Output CSV path |
| --metadata | Path | None | CSV with additional metadata (requires sample_id column) |
| -v, --verbose | Flag | False | Enable debug logging |
| --version | Flag | - | Show version and exit |

Basic collection:
speccheck collect data/sample1/*.tsv --sample sample1
With organism specification:
speccheck collect data/ecoli_* --organism "Escherichia coli" --output-file ecoli_qc.csv
With metadata merging:
speccheck collect data/* --metadata sample_info.csv --output-file merged_results.csv
The collect command automatically detects outputs from:
- CheckM: Completeness, contamination, genome metrics
- QUAST: Assembly statistics (N50, contigs, GC content)
- Speciator: Species identification and confidence
- ARIBA: Antimicrobial resistance gene detection
- Sylph: Metagenomic profiling and ANI values
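One plausible way to recognize which tool produced a file is to compare its header row against a known set of column names. The sketch below illustrates that idea only; it is not taken from the speccheck source. The `SIGNATURES` table and the `detect_module` function are hypothetical names, and the column sets are assumptions that may not hold for every tool version.

```python
import csv
import io

# Hypothetical header signatures (assumption): a file is attributed to a tool
# when its header row contains every column in that tool's set.
SIGNATURES = {
    "CheckM": {"Bin Id", "Completeness", "Contamination"},
    "QUAST": {"Assembly", "N50"},
}


def detect_module(text, delimiter="\t"):
    """Return the name of the first tool whose signature matches, else None."""
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
    columns = set(header)
    for name, required in SIGNATURES.items():
        if required <= columns:  # all required columns are present
            return name
    return None


checkm_text = "Bin Id\tMarker lineage\tCompleteness\tContamination\nbin1\troot\t98.5\t1.2\n"
detected = detect_module(checkm_text)
```

Matching on header content rather than file names keeps such a detector independent of how a pipeline names its outputs, at the cost of needing an update whenever a tool changes its columns.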
Generate consolidated reports from multiple collected QC files.
speccheck summary [OPTIONS] DIRECTORY

| Option | Type | Default | Description |
|---|---|---|---|
| DIRECTORY | Positional | Required | Directory containing CSV QC reports |
| --output | Path | qc_report | Output directory for summary |
| --species | String | Speciator.speciesName | Column name for species field |
| --sample | String | sample_id | Column name for sample identifier |
| --templates | Path | templates/report.html | HTML template file |
| --plot | Flag | False | Generate interactive plots |
| -v, --verbose | Flag | False | Enable debug logging |
| --version | Flag | - | Show version and exit |

Basic summary:
speccheck summary qc_results/
With plotting enabled:
speccheck summary qc_results/ --plot --output final_report/
Custom field names:
speccheck summary results/ --sample SampleID --species Species --plot
Output files:
- report.csv: Consolidated QC metrics with sorted columns (sample_id, all_checks_passed, .check columns, other fields)
- report.html: Interactive HTML dashboard (when --plot is enabled)
Validate the structure and content of a criteria file.
speccheck check [OPTIONS]

| Option | Type | Default | Description |
|---|---|---|---|
| --criteria-file | Path | criteria.csv | Path to criteria CSV file |
| -v, --verbose | Flag | False | Enable debug logging |
| --version | Flag | - | Show version and exit |

speccheck check --criteria-file config/custom_criteria.csv
The criteria file defines organism-specific QC thresholds in CSV format:
organism,software,field,operator,threshold
Escherichia coli,Checkm,Completeness,>=,95
Escherichia coli,Checkm,Contamination,<=,5
Escherichia coli,Quast,N50,>=,50000
Columns:
- organism: Species or genus name (use "all" for universal criteria)
- software: Tool name (CheckM, QUAST, Speciator, ARIBA, Sylph)
- field: Metric name from tool output
- operator: Comparison operator (>=, <=, ==, >, <)
- threshold: Numeric threshold value
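To make the criteria format concrete, here is a minimal sketch of how rows like these could be applied to one sample's metrics. It is an illustration under stated assumptions, not the speccheck implementation: `load_criteria`, `evaluate`, and the `(software, field)` metrics mapping are hypothetical, and skipping metrics that were not collected is a choice made for this sketch.

```python
import csv
import io
import operator

# Map the operator strings used in the criteria file to comparison functions.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

CRITERIA_CSV = """organism,software,field,operator,threshold
Escherichia coli,Checkm,Completeness,>=,95
Escherichia coli,Checkm,Contamination,<=,5
Escherichia coli,Quast,N50,>=,50000
"""


def load_criteria(text):
    """Parse criteria CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def evaluate(criteria, organism, metrics):
    """Return {"Software.field.check": bool} for rows that apply to `organism`.

    `metrics` is a hypothetical {(software, field): value} mapping. Rows whose
    organism is "all" apply to every sample.
    """
    results = {}
    for row in criteria:
        if row["organism"] not in (organism, "all"):
            continue
        key = (row["software"], row["field"])
        if key not in metrics:
            continue  # metric not collected; skipped in this sketch
        compare = OPERATORS[row["operator"]]
        passed = compare(float(metrics[key]), float(row["threshold"]))
        results[f"{row['software']}.{row['field']}.check"] = passed
    return results


criteria = load_criteria(CRITERIA_CSV)
sample_metrics = {("Checkm", "Completeness"): 98.5, ("Checkm", "Contamination"): 1.2}
checks = evaluate(criteria, "Escherichia coli", sample_metrics)
all_checks_passed = all(checks.values())
```

A lookup table of operators avoids evaluating strings from the criteria file as code, and an unknown operator fails loudly with a `KeyError`.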
Add external sample metadata using the --metadata option:
metadata.csv:
sample_id,location,sequencing_date,batch
sample1,Lab A,2024-01-15,Batch1
sample2,Lab B,2024-01-16,Batch1
speccheck collect data/* --metadata metadata.csv --output-file results.csv
Metadata columns are automatically merged with QC metrics based on sample_id.
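The merge on sample_id can be pictured with a short standard-library sketch. The helper names `load_metadata` and `merge_metadata` are hypothetical. Two behaviours here are assumptions of this example rather than documented speccheck behaviour: samples without a metadata row are kept, and QC values win when a column name appears on both sides.

```python
import csv
import io

METADATA_CSV = """sample_id,location,sequencing_date,batch
sample1,Lab A,2024-01-15,Batch1
sample2,Lab B,2024-01-16,Batch1
"""


def load_metadata(text):
    """Index metadata rows by sample_id, rejecting files without that column."""
    reader = csv.DictReader(io.StringIO(text))
    if "sample_id" not in (reader.fieldnames or []):
        raise ValueError("metadata CSV must contain a sample_id column")
    return {row["sample_id"]: row for row in reader}


def merge_metadata(qc_rows, metadata):
    """Attach metadata columns to each QC row that has a matching sample_id."""
    merged = []
    for row in qc_rows:
        extra = metadata.get(row["sample_id"], {})
        # QC values take precedence over metadata on name clashes (assumption).
        merged.append({**extra, **row})
    return merged


qc_rows = [
    {"sample_id": "sample1", "Checkm.Completeness": 98.5},
    {"sample_id": "sample3", "Checkm.Completeness": 91.0},  # no metadata row
]
merged = merge_metadata(qc_rows, load_metadata(METADATA_CSV))
```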
Output files are automatically organized for readability:
- Sample identifier (sample_id or Sample)
- Overall checks (columns ending with all_checks_passed)
- Individual checks (columns ending with .check), sorted alphabetically
- Metrics (remaining columns), sorted alphabetically
sample_id,all_checks_passed,Checkm.all_checks_passed,Checkm.Completeness.check,Checkm.Contamination.check,Checkm.Completeness,Checkm.Contamination
sample1,True,True,True,True,98.5,1.2
sample2,False,False,False,True,89.3,0.8
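The ordering rules above can be sketched as a small function. This is an illustration, not the project's code: `order_columns` is a hypothetical name, and placing the global all_checks_passed column ahead of per-tool ones is an assumption inferred from the example header.

```python
def order_columns(columns):
    """Order columns as: identifier, overall checks, individual checks, metrics."""
    ids = [c for c in columns if c in ("sample_id", "Sample")]
    rest = [c for c in columns if c not in ids]
    # Global all_checks_passed first, then per-tool ones alphabetically (assumption).
    overall = sorted(
        (c for c in rest if c.endswith("all_checks_passed")),
        key=lambda c: (c != "all_checks_passed", c),
    )
    checks = sorted(c for c in rest if c.endswith(".check"))
    metrics = sorted(c for c in rest if c not in overall and c not in checks)
    return ids + overall + checks + metrics


unordered = [
    "Checkm.Contamination",
    "Checkm.Completeness.check",
    "all_checks_passed",
    "sample_id",
    "Checkm.Completeness",
    "Checkm.all_checks_passed",
    "Checkm.Contamination.check",
]
ordered = order_columns(unordered)
```

Applied to the columns of the example above in any order, this reproduces the header line shown.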
pytest
pytest --cov=speccheck  # With coverage
pylint speccheck/
speccheck/
├── speccheck/
│ ├── __init__.py
│ ├── main.py # Core logic
│ ├── collect.py # File collection & writing
│ ├── criteria.py # Criteria validation
│ ├── report.py # Report generation
│ ├── modules/ # Tool-specific parsers
│ │ ├── checkm.py
│ │ ├── quast.py
│ │ ├── speciator.py
│ │ ├── ariba.py
│ │ └── sylph.py
│ └── plot_modules/ # Visualization modules
│ ├── plot_checkm.py
│ ├── plot_quast.py
│ └── ...
├── tests/ # Pytest test suite
├── templates/ # HTML templates
├── speccheck.py # CLI entry point
└── setup.py # Package configuration
- Core: rich, typer, pandas, jinja2, plotly
- Dev: pytest, pytest-cov, pylint, coverage
Check the installed version:
speccheck --version
This project is licensed under the GNU General Public License v3.0 (GPLv3). See LICENSE for details.
Contributions are welcome! We appreciate bug reports, feature requests, documentation improvements, and code contributions.
- Fork the repository
- Install development dependencies: pip install -e '.[dev]'
- Install pre-commit hooks: pre-commit install
- Create a feature branch: git checkout -b feature/your-feature
- Make your changes and add tests
- Run checks: pytest --cov=speccheck && ruff check speccheck/
- Submit a pull request
For detailed guidelines, see CONTRIBUTING.md.
This project uses:
- Black for code formatting
- Ruff for fast linting
- Pylint for comprehensive code analysis
- pytest with coverage reporting
- pre-commit hooks for automated checks
All PRs must pass CI checks including tests on Python 3.10, 3.11, and 3.12 across Ubuntu, macOS, and Windows.
If you use speccheck in your research, please cite:
[Citation information to be added]
- Issues: GitHub Issues
- Documentation: This README
- Contact: See setup.py for author information