bqlint

validate BINSEQ (.bq, .vbq) files

Description

BQLint validates BINSEQ (.bq, .vbq) compressed sequence files used in bioinformatics pipelines. It can check a single file on its own or two paired files together, so that problems with data quality and consistency are caught across a sequencing dataset.

Installation

cargo install bqlint

Or build from source:

git clone https://github.com/alejandrogzi/bqlint.git
cd bqlint
cargo build --release

Usage

Single File Validation

# Basic validation
bqlint --file sample.bq

# With custom settings
bqlint --file sample.bq --single-read-validation-level medium

Paired File Validation

# Validate one file of a pair on its own
bqlint --file R1.vbq

# Basic paired validation
bqlint --file1 R1.vbq --file2 R2.vbq

# With custom settings
bqlint --file1 R1.vbq --file2 R2.vbq --paired-read-validation-level high

Advanced Options

# Disable specific validators
bqlint --file sample.bq --disable S003,P001

# Custom quality range
bqlint --file sample.bq --quality-min 30 --quality-max 120

# JSON output
bqlint --file sample.bq --json

# Multi-threading (for .bq files)
bqlint --file sample.bq --threads 8

# Log all errors instead of panicking on the first one
bqlint --file sample.bq --lint-mode log
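
The difference between the two lint modes can be sketched in a few lines of Python. This is only an illustration of the behavior described in the option help ("panic on first error or log all errors"), not bqlint's implementation; `run_checks`, `LintError`, and `non_empty` are hypothetical names introduced for the example.

```python
from typing import Callable, Iterable, List, Optional


class LintError(Exception):
    """Raised in 'panic' mode as soon as the first problem is found."""


def run_checks(
    records: Iterable[str],
    check: Callable[[str], Optional[str]],
    mode: str = "panic",
) -> List[str]:
    """Apply `check` to every record.

    mode="panic": stop at the first problem by raising LintError.
    mode="log":   keep going and return every problem found.
    """
    if mode not in ("panic", "log"):
        raise ValueError(f"unknown lint mode: {mode!r}")
    problems: List[str] = []
    for index, record in enumerate(records):
        message = check(record)
        if message is None:
            continue  # record passed the check
        report = f"record {index}: {message}"
        if mode == "panic":
            raise LintError(report)
        problems.append(report)
    return problems


def non_empty(seq: str) -> Optional[str]:
    """Toy check used for the example: flag empty sequences."""
    return None if seq else "empty sequence"
```

With `mode="log"` a run over a whole file yields the full list of problems, which suits batch reports; `mode="panic"` is the cheaper choice when any single failure should reject the file.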

Features

  • Fast validation: Optimized for large BINSEQ files, with multi-threaded processing for .bq input
  • Comprehensive checks: 17 validators covering format, quality, and consistency
  • Flexible configuration: Selectable validation levels, and individual validators can be disabled by code
  • Paired-end support: Validates consistency between paired sequencing files
  • Multiple output formats: Plain text or JSON output
  • Quality control: Configurable inclusive range for quality bytes (--quality-min, --quality-max)

Format

.bq (BINSEQ)

Compressed binary format for sequencing data with:

  • Optional headers and quality scores
  • Variable or fixed sequence lengths
  • Multi-threaded processing support

.vbq (Variable BINSEQ)

Variable-length BINSEQ format with:

  • Sequential processing (single-threaded)
  • Enhanced compression for variable-length reads
  • Same validation capabilities as .bq

Validators

Single File Validators (S-series)

Code  Level   Description
S003  Low     Missing headers
S004  High    Reserved
S005  High    Quality length mismatch
S006  Medium  Quality byte out of range
S007  High    Duplicate headers
S008  High    Reserved
S009  High    Reserved
S010  High    Decoded length mismatch
S011  High    Invalid nucleotides
S012  Medium  Reserved (disabled)
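
To make three of these checks concrete, here is a rough Python sketch of what S005, S006, and S011 could look like for one decoded record. It is not taken from the bqlint source. The function name `check_record` is hypothetical, and treating only `A`, `C`, `G`, `T` as valid nucleotides is an assumption; the default quality bounds 33 and 126 come from the `--quality-min` and `--quality-max` help text.

```python
from typing import List, Optional

# Assumption for this sketch: only the four unambiguous bases are valid.
VALID_BASES = frozenset("ACGT")


def check_record(
    seq: str,
    qual: Optional[bytes],
    quality_min: int = 33,
    quality_max: int = 126,
) -> List[str]:
    """Return the codes of the checks this record fails (empty if clean)."""
    failed: List[str] = []
    if qual is not None:  # quality scores are optional
        if len(qual) != len(seq):
            failed.append("S005")  # quality length mismatch
        if any(b < quality_min or b > quality_max for b in qual):
            failed.append("S006")  # quality byte outside the inclusive range
    if any(base not in VALID_BASES for base in seq):
        failed.append("S011")  # invalid nucleotide
    return failed
```

For example, `check_record("ACGT", b"III")` reports `S005` because four bases are paired with three quality bytes.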

Paired File Validators (P-series)

Code  Level   Description
P001  Medium  Header mismatch between paired files
P002  High    Reserved
P003  High    Reserved
P004  Medium  Length consistency and pairing validation
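
As an illustration of the paired checks, the sketch below compares the headers of two files record by record. The details are assumptions, not bqlint's rules: `read_id` and `check_pairs` are hypothetical helpers, headers are taken to match when their first whitespace-delimited token agrees after a trailing `/1` or `/2` is removed, and P004 is modeled simply as "both files hold the same number of records".

```python
from typing import List, Sequence, Tuple


def read_id(header: str) -> str:
    """Reduce a header to a pair-independent read identifier (assumed rule)."""
    parts = header.split()
    token = parts[0] if parts else ""
    if token.endswith(("/1", "/2")):
        token = token[:-2]  # drop the mate suffix
    return token


def check_pairs(
    headers1: Sequence[str], headers2: Sequence[str]
) -> List[Tuple[str, str]]:
    """Return (code, message) tuples for every pairing problem found."""
    problems: List[Tuple[str, str]] = []
    if len(headers1) != len(headers2):
        problems.append(
            ("P004", f"record counts differ: {len(headers1)} vs {len(headers2)}")
        )
    # zip stops at the shorter input, so only aligned records are compared.
    for index, (h1, h2) in enumerate(zip(headers1, headers2)):
        if read_id(h1) != read_id(h2):
            problems.append(("P001", f"record {index}: {h1!r} vs {h2!r}"))
    return problems
```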

Format Validators (V-series)

Code  Level   Description
V001  High    Reserved
V002  High    Reserved

Validation Levels

  • Low: Basic integrity checks
  • Medium: Standard quality and consistency validation
  • High: Comprehensive validation including all checks
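
One plausible reading of the levels, sketched in Python: each validator carries the minimum level at which it runs, so `high` runs everything and `low` runs only the validators tagged Low. That ordering is an assumption drawn from the descriptions above, not from the bqlint source. The table below lists only the non-reserved S-series codes, and `parse_disable` and `active_validators` are hypothetical names.

```python
from typing import List, Set

LEVEL_RANK = {"low": 0, "medium": 1, "high": 2}

# Non-reserved single-file validators and their levels, from the S-series table.
SINGLE_VALIDATORS = {
    "S003": "low",
    "S005": "high",
    "S006": "medium",
    "S007": "high",
    "S010": "high",
    "S011": "high",
}


def parse_disable(value: str) -> Set[str]:
    """Split a comma-separated code list such as 'S003,P001'."""
    return {code.strip().upper() for code in value.split(",") if code.strip()}


def active_validators(level: str, disable: str = "") -> List[str]:
    """Codes that would run at `level` once disabled codes are removed."""
    selected = LEVEL_RANK[level]
    disabled = parse_disable(disable)
    return sorted(
        code
        for code, required in SINGLE_VALIDATORS.items()
        if LEVEL_RANK[required] <= selected and code not in disabled
    )
```

Under this reading, `--single-read-validation-level medium --disable S003,S006` would leave no S-series validator active, which is worth keeping in mind when combining the two options.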

Options

Usage: bqlint [OPTIONS]

Options:
  -f, --file <FILE>
          Single file to lint (.bq or .vbq)
  -1, --file1 <FILE1>
          Paired files to lint together (.bq or .vbq)
  -2, --file2 <FILE2>
          
  -m, --lint-mode <LINT_MODE>
          Panic on first error or log all errors [default: panic] [possible values: panic, log]
  -s, --single-read-validation-level <SINGLE_READ_VALIDATION_LEVEL>
          Single-read validator level [default: high] [possible values: low, medium, high]
  -p, --paired-read-validation-level <PAIRED_READ_VALIDATION_LEVEL>
          Paired-read validator level [default: high] [possible values: low, medium, high]
  -d, --disable <DISABLE>
          Disable validators by code (comma separated, e.g. S003,P001)
      --quality-min <QUALITY_MIN>
          Minimum allowed quality byte (inclusive) [default: 33]
      --quality-max <QUALITY_MAX>
          Maximum allowed quality byte (inclusive) [default: 126]
  -t, --threads <THREADS>
          Number of threads for BQ processing (VBQ is sequential per file) [default: 16]
  -J, --json
          Emit JSON output instead of plain text
  -L, --level <LOG_LEVEL>
          Logging verbosity (controls colored stderr logs) [default: info] [possible values: error, warn, info, debug, trace]
  -h, --help
          Print help
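
When bqlint is driven from a pipeline script, the options above can be assembled programmatically. The following is a minimal Python sketch that uses only flags listed in the help text. `build_command` and `run_bqlint` are hypothetical helpers, and the rule that `--file` and `--file1`/`--file2` are not mixed is this sketch's own policy, not documented bqlint behavior.

```python
import subprocess
from typing import List, Optional, Sequence


def build_command(
    file: Optional[str] = None,
    file1: Optional[str] = None,
    file2: Optional[str] = None,
    lint_mode: Optional[str] = None,
    disable: Sequence[str] = (),
    threads: Optional[int] = None,
    json_output: bool = False,
) -> List[str]:
    """Build an argv list for bqlint from keyword options."""
    paired = file1 is not None or file2 is not None
    if file is not None and paired:
        raise ValueError("pass either file, or file1 and file2, not both")
    if file is None and (file1 is None or file2 is None):
        raise ValueError("pass file, or both file1 and file2")

    argv = ["bqlint"]
    if file is not None:
        argv += ["--file", file]
    else:
        argv += ["--file1", file1, "--file2", file2]
    if lint_mode is not None:
        if lint_mode not in ("panic", "log"):
            raise ValueError(f"unknown lint mode: {lint_mode!r}")
        argv += ["--lint-mode", lint_mode]
    if disable:
        argv += ["--disable", ",".join(disable)]
    if threads is not None:
        argv += ["--threads", str(threads)]
    if json_output:
        argv.append("--json")
    return argv


def run_bqlint(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command and capture stdout/stderr as text (needs bqlint on PATH)."""
    return subprocess.run(list(argv), capture_output=True, text=True, check=False)
```

Passing the arguments as a list avoids shell quoting issues with file names. How bqlint's exit status and JSON fields should be interpreted is not covered here, since neither is specified above.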

Examples

Basic Usage

# Validate a single file
bqlint --file sample.bq

# Validate paired files
bqlint --file1 R1.vbq --file2 R2.vbq

Custom Validation

# Use medium validation level and disable specific checks
bqlint --file sample.bq \
  --single-read-validation-level medium \
  --disable S003,S006

# Custom quality range for stricter validation
bqlint --file sample.bq \
  --quality-min 35 \
  --quality-max 120

Output Formats

# Plain text output (default)
bqlint --file sample.bq

# JSON output for programmatic use
bqlint --file sample.bq --json
