5 releases (3 breaking)
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.4.0 | Jan 21, 2026 |
| 0.3.0 | Jan 21, 2026 |
| 0.2.0 | Jan 5, 2026 |
| 0.1.1 | Aug 18, 2025 |
| 0.1.0 | Feb 27, 2025 |
#323 in Biology
862 downloads per month
Used in convert_genome
50KB, 1K SLoC
check_build
A fast, memory-efficient tool to verify VCF files against hg19 and hg38 reference genomes. Also available as a library for general-purpose use beyond VCF.
Quick Start
What build is my file?
check_build --detect my_variants.vcf
# Output: Hg38 (100.0% match, high confidence)
Full verification:
check_build my_variants.vcf
Installation
cargo install check_build
Or from source:
git clone https://github.com/SauersML/check_build.git
cd check_build
cargo build --release
Usage
CLI
# Simple build detection
check_build --detect sample.vcf
# Full verification with summary
check_build sample.vcf
# Quiet mode (no progress bars)
check_build -q sample.vcf
# Summary only (no mismatch details)
check_build -s sample.vcf
# Single reference
check_build --hg38-only sample.vcf
# Custom reference paths
check_build --hg19-path /data/hg19.fa --hg38-path /data/hg38.fa sample.vcf
Library
Add to Cargo.toml:
[dependencies]
check_build = { git = "https://github.com/SauersML/check_build" }
Simple usage:
use check_build::detect_build;
// `?` needs an enclosing function that returns a compatible `Result`
let result = detect_build("sample.vcf")?;
println!("{}", result); // e.g. "Hg38 (100.0% match, high confidence)"
Full control:
use check_build::{BuildDetection, Reference, Verifier};

let result = Verifier::new("sample.vcf")
    .quiet()
    .verify_both()?;

println!("hg19: {:.1}% match", result.match_rate(Reference::Hg19));
println!("hg38: {:.1}% match", result.match_rate(Reference::Hg38));

// Detailed detection with edge case handling
match result.detect() {
    BuildDetection::Detected { build, confidence, .. } => {
        println!("Build: {:?} ({} confidence)", build, confidence);
    }
    BuildDetection::Ambiguous { reason, .. } => {
        println!("Cannot determine: {}", reason);
    }
    BuildDetection::Unknown { reason, .. } => {
        println!("Problem with file: {}", reason);
    }
    BuildDetection::NoData => {
        println!("No valid variants found");
    }
}
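To make the four detection outcomes concrete, here is a minimal sketch of how two match rates could be turned into one of them. It is not the crate's implementation: the `Detection` enum is a local stand-in for `BuildDetection`, and the `MIN_MATCH` and `MIN_GAP` thresholds are assumptions chosen only for illustration.

```rust
/// Local stand-in for a detection result; NOT the crate's own type.
#[derive(Debug, PartialEq)]
enum Detection {
    Detected { build: &'static str },
    Ambiguous,
    Unknown,
    NoData,
}

/// Classify a file from its per-build match rates (in percent) and the
/// number of variants that were checked.
/// The thresholds below are hypothetical, not taken from check_build.
fn classify(hg19_rate: f64, hg38_rate: f64, variants_checked: usize) -> Detection {
    const MIN_MATCH: f64 = 90.0; // assumed: below this on both builds, give up
    const MIN_GAP: f64 = 5.0; // assumed: smaller gaps count as "similar"

    if variants_checked == 0 {
        return Detection::NoData;
    }
    if hg19_rate.max(hg38_rate) < MIN_MATCH {
        return Detection::Unknown;
    }
    if (hg19_rate - hg38_rate).abs() < MIN_GAP {
        return Detection::Ambiguous;
    }
    if hg38_rate > hg19_rate {
        Detection::Detected { build: "hg38" }
    } else {
        Detection::Detected { build: "hg19" }
    }
}
```

Under these assumed thresholds, a file matching hg38 at 100% and hg19 at 60% classifies as hg38, while 99.0% versus 98.5% is reported as ambiguous.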
Features
- Fast: Parallel verification of hg19/hg38 using rayon
- Memory-efficient: Streams references, processes one contig at a time
- Auto-download: Fetches reference FASTAs if not present
- Edge case handling: Detects ambiguous, unknown, or corrupt files
- Dual interface: Both CLI and library
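As an illustration of the parallel hg19/hg38 check, the sketch below scores the same variants against two in-memory sequences on two threads. It uses `std::thread::scope` instead of rayon so that it runs with the standard library alone, and the function names `match_rate` and `check_both` are hypothetical. The real tool streams FASTA files rather than holding sequences in memory.

```rust
use std::thread;

/// Percentage of (1-based position, REF base) pairs that agree with `reference`.
/// Returns 0.0 when there are no variants.
fn match_rate(reference: &[u8], variants: &[(usize, u8)]) -> f64 {
    if variants.is_empty() {
        return 0.0;
    }
    let hits = variants
        .iter()
        .filter(|&&(pos, base)| {
            // Position 0 is invalid in VCF; out-of-range positions count as mismatches.
            pos >= 1
                && reference
                    .get(pos - 1)
                    .map_or(false, |b| b.eq_ignore_ascii_case(&base))
        })
        .count();
    100.0 * hits as f64 / variants.len() as f64
}

/// Score the variants against both references concurrently.
/// Returns (hg19 match rate, hg38 match rate).
fn check_both(hg19: &[u8], hg38: &[u8], variants: &[(usize, u8)]) -> (f64, f64) {
    thread::scope(|s| {
        let h19 = s.spawn(|| match_rate(hg19, variants));
        let h38 = s.spawn(|| match_rate(hg38, variants));
        (h19.join().unwrap(), h38.join().unwrap())
    })
}
```

Scoped threads let both workers borrow the same variant slice without cloning it, which is the same property a rayon `join` would provide.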
How It Works
- Splits VCF by contig into temp files
- Streams each reference FASTA (never loads full genome)
- Verifies REF alleles match reference bases
- Reports match rates and infers build
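The steps above can be sketched in a few functions. This is a simplified illustration, not the crate's code: it groups records in a `BTreeMap` instead of temp files, takes each contig's sequence as an in-memory byte slice instead of streaming a FASTA, and all names (`Record`, `split_by_contig`, `ref_matches`, `count_matches`) are hypothetical.

```rust
use std::collections::BTreeMap;

/// One VCF data line reduced to what REF checking needs.
#[derive(Debug, PartialEq)]
struct Record {
    pos: usize, // 1-based position, as in VCF
    ref_allele: String,
}

/// Group VCF data lines by contig (CHROM), skipping headers and malformed lines.
fn split_by_contig(vcf: &str) -> BTreeMap<String, Vec<Record>> {
    let mut by_contig: BTreeMap<String, Vec<Record>> = BTreeMap::new();
    for line in vcf.lines() {
        if line.starts_with('#') || line.trim().is_empty() {
            continue;
        }
        // VCF columns: CHROM, POS, ID, REF, ALT, ...
        let fields: Vec<&str> = line.split('\t').collect();
        if fields.len() < 4 {
            continue;
        }
        let Ok(pos) = fields[1].parse::<usize>() else { continue };
        by_contig
            .entry(fields[0].to_string())
            .or_default()
            .push(Record { pos, ref_allele: fields[3].to_string() });
    }
    by_contig
}

/// True when the REF allele equals the reference bases starting at `pos`
/// (case-insensitive, so soft-masked lowercase bases still match).
fn ref_matches(contig_seq: &[u8], rec: &Record) -> bool {
    if rec.pos == 0 || rec.ref_allele.is_empty() {
        return false;
    }
    let start = rec.pos - 1;
    let end = start + rec.ref_allele.len();
    contig_seq
        .get(start..end)
        .map_or(false, |bases| bases.eq_ignore_ascii_case(rec.ref_allele.as_bytes()))
}

/// (matching records, total records) for one contig.
fn count_matches(contig_seq: &[u8], records: &[Record]) -> (usize, usize) {
    let hits = records.iter().filter(|r| ref_matches(contig_seq, r)).count();
    (hits, records.len())
}
```

Summing the per-contig counts for each reference gives the match rate from which a build can be inferred.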
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success (build detected or verification passed) |
| 1 | Error (file not found, download failed, etc.) |
| 2 | Ambiguous (matches both builds similarly) |
| 3 | Unknown (low match on both, possibly corrupt) |
| 4 | No data (VCF had no valid variants) |
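For scripts that wrap the CLI, the table can be mapped onto a small enum. The sketch below is an assumption about how a caller might consume these codes. `Outcome`, `outcome_from_exit_code`, and `run_detect` are hypothetical names, and `run_detect` assumes `check_build` is on `PATH`.

```rust
use std::process::Command;

/// Outcome categories taken from the exit-code table above.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    Error,
    Ambiguous,
    Unknown,
    NoData,
}

/// Map a process exit code to an outcome; `None` for codes the table does not list.
fn outcome_from_exit_code(code: i32) -> Option<Outcome> {
    match code {
        0 => Some(Outcome::Success),
        1 => Some(Outcome::Error),
        2 => Some(Outcome::Ambiguous),
        3 => Some(Outcome::Unknown),
        4 => Some(Outcome::NoData),
        _ => None,
    }
}

/// Run `check_build --detect <vcf_path>` and interpret its exit status.
/// Returns `Ok(None)` if the process was killed by a signal or used an unlisted code.
fn run_detect(vcf_path: &str) -> std::io::Result<Option<Outcome>> {
    let status = Command::new("check_build")
        .arg("--detect")
        .arg(vcf_path)
        .status()?;
    Ok(status.code().and_then(outcome_from_exit_code))
}
```

Treating unlisted codes as `None` rather than as an error keeps the wrapper from guessing at meanings the table does not define.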
License
MIT
Dependencies
~10–29MB, ~395K SLoC