Fast region queries against BGZF-compressed VCF files using the Tabix index. The C++ layer is built with Rcpp and links against Rhtslib. All file handles and index structures are managed with RAII so memory is released correctly on every call path, including repeated queries in a loop.

```r
# Install dependencies first if needed
install.packages("BiocManager")
BiocManager::install("Rhtslib")
install.packages("Rcpp")

# Install tabixr from the source tarball
install.packages("tabixr_0.2.0.tar.gz", repos = NULL, type = "source")
```

```r
library(tabixr)

vcf <- "path/to/your/file.vcf.gz" # .tbi index must exist alongside it

# Chromosome names in the index
vcf_seqnames(vcf)

# Sample names from the #CHROM header line
vcf_samples(vcf)

# All header lines (## metadata + #CHROM column header)
vcf_header(vcf)

# Query a genomic region; returns a data.frame
df <- query_vcf(vcf, chrom = "chr21", start = 9411245, end = 9412000)

# Query a specific set of positions; opens the file and index only once
positions <- c(9411245L, 9411354L, 9411690L)
df <- query_vcf_positions(vcf, chrom = "chr21", positions = positions)
```

Use `query_vcf_positions` when looking up a pre-defined list of positions
(e.g. a GWAS hit list). It is significantly faster than calling `query_vcf`
once per position because the VCF file and Tabix index are opened only once
for the entire vector.
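
A hit list often spans several chromosomes, while `query_vcf_positions` takes a single `chrom` per call. The sketch below groups a hit table by chromosome so that each chromosome costs one call. `lookup_hits` and the `chrom`/`pos` column names are hypothetical and not part of tabixr; `query_fun` is assumed to have the same signature as the `query_vcf_positions` call shown above.

```r
# Hypothetical helper, not part of tabixr: one query_vcf_positions() call per
# chromosome instead of one query_vcf() call per position.
# `hits` is assumed to be a data.frame with columns `chrom` and `pos`.
lookup_hits <- function(vcf, hits, query_fun = tabixr::query_vcf_positions) {
  # Group the positions by chromosome name.
  by_chrom <- split(as.integer(hits$pos), hits$chrom)
  results <- lapply(names(by_chrom), function(chrom) {
    # Deduplicate and sort so each position is requested once.
    query_fun(vcf, chrom = chrom, positions = sort(unique(by_chrom[[chrom]])))
  })
  # Stack the per-chromosome data.frames into one result.
  do.call(rbind, results)
}

# Example (requires tabixr and an indexed VCF):
# hits <- data.frame(chrom = "chr21", pos = c(9411245, 9411354, 9411690))
# df <- lookup_hits("path/to/your/file.vcf.gz", hits)
```

Passing the query function as an argument keeps the helper easy to test with a stub. Note that an empty `hits` table yields `NULL` rather than a data.frame in this sketch.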
Both functions return a data.frame with column names taken from the VCF `#CHROM` header line:

| Column | Type | Notes |
|---|---|---|
| CHROM | character | |
| POS | integer | |
| ID | character | |
| REF | character | |
| ALT | character | |
| QUAL | character | |
| FILTER | character | |
| INFO | character | Raw KEY=VALUE;... string |
| FORMAT | character | e.g. GT:GP:DS |
| sample columns | character | One column per sample, named from the header |
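
Because every column except `POS` is returned as character, some post-processing is usually needed before analysis. The helpers below are a minimal sketch of that step. They are hypothetical and not part of tabixr, and they assume standard VCF conventions: `.` for a missing value, `;`-separated `INFO` entries, and `:`-separated `FORMAT` keys.

```r
# Hypothetical helpers, not part of tabixr.

# QUAL arrives as character; VCF uses "." for a missing value.
qual_to_numeric <- function(qual) {
  out <- rep(NA_real_, length(qual))
  present <- !is.na(qual) & qual != "."
  out[present] <- as.numeric(qual[present])
  out
}

# Split one raw INFO string into a named character vector.
# Flag entries without "=" (e.g. "DB") get the value "TRUE".
parse_info <- function(info) {
  if (is.na(info) || info == "." || info == "") return(character(0))
  fields <- strsplit(info, ";", fixed = TRUE)[[1]]
  has_eq <- grepl("=", fields, fixed = TRUE)
  keys <- sub("=.*$", "", fields)
  vals <- ifelse(has_eq, sub("^[^=]*=", "", fields), "TRUE")
  stats::setNames(vals, keys)
}

# Pair the FORMAT keys with one sample's colon-separated values.
parse_sample <- function(format, sample) {
  keys <- strsplit(format, ":", fixed = TRUE)[[1]]
  vals <- strsplit(sample, ":", fixed = TRUE)[[1]]
  # A sample may omit trailing fields; pad the values with NA.
  length(vals) <- length(keys)
  stats::setNames(vals, keys)
}

info <- parse_info("AC=2;AF=0.5;DB")
gt <- parse_sample("GT:GP:DS", "0|1:0.01,0.98,0.01:1.0")
```

Values stay as character here; convert individual fields (for example `as.numeric(info[["AF"]])`) once you know their declared type from the header.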
Both endpoints of a region query are inclusive. Multi-allelic sites contribute one row per ALT allele. An empty result returns a zero-row data.frame with the correct column names and types.
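
Inclusive endpoints matter when a large region is split into several queries: if one window ends at the position where the next begins, that position is queried twice. The sketch below tiles a range into windows that never share a position. `make_windows` is a hypothetical helper and not part of tabixr.

```r
# Hypothetical helper, not part of tabixr: tile [start, end] into inclusive,
# non-overlapping windows so adjacent windows never share a position.
make_windows <- function(start, end, width) {
  stopifnot(start <= end, width >= 1)
  starts <- seq(start, end, by = width)
  # Each window covers `width` positions; the last one is clipped to `end`.
  ends <- pmin(starts + width - 1, end)
  data.frame(start = as.integer(starts), end = as.integer(ends))
}

windows <- make_windows(9411245L, 9412000L, 300L)

# With tabixr and an indexed VCF available, each row becomes one query:
# parts <- lapply(seq_len(nrow(windows)), function(i) {
#   query_vcf(vcf, chrom = "chr21",
#             start = windows$start[i], end = windows$end[i])
# })
# df <- do.call(rbind, parts)
```

Windows with no records return zero-row data.frames, so the pieces can be stacked directly. Whether a record spanning a window boundary (such as a long deletion) appears in both windows is not stated here, so deduplicating on `CHROM`, `POS`, `REF`, and `ALT` afterwards may be a sensible precaution.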
- The VCF must be compressed with bgzip (not plain gzip).
- A Tabix index (`.tbi`) must exist in the same directory with the same base name, e.g. `chr21.vcf.gz` → `chr21.vcf.gz.tbi`.

To create these from an uncompressed VCF using htslib tools:

```sh
bgzip file.vcf
tabix -p vcf file.vcf.gz
```
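
These requirements can be checked from R before querying. The sketch below is hypothetical and not part of tabixr. It assumes the usual BGZF layout, in which the first block starts with the gzip magic bytes, has the extra-field flag set, and carries a `BC` subfield first. A BGZF file with a different extra-field order would be rejected by this simple check.

```r
# Hypothetical helpers, not part of tabixr.

# TRUE if the file starts with a typical BGZF block header: gzip magic (1f 8b),
# deflate (08), the FEXTRA flag (04), and a "BC" extra subfield at bytes 13-14.
is_bgzf <- function(path) {
  # raw = TRUE reads the bytes as stored, without transparent decompression.
  con <- file(path, "rb", raw = TRUE)
  on.exit(close(con))
  hdr <- readBin(con, what = "raw", n = 14L)
  length(hdr) == 14L &&
    identical(hdr[1:4], as.raw(c(0x1f, 0x8b, 0x08, 0x04))) &&
    identical(hdr[13:14], charToRaw("BC"))
}

# Stop early with a readable message if the inputs are not usable.
check_tabix_inputs <- function(vcf) {
  if (!file.exists(vcf)) stop("VCF not found: ", vcf)
  if (!is_bgzf(vcf)) stop("Not BGZF-compressed (use bgzip, not gzip): ", vcf)
  tbi <- paste0(vcf, ".tbi")
  if (!file.exists(tbi)) stop("Tabix index not found: ", tbi)
  invisible(TRUE)
}

# Example:
# check_tabix_inputs("path/to/your/file.vcf.gz")
```

Plain gzip output fails the header test because it does not set the extra-field flag, which makes the most common mistake (compressing with `gzip` instead of `bgzip`) easy to spot.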