This directory contains sample files used to test the file formats supported by the boom package.
- test.fastq - Sample FASTQ file with 3 sequences
- test.fastq.gz - Gzip-compressed version of test.fastq
- test.fasta - Sample FASTA file with 3 sequences
- test.fasta.gz - Gzip-compressed version of test.fasta
- test.sam - Sample SAM alignment file
- test.bam - Sample BAM alignment file
- test-sort.bam - Sample sorted BAM alignment file (used for testing indexed access)
- test-sort.bam.bai - BAM index file for test-sort.bam
- test.vcf - Sample VCF (Variant Call Format) file with 5 variants across 2 chromosomes
FASTQ files contain sequence data with quality scores. Each record has 4 lines:
- Sequence identifier (starts with @)
- Nucleotide sequence
- Plus sign (+)
- Quality scores (same length as sequence)
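The four-line layout above can be checked with a few lines of Go. This is a minimal sketch using only the standard library, not the boom API; `FastqRecord` and `parseFASTQ` are hypothetical names, and the sample data in `main` is made up rather than taken from test.fastq.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// FastqRecord is a hypothetical type holding one four-line FASTQ record.
type FastqRecord struct {
	ID, Seq, Qual string
}

// parseFASTQ splits text into four-line records and validates each one.
func parseFASTQ(text string) ([]FastqRecord, error) {
	var recs []FastqRecord
	sc := bufio.NewScanner(strings.NewReader(text))
	for {
		var lines [4]string
		n := 0
		for n < 4 && sc.Scan() {
			lines[n] = sc.Text()
			n++
		}
		if n == 0 {
			break // clean end of input
		}
		if n < 4 {
			return nil, fmt.Errorf("truncated record: got %d of 4 lines", n)
		}
		if !strings.HasPrefix(lines[0], "@") {
			return nil, fmt.Errorf("identifier line must start with @: %q", lines[0])
		}
		if !strings.HasPrefix(lines[2], "+") {
			return nil, fmt.Errorf("separator line must start with +: %q", lines[2])
		}
		if len(lines[1]) != len(lines[3]) {
			return nil, fmt.Errorf("record %s: sequence and quality lengths differ", lines[0][1:])
		}
		recs = append(recs, FastqRecord{ID: lines[0][1:], Seq: lines[1], Qual: lines[3]})
	}
	return recs, sc.Err()
}

func main() {
	// Illustrative input only.
	recs, err := parseFASTQ("@read1\nACGT\n+\nIIII\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range recs {
		fmt.Println(r.ID, r.Seq, r.Qual)
	}
}
```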
FASTA files contain sequence data without quality scores. Each record has:
- Sequence identifier line (starts with >)
- One or more lines of sequence data
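Because a FASTA sequence may span several lines, a reader has to join lines until the next header. The sketch below shows one way to do that with the Go standard library; `FastaEntry` and `parseFASTA` are hypothetical names, not part of boom, and the sample input is invented.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// FastaEntry is a hypothetical type holding one FASTA record.
type FastaEntry struct {
	ID, Seq string
}

// parseFASTA collects header lines (starting with >) and joins the
// sequence lines that follow each header into a single string.
func parseFASTA(text string) ([]FastaEntry, error) {
	var entries []FastaEntry
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "":
			continue // skip blank lines
		case strings.HasPrefix(line, ">"):
			entries = append(entries, FastaEntry{ID: line[1:]})
		case len(entries) == 0:
			return nil, fmt.Errorf("sequence data before first header: %q", line)
		default:
			entries[len(entries)-1].Seq += line
		}
	}
	return entries, sc.Err()
}

func main() {
	// Illustrative input only: one record split over two sequence lines.
	entries, err := parseFASTA(">seq1\nACGT\nTTAA\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Println(e.ID, e.Seq)
	}
}
```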
SAM (Sequence Alignment/Map) and BAM (binary SAM) files contain aligned sequence data:
- SAM is a text-based format
- BAM is a compressed binary format
- BAI files are index files for BAM files enabling rapid random access
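Since SAM is plain text, a single alignment line can be inspected without htslib. The SAM specification defines 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); the sketch below reads a few of them. `Alignment` and `parseSAMLine` are hypothetical names for illustration, and the sample line is invented rather than taken from test.sam.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Alignment is a hypothetical type holding a subset of the SAM fields.
type Alignment struct {
	QName string
	Flag  int
	RName string
	Pos   int
	MapQ  int
	Cigar string
	Seq   string
	Qual  string
}

// parseSAMLine parses one alignment line; header lines (starting with @) are rejected.
func parseSAMLine(line string) (Alignment, error) {
	if strings.HasPrefix(line, "@") {
		return Alignment{}, fmt.Errorf("header line, not an alignment")
	}
	f := strings.Split(line, "\t")
	if len(f) < 11 {
		return Alignment{}, fmt.Errorf("expected at least 11 fields, got %d", len(f))
	}
	flag, err := strconv.Atoi(f[1])
	if err != nil {
		return Alignment{}, fmt.Errorf("bad FLAG: %w", err)
	}
	pos, err := strconv.Atoi(f[3])
	if err != nil {
		return Alignment{}, fmt.Errorf("bad POS: %w", err)
	}
	mapq, err := strconv.Atoi(f[4])
	if err != nil {
		return Alignment{}, fmt.Errorf("bad MAPQ: %w", err)
	}
	return Alignment{
		QName: f[0], Flag: flag, RName: f[2], Pos: pos,
		MapQ: mapq, Cigar: f[5], Seq: f[9], Qual: f[10],
	}, nil
}

func main() {
	// Illustrative alignment line only.
	a, err := parseSAMLine("r001\t99\tchr1\t7\t30\t4M\t=\t37\t39\tACGT\tIIII")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(a.QName, a.RName, a.Pos, a.Cigar)
}
```

BAM and BAI files are binary, so they need a real reader such as the htslib wrapper rather than line splitting.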
VCF (Variant Call Format) files contain variant call data:
- VCF is a text-based format for storing genetic sequence variants
- Each record represents a genomic variant with position, reference, and alternate alleles
- Can include sample genotype information and quality metrics
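The record layout described above maps onto the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample columns. Here is a minimal standard-library sketch of reading one data line; `Variant` and `parseVCFLine` are hypothetical names, not the boom API, and the sample line is invented.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Variant is a hypothetical type holding the main fields of one VCF record.
type Variant struct {
	Chrom   string
	Pos     int
	ID      string
	Ref     string
	Alt     []string // ALT may list several comma-separated alleles
	Qual    string   // kept as text because "." means missing
	Samples []string // raw per-sample columns, if present
}

// parseVCFLine parses one data line; meta and header lines (starting with #) are rejected.
func parseVCFLine(line string) (Variant, error) {
	if strings.HasPrefix(line, "#") {
		return Variant{}, fmt.Errorf("header line, not a variant record")
	}
	f := strings.Split(line, "\t")
	if len(f) < 8 {
		return Variant{}, fmt.Errorf("expected at least 8 fields, got %d", len(f))
	}
	pos, err := strconv.Atoi(f[1])
	if err != nil {
		return Variant{}, fmt.Errorf("bad POS: %w", err)
	}
	v := Variant{
		Chrom: f[0], Pos: pos, ID: f[2], Ref: f[3],
		Alt: strings.Split(f[4], ","), Qual: f[5],
	}
	if len(f) > 9 {
		v.Samples = f[9:] // column 9 is FORMAT; samples follow it
	}
	return v, nil
}

func main() {
	// Illustrative record only.
	v, err := parseVCFLine("chr1\t100\trs1\tA\tG,T\t50\tPASS\tDP=10\tGT\t0/1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v.Chrom, v.Pos, v.Ref, v.Alt, v.Samples)
}
```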
The test suite uses these files to verify that:
- FASTQ files can be opened and handled (fastq_fasta_test.go)
- FASTA files can be opened and handled (fastq_fasta_test.go)
- SAM/BAM alignment files can be opened and handled (sam_bam_test.go)
- VCF variant files can be opened and handled (vcf_test.go)
- BAM index files can be loaded and used for random access (index_test.go)
- Compressed versions (gzip) of various formats can be handled
- The htslib wrapper correctly interfaces with these file formats
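For the gzip case, a reader can tell compressed from plain input by the two gzip magic bytes (0x1f 0x8b). The sketch below shows the idea with Go's `compress/gzip`; it does not describe how boom or htslib handle compression internally, and `readMaybeGzip` and `gzipBytes` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses data in memory (helper for the demo below).
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readMaybeGzip returns the decompressed content if data starts with the
// gzip magic bytes, and returns data unchanged otherwise.
func readMaybeGzip(data []byte) ([]byte, error) {
	if len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b {
		zr, err := gzip.NewReader(bytes.NewReader(data))
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		return io.ReadAll(zr)
	}
	return data, nil
}

func main() {
	// Illustrative FASTA content, compressed and then read back.
	plain := []byte(">seq1\nACGT\n")
	packed, err := gzipBytes(plain)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := readMaybeGzip(packed)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(string(out))
}
```

BAM files also begin with these magic bytes because BGZF is gzip-compatible, so this check alone does not distinguish a BAM file from a gzip-compressed text file.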
Example programs in the examples/ directory also use these test files.