Description
When running vg construct (https://github.com/vgteam/vg), the vcflib header check appears to reject the VCF file because of the order of the keys within a header line. A similar bug has previously been filed and addressed within htslib:
VCF Header: must Number be before Type? · Issue #642
samtools/hts-specs#642
The overly strict check rejects files in which the order of keys within a header line varies. An example of the impact, using vg, is shown below:
$ vg construct -r /data/refs/hs38DH.fa -v 1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vcf.gz > 1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vcf.gz.vg
header parse error at:
fields[2] != "Number"
##INFO=<ID=END2,Type=Integer,Number=1,Description="Position of breakpoint on CHR2">
The VCF files used are from the 1000 Genomes Project and are of known, verified quality. Downloads are available here:
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/
Expected behaviour: in the example above, a single new vg file describing the variants in graph format should be produced.
The 1000 Genomes repository is an excellent test set that was designed to aid and accelerate tool development, and it is widely used. The current key-order check will therefore affect anyone looking to use VCF tools based on the current vcflib releases.
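Until the check is relaxed, one possible workaround is to rewrite the affected header lines before passing the file to vg. The following is only a minimal sketch, assuming the check wants the keys in the order ID, Number, Type, Description and that it applies to ##INFO and ##FORMAT lines. The function names `keyRank` and `reorderHeaderLine` are hypothetical and not part of vcflib or vg.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Rank of a "key=value" token in the assumed target order.
// Unknown keys rank last and keep their original relative order.
static int keyRank(const std::string& token) {
    static const std::vector<std::string> order = {"ID", "Number", "Type", "Description"};
    const std::string key = token.substr(0, token.find('='));
    for (std::size_t i = 0; i < order.size(); ++i) {
        if (order[i] == key) return static_cast<int>(i);
    }
    return static_cast<int>(order.size());
}

// Rewrite an ##INFO or ##FORMAT header line so that its keys appear as
// ID, Number, Type, Description. Any other line is returned unchanged.
std::string reorderHeaderLine(const std::string& line) {
    const bool structured =
        line.rfind("##INFO=<", 0) == 0 || line.rfind("##FORMAT=<", 0) == 0;
    if (!structured || line.back() != '>') return line;

    const std::size_t open = line.find('<');
    const std::string body = line.substr(open + 1, line.size() - open - 2);

    // Split on commas that sit outside double quotes; quotes are kept.
    std::vector<std::string> tokens;
    std::string current;
    bool inQuotes = false;
    for (char c : body) {
        if (c == '"') inQuotes = !inQuotes;
        if (c == ',' && !inQuotes) {
            tokens.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    tokens.push_back(current);

    // Stable sort so that keys of equal rank stay in their original order.
    std::stable_sort(tokens.begin(), tokens.end(),
                     [](const std::string& a, const std::string& b) {
                         return keyRank(a) < keyRank(b);
                     });

    std::string out = line.substr(0, open + 1);
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i > 0) out += ',';
        out += tokens[i];
    }
    return out + '>';
}
```

Applied to the END2 line from the error above, this moves `Number=1` ahead of `Type=Integer` and leaves the Description untouched. Escaped quotes inside a Description are not handled in this sketch.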
It has been suggested that the issue may be related to the vcflib code here:
Line 1839 in 6dbe2f6
if (fields[2] != "Number") {
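To illustrate what a less strict check could look like, here is a rough sketch that reads the keys of a structured header line into a map, so that `Number` and `Type` can be looked up by name rather than by position. This is not the vcflib implementation. The function name `parseHeaderFields` is hypothetical, and the sketch assumes values are either unquoted or wrapped in plain double quotes with no escaped quotes inside.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Parse the <...> body of a structured VCF header line, for example
//   ##INFO=<ID=END2,Type=Integer,Number=1,Description="...">
// into a key -> value map without assuming any particular key order.
// Commas inside double-quoted values are treated as part of the value,
// and the surrounding quotes are dropped.
std::map<std::string, std::string> parseHeaderFields(const std::string& line) {
    const std::size_t open = line.find('<');
    const std::size_t close = line.rfind('>');
    if (open == std::string::npos || close == std::string::npos || close < open) {
        throw std::runtime_error("not a structured header line: " + line);
    }
    const std::string body = line.substr(open + 1, close - open - 1);

    std::map<std::string, std::string> fields;
    std::string key, value;
    bool inQuotes = false;
    bool readingKey = true;
    for (char c : body) {
        if (c == '"') {
            inQuotes = !inQuotes;            // quotes delimit a value
        } else if (c == '=' && readingKey && !inQuotes) {
            readingKey = false;              // the first '=' ends the key
        } else if (c == ',' && !inQuotes) {
            fields[key] = value;             // an unquoted ',' ends the pair
            key.clear();
            value.clear();
            readingKey = true;
        } else {
            (readingKey ? key : value) += c;
        }
    }
    if (!key.empty()) fields[key] = value;   // last pair has no trailing ','
    return fields;
}
```

With a map like this, a check along the lines of `fields.count("Number") == 0` would accept the END2 line above whether `Type` or `Number` comes first.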
With thanks to local team members for their help in tracking this issue to the above code, and many thanks to the vcflib team for making these tools available.
Pete