tabix.createIndex: [ti_index_core] the file out of order at line #### #25

@bschilder

Hello,

When trying to use tabix.createIndex to index a munged GWAS summary stats file (tab-separated and compressed by Rsamtools::bgzip), I keep getting the following error:

[ti_index_core] the file out of order at line 776464
Create tabix index failed for [ /Users/schilder/Downloads/pgc-bip2021-all_munged.tsv.bgz ]!

And yet, I've made sure to sort the file by CHR (chromosome) and BP (position). I even visually confirmed that the positions are in order at the line it is referencing (after it's been bgzip compressed):

Screenshot 2022-03-16 at 22 17 25
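
To complement the visual check with a programmatic one, a minimal sketch along the following lines could scan the whole table. It assumes the usual tabix ordering expectation: each chromosome's rows form one contiguous block, and positions never decrease within a block. The helper name `first_unsorted_row` is hypothetical and is not part of seqminer, Rsamtools, or MungeSumstats.

```r
# Hypothetical helper: returns the index of the first data row that breaks
# the assumed ordering, or NA if no such row is found.
first_unsorted_row <- function(chr, pos) {
    stopifnot(length(chr) == length(pos))
    n <- length(chr)
    if (n < 2) return(NA_integer_)
    chr <- as.character(chr)
    # TRUE where a row is on the same chromosome as the row before it
    same_chr <- chr[-1] == chr[-n]
    # Rows whose position is lower than the previous row on the same chromosome
    bad_pos <- which(same_chr & (pos[-1] < pos[-n])) + 1L
    # Rows that start a new chromosome block
    starts <- which(c(TRUE, !same_chr))
    # Block starts whose chromosome already appeared in an earlier block
    bad_block <- starts[duplicated(chr[starts])]
    bad <- c(bad_pos, bad_block)
    if (length(bad) == 0) return(NA_integer_)
    min(bad)
}

# Toy example: position drops on chromosome 2 at row 4,
# and chromosome 1 reappears at row 5.
first_unsorted_row(chr = c("1", "1", "2", "2", "1"),
                   pos = c(100, 200, 50, 40, 300))
```

Applied to the munged table read back into R, the call would look something like `first_unsorted_row(dat$CHR, dat$BP)`, where `dat` is assumed to hold the table. The returned index counts data rows only, so it may be offset from the line number that tabix reports if tabix also counts the header line.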

Reprex

The data can be downloaded here.

#### Set up paths ####
fullSS_path_vcf <- "~/Downloads/pgc-bip2021-all.vcf.tsv.gz" 
fullSS_path_tsv <- gsub("\\.vcf","",fullSS_path_vcf) 
fullSS_path_munged <- gsub("-all","-all_munged",fullSS_path_tsv)
#### Edit ####
dat <- data.table::fread(fullSS_path_vcf, 
                         skip = "#CHROM")
colnames(dat) <- gsub("#","",colnames(dat))
#### Sort ####
data.table::setkey(dat, CHROM, POS)
#### Save ####
data.table::fwrite(x = dat, 
                   file = fullSS_path_tsv, 
                   sep="\t")
#### Munge ####
fullSS_path <- MungeSumstats::format_sumstats(path = fullSS_path_tsv, 
                                              save_path = fullSS_path_munged, 
                                              sort_coordinates = TRUE,
                                              log_folder = "~/Downloads/logs", 
                                              log_mungesumstats_msgs = TRUE, 
                                              log_folder_ind = TRUE)

#### Compress ####
bgz_file <- Rsamtools::bgzip(file = fullSS_path,
                             overwrite = TRUE)

#### Index ####
seqminer::tabix.createIndex(
    bgzipFile = bgz_file,
    sequenceColumn = 2,
    startColumn = 3,
    endColumn = 3,
    ## Just use the first column's name (since none have the `#` symbol)
    metaChar = "SNP"
)

Any help would be appreciated.

Best,
Brian

Session info

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] arrow_7.0.0      ggimage_0.3.0    ggplot2_3.3.5    dplyr_1.0.8      hexSticker_0.4.9
[6] echotabix_0.99.3

loaded via a namespace (and not attached):
  [1] AnnotationHub_3.2.2           BiocFileCache_2.2.1           systemfonts_1.0.4            
  [4] igraph_1.2.11                 BiocParallel_1.28.3           GenomeInfoDb_1.30.1          
  [7] digest_0.6.29                 yulab.utils_0.0.4             htmltools_0.5.2              
 [10] magick_2.7.3                  fansi_1.0.2                   magrittr_2.0.2               
 [13] memoise_2.0.1                 BSgenome_1.62.0               echoverseTemplate_0.99.0     
 [16] ontologyPlot_1.6              openxlsx_4.2.5                Biostrings_2.62.0            
 [19] matrixStats_0.61.0            R.utils_2.11.0                sysfonts_0.8.5               
 [22] prettyunits_1.1.1             colorspace_2.0-3              blob_1.2.2                   
 [25] rappdirs_0.3.3                textshaping_0.3.6             xfun_0.30                    
 [28] crayon_1.5.0                  RCurl_1.98-1.6                echodata_0.99.6              
 [31] jsonlite_1.8.0                hexbin_1.28.2                 graph_1.72.0                 
 [34] VariantAnnotation_1.40.0      glue_1.6.2                    gtable_0.3.0                 
 [37] zlibbioc_1.40.0               XVector_0.34.0                DelayedArray_0.20.0          
 [40] Rgraphviz_2.38.0              BiocGenerics_0.40.0           scales_1.1.1                 
 [43] DBI_1.1.2                     Rcpp_1.0.8.2                  showtextdb_3.0               
 [46] xtable_1.8-4                  progress_1.2.2                gridGraphics_0.5-1           
 [49] bit_4.0.4                     clisymbols_1.2.0              stats4_4.1.0                 
 [52] DT_0.21                       htmlwidgets_1.5.4             httr_1.4.2                   
 [55] ontologyIndex_2.7             ellipsis_0.3.2                pkgconfig_2.0.3              
 [58] XML_3.99-0.9                  R.methodsS3_1.8.1             farver_2.1.0                 
 [61] seqminer_8.4                  dbplyr_2.1.1                  utf8_1.2.2                   
 [64] here_1.0.1                    ggplotify_0.1.0               tidyselect_1.1.2             
 [67] labeling_0.4.2                rlang_1.0.2                   later_1.3.0                  
 [70] AnnotationDbi_1.56.2          BiocVersion_3.14.0            munsell_0.5.0                
 [73] tools_4.1.0                   cachem_1.0.6                  cli_3.2.0                    
 [76] generics_0.1.2                RSQLite_2.2.10                evaluate_0.15                
 [79] stringr_1.4.0                 fastmap_1.1.0                 yaml_2.3.5                   
 [82] ragg_1.2.2                    knitr_1.37                    bit64_4.0.5                  
 [85] fs_1.5.2                      zip_2.2.0                     purrr_0.3.4                  
 [88] KEGGREST_1.34.0               gh_1.3.0                      showtext_0.9-5               
 [91] mime_0.12                     R.oo_1.24.0                   xml2_1.3.3                   
 [94] biomaRt_2.50.3                brio_1.1.3                    compiler_4.1.0               
 [97] rstudioapi_0.13               interactiveDisplayBase_1.32.0 filelock_1.0.2               
[100] curl_4.3.2                    png_0.1-7                     testthat_3.1.2               
[103] paintmap_1.0                  tibble_3.1.6                  stringi_1.7.6                
[106] GenomicFeatures_1.46.5        desc_1.4.1                    lattice_0.20-45              
[109] Matrix_1.4-0                  vctrs_0.3.8                   pillar_1.7.0                 
[112] lifecycle_1.0.1               BiocManager_1.30.16           data.table_1.14.2            
[115] bitops_1.0-7                  httpuv_1.6.5                  rtracklayer_1.54.0           
[118] GenomicRanges_1.46.1          R6_2.5.1                      BiocIO_1.4.0                 
[121] promises_1.2.0.1              IRanges_2.28.0                ontoProc_1.16.0              
[124] assertthat_0.2.1              pkgload_1.2.4                 SummarizedExperiment_1.24.0  
[127] rprojroot_2.0.2               rjson_0.2.21                  withr_2.5.0                  
[130] GenomicAlignments_1.30.0      Rsamtools_2.10.0              S4Vectors_0.32.3             
[133] GenomeInfoDbData_1.2.7        parallel_4.1.0                hms_1.1.1                    
[136] grid_4.1.0                    ggfun_0.0.5                   tidyr_1.2.0                  
[139] rmarkdown_2.13                MatrixGenerics_1.6.0          piggyback_0.1.1              
[142] Biobase_2.54.0                shiny_1.7.1                   lubridate_1.8.0              
[145] restfulr_0.0.13          
