Hello,
When trying to use tabix.createIndex to index a munged GWAS summary stats file (tab-separated and compressed by Rsamtools::bgzip), I keep getting the following error:
[ti_index_core] the file out of order at line 776464
Create tabix index failed for [ /Users/schilder/Downloads/pgc-bip2021-all_munged.tsv.bgz ]!
And yet, I've made sure to sort the file by CHR (chromosome) and BP (position). I even visually confirmed that the positions are in order at the line it references (checked after bgzip compression).
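A visual check only covers a handful of rows, so a programmatic check over the whole table may be more reliable. Below is a minimal sketch, assuming the munged table has `CHR` and `BP` columns as described above. `find_out_of_order` is a hypothetical helper written for illustration, not part of seqminer or Rsamtools. It flags rows whose position is smaller than that of the previous row on the same chromosome, which is what the "out of order" message appears to describe.

```r
# Hypothetical helper (not from seqminer/Rsamtools): return the indices of
# data rows whose position is smaller than the previous row's position
# on the same chromosome.
find_out_of_order <- function(dat, chr_col = "CHR", pos_col = "BP") {
  chr <- as.character(dat[[chr_col]])
  pos <- as.numeric(dat[[pos_col]])
  n <- length(pos)
  if (n < 2) return(integer(0))
  same_chr   <- chr[-1] == chr[-n]   # row i and row i-1 share a chromosome
  decreasing <- pos[-1] < pos[-n]    # row i starts before row i-1
  which(same_chr & decreasing) + 1L  # shift to the index of the later row
}

# Toy data: the third row goes backwards on chromosome 1.
example <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  CHR = c(1, 1, 1, 2),
  BP  = c(100, 300, 200, 50)
)
bad_rows <- find_out_of_order(example)
print(bad_rows)
```

On the real data this could be run on the munged table before compression, for example after reading `fullSS_path` with `data.table::fread`. The returned indices count data rows, so the corresponding line in the file is one higher if the file has a single header line.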
Reprex
The data can be downloaded here.
#### Set up paths ####
fullSS_path_vcf <- "~/Downloads/pgc-bip2021-all.vcf.tsv.gz"
fullSS_path_tsv <- gsub("\\.vcf", "", fullSS_path_vcf)
fullSS_path_munged <- gsub("-all", "-all_munged", fullSS_path_tsv)
#### Edit ####
dat <- data.table::fread(fullSS_path_vcf,
                         skip = "#CHROM")
colnames(dat) <- gsub("#", "", colnames(dat))
#### Sort ####
data.table::setkey(dat, CHROM, POS)
#### Save ####
data.table::fwrite(x = dat,
                   file = fullSS_path_tsv,
                   sep = "\t")
#### Munge ####
fullSS_path <- MungeSumstats::format_sumstats(path = fullSS_path_tsv,
                                              save_path = fullSS_path_munged,
                                              sort_coordinates = TRUE,
                                              log_folder = "~/Downloads/logs",
                                              log_mungesumstats_msgs = TRUE,
                                              log_folder_ind = TRUE)
#### Compress ####
bgz_file <- Rsamtools::bgzip(file = fullSS_path,
                             overwrite = TRUE)
#### Index ####
seqminer::tabix.createIndex(
  bgzipFile = bgz_file,
  sequenceColumn = 2,
  startColumn = 3,
  endColumn = 3,
  ## Just use the first column's name (since none have the `#` symbol)
  metaChar = "SNP"
)
Any help would be appreciated.
Best,
Brian
Session info
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arrow_7.0.0 ggimage_0.3.0 ggplot2_3.3.5 dplyr_1.0.8 hexSticker_0.4.9
[6] echotabix_0.99.3
loaded via a namespace (and not attached):
[1] AnnotationHub_3.2.2 BiocFileCache_2.2.1 systemfonts_1.0.4
[4] igraph_1.2.11 BiocParallel_1.28.3 GenomeInfoDb_1.30.1
[7] digest_0.6.29 yulab.utils_0.0.4 htmltools_0.5.2
[10] magick_2.7.3 fansi_1.0.2 magrittr_2.0.2
[13] memoise_2.0.1 BSgenome_1.62.0 echoverseTemplate_0.99.0
[16] ontologyPlot_1.6 openxlsx_4.2.5 Biostrings_2.62.0
[19] matrixStats_0.61.0 R.utils_2.11.0 sysfonts_0.8.5
[22] prettyunits_1.1.1 colorspace_2.0-3 blob_1.2.2
[25] rappdirs_0.3.3 textshaping_0.3.6 xfun_0.30
[28] crayon_1.5.0 RCurl_1.98-1.6 echodata_0.99.6
[31] jsonlite_1.8.0 hexbin_1.28.2 graph_1.72.0
[34] VariantAnnotation_1.40.0 glue_1.6.2 gtable_0.3.0
[37] zlibbioc_1.40.0 XVector_0.34.0 DelayedArray_0.20.0
[40] Rgraphviz_2.38.0 BiocGenerics_0.40.0 scales_1.1.1
[43] DBI_1.1.2 Rcpp_1.0.8.2 showtextdb_3.0
[46] xtable_1.8-4 progress_1.2.2 gridGraphics_0.5-1
[49] bit_4.0.4 clisymbols_1.2.0 stats4_4.1.0
[52] DT_0.21 htmlwidgets_1.5.4 httr_1.4.2
[55] ontologyIndex_2.7 ellipsis_0.3.2 pkgconfig_2.0.3
[58] XML_3.99-0.9 R.methodsS3_1.8.1 farver_2.1.0
[61] seqminer_8.4 dbplyr_2.1.1 utf8_1.2.2
[64] here_1.0.1 ggplotify_0.1.0 tidyselect_1.1.2
[67] labeling_0.4.2 rlang_1.0.2 later_1.3.0
[70] AnnotationDbi_1.56.2 BiocVersion_3.14.0 munsell_0.5.0
[73] tools_4.1.0 cachem_1.0.6 cli_3.2.0
[76] generics_0.1.2 RSQLite_2.2.10 evaluate_0.15
[79] stringr_1.4.0 fastmap_1.1.0 yaml_2.3.5
[82] ragg_1.2.2 knitr_1.37 bit64_4.0.5
[85] fs_1.5.2 zip_2.2.0 purrr_0.3.4
[88] KEGGREST_1.34.0 gh_1.3.0 showtext_0.9-5
[91] mime_0.12 R.oo_1.24.0 xml2_1.3.3
[94] biomaRt_2.50.3 brio_1.1.3 compiler_4.1.0
[97] rstudioapi_0.13 interactiveDisplayBase_1.32.0 filelock_1.0.2
[100] curl_4.3.2 png_0.1-7 testthat_3.1.2
[103] paintmap_1.0 tibble_3.1.6 stringi_1.7.6
[106] GenomicFeatures_1.46.5 desc_1.4.1 lattice_0.20-45
[109] Matrix_1.4-0 vctrs_0.3.8 pillar_1.7.0
[112] lifecycle_1.0.1 BiocManager_1.30.16 data.table_1.14.2
[115] bitops_1.0-7 httpuv_1.6.5 rtracklayer_1.54.0
[118] GenomicRanges_1.46.1 R6_2.5.1 BiocIO_1.4.0
[121] promises_1.2.0.1 IRanges_2.28.0 ontoProc_1.16.0
[124] assertthat_0.2.1 pkgload_1.2.4 SummarizedExperiment_1.24.0
[127] rprojroot_2.0.2 rjson_0.2.21 withr_2.5.0
[130] GenomicAlignments_1.30.0 Rsamtools_2.10.0 S4Vectors_0.32.3
[133] GenomeInfoDbData_1.2.7 parallel_4.1.0 hms_1.1.1
[136] grid_4.1.0 ggfun_0.0.5 tidyr_1.2.0
[139] rmarkdown_2.13 MatrixGenerics_1.6.0 piggyback_0.1.1
[142] Biobase_2.54.0 shiny_1.7.1 lubridate_1.8.0
[145] restfulr_0.0.13