Skip to content

seqminer is missing some SNPs in range compared to PLINK #17

@garyzhubc

Description

@garyzhubc

I made a comparison between the PLINK2 glm result from the some range versus reading the dosage matrix with seqminer. The result from seqminer is missing some SNPs that appeared in the PLINK result from the same range.

PLINK

Prune the .bgen file with PLINK and do glm.

./plink2 --bfile ukb_imp_chr1_v3_pruned --from-bp 54505148 --to-bp 56530526  --chr 1 --out ukb_imp_chr1_v3_pruned_trimmed_2m --export bgen-1.2 bits=8
./plink2 --bgen ukb_imp_chr1_v3_pruned_trimmed_2m.bgen ref-first --sample ukb_imp_chr1_v3_pruned_trimmed_2m.sample --glm --pheno ldl_pcs_joined_with_geno.tsv --pheno-name LDL --covar ldl_pcs_joined_with_geno.tsv --covar-name PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10, PC11,PC12,PC13,PC14,PC15,PC16,PC17,PC18,PC19,PC20, PC21,PC22,PC23,PC24,PC25,PC26,PC27,PC28,PC29,PC30, PC31,PC32,PC33,PC34,PC35,PC36,PC37,PC38,PC39,PC40 --out ukb_imp_chr1_v3_pruned_trimmed_2m

Inspect PLINK glm result and see which positions are used.

> ukb_imp_chr1_v3_pruned_trimmed_2m.LDL.glm.linear <- read_tsv("ukb_imp_chr1_v3_pruned_trimmed_2m.LDL.glm.linear.tsv") %>% filter(TEST=="ADD") %>% mutate(CHROM=as.integer(CHROM), POS=as.integer(POS))
> length(ukb_imp_chr1_v3_pruned_trimmed_2m.LDL.glm.linear $POS)
[1] 11984
> ukb_imp_chr1_v3_pruned_trimmed_2m.LDL.glm.linear $POS
   [1] 54505201 54505304 54505437 54505703 54505756 54505887 54506600 54507000 54507273 54507506 54507734 54507907 54507996
  [14] 54508051 54508231 54508394 54508564 54509028 54509259 54509316 54509594 54509616 54509786 54509957 54509994 54510577
  [27] 54510860 54511200 54511248 54511349 54512553 54512583 54513132 54513323 54513367 54513825 54513867 54513921 54514065
  [40] 54514261 54514341 54514570 54514675 54515041 54515365 54516228 54516280 54516334 54516526 54516601 54516792 54516899
  [53] 54517324 54517633 54517874 54518434 54518725 54519045 54519107 54519119 54519158 54519292 54519295 54519731 54519800
  [66] 54520603 54520921 54521284 54521434 54521855 54522000 54522240 54522280 54522307 54522331 54522355 54522645 54522830
  [79] 54523285 54523304 54523344 54523630 54523639 54523689 54523881 54524519 54524575 54524608 54524940 54525150 54525207

seqminer

Load the .bgen file as a dosage matrix with seqminer and inspect the data size. 5766 is way smaller than 11984. So about half the SNPs are missed by seqminer.

library(tidyverse)
library(seqminer)
ukb_imp_chr1_v3_pruned.bgen <- "ukb_imp_chr1_v3_pruned.bgen"
start_pos  <- 54505148
end_pos <- 56530526
geno <- t(readBGENToMatrixByRange(ukb_imp_chr1_v3_pruned.bgen, paste0("1:", start_pos,"-",end_pos))[[1]])
> print(dim(geno))
[1] 487409   5766
> colnames(geno)
   [1] "1:54505201" "1:54506600" "1:54507273" "1:54507506" "1:54507734"
   [6] "1:54507996" "1:54508051" "1:54508394" "1:54508564" "1:54509028"
  [11] "1:54509316" "1:54509594" "1:54509957" "1:54510860" "1:54511248"
  [16] "1:54511349" "1:54513132" "1:54513323" "1:54513367" "1:54513825"
  [21] "1:54513921" "1:54514261" "1:54514341" "1:54514675" "1:54515365"
  [26] "1:54516228" "1:54516280" "1:54516526" "1:54516792" "1:54516899"
  [31] "1:54517633" "1:54517874" "1:54518434" "1:54518725" "1:54519045"

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions