
mapping on subset of a bigger reference gives different results #381

@mewu3


Dear all,

I have constructed a smaller reference database (less than 1 GB) from a larger reference database (~60 GB) by extracting small windows of sequence, so the sequences in the small reference are completely identical to those in the large reference. Mapping the same query against the small reference gives more mappings than mapping it against the large reference.

bwa mem -t 32 -L 50,50 -Y -h 10000

query       0       Candida_glabrata        10526496        0       40M     *       0       0       GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA        EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE        NM:i:1  MD:Z:24G15      AS:i:35 XS:i:35 XA:Z:Candida_glabrata,+10525296,40M,1;Candida_glabrata,+10526196,40M,1;Candida_glabrata,+10525896,40M,1;Candida_glabrata,+10525596,40M,1;Candida_glabrata,+10526796,40M,2;Candida_glabrata,-16703643,40M,2;Candida_glabrata,-12363822,40M,2;Candida_glabrata,+13345998,40M,3;

bwa mem -t 32 -L 50,50 -Y -h 10000

query       0       Candida_glabrata        73738   0       40M     *       0       0       GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA        EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE        NM:i:0  MD:Z:40 AS:i:40 XS:i:40 XA:Z:Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;Candida_glabrata,+53407,40M,0;Candida_glabrata,+74338,40M,0;Candida_glabrata,+73113,40M,0;Candida_glabrata,+80122,40M,0;Candida_glabrata,+55738,40M,0;Candida_glabrata,+79521,40M,0;Candida_glabrata,+55438,40M,0;Candida_glabrata,+79822,40M,0;

Note that an Aspergillus_lentulus hit appears in the middle of the Candida_glabrata hits.

The main problem is that I get hits on additional species with the small reference; I would prefer the results to stay the same as with the large one. I tried lowering -h to much smaller values (100, 50, 10, ...). This works in some cases, but in others extra species are still reported.
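
To make the difference between the two runs easier to quantify, the hits in each record can be tallied per reference name. Below is a minimal Python sketch, not part of bwa itself. It assumes that the XA tag uses bwa's `rname,±pos,CIGAR,NM;` layout, that reference names contain no whitespace, and that splitting on any whitespace is acceptable (SAM is tab-separated, but records pasted into an issue often end up with spaces). The helper names `parse_hits` and `hits_per_reference` are made up for this illustration, and the example record is the second record above with its XA list shortened to three entries.

```python
from collections import Counter


def parse_hits(sam_line):
    """Return (rname, strand, pos, cigar, nm) for the primary hit and each XA entry.

    Assumes a single SAM alignment record whose fields contain no whitespace.
    """
    fields = sam_line.split()
    if len(fields) < 11:
        raise ValueError("not a complete SAM alignment record")

    flag = int(fields[1])
    primary_nm = None
    alternatives = []
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            primary_nm = int(tag[5:])
        elif tag.startswith("XA:Z:"):
            # Each XA entry is "rname,±pos,CIGAR,NM" terminated by ";".
            for entry in tag[5:].split(";"):
                if not entry:
                    continue
                rname, signed_pos, cigar, nm = entry.rsplit(",", 3)
                alternatives.append(
                    (rname, signed_pos[0], int(signed_pos[1:]), cigar, int(nm))
                )

    # Bit 0x10 of FLAG marks a reverse-strand alignment.
    strand = "-" if flag & 16 else "+"
    primary = (fields[2], strand, int(fields[3]), fields[5], primary_nm)
    return [primary] + alternatives


def hits_per_reference(sam_line):
    """Count reported hits (primary plus XA) per reference name."""
    return Counter(hit[0] for hit in parse_hits(sam_line))


# Second record from this issue, with the XA list shortened to three entries.
SMALL_REF_RECORD = "\t".join([
    "query", "0", "Candida_glabrata", "73738", "0", "40M", "*", "0", "0",
    "GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA",
    "E" * 40,
    "NM:i:0", "MD:Z:40", "AS:i:40", "XS:i:40",
    "XA:Z:Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;"
    "Candida_glabrata,+53407,40M,0;",
])

if __name__ == "__main__":
    for rname, count in hits_per_reference(SMALL_REF_RECORD).items():
        print(rname, count)
```

Running the same tally on the record from each reference shows which reference names appear in only one of the two outputs and with what edit distance.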

Could you please give me some advice?

Thanks,
MJ
