Dear all,
I have constructed a smaller reference database (less than 1 GB) from a larger reference database (~60 GB) by extracting small windows of sequence, so the sequences in the small reference are completely identical to those in the large reference. However, mapping the same query against the small reference gives more hits than mapping it against the large reference.
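A minimal sketch of building such a subset reference, assuming the windows are given as 0-based, end-exclusive `(name, start, end)` triples. The window list, the `name:start-end` record naming and the helper names `read_fasta` and `extract_windows` are hypothetical illustrations, not the procedure actually used above.

```python
from typing import Iterable, Iterator, List, Tuple

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) records; the name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def extract_windows(records: Iterable[Tuple[str, str]],
                    windows: List[Tuple[str, int, int]]) -> List[Tuple[str, str]]:
    """Return one record per window, named after its source coordinates."""
    seqs = dict(records)  # holds every sequence in memory: fine for a toy input only
    out = []
    for name, start, end in windows:
        seq = seqs[name]
        if not 0 <= start < end <= len(seq):
            raise ValueError(f"window {name}:{start}-{end} is out of range")
        out.append((f"{name}:{start}-{end}", seq[start:end]))
    return out

if __name__ == "__main__":
    # Hypothetical toy reference standing in for the large FASTA file.
    toy_fasta = [">chrA description", "ACGTACGTAC", "GGGG", ">chrB", "TTTTCCCC"]
    for rec_name, rec_seq in extract_windows(read_fasta(toy_fasta),
                                             [("chrA", 2, 6), ("chrB", 4, 8)]):
        print(f">{rec_name}\n{rec_seq}")
```

Writing each window as its own record, rather than concatenating windows into one sequence per species, avoids creating junction sequence that does not occur in the large reference and keeps every hit traceable to its original coordinates. For a real ~60 GB reference, an indexed extractor such as `samtools faidx` would be more practical than loading everything into a dictionary.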
Large reference (~60 GB):
bwa mem -t 32 -L 50,50 -Y -h 10000
query 0 Candida_glabrata 10526496 0 40M * 0 0 GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE NM:i:1 MD:Z:24G15 AS:i:35 XS:i:35 XA:Z:Candida_glabrata,+10525296,40M,1;Candida_glabrata,+10526196,40M,1;Candida_glabrata,+10525896,40M,1;Candida_glabrata,+10525596,40M,1;Candida_glabrata,+10526796,40M,2;Candida_glabrata,-16703643,40M,2;Candida_glabrata,-12363822,40M,2;Candida_glabrata,+13345998,40M,3;
Small reference (<1 GB):
bwa mem -t 32 -L 50,50 -Y -h 10000
query 0 Candida_glabrata 73738 0 40M * 0 0 GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE NM:i:0 MD:Z:40 AS:i:40 XS:i:40 XA:Z:Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;Candida_glabrata,+53407,40M,0;Candida_glabrata,+74338,40M,0;Candida_glabrata,+73113,40M,0;Candida_glabrata,+80122,40M,0;Candida_glabrata,+55738,40M,0;Candida_glabrata,+79521,40M,0;Candida_glabrata,+55438,40M,0;Candida_glabrata,+79822,40M,0;
Note that an Aspergillus_lentulus hit appears in the middle of the Candida_glabrata hits in the XA tag of the second result.
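To make the species comparison concrete, the sketch below parses the XA:Z values copied from the two results above, using the XA format described in the bwa manual, `(chr,pos,CIGAR,NM;)*`, where the sign of `pos` gives the strand. It assumes, as the positions suggest, that the first result comes from the large reference and the second from the small one. The helper names `parse_xa` and `species_hits` are hypothetical, not part of bwa.

```python
from collections import Counter
from typing import List, Tuple

Hit = Tuple[str, str, int, str, int]  # (reference name, strand, position, CIGAR, NM)

def parse_xa(xa_value: str) -> List[Hit]:
    """Split an XA:Z value 'chr,+pos,CIGAR,NM;...' into hit tuples."""
    hits = []
    for entry in xa_value.split(";"):
        if not entry:
            continue  # the value ends with ';', which leaves an empty last field
        chrom, signed_pos, cigar, nm = entry.split(",")
        hits.append((chrom, signed_pos[0], int(signed_pos[1:]), cigar, int(nm)))
    return hits

def species_hits(primary_rname: str, xa_value: str) -> Counter:
    """Count hits per reference name, including the primary alignment."""
    counts = Counter([primary_rname])
    counts.update(hit[0] for hit in parse_xa(xa_value))
    return counts

# XA:Z values copied from the two results above.
LARGE_XA = (
    "Candida_glabrata,+10525296,40M,1;Candida_glabrata,+10526196,40M,1;"
    "Candida_glabrata,+10525896,40M,1;Candida_glabrata,+10525596,40M,1;"
    "Candida_glabrata,+10526796,40M,2;Candida_glabrata,-16703643,40M,2;"
    "Candida_glabrata,-12363822,40M,2;Candida_glabrata,+13345998,40M,3;"
)
SMALL_XA = (
    "Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;"
    "Candida_glabrata,+53407,40M,0;Candida_glabrata,+74338,40M,0;"
    "Candida_glabrata,+73113,40M,0;Candida_glabrata,+80122,40M,0;"
    "Candida_glabrata,+55738,40M,0;Candida_glabrata,+79521,40M,0;"
    "Candida_glabrata,+55438,40M,0;Candida_glabrata,+79822,40M,0;"
)

if __name__ == "__main__":
    large = species_hits("Candida_glabrata", LARGE_XA)
    small = species_hits("Candida_glabrata", SMALL_XA)
    print("large:", dict(large))
    print("small:", dict(small))
    print("extra species:", sorted(set(small) - set(large)))
```

On these two values the comparison reports Aspergillus_lentulus as the only reference name present in the second result but absent from the first. Running the same comparison over every query would show which ones still gain extra species after changing -h.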
The main problem is that I get more hits on different species with the small reference, and I would prefer the results to remain the same as with the large one. I tried changing -h to much smaller values (100, 50, 10, ...). Some queries then behave as expected, but others still map to extra species.
Could you please give me some advice?
Thanks,
MJ