Mapping on a subset of a bigger reference gives different results #381

Dear all, 

I have constructed a smaller reference database (less than 1 GB) from a larger reference database (~60 GB) by extracting small windows of sequence, so the sequences in the small reference are completely identical to those in the large reference. Mapping the same query against the small reference gives more hits than mapping it against the large reference.
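To make the construction concrete, here is a minimal sketch of the kind of window extraction described above. The function name `extract_windows` and the `window` and `step` values are hypothetical and chosen only for illustration; they are not the settings used to build the actual small reference.

```python
def extract_windows(sequence, window, step):
    """Return (start, subsequence) pairs for fixed-size windows of `sequence`."""
    return [
        (start, sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Toy "large reference" sequence; window and step are illustrative assumptions.
large = "GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA"
windows = extract_windows(large, window=10, step=15)

# Every extracted window is an exact copy of the corresponding region.
for start, subseq in windows:
    assert large[start:start + len(subseq)] == subseq
```

Each window is identical to its source region, but its coordinate in the small reference is no longer the coordinate in the large one.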

Large reference: `bwa mem -t 32 -L 50,50 -Y -h 10000`
```
query       0       Candida_glabrata        10526496        0       40M     *       0       0       GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA        EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE        NM:i:1  MD:Z:24G15      AS:i:35 XS:i:35 XA:Z:Candida_glabrata,+10525296,40M,1;Candida_glabrata,+10526196,40M,1;Candida_glabrata,+10525896,40M,1;Candida_glabrata,+10525596,40M,1;Candida_glabrata,+10526796,40M,2;Candida_glabrata,-16703643,40M,2;Candida_glabrata,-12363822,40M,2;Candida_glabrata,+13345998,40M,3;
```

Small reference: `bwa mem -t 32 -L 50,50 -Y -h 10000`
```
query       0       Candida_glabrata        73738   0       40M     *       0       0       GTTCAATGCCACTCGTGACTACAAATTGCCTGAGGCCCCA        EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE        NM:i:0  MD:Z:40 AS:i:40 XS:i:40 XA:Z:Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;Candida_glabrata,+53407,40M,0;Candida_glabrata,+74338,40M,0;Candida_glabrata,+73113,40M,0;Candida_glabrata,+80122,40M,0;Candida_glabrata,+55738,40M,0;Candida_glabrata,+79521,40M,0;Candida_glabrata,+55438,40M,0;Candida_glabrata,+79822,40M,0;
```
**An Aspergillus_lentulus hit appears among the Candida_glabrata hits.**

The main problem is that I get more hits, on different species, with the small reference; I would prefer the results to stay the same as with the large one. I tried lowering `-h` to much smaller values (100, 50, 10, ...). This works in some cases, but in others extra species still appear.
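One way to spot the extra species systematically is to collect the reference names from the primary hit and the `XA` tag of each record and compare the two runs. The sketch below assumes tab-separated SAM lines and the `XA:Z:` layout shown above (`name,pos,CIGAR,NM;`); the function name `hit_references` is hypothetical, and the two records are shortened, illustrative versions of the ones in this post.

```python
from collections import Counter


def hit_references(sam_line):
    """Return reference names of the primary hit plus all XA alternative hits."""
    fields = sam_line.rstrip("\n").split("\t")
    refs = [fields[2]]  # column 3 (RNAME) of the primary alignment
    for tag in fields[11:]:  # optional tags follow the 11 mandatory columns
        if tag.startswith("XA:Z:"):
            for alt in tag[len("XA:Z:"):].split(";"):
                if alt:
                    # layout is name,pos,CIGAR,NM; split from the right so the
                    # name stays intact even if it contains a comma
                    refs.append(alt.rsplit(",", 3)[0])
    return refs


# Shortened, illustrative records (SEQ and QUAL replaced by placeholders).
large_rec = "\t".join([
    "query", "0", "Candida_glabrata", "10526496", "0", "40M", "*", "0", "0",
    "SEQ", "QUAL", "NM:i:1",
    "XA:Z:Candida_glabrata,+10525296,40M,1;Candida_glabrata,+10526196,40M,1;",
])
small_rec = "\t".join([
    "query", "0", "Candida_glabrata", "73738", "0", "40M", "*", "0", "0",
    "SEQ", "QUAL", "NM:i:0",
    "XA:Z:Candida_glabrata,+73413,40M,0;Aspergillus_lentulus,-6216,40M,0;",
])

small_counts = Counter(hit_references(small_rec))
extra_species = set(hit_references(small_rec)) - set(hit_references(large_rec))
print(small_counts)
print(extra_species)
```

Running this over the full output of both mappings would list, per query, which reference names appear only with the small reference.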

Could you please give me some advice?

Thanks, 
MJ

