Does anyone (e.g., @ilveroluca or @avilella) have any thoughts on the performance of bwa mem when run with a shared memory index (bwa shm)? I've found a 24% increase in wall-clock time (147 s vs. 182 s) when using a pre-loaded index, which to my naive mind suggests either increased cache misses or suboptimal virtual memory paging (possibly related to mmap flags). Ideally, I would love for the pre-loaded index to improve performance, given the lower overall RAM usage, reduced time spent on I/O, and increased flexibility for threading / multiplexing.
Does this number seem "reasonable" to others who have thought more carefully about memory management?
My benchmark code is below. FWIW, I've observed similar behavior on both OS X and Linux.
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/131219_D00360_005_BH814YADXX/Project_RM8398/Sample_U0a/U0a_CGATGT_L001_R1_001.fastq.gz
time bwa mem -t 12 ref.bwa_mem.fa U0a_CGATGT_L001_R1_001.fastq.gz > /dev/null
[M::bwa_idx_load_from_disk] ...
[...]
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 12 ref.bwa_mem.fa U0a_CGATGT_L001_R1_001.fastq.gz
[main] Real time: 146.956 sec; CPU: 1690.897 sec
bwa shm ref.bwa_mem.fa
time bwa mem -t 12 ref.bwa_mem.fa U0a_CGATGT_L001_R1_001.fastq.gz > /dev/null
[M::main_mem] load the bwa index from shared memory
[...]
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 12 ref.bwa_mem.fa U0a_CGATGT_L001_R1_001.fastq.gz
[main] Real time: 182.335 sec; CPU: 2153.612 sec
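
For reference, the slowdown can be recomputed from the two "Real time" / "CPU" lines above. This is only a rough sketch: slowdown_pct is a hypothetical helper name I made up for illustration (it is not part of bwa), and it assumes awk is available.

```shell
# Hypothetical helper (not part of bwa): percent increase of the
# second timing ($2, shared-memory run) over the first ($1, disk-loaded run).
slowdown_pct() {
    awk -v base="$1" -v shm="$2" \
        'BEGIN { printf "%.1f\n", (shm - base) / base * 100 }'
}

# Timings copied from the two runs above.
slowdown_pct 146.956 182.335     # wall-clock seconds
slowdown_pct 1690.897 2153.612   # CPU seconds
```

With the numbers above this gives roughly 24% for wall-clock time and 27% for CPU time, so the extra cost shows up as CPU time as well, not only as time spent waiting.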