Enable parallelization of BCMath::FastPValue#328
Conversation
Possible reasons for little gain from from multi-threading are frequent memory allocation (global heap lock) or data processing that is memory-I/O-bound instead of CPU-bound. |
|
I see the need for parallelization but I fear you didn't get it right. You can't just split the serial Markov chain and run iterations in parallel. What you can do is to run multiple independent chains in parallel and merge their results. For that, you would want to have different initial states in theory. Perhaps in practice you may get the same results by starting all chains from the mode, as is done currently. In any case, for such a change I would require a unit test to ensure things agree in serial and parallel execution |
Sometimes a relatively high number of iterations is needed for
BCMath::FastPValueto return a stable result. In the fit i'm working on right now, for example, I need at least 10^7 iterations to compute a p-value which is stable at the percent level, and it take nearly 3 minutes on my Intel i7-7700. So I added some OpenMP instructions around the main loop to try to speed things up.Unfortunately I don't get the performance I was expecting at the beginning. On my machine I get a speed-up factor of 1.6 with 4 threads and 2.2 with 8 (hyper-) threads, which is better than nothing but still meh. I don't see where all that overhead could be, but I'm also not highly experienced with parallel coding. Maybe someone of you can spot the culprit more easily than me.
I don't know if it's anyways worth to add this feature, let me know what you think!