perf: parallelize copying data sections #1277
|
Is this what you meant by the initial low-hanging fruit, @davidlattimore? Marking this as a draft because some additional benchmarking is needed. Should the threshold be configurable in some way? |
|
On my system wild seems to be faster on the include-blob benchmark. However, after this change writing is a lot faster. before: after: That's an improvement of over 50% (twice as fast for writing output) in this scenario on my system! |
|
We have picked up a lot of speed along the way in some conditions. If you hardcode https://github.com/davidlattimore/wild/blob/11c291f5e47d073a86fd71a390c4153764bccf06/libwild/src/file_writer.rs#L228 to return false, you might be able to reproduce the original lack of performance. Alternatively, you could modify the benchmark to output a cdylib instead of an executable. |
|
Nice! Yep, I see a good improvement on the include-blob benchmark too: So I'm happy to merge this once it's marked as not a draft. Edit: I reran and amended the above benchmark results. The original baseline was wrong, so I was diffing two performance-sensitive changes, not just this PR. |
I don't think making the threshold configurable is necessary. We could experiment with different thresholds and see what difference they make, although we'd need to use a benchmark other than include-blob for that. But I think the current threshold is likely a reasonable starting point, so it's fine to just go with that for now. |
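For readers following the threshold discussion, here is a minimal sketch of what a size-thresholded parallel copy can look like. This is not the code from this PR: the function name `copy_section`, the constant `PARALLEL_COPY_THRESHOLD`, and its 1 MiB value are all hypothetical, and it uses `std::thread::scope` as a stand-in for whatever parallelism primitive wild actually uses.

```rust
use std::thread;

/// Hypothetical cutoff (assumed value, not taken from wild): sections smaller
/// than this are copied on the calling thread.
const PARALLEL_COPY_THRESHOLD: usize = 1 << 20; // 1 MiB

/// Copies `src` into `dst`. Copies below `threshold` bytes run serially;
/// larger ones are split into one chunk per available core and copied
/// concurrently on scoped threads.
fn copy_section(dst: &mut [u8], src: &[u8], threshold: usize) {
    assert_eq!(dst.len(), src.len(), "source and destination must match");

    // Fall back to 1 worker if the parallelism level cannot be queried.
    let workers = thread::available_parallelism().map_or(1, |n| n.get());

    if src.len() < threshold || workers < 2 {
        dst.copy_from_slice(src);
        return;
    }

    // Chunk size rounded up so that at most `workers` chunks are produced.
    let chunk = src.len().div_ceil(workers).max(1);

    // Scoped threads may borrow `dst` and `src`; the scope joins them all
    // before returning.
    thread::scope(|s| {
        for (d, sr) in dst.chunks_mut(chunk).zip(src.chunks(chunk)) {
            s.spawn(move || d.copy_from_slice(sr));
        }
    });
}
```

Passing the threshold as a parameter keeps the sketch easy to benchmark with different values, which is the kind of experiment mentioned above; a fixed constant is enough if the default turns out to be reasonable.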
Okay great — I've marked this as open. |
Part of #1194.