perf: cell reader buffer reuse, optimise maps#611
Conversation
68cd545 to
5ffad42
Compare
jmcnamara
left a comment
There was a problem hiding this comment.
+1 on adding the context.
|
Overall looks good. My only concern is with adding Also, I think the first commit could have been 3 functionally separate commits (even if they are small). You don't need to go back and change that though. Just a note for the future. |
Honestly? Couldn't even measure it; just kept it as the dependency was already implicit 😄 (Update: done)
Noted (and agreed ;) |
5efabc2 to
bde3b37
Compare
bde3b37 to
0cf5dbc
Compare
|
LGTM! Every little bit of performance is always appreciated! |
|
Merged. Thanks. |
Three optimisations - two (very) small/minor, one medium, no public API changes:
Optimisations
Replaced
s.chunks(4)withs.chunks_exact(4)for the most micro of micro-optimisations.As per @tafia's review suggestion perf: cache path and attr lookups, reuse buffers, minimise allocs #606 (review), replaced use of the standard
HashMapwithhashbrown::HashMap(which usesahash). This was already a transitive dependency viazip, so doesn't actually expand the current set of deps (though is now explicit inCargo.toml). Also replacedBTreeMapwithHashMapin two places where the map does not actually need to maintain ordering ("read_relationships" and "number_formats") to getO(n)instead ofO(log n); marginal impact though as these maps are rarely (if ever?) going to be large.Update: reverted explicit
hashbrownintegration (benefit was too marginal to reliably measure - maps are likely too small in relation to everything else), but kept the switch fromBTreeMaptoHashMapin places where order isn't needed..XlsxCellReader, but cleaned it up a bit by consolidating the buffers in a newValueBufsstruct. No more use ofread_string, so dropped it (only the variant that takes shared buffers is used now).Miscellaneous
clippycomplained about one of the functions having one too many args, so I also consolidated workbook-related args into aWorkbookContextstruct; seems tidy?Results
~6-7% speedup1 when testing on a large (10 million cell) mixed-dtype Workbook.
Footnotes
Benchmarked on: Apple Silicon M3 Max ↩