Tags: zeebo/xxh3
stop escaping input argument
the trick of dispatching to one of two functions by picking one and putting it into a variable, so that the surrounding body can still be inlined, has a problem: the compiler, seeing the virtual dispatch, is not smart enough to notice that neither option escapes the argument, and so it escapes the argument anyway. remove that trick and refactor a bit to regain some lost performance. some smaller sizes are worse by <1ns, but some larger sizes are better by up to 15%, and nothing escapes anymore. fixes #19
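a minimal sketch of the problem, with illustrative names that are not the library's own: passing the argument through a function-value call makes escape analysis treat the callee as unknown, so (on current compilers) `go build -gcflags=-m` reports the parameter as leaking, while explicit branches keep it on the stack.

    package main

    // useScalar stands in for a cpu feature flag.
    var useScalar = false

    func sumScalar(b []byte) (acc uint64) {
        for _, c := range b {
            acc += uint64(c)
        }
        return acc
    }

    func sumVector(b []byte) (acc uint64) {
        for _, c := range b {
            acc += 2 * uint64(c)
        }
        return acc
    }

    // indirect dispatch: the call target is a variable, so escape analysis
    // assumes the worst and treats b as escaping.
    func sumIndirect(b []byte) uint64 {
        f := sumVector
        if useScalar {
            f = sumScalar
        }
        return f(b)
    }

    // explicit branches: both call sites are visible, b stays on the stack.
    func sumDirect(b []byte) uint64 {
        if useScalar {
            return sumScalar(b)
        }
        return sumVector(b)
    }

    func main() {
        b := make([]byte, 64)
        _ = sumIndirect(b)
        _ = sumDirect(b)
    }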
fix incorrect Hasher output for sizes larger than a block+stripe (so 1088 bytes)
the Hasher would output the wrong value for those sizes. this was due to a bug caused by reversed arguments to a copy call. the test did not catch it because it too had a bug: it was generating very aligned data bytes. the reason this copy exists is that the final accumulation step always reads one stripe backwards from the end, so if we have just accumulated a full block and then need to finalize, it has to read the last stripe of bytes again. fixes #14
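for reference, copy's first argument is the destination; swapping the arguments still compiles and still returns a count, it just moves the bytes the wrong way. a contrived sketch of the bug class, not the library's code:

    package main

    import "fmt"

    func main() {
        stripe := []byte("last stripe of the block")
        scratch := make([]byte, len(stripe))

        copy(scratch, stripe) // correct: destination first, scratch gets the stripe
        fmt.Printf("correct:  %q\n", scratch)

        stripe2 := []byte("last stripe of the block")
        scratch2 := make([]byte, len(stripe2))
        copy(stripe2, scratch2) // reversed: clobbers stripe2 with zeros, scratch2 stays empty
        fmt.Printf("reversed: %q\n", scratch2)
    }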
Use correctly sized constant
This worked before only because of alignment. Instead, use explicit uint64s.
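A hypothetical sketch of how a wrong-sized constant can appear to work only because of alignment (values and names are illustrative, not the library's): reading two adjacent 32-bit words as a single 64-bit value happens to give the right answer when the array is 8-byte aligned on a little-endian machine, while an explicit uint64 constant carries no such dependency.

    package main

    import (
        "fmt"
        "unsafe"
    )

    // two 32-bit halves of a key word (illustrative values)
    var key32 = [2]uint32{0x396cfeb8, 0xbe4ba423}

    // explicit 64-bit constant: correct regardless of alignment or byte order
    const key64 = 0xbe4ba423396cfeb8

    func main() {
        // fragile: relies on key32 being 8-byte aligned and on little-endian layout
        punned := *(*uint64)(unsafe.Pointer(&key32[0]))
        fmt.Printf("punned: %#x\nconst:  %#x\n", punned, uint64(key64))
    }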
add Hasher for hash.Hash and clean up
This change adds a Hasher type that implements hash.Hash. It has to keep a trailing stripe because the finalization step ends up reading up to a stripe back from the end of the buffer after a full block has been consumed. To implement it, a new accumBlock function was introduced that does the first part of the hashLarge function, the part that operates on full blocks. It may be possible to reduce the internal state the hash keeps to 2 stripes instead of 17 at the cost of more accumBlock calls; that is left for further work and investigation.

Additionally, there was a lot of reorganization around how the architecture-specific code is laid out. This should help me stop pushing changes that break other architectures because I forgot build tags or whatever. There is now a `make vet` check that runs `go vet` under different GOARCH values, and so far it has caught every silly mistake I've made along those lines. It also reduces some duplication by making the cpu feature flags constants on unsupported architectures, so that the compiler can just dead-code eliminate the corresponding branches.

Fixes #7.
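The feature-flag arrangement looks roughly like the sketch below (file names and identifiers are illustrative, and the actual detection package may differ): on amd64 the flag is a variable set from cpu detection, on every other GOARCH it is a constant false, so the vectorized path compiles everywhere but is eliminated as dead code where it cannot run.

    // accum_amd64.go (illustrative)
    //go:build amd64

    package xxh3

    import "golang.org/x/sys/cpu"

    var hasAVX2 = cpu.X86.HasAVX2 // runtime detection on amd64

    // accum_other.go (illustrative)
    //go:build !amd64

    package xxh3

    const hasAVX2 = false // constant: the AVX2 branch below disappears entirely

    // generic code can then branch unconditionally:
    //
    //     if hasAVX2 {
    //         accumBlockAVX2(&acc, data) // hypothetical vectorized variant
    //     } else {
    //         accumBlockScalar(&acc, data) // hypothetical portable variant
    //     }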
return struct for hash128
if your code broke because of this change, i would first like
to apologize. let me try to explain why i decided to do this
even though it would be breaking.
first, it's a v0 library. this is just a necessary condition
though not sufficient. if it were v1, i would have to do
something else. second, it turns out that the go compiler does
not optimize with arrays well. here are some benchmarks
(old throughput vs. new, with the relative change):
Fixed128/1 102MB/s ± 0% 241MB/s ± 1% +136.24%
Fixed128/2 194MB/s ± 1% 457MB/s ± 2% +135.17%
Fixed128/3 306MB/s ± 1% 717MB/s ± 0% +134.58%
Fixed128/4 398MB/s ± 0% 924MB/s ± 3% +132.31%
Fixed128/8 793MB/s ± 1% 1854MB/s ± 1% +133.88%
Fixed128/9 855MB/s ± 1% 1597MB/s ± 1% +86.83%
Fixed128/16 1.52GB/s ± 1% 2.84GB/s ± 1% +86.99%
Fixed128/17 1.37GB/s ± 0% 2.51GB/s ± 1% +83.70%
Fixed128/32 2.56GB/s ± 1% 4.58GB/s ± 9% +78.52%
Fixed128/33 2.15GB/s ± 1% 3.53GB/s ± 1% +64.34%
Fixed128/64 4.17GB/s ± 1% 6.86GB/s ± 1% +64.78%
Fixed128/65 3.61GB/s ± 1% 5.36GB/s ± 4% +48.60%
Fixed128/96 5.34GB/s ± 1% 8.01GB/s ± 0% +50.01%
Fixed128/97 4.69GB/s ± 1% 6.85GB/s ± 1% +46.10%
Fixed128/128 6.20GB/s ± 0% 9.04GB/s ± 1% +45.86%
Fixed128/129 5.42GB/s ± 1% 6.35GB/s ± 1% +17.19%
Fixed128/240 6.94GB/s ± 1% 7.85GB/s ± 1% +13.11%
some of those speedups are absolutely massive. i went to
great lengths to get even 10% speedups before. third, one of
the main criteria of a good hash function is speed. assuming
two hash functions do an equally good job at their core
competency of hashing, speed is the most important feature.
otherwise, we'd just use some cryptographic hash function and
call it a day. so given that, i'm assuming that users of this
library would happily make small fixes to get that performance
improvement. indeed, a local fix could be applied by changing
res := xxh3.Hash128(someBytes)
to
resTmp := xxh3.Hash128(someBytes)
res := [2]uint64{resTmp.Hi, resTmp.Lo}
and if i could ship this change with a gofix to do that, i would.
anyway, sorry again. i hope you agree that it was worth it. all
approximately 5 of you at the time of this commit according to
deps.dev.
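as an addendum to the array point above: the speedup comes from the fact that the compiler can decompose struct fields into separate SSA values kept in registers, while a local of array type with more than one element is not SSA-able and lives in memory, so every update of a [2]uint64 accumulator goes through loads and stores. a minimal illustration of the shape of the difference, not the library's code:

    package main

    type u128 struct{ hi, lo uint64 }

    // acc is a 2-element array: not SSA-able, so updates go through memory.
    func mixArray(b []byte) [2]uint64 {
        var acc [2]uint64
        for _, c := range b {
            acc[0] = acc[0]*31 + uint64(c)
            acc[1] ^= acc[0]
        }
        return acc
    }

    // acc is a two-field struct: each field can live in a register.
    func mixStruct(b []byte) u128 {
        var acc u128
        for _, c := range b {
            acc.hi = acc.hi*31 + uint64(c)
            acc.lo ^= acc.hi
        }
        return acc
    }

    func main() {
        b := make([]byte, 240)
        _ = mixArray(b)
        _ = mixStruct(b)
    }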