Conversation
bloom.go
Outdated
| h1, h2 := hash.Sum128() | ||
| hash = hash.Write(entropy) // Add entropy | ||
| h3, h4 := hash.Sum128() | ||
| h1, h2 := murmur3.Sum128(data) |
There was a problem hiding this comment.
Isn't this different semantically from the previous code? That is, previous code would be the equivalent of:
h1, h2 := murmur3.Sum128(data)
// hash hasn't reset I assume, so the new call is actually on data + entropy)
h3, h4 := murmur3.Sum128(data + entropy)
``
There was a problem hiding this comment.
yes, I completely misread the code.
… hash function with padding for entropy
murmur.go
Outdated
| h1, h2 = _h1, _h2 // restore state | ||
| clen++ // add entropy byte to total hash data length | ||
| switch len(data) + 1 { // process entropy byte *only*, no fallthrough - unless it's now a full block | ||
| case 16: // == entire block |
There was a problem hiding this comment.
As mentioned in chat--I think this is a reasonable way to write this, but I still found it confusing; maybe worth adding more commentary?
Something to the effect of:
Pretend that the entropy byte is the last byte of the array, and apply the Sum128() logic accordingly.
Logically, this is:
if len(data) + 1 == 16 {
// do one more bmix pass, and then skip the switch statement below (no cases would apply)
}
switch len(data) + 1 {
case 15:
// entropy byte is the last byte; treat it specially
k2 ^= entropy << 48
fallthrough
// now, continue processing the rest of the slice
// }
It's hard to write it this way because of the fallthrough's--we would need to (wastefully) check at every case whether this is actually the last byte (in which case, use `entropy`) or not (in which case, use the actual byte at that position
There was a problem hiding this comment.
... or something like that; dunno if that kind of verbiage is more clear or just more verbose :D
andrewmains12
left a comment
There was a problem hiding this comment.
Implementation lgtm, nice fix! Like I mentioned--I had a bit of a hard time mapping it to the twmb implementation directly, but after sitting with it a bit it makes sense and seems correct. Dunno if there's much we can do to make that more clear so I'm good with this as it stands.
murmur.go
Outdated
| switch rlen + 1 { | ||
| case 15: | ||
| k2 ^= entropy << 48 | ||
| goto l14 |
There was a problem hiding this comment.
Nit: maybe a comment to the effect of:
// the entropy byte is the last byte (index 14)--process it, then proceed with the rest of the array (length 14 case below).
murmur.go
Outdated
| k1, k2 = 0, 0 | ||
| h1, h2 = _h1, _h2 // restore state to before final mixing | ||
| clen++ // total hash data length now includes entropy | ||
| // Fist, process entropy byte *only*. No fallthrough. |
Use hand-rolled hash function, identical to the existing algo using
m3db/stackmurmur3Code is all based on
twmb/murmur3and is actually faster, without any use ofunsafeor assembly foramd64