Convert all parameters integers to 64 bit#8
Merged
Conversation
satta
approved these changes
Oct 24, 2018
Contributor
Author
|
Thanks for approving this @satta, please note that it will break backwards compatibility with the old filters. You should also approve the corresponding change in the Python library to make the two compatible. |
Member
|
Noted. We have already discussed this issue and I will take a look. Thanks for keeping an eye on it 👍 |
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I converted all parameter integers to 64 bit values to allow for larger filters. This is a change that breaks backwards-compatibility with older filters. I added a 64-bit value at the beginning of the filter to store the file version (currently 1) and additional flags, which we can use in the future to ensure backwards-compatibility.
I also noticed that the false positive probability was higher than anticipated for large filter values, which I found to be related to the hash generation. Before, the hashes were generated following the method outlined in Adam Kirsch et. al. (https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf), however their method of generating k hash values from two initial ones seems to be suboptimal and does not produce a fully uniform distribution. I replaced it with a scheme that borrows from PRNG schemes by generating an initial value
hnusing aFNVhash and then in each round setshn = hn*g % m, wheremis a large prime number andgis a primitive root ofm. As long as the number of bits in the filter is smaller thanm(which for all practical purposes it is) we can use this method to generatekhash values using an initial seed value generated viaFNV.I did this when noticing problems while converting the
Have I Been Pwneddatabase into a Bloom filter.