Skip to content

Convert all parameters integers to 64 bit#8

Merged
rhaist merged 5 commits into
DCSO:masterfrom
adewes:master
Oct 24, 2018
Merged

Convert all parameters integers to 64 bit#8
rhaist merged 5 commits into
DCSO:masterfrom
adewes:master

Conversation

@adewes

@adewes adewes commented Jun 17, 2018

Copy link
Copy Markdown
Contributor

I converted all parameter integers to 64 bit values to allow for larger filters. This is a change that breaks backwards-compatibility with older filters. I added a 64-bit value at the beginning of the filter to store the file version (currently 1) and additional flags, which we can use in the future to ensure backwards-compatibility.

I also noticed that the false positive probability was higher than anticipated for large filter values, which I found to be related to the hash generation. Before, the hashes were generated following the method outlined in Adam Kirsch et. al. (https://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf), however their method of generating k hash values from two initial ones seems to be suboptimal and does not produce a fully uniform distribution. I replaced it with a scheme that borrows from PRNG schemes by generating an initial value hn using a FNV hash and then in each round sets hn = hn*g % m, where m is a large prime number and g is a primitive root of m. As long as the number of bits in the filter is smaller than m (which for all practical purposes it is) we can use this method to generate k hash values using an initial seed value generated via FNV.

I did this when noticing problems while converting the Have I Been Pwned database into a Bloom filter.

@rhaist rhaist requested a review from satta October 24, 2018 13:04
@adewes

adewes commented Oct 24, 2018

Copy link
Copy Markdown
Contributor Author

Thanks for approving this @satta, please note that it will break backwards compatibility with the old filters. You should also approve the corresponding change in the Python library to make the two compatible.

@satta

satta commented Oct 24, 2018

Copy link
Copy Markdown
Member

Noted. We have already discussed this issue and I will take a look. Thanks for keeping an eye on it 👍

@rhaist rhaist merged commit 7f2d6fa into DCSO:master Oct 24, 2018
@satta satta mentioned this pull request Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants