This library is a fork of Bits and Blooms that uses an alternative backing bitset based on Go's sync/atomic.Int64 rather than a bare slice of integers.
This allows for concurrent addition and testing of filters without creating memory safety issues or race conditions by leveraging hardware support for atomic Load and Or operations on Int64s.
A Bloom filter is a concise/compressed representation of a set, where the main requirement is to make membership queries; i.e., whether an item is a member of a set. A Bloom filter will always correctly report the presence of an element in the set when the element is indeed present. A Bloom filter can use much less storage than the original set, but it allows for some 'false positives': it may sometimes report that an element is in the set whereas it is not.
When you construct, you need to know how many elements you have (the desired capacity), and what is the desired false positive rate you are willing to tolerate. A common false-positive rate is 1%. The lower the false-positive rate, the more memory you are going to require. Similarly, the higher the capacity, the more memory you will use. You may construct the Bloom filter capable of receiving 1 million elements with a false-positive rate of 1% in the following manner.
filter := bloom.NewWithEstimates(1000000, 0.01) You should call NewWithEstimates conservatively: if you specify a number of elements that it is
too small, the false-positive bound might be exceeded. A Bloom filter is not a dynamic data structure:
you must know ahead of time what your desired capacity is.
Our implementation accepts keys for setting and testing as []byte. Thus, to
add a string item, "Love":
filter.Add([]byte("Love"))Similarly, to test if "Love" is in bloom:
if filter.Test([]byte("Love"))For numerical data, we recommend that you look into the encoding/binary library. But, for example, to add a uint32 to the filter:
i := uint32(100)
n1 := make([]byte, 4)
binary.BigEndian.PutUint32(n1, i)
filter.Add(n1)Godoc documentation: https://pkg.go.dev/github.com/ericvolp12/atomic-bloom
go get -u github.com/ericvolp12/atomic-bloomBefore committing the code, please check if it passes all tests using (note: this will install some dependencies):
make deps
make qaA Bloom filter has two parameters: m, the number of bits used in storage, and k, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by an Atomic BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo m). Set membership is done by testing whether the bits at each value of the hashing functions (again, modulo m) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose k and m correctly.
In this implementation, the hashing functions used is murmurhash, a non-cryptographic hashing function.
Given the particular hashing scheme, it's best to be empirical about this. Note that estimating the FP rate will clear the Bloom filter.
This implementation of Bloom Filters is safe to call from concurrent goroutines so long as your CPU supports Atomic Int64 (Most mainstream x86_64 and ARM systems do).
You can Test from and Add to the same filter to your heart's content.