Example
• Consider the Hidden Markov model where the
Professor picks a shirt colour based on his mood.
Given that he picked green, blue, and red shirt for
three days, predict his mood on those 3 days
respectively.
• Hidden States are: •
• Happy and Sad
• Observed States are:
• Red Shirt, Green Shirt
and Blue Shirt
• For the Given Markov
Chain:
• P(H|H) = 0.7
• P(S|H) = 0.3
• P(H|S) = 0.5
• P(S|S) = 0.5
Day 1 Day 2 Day 3
Happy 0.58 0.62 0.62
Sad 0.42 0.38 0.38
Based on state transition probabilities the mood for respective 3 days are as above.
Day 1 Day 2 Day 3
Shirt Colour Green Blue Red
Mood Sad Sad Happy
Flajolet Martin (FM) Algorithm LogLog Counting
• Given set {1, 2, 3, 4, 2, 1, 4, 4, 2, 1, 3]
• Unique Elements are 🡺 [1, 2, 3, 4]
• The FM algorithm approximates the number of
unique objects in a stream or database in one pass.
• If stream contains n elements with m of them are
unique then FM algorithm runs in 0(n) and needs
0(log m) memory.
• It uses very less memory to approximate unique
number of objects in a stream.
• FM algorithm gives an approximation of the count
and not the exact count. It may have approx. error.
Working of FM Algorithm
•
Example
•
X Hash Value h(x) Binary of Hash Value Number of Trailing zeros
1 (3 * 1 + 1) mod 5 = 4 100 2
3 (3 * 3 + 1) mod 5 = 0 000 0
5 (3 * 5 + 1) mod 5 = 1 001 0
7 (3 * 7 + 1) mod 5 = 2 010 1
5 (3 * 5 + 1) mod 5 = 1 001 0
2 (3 * 2 + 1) mod 5 = 2 010 1
7 (3 * 7 + 1) mod 5 = 2 010 1
•
What is Bloom Filter?
• A Bloom Filter is a data structure designed to tell you, rapidly
and memory-efficiently, whether an element is present in a set.
It is based on a probabilistic mechanism where false positive
retrieval results are possible, but false negatives are not.
Bloom filter
Element Hash
Function
set
(+) (-)
Might be Present Surely not Present
What is hashing?
• Hash functions can be thought of as code given to
each data item. The data item can be anything. It
can be a string saying Hello World and there will
be a corresponding code. This code is used as an
index to share the information in a secure way and
to retrieve information from memory core. This
code tells the computer what to look for and
where to.
• Bloom filters have a number of hash functions,
which are used to set bits in the bit array. The bit
positions are determined by the hash functions.
Working of Bloom Filter
Example
A set consists of some elements {8,10,…},
Check whether 4 and 7 present in the set or not.
Set size=10(Array size)
Hash function set:-
• (3x+3)mod6
• (3x+7)mod8
• (2x+9)mod2
• (2x+3)mod5
Step 1:-Initialize Array
0 0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 8 9
Step 2:-Apply hash function and
calculate value
8 10 Array
(3x+3)mod6
3 3 {0,0,0,1,0,0,0,0,0,0}
(3x+7)mod8 7 5 {0,0,0,1,0,1,0,1,0,0}
8,10
(2x+9)mod2 1 1 {0,1,0,1,0,1,0,1,0,0}
4 3 {0,1,0,1,1,1,0,1,0,0}
(2x+3)mod5
Step 3:- To check Whether element is
present in the set. Apply hash
function and obtain all values.
(i) If all values are 1 then element is
Probably in the set.
(ii) If any or all value are 0 then
element is surely not in the set
Bit Array
4 7
(3x+3)mod6
3 ✓ 0
(3x+7)mod8 3 ✓ 4 ✓
4,7
(2x+9)mod2 1 ✓ 1 ✓
(2x+3)mod5 1 ✓ 2
Operations that a Bloom Filter
supports
• insert(x) : To insert an element in the Bloom
Filter.
• lookup(x) : to check whether an element is
already present in Bloom Filter with a positive
false probability
NOTE : We cannot delete an element in Bloom
Filter.
Some of the properties of bloom filters are
• It allows for membership lookups in constant space & time. Bloom
filter can very quickly answer YES/NO questions, like “is this item in the
set?”.
• Very infrequently it will give a false positive answer, implies it will say
YES if the answer is NO (Probably in the set).
• It will never give false negative answer implies it will never say NO if
the answer is YES (Definitely not in the set).
• Basic bloom filter supports two operations test and add.
• It has constant time complexity for both adding items and asking
whether a key is present or not.
• You can’t remove an item from a bloom filter.
• It also requires very less space compared to the number of items you
need to store and check.
• Now we want to remove SYSTEM from the bloom
filter. To do that we need to unset indexes 4, 5 and 9.
If we do so, INTERVIEW will also be removed from
the bloom filter because 5 and 9 were also set by
INTERVIEW.
• It is almost impossible to remove an element from a
bloom filter without removing other elements.
Applications of Bloom filters
• Medium uses bloom filters for recommending post to users by
filtering post which have been seen by user.
• Quora implemented a shared bloom filter in the feed backend to
filter out stories that people have seen before.
• The Google Chrome web browser used to use a Bloom filter to
identify malicious URLs
• Google BigTable, Apache HBase and Apache Cassandra, and
PostgreSQL use Bloom filters to reduce the disk lookups for
non-existent rows or columns.
• Helps to avoid costly lookups considering increases the
performance of a database query operations.
Some of the existing implementations
of bloom filters are
• Bloomd is a high-performance C server which is used
to expose bloom filters and operations over them to
networked clients. It nicely fits into the micro-services
architecture.
• PyreBloom is python based implementation of bloom
filter based on Redis
• The Cuckoo filter is another variation of the bloom
filter. Cuckoo filters support adding and removing
items dynamically while achieving even higher
performance than Bloom filters. There is a nice blog by
Farhan on the difference between Cuckoo and Bloom
filters.