0% found this document useful (0 votes)
34 views21 pages

DSBDA UT 2 Part 2

The document discusses using a Hidden Markov model to predict a professor's mood on three days based on the color of shirt he wore each day (green, blue, red). It provides the transition probabilities for happy and sad mood states. Using these probabilities, it estimates the professor's mood each day as mostly sad on the first two days (wearing green and blue shirts) and happy on the third day (wearing a red shirt).

Uploaded by

sunandanpt21020
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views21 pages

DSBDA UT 2 Part 2

The document discusses using a Hidden Markov model to predict a professor's mood on three days based on the color of shirt he wore each day (green, blue, red). It provides the transition probabilities for happy and sad mood states. Using these probabilities, it estimates the professor's mood each day as mostly sad on the first two days (wearing green and blue shirts) and happy on the third day (wearing a red shirt).

Uploaded by

sunandanpt21020
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 21

Example

• Consider the Hidden Markov model where the


Professor picks a shirt colour based on his mood.
Given that he picked green, blue, and red shirt for
three days, predict his mood on those 3 days
respectively.
• Hidden States are: •
• Happy and Sad
• Observed States are:
• Red Shirt, Green Shirt
and Blue Shirt
• For the Given Markov
Chain:
• P(H|H) = 0.7
• P(S|H) = 0.3
• P(H|S) = 0.5
• P(S|S) = 0.5
Day 1 Day 2 Day 3

Happy 0.58 0.62 0.62

Sad 0.42 0.38 0.38

Based on state transition probabilities the mood for respective 3 days are as above.
Day 1 Day 2 Day 3

Shirt Colour Green Blue Red

Mood Sad Sad Happy


Flajolet Martin (FM) Algorithm LogLog Counting
• Given set {1, 2, 3, 4, 2, 1, 4, 4, 2, 1, 3]
• Unique Elements are 🡺 [1, 2, 3, 4]
• The FM algorithm approximates the number of
unique objects in a stream or database in one pass.
• If stream contains n elements with m of them are
unique then FM algorithm runs in 0(n) and needs
0(log m) memory.
• It uses very less memory to approximate unique
number of objects in a stream.
• FM algorithm gives an approximation of the count
and not the exact count. It may have approx. error.
Working of FM Algorithm

Example

X Hash Value h(x) Binary of Hash Value Number of Trailing zeros


1 (3 * 1 + 1) mod 5 = 4 100 2
3 (3 * 3 + 1) mod 5 = 0 000 0
5 (3 * 5 + 1) mod 5 = 1 001 0
7 (3 * 7 + 1) mod 5 = 2 010 1
5 (3 * 5 + 1) mod 5 = 1 001 0
2 (3 * 2 + 1) mod 5 = 2 010 1
7 (3 * 7 + 1) mod 5 = 2 010 1

What is Bloom Filter?
• A Bloom Filter is a data structure designed to tell you, rapidly
and memory-efficiently, whether an element is present in a set.
It is based on a probabilistic mechanism where false positive
retrieval results are possible, but false negatives are not.
Bloom filter

Element Hash
Function
set

(+) (-)
Might be Present Surely not Present
What is hashing?
• Hash functions can be thought of as code given to
each data item. The data item can be anything. It
can be a string saying Hello World and there will
be a corresponding code. This code is used as an
index to share the information in a secure way and
to retrieve information from memory core. This
code tells the computer what to look for and
where to.
• Bloom filters have a number of hash functions,
which are used to set bits in the bit array. The bit
positions are determined by the hash functions.
Working of Bloom Filter
Example
A set consists of some elements {8,10,…},
Check whether 4 and 7 present in the set or not.

Set size=10(Array size)

Hash function set:-


• (3x+3)mod6
• (3x+7)mod8
• (2x+9)mod2
• (2x+3)mod5
Step 1:-Initialize Array

0 0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 8 9
Step 2:-Apply hash function and
calculate value

8 10 Array

(3x+3)mod6
3 3 {0,0,0,1,0,0,0,0,0,0}

(3x+7)mod8 7 5 {0,0,0,1,0,1,0,1,0,0}
8,10
(2x+9)mod2 1 1 {0,1,0,1,0,1,0,1,0,0}

4 3 {0,1,0,1,1,1,0,1,0,0}
(2x+3)mod5
Step 3:- To check Whether element is
present in the set. Apply hash
function and obtain all values.

(i) If all values are 1 then element is


Probably in the set.
(ii) If any or all value are 0 then
element is surely not in the set
Bit Array

4 7
(3x+3)mod6
3 ✓ 0

(3x+7)mod8 3 ✓ 4 ✓
4,7
(2x+9)mod2 1 ✓ 1 ✓

(2x+3)mod5 1 ✓ 2
Operations that a Bloom Filter
supports
• insert(x) : To insert an element in the Bloom
Filter.

• lookup(x) : to check whether an element is


already present in Bloom Filter with a positive
false probability

NOTE : We cannot delete an element in Bloom


Filter.
Some of the properties of bloom filters are
• It allows for membership lookups in constant space & time. Bloom
filter can very quickly answer YES/NO questions, like “is this item in the
set?”.
• Very infrequently it will give a false positive answer, implies it will say
YES if the answer is NO (Probably in the set).
• It will never give false negative answer implies it will never say NO if
the answer is YES (Definitely not in the set).
• Basic bloom filter supports two operations test and add.
• It has constant time complexity for both adding items and asking
whether a key is present or not.
• You can’t remove an item from a bloom filter.
• It also requires very less space compared to the number of items you
need to store and check.
• Now we want to remove SYSTEM from the bloom
filter. To do that we need to unset indexes 4, 5 and 9.
If we do so, INTERVIEW will also be removed from
the bloom filter because 5 and 9 were also set by
INTERVIEW.
• It is almost impossible to remove an element from a
bloom filter without removing other elements.
Applications of Bloom filters
• Medium uses bloom filters for recommending post to users by
filtering post which have been seen by user.
• Quora implemented a shared bloom filter in the feed backend to
filter out stories that people have seen before.
• The Google Chrome web browser used to use a Bloom filter to
identify malicious URLs
• Google BigTable, Apache HBase and Apache Cassandra, and
PostgreSQL use Bloom filters to reduce the disk lookups for
non-existent rows or columns.
• Helps to avoid costly lookups considering increases the
performance of a database query operations.
Some of the existing implementations
of bloom filters are
• Bloomd is a high-performance C server which is used
to expose bloom filters and operations over them to
networked clients. It nicely fits into the micro-services
architecture.
• PyreBloom is python based implementation of bloom
filter based on Redis
• The Cuckoo filter is another variation of the bloom
filter. Cuckoo filters support adding and removing
items dynamically while achieving even higher
performance than Bloom filters. There is a nice blog by
Farhan on the difference between Cuckoo and Bloom
filters.

You might also like