0% found this document useful (0 votes)
960 views16 pages

Decaying Window

Uploaded by

kharshitha93
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
960 views16 pages

Decaying Window

Uploaded by

kharshitha93
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 16

Sliding Windows

Sliding window is a useful model of stream processing in which the


queries are about a window of length N – the N most recent
elements received.
In certain cases N is so large that the data cannot be stored in
memory, or even on disk. Sliding window is also known as

Consider a sliding window of length N=6 on a single


stream as shown in figure 1. As the stream content
varies over time the sliding window highlights new
stream elements.

Figure 1. Sliding window on stream


Example1: Consider Amazon online transactions. For
every product X we keep 0/1 stream of whether that
product was sold in the n-th transaction. A query like,
“how many times have we sold X in the last k sales?”
and an answer for it can be derived using sliding
window concept.

Now let us suppose we have a window of


length N (say N=24) on a binary system, We
want at all times to be able to answer a query
of the form “ How many 1’s are there in the
last K bits?” for K<=N.

Here comes the DGIM Algorithm into


picture:

COUNTING THE NUMBER OF 1’s IN THE


DATA STREAM

DGIM algorithm (Datar-Gionis-Indyk-


Motwani Algorithm)

Designed to find the number 1’s in a data


set. This algorithm uses O(log²N) bits to
represent a window of N bit, allows to
estimate the number of 1’s in the window
with an error of no more than 50%.

So this algorithm gives a 50% precise answer.

In DGIM algorithm, each bit that arrives has a


timestamp, for the position at which it arrives.

if the first bit has a timestamp 1, the second bit


has a timestamp 2 and so on.. the positions are
recognized with the window size N (the window
sizes are usually taken as a multiple of
2).The windows are divided into buckets
consisting of 1’s and 0's.

RULES FOR FORMING THE


BUCKETS:

1. The right side of the bucket should


always start with 1. (if it starts with
a 0,it is to be neglected) E.g. ·
1001011 → a bucket of size
4 ,having four 1’s and starting with
1 on it’s right end.
2. Every bucket should have at least
one 1, else no bucket can be
formed.
3. All buckets should be in powers of
2.
4. The buckets cannot decrease in
size as we move to the left. (move in
increasing order towards left)

Let us take an example to understand the


algorithm.

Estimating the number of 1’s and


counting the buckets in the given data
stream.
This picture shows how we can form the
buckets based on the number of ones by
following the rules.

In the given data stream let us assume


the new bit arrives from the right. When
the new bit = 0
After the new bit ( 0 ) arrives with a time
stamp 101, there is no change in the
buckets.

But what if the new bit that arrives is 1,


then we need to make changes..
· Create a new bucket with the current
timestamp and size 1.

· If there was only one bucket of size 1,


then nothing more needs to be done.
However, if there are now three buckets
of size 1( buckets with timestamp
100,102, 103 in the second step in the
picture) We fix the problem by combining
the leftmost(earliest) two buckets of size
1.

To combine any two adjacent buckets of


the same size, replace them by one
bucket of twice the size. The timestamp
of the new bucket is the timestamp of the
rightmost of the two buckets.

Now, sometimes combining two buckets


of size 1 may create a third bucket of size
2. If so, we combine the leftmost two
buckets of size 2 into a bucket of size 4.
This process may ripple through the
bucket sizes.

How long can you continue doing this…

You can continue if current timestamp-


leftmost bucket timestamp of window <
N (=24 here) E.g. 103–87=16 < 24 so I
continue, if it greater or equal to then I
stop.

Finally the answer to the query.

How many 1’s are there in the last 20


bits?

Counting the sizes of the buckets in the


last 20 bits, we say, there are 11 ones.

In a decaying window, you assign a score or


weight to every element of the incoming data
stream.
Decaying Window Algorithm
This algorithm allows you to identify the most
popular/trending elements in an incoming data
stream.

The decaying window algorithm not only tracks


the most recurring elements in an incoming data
stream, but also discounts any random spikes or
spam requests that might have boosted an
element’s frequency.
In a decaying window, you assign a score or
weight to every element of the incoming data
stream.
Further, you need to calculate the aggregate sum
for each distinct element by adding all the
weights assigned to that element. The element
with the highest total score is listed as trending
or the most popular.

weights

timet
1. Assign each element with a weight/score.
2. Calculate aggregate sum for each distinct
element by adding all the weights assigned to
that element.

In a decaying window algorithm, we assign


more weight to newer elements. For a new
element, you first reduce the weight of all the
existing elements by a constant factor k and then
assign the new element with a specific weight.
The aggregate sum of the decaying exponential
weights can be calculated using the following
formula:

∑t-1 at−i(1−c)i
i=0
Here, c is usually a small constant. Whenever a
new element, say at+1, arrives in the data stream
you perform the following steps to achieve an
updated sum:
1. Multiply the current sum/score by the
value (1−c).
2. Add the weight corresponding to the new
element.

Weight decays exponentially over time

In a data stream consisting of various elements,


you maintain a separate sum for each distinct
element. For every incoming element, you
multiply the sum of all the existing elements by
a value of (1−c). Further, you add the weight of
the incoming element to its corresponding
aggregate sum.

A threshold can be kept to, ignore elements of


weight lesser than that.
Finally, the element with the highest aggregate
score is listed as the most popular element.
Example
For example, consider a sequence of twitter tags
below:
data stream [fifa, ipl, fifa, ipl, ipl, ipl, fifa]

Also, let's say each element in sequence has


weight of 1.
Let's c be 0.1
The aggregate sum of each tag in the end of
above stream will be calculated as below:
Fifa SCORE
fifa - 1 * (1-0.1) = 0.9
ipl - 0.9 * (1-0.1) + 0 = 0.81 (adding 0 because
current tag is different than fifa)
fifa - 0.81 * (1-0.1) + 1 = 1.729 (adding 1
because current tag is fifa only)
ipl - 1.729 * (1-0.1) + 0 = 1.5561
ipl - 1.5561 * (1-0.1) + 0 = 1.4005
ipl - 1.4005 * (1-0.1) + 0 = 1.2605
fifa - 1.2605 * (1-0.1) + 1 = 2.135
ipl SCORE
fifa - 0 * (1-0.1) = 0
ipl - 0 * (1-0.1) + 1 = 1
fifa - 1 * (1-0.1) + 0 = 0.9 (adding 0 because
current tag is different than ipl)
ipl - 0.9 * (1-0.01) + 1 = 1.81
ipl - 1.81 * (1-0.01) + 1 = 2.7919
ipl - 2.7919 * (1-0.01) + 1 = 3.764
fifa - 3.764 * (1-0.01) + 0 = 3.7264

In the end of the sequence, we can see the score


of fifa is 2.135 but ipl is 3.7264
So, ipl is more trending than fifa
Even though both of them occurred same
number of times in input there score is still
different.
Advantages of Decaying Window
Algorithm:
1. Sudden spikes or spam data is taken care.
2. New element is given more weight by this
mechanism, to achieve right trending output.

Reference:
https://nitinbhojwani-tech-talk.blogspot.com/
2018/12/decaying-window-algorithm.html

You might also like