Decaying Window

Uploaded by

kharshitha93

Sliding Windows

A sliding window is a useful model of stream processing in which queries are about a window of length N: the N most recent elements received.
In certain cases N is so large that the data cannot be stored in
memory, or even on disk. Sliding window is also known as

Consider a sliding window of length N = 6 on a single stream, as shown in Figure 1. As the stream content varies over time, the sliding window highlights new stream elements.

Figure 1. Sliding window on stream

Example 1: Consider Amazon online transactions. For every product X we keep a 0/1 stream recording whether that product was sold in the n-th transaction. A query such as “How many times have we sold X in the last k sales?” can then be answered using the sliding window concept.

Now suppose we have a window of length N (say N = 24) on a binary stream. We want at all times to be able to answer a query of the form “How many 1’s are there in the last K bits?” for any K ≤ N.
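As a baseline for comparison, the query can be answered exactly by simply storing the last N bits. The sketch below assumes N is small enough to fit in memory, which is precisely the assumption that fails for very large windows; the class name `ExactWindowCounter` and the sample bits are illustrative and not part of the original notes.

```python
from collections import deque


class ExactWindowCounter:
    """Keeps the last N bits verbatim and counts 1's on demand (O(N) memory)."""

    def __init__(self, n):
        # A deque with maxlen drops the oldest bit automatically.
        self.window = deque(maxlen=n)

    def add(self, bit):
        self.window.append(bit)

    def count_ones(self, k):
        """Exact number of 1's among the k most recent bits, for k <= N."""
        if k > self.window.maxlen:
            raise ValueError("k must not exceed the window length N")
        recent = list(self.window)[-k:] if k > 0 else []
        return sum(recent)


counter = ExactWindowCounter(n=24)
for bit in [1, 0, 1, 1, 0, 0, 1, 0]:
    counter.add(bit)

print(counter.count_ones(4))  # looks at the last four bits: 0, 0, 1, 0
```

This stores every bit, so its memory grows linearly with N, which is the cost an approximate method trades away.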

This is where the DGIM algorithm comes into the picture.

COUNTING THE NUMBER OF 1’s IN THE DATA STREAM

DGIM algorithm (Datar-Gionis-Indyk-Motwani Algorithm)

The DGIM algorithm is designed to estimate the number of 1’s in a window of a bit stream. It uses O(log² N) bits to represent a window of N bits and estimates the number of 1’s in the window with an error of no more than 50%.

In other words, the estimate is never off from the true count by more than 50%.

In the DGIM algorithm, each bit that arrives has a timestamp giving the position at which it arrives: the first bit has timestamp 1, the second bit has timestamp 2, and so on. Positions are identified relative to the window size N (window sizes are usually taken as a multiple of 2). The window is divided into buckets consisting of 1’s and 0’s.

RULES FOR FORMING THE BUCKETS:

1. The right end of a bucket must always be a 1 (a 0 at the right end is left out of the bucket). E.g. 1001011 → a bucket of size 4, having four 1’s and a 1 at its right end.
2. Every bucket must contain at least one 1; otherwise no bucket can be formed.
3. All bucket sizes (the number of 1’s in the bucket) must be powers of 2.
4. Buckets cannot decrease in size as we move to the left (sizes are in increasing order towards the left).
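Rules 3 and 4 can be checked mechanically. The sketch below assumes the buckets are described only by their sizes, listed from the rightmost (newest) bucket to the leftmost (oldest); the function names are hypothetical helpers introduced here for illustration.

```python
def is_power_of_two(x):
    """True for 1, 2, 4, 8, ...; also rejects 0, so rule 2 (at least one 1) holds."""
    return x >= 1 and (x & (x - 1)) == 0


def valid_bucket_sizes(sizes_newest_first):
    """Check rule 3 (powers of 2) and rule 4 (no decrease towards the left)."""
    if not all(is_power_of_two(s) for s in sizes_newest_first):
        return False
    pairs = zip(sizes_newest_first, sizes_newest_first[1:])
    return all(newer <= older for newer, older in pairs)


print(valid_bucket_sizes([1, 1, 2, 4, 4]))  # sizes grow towards the left
print(valid_bucket_sizes([2, 1]))           # a size-1 bucket left of a size-2 bucket
```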

Let us take an example to understand the algorithm.

Estimating the number of 1’s and counting the buckets in the given data stream.

This picture shows how we can form the buckets based on the number of ones by following the rules.

In the given data stream, let us assume that new bits arrive from the right.

When the new bit = 0: after the new bit (0) arrives with timestamp 101, there is no change in the buckets.

But if the new bit that arrives is a 1, we need to make changes:
· Create a new bucket with the current timestamp and size 1.
· If there was only one bucket of size 1, nothing more needs to be done. However, if there are now three buckets of size 1 (the buckets with timestamps 100, 102 and 103 in the second step of the picture), we fix the problem by combining the leftmost (earliest) two buckets of size 1.

To combine any two adjacent buckets of the same size, replace them by one bucket of twice the size. The timestamp of the new bucket is the timestamp of the rightmost (more recent) of the two buckets.

Combining two buckets of size 1 may in turn create a third bucket of size 2. If so, we combine the leftmost two buckets of size 2 into a bucket of size 4. This process may ripple through the bucket sizes.
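The update and merge steps described above can be sketched as follows. This is a minimal illustration, assuming buckets are kept as `(timestamp, size)` pairs ordered from newest to oldest; the function name `add_bit` and the all-ones test stream are assumptions made for the example, and removal of buckets that leave the window is deliberately not handled here.

```python
def add_bit(buckets, timestamp, bit):
    """Return the bucket list after a new bit arrives.

    buckets: list of (timestamp, size) pairs, newest first.
    """
    if bit == 0:
        return list(buckets)  # a 0 never changes the buckets

    # A 1 always starts a new bucket of size 1 with the current timestamp.
    buckets = [(timestamp, 1)] + list(buckets)

    size = 1
    while True:
        same = [i for i, (_, s) in enumerate(buckets) if s == size]
        if len(same) < 3:
            break  # at most two buckets of this size: nothing to fix
        # Combine the two earliest (leftmost) buckets of this size.
        newer, older = same[-2], same[-1]
        merged = (buckets[newer][0], 2 * size)  # keep the more recent timestamp
        buckets[newer:older + 1] = [merged]
        size *= 2  # the merge may have created a third bucket of the next size
    return buckets


buckets = []
for t in range(1, 8):  # seven consecutive 1's at timestamps 1..7
    buckets = add_bit(buckets, t, 1)
print(buckets)  # [(7, 1), (6, 2), (4, 4)]
```

In this run the seventh 1 triggers two merges in a row (size 1, then size 2), which is the ripple effect mentioned above.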

How long can you continue doing this?

You can continue as long as (current timestamp − timestamp of the leftmost bucket in the window) < N (N = 24 here). E.g. 103 − 87 = 16 < 24, so we continue; if the difference is greater than or equal to N, we stop.

Finally, the answer to the query: how many 1’s are there in the last 20 bits?

Counting the sizes of the buckets in the last 20 bits, we say there are 11 ones.
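A query of this kind can be sketched as a sum over the buckets whose timestamp falls within the last K positions. The text above simply adds up the bucket sizes; the usual description of DGIM additionally counts only half of the oldest qualifying bucket, because it is unknown how much of that bucket lies inside the queried range. The sketch offers both behaviours through a flag. The bucket list, the function name `estimate_ones` and the `halve_oldest` parameter are assumptions for illustration, not values taken from the picture in the notes.

```python
def estimate_ones(buckets, current_ts, k, halve_oldest=True):
    """Estimate the number of 1's in the last k bits.

    buckets: list of (timestamp, size) pairs, newest first, where the
    timestamp is that of the bucket's right end.
    """
    in_range = [(ts, size) for ts, size in buckets if ts > current_ts - k]
    if not in_range:
        return 0
    total = sum(size for _, size in in_range)
    if halve_oldest:
        # Only the right end of the oldest bucket is known to be in range,
        # so count half of it (integer division keeps a size-1 bucket whole).
        oldest_size = in_range[-1][1]
        total -= oldest_size // 2
    return total


# Hypothetical buckets for a stream of seven 1's at timestamps 1..7.
example_buckets = [(7, 1), (6, 2), (4, 4)]
print(estimate_ones(example_buckets, current_ts=7, k=5))
print(estimate_ones(example_buckets, current_ts=7, k=5, halve_oldest=False))
```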

Decaying Window Algorithm
This algorithm allows you to identify the most popular (trending) elements in an incoming data stream.

The decaying window algorithm not only tracks the most frequently recurring elements in an incoming data stream, but also discounts any random spikes or spam requests that might have boosted an element’s frequency.
In a decaying window, you assign a score or
weight to every element of the incoming data
stream.
Further, you need to calculate the aggregate sum
for each distinct element by adding all the
weights assigned to that element. The element
with the highest total score is listed as trending
or the most popular.

Figure: weights plotted against time t
1. Assign each element with a weight/score.
2. Calculate aggregate sum for each distinct
element by adding all the weights assigned to
that element.

In a decaying window algorithm, we assign more weight to newer elements. When a new element arrives, you first reduce the weights of all the existing elements by multiplying them by a constant factor (1−c), and then assign the new element its own weight. The aggregate sum of the exponentially decaying weights can be calculated using the following formula:

$$\sum_{i=0}^{t-1} a_{t-i}\,(1-c)^{i}$$
Here, c is usually a small constant. Whenever a new element, say $a_{t+1}$, arrives in the data stream, you perform the following steps to obtain the updated sum:
1. Multiply the current sum/score by the
value (1−c).
2. Add the weight corresponding to the new
element.
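The two steps above produce the same value as the summation formula, which a short check makes concrete. This is a sketch under the assumption that the stream is a plain Python list of weights with the oldest element first; both function names and the sample values are hypothetical.

```python
def decayed_sum_closed_form(a, c):
    """Evaluate sum_{i=0}^{t-1} a_{t-i} * (1-c)^i for a = [a_1, ..., a_t]."""
    t = len(a)
    # a_{t-i} in 1-based notation is a[t - 1 - i] in 0-based indexing.
    return sum(a[t - 1 - i] * (1 - c) ** i for i in range(t))


def decayed_sum_incremental(a, c):
    """Apply the two-step update once per arriving element."""
    s = 0.0
    for x in a:
        s = s * (1 - c)  # step 1: decay the current sum
        s = s + x        # step 2: add the new element's weight
    return s


weights = [1, 0, 1, 1, 0, 1]
c = 0.1
print(decayed_sum_closed_form(weights, c))
print(decayed_sum_incremental(weights, c))
```

The incremental form needs only the previous sum, so nothing about earlier elements has to be stored.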

Weight decays exponentially over time

In a data stream consisting of various elements, you maintain a separate sum for each distinct element. For every incoming element, you multiply the sums of all existing elements by (1−c), and then add the weight of the incoming element to its own aggregate sum.

A threshold can be set so that elements whose score falls below it are ignored.
Finally, the element with the highest aggregate score is listed as the most popular element.
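The per-element bookkeeping with a threshold can be sketched as a dictionary of scores. The function name `update_scores`, the threshold value 0.5 and the sample stream are assumptions chosen for illustration; the notes do not prescribe a particular threshold.

```python
def update_scores(scores, element, c, weight=1.0, threshold=0.0):
    """Decay every score, drop those below the threshold, credit the newcomer."""
    decayed = {}
    for key, score in scores.items():
        score *= (1 - c)
        if score >= threshold:  # scores below the threshold are forgotten
            decayed[key] = score
    decayed[element] = decayed.get(element, 0.0) + weight
    return decayed


scores = {}
for tag in ["a", "b", "b", "b", "b", "b", "b", "b"]:
    scores = update_scores(scores, tag, c=0.1, threshold=0.5)

most_popular = max(scores, key=scores.get)
print(scores)        # "a" has decayed below 0.5 and is no longer tracked
print(most_popular)  # b
```

Dropping low scores bounds the number of elements tracked at any time, at the cost of forgetting items that have not appeared recently.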
Example
For example, consider the following sequence of Twitter tags:
data stream: [fifa, ipl, fifa, ipl, ipl, ipl, fifa]

Let each element in the sequence have a weight of 1, and let c be 0.1.
The aggregate score of each tag at the end of the stream is calculated as follows:
fifa SCORE (values rounded to four decimal places)
fifa - 0 * (1-0.1) + 1 = 1 (adding 1 because the current tag is fifa)
ipl - 1 * (1-0.1) + 0 = 0.9 (adding 0 because the current tag is not fifa)
fifa - 0.9 * (1-0.1) + 1 = 1.81
ipl - 1.81 * (1-0.1) + 0 = 1.629
ipl - 1.629 * (1-0.1) + 0 = 1.4661
ipl - 1.4661 * (1-0.1) + 0 = 1.3195
fifa - 1.3195 * (1-0.1) + 1 = 2.1875
ipl SCORE (values rounded to four decimal places)
fifa - 0 * (1-0.1) + 0 = 0
ipl - 0 * (1-0.1) + 1 = 1
fifa - 1 * (1-0.1) + 0 = 0.9 (adding 0 because the current tag is not ipl)
ipl - 0.9 * (1-0.1) + 1 = 1.81
ipl - 1.81 * (1-0.1) + 1 = 2.629
ipl - 2.629 * (1-0.1) + 1 = 3.3661
fifa - 3.3661 * (1-0.1) + 0 = 3.0295

At the end of the sequence, the score of fifa is 2.1875 and the score of ipl is 3.0295.
So ipl is trending more than fifa.
The scores reflect recency as well as frequency: ipl occurs four times and fifa three times, and each occurrence counts for less the further back in the stream it lies.
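The example can be replayed in a few lines, applying c = 0.1 uniformly at every step and the update rule score ← score · (1−c) + weight, starting from a score of 0 for each tag. This is a sketch for checking the arithmetic by hand; the variable names are illustrative.

```python
stream = ["fifa", "ipl", "fifa", "ipl", "ipl", "ipl", "fifa"]
c = 0.1
scores = {"fifa": 0.0, "ipl": 0.0}

for tag in stream:
    for key in scores:
        # Decay every score, then add weight 1 only to the tag that arrived.
        scores[key] = scores[key] * (1 - c) + (1 if key == tag else 0)
    print(tag, {k: round(v, 4) for k, v in scores.items()})

trending = max(scores, key=scores.get)
print("trending:", trending)
```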
Advantages of Decaying Window Algorithm:
1. Sudden spikes or spam data are discounted.
2. Newer elements are given more weight, which yields the right trending output.

Reference:
https://nitinbhojwani-tech-talk.blogspot.com/2018/12/decaying-window-algorithm.html
