
Data Mining

Association Analysis

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar



Introduction

 Business enterprises accumulate large quantities of data
from their day-to-day operations
 Huge amounts of customer purchase data are collected daily at
the checkout counters of grocery stores.
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke



Analyze the data to learn about the purchasing behaviour of the
customers.

Valuable information can be used to support a variety of
business-related applications:
→ Marketing promotions
→ Inventory management
→ Customer relationship management (CRM)
→ Pricing

Association Rule Mining

 Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction

Market-Basket transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Association Rules:
{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!




Definition: Frequent Itemset

 Itemset
– A collection of one or more items
  Example: {Milk, Bread, Diaper}
– k-itemset: an itemset that contains k items

 Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2

 Support
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5

 Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke




Definition: Association Rule

 Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}

 Rule Evaluation Metrics
– Support (s)
  Fraction of transactions that contain both X and Y
– Confidence (c)
  Measures how often items in Y appear in transactions that contain X

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example: {Milk, Diaper} → {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4

c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
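The two metrics can be computed directly from the table. A minimal Python sketch (our own illustration; the transactions come from the table above, and the helper name support_count is assumed):

    # Support and confidence for the rule {Milk, Diaper} -> {Beer}
    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    def support_count(itemset):
        """sigma(X): number of transactions containing every item of X."""
        return sum(1 for t in transactions if itemset <= t)

    X, Y = {"Milk", "Diaper"}, {"Beer"}
    s = support_count(X | Y) / len(transactions)   # 2/5 = 0.4
    c = support_count(X | Y) / support_count(X)    # 2/3 ~ 0.67
    print(f"s = {s:.2f}, c = {c:.2f}")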


Association Rule Mining Task

 Given a set of transactions T, the goal of association


rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

 Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!



Computational Complexity
 Given d unique items:
– Total number of itemsets = 2^d
– Total number of possible association rules:

  R = Σ_{k=1}^{d−1} [ C(d, k) × Σ_{j=1}^{d−k} C(d−k, j) ]
    = 3^d − 2^(d+1) + 1

  If d = 6, R = 602 rules
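The closed form can be checked numerically. A small sketch (illustration only, not part of the slides) verifying R = 602 for d = 6:

    from math import comb

    def total_rules(d):
        # R = sum over k of C(d, k) * sum over j of C(d - k, j)
        return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
                   for k in range(1, d))

    d = 6
    assert total_rules(d) == 3**d - 2**(d + 1) + 1 == 602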



Mining Association Rules

Example of Rules:

{Milk, Diaper} → {Beer}    (s=0.4, c=0.67)
{Milk, Beer} → {Diaper}    (s=0.4, c=1.0)
{Diaper, Beer} → {Milk}    (s=0.4, c=0.67)
{Beer} → {Milk, Diaper}    (s=0.4, c=0.67)
{Diaper} → {Milk, Beer}    (s=0.4, c=0.5)
{Milk} → {Diaper, Beer}    (s=0.4, c=0.5)

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke



Mining Association Rules

Observations:
 All the rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}

 Rules originating from the same itemset have identical support but
can have different confidence

 Thus, we may decouple the support and confidence requirements



Mining Association Rules

Divide the problem into two sub-tasks:

1. Frequent Itemset Generation
   Find all itemsets whose support ≥ minsup threshold
2. Rule Generation
   From each frequent itemset, generate all rules with high confidence.
   These are the strong rules.

 Frequent Itemset Generation is still more computationally
expensive than Rule Generation



Frequent Itemset Generation

What is the list of possible itemsets?

For items I = {a, b, c, d, e}, the possible itemsets are:

a, b, c, d, e

ab, ac, ad, ae, bc, bd, be, cd, ce, de

abc, abd, abe, acd, ace, ... and all other such combinations
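These itemsets can be enumerated in a few lines of Python (an assumed sketch, not from the slides); for 5 items there are 2^5 − 1 = 31 non-empty itemsets:

    from itertools import combinations

    items = ["a", "b", "c", "d", "e"]
    itemsets = [set(c) for k in range(1, len(items) + 1)
                       for c in combinations(items, k)]
    print(len(itemsets))   # 31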





Frequent Itemset Generation
[Figure: the itemset lattice over items A–E, from the null set at the top down to the 5-itemset ABCDE at the bottom.]

How many possible candidate itemsets? Given d items, there are
2^d possible candidate itemsets.
Frequent Itemset Generation
 Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the database

[Figure: each of the N transactions is matched against the M candidate itemsets; w is the maximum transaction width.]

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

– Match (compare) each transaction against every candidate
– Complexity ~ O(NMw) => expensive, since M = 2^d − 1 !!!
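A runnable sketch of this brute-force approach (illustrative only; brute_force_frequent is an assumed helper name): every itemset is a candidate and every candidate is checked against every transaction, which is exactly the O(NMw) cost noted above.

    from itertools import combinations

    def brute_force_frequent(transactions, items, minsup):
        n = len(transactions)
        frequent = {}
        for k in range(1, len(items) + 1):            # M = 2^d - 1 candidates in total
            for cand in map(frozenset, combinations(sorted(items), k)):
                count = sum(1 for t in transactions if cand <= t)   # N comparisons each
                if count / n >= minsup:
                    frequent[cand] = count
        return frequent

    transactions = [{"Bread", "Milk"},
                    {"Bread", "Diaper", "Beer", "Eggs"},
                    {"Milk", "Diaper", "Beer", "Coke"},
                    {"Bread", "Milk", "Diaper", "Beer"},
                    {"Bread", "Milk", "Diaper", "Coke"}]
    items = set().union(*transactions)
    print(len(brute_force_frequent(transactions, items, 0.6)))   # 8 frequent itemsets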
Frequent Itemset Generation Strategies

 Reduce the number of candidates (M)


– Complete search: M = 2^d
– Use pruning techniques to reduce M

 Reduce the number of transactions (N)


– Reduce size of N as the size of itemset increases
– Used by DHP and vertical-based mining algorithms

 Reduce the number of comparisons (NM)


– Use efficient data structures to store the candidates or
transactions
– No need to match every candidate against every transaction



The Apriori Principle

 Apriori principle:
– If an itemset is frequent, then all of its subsets must also
be frequent

 For example:
– If {Milk, Bread, Diapers} is a frequent itemset, its support is
  high, i.e. {Milk, Bread, Diapers} occurs in many transactions.



The Apriori Principle

 Apriori principle:

If {Milk, Bread, Diapers} occurs in many transactions,

so do {Milk}, {Bread}, {Diapers}

and

{Milk, Bread}, {Milk, Diapers}, {Bread, Diapers}





The Apriori Principle

 Apriori principle:
– If an itemset is infrequent, then all of its supersets must
also be infrequent

 For example:
– If {Coke} or {Jam} is an infrequent itemset, its support is low,
  so {Coke, Jam} cannot occur in many transactions either.





The Apriori Principle

 Support-Based Pruning
– Trimming the exponential search space based on the support measure
– Based on the anti-monotone property: the support of an itemset
  never exceeds the support of its subsets

I : set of items
J = 2^I (power set) // all possible combinations

Support s is anti-monotone (or downward closed):

∀X, Y ∈ J : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

Any anti-monotone measure can be used for this kind of pruning
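A small self-contained check of the anti-monotone property on the market-basket data used in these slides (illustration only):

    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]
    sigma = lambda X: sum(1 for t in transactions if X <= t)

    X, Y = {"Milk"}, {"Milk", "Diaper", "Beer"}   # X is a subset of Y
    assert sigma(X) >= sigma(Y)                   # 4 >= 2: adding items cannot raise support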
The Apriori Principle

 Monotonicity Property
 I : set of items
 J = 2^I (power set) // all possible combinations
– f is monotone (or upward closed) if

  ∀X, Y ∈ J : (X ⊆ Y) ⇒ f(X) ≤ f(Y)



Apriori Algorithm

 Uses the Apriori principle

 Support-based pruning

 Controls the exponential growth of candidate itemsets





Apriori Algorithm

TID  Items
1    Bread, Milk
2    Beer, Bread, Diaper, Eggs
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Bread, Coke, Diaper, Milk

If every subset is considered:
6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidate itemsets
With support-based pruning:
6 + 6 + 1 = 13 candidate itemsets
Illustrating Apriori Principle

[Figure: itemset lattice over items A–E. The itemset AB is found to be infrequent, so all of its supersets (ABC, ABD, ABE, ABCD, ABCE, ABDE, ABCDE) are pruned from the search space.]


Illustrating Apriori Principle

TID  Items
1    Bread, Milk
2    Beer, Bread, Diaper, Eggs
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Bread, Coke, Diaper, Milk

Minimum Support = 3

Items (1-itemsets)
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets)
(No need to generate candidates involving Coke or Eggs)
Itemset          Count
{Bread, Milk}    3
{Bread, Beer}    2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Beer, Diaper}   3

Triplets (3-itemsets)
Itemset                Count
{Beer, Diaper, Milk}   2
{Beer, Bread, Diaper}  2
{Bread, Diaper, Milk}  2
{Beer, Bread, Milk}    1

If every subset is considered:
6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41
With support-based pruning:
6 + 6 + 4 = 16
6 + 6 + 1 = 13 (after also pruning candidate triplets that contain an infrequent pair)
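The 1-itemset and 2-itemset counts above can be reproduced with a short script (our own sketch, not the textbook's code), using minimum support 3:

    from itertools import combinations
    from collections import Counter

    transactions = [{"Bread", "Milk"},
                    {"Beer", "Bread", "Diaper", "Eggs"},
                    {"Beer", "Coke", "Diaper", "Milk"},
                    {"Beer", "Bread", "Diaper", "Milk"},
                    {"Bread", "Coke", "Diaper", "Milk"}]
    minsup = 3

    c1 = Counter(item for t in transactions for item in t)
    f1 = {i for i, n in c1.items() if n >= minsup}           # Bread, Milk, Beer, Diaper

    c2 = Counter(frozenset(p) for t in transactions
                 for p in combinations(sorted(t & f1), 2))    # pairs of frequent items only
    f2 = {s for s, n in c2.items() if n >= minsup}
    print(sorted(tuple(sorted(s)) for s in f2))
    # [('Beer', 'Diaper'), ('Bread', 'Diaper'), ('Bread', 'Milk'), ('Diaper', 'Milk')]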


Apriori Algorithm


Method:

– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified

Generate length(k+1) candidate itemsets from length k
frequent itemsets

Prune candidate itemsets containing any length-k subset
that is infrequent

Count the support of each candidate by scanning the DB

Eliminate candidates that are infrequent, leaving only
those that are frequent
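A compact, runnable sketch of this method (an assumed implementation; the slide above gives only pseudocode). On the five example transactions with a minimum support count of 3 it examines 6 + 6 + 1 = 13 candidates, matching the count shown earlier:

    from itertools import combinations

    def apriori(transactions, minsup_count):
        """Return {itemset: support count} for every frequent itemset."""
        transactions = [frozenset(t) for t in transactions]

        def count(candidates):
            counts = dict.fromkeys(candidates, 0)
            for t in transactions:
                for c in candidates:
                    if c <= t:
                        counts[c] += 1
            return {c: n for c, n in counts.items() if n >= minsup_count}

        # k = 1: every item is a candidate
        frequent = count({frozenset([i]) for t in transactions for i in t})
        result, k = dict(frequent), 1
        while frequent:
            prev = list(frequent)
            # generate length-(k+1) candidates by joining frequent k-itemsets
            candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
            # prune candidates containing an infrequent k-subset (Apriori principle)
            candidates = {c for c in candidates
                          if all(frozenset(s) in frequent for s in combinations(c, k))}
            frequent = count(candidates)
            result.update(frequent)
            k += 1
        return result

    transactions = [{"Bread", "Milk"},
                    {"Beer", "Bread", "Diaper", "Eggs"},
                    {"Beer", "Coke", "Diaper", "Milk"},
                    {"Beer", "Bread", "Diaper", "Milk"},
                    {"Bread", "Coke", "Diaper", "Milk"}]
    for itemset, n in sorted(apriori(transactions, 3).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)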



Support Counting: An Example
Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5},
{3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

How many of these itemsets are supported by transaction (1,2,3,5,6)?

[Figure: systematic enumeration of the 3-item subsets of transaction t = (1, 2, 3, 5, 6). Level 1 fixes the first item (1, 2 or 3), Level 2 fixes the second item, and Level 3 lists the resulting 3-item subsets.]
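Without a hash tree, the question can be answered directly by enumerating the C(5,3) = 10 three-item subsets of the transaction and checking each candidate against them (a small illustrative sketch):

    from itertools import combinations

    candidates = [{1,4,5}, {1,2,4}, {4,5,7}, {1,2,5}, {4,5,8}, {1,5,9}, {1,3,6}, {2,3,4},
                  {5,6,7}, {3,4,5}, {3,5,6}, {3,5,7}, {6,8,9}, {3,6,7}, {3,6,8}]
    t = (1, 2, 3, 5, 6)

    subsets = {frozenset(s) for s in combinations(t, 3)}      # C(5, 3) = 10 subsets
    matched = [c for c in candidates if frozenset(c) in subsets]
    print(matched)   # {1, 2, 5}, {1, 3, 6}, {3, 5, 6} -> 3 candidates are supported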


Support Counting Using a Hash Tree
Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5},
{3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}
You need:
 Hash function
 Max leaf size: max number of itemsets stored in a leaf node (if number
of candidate itemsets exceeds max leaf size, split the node)

Hash function: items 1, 4 and 7 hash to the left branch, items 2, 5 and 8 to the middle branch, and items 3, 6 and 9 to the right branch.

[Figure: the 15 candidate 3-itemsets stored in the leaves of the resulting hash tree.]
Support Counting Using a Hash Tree

[Figure: the candidate hash tree. To count support for transaction {1, 2, 3, 5, 6}, the transaction is hashed from the root, first on items 1, 2 and 3, then on the remaining items at the lower levels, so only the leaves reachable along those branches are visited.]

Match transaction against 11 out of 15 candidates
Rule Generation

 Given a frequent itemset L, find all non-empty subsets f ⊂ L
such that f → L − f satisfies the minimum confidence requirement
– If {A,B,C,D} is a frequent itemset, the candidate rules are:
  ABC → D, ABD → C, ACD → B, BCD → A,
  A → BCD, B → ACD, C → ABD, D → ABC,
  AB → CD, AC → BD, AD → BC, BC → AD,
  BD → AC, CD → AB

 If |L| = k, then there are 2^k − 2 candidate association rules
(ignoring L → ∅ and ∅ → L)



Rule Generation

 In general, confidence does not have an anti-monotone property:
c(ABC → D) can be larger or smaller than c(AB → D)

 But confidence of rules generated from the same itemset has an
anti-monotone property
– E.g., suppose {A,B,C,D} is a frequent 4-itemset:

  c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)

– Confidence is anti-monotone w.r.t. the number of items on the
  RHS of the rule
Rule Generation for Apriori Algorithm

[Figure: lattice of rules derived from {A,B,C,D}, from ABCD → {} at the top down to the single-antecedent rules at the bottom. If BCD → A is found to be a low-confidence rule, then all rules whose antecedent is a subset of BCD (CD → AB, BD → AC, BC → AD, D → ABC, C → ABD, B → ACD) are pruned.]



Association Analysis: Basic Concepts

and Algorithms

Algorithms and Complexity



Factors Affecting Complexity of Apriori

 Choice of minimum support threshold

 Dimensionality (number of items) of the data set

 Size of database

 Average transaction width






Impact of Support Based Pruning

TID  Items
1    Bread, Milk
2    Beer, Bread, Diaper, Eggs
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Bread, Coke, Diaper, Milk

Items (1-itemsets)
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Minimum Support = 3:
If every subset is considered, 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41
With support-based pruning, 6 + 6 + 4 = 16

Minimum Support = 2:
If every subset is considered, 6C1 + 6C2 + 6C3 + 6C4 = 6 + 15 + 20 + 15 = 56



Factors Affecting Complexity of
Apriori
 Choice of minimum support threshold
– lowering support threshold results in more frequent itemsets
– this may increase number of candidates and max length of
frequent itemsets
 Dimensionality (number of items) of the data set
– More space is needed to store support count of itemsets
– if number of frequent itemsets also increases, both computation
and I/O costs may also increase
 Size of database
– run time of algorithm increases with number of transactions
 Average transaction width
– transaction width increases the max length of frequent itemsets
– number of subsets in a transaction increases with its width,
increasing computation time for support counting





Compact Representation of Frequent Itemsets

 Some frequent itemsets are redundant because their


supersets are also frequent
Consider the following data set. Assume the support threshold = 5.
TID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

 10 
10
Number of frequent itemsets 3   
k
k 1

 Need a compact representation
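A quick numerical check of this count (illustration only): 3 × (2^10 − 1) = 3069 frequent itemsets.

    from math import comb
    print(3 * sum(comb(10, k) for k in range(1, 11)))   # 3069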



Maximal Frequent Itemset
An itemset is maximal frequent if it is frequent and none of its
immediate supersets is frequent.

[Figure: itemset lattice over items A–E showing the border between frequent and infrequent itemsets; the maximal frequent itemsets lie immediately below this border.]
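A small assumed helper for extracting the maximal frequent itemsets from a collection of frequent itemsets; the example input is the set of frequent itemsets found earlier for the market-basket data with minimum support 3:

    def maximal(frequent_itemsets):
        """Keep only the itemsets with no frequent proper superset."""
        fs = [frozenset(x) for x in frequent_itemsets]
        return [x for x in fs if not any(x < y for y in fs)]

    frequent = [{"Bread"}, {"Milk"}, {"Diaper"}, {"Beer"},
                {"Bread", "Milk"}, {"Bread", "Diaper"},
                {"Milk", "Diaper"}, {"Beer", "Diaper"}]
    print([tuple(sorted(x)) for x in maximal(frequent)])
    # the four frequent 2-itemsets; every frequent 1-itemset has a frequent superset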

