Data Mining
Association Analysis: Basic Concepts
and Algorithms
ASSOCIATION RULE MINING
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1
Association Rule Mining
● Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
Market-Basket transactions
Example of Association Rules
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
Implication means co-occurrence,
not causality!
Definition: Frequent Itemset
● Itemset
– A collection of one or more items
◆ Example: {Milk, Bread, Diaper}
– k-itemset
◆ An itemset that contains k items
● Support count (σ)
– Frequency of occurrence of an itemset
– E.g., σ({Milk, Bread, Diaper}) = 2
● Support
– Fraction of transactions that contain an
itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
● Frequent Itemset
– An itemset whose support is greater
than or equal to a minsup threshold
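The two counts above can be sketched directly in code. The transaction table below is an assumption: it is not printed in this text, but it is reconstructed so that every support count quoted in these slides (σ({Milk, Bread, Diaper}) = 2, s = 2/5, and the rule metrics on later slides) comes out as stated.

```python
# Hypothetical market-basket transactions, reconstructed to match the
# support counts quoted in these slides (not shown in the original text).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Bread", "Diaper"}, transactions))  # 2
print(support({"Milk", "Bread", "Diaper"}, transactions))        # 0.4
```

With a minsup threshold of, say, 0.4, {Milk, Bread, Diaper} is exactly on the boundary and counts as frequent.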
Definition: Association Rule
● Association Rule
– An implication expression of the form
X → Y, where X and Y are itemsets
– Example:
{Milk, Diaper} → {Beer}
● Rule Evaluation Metrics
– Support (s)
◆ Fraction of transactions that contain both X and Y: s(X → Y) = σ(X ∪ Y) / |T|
– Confidence (c)
◆ Measures how often items in Y appear in transactions that contain X: c(X → Y) = σ(X ∪ Y) / σ(X)
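A minimal sketch of both metrics for the example rule {Milk, Diaper} → {Beer}. The transaction table is an assumption, reconstructed to reproduce the counts quoted in these slides:

```python
# Hypothetical transactions, reconstructed to match the slides' counts.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = sigma(X | Y) / len(transactions)  # support: fraction containing X and Y
c = sigma(X | Y) / sigma(X)           # confidence: of those with X, how many have Y
print(f"s = {s}, c = {c:.2f}")        # s = 0.4, c = 0.67
```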
Association Rule Mining Task
● Given a set of transactions T, the goal of
association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
● Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf
thresholds
⇒ Computationally prohibitive!
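To see why brute force blows up, one can count candidate rules directly. The sketch below enumerates every rule X → Y with non-empty, disjoint sides over d items, and checks it against the closed form R = 3^d − 2^(d+1) + 1 that this counting argument yields:

```python
from itertools import combinations

def count_rules_bruteforce(d):
    """Enumerate every rule X -> Y with X, Y non-empty and disjoint."""
    items = range(d)
    total = 0
    for r in range(1, d + 1):
        for antecedent in combinations(items, r):
            rest = [i for i in items if i not in antecedent]
            # any non-empty subset of the remaining items may be the consequent
            total += 2 ** len(rest) - 1
    return total

# Closed form from the same argument: R = 3^d - 2^(d+1) + 1
print(count_rules_bruteforce(6), 3**6 - 2**7 + 1)  # 602 602
```

Even d = 6 items already yield 602 candidate rules, and the count grows as 3^d.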
Mining Association Rules
Example of Rules:
{Milk,Diaper} → {Beer} (s=0.4, c=0.67)
{Milk,Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper,Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk,Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk,Beer} (s=0.4, c=0.5)
{Milk} → {Diaper,Beer} (s=0.4, c=0.5)
Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
Mining Association Rules
● Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset
● Frequent itemset generation is still
computationally expensive
Frequent Itemset Generation
Given d items, there are 2^d possible candidate itemsets
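The exponential count can be checked by enumerating the itemset lattice for a small d (the item names are illustrative; excluding the empty set leaves 2^d − 1 non-empty candidates):

```python
from itertools import combinations

def candidate_itemsets(items):
    """All non-empty subsets of the item universe: 2^d - 1 of them."""
    return [set(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

items = ["Beer", "Bread", "Coke", "Diaper", "Milk"]
print(len(candidate_itemsets(items)))  # 31 == 2**5 - 1
```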
Illustrating Apriori Principle
Found to be
Infrequent
Pruned
supersets
Apriori Algorithm
● Method:
– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
◆ Generate length (k+1) candidate itemsets from length k
frequent itemsets
◆ Prune candidate itemsets containing subsets of length k that
are infrequent
◆ Count the support of each candidate by scanning the DB
◆ Eliminate candidates that are infrequent, leaving only those
that are frequent
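The four repeated steps above can be sketched as follows. The transaction table is an assumption, reconstructed to match the support counts quoted in these slides; with a minimum support count of 3 it yields four frequent 1-itemsets and four frequent 2-itemsets:

```python
from itertools import combinations

# Hypothetical transactions, reconstructed to match the slides' counts.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def apriori(transactions, minsup_count):
    """Level-wise frequent itemset generation following the method above."""
    items = sorted({i for t in transactions for i in t})
    # k = 1: frequent single items
    frequent = [frozenset([i]) for i in items
                if sum(1 for t in transactions if i in t) >= minsup_count]
    all_frequent = list(frequent)
    k = 1
    while frequent:
        # generate length-(k+1) candidates by joining length-k frequent itemsets
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # prune candidates that contain an infrequent length-k subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k))}
        # count support by scanning the DB; eliminate infrequent candidates
        frequent = [c for c in candidates
                    if sum(1 for t in transactions if c <= t) >= minsup_count]
        all_frequent.extend(frequent)
        k += 1
    return all_frequent

print(sorted(sorted(f) for f in apriori(transactions, 3)))
```

The subset-pruning step is where the Apriori principle pays off: a candidate such as {Bread, Diaper, Beer} is dropped without a database scan because its subset {Bread, Beer} was already found infrequent.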
Rule Generation
● Frequent itemsets L = L1 ∪ L2 (the union of the frequent 1-itemsets and frequent 2-itemsets)
● Deriving strong rules
– Consider a frequent 2-itemset, {Bread, Milk}
– First identify all non-empty proper subsets: {Bread}, {Milk}
– For each subset X, form the rule X → ({Bread, Milk} − X):
– {Bread} → {Milk}
– {Milk} → {Bread}
● To determine which rules are strong, compute the confidence of each
● Rule 1: {Bread} → {Milk}: 3/4, or 75%
● Rule 2: {Milk} → {Bread}: 3/4, or 75%
● If the confidence is greater than the threshold, the rule is strong
(assuming a threshold of 60%, both rules are strong)
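The subset-and-confidence procedure above can be sketched as follows. The transaction table is an assumption, reconstructed so that σ({Bread, Milk}) = 3 and σ({Bread}) = σ({Milk}) = 4, reproducing the 75% confidences quoted above:

```python
from itertools import combinations

# Hypothetical transactions, reconstructed to match the slides' counts.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(itemset, minconf):
    """Strong rules X -> (itemset - X) over all non-empty proper subsets X."""
    strong = []
    for r in range(1, len(itemset)):
        for subset in combinations(sorted(itemset), r):
            X = set(subset)
            confidence = sigma(itemset) / sigma(X)
            if confidence >= minconf:
                strong.append((X, set(itemset) - X, confidence))
    return strong

for X, Y, c in rules_from_itemset({"Bread", "Milk"}, minconf=0.6):
    print(sorted(X), "->", sorted(Y), f"confidence={c:.2f}")
```

Note that the rule's support σ({Bread, Milk}) is computed once per itemset; only the denominator σ(X) changes per rule, which is why rules from the same itemset share support but differ in confidence.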
Apriori example
Minimum support: 3
● Step 1: Start with the transaction data in the database
● Step 2: Calculate the support (frequency) of every item
● Step 3: Discard the items whose support is less than 3
● Step 4: Combine the remaining items into candidate 2-itemsets
● Step 5: Calculate the support of every candidate 2-itemset
● Step 6: Discard the 2-itemsets whose support is less than 3
Contd…
● Step 7: Combine the remaining items into candidate 3-itemsets and calculate their support
● Step 8: Discard the itemsets whose support is less than 3. All itemsets are excluded except “Eggs, Cold drink”, which has a support of 3.