
Data Mining

Association Analysis: Basic Concepts and Algorithms

ASSOCIATION RULE MINING

© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004


Association Rule Mining

● Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction

Market-Basket transactions:

TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke

Example of Association Rules:

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!



Definition: Frequent Itemset
● Itemset
– A collection of one or more items
◆ Example: {Milk, Bread, Diaper}
– k-itemset
◆ An itemset that contains k items
● Support count (σ)
– Frequency of occurrence of an itemset
– E.g., σ({Milk, Bread, Diaper}) = 2
● Support
– Fraction of transactions that contain an
itemset
– E.g., s({Milk, Bread, Diaper}) = 2/5
● Frequent Itemset
– An itemset whose support is greater
than or equal to a minsup threshold
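
To make these definitions concrete, here is a minimal Python sketch
(the transactions list mirrors the market-basket table above; the helper
name support_count is illustrative, not from the slides) that computes
the support count σ and the support s of an itemset:

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    # sigma(X): number of transactions that contain every item of X
    return sum(1 for t in transactions if itemset <= t)

sigma = support_count({"Milk", "Bread", "Diaper"}, transactions)
print(sigma)                      # 2
print(sigma / len(transactions))  # support s = 2/5 = 0.4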



Definition: Association Rule
● Association Rule
– An implication expression of the form
X → Y, where X and Y are itemsets
– Example:
{Milk, Diaper} → {Beer}

● Rule Evaluation Metrics
– Support (s)
◆ Fraction of transactions that contain both X and Y
◆ s = σ(X ∪ Y) / |T|
– Confidence (c)
◆ Measures how often items in Y appear in transactions that contain X
◆ c = σ(X ∪ Y) / σ(X)
– Example: {Milk, Diaper} → {Beer}
◆ s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
◆ c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
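
Continuing the sketch from the previous slide (same transactions list and
support_count helper, both illustrative), the two metrics are a few lines
of Python:

def rule_metrics(X, Y, transactions):
    # s = sigma(X u Y) / |T|   and   c = sigma(X u Y) / sigma(X)
    both = support_count(X | Y, transactions)
    return both / len(transactions), both / support_count(X, transactions)

s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, transactions)
print(round(s, 2), round(c, 2))   # 0.4 0.67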



Association Rule Mining Task

● Given a set of transactions T, the goal of
association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

● Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf
thresholds
⇒ Computationally prohibitive!
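
How prohibitive? A standard counting argument (each of the d items can go
on the left of a rule, on the right, or be absent, minus the rules with an
empty side) gives R = 3^d − 2^(d+1) + 1 possible rules. A quick sketch to
check the numbers:

def num_rules(d):
    # R = 3^d - 2^(d+1) + 1 possible association rules over d items
    return 3**d - 2**(d + 1) + 1

print(num_rules(6))    # 602 rules from only 6 items
print(num_rules(100))  # astronomically many for a realistic catalog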



Mining Association Rules

Example of Rules (from the market-basket transactions above):
{Milk,Diaper} → {Beer} (s=0.4, c=0.67)
{Milk,Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper,Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk,Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk,Beer} (s=0.4, c=0.5)
{Milk} → {Diaper,Beer} (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
Mining Association Rules

● Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup

2. Rule Generation
– Generate high-confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset

● Frequent itemset generation is still
computationally expensive



Frequent Itemset Generation

● Given d items, there are 2^d possible candidate itemsets
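
Enumerating that candidate space brute-force is easy to write and hopeless
to scale; a sketch with itertools:

from itertools import combinations

items = ["Beer", "Bread", "Coke", "Diaper", "Eggs", "Milk"]

# All 2^d - 1 non-empty candidate itemsets (the lattice minus the empty set)
candidates = [set(c) for k in range(1, len(items) + 1)
              for c in combinations(items, k)]
print(len(candidates))   # 63 = 2**6 - 1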
Illustrating Apriori Principle

● Apriori principle: if an itemset is frequent, then all of its subsets
must also be frequent
– Conversely, if an itemset is found to be infrequent, then all of its
supersets must be infrequent, so they can be pruned from the candidate
lattice without ever counting their support

(The slide's lattice figure marks one itemset as "Found to be Infrequent"
and shades its "Pruned supersets".)
Apriori Algorithm

● Method:

– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
◆ Generate length (k+1) candidate itemsets from length k
frequent itemsets
◆ Prune candidate itemsets containing subsets of length k that
are infrequent
◆ Count the support of each candidate by scanning the DB
◆ Eliminate candidates that are infrequent, leaving only those
that are frequent
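
A compact, runnable sketch of this loop (Python; the function and variable
names are illustrative, and minsup is an absolute support count as in the
slides), reusing the transactions list from earlier:

from itertools import combinations

def apriori(transactions, minsup):
    # k = 1: count single items and keep the frequent ones
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Generate length-(k+1) candidates from length-k frequent itemsets
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune candidates that contain an infrequent length-k subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k))}
        # Count support by scanning the DB, then eliminate infrequent candidates
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= minsup}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

for itemset, count in apriori(transactions, minsup=3).items():
    print(sorted(itemset), count)
# prints 4 frequent items and 4 frequent pairs,
# e.g. ['Bread', 'Milk'] 3 (order may vary)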



Rule Generation

● The set of frequent itemsets is L = L1 ∪ L2, the union of the frequent
1-itemsets and the frequent 2-itemsets

● Deriving strong rules
– Consider a frequent 2-itemset, {Bread, Milk}
– First, identify all of its non-empty proper subsets: {Bread}, {Milk}
– For each subset, form a rule:
◆ {Bread} → {Milk}
◆ {Milk} → {Bread}
● To determine which rules are strong, compute the confidence of each:
– Rule 1: {Bread} → {Milk}: 3/4, or 75%
– Rule 2: {Milk} → {Bread}: 3/4, or 75%
● If the confidence is greater than the threshold, the rule is strong
(assuming a threshold of 60%, both rules are strong)
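
A sketch of this rule-generation step (illustrative Python, reusing
support_count and transactions from earlier): enumerate every non-empty
proper subset X of a frequent itemset F and keep the rules X → F − X whose
confidence clears minconf.

from itertools import combinations

def generate_rules(freq_itemset, transactions, minconf):
    # Yield (X, Y, confidence) for each strong rule X -> Y, where Y = F - X
    items = sorted(freq_itemset)
    total = support_count(set(items), transactions)
    for k in range(1, len(items)):
        for lhs in combinations(items, k):
            X = set(lhs)
            conf = total / support_count(X, transactions)
            if conf >= minconf:
                yield X, set(items) - X, conf

for X, Y, conf in generate_rules({"Bread", "Milk"}, transactions, minconf=0.6):
    print(sorted(X), "->", sorted(Y), round(conf, 2))
# ['Bread'] -> ['Milk'] 0.75
# ['Milk'] -> ['Bread'] 0.75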



Apriori example

Minimum support: 3
● Step 1: Start from the transaction data in the database
● Step 2: Calculate the support (frequency) of every individual item
● Step 3: Discard the items whose support is less than 3
● Step 4: Combine the surviving items into pairs
● Step 5: Calculate the support of each two-item set
● Step 6: Discard the two-item sets whose support is less than 3
Contd…

● Step 7: Combine three items and calculate their support
● Step 8: Discard the itemsets whose support is less than 3. All itemsets
are excluded except {Eggs, Cold drink}, because this itemset has a
support of 3
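
The slide's actual transaction table is not reproduced in this extract, so
here is a hypothetical run of the apriori sketch from earlier on made-up
baskets (the data below is illustrative, chosen only so that
{Eggs, Cold drink} ends up with support 3):

baskets = [
    {"Eggs", "Cold drink", "Bread"},
    {"Eggs", "Cold drink", "Milk"},
    {"Eggs", "Cold drink"},
    {"Bread", "Milk"},
]
for itemset, count in apriori(baskets, minsup=3).items():
    print(sorted(itemset), count)
# ['Cold drink'] 3
# ['Eggs'] 3
# ['Cold drink', 'Eggs'] 3   (order may vary)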

