04-Association Rule Mining

Association Rule Mining involves discovering frequent patterns, associations, or correlations among sets of items, commonly applied in market basket analysis. Key concepts include support and confidence, which measure the frequency and reliability of item associations, respectively. The Apriori Algorithm is a popular method for identifying frequent itemsets and generating association rules based on specified minimum support and confidence thresholds.

Association Rule Mining

Outline

◘ What is Association Rule Mining?


◘ Confidence and Support
◘ Market Basket Analysis
◘ Apriori Algorithm
◘ Types of Association Rules
– Single-Dimensional vs. Multi-Dimensional Association Rules
– Multi-Level Association Rules
What is Association Rule Mining?

◘ Association Rule Mining:


– Finding frequent patterns, associations, or correlations among sets of items.

◘ First proposed by Agrawal, Imielinski and Swami [AIS93]

◘ Uncover relationships among data

[Cartoon: in 1995 the “Milk and Diaper sell together!” pattern goes unnoticed; by 1998 the same pattern is being acted on.]
What is Association Rule Mining?

◘ Analyze all relationships between items.

◘ Example: When people buy diapers, they also buy beer 60% of the time.
Association Rule Mining
Market Basket Analysis
POS Transactions

Customer | Items Purchased
1 | OJ, soda
2 | Milk, OJ, window cleaner
3 | OJ
4 | OJ, detergent, soda
5 | Window cleaner, soda

Co-occurrence of Products

               | OJ | Window cleaner | Milk | Soda | Detergent
OJ             | 4  | 1              | 1    | 2    | 1
Window cleaner | 1  | 2              | 1    | 1    | 0
Milk           | 1  | 1              | 1    | 0    | 0
Soda           | 2  | 1              | 0    | 3    | 1
Detergent      | 1  | 0              | 0    | 1    | 2

Simple patterns:
1. OJ and soda are more likely to be purchased together than any other two items.
2. Detergent is never purchased with milk or window cleaner.
Association Rule Example 1

◘ Example:
– 5% of customers buy bread, milk, and cheese together
– 80% of customers who buy bread and milk also buy cheese

Bread, Milk → Cheese [support = 5%, confidence = 80%]


Confidence and Support

I = {i1, i2, …, im}: set of items

D: database of transactions

◘ Association rule: X → Y
where X ⊆ I, Y ⊆ I and X ∩ Y = ∅.

◘ Rule X → Y has a support s in D
if s% of transactions in D contain X ∪ Y.

◘ Rule X → Y has a confidence c in D
if c% of transactions in D that contain X also contain Y.

Supp(X → Y) = Count(X ∪ Y) / |D|
Conf(X → Y) = Count(X ∪ Y) / Count(X)

A rule is reported when:
support ≥ minsup threshold
confidence ≥ minconf threshold
Confidence and Support

◘ Support (s)
– Fraction of transactions that contain both X and Y

Supp(X → Y) = Count(X ∪ Y) / |D|

◘ Confidence (c)
– Measures how often items in Y appear in transactions that contain X

Conf(X → Y) = Count(X ∪ Y) / Count(X)

[Venn diagram: customers who buy diaper, customers who buy beer, and the overlap of customers who buy both]
Association Rule Example 2

TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke

Diaper → Beer
Support = #{Diaper, Beer} / total # of records = 3/5
Confidence = #{Diaper, Beer} / #{Diaper} = 3/4
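The computation above can be sketched in Python. The transactions are transcribed from the table; the function names `support` and `confidence` are illustrative, not from any particular library:

```python
# Transactions from Example 2, transcribed as sets.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(X, Y, db):
    """Fraction of transactions containing X ∪ Y."""
    both = X | Y
    return sum(both <= t for t in db) / len(db)

def confidence(X, Y, db):
    """Among transactions containing X, the fraction that also contain Y."""
    return sum((X | Y) <= t for t in db) / sum(X <= t for t in db)

print(support({"Diaper"}, {"Beer"}, transactions))     # 3/5 = 0.6
print(confidence({"Diaper"}, {"Beer"}, transactions))  # 3/4 = 0.75
```

The `<=` operator on Python sets tests the subset relation, which is exactly the "transaction contains itemset" check in the definitions.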
Association Rule Example 3

1 {cucumber, parsley, onion, tomato, salt, bread}
2 {tomato, cucumber, parsley}
3 {tomato, cucumber, olives, onion, parsley}
4 {tomato, cucumber, onion, bread}
5 {tomato, salt, onion}
6 {bread, cheese}
7 {tomato, cheese, onion, cucumber}
8 {bread, butter}

◘ Supp(tomato → onion) = Count(tomato ∪ onion) / |D| = 5 / 8 = 0.62
◘ Conf(tomato → onion) = Count(tomato ∪ onion) / Count(tomato) = 5 / 6 = 0.83
Association Rule Example 4
Transaction ID | Items Bought
1000 | A, B, C
2000 | A, C
3000 | A, D
4000 | B, E, F

With min. support 50%:

Frequent Itemset | Support
{A} | 75%
{B} | 50%
{C} | 50%
{A,C} | 50%

Rules:
A → C (support 50%, confidence 66.6%)
C → A (support 50%, confidence 100%)
Association Rule Example 5
TransID Items
1 cucumber, parsley, onion, tomato, salt, bread
2 tomato, cucumber, parsley
3 tomato, cucumber, olives, onion, parsley
4 tomato, cucumber, onion, bread
5 tomato, salt, onion
6 bread, cheese
7 tomato, cheese, onion, cucumber
8 bread, butter

◘ Find all itemsets with minimum support 50% (i.e., contained in at least 4 of the 8 transactions):

{bread}
{cucumber}
{onion}
{tomato}
{cucumber, onion}
{cucumber, tomato}
{onion, tomato}
{cucumber, onion, tomato}
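These itemsets can be reproduced by brute force: enumerate every candidate itemset and keep those reaching the support threshold. This exhaustive sketch is only feasible for tiny databases, but it verifies the slide's answer:

```python
from itertools import combinations

# Transactions from Example 5, transcribed as sets.
transactions = [
    {"cucumber", "parsley", "onion", "tomato", "salt", "bread"},
    {"tomato", "cucumber", "parsley"},
    {"tomato", "cucumber", "olives", "onion", "parsley"},
    {"tomato", "cucumber", "onion", "bread"},
    {"tomato", "salt", "onion"},
    {"bread", "cheese"},
    {"tomato", "cheese", "onion", "cucumber"},
    {"bread", "butter"},
]

minsup = 0.5
items = sorted(set().union(*transactions))
frequent = []
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        # Support = fraction of transactions containing the candidate
        s = sum(set(cand) <= t for t in transactions) / len(transactions)
        if s >= minsup:
            frequent.append((set(cand), s))

for itemset, s in frequent:
    print(sorted(itemset), s)   # prints the 8 itemsets listed above
```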
Association Rule Example 7

◘ Assume minsup = 30%, minconf = 80%

t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes

◘ An example frequent itemset:
{Chicken, Clothes, Milk} [sup = 3/7]

◘ Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
Milk → Clothes, Chicken [sup = 3/7, conf = 3/4]
Chicken → Clothes, Milk [sup = 3/7, conf = 3/5]
… …
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
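Rule generation from a frequent itemset can be sketched as follows: try every non-empty proper subset as the rule's left-hand side, and keep the rules meeting minconf. The helper `count` is illustrative:

```python
from itertools import combinations

# Example 7 transactions, transcribed from the slide.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def count(itemset):
    return sum(itemset <= t for t in transactions)

itemset = {"Chicken", "Clothes", "Milk"}
minconf = 0.8
rules = []
# Every non-empty proper subset X of the itemset yields a candidate rule X → itemset \ X
for r in range(1, len(itemset)):
    for lhs in combinations(sorted(itemset), r):
        X = set(lhs)
        conf = count(itemset) / count(X)   # conf(X → Y) = count(X ∪ Y) / count(X)
        if conf >= minconf:
            rules.append((sorted(X), sorted(itemset - X), conf))
            print(f"{sorted(X)} -> {sorted(itemset - X)}  conf = {conf:.2f}")
```

With minconf = 80%, only the three rules with confidence 3/3 survive (those with Clothes on the left-hand side); the 3/4 and 3/5 rules shown above are filtered out.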
Finding Association Rules

◘ A user can ask for rules with minimum support minSup and minimum confidence minConf.

– First, all frequent itemsets with support ≥ minSup are computed.

– Second, rules are generated from the frequent itemsets and checked against minConf.
The Apriori Algorithm
Input
The market basket transaction dataset.
Process
– Determine large (frequent) 1-itemsets.
– Repeat until no new large itemsets are identified:
– Generate (k+1)-length candidate itemsets from length-k large itemsets.
– Prune candidate itemsets that contain a subset that is not large.
– Count the support of each remaining candidate itemset.
– Eliminate candidate itemsets that are small (below minimum support).
Output
Itemsets that are “large”, i.e., that satisfy the minimum support threshold.

Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
The Apriori Algorithm — Example

Minimum Support = 3

Items (1-itemsets):
Item | Count
Bread | 4
Coke | 2
Milk | 4
Beer | 3
Diaper | 4
Eggs | 1

Pairs (2-itemsets):
Itemset | Count
{Bread, Milk} | 3
{Bread, Beer} | 2
{Bread, Diaper} | 3
{Milk, Beer} | 2
{Milk, Diaper} | 3
{Beer, Diaper} | 3

Triplets (3-itemsets):
Itemset | Count
{Bread, Milk, Diaper} | 3
{Milk, Diaper, Beer} | 2
Apriori Example

[Figure: Apriori candidate generation with minimum support = 2, showing 1-itemset pruning, 2-itemset pruning, and the surviving 3-itemset]
Apriori Principle
● Apriori Principle: Candidate Pruning

If {a, b} is infrequent, then all its supersets are infrequent.
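The principle rests on support being anti-monotone: extending an itemset can never increase its support. A quick check on a toy database (items and counts invented for illustration):

```python
from itertools import combinations

# Hypothetical tiny database for illustration.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]

def sup(itemset):
    """Number of transactions containing the itemset."""
    return sum(itemset <= t for t in db)

# Adding any item to any itemset never raises its support count:
items = {"a", "b", "c"}
for k in range(1, 3):
    for sub in combinations(items, k):
        for extra in items - set(sub):
            assert sup(set(sub) | {extra}) <= sup(set(sub))
print("support is anti-monotone on this database")
```

This is why Apriori may safely discard every superset of an infrequent candidate without counting it.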
Applications

◘ Market Basket Analysis


– Input: Basket Data - Collection of records consisting of transaction identifier
and the items bought in a transaction.
– Items: Products
– Goal: To identify items that are bought together

◘ Supermarket Shelf Management


– For example:
Many customers (75%) who buy diapers also buy beer every Friday.

Moving the beer and snacks such as peanuts and pretzels next to the
diapers → increased sales of peanuts and pretzels by more than 27%.
Applications

◘ Text Mining
– Input: documents;
– Items: words in those documents
– Goal: Find words that appear together unusually frequently, i.e. linked concepts.

◘ Web Mining
– Input: Web pages (a web page p)
– Items: Pages that link to p .
– Goal: Pages with many of the same links may be mirrors or about the same topic.

◘ Health and Drug Industry


– What kinds of DNA are sensitive to this new drug?

◘ Other Applications
– Catalog design
– Promotion decision
– Genomic Data
– Customer Profiling
Single-Dimensional vs Multi-Dimensional

◘ Single-Dimensional Rules (a single predicate):
– buys(X, “milk”) ⇒ buys(X, “bread”)

◘ Multi-Dimensional Rules (≥ 2 dimensions or predicates):
– age(X, “30..39”) ∧ income(X, “42..48K”) ⇒ buys(X, “car”) [1%, 75%]

– Inter-dimension association rules (no repeated predicates)
• age(X, “19-25”) ∧ occupation(X, “student”) ⇒ buys(X, “coke”)

– Hybrid-dimension association rules (repeated predicates)
• age(X, “19-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)
Multilevel Association Rules

[Concept hierarchy: all → computer, software, printer, computer accessory; computer → desktop, laptop; software → financial management, educational; printer → color, b/w; computer accessory → wrist pad, mouse; brand level: IBM, Dell, Sony, Toshiba, Microsoft, HP, Ergoway, Logitech, …]

computer → printer [support = 12%, confidence = 70%] (Level 1)
desktop computer → color printer [support = 8%, confidence = 70%] (Level 2)
IBM desktop computer → HP color printer [support = 2%, confidence = 72%] (Level 3)

◘ Reduce Minimum Support: items at lower levels of the hierarchy are rarer, so a smaller minimum support threshold is used as mining descends the hierarchy.
