Association Rule Mining
Outline
◘ What is Association Rule Mining?
◘ Confidence and Support
◘ Market Basket Analysis
◘ Apriori Algorithm
◘ Types of Association Rules
– Single-Dimensional vs. Multi-Dimensional Association Rules
– Multi-Level Association Rules
What is Association Rule Mining?
◘ Association Rule Mining:
– Finding frequent patterns, associations, or correlations among sets of items.
◘ First proposed by Agrawal, Imielinski and Swami [AIS93]
◘ Uncover relationships among data
1995 Milk and Zzzz... 1998 Milk and
Diaper sell Diaper sell
together! together!
What is Association Rule Mining?
◘ Analyze all relationships between items.
◘ Example: When people buy diapers they also buy beer 60% of the time
Association Rule Mining
Market Basket Analysis
POS Transactions Co-occurrence of Products
Customer Items Purchased OJ Window Milk Soda Detergent
1 OJ, soda cleaner
2 Milk, OJ, window cleaner OJ 4 1 1 2 1
3 OJ Window cleaner 1 2 1 1 0
4 OJ, detergent, soda Milk 1 1 1 0 0
5 Window cleaner, soda Soda 2 1 0 3 1
Detergent 1 0 0 1 2
Simple patterns:
1. OJ and soda are more likely purchased together
than any other two items
2. Detergent is never purchased with milk or
window cleaner
Association Rule Example 1
◘ Example:
– 5% of customers buy all of them together
– 80% of customers who buy bread and milk also buy cheese
Bread, Milk→ Cheese [support=5%, confidence=80%]
Confidence and Support
I = i1, i2, …, im: set of items
D : database of transactions
◘ Association rule: X Y
here X I, Y I and X Y = .
◘ Rule X Y has a support s in D
if s% of transactions in D contain X Y.
◘ Rule X Y has a confidence c in D
if c% of transactions in D that contain X also contain Y.
Count (X Y) Count (X Y)
Supp (X Y) = Conf (X Y) =
|D| Count (X)
support ≥ minsup threshold
confidence ≥ minconf threshold
Confidence and Support
◘ Support (s)
– Fraction of transactions that contain both X and Y
Count (X Y)
Supp (X Y) =
|D|
◘ Confidence (c)
– Measures how often items in Y appear in transactions that contain X
Count (X Y) Customer
Conf (X Y) = buys both
Customer
Count (X) buys diaper
Customer
buys beer
Association Rule Example 2
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Diaper → Beer
Support = #{Diaper, Beer} / Total # of record = 3/5
Confidence = # {Diaper, Beer}/ # {Diaper} = 3/4
Association Rule Example 3
1 {{cucumber, parsley, onion, tomato, salt, bread},
2 {tomato, cucumber, parsley},
3 {tomato, cucumber, olives, onion, parsley},
4 {tomato, cucumber, onion, bread},
5 {tomato, salt, onion},
6 {bread, cheese}
7 {tomato, cheese, onion, cucumber}
8 {bread, butter}}
Count (X Y)
◘ Supp(tomato onion) = 5 / 8 = 0.62 Supp (X Y) =
|D|
◘ Conf(tomato onion) = 5 / 6 = 0.83 Count (X Y)
Conf (X Y) =
Count (X)
Association Rule Example 4
Transaction ID Items Bought
1000 A,B,C Count (X Y)
Supp (X Y) =
2000 A,C |D|
3000 A,D Count (X Y)
4000 B,E,F Conf (X Y) =
Count (X)
A C (50%, 66.6%)
C A (50%, 100%)
Min. support 50%
Frequent Itemset Support
{A} 75%
{B} 50%
{C} 50%
{A,C} 50%
Association Rule Example 5
TransID Items
1 cucumber, parsley, onion, tomato, salt, bread
2 tomato, cucumber, parsley
3 tomato, cucumber, olives, onion, parsley
4 tomato, cucumber, onion, bread
5 tomato, salt, onion
6 bread, cheese
7 tomato, cheese, onion, cucumber
8 bread, butter
◘ Find all relations with minimum support 50%
{bread}
Count (X Y)
{cucumber} Supp (X Y) =
{onion} |D|
{tomato}
{cucumber, onion}
{cucumber, tomato}
{onion, tomato}
{cucumber, onion, tomato}
Association Rule Example 7
◘ Assume: t1: Beef, Chicken, Milk
minsup = 30% t2: Beef, Cheese
minconf = 80% t3: Cheese, Boots
t4: Beef, Chicken, Cheese
◘ An example frequent itemset: t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
{Chicken, Clothes, Milk} [sup = 3/7]
t7: Chicken, Milk, Clothes
◘ Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
Milk → Clothes, Chicken [sup = 3/7, conf = 3/4]
Chicken → Clothes, Milk [sup = 3/7, conf = 3/5]
… …
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
Finding Association Rules
◘ A user can ask for rules with minimum support minSup and
minimum confidence minConf.
– Firstly, all frequent itemsets with support > minSup are computed.
– Secondly, rules are generated using the frequent itemsets, and
checked for minConf.
The Apriori Algorithm
Input
The market base transaction dataset.
Process
– Determine large 1-itemsets.
– Repeat until no new large 1-itemsets are identified.
– Generate (k+1) length candidate itemsets from length k large itemsets.
– Prune candidate itemsets that are not large.
– Count the support of each candidate itemset.
– Eliminate candidate itemsets that are small.
Output
Itemsets that are “large” and qualify the min support and min confidence
thresholds. Pseudo-code:
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk !=; k++) do begin
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in Ck+1
that are contained in t
Lk+1 = candidates in Ck+1 with min_support
end
return k Lk;
The Apriori Algorithm — Example
Item Count Items (1-itemsets)
Bread 4
Coke 2
Milk 4 Itemset Count Pairs (2-itemsets)
Beer 3 {Bread,Milk} 3
Diaper 4
{Bread,Beer} 2
Eggs 1
{Bread,Diaper} 3
{Milk,Beer} 2
{Milk,Diaper} 3
{Beer,Diaper} 3
Minimum Support = 3
Triplets (3-itemsets)
Itemset Count
{Bread,Milk,Diaper} 3
{Milk,Diaper,Beer} 2
Apriori Example
1-itemset Pruning
2-itemset Pruning
3-itemset
Minimum Support : 2
Apriori Principle
● Apriori Principle: Candidate Pruning
If {a,b} is infrequent,
then all it supersets
are infrequent
Applications
◘ Market Basket Analysis
– Input: Basket Data - Collection of records consisting of transaction identifier
and the items bought in a transaction.
– Items: Products
– Goal: To identify items that are bought together
◘ Supermarket Shelf Management
– For Example:
Many customers (%75) who buy diapers also buy beer on every Friday.
Moved the beer and snacks Increased sales on peanuts and
such as peanuts and pretzels → pretzels by more that 27%
next to the diapers
Applications
◘ Text Mining
– Input: documents;
– Items: words in those documents
– Goal: Find words that appear together unusually frequently, i.e. linked concepts.
◘ Web Mining
– Input: Web pages (a web page p)
– Items: Pages that link to p .
– Goal: Pages with many of the same links may be mirrors or about the same topic.
◘ Healty and Drug Industry
– What kinds of DNA are sensitive to this new drug?
◘ Other Applications
– Catalog design
– Promotion decision
– Genomic Data
– Customer Profiling
Single-Dimensional vs Multi-Dimensional
◘ Single-Dimensional Rules:
– buys(X, “milk”) buys(X, “bread”)
◘ Multi-Dimensional rules:
2 dimensions or predicates
– age(x, “30..39”) ^ income(x, “42..48K”) → buys(x, “car”) [1%, 75%]
– Inter-dimension assoc. rules (no repeated predicates)
• age(X,”19-25”) occupation(X,“student”) buys(X,“coke”)
– hybrid-dimension assoc. rules (repeated predicates)
• age(X,”19-25”) buys(X, “popcorn”) buys(X, “coke”)
Multilevel Association Rules
all
computer
computer software printer
accessory
financial
desktop laptop educational color b/w wrist pad mouse
management
IBM Dell Sony Toshiba Microsoft … … HP … Sony … Ergoway Logitech
computer → printer [support=12%, confidence = 70%] (Level1)
desktop computer → color printer [support=8%, confidence = 70%] (Level2)
IBM desktop computer → HP color printer [support=2%, confidence=72%] (Level3)
Reduce Minimum Support