Association Rule Mining
&
Apriori Algorithm
Binu Jasim
Data Mining (Monsoon Sem-2014)
NIT Calicut
Association Analysis
 To find out Associations b/w Items/Objects
Market Basket Analysis
 One Application of Association Analysis
 Other: Bioinformatics, medical diagnosis etc.
Two key Challenges
 Efficiently mine enormous amounts of data for
association patterns
 Associations occurring due to chance should
be avoided
What is in a Super Market ?
 Items/Products  Lot of them !
- Denoted as I
 Transactions (T )- People Buying different
Items
A single transaction/a bill Contains a list of
Items customer i bought
-Denoted as tr i
Transaction Data
 Items (I): The Set of all items
I = { i1 , i2 , , i }
 Transactions
T = { tr1 , tr2 , , tr }
Transactions
tr  I
i1
i2
i3
i4
i5
tr1
tr2
tr3
tr4
tr5
Item Sets
 Items set is a collection of zero or more Items
 K-Item set contains K Items
 trj contains an item set X
if x  trj
Document Collection as Transaction
Data
 Treat words as Items & Each document as a
transaction
Word1
Word2
D1
D2
D2
:
:
:
Wordr
Images as Transaction Data
 Each Pixel as an Item and an Image as a
Transaction
 So A may be represented as
0111101010100111 -for 4x4 image resolution
Associations
 {A} -> {B} indicates Item B is also bought if
Item A is bought
 {A,B} -> {C} indicates C is bought if Items A &
B are bought together
 So We have X -> Y as associations
where X  Y = 
Total # Associations
 If we have d items
Then total # Associations = 3  2+1 + 1
 Eg: - Items set {A,B}
Possible Associations: {A} -> {B}
& {B} -> {A}
d = 2, so 32  23 + 1 = 2
Total # Associations
 I = {A, B, C}
d=3
Total # Associations = 33  24 + 1 = 12
{A} -> {B, C}
{B, C} -> {A}
{B} -> {A, C}
{A, C} -> {B}
{C} -> {A, B}
{A, B}-> {C}
{A}->{B}
{B}-> {A}
{A}->{C}
{C}-> {A}
{B}->{C}
{C}->{B}
Proof !
 Each item can go into either of the 3 boxes
 Antecedent and Consequent cant be empty
 Which gives 3  2+1 + 1
Support Count ()
 Support count of an Item set X is given as
() = | *trj : x  trj +|
Eg: , 
tr1
tr2
= 2,
= 0
Support & Confidence
 We have the association X-> Y s.t. X  Y = 
 Support(X->Y) =
 Confidence(X->Y) =
Support & Confidence Thresholds
 R: {C,D} -> {E} (Association Rule)
 Support(R) = 3/6 = 0.5 > minsup
 Confidence(R) =  = 0.75 > minconf
Why large support?
 Items people seldom buy can still have large
confidence. Eg:- {1GB usb} -> {headset}
- may give large confidence as support
count of {1 GB usb} is small
- But this transaction {1GB usb, headset}
together is rare, so small support
Why large confidence ?
 Confidence is a better measure than support
to indicate how often items are bought
together.
 Confidence(X->Y) also gives an estimate of
conditional probability of Y given X
Caution!: Correlation doesnt imply
Causation
 Just because a rule X->Y has large support and
large confidence,
X need not be the cause of Y.
It only implies correlation
Association Rule Mining Problem
 Given a set of transactions T, find all the rules
having
support > minsup & confidence > minconf
Eg;- minsup = 20%, minconf = 50%
Obvious:
minconf > minsup
Association Rule Mining
 Brute Force: List all the rules and compute
minsup & minconf
- Computationally expensive: O(3 )
 Better Strategy: Check minsup & minconf
for all the subsets: Still exponential: O(2 )
The Idea
 If {Milk, Bread, Butter} is frequent
then all of its subsets are also frequent
i.e. support =
({Milk,Bread,Butter})/N > minsup
then  ( Milk, Bread /N > minsup
Decomposed into 2 Sub Tasks
 Frequent Item Set Generation: Find all the
item sets which satisfy minsup threshold
 Rule Generation: Extract all the high
confidence rules out of the Frequent Item set
found in step 1