Data Mining for Market Analysis

The document discusses association rule mining and the Apriori algorithm. It describes how association rule mining can be used to find relationships between items in transactional data, like products purchased together in market basket analysis. The key challenges are efficiently mining large amounts of data to find these relationships while avoiding coincidental associations. It then introduces some concepts needed for association rule mining, like support count, support and confidence thresholds. Finally, it explains how the Apriori algorithm addresses the computation challenges by decomposing the problem into first finding frequent itemsets that satisfy a minimum support threshold before generating rules from those itemsets.

Uploaded by Eric Gardner

Association Rule Mining & Apriori Algorithm
Binu Jasim
Data Mining (Monsoon Sem-2014)
NIT Calicut

Association Analysis
Finds associations between items/objects.

Market Basket Analysis: one application of association analysis.
Other applications: bioinformatics, medical diagnosis, etc.

Two Key Challenges

- Efficiently mining enormous amounts of data for association patterns
- Avoiding associations that occur purely by chance

What is in a Supermarket?

Items/Products: lots of them!
- The set of all items is denoted as $I$

Transactions ($T$): people buying different items
- A single transaction (a bill) contains the list of items that customer $i$ bought
- Denoted as $tr_i$

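Transaction Data
Items ($I$): the set of all items

$I = \{ i_1, i_2, \ldots, i_d \}$

Transactions ($T$): the set of all transactions

$T = \{ tr_1, tr_2, \ldots, tr_N \}$

Each transaction is a subset of the items: $tr_j \subseteq I$

[Table: binary transaction matrix with rows tr1 to tr5 and columns i1 to i5; cell values not recoverable]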

Item Sets
An item set is a collection of zero or more items.

A k-item set contains k items.

A transaction $tr_j$ contains an item set $X$ if $X \subseteq tr_j$.
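The definitions above map naturally onto Python sets. The following is a minimal sketch, assuming made-up item names and baskets (none of this data comes from the slides); `contains` is a hypothetical helper name.

```python
# Hypothetical example data: item names and baskets are invented for illustration.
transactions = {
    "tr1": {"bread", "milk"},
    "tr2": {"bread", "butter", "eggs"},
    "tr3": {"milk", "butter"},
}

# The item universe I is the union of all transactions.
items = set().union(*transactions.values())

def contains(transaction, itemset):
    """Return True if the item set X is a subset of the transaction."""
    return set(itemset) <= set(transaction)

k_itemset = {"bread", "butter"}  # a 2-item set (k = 2)

print(len(k_itemset))                            # 2
print(contains(transactions["tr2"], k_itemset))  # True
print(contains(transactions["tr1"], k_itemset))  # False
print(contains(transactions["tr1"], set()))      # True: the empty item set is in every transaction
```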

Document Collection as Transaction Data
Treat words as items and each document as a transaction.

[Table: document-by-word matrix with rows D1, D2, ... and columns Word1, Word2, ..., Wordr; cell values not recoverable]
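One possible way to build such a document-by-word matrix is sketched below. The two documents are invented for illustration, and splitting on whitespace is an assumed (very crude) tokenizer; `to_transaction` is a hypothetical helper name.

```python
# Hypothetical documents; the text is invented for illustration.
documents = {
    "D1": "data mining finds patterns in data",
    "D2": "association rules describe patterns",
}

def to_transaction(text):
    """Lower-case, split on whitespace, and keep each distinct word once."""
    return set(text.lower().split())

doc_transactions = {name: to_transaction(text) for name, text in documents.items()}

# The vocabulary plays the role of the item set I.
vocabulary = sorted(set().union(*doc_transactions.values()))

# Binary document-by-word matrix: 1 if the word occurs in the document.
matrix = {
    name: [int(word in words) for word in vocabulary]
    for name, words in doc_transactions.items()
}

print(vocabulary)
for name, row in matrix.items():
    print(name, row)
```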

Images as Transaction Data
Treat each pixel as an item and each image as a transaction.

So A may be represented as
0111101010100111 (for a 4x4 image resolution)
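As a rough sketch, the bit string above can be turned into a transaction whose items are the "on" pixels. Reading the bits row by row, left to right and top to bottom, is an assumption; the slide does not state the pixel order.

```python
bits = "0111101010100111"  # the 16-bit string from the slide
width = 4                  # 4x4 image resolution

# Assumption: bits are stored row by row, left to right, top to bottom.
rows = [bits[r * width:(r + 1) * width] for r in range(len(bits) // width)]

# The transaction is the set of pixel positions (row, col) whose bit is 1.
transaction = {
    (r, c)
    for r, row in enumerate(rows)
    for c, bit in enumerate(row)
    if bit == "1"
}

for row in rows:
    print(row)
print(len(transaction))  # number of "on" pixels, i.e. items in this transaction
```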

Associations
{A} -> {B} indicates that item B is also bought if item A is bought.
{A, B} -> {C} indicates that C is bought if items A and B are bought together.

So associations have the form X -> Y, where $X \cap Y = \emptyset$.

Total # Associations
If we have d items, then

total # associations = $3^d - 2^{d+1} + 1$

Eg: item set {A, B}
Possible associations: {A} -> {B} and {B} -> {A}
d = 2, so $3^2 - 2^3 + 1 = 2$

Total # Associations
I = {A, B, C}, d = 3
Total # associations = $3^3 - 2^4 + 1 = 12$

{A} -> {B, C}    {B, C} -> {A}
{B} -> {A, C}    {A, C} -> {B}
{C} -> {A, B}    {A, B} -> {C}
{A} -> {B}       {B} -> {A}
{A} -> {C}       {C} -> {A}
{B} -> {C}       {C} -> {B}

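Proof!
Each item can go into any one of 3 boxes (antecedent, consequent, or neither), which gives $3^d$ assignments.

The antecedent and consequent can't be empty: subtract the $2^d$ assignments with an empty antecedent and the $2^d$ assignments with an empty consequent, then add back the 1 assignment (both empty) that was subtracted twice.

Which gives $3^d - 2^d - 2^d + 1 = 3^d - 2^{d+1} + 1$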
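The counting argument can be checked by brute force for small d. This is a minimal sketch: `enumerate_rules` and `rule_count` are hypothetical helper names, and the three boxes are encoded as 0 (antecedent), 1 (consequent) and 2 (neither).

```python
from itertools import product

def enumerate_rules(items):
    """List every rule X -> Y with X, Y non-empty and disjoint."""
    items = sorted(items)
    rules = []
    # Each item is dropped into one of 3 boxes: 0 = X, 1 = Y, 2 = neither.
    for boxes in product(range(3), repeat=len(items)):
        x = frozenset(i for i, b in zip(items, boxes) if b == 0)
        y = frozenset(i for i, b in zip(items, boxes) if b == 1)
        if x and y:  # antecedent and consequent can't be empty
            rules.append((x, y))
    return rules

def rule_count(d):
    """Closed form from the slide: 3^d - 2^(d+1) + 1."""
    return 3**d - 2**(d + 1) + 1

# The enumeration and the closed form agree for small d.
for d in range(1, 7):
    assert len(enumerate_rules(range(d))) == rule_count(d)

print(len(enumerate_rules("AB")))   # 2
print(len(enumerate_rules("ABC")))  # 12
```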

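Support Count ($\sigma$)
The support count of an item set X is the number of transactions that contain it:

$\sigma(X) = |\{ tr_j : X \subseteq tr_j \}|$

Eg: [example table and item sets not recoverable; the two support counts shown were 2 and 0]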
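A support count is a single pass over the transactions. The sketch below uses invented transactions (the slide's own example table is not available), and `support_count` is a hypothetical helper name.

```python
# Hypothetical transactions, invented for illustration.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "D"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the item set."""
    itemset = set(itemset)
    return sum(1 for tr in transactions if itemset <= tr)

print(support_count({"A", "C"}, transactions))  # 2
print(support_count({"A", "D"}, transactions))  # 0
print(support_count(set(), transactions))       # 3: the empty set is in every transaction
```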

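Support & Confidence
We have the association X -> Y s.t. $X \cap Y = \emptyset$, and N transactions in total.

Support(X -> Y) = $\sigma(X \cup Y) / N$

Confidence(X -> Y) = $\sigma(X \cup Y) / \sigma(X)$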

Support & Confidence Thresholds
R: {C, D} -> {E} (association rule)

Support(R) = 3/6 = 0.5 > minsup

Confidence(R) = 3/4 = 0.75 > minconf
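The two measures for the rule {C, D} -> {E} can be computed as follows. The six transactions are hypothetical, chosen only so that the numbers match the slide (support 3/6 and confidence 0.75); the slide's actual table is not available. The thresholds are likewise illustrative assumptions.

```python
# Hypothetical transactions, constructed so that {C, D, E} occurs 3 times
# and {C, D} occurs 4 times out of 6.
transactions = [
    {"A", "C", "D", "E"},
    {"C", "D", "E"},
    {"B", "C", "D", "E"},
    {"C", "D"},
    {"A", "B"},
    {"A", "E"},
]

def sigma(itemset):
    """Support count: transactions containing the whole item set."""
    return sum(1 for tr in transactions if itemset <= tr)

X, Y = {"C", "D"}, {"E"}
N = len(transactions)

support = sigma(X | Y) / N            # fraction of all transactions with X and Y
confidence = sigma(X | Y) / sigma(X)  # fraction of X-transactions that also have Y

MINSUP, MINCONF = 0.2, 0.5  # illustrative thresholds (assumed)
print(support, support > MINSUP)        # 0.5 True
print(confidence, confidence > MINCONF) # 0.75 True
```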

Why large support?
Rules about items people seldom buy can still have large confidence. Eg: {1GB usb} -> {headset}
- may have large confidence because the support count of {1GB usb} is small
- but the combination {1GB usb, headset} is rare, so the rule has small support

Why large confidence?
Confidence is a better measure than support of how often the items in Y are bought in transactions that contain X.
Confidence(X -> Y) also gives an estimate of the conditional probability of Y given X.

Caution! Correlation doesn't imply causation
Just because a rule X -> Y has large support and large confidence, X need not be the cause of Y.
It only implies correlation.

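Association Rule Mining Problem
Given a set of transactions T, find all the rules having
support > minsup and confidence > minconf
Eg: minsup = 20%, minconf = 50%
Obvious: minconf > minsup. A rule's confidence is never smaller than its support (the support count of X is at most the number of transactions), so a minconf at or below minsup would filter out nothing.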

Association Rule Mining
Brute force: list all the rules and compute the support and confidence of each
- Computationally expensive: $O(3^d)$
Better strategy: check minsup and minconf over all the subsets of I (item sets) rather than over all the rules
- Still exponential: $O(2^d)$
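The brute-force approach can be sketched directly: enumerate every rule, compute its support and confidence, and keep those above both thresholds. The transactions are invented for illustration, and the thresholds reuse the example values minsup = 20% and minconf = 50%. The number of candidate rules grows as 3^d, which is why this only works for tiny item sets.

```python
from itertools import product

# Hypothetical transactions, invented for illustration.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]
MINSUP, MINCONF = 0.2, 0.5

def sigma(itemset):
    """Support count of an item set."""
    return sum(1 for tr in transactions if itemset <= tr)

items = sorted(set().union(*transactions))
n = len(transactions)

candidates = 0
strong_rules = []
# Every item goes to the antecedent (0), the consequent (1), or neither (2).
for boxes in product(range(3), repeat=len(items)):
    x = frozenset(i for i, b in zip(items, boxes) if b == 0)
    y = frozenset(i for i, b in zip(items, boxes) if b == 1)
    if not x or not y:
        continue
    candidates += 1
    sup = sigma(x | y) / n
    conf = sigma(x | y) / sigma(x) if sigma(x) else 0.0
    if sup > MINSUP and conf > MINCONF:
        strong_rules.append((set(x), set(y), sup, conf))

print(candidates)         # 12 candidate rules for d = 3
print(len(strong_rules))  # rules passing both thresholds
```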

The Idea
If {Milk, Bread, Butter} is frequent, then all of its subsets are also frequent,

i.e. if support = $\sigma(\{\text{Milk}, \text{Bread}, \text{Butter}\}) / N > \text{minsup}$
then $\sigma(\{\text{Milk}, \text{Bread}\}) / N > \text{minsup}$
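The reason is that every transaction containing the full item set also contains each of its subsets, so a subset's support can only be equal or larger. The sketch below checks this on invented transactions; `support` and `proper_subsets` are hypothetical helper names.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]
MINSUP = 0.2

def support(itemset):
    """Fraction of transactions that contain the item set."""
    return sum(1 for tr in transactions if itemset <= tr) / len(transactions)

def proper_subsets(itemset):
    """Yield every non-empty proper subset of the item set."""
    members = sorted(itemset)
    for k in range(1, len(members)):
        for combo in combinations(members, k):
            yield frozenset(combo)

full = frozenset({"milk", "bread", "butter"})
print(sorted(full), support(full))

for sub in proper_subsets(full):
    # A subset is contained in at least as many transactions as the full set.
    assert support(sub) >= support(full)
    print(sorted(sub), support(sub), support(sub) > MINSUP)
```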

Decomposed into 2 Sub-Tasks

1. Frequent item set generation: find all the item sets that satisfy the minsup threshold
2. Rule generation: extract all the high-confidence rules from the frequent item sets found in step 1
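The two sub-tasks can be sketched as two functions. This is a simplified level-wise search written for illustration, not necessarily the exact Apriori candidate generation covered later in the course: it grows item sets one item at a time and never extends an infrequent set. The transactions and the function names `frequent_itemsets` and `generate_rules` are assumptions.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]

def frequent_itemsets(transactions, minsup):
    """Step 1: map each frequent item set to its support count."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        counts = {c: sum(1 for tr in transactions if c <= tr) for c in level}
        survivors = {c: k for c, k in counts.items() if k / n > minsup}
        frequent.update(survivors)
        # Extend only the surviving sets; supersets of infrequent sets are skipped.
        level = list({c | {i} for c in survivors for i in items if i not in c})
    return frequent

def generate_rules(frequent, minconf):
    """Step 2: split each frequent item set into X -> Y and keep confident rules."""
    rules = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                x = frozenset(combo)
                conf = count / frequent[x]  # x is frequent because itemset is
                if conf > minconf:
                    rules.append((set(x), set(itemset - x), conf))
    return rules

freq = frequent_itemsets(transactions, minsup=0.2)
rules = generate_rules(freq, minconf=0.5)
print(len(freq), len(rules))
```

Step 2 never rescans the transactions: every support count it needs was already stored in step 1.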
