Association Rule (AR):
An implication X → Y, where X, Y ⊆ I (I is the set of all items) and X ∩ Y = ∅.

Support(X → Y) = |X ∪ Y| / |D|
               = (number of transactions containing both X and Y) / (number of transactions in the dataset)

Confidence(X → Y) = (number of transactions containing both X and Y) / (number of transactions containing X)
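The two definitions above can be sketched in a few lines of Python. This is only an illustration: the helper names (`support_count`, `support`, `confidence`) and the four baskets are hypothetical and do not come from the exercises.

```python
from typing import Iterable, List, Set


def support_count(transactions: List[Set[str]], itemset: Iterable[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(transactions: List[Set[str]], x: Iterable[str], y: Iterable[str]) -> float:
    """Support(X -> Y): share of all transactions containing both X and Y."""
    return support_count(transactions, set(x) | set(y)) / len(transactions)


def confidence(transactions: List[Set[str]], x: Iterable[str], y: Iterable[str]) -> float:
    """Confidence(X -> Y): share of the transactions containing X that also contain Y."""
    return support_count(transactions, set(x) | set(y)) / support_count(transactions, x)


# Hypothetical baskets, made up purely for illustration.
baskets = [{"pen", "ink"}, {"pen", "ink", "paper"}, {"pen"}, {"paper"}]

print(support(baskets, {"pen"}, {"ink"}))     # 2 of the 4 baskets contain pen and ink
print(confidence(baskets, {"pen"}, {"ink"}))  # 2 of the 3 baskets with pen also contain ink
```

Note that support is symmetric in X and Y, while confidence depends on which side is the antecedent.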
1. Compute the support and confidence of A → D, B → C, and A → E.
Solution:
Items: {A,B,C,D,E}
# of all transactions: 5
Support A→D = (# AD) / (# of all transactions) = 2/5
Confidence A→D = (# AD) / (# of transactions containing A) = 2/3
Support B→C = 3/5, Confidence B→C = 3/3
Support A→E = 1/5, Confidence A→E = 1/3
2. Consider the data set shown in Table (a). Compute the support and
confidence of the rules in Table (b), treating each transaction ID as a
market basket.
Rule
Bread, Jam ⇒ Milk
Milk, Jam ⇒ Bread
Bread ⇒ Milk, Jam
Jam ⇒ Bread, Milk
Milk ⇒ Bread, Jam
Solution:
Rule                 Support (S)   Confidence (C)
Bread, Jam ⇒ Milk    3/8           3/4
Milk, Jam ⇒ Bread    3/8           3/4
Bread ⇒ Milk, Jam    3/8           3/4
Jam ⇒ Bread, Milk    3/8           3/5
Milk ⇒ Bread, Jam    3/8           3/6
3. Consider the following dataset. Find the frequent itemsets and generate
association rules for them, given a minimum support count of 2 and a
minimum confidence of 60%.
Step-1: K=1
(I) Create a table containing the support count of each item present in the
dataset, called C1 (candidate set).
(II) Remove every item whose support count is less than the minimum support
count (min_sup_count = 2). The remaining items form L1.
Step-2: K=2
Generate candidate set C2 using L1 and find the support count of each itemset.
Remove every itemset whose support count is less than the minimum support
count. The remaining itemsets form L2.
Step-3: K=3
Generate candidate set C3 using L2 and find the support count of each itemset.
Remove every itemset whose support count is less than the minimum support
count. The remaining itemsets form L3.
Step-4: K=4
Generate candidate set C4 using L3
{I1, I2, I3, I5} => count = 1 => not frequent
No itemset in C4 is frequent, so L4 is empty.
Stop.
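The "generate C(k) using L(k-1)" step can be sketched as a join followed by a prune. Two assumptions are made here. First, the set `l2` below is a hypothetical L2, because the exercise's tables are not reproduced in this text. Second, the sketch also discards any candidate that has an infrequent (k-1)-subset (the Apriori property), which is a refinement the steps above do not spell out. The function name `generate_candidates` is illustrative.

```python
from itertools import combinations
from typing import FrozenSet, Set


def generate_candidates(prev_frequent: Set[FrozenSet[str]], k: int) -> Set[FrozenSet[str]]:
    """Build C_k from L_(k-1): join pairs of itemsets, then prune."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) != k:
                continue  # the pair does not join into a k-itemset
            # Prune: every (k-1)-subset of a frequent k-itemset must itself be frequent.
            if all(frozenset(s) in prev_frequent for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical L2, assumed for illustration only.
l2 = {
    frozenset(pair)
    for pair in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                 ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
}

c3 = generate_candidates(l2, 3)
print(sorted(sorted(c) for c in c3))
# {I1, I3, I5} is dropped, for example, because {I3, I5} is not in l2.
```

Pruning before counting saves a pass over the transactions for candidates that cannot be frequent. Without the prune, such candidates are still eliminated later when their support count falls below the threshold.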
Taking one frequent itemset as an example, we show how the rules are generated.
Itemset {I1, I2, I3} // from L3
So the rules can be:
[I1^I2]=>[I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100 = 50%
[I1^I3]=>[I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100 = 50%
[I2^I3]=>[I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100 = 50%
[I1]=>[I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 ≈ 33.3%
[I2]=>[I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 ≈ 28.6%
[I3]=>[I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 ≈ 33.3%
With the required minimum confidence of 60%, none of these rules is strong.
If the minimum confidence were 50%, the first 3 rules would be considered strong
association rules.
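The rule enumeration for {I1, I2, I3} can be reproduced with a short script. The support counts in `sup` are the ones quoted in the confidence calculations above. The function names (`rules_from`, `strong_rules`) are illustrative. The sketch applies both the 60% threshold from the problem statement and a 50% threshold for comparison.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]

# Support counts taken from the confidence calculations in the text.
sup: Dict[FrozenSet[str], int] = {
    frozenset({"I1"}): 6,
    frozenset({"I2"}): 7,
    frozenset({"I3"}): 6,
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I3"}): 4,
    frozenset({"I2", "I3"}): 4,
    frozenset({"I1", "I2", "I3"}): 2,
}


def rules_from(itemset: FrozenSet[str], sup: Dict[FrozenSet[str], int]) -> List[Rule]:
    """All rules lhs => rhs with lhs a non-empty proper subset of `itemset`."""
    rules = []
    for size in range(len(itemset) - 1, 0, -1):
        for lhs_items in combinations(sorted(itemset), size):
            lhs = frozenset(lhs_items)
            rhs = itemset - lhs
            rules.append((lhs, rhs, sup[itemset] / sup[lhs]))
    return rules


def strong_rules(rules: List[Rule], min_conf: float) -> List[Rule]:
    """Keep only the rules whose confidence reaches `min_conf`."""
    return [rule for rule in rules if rule[2] >= min_conf]


rules = rules_from(frozenset({"I1", "I2", "I3"}), sup)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"{conf:.1%}")

print("strong at 60%:", len(strong_rules(rules, 0.60)))
print("strong at 50%:", len(strong_rules(rules, 0.50)))
```

An itemset with n items yields 2^n - 2 candidate rules, which is 6 for this 3-itemset.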
4. Consider the data set from a supermarket shown in the table. Apply the
Apriori algorithm to the transactions and identify all frequent
k-itemsets (Min_sup = 20%).
T_ID   List of Items
1      Milk, Bread, Eggs
2      Bread, Sugar
3      Bread, Cheese
4      Milk, Bread, Sugar
5      Milk, Cheese
6      Bread, Cheese
7      Milk, Cheese
8      Milk, Bread, Cheese, Eggs
9      Milk, Bread, Cheese
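One possible way to work through this exercise is the level-wise sketch below, which follows the count, filter, and join steps without the optional subset pruning. The transactions are the nine rows of the table. With 9 transactions, Min_sup = 20% corresponds to a support count of at least 2 (20% of 9 is 1.8). The function name `apriori` and its signature are illustrative and are not taken from any library.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

transactions: List[Set[str]] = [
    {"Milk", "Bread", "Eggs"},
    {"Bread", "Sugar"},
    {"Bread", "Cheese"},
    {"Milk", "Bread", "Sugar"},
    {"Milk", "Cheese"},
    {"Bread", "Cheese"},
    {"Milk", "Cheese"},
    {"Milk", "Bread", "Cheese", "Eggs"},
    {"Milk", "Bread", "Cheese"},
]


def apriori(transactions: Iterable[Set[str]], min_sup: float) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count (level-wise search)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # C1: one candidate per distinct item.
    candidates = {frozenset([item]) for t in baskets for item in t}
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while candidates:
        # Count each candidate, then keep those meeting the minimum support (L_k).
        counts = {c: sum(1 for t in baskets if c <= t) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_sup}
        if not level:
            break
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items (C_k).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
    return frequent


frequent = apriori(transactions, min_sup=0.20)
for itemset, cnt in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), cnt)
```

On this data the search ends at k = 4: the only candidate, {Milk, Bread, Cheese, Eggs}, appears in a single transaction, so the largest frequent itemsets are the two 3-itemsets {Milk, Bread, Cheese} and {Milk, Bread, Eggs}.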