Data Mining
Association Analysis: Basic Concepts
and Algorithms
What Are Patterns?
– Patterns: sets of items, subsequences, or substructures
that occur frequently together (or are strongly correlated) in
a data set
– Patterns represent intrinsic and important properties of
data sets
Examples: frequent itemsets, frequent sequences, frequent structures
What Is Pattern Discovery?
Pattern discovery: Uncovering patterns from
massive data sets
It can answer questions such as:
– What products were often purchased together?
– What are the subsequent purchases after buying an
iPad?
Association Rule Mining
Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
An association rule consists of two parts: an antecedent (if) and a consequent
(then).
Market-Basket transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Association Rules
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
Basic Concepts: Transactional
Database
Transactional Database (TDB)
– Each transaction is associated with an identifier, called
a TID.
– May also have counts associated with each item sold
Tid Items bought
1 Beer, Nuts, Diaper
2 Beer, Coffee, Diaper
3 Beer, Diaper, Eggs
4 Nuts, Eggs, Milk
5 Nuts, Coffee, Diaper, Eggs, Milk
Definition: Frequent Itemset
Itemset
– A collection of one or more items
◆ Example: {Milk, Bread, Diaper}
– k-itemset
◆ An itemset that contains k items
Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
Relative support (s)
– The fraction of transactions that contains X (i.e., the
probability that a transaction contains X)
– E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
– An itemset whose support is greater than or equal to a
minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
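The definitions of support count σ(X) and relative support s(X) can be sketched in a few lines of Python, using the five market-basket transactions from the table above (function names here are my own, not from the slides):

```python
def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Relative support s(X) = sigma(X) / |T|."""
    return support_count(itemset, transactions) / len(transactions)

# The five market-basket transactions from the table
T = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

print(support_count({"Milk", "Bread", "Diaper"}, T))  # sigma = 2
print(support({"Milk", "Bread", "Diaper"}, T))        # s = 2/5 = 0.4
```

With a minsup threshold of, say, 0.4, {Milk, Bread, Diaper} would qualify as frequent.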
Rule Measures: Support and Confidence
Find all the rules X ∧ Y ⇒ Z with minimum confidence and support
– support, s, probability that a transaction contains {X ∪ Y ∪ Z}
– confidence, c, conditional probability that a transaction having {X ∪ Y} also contains Z

[Venn diagram: customers who buy diapers, customers who buy beer, and the overlap of customers who buy both]

Transaction ID  Items Bought
2000            A, B, C
1000            A, C
4000            A, D
5000            B, E, F

With minimum support 50% and minimum confidence 50%, we have
– A ⇒ C (50%, 66.6%)
– C ⇒ A (50%, 100%)
Definition: Association Rule
Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Rule Evaluation Metrics
– Support (s)
◆ Fraction of transactions that contain both X and Y
– Confidence (c)
◆ Measures how often items in Y appear in transactions that contain X
◆ Confidence is the conditional probability of Y given that X has occurred

Example: {Milk, Diaper} → {Beer}
s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
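The two metrics for a rule X → Y can be computed together in a minimal Python sketch over the same five transactions (function names are my own):

```python
def support_count(itemset, transactions):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(X, Y, transactions):
    """Return (support, confidence) of the rule X -> Y."""
    both = support_count(X | Y, transactions)
    s = both / len(transactions)               # s = sigma(X u Y) / |T|
    c = both / support_count(X, transactions)  # c = sigma(X u Y) / sigma(X)
    return s, c

T = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, T)
print(round(s, 2), round(c, 2))  # 0.4 0.67, matching the slide's example
```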
Mining Association Rules—An Example
Min. support 50%
Min. confidence 50%

Transaction ID  Items Bought
2000            A, B, C
1000            A, C
4000            A, D
5000            B, E, F

Frequent Itemset  Support
{A}               75%
{B}               50%
{C}               50%
{A,C}             50%

For rule A ⇒ C:
support = P(A ∪ C) = 50%
confidence = P(C|A) = 66.6%
Association Rule Mining Task
Given a set of transactions T, the goal of
association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf
thresholds
Computationally prohibitive!
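The brute-force approach just described can be sketched directly, to make its cost concrete: enumerate every itemset, then every binary partition of it as a candidate rule, and keep only the rules that clear both thresholds (a minimal sketch; names are my own, and it is only feasible for tiny datasets):

```python
from itertools import combinations

def brute_force(transactions, minsup, minconf):
    """List every candidate rule X -> Y, compute s and c for each,
    and keep the survivors. Exponential in the number of items."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    sigma = lambda X: sum(1 for t in transactions if set(X) <= t)
    rules = []
    for k in range(2, len(items) + 1):          # every itemset of size >= 2
        for itemset in combinations(items, k):
            s = sigma(itemset) / n
            for r in range(1, k):               # every binary partition
                for X in combinations(itemset, r):
                    sx = sigma(X)
                    if sx and s >= minsup and sigma(itemset) / sx >= minconf:
                        rules.append((frozenset(X),
                                      frozenset(itemset) - frozenset(X)))
    return rules

T = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

found = brute_force(T, minsup=0.4, minconf=0.6)
print((frozenset({"Milk", "Diaper"}), frozenset({"Beer"})) in found)  # True
```

With only 6 distinct items this is fast, but the itemset loop alone grows as 2^d, which is exactly why the slides call the approach prohibitive.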
Mining Association Rules

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Rules:
{Milk,Diaper} → {Beer} (s=0.4, c=0.67)
{Milk,Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper,Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk,Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk,Beer} (s=0.4, c=0.5)
{Milk} → {Diaper,Beer} (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements
Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset
Frequent itemset generation is still computationally expensive
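Step 2 above can be sketched on its own: since every binary partition of a frequent itemset has the same support, only confidence needs checking here (a minimal Python sketch under that observation; names are my own):

```python
from itertools import combinations

def rules_from_itemset(freq_itemset, transactions, minconf):
    """Emit each binary partition X -> (itemset - X) of a frequent
    itemset whose confidence sigma(itemset)/sigma(X) meets minconf."""
    items = frozenset(freq_itemset)
    sigma = lambda X: sum(1 for t in transactions if X <= t)
    whole = sigma(items)
    out = []
    for r in range(1, len(items)):
        for X in map(frozenset, combinations(items, r)):
            if whole / sigma(X) >= minconf:
                out.append((set(X), set(items - X)))
    return out

T = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

rules = rules_from_itemset({"Milk", "Diaper", "Beer"}, T, minconf=0.6)
print(len(rules))  # 4 of the 6 binary partitions clear c >= 0.6
```

This reproduces the six-rule table from the previous slide: the four rules with c ≥ 0.6 survive, the two with c = 0.5 do not.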
Frequent Itemset Generation

[Itemset lattice for d = 5 items: the empty set (null) at the top; below it the 1-itemsets A–E, the 2-itemsets AB–DE, the 3-itemsets ABC–CDE, the 4-itemsets ABCD–BCDE, and ABCDE at the bottom]

Given d items, there are 2^d possible candidate itemsets
Frequent Itemset Generation
Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the
database
Transactions (N)                 List of M candidates
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

– Match each transaction (of width w) against every candidate
– Complexity ~ O(NMw) => expensive, since M = 2^d !!!
Reducing Number of Candidates
Apriori principle:
– If an itemset is frequent, then all of its subsets must also
be frequent
– uses prior knowledge of frequent itemset properties.
Apriori principle holds due to the following property
of the support measure:
∀X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y)
– Support of an itemset never exceeds the support of its
subsets
– This is known as the anti-monotone property of support
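The anti-monotone property can be checked numerically on the running market-basket example (a small sketch; the helper name is my own):

```python
from itertools import combinations

def support_count(X, transactions):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for t in transactions if set(X) <= t)

T = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

# Anti-monotonicity: every proper subset of an itemset
# is at least as frequent as the itemset itself.
items = ("Milk", "Diaper", "Beer")
whole = support_count(items, T)
ok = all(support_count(sub, T) >= whole
         for r in range(1, len(items))
         for sub in combinations(items, r))
print(whole, ok)  # 2 True
```

This is the property Apriori exploits in reverse: once a subset is known to be infrequent, no superset can reach minsup.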
Illustrating Apriori Principle
Lattice structure of frequent itemsets

[Itemset lattice for items A–E: once an itemset (e.g., AB) is found to be infrequent, all of its supersets are pruned from the search]
Illustrating Apriori Principle
Minimum Support = 3

Items (1-itemsets)
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets) — no need to generate candidates involving Coke or Eggs
Itemset         Count
{Bread,Milk}    3
{Bread,Beer}    2
{Bread,Diaper}  3
{Milk,Beer}     2
{Milk,Diaper}   3
{Beer,Diaper}   3

Triplets (3-itemsets)
Itemset              Count
{Bread,Milk,Diaper}  3
Mining Frequent Itemsets: the Key Step
1. Find the frequent itemsets: the sets of items that
have minimum support
– A subset of a frequent itemset must also be a frequent
itemset
◆ i.e., if {A, B} is a frequent itemset, both {A} and {B} must
also be frequent itemsets
– Iteratively find frequent itemsets with cardinality from 1 to k
(k-itemset)
2. Use the frequent itemsets to generate association
rules.
The Apriori Algorithm—An Example
minsup = 2

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan → C1                F1
Itemset  sup                 Itemset  sup
{A}      2                   {A}      2
{B}      3                   {B}      3
{C}      3                   {C}      3
{D}      1                   {E}      3
{E}      3

C2 (candidates)    2nd scan → C2 with counts    F2
Itemset            Itemset  sup                 Itemset  sup
{A, B}             {A, B}   1                   {A, C}   2
{A, C}             {A, C}   2                   {B, C}   2
{A, E}             {A, E}   1                   {B, E}   3
{B, C}             {B, C}   2                   {C, E}   2
{B, E}             {B, E}   3
{C, E}             {C, E}   2

C3 (candidates)    3rd scan → F3
Itemset            Itemset    sup
{B, C, E}          {B, C, E}  2
Apriori Algorithm
Method:
– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
◆ Generate length (k+1) candidate itemsets from length k
frequent itemsets
◆ Prune candidate itemsets containing subsets of length k that
are infrequent
◆ Count the support of each candidate by scanning the DB
◆ Eliminate candidates that are infrequent, leaving only those
that are frequent
The Apriori Algorithm
Join Step: Ck is generated by joining Lk-1 with itself
Prune Step: Any (k-1)-itemset that is not frequent cannot be a
subset of a frequent k-itemset
Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
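The pseudocode above can be turned into a compact runnable sketch in Python (a minimal implementation of the join/prune/scan loop, not an optimized one; names are my own):

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise frequent-itemset mining following the pseudocode:
    join, prune, scan the DB to count, then filter by min_count."""
    transactions = [frozenset(t) for t in transactions]
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(Lk)
    k = 1
    while Lk:
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates
        keys = list(Lk)
        cands = {a | b for i, a in enumerate(keys) for b in keys[i + 1:]
                 if len(a | b) == k + 1}
        # Prune step: drop any candidate with an infrequent k-subset
        cands = {c for c in cands
                 if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Scan: count support of the surviving candidates in the database
        counts = {c: sum(1 for t in transactions if c <= t) for c in cands}
        Lk = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(Lk)
        k += 1
    return frequent

# The TDB example from the earlier slide, with minsup (count) = 2
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, min_count=2)
print(len(result))  # 9 frequent itemsets: 4 singletons, 4 pairs, and {B,C,E}
```

Run on the TDB example, this reproduces the F1/F2/F3 tables: {A}, {B}, {C}, {E}; {A,C}, {B,C}, {B,E}, {C,E}; and {B,C,E} with support 2.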