UNIT-IV
UNIT – IV SYLLABUS
Association Analysis: Basic Concepts and Algorithms: Problem Definition,
Frequent Item Set generation, Rule generation, compact representation of
frequent item sets, FP-Growth Algorithm.
Association Analysis
Association mining aims to extract interesting correlations, frequent patterns,
associations, or causal structures among sets of items or objects in transaction databases,
relational databases, or other data repositories. Association rules are widely used in various
areas such as telecommunication networks, market and risk management, inventory control,
cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
EXAMPLES:
Rule Form: Body → Head [support, confidence]
Buys (X, “Computer”) → Buys (X, “Software”) [40%, 50%]
ASSOCIATION RULE: BASIC CONCEPTS:
Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by
a customer in a visit)
Find: all rules that correlate the presence of one set of items with that of another set
of items.
E.g., 98% of people who purchase tires and auto accessories also get
automotive services done.
E.g., Market Basket Analysis: this process analyzes customer buying habits
by finding associations between the different items that customers place in
their “shopping baskets”. The discovery of such associations can help retailers
develop marketing strategies by gaining insight into which items are frequently
purchased together by customers.
APPLICATIONS:
Maintenance agreement (what the store should do to boost maintenance agreement
sales)
Home Electronics (what other products should the store stock up on?)
Attached mailing in direct marketing
ASSOCIATION RULE:
An association rule is an implication expression of the form X → Y, where X and Y are
disjoint itemsets, i.e., X ∩ Y = ∅. The strength of an association rule can be measured in
terms of its support and confidence. Support determines how often a rule is applicable to a
given data set, while confidence determines how frequently items in Y appear in
transactions that contain X. The formal definitions of these metrics are:
Support, s(X → Y) = σ(X ∪ Y) / N
Confidence, c(X → Y) = σ(X ∪ Y) / σ(X)
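A small worked example of these two formulas is sketched below; the transactions and the rule {Computer} → {Software} are hypothetical and only serve to illustrate the computation (σ denotes the support count of an itemset and N the number of transactions).

```python
# Hypothetical transaction database, purely for illustration.
transactions = [
    {"Computer", "Software", "Printer"},
    {"Computer", "Software"},
    {"Computer", "Antivirus"},
    {"Printer", "Paper"},
    {"Computer", "Software", "Antivirus"},
]

X = {"Computer"}   # rule body (antecedent)
Y = {"Software"}   # rule head (consequent)
N = len(transactions)

# sigma(S): number of transactions that contain every item of S
def sigma(S):
    return sum(1 for t in transactions if S <= t)

support = sigma(X | Y) / N            # s(X -> Y) = sigma(X u Y) / N
confidence = sigma(X | Y) / sigma(X)  # c(X -> Y) = sigma(X u Y) / sigma(X)

print(f"support = {support:.2f}, confidence = {confidence:.2f}")
# support = 0.60, confidence = 0.75
```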
Why Use Support and Confidence? Support is an important measure because a rule that has
very low support may occur simply by chance. A low support rule is also likely to be
uninteresting from a business perspective because it may not be profitable to promote items
that customers seldom buy together. For these reasons, support is often used to eliminate
uninteresting rules.
Confidence, on the other hand, measures the reliability of the inference made by a
rule. For a given rule XY, the higher the confidence, the more likely it is for Y to be present
in transactions that contain X. Confidence also provides an estimate of the conditional
probability of Y given X.
Therefore, a common strategy adopted by many association rule mining algorithms is to
decompose the problem into two major subtasks:
1. Frequent Itemset Generation, whose objective is to find all the itemsets that satisfy
the minsup threshold. These itemsets are called frequent itemsets.
2. Rule Generation, whose objective is to extract all the high-confidence rules from the
frequent itemsets found in the previous step. These rules are called strong rules.
Frequent Itemset Generation:
A lattice structure can be used to enumerate the list of all possible itemsets. The figure
above shows an itemset lattice for I = {a, b, c, d, e}. In general, a data set that contains k
items can potentially generate up to 2^k − 1 frequent itemsets, excluding the null set. Because
k can be very large in many practical applications, the search space of itemsets that needs to
be explored is exponentially large.
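As a small illustration of the size of this search space, the sketch below simply enumerates the full itemset lattice for I = {a, b, c, d, e} (illustrative code only, not part of any mining algorithm):

```python
from itertools import combinations

# Enumerate every non-empty itemset over I = {a, b, c, d, e}.
items = ["a", "b", "c", "d", "e"]
lattice = [frozenset(c)
           for r in range(1, len(items) + 1)
           for c in combinations(items, r)]

print(len(lattice))  # 2**5 - 1 = 31 possible itemsets
```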
To find frequent itemsets we have two algorithms,
a) Apriori Algorithm
b) FP-Growth
a) APRIORI ALGORITHM:
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining
frequent itemsets for Boolean association rules. The name of the algorithm is based on the
fact that the algorithm uses prior knowledge of frequent itemset properties, as we shall see
later. Apriori employs an iterative approach known as a level-wise search, where k-itemsets
are used to explore (k+1)-itemsets.
First, the set of frequent 1-itemsets is found by scanning the database to accumulate
the count for each item, and collecting those items that satisfy minimum support. The
resulting set is denoted by L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which
is used to find L3, and so on, until no more frequent k-itemsets can be found. The finding of
each Lk requires one full scan of the database.
To improve the efficiency of the level-wise generation of frequent itemsets, an
important property called the Apriori property is used to reduce the search space.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
The Apriori property is based on the following observation. By definition, if an itemset
I does not satisfy the minimum support threshold, min sup, then I is not frequent, that is, P(I)<
min sup. If an item A is added to the itemset I, then the resulting itemset (i.e.,IUA) cannot
occur more frequently than I. Therefore, IUA is not frequent either, that is, P(IUA) < min sup.
This property belongs to a special category of properties called antimonotonicity in
the sense that if a set cannot pass a test, all of its supersets will fail the same test as well. It is
called antimonotonicity because the property is monotonic in the context of failing a test.
A two-step process is followed, consisting of join and prune actions (a sketch of both steps is given after this list).
1. The join step: To find Lk, a set of candidate k-itemsets is generated by joining Lk−1 with itself.
This set of candidates is denoted Ck.
2. The prune step: Ck is a superset of Lk; that is, its members may or may not be frequent, but
all of the frequent k-itemsets are included in Ck. A database scan to determine the count of
each candidate in Ck then yields Lk.
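A minimal sketch of candidate generation in Python is shown below. The function name apriori_gen and the representation of itemsets as frozensets are illustrative assumptions; the join here is simplified to a pairwise union of any two (k−1)-itemsets rather than the textbook join on the first k−2 items, but the prune step removes the same candidates.

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Generate candidate k-itemsets C_k from the frequent (k-1)-itemsets L_prev."""
    L_prev = set(L_prev)

    # Join step: unite pairs of frequent (k-1)-itemsets whose union has exactly k items.
    candidates = {a | b for a in L_prev for b in L_prev if len(a | b) == k}

    # Prune step: by the Apriori property, drop any candidate that has an
    # infrequent (k-1)-subset.
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

# Example with the L2 from the worked example below:
L2 = {frozenset(s) for s in
      [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
       ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
print(apriori_gen(L2, 3))  # only {I1, I2, I3} and {I1, I2, I5} survive the prune
```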
EXAMPLE:
1. In the first iteration of the algorithm, each item is a member of the set of candidate 1-
itemsets, C1. The algorithm simply scans all of the transactions to count the number
of occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min sup = 2. (Here, we
are referring to absolute support because we are using a support count. The
corresponding relative support is 2/9 = 22%.) The set of frequent 1-itemsets, L1, can
then be determined. It consists of the candidate 1-itemsets satisfying minimum
support. In our example, all of the candidates in C1 satisfy minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to
generate a candidate set of 2-itemsets, C2. C2 consists of 2-itemsets. Note that no
candidates are removed from C2 during the prune step because each subset of the
candidates is also frequent.
4. Next, the transactions in D are scanned and the support count of each candidate
itemset in C2 is accumulated, as shown in the middle table of the second row of the figure.
5. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate
2-itemsets in C2 having minimum support.
6. The generation of the set of candidate 3-itemsets, C3, is detailed in the figure. From
the join step, we first get C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2,
I3, I5}, {I2, I4, I5}}. Based on the Apriori property that all subsets of a frequent itemset
must also be frequent, we can determine that the four latter candidates cannot
possibly be frequent. We therefore remove them from C3, thereby saving the effort of
unnecessarily obtaining their counts during the subsequent scan of D to determine L3.
7. The transactions in D are scanned to determine L3, consisting of those candidate 3-
itemsets in C3 having minimum support.
8. The algorithm uses L3 ⋈ L3 to generate a candidate set of 4-itemsets, C4. Although the
join results in {{I1, I2, I3, I5}}, itemset {I1, I2, I3, I5} is pruned because its subset
{I2, I3, I5} is not frequent. Thus, C4 = ∅, and the algorithm terminates, having found all of
the frequent itemsets.
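The whole level-wise procedure can be condensed into a short Python sketch. The nine-transaction table below is assumed from the missing figure, chosen to agree with the counts quoted in the text (N = 9, I2:7, I1:6, I3:6, I4:2, I5:2); the variable names are illustrative.

```python
from itertools import combinations
from collections import defaultdict

# Transaction database D (assumed from the figure; consistent with the text).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_sup = 2  # absolute support count threshold

def support_counts(candidates):
    """Scan D once and count how many transactions contain each candidate."""
    counts = defaultdict(int)
    for t in D:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts

# L1: frequent 1-itemsets.
C1 = {frozenset([item]) for t in D for item in t}
levels = [{c for c, n in support_counts(C1).items() if n >= min_sup}]

k = 2
while levels[-1]:
    prev = levels[-1]
    # Join: unite frequent (k-1)-itemsets; Prune: drop candidates that have
    # an infrequent (k-1)-subset (Apriori property).
    Ck = {a | b for a in prev for b in prev if len(a | b) == k}
    Ck = {c for c in Ck
          if all(frozenset(s) in prev for s in combinations(c, k - 1))}
    levels.append({c for c, n in support_counts(Ck).items() if n >= min_sup})
    k += 1

frequent = [s for level in levels for s in level]
print(frequent)  # includes {I1, I2, I3} and {I1, I2, I5} at k = 3
```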
b) FP-GROWTH:
FP-growth finds frequent itemsets without candidate generation. We re-examine
the mining of the transaction database D of the previous example using the frequent-pattern
growth approach.
The first scan of the database is the same as Apriori, which derives the set of frequent
items (1-itemsets) and their support counts (frequencies). Let the minimum support count be
2. The set of frequent items is sorted in the order of descending support count. This resulting set
or list is denoted by L. Thus, we have L = {{I2:7}, {I1:6}, {I3:6}, {I4:2}, {I5:2}}
An FP-tree is then constructed as follows. First, create the root of the tree, labeled
with “null.” Scan database D a second time. The items in each transaction are processed in L
order (i.e., sorted according to descending support count), and a branch is created for each
transaction.
The FP-tree is mined as follows. Start from each frequent length-1 pattern (as an initial
suffix pattern), construct its conditional pattern base (a “sub-database,” which consists of
the set of prefix paths in the FP-tree co-occurring with the suffix pattern), then construct its
(conditional) FP-tree, and perform mining recursively on the tree. The pattern growth is
achieved by the concatenation of the suffix pattern with the frequent patterns generated
from a conditional FP-tree.
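A brief sketch of running FP-growth in code is given below. It assumes the mlxtend and pandas libraries are available and reuses the same nine transactions assumed for the Apriori sketch above; it is an illustration of the algorithm's output, not the manual tree construction described in the text.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Same nine hypothetical transactions as in the Apriori sketch.
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# min_support is relative: an absolute count of 2 out of 9 transactions.
patterns = fpgrowth(df, min_support=2 / 9, use_colnames=True)
print(patterns.sort_values("support", ascending=False))
```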
Finally, we can conclude that the longest frequent itemsets found are {I2, I1, I5} and {I2, I1, I3}.
GENERATING ASSOCIATION RULES FROM FREQUENT ITEMSETS
Once the frequent itemsets from the transactions in a database D have been found, it is
straightforward to generate strong association rules from them (where strong association
rules satisfy both minimum support and minimum confidence). This can be done using the
equation for confidence, which we show again here for completeness:

confidence(A ⇒ B) = P(B | A) = support_count(A ∪ B) / support_count(A)

For each frequent itemset l, generate every nonempty proper subset s of l, and output the rule
s ⇒ (l − s) if support_count(l) / support_count(s) ≥ min_conf. Because the rules come from
frequent itemsets, each of them automatically satisfies minimum support.
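The sketch below applies this procedure to one frequent itemset. The support counts are those of the Apriori example above, and min_conf = 70% is an assumed threshold for illustration.

```python
from itertools import combinations

# Support counts taken from the Apriori example; min_conf is assumed.
support_count = {
    frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I5"]): 2,
    frozenset(["I1", "I2"]): 4, frozenset(["I1", "I5"]): 2,
    frozenset(["I2", "I5"]): 2, frozenset(["I1", "I2", "I5"]): 2,
}
min_conf = 0.7

l = frozenset(["I1", "I2", "I5"])
for r in range(1, len(l)):
    for antecedent in map(frozenset, combinations(sorted(l), r)):
        consequent = l - antecedent
        # confidence(A => B) = support_count(A u B) / support_count(A)
        conf = support_count[l] / support_count[antecedent]
        flag = "strong" if conf >= min_conf else "weak"
        print(f"{set(antecedent)} => {set(consequent)}  confidence = {conf:.2f}  ({flag})")

# Only I5 => {I1, I2}, {I1, I5} => I2, and {I2, I5} => I1 reach 100%
# confidence and qualify as strong rules at min_conf = 70%.
```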
COMPACT REPRESENTATION OF FREQUENT ITEMSETS:
Compact representations of frequent itemsets are subsets of the full set of frequent itemsets
that can be used to generate all other frequent itemsets. There are two types:
1. Maximal frequent itemsets
2. Closed frequent itemsets
MAXIMAL:
The number of frequent itemsets generated by the Apriori algorithm can often be very large,
so it is beneficial to identify a small representative set from which every frequent itemset can
be derived. One such approach is using maximal frequent itemsets.
A maximal frequent itemset is a frequent itemset for which none of its immediate supersets
are frequent. To illustrate this concept, consider the example given below:
The support counts are shown at the top left of each node. Assume a support
threshold of 50%, that is, each itemset must occur in 2 or more transactions. Based on that
threshold, the frequent itemsets are a, b, c, d, ab, ac, and ad (shaded nodes).
Out of these 7 frequent itemsets, 3 are identified as maximal frequent (having red
outline):
ab: Immediate supersets abc and abd are infrequent.
ac: Immediate supersets abc and acd are infrequent.
ad: Immediate supersets abd and acd are infrequent.
The remaining 4 frequent nodes (a, b, c, and d) cannot be maximal frequent because they all
have at least 1 immediate superset that is frequent.
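A minimal sketch of the maximal check, assuming the seven frequent itemsets from this example are already known (the representation as frozensets and the helper name are illustrative):

```python
# Frequent itemsets from the example: a, b, c, d, ab, ac, ad.
frequent = [frozenset(s) for s in ["a", "b", "c", "d", "ab", "ac", "ad"]]

def is_maximal(itemset, frequent_sets):
    # Maximal frequent: no frequent itemset is a proper superset of it.
    return not any(itemset < other for other in frequent_sets)

maximal = [s for s in frequent if is_maximal(s, frequent)]
print([sorted(s) for s in maximal])  # [['a', 'b'], ['a', 'c'], ['a', 'd']]
```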
Advantage:
Maximal frequent itemsets provide a compact representation of all the frequent
itemsets for a particular dataset. In the above example, all frequent itemsets are subsets of
the maximal frequent itemsets, since we can obtain sets a, b, c, and d by enumerating
subsets of ab, ac, and ad (including the maximal frequent itemsets themselves).
Disadvantage:
The support count of maximal frequent itemsets does not provide any information about
the support count of their subsets. This means that an additional traversal of data is needed
to determine the support count for non-maximal frequent itemsets, which may be undesirable
in certain cases.
CLOSED:
A closed frequent itemset is an itemset that is both closed and frequent, i.e., its support is
greater than or equal to minsup. An itemset is closed in a data set if there exists no proper
superset that has the same support count as the original itemset.
PROCESS:
1. First identify all frequent itemsets.
2. Then, from this group, find those that are closed by checking whether there exists a superset
with the same support as the frequent itemset; if there is, the itemset is disqualified, but
if none can be found, the itemset is closed.
An alternative method is to first identify the closed itemsets and then use the minsup to
determine which ones are frequent.
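A minimal sketch of the closed check is given below, using a handful of hypothetical support counts chosen only to illustrate the superset test (the counts and minsup are assumptions, not taken from the lattice figure):

```python
# Hypothetical support counts for a few itemsets (illustration only).
support = {
    frozenset("a"): 3, frozenset("c"): 3,
    frozenset("ac"): 2, frozenset("ad"): 2, frozenset("abd"): 2,
}
minsup = 2

def is_closed(itemset):
    # Closed: no proper superset has the same support count.
    return not any(itemset < other and support[other] == support[itemset]
                   for other in support)

closed_frequent = [s for s in support
                   if support[s] >= minsup and is_closed(s)]
print([sorted(s) for s in closed_frequent])
# ad is not closed: its superset abd has the same support count (2).
```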
The lattice diagram above shows the maximal, closed, and frequent itemsets. The itemsets
circled in blue are the frequent itemsets. The itemsets circled with the thick blue line are the
closed frequent itemsets. The itemsets circled with the thick blue line and filled in yellow are
the maximal frequent itemsets. To determine which of the frequent itemsets are closed, check
whether they have the same support as any of their supersets; if they do, they are not closed.
For example, ad is a frequent itemset but has the same support as abd, so it is NOT a closed
frequent itemset; c, on the other hand, is a closed frequent itemset because all of its supersets
(ac, bc, and cd) have supports that are less than 3.
As you can see, there are a total of 9 frequent itemsets; 4 of them are closed frequent itemsets,
and out of these 4, 2 are maximal frequent itemsets. This brings us to the relationship between
the three representations of frequent itemsets.
RELATIONSHIP BETWEEN FREQUENT ITEMSET REPRESENTATIONS:
It is important to point out the relationship between frequent itemsets, closed frequent
itemsets, and maximal frequent itemsets. As mentioned earlier, closed and maximal frequent
itemsets are subsets of the frequent itemsets, but maximal frequent itemsets are a more
compact representation because they are a subset of the closed frequent itemsets. The diagram
to the right shows the relationship between these three types of itemsets. Closed frequent
itemsets are more widely used than maximal frequent itemsets because, when efficiency is
more important than space, they provide the support of their subsets, so no additional pass
over the data is needed to find this information.