DMDW - Association Analysis

The document discusses association rule mining, focusing on frequent patterns and the generation of rules that predict item occurrences based on transaction data. It outlines methods such as the Apriori algorithm and FP-growth for mining frequent itemsets, emphasizing the importance of support and confidence thresholds. Additionally, it explores various types of association rules, including multilevel, multidimensional, and quantitative rules, highlighting their applications in data analysis.

Association Analysis

Association Rule Mining

Frequent patterns are patterns (such as itemsets, subsequences, or substructures) that appear
in a data set frequently.

Given a set of transactions, the goal is to find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.

For example, a set of items, such as milk and bread, that appear frequently together in a
transaction data set is a frequent itemset.

TID    List of Items
1      Milk, Bread, Cereal
2      Milk, Bread, Sugar, Eggs
3      Milk, Bread, Butter
4      Sugar, Eggs

Example of Association Rules


{Milk} → {Bread}
{Milk, Bread} → {Butter}
{Bread} → {Sugar}

Definitions:
Itemset
– A collection of one or more items
  • Example: {Milk, Bread}
– k-itemset
  • An itemset that contains k items
Support count (σ)
– Frequency of occurrence of an itemset
– E.g., σ({Milk, Bread}) = 3
Support (s)
– Fraction of transactions that contain an itemset
– E.g., s({Milk} → {Bread}) = 3/4
Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold
Example: computer ⇒ antivirus software [support = 2%, confidence = 60%]

Rule support and confidence are two measures of rule interestingness.

A support of 2% for this association rule means that 2% of all the transactions under
analysis show that computer and antivirus software are purchased together.

A confidence of 60% means that 60% of the customers who purchased a computer
also bought the software.

Typically, association rules are considered interesting if they satisfy both a minimum
support threshold and a minimum confidence threshold.
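To make the two measures concrete, here is a minimal Python sketch (using the transactions from the table above; function names are illustrative, not from the source) that computes support and confidence directly from a list of transactions:

```python
# Minimal sketch: support and confidence computed from transactions.
# The transactions mirror the example table above.
transactions = [
    {"Milk", "Bread", "Cereal"},
    {"Milk", "Bread", "Sugar", "Eggs"},
    {"Milk", "Bread", "Butter"},
    {"Sugar", "Eggs"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(antecedent, consequent, db):
    """confidence(A -> B) = support(A ∪ B) / support(A)."""
    return support(antecedent | consequent, db) / support(antecedent, db)

print(support({"Milk", "Bread"}, transactions))       # 0.75
print(confidence({"Milk"}, {"Bread"}, transactions))  # 1.0 (every Milk buyer also bought Bread)
```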

Association Rule Mining Task:

Given a set of transactions T, the goal of association rule mining is to find all rules
having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds
– Computationally prohibitive!
The total number of possible rules that can be extracted from a data set containing d items is R = 3^d − 2^(d+1) + 1. For example, a data set with d = 6 items already yields 602 candidate rules.
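As a quick sanity check of this formula, the following brute-force sketch (illustrative, not from the source) enumerates every rule A → B with nonempty, disjoint A and B and compares the count against the closed form; for d = 6 both give 602:

```python
from itertools import combinations

def rule_count(d):
    """Closed form: R = 3^d - 2^(d+1) + 1."""
    return 3**d - 2**(d + 1) + 1

def brute_force_count(d):
    """Count all rules A -> B with A, B nonempty disjoint subsets of d items."""
    items = range(d)
    count = 0
    for a_size in range(1, d):
        for A in combinations(items, a_size):
            rest = [i for i in items if i not in A]
            for b_size in range(1, len(rest) + 1):
                count += sum(1 for _ in combinations(rest, b_size))
    return count

print(rule_count(6), brute_force_count(6))  # 602 602
```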
Mining Association Rules:

A common strategy adopted by many association rule mining algorithms is to decompose the problem into two major subtasks.
Two-step approach:
1. Frequent Itemset Generation
– whose objective is to find all the itemsets that satisfy the minsup
threshold.
– These itemsets are called frequent itemsets.
2. Rule Generation
– whose objective is to extract all the high-confidence rules from the
frequent itemsets found in the previous step.
– These rules are called strong rules.
Frequent itemset generation is still computationally expensive.
Association Rule Mining Methods:

1. Apriori Method: Finding Frequent Itemsets with candidate generation


The name of the algorithm is based on the fact that the algorithm uses prior knowledge of
frequent itemset properties, as we shall see later. Apriori employs an iterative approach
known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets. First, the
set of frequent 1-itemsets is found by scanning the database to accumulate the count for each
item, and collecting those items that satisfy minimum support. The resulting set is denoted by
L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so
on, until no more frequent k-itemsets can be found. The finding of each Lk requires one full
scan of the database.

To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used to reduce the search space.

Apriori property: All nonempty subsets of a frequent itemset must also be frequent.

It includes two steps:

i. Join step:
• To find Lk, a set of candidate k-itemsets is generated by joining Lk−1 with itself.
• In the join, Lk−1 ⋈ Lk−1, members l1 and l2 of Lk−1 are joined if
  (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧ (l1[k−2] = l2[k−2]) ∧ (l1[k−1] < l2[k−1]).
• The condition l1[k−1] < l2[k−1] ensures that no duplicates are generated.

ii. Prune step:
The prune component employs the Apriori property to remove candidates that have a
subset that is not frequent.
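The join and prune steps above translate directly into code. Below is a minimal Python sketch of one possible implementation (function names and the sorted-tuple representation are my own, not from the source):

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets L_prev.

    Itemsets are represented as sorted tuples.
    """
    candidates = set()
    for l1 in L_prev:
        for l2 in L_prev:
            # Join step: first k-2 items equal, last item of l1 < last item of l2.
            if l1[:k - 2] == l2[:k - 2] and l1[k - 2] < l2[k - 2]:
                c = l1 + (l2[k - 2],)
                # Prune step (Apriori property): every (k-1)-subset must be frequent.
                if all(s in L_prev for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates

def apriori(transactions, min_count):
    """Level-wise search: use L_{k-1} to find L_k until no more frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    L = {(i,) for i in items
         if sum(i in t for t in transactions) >= min_count}
    all_frequent, k = set(L), 2
    while L:
        C = apriori_gen(L, k)
        # One full database scan per level to count candidate support.
        L = {c for c in C
             if sum(set(c) <= set(t) for t in transactions) >= min_count}
        all_frequent |= L
        k += 1
    return all_frequent
```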

Apriori Example-1: The Apriori Algorithm (worked example figure not reproduced in this copy)
Generating association rules from Frequent Itemsets:
Once the frequent itemsets from transactions in a database D have been found, it is straightforward to generate strong association rules from them (where strong association rules satisfy both minimum support and minimum confidence). This can be done using the following equation:

confidence(A ⇒ B) = P(B | A) = support_count(A ∪ B) / support_count(A)

where support_count(A ∪ B) is the number of transactions containing the itemset A ∪ B, and support_count(A) is the number of transactions containing the itemset A.
Suppose the data contain the frequent itemset l = {I1, I2, I5}
What are the association rules that can be generated from l?
The nonempty subsets of l are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}.
The resulting association rules are as shown below, each listed with its confidence (computed from the support counts in D):

{I1, I2} ⇒ I5, confidence = 2/4 = 50%
{I1, I5} ⇒ I2, confidence = 2/2 = 100%
{I2, I5} ⇒ I1, confidence = 2/2 = 100%
I1 ⇒ {I2, I5}, confidence = 2/6 = 33%
I2 ⇒ {I1, I5}, confidence = 2/7 = 29%
I5 ⇒ {I1, I2}, confidence = 2/2 = 100%

If the minimum confidence threshold is, say, 70%, then only the second, third, and last rules above are output, because these are the only ones generated that are strong.
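A hedged sketch of this rule-generation procedure in Python (helper names are mine; the support counts at the bottom are the ones quoted in the example above):

```python
from itertools import combinations

def generate_rules(freq_itemsets, support_count, min_conf):
    """Emit strong rules A -> B from each frequent itemset.

    freq_itemsets: iterable of frozensets; support_count: dict mapping
    frozenset -> absolute support count (assumed precomputed by Apriori).
    """
    rules = []
    for l in freq_itemsets:
        if len(l) < 2:
            continue
        for size in range(1, len(l)):
            for antecedent in map(frozenset, combinations(l, size)):
                # confidence(A -> B) = support_count(l) / support_count(A)
                conf = support_count[l] / support_count[antecedent]
                if conf >= min_conf:
                    rules.append((set(antecedent), set(l - antecedent), conf))
    return rules

# For l = {I1, I2, I5} with the support counts used above:
counts = {frozenset(s): c for s, c in [
    ({"I1", "I2", "I5"}, 2), ({"I1", "I2"}, 4), ({"I1", "I5"}, 2),
    ({"I2", "I5"}, 2), ({"I1"}, 6), ({"I2"}, 7), ({"I5"}, 2)]}
for a, b, c in generate_rules([frozenset({"I1", "I2", "I5"})], counts, 0.7):
    print(a, "->", b, f"confidence = {c:.0%}")   # the three 100% rules
```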

Example-2: A database has 5 transactions. Let min_sup = 60% and min_conf = 80%.

TID     Items Bought
T100    {m, o, n, k, e, y}
T200    {d, o, n, k, e, y}
T300    {m, a, k, e}
T400    {m, u, c, k, y}
T500    {c, o, o, k, i, e}

a) Find all frequent itemsets using Apriori.

b) List all of the strong association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers, and item_i denotes variables representing items (e.g., "a", "b", etc.):

∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]
Scan the database and find the frequent 1-itemsets (single-item patterns), i.e., those whose support count is ≥ 3 (60% of 5 transactions). Continuing the level-wise search finally results in the complete set of frequent itemsets: {e, k, m, o, y, ke, oe, mk, ok, ky, oke}.

2. Frequent Pattern (FP)-Growth Approach: Mining Frequent Itemsets without Candidate Generation
As we have seen, in many cases the Apriori candidate generate-and-test method significantly
reduces the size of candidate sets, leading to good performance gain. However, it can suffer
from two nontrivial costs:

• It may still need to generate a huge number of candidate sets. For example, if there
are 10^4 frequent 1-itemsets, the Apriori algorithm will need to generate more than 10^7
candidate 2-itemsets.
• It may need to repeatedly scan the whole database and check a large set of candidates
by pattern matching. It is costly to go over each transaction in the database to
determine the support of the candidate itemsets.

“Can we design a method that mines the complete set of frequent itemsets without such a
costly candidate generation process?” An interesting method in this attempt is called
frequent pattern growth, or simply FP-growth, which adopts a divide-and-conquer strategy
as follows. First, it compresses the database representing frequent items into a frequent
pattern tree, or FP-tree, which retains the itemset association information. It then divides
the compressed database into a set of conditional databases (a special kind of projected
database), each associated with one frequent item or “pattern fragment,” and mines each
database separately. For each “pattern fragment,” only its associated data sets need to be
examined. Therefore, this approach may substantially reduce the size of the data sets to be
searched, along with the “growth” of patterns being examined.
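A full FP-tree implementation is too long to show here, but the algorithm is available off the shelf. The sketch below uses the third-party mlxtend library (an assumption: it must be installed, e.g., via pip install mlxtend, and exact function signatures may vary across versions) on the Example-2 transactions from earlier:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

# The five transactions from Example-2 above.
transactions = [list("monkey"), list("donkey"), list("make"),
                list("mucky"), list("cookie")]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Mine frequent itemsets without candidate generation (min_sup = 60%).
freq = fpgrowth(df, min_support=0.6, use_colnames=True)
print(freq)  # expect {e, k, m, o, y, ke, oe, mk, ok, ky, oke}

# Derive strong rules at min_conf = 80%.
rules = association_rules(freq, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```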

Mining Various Kinds of Association Rules

• Mining multilevel association rules
• Mining multidimensional association rules
• Mining quantitative association rules

Multilevel associations involve concepts at different abstraction levels. Multidimensional associations involve more than one dimension or predicate (e.g., rules that relate what a customer buys to his or her age). Quantitative association rules involve numeric attributes that have an implicit ordering among values (e.g., age). Rare patterns are patterns that suggest interesting, though rare, item combinations.

1. Mining multilevel association rules

For many applications, strong associations discovered at high abstraction levels, though with
high support, could be commonsense knowledge. We may want to drill down to find novel
patterns at more detailed levels.

Uniform support: the same minimum support threshold is used when mining at each abstraction level. For example, in the figure, a minimum support threshold of 5% is used throughout (e.g., for mining from “computer” downward to “laptop computer”). Both “computer” and “laptop computer” are found to be frequent, whereas “desktop computer” is not.

Reduced support: each abstraction level has its own minimum support threshold. The deeper the abstraction level, the smaller the corresponding threshold. For example, in the figure, the minimum support thresholds for levels 1 and 2 are 5% and 3%, respectively. In this way, “computer,” “laptop computer,” and “desktop computer” are all considered frequent.
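A minimal sketch contrasting the two threshold strategies (the counts and concept names below are illustrative assumptions chosen to mirror the figure described above):

```python
# Sketch: uniform vs. reduced support across abstraction levels.
# Counts are illustrative assumptions, not from the source.
n_transactions = 50
support_count = {  # (level, concept) -> number of transactions containing it
    (1, "computer"): 5,          # 10% support
    (2, "laptop computer"): 3,   # 6% support
    (2, "desktop computer"): 2,  # 4% support
}

def frequent_at(level, min_sup):
    """Concepts at `level` whose support meets that level's threshold."""
    return {c for (lvl, c), k in support_count.items()
            if lvl == level and k / n_transactions >= min_sup}

# Uniform support (5% at every level): "desktop computer" drops out.
print(frequent_at(1, 0.05), frequent_at(2, 0.05))
# Reduced support (5% at level 1, 3% at level 2): all three are frequent.
print(frequent_at(1, 0.05), frequent_at(2, 0.03))
```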

2. Mining Multi-Dimensional Association rules

Following the terminology used in multidimensional databases, we refer to each distinct predicate in a rule as a dimension. Hence, a rule such as buys(X, “computer”) ⇒ buys(X, “antivirus software”) is a single-dimensional or intra-dimensional association rule, because it contains a single distinct predicate (buys) with multiple occurrences (i.e., the predicate occurs more than once within the rule). Such rules are commonly mined from transactional data.

Association rules that involve two or more dimensions or predicates can be referred to as multidimensional association rules. Consider, for example,

age(X, “20..29”) ∧ occupation(X, “student”) ⇒ buys(X, “laptop”)    (7.7)

Rule (7.7) contains three predicates (age, occupation, and buys), each of which occurs only once in the rule. Hence, we say that it has no repeated predicates. Multidimensional association rules with no repeated predicates are called inter-dimensional association rules. We can also mine multidimensional association rules with repeated predicates, which contain multiple occurrences of some predicates. These rules are called hybrid-dimensional association rules. An example of such a rule is the following, where the predicate buys is repeated:

age(X, “20..29”) ∧ buys(X, “laptop”) ⇒ buys(X, “HP printer”)

3. Mining Quantitative Association Rules

Mining Multidimensional Association Rules Using Static Discretization of Quantitative Attributes

• The transformed multidimensional data may be used to construct a data cube.
• Data cubes are well suited to the mining of multidimensional association rules.
• They store aggregates (such as counts) in multidimensional space, which is essential for computing the support and confidence of multidimensional association rules.
• The base cuboid aggregates the task-relevant data by age, income, and buys.
• The 2-D cuboid (age, income) aggregates by age and income, and so on.
• The 0-D (apex) cuboid contains the total number of transactions in the task-relevant data.
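A brief sketch of why cube aggregates suffice (the cuboid cells and counts below are illustrative assumptions, not from the source): once counts are stored per cuboid, the support and confidence of a multidimensional rule become simple lookups rather than database scans.

```python
# Sketch: computing rule support/confidence from data cube aggregates.
# Cuboid cells and counts are illustrative assumptions.
apex_count = 10000  # 0-D (apex) cuboid: total task-relevant transactions

# 2-D cuboid (age, income) and 3-D base cuboid (age, income, buys): cell -> count
cuboid_age_income = {("20..29", "40K..49K"): 600}
base_cuboid = {("20..29", "40K..49K", "laptop"): 180}

# Rule: age(X, "20..29") ∧ income(X, "40K..49K") ⇒ buys(X, "laptop")
rule_count = base_cuboid[("20..29", "40K..49K", "laptop")]
antecedent_count = cuboid_age_income[("20..29", "40K..49K")]

support = rule_count / apex_count            # 180 / 10000 = 1.8%
confidence = rule_count / antecedent_count   # 180 / 600 = 30%
print(f"support = {support:.1%}, confidence = {confidence:.1%}")
```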
