04-Association Rule Mining

Association Rule Mining involves discovering frequent patterns, associations, or correlations among sets of items, commonly applied in market basket analysis. Key concepts include support and confidence, which measure the frequency and reliability of item associations, respectively. The Apriori Algorithm is a popular method for identifying frequent itemsets and generating association rules based on specified minimum support and confidence thresholds.

Uploaded by

cessmania


Association Rule Mining

Outline

◘ What is Association Rule Mining?

◘ Confidence and Support
◘ Market Basket Analysis
◘ Apriori Algorithm
◘ Types of Association Rules
– Single-Dimensional vs. Multi-Dimensional Association Rules
– Multi-Level Association Rules
What is Association Rule Mining?

◘ Association Rule Mining:

– Finding frequent patterns, associations, or correlations among sets of items.

◘ First proposed by Agrawal, Imielinski and Swami [AIS93]

◘ Uncover relationships among data

[Figure: two-panel cartoon. 1995: "Milk and diaper sell together!" (answered with "Zzzz..."). 1998: "Milk and diaper sell together!"]
What is Association Rule Mining?

◘ Analyze all relationships between items.

◘ Example: When people buy diapers they also buy beer 60% of the time
Association Rule Mining
Market Basket Analysis
POS Transactions

Customer   Items Purchased
1          OJ, soda
2          Milk, OJ, window cleaner
3          OJ
4          OJ, detergent, soda
5          Window cleaner, soda

Co-occurrence of Products

                 OJ   Window cleaner   Milk   Soda   Detergent
OJ               4    1                1      2      1
Window cleaner   1    2                1      1      0
Milk             1    1                1      0      0
Soda             2    1                0      3      1
Detergent        1    0                0      1      2

Simple patterns:
1. OJ and soda are more likely to be purchased together than any other two items.
2. Detergent is never purchased with milk or window cleaner.
Association Rule Example 1

◘ Example:
– 5% of customers buy bread, milk and cheese together
– 80% of customers who buy bread and milk also buy cheese

Bread, Milk→ Cheese [support=5%, confidence=80%]

Confidence and Support

I = {i1, i2, …, im}: set of items

D: database of transactions

◘ Association rule: X → Y,
where X ⊂ I, Y ⊂ I and X ∩ Y = ∅.

◘ Rule X → Y has a support s in D
if s% of transactions in D contain X ∪ Y.

◘ Rule X → Y has a confidence c in D
if c% of transactions in D that contain X also contain Y.

$$\mathrm{Supp}(X \rightarrow Y) = \frac{\mathrm{Count}(X \cup Y)}{|D|} \qquad \mathrm{Conf}(X \rightarrow Y) = \frac{\mathrm{Count}(X \cup Y)}{\mathrm{Count}(X)}$$

Here Count(Z) is the number of transactions in D that contain every item of Z.

◘ A rule is kept when:
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Confidence and Support

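◘ Support (s)
– Fraction of transactions that contain both X and Y

$$\mathrm{Supp}(X \rightarrow Y) = \frac{\mathrm{Count}(X \cup Y)}{|D|}$$

◘ Confidence (c)
– Measures how often items in Y appear in transactions that contain X

$$\mathrm{Conf}(X \rightarrow Y) = \frac{\mathrm{Count}(X \cup Y)}{\mathrm{Count}(X)}$$

[Figure: Venn diagram of two overlapping sets, "Customer buys diaper" and "Customer buys beer"; the overlap is "Customer buys both".]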
Association Rule Example 2

TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke

Diaper → Beer
Support = # {Diaper, Beer} / total # of records = 3/5
Confidence = # {Diaper, Beer}/ # {Diaper} = 3/4
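
The two ratios above can be checked mechanically. The following is a minimal Python sketch, assuming the five transactions of Example 2 are stored as sets; the helper names `count`, `support` and `confidence` are illustrative and do not come from the slides.

```python
# Transactions from Example 2, keyed by TID.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}

def count(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)

def support(x, y, db):
    """Supp(X -> Y) = Count(X u Y) / |D|."""
    return count(x | y, db) / len(db)

def confidence(x, y, db):
    """Conf(X -> Y) = Count(X u Y) / Count(X)."""
    return count(x | y, db) / count(x, db)

x, y = {"Diaper"}, {"Beer"}
print(support(x, y, transactions))     # 0.6  (3/5)
print(confidence(x, y, transactions))  # 0.75 (3/4)
```

Swapping the two sides leaves the support unchanged but changes the confidence: Beer → Diaper has confidence 3/3, because every transaction with Beer also contains Diaper.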
Association Rule Example 3

1 {cucumber, parsley, onion, tomato, salt, bread}
2 {tomato, cucumber, parsley}
3 {tomato, cucumber, olives, onion, parsley}
4 {tomato, cucumber, onion, bread}
5 {tomato, salt, onion}
6 {bread, cheese}
7 {tomato, cheese, onion, cucumber}
8 {bread, butter}

◘ Supp(tomato → onion) = 5 / 8 = 0.625
◘ Conf(tomato → onion) = 5 / 6 ≈ 0.83
Association Rule Example 4
Transaction ID   Items Bought
1000             A, B, C
2000             A, C
3000             A, D
4000             B, E, F

Rules (support, confidence):
A → C (50%, 66.7%)
C → A (50%, 100%)

Min. support 50%

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%
Association Rule Example 5
TransID Items
1 cucumber, parsley, onion, tomato, salt, bread
2 tomato, cucumber, parsley
3 tomato, cucumber, olives, onion, parsley
4 tomato, cucumber, onion, bread
5 tomato, salt, onion
6 bread, cheese
7 tomato, cheese, onion, cucumber
8 bread, butter

◘ Find all frequent itemsets with minimum support 50% (at least 4 of the 8 transactions)

{bread}
{cucumber}
{onion}
{tomato}
{cucumber, onion}
{cucumber, tomato}
{onion, tomato}
{cucumber, onion, tomato}
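
The list above can be reproduced by brute force. The sketch below assumes the eight transactions of Example 5 and simply tests every possible itemset against the 50% threshold. It is exhaustive enumeration, not the Apriori Algorithm, and the function name `frequent_itemsets` is illustrative.

```python
from itertools import combinations

# Transactions from Example 5 (TransID 1..8).
transactions = [
    {"cucumber", "parsley", "onion", "tomato", "salt", "bread"},
    {"tomato", "cucumber", "parsley"},
    {"tomato", "cucumber", "olives", "onion", "parsley"},
    {"tomato", "cucumber", "onion", "bread"},
    {"tomato", "salt", "onion"},
    {"bread", "cheese"},
    {"tomato", "cheese", "onion", "cucumber"},
    {"bread", "butter"},
]

def frequent_itemsets(db, min_support):
    """Test every itemset; keep those whose support fraction >= min_support."""
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            n = sum(1 for t in db if set(candidate) <= t)
            if n / len(db) >= min_support:
                result[candidate] = n  # store the absolute count
    return result

for itemset, n in frequent_itemsets(transactions, 0.5).items():
    print(itemset, n)
```

With 9 distinct items this checks 511 candidate itemsets, which is fine for a toy dataset but grows exponentially with the number of items.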
Association Rule Example 7

◘ Transactions:
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes

◘ Assume:
minsup = 30%
minconf = 80%

◘ An example frequent itemset:
{Chicken, Clothes, Milk} [sup = 3/7]

◘ Association rules from the itemset:

Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
Milk → Clothes, Chicken [sup = 3/7, conf = 3/4]
Chicken → Clothes, Milk [sup = 3/7, conf = 3/5]
… …
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
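
Rule generation from one frequent itemset can be sketched as follows. This is an illustrative Python version, assuming the seven transactions t1..t7 above. It splits the itemset into every antecedent/consequent pair and keeps the rules that reach minconf = 80%. The function name `rules_from_itemset` is hypothetical.

```python
from itertools import combinations

# Transactions t1..t7 from Example 7.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def count(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t)

def rules_from_itemset(itemset, db, min_conf):
    """Yield (antecedent, consequent, confidence) for each qualifying split.

    Assumes `itemset` occurs in at least one transaction.
    """
    itemset = frozenset(itemset)
    whole = count(itemset, db)
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = whole / count(lhs, db)
            if conf >= min_conf:
                yield lhs, itemset - lhs, conf

for lhs, rhs, conf in rules_from_itemset({"Chicken", "Clothes", "Milk"}, transactions, 0.8):
    print(sorted(lhs), "->", sorted(rhs), f"conf={conf:.2f}")
```

On this data the rules with antecedent Milk (3/4) and Chicken (3/5) fall below 80% and are dropped. The three rules that pass all have confidence 3/3.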
Finding Association Rules

◘ A user can ask for rules with minimum support minSup and
minimum confidence minConf.

– First, all frequent itemsets with support ≥ minSup are computed.

– Second, rules are generated from the frequent itemsets and checked against minConf.
The Apriori Algorithm
Input
The market basket transaction dataset.
Process
– Determine large 1-itemsets.
– Repeat until no new large itemsets are identified:
   – Generate (k+1)-length candidate itemsets from length-k large itemsets.
   – Prune candidate itemsets that contain a length-k subset that is not large.
   – Count the support of each remaining candidate itemset.
   – Eliminate candidate itemsets that are small.
Output
Itemsets that are “large”, i.e. that meet the minimum support threshold. Rules generated from them are then checked against the minimum confidence threshold.

Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
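
The pseudo-code can be turned into a short runnable sketch. The Python version below is one possible reading of it, not a reference implementation: the join step simply unions pairs of frequent k-itemsets, which is simpler but less efficient than the usual prefix-based join. It runs on the four transactions of Example 4 with minimum support 50% (an absolute count of 2). The function name `apriori` and the parameter `min_count` are illustrative.

```python
from itertools import combinations

def apriori(db, min_count):
    """Return {frozenset: support count} for every itemset with count >= min_count."""
    def count(c):
        return sum(1 for t in db if c <= t)

    # L1: frequent single items.
    items = set().union(*db)
    level = {frozenset([i]): count(frozenset([i])) for i in items}
    level = {s: n for s, n in level.items() if n >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        # Join: union pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: drop candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Count: one pass over the database for this level, then filter.
        level = {c: count(c) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent

# Example 4 data; 50% of 4 transactions = count 2.
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
result = apriori(db, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
# ['A'] 3
# ['B'] 2
# ['C'] 2
# ['A', 'C'] 2
```

The result matches the frequent itemsets listed for Example 4: {A} 75%, {B} 50%, {C} 50% and {A,C} 50%.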
The Apriori Algorithm — Example

Minimum Support = 3

Items (1-itemsets)
Item     Count
Bread    4
Coke     2
Milk     4
Beer     3
Diaper   4
Eggs     1

Pairs (2-itemsets)
Itemset          Count
{Bread,Milk}     3
{Bread,Beer}     2
{Bread,Diaper}   3
{Milk,Beer}      2
{Milk,Diaper}    3
{Beer,Diaper}    3

Triplets (3-itemsets)
Itemset               Count
{Bread,Milk,Diaper}   3
{Milk,Diaper,Beer}    2
Apriori Example

[Figure: worked Apriori example with Minimum Support = 2: 1-itemsets → pruning → 2-itemsets → pruning → 3-itemsets.]
Apriori Principle
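● Apriori Principle: Candidate Pruning

If {a,b} is infrequent, then all its supersets are infrequent:
every transaction that contains a superset also contains {a,b},
so a superset can never have higher support than {a,b}.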
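
The pruning test that follows from this principle can be sketched in a few lines of Python. The set `frequent_2` below is hypothetical data made up for illustration, with {a,b} deliberately left out as infrequent. The function name `has_infrequent_subset` is also an assumption.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_k):
    """True if some k-subset of a (k+1)-item candidate is not in frequent_k."""
    k = len(candidate) - 1
    return any(frozenset(s) not in frequent_k for s in combinations(candidate, k))

# Hypothetical frequent 2-itemsets; {a, b} is absent, i.e. infrequent.
frequent_2 = {frozenset(p) for p in [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]}

print(has_infrequent_subset(frozenset("abc"), frequent_2))  # True: contains {a, b}
print(has_infrequent_subset(frozenset("acd"), frequent_2))  # False: all pairs frequent
```

A candidate for which the test returns True can be discarded without scanning the database, which is where Apriori saves most of its counting work.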
Applications

◘ Market Basket Analysis

– Input: Basket Data - Collection of records consisting of transaction identifier
and the items bought in a transaction.
– Items: Products
– Goal: To identify items that are bought together

◘ Supermarket Shelf Management

– For example:
Many customers (75%) who buy diapers also buy beer every Friday.

Moved the beer and snacks such as peanuts and pretzels next to the diapers
→ Increased sales of peanuts and pretzels by more than 27%
Applications

◘ Text Mining
– Input: documents;
– Items: words in those documents
– Goal: Find words that appear together unusually frequently, i.e. linked concepts.

◘ Web Mining
– Input: Web pages (a web page p)
– Items: Pages that link to p .
– Goal: Pages with many of the same links may be mirrors or about the same topic.

◘ Health and Drug Industry

– What kinds of DNA are sensitive to this new drug?

◘ Other Applications
– Catalog design
– Promotion decision
– Genomic Data
– Customer Profiling
Single-Dimensional vs Multi-Dimensional

◘ Single-Dimensional Rules:
– buys(X, “milk”) → buys(X, “bread”)

◘ Multi-Dimensional Rules:
– ≥ 2 dimensions or predicates

– age(X, “30..39”) ∧ income(X, “42..48K”) → buys(X, “car”) [1%, 75%]

– Inter-dimension association rules (no repeated predicates)

• age(X, “19-25”) ∧ occupation(X, “student”) → buys(X, “coke”)

– Hybrid-dimension association rules (repeated predicates)

• age(X, “19-25”) ∧ buys(X, “popcorn”) → buys(X, “coke”)
Multilevel Association Rules

Concept hierarchy:

all
– computer
   • desktop
   • laptop
– software
   • educational
   • financial management
– printer
   • color
   • b/w
– computer accessory
   • wrist pad
   • mouse

Brand level (leaves, left to right): IBM, Dell, Sony, Toshiba, Microsoft, …, …, HP, …, Sony, …, Ergoway, Logitech

computer → printer [support=12%, confidence=70%] (Level 1)
desktop computer → color printer [support=8%, confidence=70%] (Level 2)
IBM desktop computer → HP color printer [support=2%, confidence=72%] (Level 3)

Reduce minimum support at lower levels (Level 1 → Level 3).
