Introduction to
Machine Learning and Data Mining
(Học máy và Khai phá dữ liệu)
Khoat Than
School of Information and Communication Technology
Hanoi University of Science and Technology
2021
Contents
¡ Introduction to Machine Learning & Data Mining
¡ Unsupervised learning
¡ Supervised learning
¡ Probabilistic modeling
¡ Data mining
¨ Association rule
¨ Apriori
¡ Practical advice
Association rule discovery
¡ Supermarket shelf management: Market-basket model
¨ Goal: identify items that are bought together by sufficiently
many customers.
¨ Approach: process the sales data collected with barcode
scanners to find dependencies among items.
¡ A classical rule:
¨ If someone buys diaper and milk,
then he/she is likely to buy beer.
¨ Don't be surprised if you find beer packs
next to diapers!
(Adapted from a lecture by Jure Leskovec)
The Market-Basket Model
¡ A large set of items
¨ E.g., things sold in a supermarket
¡ A large set of baskets
¡ Each basket is a small subset of items
¨ e.g., the things one customer buys on one day
¡ Association: a general many-to-many mapping between two kinds of things
¨ But we ask about connections among “items”, not “baskets”

TID Items
1 Bread, Coke, Milk
2 Beer, Bread
3 Beer, Coke, Diaper, Milk
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk
Association rules: approach
¡ Given a set of baskets
¡ Want to discover association rules
¨ People who bought {x, y, z} tend to buy {v, w}
¡ 2-step approach:
¨ Find frequent itemsets
¨ Generate association rules

TID Items
1 Bread, Coke, Milk
2 Beer, Bread
3 Beer, Coke, Diaper, Milk
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk

Rules discovered:
{Milk} ⟹ {Coke}
{Diaper, Milk} ⟹ {Beer}
Applications
¡ Items = products; baskets = sets of products someone
bought in one trip to the store
¡ Real market baskets: stores might keep TBs of data about
what customers buy together
¨ Tells how typical customers navigate stores, lets them position
tempting items
¨ Suggests tie-in “tricks”, e.g., run sale on diapers and raise the
price of beer
¨ Need the rule to occur frequently
Applications: Amazon
People who bought X also bought Y.
Frequent itemsets
¡ Question: find sets of items that appear together
“frequently” in baskets
¡ Support_count for itemset I: number of baskets containing
all items in I.
¡ Support for itemset I: fraction of baskets containing I.
¨ Support(I) = support_count(I)/Total number of transactions
¡ Given a support threshold s, the sets of items that have support at least s are called frequent itemsets.

TID Items
1 Bread, Coke, Milk
2 Beer, Bread
3 Beer, Coke, Diaper, Milk
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk

Support of {Beer, Bread} = 2/5
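These definitions can be sketched in a few lines of Python. The basket list is the five-transaction table above; the function names (`support_count`, `support`) are illustrative, not from any particular library.

```python
# A minimal sketch of the support definitions above; the basket list
# is the 5-transaction table from this slide.

baskets = [
    {"Bread", "Coke", "Milk"},            # TID 1
    {"Beer", "Bread"},                    # TID 2
    {"Beer", "Coke", "Diaper", "Milk"},   # TID 3
    {"Beer", "Bread", "Diaper", "Milk"},  # TID 4
    {"Coke", "Diaper", "Milk"},           # TID 5
]

def support_count(itemset, baskets):
    """Number of baskets containing all items in `itemset`."""
    return sum(1 for b in baskets if itemset <= b)

def support(itemset, baskets):
    """Fraction of baskets containing `itemset`."""
    return support_count(itemset, baskets) / len(baskets)

print(support({"Beer", "Bread"}, baskets))  # 2/5 = 0.4
```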
Frequent itemsets: example
¡ Items = {Milk, Coke, Pepsi, Beer, Juice}
¡ Support threshold = 3/8
¨ B1 = {m, c, b} B2 = {m, p, j}
¨ B3 = {m, b} B4 = {c, j}
¨ B5 = {m, p, b} B6 = {m, c, b, j}
¨ B7 = {c, b, j} B8 = {b, c}
¡ Frequent itemsets: {m}, {c}, {b}, {j},
{m, b}, {b, c}, {c, j}
Association rules
¡ Association rules: If-then rules about the content of baskets
¡ {i1, i2, …, ik} ⟹ j: “if a basket contains all of {i1, i2, …, ik}, then
it is likely to contain j”
¡ In practice there are many rules, we want to find significant
or interesting rules.
¡ Confidence of an association rule I ⟹ j is the probability of j given I = {i1, i2, …, ik}:

confidence(I ⟹ j) = support(I ∪ {j}) / support(I) = support_count(I ∪ {j}) / support_count(I)
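The confidence formula follows directly from support counts; a minimal sketch on the basket table from the earlier slide (function names are illustrative):

```python
def support_count(itemset, baskets):
    """Number of baskets containing all items in `itemset`."""
    return sum(1 for b in baskets if itemset <= b)

def confidence(I, j, baskets):
    """confidence(I => j) = support_count(I | {j}) / support_count(I)."""
    return support_count(I | {j}, baskets) / support_count(I, baskets)

baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

# {Diaper, Milk} appears in 3 baskets, {Diaper, Milk, Beer} in 2,
# so confidence({Diaper, Milk} => Beer) = 2/3.
print(confidence({"Diaper", "Milk"}, "Beer", baskets))
```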
Interesting association rules
¡ Not all high-confidence rules are interesting
¨ The rule X ⟹ milk may have high confidence for many itemsets
X, because milk is purchased very often (independent of X)
and the confidence will be high
¡ Interest of an association rule I ⟹ j:
difference between its confidence and the fraction of
baskets that contain j.
Interest(I ⟹ j) = confidence(I ⟹ j) – Pr(j)
¨ Interesting rules are those whose interest is large in magnitude, positive or negative (typically above 0.5 in absolute value)
¨ Why?
Confidence and Interest: example
¨ B1 = {m, c, b} B2 = {m, p, j}
¨ B3 = {m, b} B4 = {c, j}
¨ B5 = {m, p, b} B6 = {m, c, b, j}
¨ B7 = {c, b, j} B8 = {b, c}
¡ Association rule: {m,b} ⟹ c
¨ Confidence = 2/4 = 0.5
¨ Interest = 0.5 − 5/8 = −1/8
(item c appears in 5/8 of the baskets)
¨ The interest is small in magnitude, so the rule is not very interesting
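The computation on this slide can be checked directly; the baskets are encoded as sets of single-character items, and `interest` is an illustrative helper:

```python
def interest(I, j, baskets):
    """interest(I => j) = confidence(I => j) - Pr(j)."""
    count = lambda s: sum(1 for b in baskets if s <= b)
    conf = count(I | {j}) / count(I)
    pr_j = count({j}) / len(baskets)
    return conf - pr_j

# B1..B8 from the slide, items as single characters
baskets = [set("mcb"), set("mpj"), set("mb"), set("cj"),
           set("mpb"), set("mcbj"), set("cbj"), set("bc")]

print(interest({"m", "b"}, "c", baskets))  # 0.5 - 5/8 = -0.125
```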
Finding association rules
¡ Problem: find all association rules with support ≥ s and
confidence ≥ c.
¨ Note: support of an association rule is the support of the set of
items on the left side.
¡ Hard part: finding the frequent itemsets
¨ If {i1, i2, …, ik} ⟹ j has high support and confidence, then both
{i1, i2, …, ik} and {i1, i2, …, ik, j} will be frequent.
confidence(I ⟹ j) = support(I ∪ {j}) / support(I)
Mining association rules
¡ Step 1: find all frequent itemsets I
¡ Step 2: rule generation
¨ For every subset A of I, generate rule A ⟹ I \ A
u Since I is frequent, A is also frequent
u Variant 1: Single pass to compute the rule confidence
confidence(A,B ⟹ C,D) = support(A,B,C,D) / support(A,B)
u Variant 2:
u Observation: If A,B,C ⟹ D is below confidence, so is A,B ⟹ C,D
u Can generate “bigger” rules from smaller ones!
¨ Output the rules above the confidence threshold
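Step 2 above can be sketched as follows: for a frequent itemset I, try every nonempty proper subset A as a left-hand side and keep the rules A ⟹ I \ A that clear the confidence threshold. Function names are illustrative; the basket data matches the example on the next slide (where B5 = {m, p, b, c}).

```python
from itertools import combinations

def support_count(itemset, baskets):
    # number of baskets containing all items in `itemset`
    return sum(1 for b in baskets if itemset <= b)

def rules_from_itemset(I, baskets, min_conf):
    """For frequent itemset I, emit every rule A => I \\ A whose
    confidence support_count(I) / support_count(A) clears min_conf."""
    rules = []
    for r in range(1, len(I)):
        for lhs in combinations(sorted(I), r):
            A = frozenset(lhs)
            conf = support_count(I, baskets) / support_count(A, baskets)
            if conf >= min_conf:
                rules.append((A, I - A, conf))
    return rules

baskets = [set("mcb"), set("mpj"), set("mb"), set("cj"),
           set("mpbc"), set("mcbj"), set("cbj"), set("bc")]

# From the frequent itemset {m, c, b} with confidence threshold 0.75:
# {b, m} => {c} (conf 3/4) and {c, m} => {b} (conf 3/3) survive.
for A, B, conf in rules_from_itemset(frozenset("mcb"), baskets, 0.75):
    print(sorted(A), "=>", sorted(B), conf)
```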
Example
¨ B1 = {m, c, b} B2 = {m, p, j}
¨ B3 = {m, b} B4 = {c, j}
¨ B5 = {m, p, b, c} B6 = {m, c, b, j}
¨ B7 = {c, b, j} B8 = {b, c}
¡ Support_count threshold s = 3, confidence c = 0.75
¨ Frequent itemsets: {b,m}, {b,c}, {c,m}, {c,j}, {m,c,b}
¨ Generate rules:
¨ b ⟹ m: c=4/6 m ⟹ b: c=4/5
¨ b ⟹ c: c=5/6 ...
¨ b,c ⟹ m: c=3/5 b,m ⟹ c: c=3/4
¨ b ⟹ c,m: c=3/6
Finding Frequent Itemsets
Itemsets: computation model
¡ Typically, data is kept in flat files rather than in a database system:
¨ Stored on disk
¨ Stored basket-by-basket
¨ Baskets are small, but we have many baskets and many items
u Expand baskets into pairs, triples, etc. as you read baskets
u Use k nested loops to generate all sets of size k
¡ On disk, items are represented as positive integers, and boundaries between baskets are −1.
¡ Note: We want to find frequent itemsets. To find them, we have to count them. To count them, we have to generate them.
Computational model
¡ The true cost of mining disk-resident data is usually the
number of disk I/Os
¡ In practice, association-rule algorithms read the data in
passes – all baskets read in turn
¡ We measure the cost by the number of passes an
algorithm makes over the data
Main-memory bottleneck
¡ For many frequent-itemset algorithms, main-memory is the
critical resource
¨ As we read baskets, we need to count something, e.g.,
occurrences of pairs of items
¨ The number of different things we can count is limited by main
memory
¨ Swapping counts in/out is a disaster (why?)
Finding Frequent Pairs
¡ The hardest problem often turns out to be finding the
frequent pairs of items {i1, i2}
¨ Why?
¨ Frequent pairs are common, frequent triples are rare.
¨ Probability of being frequent drops exponentially with size,
number of sets grows more slowly with size.
¡ Let’s first concentrate on pairs, then extend to larger sets
¡ The approach:
¨ We always need to generate all the itemsets.
¨ But we would only like to count (keep track of) those itemsets
that in the end turn out to be frequent.
Naïve algorithm
¡ Naïve approach to finding frequent pairs
¡ Read file once, counting in main memory the occurrences
of each pair:
¨ If a basket has n items, then we need to generate n(n-1)/2
pairs.
¡ Fails if (#items)² exceeds main memory
¡ In practice, #items can be
¨ 100K (Wal-Mart) or 10B (Web pages)
¨ Suppose 10⁵ items and 4-byte integer counts
¨ Number of pairs of items: 10⁵(10⁵ − 1)/2 ≈ 5×10⁹
¨ Therefore, about 2×10¹⁰ bytes (20 GB) of memory needed
Counting pairs in memory
¡ Approach 1: count all pairs using a matrix
¡ Approach 2: keep a table of triples
{i, j, c} = “the count of the pair of items {i, j} is c”
¨ If integers and item ids are 4 bytes, we need approximately 12
bytes for pairs with count > 0.
¨ Plus some additional overhead for the hash table.
¡ Note:
¨ Approach 1 only requires 4 bytes per pair
¨ Approach 2 uses 12 bytes per pair
(but only for pairs with count > 0)
Comparing the two approaches
[Figure: the triangular matrix uses 4 bytes per pair; the triples method uses 12 bytes per occurring pair]
Comparing the two approaches
¡ Approach 1: triangular matrix
¨ n = total number of items
¨ Count pair of items {i,j} only if i < j
¨ Keep pair counts in lexicographic order:
{1,2}, {1,3},..., {1,n}, {2,3}, {2,4},...,{2,n}, {3,4},...
¨ Pair {i, j} is at position (i − 1)(n − i/2) + j − i
¨ Total number of pairs n(n − 1)/2; total bytes ≈ 2n²
¨ The triangular matrix requires 4 bytes per pair
¡ Approach 2 uses 12 bytes per occurring pair
(but only for pairs with count > 0)
¨ Beats Approach 1 if less than 1/3 of possible pairs actually
occur.
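The position formula can be checked with a few lines of Python (1-based item ids and positions, as on the slide; `pair_index` is an illustrative name):

```python
def pair_index(i, j, n):
    """1-based position of pair {i, j} (with 1 <= i < j <= n) in the
    lexicographic order {1,2}, {1,3}, ..., {1,n}, {2,3}, ..."""
    assert 1 <= i < j <= n
    # (i-1)*(n - i/2) is always a whole number: either i is even
    # (so i/2 is an integer) or i-1 is even.
    return int((i - 1) * (n - i / 2)) + j - i

n = 5
positions = [pair_index(i, j, n) for i in range(1, n)
             for j in range(i + 1, n + 1)]
print(positions)  # [1, 2, ..., 10]: each of the n(n-1)/2 pairs gets a slot
```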
Comparing the two approaches
¡ Approach 1 (triangular matrix) keeps pair counts in lexicographic order, as on the previous slide, but:
¨ Problem: the pair counts do not fit into main memory if we have too many items.
¡ Can we do better?
A-Priori algorithm (1)
¡ A two-pass approach called A-Priori (by Agrawal and Srikant, 1994) limits the need for main memory
¡ Key idea: monotonicity
¨ If a set I of items appears at least s times, so does every subset J of I.
¡ Contrapositive (for pairs):
¨ If item i has support less than s, then no pair including i can have support ≥ s
¡ So, how does A-Priori find frequent pairs?
A-Priori algorithm (2)
¡ Pass 1: read baskets and count in main memory the
occurrences of each individual item.
¨ Requires only memory proportional to #items
¡ Items with support at least s are the frequent items
¡ Pass 2: Read baskets again and count in main memory
only those pairs where both elements are frequent (from
Pass 1)
¨ Requires memory proportional to the square of the number of frequent items (for counts)
¨ Plus a list of the frequent items
(so you know what must be counted)
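The two passes can be sketched as follows; this is a minimal in-memory version (real implementations stream baskets from disk), and the function names are illustrative:

```python
from collections import Counter
from itertools import combinations

def apriori_pairs(baskets, s):
    """Two-pass A-Priori for frequent pairs (support_count >= s)."""
    # Pass 1: count individual items; keep the frequent ones
    item_counts = Counter(i for b in baskets for i in b)
    frequent = {i for i, c in item_counts.items() if c >= s}
    # Pass 2: count only pairs whose members are both frequent
    pair_counts = Counter()
    for b in baskets:
        for pair in combinations(sorted(frequent & b), 2):
            pair_counts[pair] += 1
    return {p for p, c in pair_counts.items() if c >= s}

# Baskets from the earlier frequent-itemsets example, threshold s = 3;
# the frequent pairs are {b,c}, {b,m}, and {c,j}.
baskets = [set("mcb"), set("mpj"), set("mb"), set("cj"),
           set("mpb"), set("mcbj"), set("cbj"), set("bc")]
print(apriori_pairs(baskets, 3))
```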
Main-Memory: picture of A-Priori
Pass 1 (in main memory): counts of individual items.
Pass 2 (in main memory): the frequent items, plus counts of pairs of frequent items (the candidate pairs).
(Figure adapted from Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu)
Details of A-Priori
¡ You can use the triangular matrix method with n = number of frequent items
¨ May save space compared with storing triples
¡ Trick: re-number the frequent items 1, 2, … and keep a table relating the new numbers to the original item numbers
(In Pass 2, main memory holds the old item numbers of the frequent items and the counts of pairs of frequent items.)
Frequent triples, etc.
¡ For each k, we construct two sets of k-tuples (sets of size k):
¨ Ck = candidate k-tuples = those that might be frequent sets (support ≥ s), based on information from the pass for k − 1
¨ Lk = the set of truly frequent k-tuples
¡ The passes alternate filtering and construction:
C1 (all items) → count the items → L1 → construct (all pairs of items from L1) → C2 → count the pairs → L2 → construct (to be explained) → C3 → …
A-Priori: example
¡ Hypothetical steps of the A-Priori algorithm
¨ Generate C1 ={{b} {c} {j} {m} {n} {p}}
¨ Count the support of itemsets in C1
¨ Prune non-frequent to get L1 = { b, c, j, m }
¨ Generate C2 = { {b,c} {b,j} {b,m} {c,j} {c,m} {j,m} }
¨ Count the support of itemsets in C2
¨ Prune non-frequent to get L2 = { {b,m} {b,c} {c,m} {c,j} }
¨ Generate C3 = { {b,c,m} {b,c,j} {b,m,j} {c,m,j} }
¨ Count the support of itemsets in C3
¨ Prune non-frequent to get L3 = { {b,c,m} }
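The full level-wise loop sketched on this slide might look like this: a minimal in-memory sketch, where candidate generation joins pairs of frequent (k−1)-sets and applies the monotonicity prune before counting.

```python
from itertools import combinations

def apriori(baskets, s):
    """Level-wise A-Priori: alternate constructing candidates C_k and
    filtering them down to the truly frequent L_k."""
    def freq(candidates):
        return {I for I in candidates
                if sum(1 for b in baskets if I <= b) >= s}

    # C1 = every individual item; L1 = the frequent ones
    L = freq({frozenset([i]) for b in baskets for i in b})
    result = set(L)
    k = 2
    while L:
        # C_k: unions of two frequent (k-1)-sets that give a k-set ...
        Ck = {a | c for a in L for c in L if len(a | c) == k}
        # ... pruned by monotonicity: all (k-1)-subsets must be frequent
        Ck = {I for I in Ck
              if all(frozenset(sub) in L for sub in combinations(I, k - 1))}
        L = freq(Ck)
        result |= L
        k += 1
    return result

baskets = [set("mcb"), set("mpj"), set("mb"), set("cj"),
           set("mpb"), set("mcbj"), set("cbj"), set("bc")]
# With support_count threshold 3: four frequent singletons
# ({b}, {c}, {j}, {m}) and three frequent pairs; no frequent triples.
print(sorted(sorted(I) for I in apriori(baskets, 3)))
```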
A-Priori for All frequent itemsets
¡ One pass for each k (itemset size)
¡ Needs memory to count each candidate k–tuple
¡ For typical market-basket data and reasonable support
(e.g., 1%), k = 2 requires the most memory
¡ Many possible extensions:
¨ Association rules with intervals:
u For example: Men over 65 have 2 cars
¨ Association rules when items are in a taxonomy
u Bread, Butter ⟹ FruitJam
u BakedGoods, MilkProduct ⟹ PreservedGoods
¨ Lower the support threshold s as itemsets get bigger