ASSIGNMENT
ON
DATA MINING
Submitted by
Name: Manjula.T
Roll.no:16suca003
Class: III BCA
1. Association rule:
Association rule learning is a rule-based machine learning method for
discovering interesting relations between variables in large databases. It is intended
to identify strong rules discovered in databases using some measures of
interestingness. For example, the rule found in the sales data of a supermarket
would indicate that if a customer buys onions and potatoes together, they are likely
to also buy hamburger meat. Such information can be used as the basis for
decisions about marketing activities such as, e.g., promotional pricing or product
placements
Example database with 5 transactions and 5 items
Transaction ID   Milk   Bread   Butter   Beer   Diapers
      1            1      1       0        0       0
      2            0      0       1        0       0
      3            0      0       0        1       1
      4            1      1       1        0       0
      5            0      1       0        0       0
Following the original definition by Agrawal, Imieliński and Swami,[2] the problem of
association rule mining is defined as:
Let I = {i1, i2, ..., in} be a set of n binary attributes called items.
Let D = {t1, t2, ..., tm} be a set of transactions called the database.
Each transaction in D has a unique transaction ID and contains a subset of the items in I.
A rule is defined as an implication of the form X => Y, where X, Y ⊆ I and X ∩ Y = ∅.
Every rule is composed of two different sets of items, also known as itemsets, X and Y,
where X is called the antecedent or left-hand side (LHS) and Y the consequent or
right-hand side (RHS).
To illustrate the concepts, we use a small example from the supermarket domain.
The set of items is I = {milk, butter, bread, beer, diapers}, and the table above shows a
small database containing the items, where, in each entry, the value 1 means the
presence of the item in the corresponding transaction and the value 0 represents
the absence of the item in that transaction.
An example rule for the supermarket could be {butter, bread} => {milk}, meaning that
if butter and bread are bought, customers also buy milk.
Note: this example is extremely small. In practical applications, a rule needs a
support of several hundred transactions before it can be considered statistically
significant, and datasets often contain thousands or millions of transactions.
Useful Concepts
In order to select interesting rules from the set of all possible rules, constraints on
various measures of significance and interest are used. The best-known constraints
are minimum thresholds on support and confidence.
Let X be an itemset, X=>Y an association rule and T a set of transactions of a
given database.
Support
Support is an indication of how frequently the itemset appears in the dataset.
The support of X with respect to T is defined as the proportion of transactions
in the dataset which contain the itemset X:
supp(X) = |{t ∈ T : X ⊆ t}| / |T|
In the example dataset, the itemset X = {beer, diapers} has a support of 1/5 = 0.2,
since it occurs in 20% of all transactions (1 out of 5 transactions). The argument
of supp() is a set of preconditions, and thus becomes more restrictive as it grows
(instead of more inclusive).[3]
Confidence
Confidence is an indication of how often the rule has been found to be true.
The confidence value of a rule, X => Y, with respect to a set of transactions T, is
the proportion of the transactions that contain X which also contain Y.
Confidence is defined as:
conf(X => Y) = supp(X ∪ Y) / supp(X)
For example, the rule {butter,bread}=> milk has a confidence of 0.2/0.2=1 in the
database, which means that for 100% of the transactions containing butter and
bread the rule is correct (100% of the times a customer buys butter and bread, milk
is bought as well).
Lift
The lift of a rule is defined as:
lift(X => Y) = supp(X ∪ Y) / (supp(X) × supp(Y))
or the ratio of the observed support to that expected if X and Y were independent.
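To make these measures concrete, here is a minimal Python sketch that computes support, confidence, and lift over the five example transactions above; the helper function names are illustrative, not part of any library.

# A minimal sketch (illustrative helper names) computing support,
# confidence and lift on the five example transactions above.

transactions = [
    {"milk", "bread"},           # transaction 1
    {"butter"},                  # transaction 2
    {"beer", "diapers"},         # transaction 3
    {"milk", "bread", "butter"}, # transaction 4
    {"bread"},                   # transaction 5
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """conf(X => Y) = supp(X ∪ Y) / supp(X)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """Observed support divided by the support expected if X and Y were independent."""
    return (support(set(lhs) | set(rhs), transactions)
            / (support(lhs, transactions) * support(rhs, transactions)))

print(support({"beer", "diapers"}, transactions))               # 0.2
print(confidence({"butter", "bread"}, {"milk"}, transactions))  # 1.0
print(lift({"butter", "bread"}, {"milk"}, transactions))        # 0.2 / (0.2 * 0.4) = 2.5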
Process
[Figure: frequent itemset lattice, where the color of each box indicates how many
transactions contain that combination of items.] Note that an itemset at a lower level
of the lattice can occur in at most as many transactions as any of its parents; e.g.
{a, c} can occur in at most min(supp({a}), supp({c})) of the transactions. This is
called the downward-closure property.[2]
Association rules are usually required to satisfy a user-specified minimum support
and a user-specified minimum confidence at the same time. Association rule
generation is usually split up into two separate steps:
1. A minimum support threshold is applied to find all frequent itemsets in a
database.
2. A minimum confidence constraint is applied to these frequent itemsets in
order to form rules.
While the second step is straightforward, the first step needs more attention.
Finding all frequent itemsets in a database is difficult since it involves searching all
possible itemsets (item combinations). The set of possible itemsets is the power
set over I and has size 2^n - 1 (excluding the empty set, which is not a valid itemset).
Although the size of the power set grows exponentially in the number of items n in
I, efficient search is possible using the downward-closure property of support (also
called anti-monotonicity), which guarantees that for a frequent itemset, all its
subsets are also frequent and thus no infrequent itemset can be a subset of a
frequent itemset. Exploiting this property, efficient algorithms (e.g., Apriori[8] and
Eclat) can find all frequent itemsets.
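As a rough illustration of the first step, the Python sketch below enumerates frequent itemsets level by level in the Apriori style, pruning candidates that have an infrequent subset; the min_support value and the function name are assumptions for this example, not a full Apriori implementation.

from itertools import combinations

# Simplified Apriori-style search for frequent itemsets over the example
# transactions above; a sketch of the idea, not a complete implementation.

def frequent_itemsets(transactions, min_support=0.4):
    n = len(transactions)
    supp = lambda c: sum(c <= t for t in transactions) / n
    # Level 1: frequent single items.
    current = [frozenset([i]) for i in {item for t in transactions for item in t}
               if supp(frozenset([i])) >= min_support]
    result = list(current)
    k = 2
    while current:
        # Candidate generation: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Downward-closure pruning: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, k - 1))}
        current = [c for c in candidates if supp(c) >= min_support]
        result.extend(current)
        k += 1
    return result

transactions = [
    {"milk", "bread"}, {"butter"}, {"beer", "diapers"},
    {"milk", "bread", "butter"}, {"bread"},
]
print(frequent_itemsets(transactions))   # frequent single items plus {milk, bread}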
2. K-Medoids Algorithm:
Partitioning Around Medoids (PAM), or the K-medoids algorithm, is a partitional clustering algorithm which is a slight modification of the K-means algorithm. Both attempt to minimize the squared error, but the K-medoids algorithm is more robust to noise than the K-means algorithm. In the K-means algorithm the means are chosen as the centroids, but in K-medoids, actual data points are chosen to be the medoids. A medoid can be defined as that object of a cluster whose average dissimilarity to all the objects in the cluster is minimal.
The difference between k-means and k-medoids is analogous to the difference between mean and median: the mean is the average value of all data items collected, while the median is the value around which the data items are evenly distributed.
The basic idea of this algorithm is to first compute the K representative objects, called medoids. After finding the set of medoids, each object of the data set is assigned to the nearest medoid; that is, object i is put into cluster vi when medoid mvi is nearer than any other medoid mw. The algorithm proceeds in two steps:
BUILD step: sequentially select k "centrally located" objects to be used as the initial medoids.
SWAP step: if the objective function can be reduced by interchanging (swapping) a selected object with an unselected object, the swap is carried out. This is continued until the objective function can no longer be decreased.
The algorithm is as follows:
1. Initially select k random points as the medoids from the given n data points of the data set.
2. Associate each data point with the closest medoid, using any of the most common distance metrics.
3. For each pair of non-selected object h and selected object i, calculate the total swapping cost TCih. If TCih < 0, i is replaced by h.
4. Repeat steps 2-3 until there is no change of the medoids.
There are four situations to be considered in this process:
i. Shift-out membership: an object pi may need to be shifted from the currently considered cluster of oj to another cluster.
ii. Update the current medoid: a new medoid oc is found to replace the current medoid oj.
iii. No change: objects in the current cluster result have the same or even smaller squared-error criterion (SEC) measure for all the possible redistributions considered.
iv. Shift-in membership: an outside object pi is assigned to the current cluster with the new (replaced) medoid oc.
PAM works efficiently for small data sets but does not scale well for large data sets. The complexity of this algorithm is O(k(n-k)^2).
Example: For a given k = 2, cluster the following data set using PAM:
(3, 4), (7, 4), (7, 6), (2, 6), (3, 8), (8, 5), (4, 7), (6, 2), (7, 3), (6, 4)
Let us choose (3, 4) and (7, 4) as the medoids, and suppose we use the Manhattan distance as the distance measure. Calculating the distance of each remaining point from the chosen medoids:
For (7, 6), this point is nearest to (7, 4).
For (2, 6), this point is nearest to (3, 4).
For (3, 8), this point is nearest to (3, 4).
For (8, 5), this point is nearest to (7, 4).
For (4, 7), this point is nearest to (3, 4).
For (6, 2), this point is nearest to (7, 4).
For (7, 3), this point is nearest to (7, 4).
For (6, 4), this point is nearest to (7, 4).
So, after the clustering, the clusters formed are {(3, 4), (2, 6), (3, 8), (4, 7)} and {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}.
Now we calculate the cost, which is the sum of the distances of each non-selected point from the medoid of the cluster it belongs to:
Total cost = cost((3, 4), (2, 6)) + cost((3, 4), (3, 8)) + cost((3, 4), (4, 7)) + cost((7, 4), (6, 2)) + cost((7, 4), (6, 4)) + cost((7, 4), (7, 3)) + cost((7, 4), (8, 5)) + cost((7, 4), (7, 6))
= 3 + 4 + 4 + 3 + 1 + 1 + 2 + 2 = 20.
Now let us choose some other point to be a medoid instead of (7, 4); we randomly choose (7, 3). The new medoid set is (3, 4) and (7, 3). Repeating the same task as earlier, we calculate the distance of each remaining point from the new medoids:
For (7, 6), this point is nearest to (7, 3).
For (2, 6), this point is nearest to (3, 4).
For (3, 8), this point is nearest to (3, 4).
For (8, 5), this point is nearest to (7, 3).
For (4, 7), this point is nearest to (3, 4).
For (6, 2), this point is nearest to (7, 3).
For (7, 4), this point is nearest to (7, 3).
For (6, 4), this point is nearest to (7, 3).
Total cost = cost((3, 4), (2, 6)) + cost((3, 4), (3, 8)) + cost((3, 4), (4, 7)) + cost((7, 3), (7, 6)) + cost((7, 3), (8, 5)) + cost((7, 3), (6, 2)) + cost((7, 3), (7, 4)) + cost((7, 3), (6, 4))
= 3 + 4 + 4 + 3 + 3 + 2 + 1 + 2 = 22.
The total cost when (7, 3) is a medoid is greater than the total cost when (7, 4) was the medoid, so the swap is rejected and (7, 4) is kept as the medoid. Since there is no change in the medoid set, the algorithm ends here. Hence the clusters obtained finally are {(3, 4), (2, 6), (3, 8), (4, 7)} and {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}.
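The cost comparison above can be checked with a short Python sketch using the same ten points, the Manhattan distance, and the two candidate medoid sets; the helper names are illustrative.

# Reproduces the worked example: Manhattan distance, ten points,
# and the total cost for each candidate medoid set.

points = [(3, 4), (7, 4), (7, 6), (2, 6), (3, 8),
          (8, 5), (4, 7), (6, 2), (7, 3), (6, 4)]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(points, medoids):
    """Sum of each non-medoid point's distance to its nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids)
               for p in points if p not in medoids)

print(total_cost(points, [(3, 4), (7, 4)]))  # 20
print(total_cost(points, [(3, 4), (7, 3)]))  # 22, so the swap is rejected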
3. CLARA concept
Instead of finding medoids for the entire data set, CLARA considers a small sample of the data with fixed
size (sampsize) and applies the PAM algorithm described above to generate an optimal set of
medoids for the sample. The quality of resulting medoids is measured by the average dissimilarity
between every object in the entire data set and the medoid of its cluster, defined as the cost function.
CLARA repeats the sampling and clustering processes a pre-specified number of times in order to
minimize the sampling bias. The final clustering results correspond to the set of medoids with the minimal
cost. The CLARA algorithm is summarized in the next section.
CLARA Algorithm:
Input: a database D of objects and the number of clusters k
Repeat m times:
  Draw a sample S ⊆ D randomly from D
  Call PAM(S, k) to get k medoids
  Classify the entire data set D into clusters C1, C2, ..., Ck
  Calculate the quality of the clustering as the average dissimilarity
end
The algorithm is as follows:
1. Randomly split the data set into multiple subsets with fixed size (sampsize).
2. Compute PAM algorithm on each subset and choose the corresponding k
representative objects (medoids). Assign each observation of the entire data set
to the closest medoid.
3. Calculate the mean (or the sum) of the dissimilarities of the observations to
their closest medoid. This is used as a measure of the goodness of the
clustering.
4. Retain the sub-dataset for which the mean (or sum) is minimal. A further
analysis is carried out on the final partition.
Note that each sub-data set is forced to contain the medoids obtained from the best
sub-data set until then. Randomly drawn observations are added to this set until
sampsize has been reached.
The output of CLARA includes the following components:
medoids: objects that represent the clusters
clustering: a vector containing the cluster number of each object
sample: labels or case numbers of the observations in the best sample, that is,
the sample used by the CLARA algorithm for the final partition.
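A rough Python sketch of the CLARA procedure described above, under simplifying assumptions (Manhattan distance on 2-D points, a greedy swap-based stand-in for PAM, and illustrative values for the number of samples m and sampsize); none of the names come from an existing library.

import random

# Rough CLARA sketch: draw m samples, run a simplified swap-based PAM on each
# sample, classify the entire data set, and keep the medoid set with the
# lowest average dissimilarity. Parameter values are illustrative.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def avg_dissimilarity(points, medoids):
    return sum(min(manhattan(p, m) for m in medoids) for p in points) / len(points)

def simple_pam(sample, k):
    """Greedy swap-based K-medoids on a sample (a simplified stand-in for PAM)."""
    medoids = random.sample(sample, k)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in sample:
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                if avg_dissimilarity(sample, candidate) < avg_dissimilarity(sample, medoids):
                    medoids, improved = candidate, True
    return medoids

def clara(points, k, m=5, sampsize=8):
    best_medoids, best_cost = None, float("inf")
    for _ in range(m):
        sample = random.sample(points, min(sampsize, len(points)))
        medoids = simple_pam(sample, k)
        cost = avg_dissimilarity(points, medoids)  # quality measured on the FULL data set
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    clustering = [min(range(k), key=lambda j: manhattan(p, best_medoids[j])) for p in points]
    return best_medoids, clustering

points = [(3, 4), (7, 4), (7, 6), (2, 6), (3, 8),
          (8, 5), (4, 7), (6, 2), (7, 3), (6, 4)]
print(clara(points, k=2))

Because each PAM run only sees sampsize points, this approach scales to data sets that PAM itself cannot handle, at the price of some sampling bias.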
4. CLARANS Algorithm:
1. Input parameters numlocal and maxneighbor. Initialize i to 1, and mincost to a large number.
2. Set current to an arbitrary node in Gn,k (each node is a set of k medoids).
3. Set j to 1.
4. Consider a random neighbor S of current, and calculate the cost differential of the two nodes.
5. If S has a lower cost, set current to S and go to Step 3.
6. Otherwise, increment j by 1. If j <= maxneighbor, go to Step 4.
7. Otherwise, when j > maxneighbor, compare the cost of current with mincost.
If the former is less than mincost, set mincost to the cost of current and set
bestnode to current.
8. Increment i by 1. If i > numlocal, output bestnode and halt. Otherwise, go to Step 2.
Steps 3 to 6 above search for nodes with progressively lower costs.
But, if the current node has already been compared with the maximum number
of the neighbors of the node (specified by maxneighbor) and is still of the
lowest cost, the current node is declared to be a “local” minimum.
Then, in Step 7, the cost of this local minimum is compared with the lowest
cost obtained so far.
The lower of the two costs above is stored in mincost. Algorithm CLARANS
then repeats to search for other local minima, until numlocal of them have been
found.
As shown above, CLARANS has two parameters: the maximum number of
neighbors examined (maxneighbor) and the number of local minima obtained
(numlocal). The higher the value of maxneighbor, the closer CLARANS is to
PAM, and the longer each search for a local minimum takes. But the quality of
such a local minimum is higher and fewer local minima need to be obtained.
Like many applications of randomized search [16], [17], we rely on
experiments to determine the appropriate values of these parameters.
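Following the numbered steps above, here is a rough Python sketch of CLARANS in which a node is a set of k medoids and a random neighbor differs from the current node in exactly one medoid; numlocal, maxneighbor, and all helper names are illustrative.

import random

# Rough CLARANS sketch: randomized search over medoid sets, restarted
# numlocal times, examining at most maxneighbor random neighbors per node.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def cost(points, medoids):
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def clarans(points, k, numlocal=2, maxneighbor=20):
    bestnode, mincost = None, float("inf")
    for _ in range(numlocal):                      # Steps 1-2 and 8
        current = random.sample(points, k)         # arbitrary node in Gn,k
        j = 1
        while j <= maxneighbor:                    # Steps 3-6
            # Random neighbor: replace one medoid by a random non-medoid point.
            neighbor = current[:]
            neighbor[random.randrange(k)] = random.choice(
                [p for p in points if p not in current])
            if cost(points, neighbor) < cost(points, current):
                current, j = neighbor, 1           # Step 5: move to the neighbor, reset j
            else:
                j += 1                             # Step 6
        if cost(points, current) < mincost:        # Step 7: local minimum found
            bestnode, mincost = current, cost(points, current)
    return bestnode, mincost

points = [(3, 4), (7, 4), (7, 6), (2, 6), (3, 8),
          (8, 5), (4, 7), (6, 2), (7, 3), (6, 4)]
print(clarans(points, k=2))

With maxneighbor set close to k(n - k), almost every possible swap is examined and the search behaves much like PAM, which matches the trade-off described above.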
5. Decision tree:
A decision tree is a structure that includes a root node, branches, and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the outcome
of a test, and each leaf node holds a class label. The topmost node in the tree is the
root node.
A decision tree for the concept buy_computer, for example, indicates
whether a customer at a company is likely to buy a computer or not. Each internal
node represents a test on an attribute. Each leaf node represents a class.
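As an illustration, the following is a minimal scikit-learn sketch that fits a small decision tree on a made-up buy_computer-style data set; the feature encoding and class labels are hypothetical, chosen only to show the structure of a learned tree.

from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative sketch only: a tiny, made-up buy_computer-style data set.
# Features: [age (0=youth, 1=middle_aged, 2=senior), student (0/1), credit_rating (0=fair, 1=excellent)]
X = [
    [0, 0, 0], [0, 0, 1], [1, 0, 0], [2, 0, 0], [2, 1, 0],
    [2, 1, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [2, 1, 0],
]
y = ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Each internal node is a test on an attribute; each leaf holds a class label.
print(export_text(clf, feature_names=["age", "student", "credit_rating"]))
print(clf.predict([[0, 1, 0]]))  # e.g. a young student with fair credit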
The benefits of having a decision tree are as follows −
It does not require any domain knowledge.
It is easy to comprehend.
The learning and classification steps of a decision tree are simple and fast.
Decision trees used in data mining are of two main types:
Classification tree analysis is when the predicted outcome is the class to which
the data belongs.
Regression tree analysis is when the predicted outcome can be considered a
real number (e.g. the price of a house, or a patient's length of stay in a hospital).
The term Classification And Regression Tree (CART) analysis is an umbrella
term used to refer to both of the above procedures, first introduced by Breiman et
al. in 1984.[3] Trees used for regression and trees used for classification have some
similarities - but also some differences, such as the procedure used to determine
where to split.[3]
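A similarly minimal sketch for the regression case, using scikit-learn's DecisionTreeRegressor on made-up house data to predict a real-valued price; all values are hypothetical.

from sklearn.tree import DecisionTreeRegressor

# Illustrative regression-tree sketch: predicting a real-valued target
# (e.g. a house price) from made-up numeric features.
X = [[50, 1], [60, 1], [80, 2], [100, 2], [120, 3], [150, 3]]  # [area_m2, rooms]
y = [110_000, 125_000, 160_000, 195_000, 230_000, 280_000]     # price

reg = DecisionTreeRegressor(max_depth=2, random_state=0)
reg.fit(X, y)
print(reg.predict([[90, 2]]))  # predicted price for a 90 m^2, 2-room house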
Some techniques, often called ensemble methods, construct more than one decision
tree:
Boosted trees: incrementally building an ensemble by training each new
instance to emphasize the training instances previously mis-modeled. A typical
example is AdaBoost. These can be used for regression-type and classification-
type problems.[4][5]
Bootstrap aggregated (or bagged) decision trees: an early ensemble method that
builds multiple decision trees by repeatedly resampling training data with
replacement, and voting the trees for a consensus prediction.[6]
A random forest classifier is a specific type of bootstrap aggregating
Rotation forest – in which every decision tree is trained by first
applying principal component analysis (PCA) on a random subset of the input
features.[7]
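The three ensemble flavours listed above can be sketched with scikit-learn as follows; the data set is the same hypothetical buy_computer-style example used earlier, and the parameter values are arbitrary.

from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier

# Illustrative sketch of boosting, bagging and random forests on the same
# made-up buy_computer-style data used earlier.
X = [
    [0, 0, 0], [0, 0, 1], [1, 0, 0], [2, 0, 0], [2, 1, 0],
    [2, 1, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [2, 1, 0],
]
y = ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"]

for model in (AdaBoostClassifier(n_estimators=20),       # boosted trees
              BaggingClassifier(n_estimators=20),        # bagged trees
              RandomForestClassifier(n_estimators=20)):  # bagging plus random feature subsets
    model.fit(X, y)
    print(type(model).__name__, model.predict([[0, 1, 0]]))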
A special case of a decision tree is a decision list,[8] which is a one-sided decision
tree, so that every internal node has exactly 1 leaf node and exactly 1 internal node
as a child (except for the bottommost node, whose only child is a single leaf node).
While less expressive, decision lists are arguably easier to understand than general
decision trees due to their added sparsity; they also permit non-greedy learning
methods[9] and monotonic constraints to be imposed.[10]
Decision tree learning is the construction of a decision tree from class-labeled
training tuples. A decision tree is a flow-chart-like structure, where each internal
(non-leaf) node denotes a test on an attribute, each branch represents the outcome
of a test, and each leaf (or terminal) node holds a class label. The topmost node in a
tree is the root node
******************************************************************