A216 - DWM - Lab No. 9

Experiment No. 9

Name: Kratik Paliwal
Roll No: A216

Part A

Aim: Implement a program for the K-Means clustering algorithm using the WEKA tool.

Prerequisite: Database

Outcome: To impart knowledge of data mining and its techniques.

Theory:

Data Mining Definition


Data mining not only finds but also analyses the hidden patterns of data in a data warehouse. It aims at extracting knowledge from data warehouses, organizing data in a manner that derives its inherent meaning and contributes it to the knowledge base. The data from a database or a data warehouse is first sorted to prepare the target data and then analysed to find the structure, correlations and meaning it contains. Data mining has immense applications in the finance, healthcare and intelligence industries.

K-means Algorithm:
Part B

Code:
Output:(Paste screen shot of Output)
Conclusion:

Clustering is a powerful technique for discovering patterns and similarities in data. By grouping similar data points together, clustering can help in various tasks such as customer segmentation, anomaly detection, and pattern recognition. However, it is important to choose the right clustering algorithm, similarity measure, and number of clusters to ensure that the results are meaningful and useful.

Questions:
Q1. Explain the K-Medoids algorithm.

1. Choose k number of random points from the data and assign these k points to
k number of clusters. These are the initial medoids.

2. For all the remaining data points, calculate the distance from each medoid and
assign it to the cluster with the nearest medoid.

3. Calculate the total cost (Sum of all the distances from all the data points to the
medoids)

4. Select a random point as the new medoid and swap it with the previous
medoid. Repeat 2 and 3 steps.

5. If the total cost of the new medoid is less than that of the previous medoid,
make the new medoid permanent and repeat step 4.

6. If the total cost of the new medoid is greater than the cost of the previous
medoid, undo the swap and repeat step 4.

7. Continue the repetitions until no change in the medoids is encountered; the final medoids then classify the data points.
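The steps above can be sketched as a minimal PAM-style implementation in plain Python. This is an illustrative sketch, not the WEKA implementation: 1-D points and absolute difference as the distance measure are assumptions made for brevity.

```python
import random

def total_cost(points, medoids):
    # Step 3: sum of distances from every point to its nearest medoid
    return sum(min(abs(p - m) for m in medoids) for p in points)

def k_medoids(points, k, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)      # step 1: k random initial medoids
    cost = total_cost(points, medoids)
    improved = True
    while improved:                      # step 7: repeat until no change
        improved = False
        for i in range(k):
            for p in points:
                if p in medoids:
                    continue
                # step 4: try swapping medoid i with a non-medoid point p
                candidate = medoids[:i] + [p] + medoids[i + 1:]
                new_cost = total_cost(points, candidate)
                if new_cost < cost:      # step 5: keep the cheaper medoid set
                    medoids, cost = candidate, new_cost
                    improved = True
                # step 6: otherwise the swap is simply not kept
    # step 2: final assignment of each point to its nearest medoid
    clusters = {m: [] for m in medoids}
    for p in points:
        nearest = min(medoids, key=lambda m: abs(p - m))
        clusters[nearest].append(p)
    return medoids, clusters
```

Because swaps are only kept when they lower the total cost, the loop is guaranteed to terminate: the cost strictly decreases with each accepted swap and there are finitely many medoid sets.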

In the K-Means algorithm, given the value of k and unlabelled data:

1. Choose k random points (data points from the data set or some other points). These points are also called "centroids" or "means".

2. Assign every data point in the data set to the closest centroid by applying a distance formula such as Euclidean distance or Manhattan distance.

3. Choose new centroids by calculating the mean of all the data points in each cluster, and go to step 2.

4. Continue step 3 until no data point changes its cluster between two iterations.
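As a rough illustration of these steps, here is a hedged sketch using only the Python standard library (again assuming 1-D points for simplicity; this is not WEKA's SimpleKMeans):

```python
import random

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)       # step 1: k random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # step 2: assign each point to its nearest centroid
        # (Euclidean distance in 1-D reduces to absolute difference)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # step 3: recompute each centroid as the mean of its cluster
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:      # step 4: stop when nothing changes
            break
        centroids = new_centroids
    return centroids, clusters
```

For example, `k_means([1, 2, 3, 10, 11, 12], 2)` converges to centroids 2.0 and 11.0, splitting the data into the two obvious groups.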

A problem with the K-Means algorithm is its handling of outlier data. An outlier is a point that lies far from the rest of the points. Because every point must be assigned to some cluster, an outlier pulls the mean of its cluster toward itself, distorting the cluster centre. Hence, K-Means clustering is highly affected by outlier data.

Q2. Discuss Association Rule Mining. Also define Support and Confidence.
Association Rule Mining is a data mining technique used to identify relationships
between items in a dataset. It is often used in market basket analysis to discover
patterns in consumer purchasing behavior. The goal is to find sets of items that
frequently co-occur in transactions, indicating that they are likely to be bought
together.

Support and confidence are two important metrics used in association rule mining:

1. Support: The support of an itemset is the proportion of transactions in the dataset that contain that itemset. It measures the frequency of occurrence of an itemset in the dataset. Mathematically, it is defined as:

Support(A) = (Transactions containing A) / (Total transactions)

For example, if there are 100 transactions in a dataset and 50 of them contain item A, then the support of A is 50/100 = 0.5.

2. Confidence: The confidence of a rule A → B is the conditional probability that B occurs in a transaction given that A occurs in that transaction. It measures the strength of the association between A and B. Mathematically, it is defined as:

Confidence(A → B) = Support(A ∪ B) / Support(A)

For example, if the support of {A, B} is 0.3 and the support of A is 0.5, then the confidence of the rule A → B is 0.3 / 0.5 = 0.6.

Association rule mining typically involves finding rules that have both high support
and high confidence. High support indicates that the rule is applicable to a large
number of transactions, while high confidence indicates that the rule is reliable.
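The two metrics can be computed directly from a list of transactions. A small sketch (representing each transaction as a set of item names is an assumption made here for illustration):

```python
def support(transactions, itemset):
    # Support(A) = (transactions containing A) / (total transactions)
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # Confidence(A -> B) = Support(A ∪ B) / Support(A)
    combined = set(antecedent) | set(consequent)
    return support(transactions, combined) / support(transactions, antecedent)
```

With 10 transactions of which 5 contain A and 3 contain both A and B, this reproduces the worked example above: Support(A) = 0.5, Support({A, B}) = 0.3, and Confidence(A → B) = 0.6.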
