DWM Exp7 C49

LAB Manual

PART A
(PART A: TO BE REFERRED BY STUDENTS)

Experiment No.07
A.1 Aim:
Implementation of K-means clustering using JAVA or WEKA.

A.2 Prerequisite:
Familiarity with the WEKA tool and programming languages.

A.3 Outcome:
After successful completion of this experiment, students will be able to use classification and clustering algorithms of data mining.

A.4 Theory:

K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set into a certain number of clusters (assume k clusters) fixed a priori. The main idea is to define k centers, one for each cluster. These centers should be placed carefully, because different locations lead to different results; the better choice is therefore to place them as far away from each other as possible. The next step is to take each point belonging to the given data set and associate it with the nearest center. When no point is pending, the first step is complete and an initial grouping is done. At this point we re-calculate k new centroids as the barycenters of the clusters resulting from the previous step. Once we have these k new centroids, a new binding is made between the same data set points and the nearest new center, and the process repeats. As a result of this loop, the k centers change their location step by step until no more changes occur, that is, until the centers no longer move.

Algorithmic steps for k-means clustering


Let X = {x1, x2, x3, ……, xn} be the set of data points and V = {v1, v2, ……, vc} be the set of cluster centers.
1) Randomly select ‘c’ cluster centers.
2) Calculate the distance between each data point and the cluster centers.
3) Assign each data point to the cluster center whose distance from that data point is the minimum over all cluster centers.
4) Recalculate the new cluster center using:
   vi = (1/ci) Σ (j = 1 to ci) xj
where ‘ci’ represents the number of data points in the i-th cluster.
5) Recalculate the distance between each data point and the newly obtained cluster centers.
6) If no data point was reassigned, stop; otherwise repeat from step 3).
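
As an illustration of the steps above, here is a minimal from-scratch sketch in Python/NumPy. It is our own addition, not part of the official manual; the function name, the convergence check via np.allclose, and the synthetic example data are all assumptions.

# Minimal from-scratch sketch of the algorithmic steps above (illustrative only)
import numpy as np

def kmeans(X, k, max_iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step 1: randomly select k cluster centers from the data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Steps 2-3: compute distances and assign each point to its nearest center
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: recalculate each center as the mean of the points assigned to it
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        # Steps 5-6: stop once the centers (and hence the assignments) no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the converged centers
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return centers, labels

# Example usage on a small synthetic two-cluster data set
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + [5, 5]])
centers, labels = kmeans(X, k=2)
print(centers)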
PART B
(PART B: TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical. The soft copy must be uploaded on Blackboard or emailed to the concerned lab in-charge faculty at the end of the practical in case there is no Blackboard access available.)

B.1 Software Code written by student:


(Paste the software code related to your case study completed during the two hours of practical in the lab here)
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('diabetes_csv.csv')
x = dataset.iloc[:, [7, 5]].values

# Finding the optimal number of clusters using the elbow method
from sklearn.cluster import KMeans
wcss_list = []  # Initializing the list for the values of WCSS (within-cluster sum of squares)
# Using a loop for iterations from 1 to 10
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters (k)')
mtp.ylabel('wcss_list')
mtp.show()

# Training the K-means model on the dataset with the chosen number of clusters
kmeans = KMeans(n_clusters=2, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)

# Visualizing the clusters and their centroids
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s=100, c='blue', label='Cluster 1')   # first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s=100, c='green', label='Cluster 2')  # second cluster
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroid')
mtp.title('Clusters of patients')
mtp.xlabel('Age (in years)')
mtp.ylabel('BMI (Body Mass Index)')
mtp.legend()
mtp.show()
B.2 Input and Output:
(Paste the input dataset and the clustering output related to your case study in the following format)
Jupyter Notebook:
Sample Dataset:

The Elbow Method Graph:

Python Output:

Weka Tool:
Weka Output:

B.3 Observations and learning:


(Students are expected to comment on the output obtained, with clear observations and learning for each task/sub-part assigned)
K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. A cluster refers to a collection of data points aggregated together because of certain similarities. The ‘means’ in K-means refers to averaging of the data, that is, finding the centroid.
B.4 Conclusion:
(Students must write the conclusion as per the attainment of individual
outcome listed above and learning/observation noted in section B.3)
Hence, we have successfully implemented K-means clustering using Python as well as the Weka tool.
B.5 Questions of Curiosity
(To be answered by student based on the practical performed and
learning/observations)
1. What is Clustering? Types of clustering? Explain the advantages and
disadvantages of clustering.
Ans:
Clustering
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. In other words, clustering collects objects into groups on the basis of their similarity to one another and their dissimilarity to objects in other groups.
Cluster analysis or clustering is the task of grouping a set of objects in such a way
that objects in the same group (called a cluster) are more similar (in some
sense) to each other than to those in other groups (clusters). It is the main task
of exploratory data mining, and a common technique for statistical data
analysis, used in many fields, including machine learning, pattern recognition,
image analysis, information retrieval, bioinformatics, data compression, and
computer graphics.

Types Of Clustering Algorithms


1. Connectivity-based Clustering (Hierarchical clustering)
2. Centroid-based Clustering (Partitioning methods)
3. Distribution-based Clustering (Model-based methods)
4. Density-based Clustering
5. Fuzzy Clustering
6. Constraint-based (Supervised) Clustering
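
As a rough, non-authoritative mapping (our own sketch, not part of the manual), several of these families correspond to readily available scikit-learn estimators; fuzzy and constraint-based clustering have no direct scikit-learn equivalent and are omitted. The toy data is an assumption for illustration.

# Sketch: common scikit-learn counterparts of the clustering families listed above
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.mixture import GaussianMixture

X = np.random.rand(100, 2)  # toy 2-D data (assumption)

hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)            # connectivity-based (hierarchical)
km_labels = KMeans(n_clusters=3, random_state=42).fit_predict(X)              # centroid-based (partitioning)
gmm_labels = GaussianMixture(n_components=3, random_state=42).fit_predict(X)  # distribution-based (model-based)
db_labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)                     # density-based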
Advantages and Disadvantages of Clustering
The main advantages of clustering are that it works on unlabeled data, reveals natural groupings and hidden structure, and helps summarize and explore large data sets.
The main disadvantages are that the results depend heavily on the chosen algorithm, distance measure, and parameters (such as the number of clusters), and that clustering can be sensitive to noise, outliers, and feature scaling.
2. Give the advantages and disadvantages of K-means clustering.
Ans:
Advantages of K-means clustering:
1. Relatively simple to implement.
2. Scales to large data sets.
3. Guarantees convergence.
4. Can warm-start the positions of centroids (see the sketch after the disadvantages list below).
5. Easily adapts to new examples.
6. Generalizes to clusters of different shapes and sizes, such as elliptical clusters.
Disadvantages of K-means clustering:
1. Choosing k manually.
2. Being dependent on initial values.
3. Clustering data of varying sizes and densities.
4. Clustering outliers.
5. Scaling with the number of dimensions.
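
A hedged sketch (our own illustration) of advantages 4 and 5 above: scikit-learn's KMeans accepts an array for init to warm-start the centroid positions, and predict assigns new examples to the learned centroids. The toy data and initial centroid guesses are assumptions.

# Sketch: warm-starting centroids and adapting to new examples in scikit-learn
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)                            # toy data (assumption)
initial_centers = np.array([[0.2, 0.2], [0.8, 0.8]])  # assumed prior guesses for the centroids

# Passing an array as `init` starts K-means from these centroids; n_init=1 keeps that single start
model = KMeans(n_clusters=2, init=initial_centers, n_init=1).fit(X)

# "Easily adapts to new examples": new points are assigned to the nearest learned centroid
new_points = np.random.rand(5, 2)
print(model.predict(new_points))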

3. How is the number of clusters chosen?


Ans:
A fundamental step for any unsupervised algorithm is to determine the optimal number of clusters into which the data may be grouped. The Elbow Method is one of the most popular methods to determine this optimal value of k. Distortion is calculated as the average of the squared distances from the points to the cluster centres of their respective clusters; typically, the Euclidean distance metric is used. Inertia is the sum of squared distances of samples to their closest cluster centre. We iterate k over a range of values (for example, 1 to 10, as in the code above), calculate the distortion and inertia for each value of k, and choose the k at the ‘elbow’ where the decrease begins to level off.
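
A minimal sketch of this computation follows (our own illustration, assuming the same x array loaded in section B.1; the range 1 to 10 mirrors the elbow-method loop above).

# Sketch: computing distortion and inertia for each value of k
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

distortions, inertias = [], []
for k in range(1, 11):
    model = KMeans(n_clusters=k, init='k-means++', random_state=42).fit(x)  # x as loaded in B.1
    # Distortion: average squared distance from each point to its nearest cluster centre
    distortions.append(np.mean(np.min(cdist(x, model.cluster_centers_, 'euclidean'), axis=1) ** 2))
    # Inertia: sum of squared distances of samples to their closest cluster centre
    inertias.append(model.inertia_)
print(distortions)
print(inertias)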
