
Pranav ML-8

The document outlines a practical exercise on implementing the K-means clustering algorithm in Python, including dataset preparation and visualization of clusters. It describes the theory behind clustering, the algorithm steps, and the analysis of clustering results, emphasizing the importance of selecting the appropriate number of clusters (K). The conclusion highlights the successful implementation and the necessity of using methods like the Elbow method and silhouette scores to determine the optimal K.

Uploaded by

raghaV nama

PRACTICAL - 8

Aim:
 To identify/prepare a dataset and implement the K-means clustering algorithm in
Python.
 To visualize the resulting clusters and centroids.
 To analyze the clustering result and study the effect of varying the number of clusters
K.
Theory:
 Clustering is an unsupervised learning task that groups data points so that points in
the same group (cluster) are more similar to each other than to those in other groups.
 K-means seeks to partition n observations into K clusters {C1,…,CK} by
minimizing the within-cluster sum of squares (WCSS):

\[ \mathrm{WCSS} = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 \]

where μi is the centroid of cluster Ci.
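
As a small illustration of the WCSS objective, the sketch below computes it with NumPy for a hand-checkable toy example. The function name wcss and the toy data are illustrative assumptions, not part of the original practical.

```python
import numpy as np

def wcss(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    diffs = X - centroids[labels]  # row i: x_i minus the centroid of its cluster
    return float(np.sum(diffs ** 2))

# Toy data (assumed for illustration): two clusters of two points each.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

# Every point lies exactly 1 unit from its centroid, so the sum is 4.0.
print(wcss(X, labels, centroids))  # 4.0
```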


 Algorithm steps:
1. Initialize K centroids (randomly select K points).
2. Assign each data point to the nearest centroid.
3. Update each centroid as the mean of points assigned to it.
4. Repeat steps 2–3 until the centroids move less than a tolerance or the maximum
number of iterations is reached.
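
The four steps can be sketched from scratch in NumPy as follows. This is a minimal illustration, not the program used in the practical; the function name kmeans and the parameters max_iter, tol and seed are assumptions chosen for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Minimal K-means sketch; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once no centroid moved more than tol.
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift < tol:
            break
    # Final assignment so the labels match the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Toy data (assumed for illustration): two obvious groups on a line.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

Because the initial centroids are random, which group receives label 0 depends on the seed, and on harder data different seeds can converge to different local minima.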
Dataset
 Generated synthetically using sklearn.datasets.make_blobs:
o n_samples = 300
o centers = 4
o random_state = 42
 This produces four well-separated Gaussian clusters in 2D.
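
A sketch of generating this dataset with the parameters listed above. Everything not listed (for example cluster_std and n_features) is assumed to be left at the scikit-learn defaults; the variable names X and y_true are illustrative.

```python
from sklearn.datasets import make_blobs

# Parameters from the Dataset section; n_features defaults to 2 (2D points)
# and cluster_std defaults to 1.0 (assumed unchanged).
X, y_true = make_blobs(n_samples=300, centers=4, random_state=42)

print(X.shape)       # (300, 2)
print(y_true.shape)  # (300,)
```

y_true holds the index of the blob each sample was drawn from. K-means never sees it; it is only useful for checking the clustering afterwards.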

Program:
Step 1:
Department of Computer Science & Engineering
Student Name: Pranav Nama Enrollment No: EN22CS303035
Step 2:

Step 3:

Step 4:

Step 5:

Step 6:

Step 7:

Step 8:

Step 9:

Step 10:

Step 11:

Step 12:

Output: K-Means Clustering Result:


The plot shows the four clusters (one color each) and their centroids (red X’s).
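
The original program listing did not survive in this copy, so the following is only a sketch of how a plot like the one described could be produced with scikit-learn and matplotlib. The n_init value, the KMeans random_state, the color map, the axis labels and the output filename are all assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# n_init and random_state for KMeans are assumed, not taken from the source.
km = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = km.fit_predict(X)
centers = km.cluster_centers_

fig, ax = plt.subplots()
# Points colored by assigned cluster.
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=20)
# Centroids drawn as red X markers.
ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="x", s=120,
           label="Centroids")
ax.set_title("K-Means Clustering Result")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.legend()
fig.savefig("kmeans_clusters.png")  # or plt.show() in an interactive session
```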

Analysis of Results
1. Cluster quality
o Clusters are compact and well-separated, reflecting the way the data were
generated.
o Centroids lie near the “centre” of each blob.
2. Effect of changing K
o Under-clustering (K<4):
 e.g. K=3 merges two true blobs into one cluster → increased WCSS.
o Over-clustering (K>4):
 e.g. K=5 splits a true blob into two smaller clusters. WCSS still decreases, but the extra split fits noise rather than real structure.
o Use the Elbow method (plot WCSS vs. K) to pick the “elbow” point where
adding another cluster yields diminishing returns.
3. Suggested extension
o Compute and plot WCSS for K=1 to K=8, and identify the elbow.
o Compute silhouette scores for different K to assess cluster separation.
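
The Elbow and silhouette analyses mentioned above could be sketched as follows. The n_init and random_state settings are assumptions, and the values are printed rather than plotted to keep the example short. The silhouette score is skipped for K=1 because it needs at least two clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

wcss = {}  # K -> within-cluster sum of squares
sil = {}   # K -> mean silhouette score (defined for K >= 2 only)
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss[k] = km.inertia_  # scikit-learn's name for WCSS
    if k >= 2:
        sil[k] = silhouette_score(X, km.labels_)

for k in wcss:
    s = f"{sil[k]:.3f}" if k in sil else "n/a"
    print(f"K={k}  WCSS={wcss[k]:10.1f}  silhouette={s}")
```

Plotting wcss against K gives the elbow curve. The silhouette score, unlike WCSS, does not improve automatically as K grows, so its maximum is a complementary indicator of a suitable K.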
Conclusion
 Successful implementation of K-means and visualization of four natural clusters in the
dataset.
 Proper choice of K is crucial: too small merges distinct groups; too large over-splits.
 Elbow and silhouette analyses help select an optimal K.

