Experiment 10
Develop a program to implement K-Means clustering on the Wisconsin Breast Cancer
dataset and visualize the clustering results.
Introduction to K-Means Clustering
What is Clustering?
Clustering is an unsupervised machine learning technique used to group data points into
clusters based on their similarity. The goal is to identify hidden patterns or natural
groupings in the data.
One of the most widely used clustering algorithms is K-Means Clustering, which divides the
dataset into K clusters, assigning each data point to the cluster with the nearest centroid.
What is K-Means Clustering?
K-Means is a centroid-based clustering algorithm that partitions data into K clusters by
minimizing the variance within each cluster.
Working of K-Means Algorithm
1. Choose the number of clusters (K).
2. Randomly initialize K cluster centroids.
3. Assign each data point to the nearest centroid based on distance (e.g., Euclidean
distance).
4. Update the centroids by computing the mean of all points assigned to each cluster.
5. Repeat Steps 3 and 4 until convergence (when centroids no longer change
significantly).
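To make these steps concrete, below is a minimal NumPy sketch of the loop above (the function name kmeans_numpy, the iteration cap, and the convergence check are illustrative choices, not part of the exercise):

import numpy as np

def kmeans_numpy(X, k, n_iters=100, seed=42):
    rng = np.random.default_rng(seed)
    # Step 2: pick K distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        # (assumes no cluster becomes empty, which suffices for a sketch)
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # Step 5: stop once the centroids no longer change significantly
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return labels, centroids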
Mathematical Representation
The objective is to minimize the sum of squared distances (SSD) between data points
and their assigned cluster centroids:

J = \sum_{i=1}^{K} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2

where:
K = number of clusters
x_j = a data point
\mu_i = centroid of cluster C_i
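As a quick sanity check on the objective, J can be computed directly from the assignments and compared with scikit-learn's inertia_ attribute, which stores the same quantity (a small sketch on toy data; the variable names are illustrative):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))  # toy data
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# J: squared distance of every point to its assigned centroid, summed
J = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(J, km.inertia_)  # the two values agree up to floating-point error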
Choosing the Optimal Number of Clusters (K)
Selecting the right value of K is crucial. Some common methods include:
1. Elbow Method:
Plots the within-cluster sum of squares (WCSS) for different K values.
The "elbow point", where WCSS stops decreasing significantly, is chosen as the
optimal K (see the sketch after this list).
2. Silhouette Score:
Measures how well separated the clusters are.
A higher score indicates better clustering.
3. Gap Statistic:
Compares clustering performance against randomly generated reference data.
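A minimal sketch of the Elbow Method and Silhouette Score on the dataset used in the program below (scikit-learn's inertia_ attribute is the WCSS; the range of K values and the plot styling are arbitrary choices):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_breast_cancer().data)

ks = range(2, 11)
wcss, sil = [], []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)                    # within-cluster sum of squares
    sil.append(silhouette_score(X, km.labels_)) # higher = better separated

plt.plot(ks, wcss, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('WCSS')
plt.title('Elbow Method')
plt.show()

print(dict(zip(ks, sil)))  # silhouette score per K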
Distance Metrics in K-Means
K-Means typically uses Euclidean distance to measure how close a data point x is to a
centroid \mu:

d(x, \mu) = \sqrt{\sum_{k=1}^{n} (x_k - \mu_k)^2}

Other distance metrics include:
Manhattan Distance
Cosine Similarity
Mahalanobis Distance
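For comparison, Euclidean and Manhattan distances reduce to one-line NumPy expressions, and cosine similarity to a normalized dot product (a small illustrative sketch; Mahalanobis distance additionally needs an inverse covariance matrix, e.g. via scipy.spatial.distance.mahalanobis):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
mu = np.array([0.0, 2.0, 5.0])

euclidean = np.sqrt(((x - mu) ** 2).sum())  # sqrt(1 + 0 + 4) ≈ 2.236
manhattan = np.abs(x - mu).sum()            # 1 + 0 + 2 = 3
cosine_sim = x @ mu / (np.linalg.norm(x) * np.linalg.norm(mu))
print(euclidean, manhattan, cosine_sim)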
Advantages of K-Means Clustering
Efficient and Scalable – Works well with large datasets.
Easy to Implement – Simple and interpretable.
Handles High-Dimensional Data – Can be applied to complex, high-dimensional datasets,
often together with dimensionality reduction such as PCA.
Challenges of K-Means Clustering
• Sensitive to Initial Centroid Selection – Different initializations may lead to different
results (see the sketch below).
• Not Suitable for Non-Spherical Clusters – Assumes clusters are roughly spherical and
similar in size.
• Outliers Affect Centroids – Outliers can pull centroids away and distort the clustering.
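The initialization sensitivity can be observed directly by running K-Means with a single random initialization under different seeds (an illustrative sketch; in practice the default k-means++ initialization and a larger n_init, as in the program below, mitigate this):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_breast_cancer().data)
for seed in range(3):
    km = KMeans(n_clusters=2, init='random', n_init=1, random_state=seed).fit(X)
    print(seed, km.inertia_)  # the final WCSS can differ between single runs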
Visualization of Clusters
After applying K-Means Clustering, the results can be visualized using:
Scatter Plots: Plot the clusters with different colors.
Centroid Markers: Display cluster centers for better interpretation.
2D/3D PCA Visualization: Reduce dimensions for better visualization.
Applications of K-Means Clustering
Customer Segmentation – Grouping customers based on purchasing behavior.
Image Compression – Reducing image colors to dominant clusters.
Anomaly Detection – Identifying fraudulent transactions.
Medical Diagnosis – Classifying patients based on symptoms and medical data.
Program
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, classification_report
# Suppress known benign warnings (seaborn edgecolor notice, sklearn n_init notice)
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Apply KMeans clustering with explicit n_init
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
y_kmeans = kmeans.fit_predict(X_scaled)
# Evaluation: K-Means cluster IDs are arbitrary, so map each cluster to the
# majority true label within it before comparing against the ground truth
mapping = {c: np.bincount(y[y_kmeans == c]).argmax() for c in np.unique(y_kmeans)}
y_aligned = np.array([mapping[c] for c in y_kmeans])
print("Confusion Matrix:")
print(confusion_matrix(y, y_aligned))
print("\nClassification Report:")
print(classification_report(y, y_aligned))
# PCA for 2D visualization
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# Create DataFrame for plotting
df = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df['Cluster'] = y_kmeans
df['True Label'] = y
# Plot: K-Means Clustering
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100,
edgecolor='black', alpha=0.7, marker='o')
plt.title('K-Means Clustering of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()
# Plot: True Labels
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='True Label', palette='coolwarm',
s=100, edgecolor='black', alpha=0.7, marker='o')
plt.title('True Labels of Breast Cancer Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="True Label")
plt.show()
# Plot: K-Means with Centroids
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='PC1', y='PC2', hue='Cluster', palette='Set1', s=100,
edgecolor='black', alpha=0.7, marker='o')
# Project the centroids (in scaled feature space) onto the PCA plane
centers = pca.transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], s=200, c='red', marker='X', label='Centroids')
plt.title('K-Means Clustering with Centroids')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title="Cluster")
plt.show()