Name: Vidya Janani V

Register Number: 913121205090

EX.NO: 4 CLUSTERING THE GIVEN DATA USING PYTHON/R

Date: 12.03.2024

AIM:

To perform clustering of the given data using K-Means in Python and R

STEPS:

1. Data Preparation: Load and pre-process the data. Ensure it is in a suitable format for
clustering.

2. Library Imports: Import the necessary Python libraries, such as sklearn for K-Means
and matplotlib for visualization.

3. K-Means Clustering: Initialize and fit a K-Means model, specifying the number of
clusters (K).

4. Visualization: Visualize the clusters to identify patterns and structures within the data.
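
The four steps can be sketched end to end on a small example. This is only an illustration under stated assumptions: the data is synthetic, the column names feature_a and feature_b are hypothetical, and a printed cluster-size summary stands in for the plot.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: three well-separated groups in two columns.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
df = pd.DataFrame(points, columns=["feature_a", "feature_b"])

# Step 1: drop rows with missing values, keep numeric columns, standardize.
numeric = df.dropna().select_dtypes(include="number")
scaled = StandardScaler().fit_transform(numeric)

# Step 3: fit K-Means with a chosen K and get one label per row.
k = 3
model = KMeans(n_clusters=k, n_init=10, random_state=42)
labels = model.fit_predict(scaled)

# Step 4 (text summary in place of a plot): number of points per cluster.
cluster_sizes = np.bincount(labels, minlength=k)
print(cluster_sizes)
```

Standardizing first matters because K-Means relies on Euclidean distance, so a column with a large numeric range would otherwise dominate the clustering.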

PYTHON:

ELBOW METHOD:

The Elbow Method is used to find a suitable number of clusters (K) for K-Means clustering. The
code loads the dataset, keeps the numeric columns, standardizes them, reduces them to two
principal components with PCA, and calculates the within-cluster sum of squares (WSS, which
scikit-learn reports as inertia_) for K values ranging from 1 to 10. The resulting WSS values
are plotted to visualize the "elbow" point where the rate of decrease in WSS slows down. The K
at that point is taken as the most suitable number of clusters for the given dataset.
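
To make the WSS quantity concrete, the sketch below computes it by hand as the sum of squared distances from each point to its assigned centroid, and records scikit-learn's inertia_ next to it for K = 1 to 10. The data is synthetic (make_blobs with three assumed centers) and the helper name within_cluster_ss is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the real dataset: three assumed centers.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8], [0, 8]],
                  cluster_std=0.6, random_state=0)


def within_cluster_ss(points, model):
    """Sum of squared distances from each point to its assigned centroid."""
    assigned_centroids = model.cluster_centers_[model.labels_]
    return float(np.sum((points - assigned_centroids) ** 2))


wss = []       # WSS computed by hand
inertias = []  # value reported by scikit-learn, kept for comparison
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wss.append(within_cluster_ss(X, model))
    inertias.append(model.inertia_)

# The elbow is where the drop from one K to the next becomes small.
drops = -np.diff(wss)
for k, drop in zip(range(2, 11), drops):
    print(f"K={k}: WSS decreased by {drop:.2f}")
```

Reading the printed decreases is equivalent to reading the plot: large decreases before the elbow, small ones after it.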

K-MEANS

The Python code performs K-Means clustering with a specified number of clusters (K) on the
two principal components of the dataset. It assigns a cluster label to every data point and
visualizes the data points with a different color for each cluster. Additionally, it plots the
cluster centroids. The listing uses K = 3 (n_clusters=3); this value should be replaced with the
number of clusters chosen from the elbow plot.
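
The listings in this report keep the cluster labels in a separate array. One way to attach the cluster assignments to the original dataset is sketched below. The small DataFrame and its column names (score, salary) are hypothetical and only serve the illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical two-feature dataset; the column names are illustrative only.
df = pd.DataFrame({
    "score": [1.0, 1.2, 0.8, 9.0, 9.3, 8.7],
    "salary": [2.0, 1.8, 2.2, 8.0, 8.4, 7.9],
})

k = 2  # replace with the K chosen from the elbow plot
model = KMeans(n_clusters=k, n_init=10, random_state=42)

# fit_predict returns one integer label per row, in row order,
# so the result can be stored directly as a new column.
df["cluster"] = model.fit_predict(df[["score", "salary"]])

# Per-cluster means give a quick description of each group.
print(df.groupby("cluster").mean())
```

If rows were dropped with dropna() before clustering, the labels should be assigned to that same filtered DataFrame so that each label stays aligned with its row.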

SCATTER PLOT

1. A scatter plot will be displayed, where data points are colored differently based on their
assigned clusters, showing the clusters formed by K-Means.

2. The cluster centroids will be marked as red "X" symbols on the plot.

3. The plot will be titled "K-means Clustering", and the axes will be labelled with the two
principal components.

4. A legend will be displayed on the plot, identifying the centroid markers.
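
A variant of the plot that shows K in the title, labels both the data points and the centroids, and pins the legend to the upper right corner could look like the sketch below. It runs on synthetic data, and the output filename kmeans_clusters.png is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic two-feature data standing in for the real dataset.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8], [0, 8]],
                  cluster_std=0.6, random_state=0)

k = 3
model = KMeans(n_clusters=k, n_init=10, random_state=42)
labels = model.fit_predict(X)

fig, ax = plt.subplots(figsize=(8, 6))
# Data points, colored by their cluster label.
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=50, alpha=0.5,
           label="Data points")
# Centroids, drawn as red "x" markers on top of the points.
ax.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1],
           c="red", marker="x", s=200, label="Centroids")
ax.set_title(f"K-Means Clustering (K={k})")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.legend(loc="upper right")
fig.savefig("kmeans_clusters.png")  # hypothetical output filename
```

Building the title from the k variable keeps the plot consistent with the model whenever K is changed.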

Thus, K-Means clustering was performed on the job placement dataset (job_placement.csv).

21PCS02 – Exploratory Data Analysis Laboratory

Python Code:

1. Elbow method

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset

data = pd.read_csv("job_placement.csv")

# Display the first few rows of the dataset

print(data.head())

# Preprocessing the data

# Dropping non-numeric columns if any and handling missing values
data = data.dropna()
numeric_data = data.select_dtypes(include=['float64', 'int64'])

# Standardizing the data

scaler = StandardScaler()
scaled_data = scaler.fit_transform(numeric_data)

# Applying PCA for dimensionality reduction

pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)

# Elbow Method to find the optimal number of clusters

inertia = []
for i in range(1, 11):
    # Fit K-Means for each candidate K and record its inertia (WSS)
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(pca_data)
    inertia.append(kmeans.inertia_)

# Plotting the Elbow Method

plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()

Output

2. K-Means Clustering
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset

data = pd.read_csv("job_placement.csv")

# Display the first few rows of the dataset

print(data.head())

# Preprocessing the data

# Dropping non-numeric columns if any and handling missing values
data = data.dropna()
numeric_data = data.select_dtypes(include=['float64', 'int64'])

# Standardizing the data

scaler = StandardScaler()
scaled_data = scaler.fit_transform(numeric_data)

# Applying PCA for dimensionality reduction

pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)

# Applying K-means clustering

kmeans = KMeans(n_clusters=3, random_state=42)
cluster_labels = kmeans.fit_predict(pca_data)

# Visualizing the clusters

plt.figure(figsize=(8, 6))
plt.scatter(pca_data[:, 0], pca_data[:, 1], c=cluster_labels, cmap='viridis', s=50, alpha=0.5)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red',
            marker='X', label='Centroids')
plt.title('K-means Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()

Output

3. Scatter plot
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset

data = pd.read_csv("job_placement.csv")

# Display the first few rows of the dataset

print(data.head())

# Preprocessing the data

# Dropping non-numeric columns if any and handling missing values
data = data.dropna()
numeric_data = data.select_dtypes(include=['float64', 'int64'])

# Standardizing the data

scaler = StandardScaler()
scaled_data = scaler.fit_transform(numeric_data)

# Applying PCA for dimensionality reduction

pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)

# Applying K-means clustering

kmeans = KMeans(n_clusters=3, random_state=42)
cluster_labels = kmeans.fit_predict(pca_data)

# Visualizing the clusters

plt.figure(figsize=(10, 6))

# Plotting points with cluster centers

plt.scatter(pca_data[:, 0], pca_data[:, 1], c=cluster_labels, cmap='viridis', s=50, alpha=0.5)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red',
            marker='X', label='Centroids')

plt.title('K-means Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid(True)
plt.show()

Output

Result:
In this experiment, clustering of the given data using Python/R was implemented and the
output was verified successfully.
