Experiment 6

Uploaded by

mohammed.ansari

Experiment - 6

Name: Ansari Mohammed Shanouf Valijan

Class: B.E. Computer Engineering, Semester - VII
UID: 2021300004
Batch: M

Aim:
To implement K-means Clustering for a particular dataset using R programming.

Objectives:
▪ To understand the basics of R programming and RStudio IDE.
▪ To use K-means clustering on a heart-disease dataset for better distinction among
groups/levels of severity of the disease.
▪ To visualize and interpret the clusters formed.

Outcomes:
▪ Familiarization with R programming as a tool to perform statistical analysis.
▪ Proper interpretation of the results.

Theory:
R programming is a powerful language widely used for statistical computing and data analysis.
It offers a vast array of packages and built-in functions that make it particularly suitable for
tasks involving data manipulation, visualization, and modeling. Among its many applications,
R excels in clustering analysis, which is crucial for uncovering patterns within datasets. One
popular clustering technique is K-means clustering, a method that partitions data into distinct
groups based on feature similarity.

K-means clustering in R can be implemented with the built-in kmeans() function, which
takes the dataset and the number of clusters as input. The function iteratively assigns
each data point to the nearest cluster centroid and then recomputes the centroids from
those assignments. This process repeats until the assignments stop changing or an
iteration limit is reached. Because the result depends on the randomly chosen starting
centroids, the algorithm converges to a local optimum rather than a guaranteed global
one, so it is common to run it from several random starts. The factoextra package
(installed separately from CRAN) offers additional tools for visualizing clustering
results, such as silhouette plots and cluster plots, which aid interpretation.
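As a minimal sketch of the call described above, the following uses only base R on synthetic data. The data, the column names feature_1 and feature_2, and the choice of nstart = 25 are assumptions made for illustration and are not taken from the experiment.

```r
# Synthetic data: three loose groups in two numeric features
# (illustrative only, not the heart-disease dataset)
set.seed(42)
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.5), ncol = 2)
)
colnames(toy) <- c("feature_1", "feature_2")

# nstart = 25 reruns the algorithm from 25 random starts and keeps the
# fit with the lowest total within-cluster sum of squares
fit <- kmeans(toy, centers = 3, nstart = 25)

print(fit$centers)        # one row per centroid
print(table(fit$cluster)) # number of points assigned to each cluster
print(fit$tot.withinss)   # total within-cluster sum of squares
```

The returned object also holds iter (iterations used) and withinss (sum of squares per cluster), which can help when checking convergence.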

In practice, using K-means clustering in R involves several key steps. First, the data
must be preprocessed, which may include handling missing values, standardizing features,
and selecting relevant variables. Once the data is ready, kmeans() is run with the
desired number of clusters. A common way to choose that number is the elbow method: the
total within-cluster sum of squares is plotted against different values of K, and the
value at which the curve bends, so that larger K yields only small reductions, is selected.
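The elbow method can be sketched in base R as follows. The synthetic data and the range of K from 1 to 8 are assumptions for illustration; the experiment's own data and range may differ.

```r
# Synthetic, standardized data with three groups (illustrative only)
set.seed(1)
toy <- scale(rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.5), ncol = 2)
))

# Fit k-means for each candidate K and record the total
# within-cluster sum of squares
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 20)$tot.withinss
})

# The "elbow" is the K after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters K",
     ylab = "Total within-cluster sum of squares")
```

Reading the elbow from the plot is a judgment call; when the bend is ambiguous, a second criterion such as the average silhouette width is often consulted.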

After fitting the K-means model, R provides various methods for analyzing and interpreting
the results. The final cluster assignments can be added back to the original dataset for further
analysis. Visualization tools, such as scatter plots with cluster color coding, help in
understanding the distribution of data points across clusters. R's rich ecosystem of libraries
and functions not only simplifies the clustering process but also empowers users to derive
meaningful insights from complex data.
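A short sketch of attaching cluster labels to the original records and inspecting them is given below. The data frame patients and its columns age and chol are hypothetical stand-ins generated at random, not the actual dataset.

```r
# Hypothetical patient records (synthetic values)
set.seed(7)
patients <- data.frame(
  age  = c(rnorm(20, mean = 40, sd = 4),   rnorm(20, mean = 60, sd = 4)),
  chol = c(rnorm(20, mean = 190, sd = 15), rnorm(20, mean = 260, sd = 15))
)

# Cluster on standardized features, then store the labels
# next to the unscaled values
fit <- kmeans(scale(patients), centers = 2, nstart = 25)
patients$cluster <- factor(fit$cluster)

# Per-cluster averages in the original units
cluster_means <- aggregate(cbind(age, chol) ~ cluster, data = patients, FUN = mean)
print(cluster_means)

# Scatter plot with one colour per cluster
plot(patients$age, patients$chol,
     col = as.integer(patients$cluster), pch = 19,
     xlab = "Age", ylab = "Cholesterol")
```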

Dataset Description:
For the purpose of experimenting with K-means, a heart-disease dataset from Kaggle was
used.

It consists of the records of patients suffering from heart disease. Features such as age,
sex, chest pain type, blood pressure, serum cholesterol, and fasting blood sugar are included
in the data. The aim of using k-means clustering here is to group the patients into
levels of risk based on all the available features.

Implementation:
The following is a step-by-step implementation carried out in RStudio using R:

Importing the required libraries and the dataset

Viewing the data at a glance to get information about different types of variables used

Performing data imputation to handle missing values (Median for numerical columns and
Mode for categorical columns)
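One possible way to carry out this imputation in base R is sketched below. The small data frame, its column names, and the helper stat_mode() are hypothetical and only illustrate the median/mode rule stated above.

```r
# Hypothetical records with gaps (column names are illustrative)
df <- data.frame(
  age        = c(52, NA, 61, 45, 58),
  chol       = c(210, 245, NA, 198, 260),
  chest_pain = c("typical", "atypical", NA, "typical", "non-anginal"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-missing value of a vector
stat_mode <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  names(counts)[which.max(counts)]
}

# Median for numeric columns, mode for everything else
for (col in names(df)) {
  missing <- is.na(df[[col]])
  if (is.numeric(df[[col]])) {
    df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
  } else {
    df[[col]][missing] <- stat_mode(df[[col]])
  }
}

print(df)
```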
Encoding the categorical columns and scaling the dataset
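The encoding and scaling step might look like the sketch below. It assumes simple integer (label) encoding, which is only one option and may not be the scheme used in the experiment; the data frame and column names are hypothetical.

```r
# Hypothetical records after imputation (illustrative values)
df <- data.frame(
  age        = c(52, 47, 61, 45, 58),
  chol       = c(210, 245, 230, 198, 260),
  sex        = c("M", "F", "M", "F", "M"),
  chest_pain = c("typical", "atypical", "non-anginal", "typical", "atypical"),
  stringsAsFactors = FALSE
)

# Label-encode text/factor columns as integers.
# Note: this imposes an arbitrary order on nominal categories.
encoded <- df
for (col in names(encoded)) {
  if (is.character(encoded[[col]]) || is.factor(encoded[[col]])) {
    encoded[[col]] <- as.integer(factor(encoded[[col]]))
  }
}

# Standardize every column to mean 0 and standard deviation 1 so that
# no feature dominates the Euclidean distances used by k-means
scaled <- scale(encoded)
print(round(scaled, 2))
```

One-hot encoding is a common alternative for nominal variables such as chest pain type, since it avoids implying an order between categories.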

Viewing the dataset summary

Viewing the scaled dataset

Visualizing the age distribution, cholesterol levels and chest pain types to help interpret
the resulting clusters
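Exploratory plots of this kind can be produced with base R graphics, as in the sketch below. The values are randomly generated and the column names are assumed; they stand in for the real dataset.

```r
# Hypothetical patient data (synthetic values)
set.seed(10)
patients <- data.frame(
  age        = round(rnorm(100, mean = 54, sd = 9)),
  chol       = round(rnorm(100, mean = 245, sd = 50)),
  chest_pain = sample(c("typical", "atypical", "non-anginal", "asymptomatic"),
                      100, replace = TRUE)
)

# Histograms for the two numeric features
age_hist  <- hist(patients$age,  main = "Age distribution",   xlab = "Age (years)")
chol_hist <- hist(patients$chol, main = "Cholesterol levels", xlab = "Serum cholesterol")

# Bar chart of counts per chest pain type
pain_counts <- table(patients$chest_pain)
barplot(pain_counts, main = "Chest pain types", ylab = "Number of patients")
```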
Using the elbow method to determine the optimum number of clusters based on the
within-cluster sum of squares
Performing final k-means using k as 3 (obtained from above graph) and segregating the
patients in different risk groups based on cholesterol and age
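A sketch of this final step on synthetic data is shown below. The values, the column names age and chol, the rule of ranking clusters by mean cholesterol, and the labels "mild", "moderate" and "severe" are all assumptions for illustration rather than the method used in the experiment.

```r
# Hypothetical patient data with three loose groups (synthetic values)
set.seed(123)
patients <- data.frame(
  age  = c(rnorm(30, 42, 4),   rnorm(30, 55, 4),   rnorm(30, 66, 4)),
  chol = c(rnorm(30, 185, 12), rnorm(30, 240, 12), rnorm(30, 300, 12))
)

# Final k-means on standardized features with k = 3
fit <- kmeans(scale(patients), centers = 3, nstart = 25)

# Cluster numbers are arbitrary, so order the clusters by their
# mean cholesterol before naming them
chol_by_cluster <- tapply(patients$chol, fit$cluster, mean)
cluster_order   <- order(chol_by_cluster)          # cluster ids, lowest mean first
risk_position   <- match(fit$cluster, cluster_order)

risk_labels <- c("mild", "moderate", "severe")
patients$risk_group <- factor(risk_labels[risk_position], levels = risk_labels)

print(table(patients$risk_group))
print(aggregate(cbind(age, chol) ~ risk_group, data = patients, FUN = mean))
```

Naming clusters this way is an interpretation added after the fact; k-means itself only returns unlabeled groups.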

Cluster projections as obtained (Considering cholesterol and age as the major components)

Conclusion:
By performing this experiment, I became familiar with R programming. I wrote a program in R,
using RStudio, that performs k-means clustering on a dataset of records of patients
suffering from heart disease. While searching for the optimum number of clusters, the elbow
method showed no significant improvement beyond k = 3. The resulting clusters, when
projected onto a plane with cholesterol and age as the major components, show a good amount
of separation. This suggests three classes of risk among heart patients, which may be
interpreted as mild, moderate, and severe.
