
Clustering

This document analyzes clustered data with K-means clustering. It generates sample data points from 3 clusters, chooses initial cluster centers, writes a custom K-means function, and compares it with R's built-in kmeans() function. For the sample data, the custom and built-in cluster assignments match.

Uploaded by

john.nstat


HW5 - Analyze Data (Clustering)

2023-10-28

Preamble
# Load the tidyverse without printing its startup messages
suppressMessages(library(tidyverse))

# this ensures the random number generator gives reproducible results.
set.seed(7)

# the centers of our clusters

centerx <- c(3, -3, 0)
centery <- c(3, 0, -3)
n <- 25
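# data: n points around each center, with normal noise whose standard deviation differs by cluster and axis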
x1 <- centerx[1] + rnorm(n, 0, 1)
y1 <- centery[1] + rnorm(n, 0, 2)
x2 <- centerx[2] + rnorm(n, 0, 2)
y2 <- centery[2] + rnorm(n, 0, 1)
x3 <- centerx[3] + rnorm(n, 0, 2)
y3 <- centery[3] + rnorm(n, 0, 2)
# our dataset
tib <- tibble(
  x = c(x1, x2, x3),
  y = c(y1, y2, y3)
)

# initial centers
init_centers_tib <- tibble(
  x = c(-2, 0, 2),
  y = c(-1, 3, 0)
)

Question 1
Write code to visualize the dataset and the initial clusters with the following guidelines:
1. Use ggplot with geom_point layers to produce a scatter plot.
2. Overlay the initial centers on top of the data points with both a custom color and shape so that the initial centers are visually noticeable.
# 1
# Scatterplot of the raw data
# ggplot2 is already attached by tidyverse; loading it again is harmless
library(ggplot2)

ggplot(tib, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  ggtitle("Scatterplot for Initial Clusters")

# Overlaying the initial centers on top of data points
ggplot(tib, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  geom_point(data = init_centers_tib, aes(x = x, y = y), color = "green",
             size = 4, shape = 18) +
  ggtitle("The Scatterplot with Marked Initial Centers")
Question 2
Write your own kmeans function with the following guidelines:
1. Call your custom function k_means. Note the underscore, as we do not want to mask the built-in kmeans function.
2. This should take two arguments:
   • tib: your actual data.
   • centers: your initial centroids.
3. The output should be a vector of cluster assignments that correspond with each observation of the original dataset tib.
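For reference, the procedure requested here is usually written as Lloyd's iteration, which alternates two steps until the centers stop moving. With observations $x_i$, centers $\mu_j$, and $S_j$ the set of observations currently assigned to cluster $j$:

```latex
\begin{aligned}
c_i   &= \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - \mu_j \rVert^2, \\
\mu_j &= \frac{1}{|S_j|} \sum_{i \in S_j} x_i,
        \qquad S_j = \{\, i : c_i = j \,\}.
\end{aligned}
```

Neither step can increase the within-cluster sum of squares $\sum_{j} \sum_{i \in S_j} \lVert x_i - \mu_j \rVert^2$, which is why the loop terminates. Taking the square root of the distance, as the function does, does not change which center is the argmin.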
k_means <- function(tib, centers) {
  # Convert the data and the centers to numeric matrices for easy arithmetic
  tib <- as.matrix(tib)
  centers <- as.matrix(centers)

  # Total number of observations and the number of centroids
  num_observations <- nrow(tib)
  num_centroids <- nrow(centers)

  # cluster assignments
  cluster_assignments <- integer(num_observations)

  # Iterate until convergence using the repeat loop
  repeat {
    # Assign each observation to the centroid closest to it (Euclidean distance).
    # sweep() subtracts the observation from every row of `centers`; plain
    # `tib[obs, ] - centers` would recycle the vector down the columns and
    # mix up the x and y coordinates.
    for (obs in seq_len(num_observations)) {
      distances <- sqrt(rowSums(sweep(centers, 2, tib[obs, ])^2))
      cluster_assignments[obs] <- which.min(distances)
    }

    # The centers need to be updated: each becomes the mean of its members
    # (drop = FALSE keeps a one-row subset as a matrix for colMeans())
    new_centroids <- matrix(0, nrow = num_centroids, ncol = ncol(tib),
                            dimnames = dimnames(centers))
    for (j in seq_len(num_centroids)) {
      new_centroids[j, ] <- colMeans(tib[cluster_assignments == j, , drop = FALSE])
    }

    # Check for convergence: stop once the centers no longer change
    if (identical(centers, new_centroids)) {
      break
    } else {
      centers <- new_centroids
    }
  }

  # Return the cluster assignments
  return(cluster_assignments)
}
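Subtracting one point from a matrix of centers with `-` (as in `tib[obs, ] - centers`) is an easy pitfall: R recycles the shorter vector down the columns in column-major order, so only some rows get the coordinates in the right order. A small self-contained sketch with a made-up point `obs` and three made-up centers shows the naive subtraction and `sweep()` disagreeing on the nearest center:

```r
# Toy example (hypothetical values): one observation and three centers (rows).
obs <- c(x = 1, y = 10)
centers <- matrix(c(0, 8, 3,    # x-coordinates of the three centers
                    0, 1, 9),   # y-coordinates of the three centers
                  nrow = 3, dimnames = list(NULL, c("x", "y")))

# Plain subtraction recycles `obs` as (1, 10, 1 | 10, 1, 10) down the columns,
# so row 2 is compared with (y, x) instead of (x, y).
naive <- obs - centers

# sweep() subtracts obs[j] from every entry of column j, i.e. from each row;
# the sign is irrelevant once the differences are squared.
swept <- sweep(centers, 2, obs)

naive_dist <- sqrt(rowSums(naive^2))
swept_dist <- sqrt(rowSums(swept^2))

print(which.min(naive_dist))  # 2: wrong center
print(which.min(swept_dist))  # 3: the center at (3, 9) really is the closest
```

No warning is raised, because the length of `obs` divides the number of matrix entries evenly, so the mistake is silent.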

custom_res <- k_means(tib, init_centers_tib)

builtin_res <- kmeans(tib, init_centers_tib)
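The custom k_means() alternates "assign to the nearest center" and "recompute the means", which is Lloyd's algorithm, whereas kmeans() uses the Hartigan-Wong algorithm by default. They often reach the same partition on well-separated data, but they are different procedures; kmeans() also accepts `algorithm = "Lloyd"`, the closer built-in counterpart. A base-R sketch that re-simulates the preamble's data (the names `dat` and `init` are stand-ins for `tib` and `init_centers_tib`) and cross-tabulates the two built-in variants:

```r
# Re-simulate the preamble's data with base R (same seed and draw order)
set.seed(7)
centerx <- c(3, -3, 0)
centery <- c(3, 0, -3)
n <- 25
x1 <- centerx[1] + rnorm(n, 0, 1)
y1 <- centery[1] + rnorm(n, 0, 2)
x2 <- centerx[2] + rnorm(n, 0, 2)
y2 <- centery[2] + rnorm(n, 0, 1)
x3 <- centerx[3] + rnorm(n, 0, 2)
y3 <- centery[3] + rnorm(n, 0, 2)
dat  <- cbind(x = c(x1, x2, x3), y = c(y1, y2, y3))
init <- cbind(x = c(-2, 0, 2), y = c(-1, 3, 0))

# Default algorithm (Hartigan-Wong) versus Lloyd's algorithm
fit_hw    <- kmeans(dat, init)
fit_lloyd <- kmeans(dat, init, algorithm = "Lloyd", iter.max = 100)

# Both runs start from the same centers in the same order, so label j refers
# to the group grown from initial center j in each run; agreement shows up
# as counts on the diagonal only.
print(table(hartigan_wong = fit_hw$cluster, lloyd = fit_lloyd$cluster))
print(fit_lloyd$iter)
```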

# previewing the results

print(custom_res)

##  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 3 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3 1 3 3 3 3

print(builtin_res)

## K-means clustering with 3 clusters of sizes 27, 25, 23

##
## Cluster means:
## x y
## 1 -2.7811405 -0.3899246
## 2 3.4640362 3.0267708
## 3 0.8813379 -2.6080654
##
## Clustering vector:
##  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 3 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3 1 3 3 3 3
##
## Within cluster sum of squares by cluster:
## [1] 107.89646 87.12228 113.80475
## (between_SS / total_SS = 74.5 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
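The "Within cluster sum of squares" and "between_SS / total_SS" lines can be reproduced from the assignments alone. For cluster $j$ with center $\mu_j$, $W_j = \sum_{i \in S_j} \lVert x_i - \mu_j \rVert^2$; the total sum of squares measures distances to the overall mean instead, and between_SS is total_SS minus the sum of the $W_j$. A base-R sketch (again re-simulating the data as `dat`, an illustrative stand-in for `tib`) that checks the hand computation against the `withinss`, `totss`, and `betweenss` components:

```r
# Re-simulate the preamble's data with base R (same seed and draw order)
set.seed(7)
centerx <- c(3, -3, 0)
centery <- c(3, 0, -3)
n <- 25
x1 <- centerx[1] + rnorm(n, 0, 1)
y1 <- centery[1] + rnorm(n, 0, 2)
x2 <- centerx[2] + rnorm(n, 0, 2)
y2 <- centery[2] + rnorm(n, 0, 1)
x3 <- centerx[3] + rnorm(n, 0, 2)
y3 <- centery[3] + rnorm(n, 0, 2)
dat  <- cbind(x = c(x1, x2, x3), y = c(y1, y2, y3))
init <- cbind(x = c(-2, 0, 2), y = c(-1, 3, 0))

fit <- kmeans(dat, init)

# Within-cluster sum of squares: squared distances of each point to the
# center of its own cluster, summed per cluster.
manual_withinss <- sapply(seq_len(nrow(fit$centers)), function(j) {
  members <- dat[fit$cluster == j, , drop = FALSE]
  sum(sweep(members, 2, fit$centers[j, ])^2)
})

# Total sum of squares: squared distances to the overall (column) means.
manual_totss <- sum(scale(dat, scale = FALSE)^2)

# The percentage printed by print(fit): between_SS / total_SS.
ratio <- (manual_totss - sum(manual_withinss)) / manual_totss
print(manual_withinss)
print(round(100 * ratio, 1))
```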

# add cluster assignments to the original data.

results_tib <- tib |>
  mutate(
    custom_cluster = custom_res,
    builtin_cluster = builtin_res$cluster
  )
# see if any cluster assignments don't match.
results_tib |>
  filter(custom_cluster != builtin_cluster) |>
  dim()

## [1] 0 4

The filtered tibble has 0 rows (and the original 4 columns), so every custom cluster assignment matches the built-in kmeans() assignment.
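Comparing labels with `!=` works here because both functions start from the same initial centers in the same order, so label j names the same group in both results. With different starting points, two runs can find identical groups under permuted label numbers. A permutation-invariant check cross-tabulates the two labelings; the sketch below uses a hypothetical helper, `same_partition()`, on toy label vectors:

```r
# Hypothetical helper: TRUE when two label vectors describe the same
# partition, i.e. each label in `a` pairs with exactly one label in `b`
# and vice versa (label numbers may be permuted).
same_partition <- function(a, b) {
  tab <- table(a, b)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

custom  <- c(1, 1, 2, 2, 3, 3)
relabel <- c(2, 2, 3, 3, 1, 1)   # same groups, different label numbers
split23 <- c(1, 1, 2, 3, 3, 3)   # observation 4 moved to another group

print(sum(custom != relabel))           # 6: every label differs
print(same_partition(custom, relabel))  # TRUE
print(same_partition(custom, split23))  # FALSE
```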

Question 3
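# Store the custom assignments as a factor so that ggplot uses a discrete
# color scale, and save the result back into results_tib for the plot below
results_tib <- results_tib |>
  mutate(custom_cluster = as_factor(custom_res))
results_tib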

## # A tibble: 75 × 4
##        x     y custom_cluster builtin_cluster
##    <dbl> <dbl> <fct>                    <int>
##  1  5.29  3.37 2                            2
##  2  1.80  4.50 2                            2
##  3  2.31  4.18 2                            2
##  4  2.59  1.03 2                            2
##  5  2.03  2.45 2                            2
##  6  2.05  1.26 2                            2
##  7  3.75  4.44 2                            2
##  8  2.88  3.22 2                            2
##  9  3.15  2.84 2                            2
## 10  5.19  2.16 2                            2
## # ℹ 65 more rows

ggplot(results_tib, aes(x, y, color = custom_cluster)) +
  geom_point() +
  theme_minimal() +
  ggtitle("Graphical Representation of Custom Clusters")

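The custom clusters are grouped around the cluster centers: the fitted cluster means reported by kmeans(), (3.46, 3.03), (-2.78, -0.39) and (0.88, -2.61), lie close to the centers used to simulate the data, (3, 3), (-3, 0) and (0, -3).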
