R Cluster Analysis: Wage Data

This document discusses cluster analysis techniques in R. It introduces hierarchical clustering and k-means clustering. For hierarchical clustering, it covers building a dendrogram, extracting clusters, and reviewing results. For k-means, it discusses estimating the optimal number of clusters and exploring the results. It compares the two methods and highlights what was learned about distance, scale, linkage, and evaluating the number of clusters.


DataCamp Cluster Analysis in R

CLUSTER ANALYSIS IN R

Occupational Wage Data

Dmitriy (Dima) Gorenshteyn


Sr. Data Scientist,
Memorial Sloan Kettering Cancer Center

Occupational Wage Data


22 Occupation Observations
15 Measurements of Average Income from 2001-2016

Occupational Wage Data


print(oes)
                           2001  2002  2003  2004  2005 ...
Management                70800 78870 83400 87090 88450 ...
Business Operations       50580 53350 56000 57120 57930 ...
Computer Science          60350 61630 64150 66370 67100 ...
Architecture/Engineering  56330 58020 60390 63060 63910 ...
Life/Physical/Social Sci. 49710 52380 54930 57550 58030 ...
Community Services        34190 34630 35800 37050 37530 ...
...                         ...   ...   ...   ...   ... ...

Next Steps: Hierarchical Clustering


Evaluate whether pre-processing is necessary
Create a distance matrix
Build a dendrogram
Extract clusters from dendrogram
Explore resulting clusters
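
The steps above can be sketched in base R. This is a minimal illustration, not the course's own solution: it uses only the six occupations and five years visible in the printed excerpt (the full `oes` object is not reproduced here), and the choice of average linkage and of k = 3 are assumptions made for the example.

```r
# Excerpt of the wage matrix printed above (6 occupations, 2001-2005 only).
# The name `oes_excerpt` is hypothetical; the full `oes` has 22 rows, 15 columns.
oes_excerpt <- matrix(
  c(70800, 78870, 83400, 87090, 88450,
    50580, 53350, 56000, 57120, 57930,
    60350, 61630, 64150, 66370, 67100,
    56330, 58020, 60390, 63060, 63910,
    49710, 52380, 54930, 57550, 58030,
    34190, 34630, 35800, 37050, 37530),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Management", "Business Operations", "Computer Science",
      "Architecture/Engineering", "Life/Physical/Social Sci.",
      "Community Services"),
    as.character(2001:2005)
  )
)

# 1. Pre-processing: every column is a wage in the same unit,
#    so no scaling is applied in this sketch.

# 2. Distance matrix between occupations.
dist_oes <- dist(oes_excerpt, method = "euclidean")

# 3. Build the dendrogram (average linkage is an assumption here).
hc_oes <- hclust(dist_oes, method = "average")
if (interactive()) plot(hc_oes)

# 4. Extract clusters by cutting the tree into k groups (k = 3 assumed).
clusters <- cutree(hc_oes, k = 3)

# 5. Explore: list the occupations that fall into each cluster.
print(split(names(clusters), clusters))
```

On this excerpt the highest-paid and lowest-paid occupations end up apart from the middle group; results on the full data may differ.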
Let's practice!
Reviewing the Hierarchical Clustering Results

The Dendrogram

The Trends

Connecting The Two



Next Steps: k-means Clustering


Evaluate whether pre-processing is necessary
Estimate the "best" k using the elbow plot
Estimate the "best" k using the maximum average silhouette width
Explore resulting clusters
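
A possible way to carry out these steps is sketched below. It is an illustration under stated assumptions: the data is only the six-row excerpt shown in the printed output, `oes_excerpt` is a hypothetical name, and the candidate range of k and the `nstart` value are arbitrary choices. `silhouette()` comes from the `cluster` package, which ships with standard R installations.

```r
library(cluster)  # provides silhouette()

# Excerpt of the wage matrix (6 occupations, 2001-2005 only); hypothetical name.
oes_excerpt <- matrix(
  c(70800, 78870, 83400, 87090, 88450,
    50580, 53350, 56000, 57120, 57930,
    60350, 61630, 64150, 66370, 67100,
    56330, 58020, 60390, 63060, 63910,
    49710, 52380, 54930, 57550, 58030,
    34190, 34630, 35800, 37050, 37530),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Management", "Business Operations", "Computer Science",
      "Architecture/Engineering", "Life/Physical/Social Sci.",
      "Community Services"),
    as.character(2001:2005)
  )
)

set.seed(42)  # k-means starts from random centers

# Elbow plot input: total within-cluster sum of squares for k = 1..5.
elbow_ks <- 1:5
tot_withinss <- sapply(elbow_ks, function(k) {
  kmeans(oes_excerpt, centers = k, nstart = 20)$tot.withinss
})
if (interactive()) plot(elbow_ks, tot_withinss, type = "b")

# Average silhouette width for k = 2..5 (undefined for k = 1).
sil_ks <- 2:5
d <- dist(oes_excerpt)
avg_sil <- sapply(sil_ks, function(k) {
  km <- kmeans(oes_excerpt, centers = k, nstart = 20)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

# The k with the largest average silhouette width is one candidate for "best".
best_k <- sil_ks[which.max(avg_sil)]
print(data.frame(k = sil_ks, avg_sil = avg_sil))
```

The elbow and silhouette criteria need not agree; both are heuristics, which is why "best" is in quotes above.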
Let's cluster!
Review k-means Results


Comparing The Two Clustering Methods


                           Hierarchical Clustering         k-means
Distance Used:             virtually any                   Euclidean only
Results Stable:            Yes                             No
Evaluating # of Clusters:  dendrogram, silhouette, elbow   silhouette, elbow
Computation Complexity:    Relatively Higher               Relatively Lower
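
The distance and stability rows of this comparison can be illustrated with a short sketch. It again assumes only the six-row excerpt of the wage data (`oes_excerpt` is a hypothetical name); the Manhattan distance, complete linkage, k = 3, and the two seeds are arbitrary choices for the example.

```r
# Excerpt of the wage matrix (6 occupations, 2001-2005 only); hypothetical name.
oes_excerpt <- matrix(
  c(70800, 78870, 83400, 87090, 88450,
    50580, 53350, 56000, 57120, 57930,
    60350, 61630, 64150, 66370, 67100,
    56330, 58020, 60390, 63060, 63910,
    49710, 52380, 54930, 57550, 58030,
    34190, 34630, 35800, 37050, 37530),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Management", "Business Operations", "Computer Science",
      "Architecture/Engineering", "Life/Physical/Social Sci.",
      "Community Services"),
    as.character(2001:2005)
  )
)

# Hierarchical clustering works from a distance matrix, so a non-Euclidean
# distance such as Manhattan can be supplied.
d_manhattan <- dist(oes_excerpt, method = "manhattan")
hc_a <- cutree(hclust(d_manhattan, method = "complete"), k = 3)
hc_b <- cutree(hclust(d_manhattan, method = "complete"), k = 3)

# No randomness is involved, so repeated runs give the same assignment.
hc_stable <- identical(hc_a, hc_b)
print(hc_stable)

# k-means starts from randomly chosen centers; with a single start,
# different seeds can yield different labels or different partitions.
set.seed(1)
km_a <- kmeans(oes_excerpt, centers = 3, nstart = 1)$cluster
set.seed(2)
km_b <- kmeans(oes_excerpt, centers = 3, nstart = 1)$cluster

# Cross-tabulate the two runs to see how the assignments line up.
print(table(run_a = km_a, run_b = km_b))
```

On a data set this small the two k-means runs may well coincide; the instability tends to show up more on larger or less well-separated data.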

What have you learned?


Chapter 1:
What is distance
Why is scale important
Chapter 2:
How linkage works
How the dendrogram is formed
How to analyze your clusters
Chapter 3:
How k-means works
How to estimate k

Lots More to Learn


k-medoids
DBSCAN
OPTICS
Congratulations!
