
Segmentation:Clustering: Krissie 2024-11-21

The document outlines a clustering analysis using hierarchical methods on a dataset containing demographic information. It details the steps of loading, cleaning, and preparing the data, as well as computing a distance matrix and performing hierarchical clustering. The analysis aims to group the data into segments and visualize the relationships within the data using dendrograms.

Uploaded by

Gamyuii Kitsana
Segmentation:Clustering

Krissie

2024-11-21

#----------------Segmentation---------------------#

##--------##Clustering##---------##

#1.Loading data
setwd("/Users/kitsanasudsaneh/Desktop/fall2024")
seg.raw <-read.csv("rintro-chapter5.csv")
str(seg.raw)

## ’data.frame’: 300 obs. of 7 variables:


## $ age : num 47.3 31.4 43.2 37.3 41 ...
## $ gender : chr "Male" "Male" "Male" "Female" ...
## $ income : num 49483 35546 44169 81042 79353 ...
## $ kids : int 2 1 0 1 3 4 3 0 1 0 ...
## $ ownHome : chr "ownNo" "ownYes" "ownYes" "ownNo" ...
## $ subscribe: chr "subNo" "subNo" "subNo" "subNo" ...
## $ Segment : chr "Suburb mix" "Suburb mix" "Suburb mix" "Suburb mix" ...

#2.Cleaning and Preparing the data: convert the character variables to factors (chr > factor)
seg.raw$gender <- factor(seg.raw$gender)
seg.raw$ownHome <- factor(seg.raw$ownHome)
seg.raw$subscribe <- factor(seg.raw$subscribe)
seg.raw$Segment <- factor(seg.raw$Segment)
str(seg.raw)

## ’data.frame’: 300 obs. of 7 variables:


## $ age : num 47.3 31.4 43.2 37.3 41 ...
## $ gender : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 2 2 1 1 ...
## $ income : num 49483 35546 44169 81042 79353 ...
## $ kids : int 2 1 0 1 3 4 3 0 1 0 ...
## $ ownHome : Factor w/ 2 levels "ownNo","ownYes": 1 2 2 1 2 2 1 1 1 2 ...
## $ subscribe: Factor w/ 2 levels "subNo","subYes": 1 1 1 1 1 1 1 1 1 1 ...
## $ Segment : Factor w/ 4 levels "Moving up","Suburb mix",..: 2 2 2 2 2 2 2 2 2 2 ...
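
When several character columns need the same treatment, the separate factor() calls can be collapsed into one step. The sketch below is only an illustration under stated assumptions: `toy` and its values are made up (they are not rows from rintro-chapter5.csv), and only base R is used.

```r
# Hypothetical toy data; stringsAsFactors = FALSE keeps text columns as character
toy <- data.frame(
  age     = c(47.3, 31.4, 43.2),
  gender  = c("Male", "Male", "Female"),
  ownHome = c("ownNo", "ownYes", "ownYes"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor
chr.cols <- sapply(toy, is.character)
toy[chr.cols] <- lapply(toy[chr.cols], factor)

str(toy)  # age stays numeric, gender and ownHome become factors
```

The same two lines would presumably work on seg.raw, since all of its character columns are meant to become factors.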

## group into 4 segments

#---Remove Segment column (column 7)


seg.df <- seg.raw[,-7]

#-------##Distance-based Methods##--------#

library(cluster)

#---Compute a Distance Matrix
seg.dist <- daisy(seg.df) #calculates distances for mixed data (numeric + categorical)
as.matrix(seg.dist)[1:4,1:4] #Distances of first 4 observations

##           1         2         3         4
## 1 0.0000000 0.2532815 0.2329028 0.2617250
## 2 0.2532815 0.0000000 0.0679978 0.4129493
## 3 0.2329028 0.0679978 0.0000000 0.4246012
## 4 0.2617250 0.4129493 0.4246012 0.0000000

#3.Hierarchical Clustering Method:groups data based on a tree structure,
#ideal for exploring relationships.

#---Perform Hierarchical Clustering
seg.hc <-hclust(seg.dist,method = "complete") #This method creates a tree structure (dendrogram)
plot(seg.hc) #Visualize the Full Dendrogram

[Figure: Cluster Dendrogram of seg.dist, hclust (*, "complete"); vertical axis: Height, 0.0 to 0.8]

#---Cut the tree and Focus on a Branch
plot(cut(as.dendrogram(seg.hc),h=0.5)$lower[[1]]) #cut 0.5

[Figure: dendrogram of the first branch below the cut at h = 0.5; vertical axis 0.00 to 0.30]

#---Comparing observations in branches
seg.df[c(128,137),] #similar

## age gender income kids ownHome subscribe


## 128 21.80737 Male 27807.24 2 ownNo subYes
## 137 24.88971 Male 23326.45 2 ownNo subYes

seg.df[c(141,89),] #less similar

## age gender income kids ownHome subscribe


## 141 25.17703 Female 20125.80 2 ownNo subYes
## 89 41.89570 Female 58871.48 3 ownNo subYes
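
Observations 128 and 137 match on every categorical column and differ only slightly in age and income, which is why they sit close together in the tree. For mixed numeric and categorical columns, daisy() falls back to Gower's coefficient. The sketch below recomputes one such dissimilarity by hand on a made-up data frame (`toy` and its values are hypothetical), assuming the cluster package is installed.

```r
library(cluster)  # provides daisy()

# Hypothetical toy data, not taken from rintro-chapter5.csv
toy <- data.frame(
  age    = c(20, 30, 40),
  gender = factor(c("Male", "Male", "Female")),
  kids   = c(0, 2, 4)
)

toy.dist <- daisy(toy)  # mixed column types, so Gower's coefficient is used

# Manual Gower dissimilarity between rows 1 and 2:
# a numeric column contributes |x_i - x_j| / range of that column,
# a factor column contributes 0 if the levels match and 1 otherwise
d.age    <- abs(toy$age[1] - toy$age[2]) / diff(range(toy$age))
d.gender <- as.numeric(toy$gender[1] != toy$gender[2])
d.kids   <- abs(toy$kids[1] - toy$kids[2]) / diff(range(toy$kids))
manual.12 <- mean(c(d.age, d.gender, d.kids))

as.matrix(toy.dist)[1, 2]  # value computed by daisy()
manual.12                  # value computed by hand
```

Because every column is rescaled to the 0 to 1 range, a large-valued variable such as income does not dominate this kind of distance.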

#---Cophenetic Correlation Coefficient: how well the hierarchical clustering (dendrogram) represents
#the actual distances in the data (close to 1 > good)
cor(cophenetic(seg.hc),seg.dist)

## [1] 0.7682436
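
To make the cophenetic correlation less abstract, the following sketch repeats the calculation on a small simulated data set. The data (`toy`) are made up and only base R is used, so the numbers have no connection to seg.df.

```r
# Hypothetical toy data: 20 random points in two dimensions
set.seed(1)
toy <- matrix(rnorm(40), ncol = 2)

toy.dist <- dist(toy)                              # observed pairwise distances
toy.hc   <- hclust(toy.dist, method = "complete")  # same linkage as seg.hc
toy.coph <- cophenetic(toy.hc)                     # height at which each pair is first joined

# Cophenetic correlation: agreement between tree heights and observed distances
cpcc <- cor(toy.coph, toy.dist)
cpcc
```

With complete linkage a pair is joined at the largest distance between the two merging groups, so a cophenetic distance is never smaller than the observed distance for the same pair.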

#---Getting K groups from the tree: decide how many groups (clusters) you want
plot(seg.hc)
rect.hclust(seg.hc, k=4,border = "red")

[Figure: Cluster Dendrogram of seg.dist, hclust (*, "complete"), with the k = 4 clusters outlined in red by rect.hclust(); vertical axis: Height, 0.0 to 0.8]


seg.hc.segment <-cutree(seg.hc, k =4) #It assigns each data point to one of these 4 clusters.
table(seg.hc.segment) #counts how many observations fall into each cluster.

## seg.hc.segment
## 1 2 3 4
## 124 136 18 22
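
The effect of cutree() is easier to see on data where the groups are known in advance. This is a minimal sketch on simulated data (`toy` is hypothetical: three tight, well-separated blobs of 10 points each), using base R only.

```r
# Hypothetical toy data: three well-separated groups of 10 points
set.seed(42)
toy <- rbind(matrix(rnorm(20, mean = 0,  sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 5,  sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2))

toy.hc <- hclust(dist(toy), method = "complete")

# Cut the tree into k = 3 groups; the result is one cluster label per row
toy.segment <- cutree(toy.hc, k = 3)
table(toy.segment)
```

cutree() numbers clusters in the order in which their first member appears in the data, so the labels carry no meaning beyond group membership.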

#4.Quick summary to inspect the variables in seg.df with reference to the 4 clusters.
seg.summ <- function(data, groups)
{
aggregate(data, list(groups), function(x) mean(as.numeric(x)))
}
seg.summ(seg.df,seg.hc.segment)

## Group.1 age gender income kids ownHome subscribe


## 1 1 40.78456 2.000000 49454.08 1.314516 1.467742 1
## 2 2 42.03492 1.000000 53759.62 1.235294 1.477941 1
## 3 3 44.31194 1.388889 52628.42 1.388889 2.000000 2
## 4 4 35.82935 1.545455 40456.14 1.136364 1.000000 2
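
The factor columns in this summary are means of the internal level codes (1 for the first level, 2 for the second), so gender = 2.000000 means every member of cluster 1 is Male, and ownHome = 1.467742 means about 46.8% of cluster 1 is ownYes. The sketch below shows the conversion on a made-up vector (`gender` here is hypothetical toy data).

```r
# Hypothetical toy factor: levels are sorted alphabetically, so Female = 1, Male = 2
gender <- factor(c("Female", "Male", "Male", "Male"))

mean.code  <- mean(as.numeric(gender))  # mean of the level codes
share.male <- mean.code - 1             # for a two-level factor: share of the second level

mean.code
share.male
mean(gender == "Male")                  # the same share computed directly
```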

#5.Relate to the clusters (seg.hc.segment): create a scatterplot of two variables,
#gender and subscribe, with points colored by cluster
plot(jitter(as.numeric(seg.df$gender)) ~ jitter(as.numeric(seg.df$subscribe)),
col=seg.hc.segment, yaxt="n", xaxt="n", ylab="", xlab="")

axis(1, at=c(1, 2), labels=c("Subscribe: No", "Subscribe: Yes"))
axis(2, at=c(1, 2), labels=levels(seg.df$gender))
[Figure: jittered scatterplot; x-axis: Subscribe: No / Subscribe: Yes; y-axis: Female / Male; points colored by cluster]

##the choice of variables (like gender and subscribe) depends on what we want to learn.
##For example:Are males more likely to subscribe?
#Do certain clusters have more females or males who subscribe?
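
Questions such as "are males more likely to subscribe?" can also be checked numerically with a cross-tabulation. The sketch below uses made-up vectors (`gender` and `subscribe` are hypothetical toy data, not columns of seg.df) and base R only.

```r
# Hypothetical toy data: six respondents
gender    <- factor(c("Male", "Male", "Female", "Female", "Female", "Male"))
subscribe <- factor(c("subYes", "subNo", "subNo", "subYes", "subNo", "subNo"))

tab <- table(gender, subscribe)       # counts for each gender/subscribe combination
row.share <- prop.table(tab, margin = 1)  # row proportions: subscription rate within each gender

tab
row.share
```

Adding the cluster assignment as a third argument to table() would presumably give one such table per cluster.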

##6.K-means clustering

#---Convert Factors to Numeric (K-means works only with numeric data)


seg.df.num <- seg.df
seg.df.num$gender <- ifelse(seg.df$gender == "Male",0,1)
seg.df.num$ownHome <- ifelse(seg.df$ownHome == "ownNo",0,1)
seg.df.num$subscribe <- ifelse(seg.df$subscribe == "subNo",0,1)
str(seg.df.num)

## ’data.frame’: 300 obs. of 6 variables:


## $ age : num 47.3 31.4 43.2 37.3 41 ...
## $ gender : num 0 0 0 1 1 0 0 0 1 1 ...
## $ income : num 49483 35546 44169 81042 79353 ...
## $ kids : int 2 1 0 1 3 4 3 0 1 0 ...
## $ ownHome : num 0 1 1 0 1 1 0 0 0 1 ...
## $ subscribe: num 0 0 0 0 0 0 0 0 0 0 ...

#---Run K-means: finds cluster centers that minimize the distances within clusters
set.seed(96743) #Set seed for reproducibility
seg.k <- kmeans(seg.df.num, centers = 4)
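
What kmeans() minimizes is the total within-cluster sum of squares. The sketch below recomputes that quantity by hand on simulated data; `toy` is hypothetical, and `nstart = 10` is an added assumption (several random starts) that the analysis above does not use.

```r
# Hypothetical toy data: two groups of 20 points
set.seed(96743)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))

toy.k <- kmeans(toy, centers = 2, nstart = 10)

# Squared distance from each point to the center of its assigned cluster
wss <- sum((toy - toy.k$centers[toy.k$cluster, ])^2)

wss
toy.k$tot.withinss  # the value kmeans() reports for the same quantity
```

Because these distances are computed on raw values, a variable on a much larger scale (such as income in seg.df.num) can dominate the result; standardizing with scale() first is one option to consider, though the analysis here does not do so.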

#---Summarize Clusters
seg.summ <- function(data, groups)
{
aggregate(data, list(groups), mean)
}
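#NOTE: seg.df still contains factor columns, so mean() returns NA (with a warning) for
#gender, ownHome and subscribe; passing seg.df.num instead would average their 0/1 codes.
seg.summ(seg.df,seg.k$cluster)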

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:


## returning NA

## Group.1 age gender income kids ownHome subscribe


## 1 1 29.58704 NA 21631.79 1.063492 NA NA
## 2 2 55.40968 NA 89959.96 0.360000 NA NA
## 3 3 42.38909 NA 47799.84 1.434783 NA NA
## 4 4 43.66931 NA 63630.70 1.443299 NA NA

#---Visualize the Clusters: a plot shows how well separated the clusters are.

##--Boxplot: used to compare groups on one variable
boxplot(seg.df.num$income ~ seg.k$cluster,
        xlab = "Income", ylab = "Segments", horizontal = TRUE, #with horizontal = TRUE, income is on the x-axis
        main = "Boxplot of comparing income in each Segment")

[Figure: Boxplot of comparing income in each Segment; one horizontal box per segment (1 to 4); income scale 0e+00 to 1e+05]

##--Clusplot: used to visualize the overall clusters and see whether they share similarities in the data
library(cluster)
clusplot(seg.df, seg.k$cluster, color = TRUE, shade = TRUE, labels = 4,
lines = 0,main = "Clusplot of the overall segment")

[Figure: Clusplot of the overall segment; x-axis: Component 1, y-axis: Component 2]
These two components explain 48.49 % of the point variability.

#--------Model-based Methods-------#

#1.poLCA(): uses only categorical variables; finds hidden groups (latent classes)

#---Convert Continuous Data to Categories: numeric variables need to be grouped


seg.df.cut <- seg.df
seg.df.cut$age <- factor(ifelse(seg.df$age < median(seg.df$age), "LessAge", "MoreAge"))
seg.df.cut$income <- factor(ifelse(seg.df$income < median(seg.df$income), "LessInc", "MoreInc"))
seg.df.cut$kids <- factor(ifelse(seg.df$kids < median(seg.df$kids), "Lesskids", "Morekids"))
str(seg.df.cut)

## ’data.frame’: 300 obs. of 6 variables:


## $ age : Factor w/ 2 levels "LessAge","MoreAge": 2 1 2 1 2 2 1 1 2 1 ...
## $ gender : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 2 2 1 1 ...
## $ income : Factor w/ 2 levels "LessInc","MoreInc": 1 1 1 2 2 2 1 1 1 2 ...
## $ kids : Factor w/ 2 levels "Lesskids","Morekids": 2 2 1 2 2 2 2 1 2 1 ...
## $ ownHome : Factor w/ 2 levels "ownNo","ownYes": 1 2 2 1 2 2 1 1 1 2 ...
## $ subscribe: Factor w/ 2 levels "subNo","subYes": 1 1 1 1 1 1 1 1 1 1 ...
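
A median split with `<` sends every value equal to the median into the upper group, which matters for a count variable like kids where many observations share the median value. The sketch below illustrates this on a made-up vector (`kids` here is hypothetical toy data).

```r
# Hypothetical toy data: number of children for seven respondents
kids <- c(0, 0, 1, 1, 1, 2, 3)

median(kids)  # the median is 1

# Values strictly below the median go to "Lesskids"; ties with the median go to "Morekids"
kids.cut <- factor(ifelse(kids < median(kids), "Lesskids", "Morekids"))
table(kids.cut)
```

The two groups can therefore be quite unequal in size even though the split is made at the median.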

#--- Create a formula: combines multiple variables into one group (the response side of the poLCA model formula)
seg.f <- with(seg.df.cut,cbind(age, gender, income, kids, ownHome, subscribe)~1)

#2.Run the model with different numbers of classes (start with the smaller number) to see how well the data fit n classes.
library(poLCA)

## Loading required package: scatterplot3d

## Loading required package: MASS

set.seed(02807)
seg.LCA3 <- poLCA(seg.f, data = seg.df.cut,nclass = 3) # there are 3 latent (hidden) classes.

## Conditional item response (column) probabilities,


## by outcome variable, for each class (row)
##
## $age
## LessAge MoreAge
## class 1: 1.0000 0.0000
## class 2: 0.0000 1.0000
## class 3: 0.6555 0.3445
##
## $gender
## Female Male
## class 1: 0.4211 0.5789
## class 2: 0.4681 0.5319
## class 3: 0.6079 0.3921
##
## $income
## LessInc MoreInc
## class 1: 1.0000 0.0000
## class 2: 0.3803 0.6197
## class 3: 0.3746 0.6254
##
## $kids
## Lesskids Morekids
## class 1: 0.2818 0.7182
## class 2: 0.8065 0.1935
## class 3: 0.1575 0.8425
##
## $ownHome
## ownNo ownYes
## class 1: 0.7289 0.2711
## class 2: 0.2338 0.7662
## class 3: 0.6638 0.3362
##
## $subscribe
## subNo subYes
## class 1: 0.7496 0.2504
## class 2: 0.8948 0.1052
## class 3: 0.8960 0.1040
##
## Estimated class population shares
## 0.1974 0.341 0.4616
##
## Predicted class memberships (by modal posterior prob.)
## 0.2333 0.3467 0.42
##
## =========================================================
## Fit for 3 latent classes:
## =========================================================
## number of observations: 300

## number of estimated parameters: 20
## residual degrees of freedom: 43
## maximum log-likelihood: -1092.345
##
## AIC(3): 2224.691
## BIC(3): 2298.767
## G^2(3): 42.77441 (Likelihood ratio/deviance statistic)
## X^2(3): 38.47647 (Chi-square goodness of fit)
##

seg.LCA4 <- poLCA(seg.f,data = seg.df.cut,nclass = 4) #there are 4 latent (hidden) classes.

## Conditional item response (column) probabilities,


## by outcome variable, for each class (row)
##
## $age
## LessAge MoreAge
## class 1: 0.6823 0.3177
## class 2: 0.0000 1.0000
## class 3: 1.0000 0.0000
## class 4: 1.0000 0.0000
##
## $gender
## Female Male
## class 1: 0.5853 0.4147
## class 2: 0.4810 0.5190
## class 3: 0.8466 0.1534
## class 4: 0.3277 0.6723
##
## $income
## LessInc MoreInc
## class 1: 0.4137 0.5863
## class 2: 0.3701 0.6299
## class 3: 0.5850 0.4150
## class 4: 1.0000 0.0000
##
## $kids
## Lesskids Morekids
## class 1: 0.0000 1.0000
## class 2: 0.8114 0.1886
## class 3: 1.0000 0.0000
## class 4: 0.2506 0.7494
##
## $ownHome
## ownNo ownYes
## class 1: 0.6540 0.3460
## class 2: 0.2688 0.7312
## class 3: 0.6537 0.3463
## class 4: 0.7721 0.2279
##
## $subscribe
## subNo subYes
## class 1: 0.8746 0.1254
## class 2: 0.8965 0.1035

## class 3: 1.0000 0.0000
## class 4: 0.7203 0.2797
##
## Estimated class population shares
## 0.4101 0.3697 0.0643 0.1559
##
## Predicted class memberships (by modal posterior prob.)
## 0.41 0.3733 0.0667 0.15
##
## =========================================================
## Fit for 4 latent classes:
## =========================================================
## number of observations: 300
## number of estimated parameters: 27
## residual degrees of freedom: 36
## maximum log-likelihood: -1088.021
##
## AIC(4): 2230.041
## BIC(4): 2330.043
## G^2(4): 34.12473 (Likelihood ratio/deviance statistic)
## X^2(4): 31.50696 (Chi-square goodness of fit)
##

#---compare Model Fit using: BIC (A lower BIC value indicates a better fit)
seg.LCA3$bic

## [1] 2298.767

seg.LCA4$bic

## [1] 2330.043
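
The BIC values can be reproduced from the fit summaries, which shows why the 4-class model scores worse despite its higher log-likelihood: the penalty for seven extra parameters outweighs the gain in fit. The sketch below uses the standard definitions BIC = -2 logLik + p log(n) and AIC = -2 logLik + 2p, with inputs copied from the poLCA output; the variable names are illustrative.

```r
# Values copied from the poLCA fit summaries
n      <- 300                                 # number of observations
loglik <- c(k3 = -1092.345, k4 = -1088.021)   # maximum log-likelihood
npar   <- c(k3 = 20, k4 = 27)                 # number of estimated parameters

# With six two-level items, a K-class model has K * 6 item parameters plus K - 1 class shares
stopifnot(all(npar == c(3, 4) * 6 + (c(3, 4) - 1)))

bic <- -2 * loglik + npar * log(n)
aic <- -2 * loglik + 2 * npar

round(bic, 3)
round(aic, 3)
```

Small differences in the last digit relative to the printed output are expected, because the log-likelihood is only reported to three decimals.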

#---Examine the 3-group model


seg.summ(seg.df, seg.LCA3$predclass) #function summarizes the dataset

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:


## returning NA
## Group.1 age gender income kids ownHome subscribe
## 1 1 28.22385 NA 30075.32 1.1285714 NA NA
## 2 2 54.44407 NA 60082.47 0.3846154 NA NA
## 3 3 37.47652 NA 54977.08 2.0793651 NA NA

table(seg.LCA3$predclass) #how many observations belong to each class

##
## 1 2 3
## 70 104 126
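
These counts are the "Predicted class memberships (by modal posterior prob.)" from the fit summary expressed as frequencies: each observation is assigned to the class with the highest posterior probability. The sketch below illustrates the idea with a made-up posterior matrix (`post` is hypothetical, not output from seg.LCA3) and then converts the counts above to shares.

```r
# Hypothetical posterior probabilities: 4 observations (rows) by 3 classes (columns)
post <- matrix(c(0.9, 0.1, 0.0,
                 0.2, 0.5, 0.3,
                 0.4, 0.1, 0.5,
                 0.1, 0.8, 0.1), ncol = 3, byrow = TRUE)

# Modal assignment: the class with the largest posterior probability in each row
modal <- apply(post, 1, which.max)
table(factor(modal, levels = 1:3))

# Averaging the posteriors instead gives soft class shares, which can differ from the modal shares
colMeans(post)

# Counts from the 3-class table above, expressed as shares of the 300 observations
class.share <- c(70, 104, 126) / 300
round(class.share, 4)
```

The rounded shares agree with the predicted class memberships 0.2333, 0.3467 and 0.42 reported for the 3-class model.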

clusplot(seg.df, seg.LCA3$predclass, color = TRUE, shade = TRUE, labels = 4,
         lines=0, main ="LCA plot (K=3)")

[Figure: LCA plot (K=3); x-axis: Component 1, y-axis: Component 2]
These two components explain 48.49 % of the point variability.

#---Examine the 4-group model


seg.summ(seg.df, seg.LCA4$predclass) #function summarizes the dataset

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:


## returning NA


## Group.1 age gender income kids ownHome subscribe


## 1 1 36.62554 NA 52080.13 2.1951220 NA NA
## 2 2 53.64073 NA 60534.17 0.5178571 NA NA
## 3 3 30.22575 NA 41361.81 0.0000000 NA NA
## 4 4 27.61506 NA 28178.70 1.1777778 NA NA

table(seg.LCA4$predclass) #how many observations belong to each class

##
## 1 2 3 4
## 123 112 20 45

clusplot(seg.df, seg.LCA4$predclass, color = TRUE, shade = TRUE, labels = 4,
         lines = 0, main = "LCA plot (K=4)")

[Figure: LCA plot (K=4). x-axis: Component 1; y-axis: Component 2.]
These two components explain 48.49 % of the point variability.
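The "point variability" figure is the share of total variance captured by the two plotted components. The sketch below shows the general idea with prcomp() on simulated numeric data; it is an assumption about the concept rather than a reproduction of clusplot()'s internals, and toy.num is made-up data.

```r
set.seed(1)
# Simulated numeric data (illustrative only)
toy.num <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                      x3 = rnorm(100), x4 = rnorm(100))
toy.num$x2 <- toy.num$x1 + 0.5 * toy.num$x2   # make two columns correlated

# Principal components on standardized variables
toy.pca <- prcomp(toy.num, scale. = TRUE)

# Variance share of each component, then of the first two combined
var.share <- toy.pca$sdev^2 / sum(toy.pca$sdev^2)
two.comp.share <- sum(var.share[1:2])
print(round(100 * two.comp.share, 2))
```

A share well below 100 % means the two-dimensional plot hides part of the structure, so clusters that overlap on the plot may still be separated in the remaining dimensions.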

#2.Mclust:
#---Fit a Model-Based Clustering
library(mclust)

## Warning: package ’mclust’ was built under R version 4.3.2

## Package ’mclust’ version 6.1.1
## Type ’citation("mclust")’ for citing this R package in publications.

seg.mc <-Mclust(seg.df.num)
summary(seg.mc) #With no G given, Mclust picks the number of clusters by BIC; here it selects 3

## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEV (ellipsoidal, equal shape) model with 3 components:
##
## log-likelihood n df BIC ICL
## -5137.106 300 73 -10690.59 -10690.59
##
## Clustering table:
## 1 2 3
## 163 71 66

#---Fit a 4-Cluster Model


#The log-likelihood measures how well the model fits the data; higher (closer to zero here) is better.
seg.mc4 <- Mclust(seg.df.num,G=4)
summary(seg.mc4)

## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VII (spherical, varying volume) model with 4 components:
##
## log-likelihood n df BIC ICL
## -16862.69 300 31 -33902.19 -33906.18
##
## Clustering table:
## 1 2 3 4
## 104 66 59 71
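To make the "higher log-likelihood is better" reading concrete, here is a small sketch with a single univariate Gaussian instead of the multivariate mixture that Mclust fits. The data and both candidate models are simulated assumptions.

```r
set.seed(42)
# Simulated sample from a normal distribution with mean 5 and sd 2
x <- rnorm(200, mean = 5, sd = 2)

# Log-likelihood under parameters estimated from the data
loglik.good <- sum(dnorm(x, mean = mean(x), sd = sd(x), log = TRUE))

# Log-likelihood under a badly placed mean
loglik.poor <- sum(dnorm(x, mean = 0, sd = 2, log = TRUE))

print(c(good = loglik.good, poor = loglik.poor))
```

Both values are negative; the model that matches the data sits closer to zero. In the output above the same comparison favors the 3-component fit (-5137.106) over the 4-component fit (-16862.69).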

#---Compare Models Using BIC


BIC(seg.mc,seg.mc4) #Lower BIC is better here; summary() above reports BIC with the opposite sign

## df BIC
## seg.mc 73 10690.59
## seg.mc4 31 33902.19
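The values returned by BIC() can be checked by hand from the log-likelihood, df and n printed in the two summaries, using BIC = -2 * logLik + df * log(n). The function name bic.from.loglik is hypothetical; the inputs are copied from the output above.

```r
# BIC on the "lower is better" scale used by stats::BIC()
bic.from.loglik <- function(loglik, df, n) -2 * loglik + df * log(n)

bic3 <- bic.from.loglik(-5137.106, df = 73, n = 300)   # 3-component model
bic4 <- bic.from.loglik(-16862.69, df = 31, n = 300)   # 4-component model
print(round(c(seg.mc = bic3, seg.mc4 = bic4), 2))
```

The results agree with the BIC() table up to rounding of the printed log-likelihoods. The same numbers appear with a negative sign in summary(), so there the larger (less negative) value marks the preferred model.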

#---Summarize Variables by Cluster


seg.summ(seg.df,seg.mc$class)

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:


## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:


## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

##   Group.1      age gender   income     kids ownHome subscribe
## 1       1 44.68018     NA 52980.52 1.171779      NA        NA
## 2       2 38.02229     NA 51550.98 1.422535      NA        NA
## 3       3 36.02187     NA 45227.51 1.348485      NA        NA

#---Visualize Clusters #if clusters overlap, it means some groups are less distinct.
library(cluster)
clusplot(seg.df, seg.mc$class, color= TRUE, shade = TRUE, labels = 4,
main = "Mcluster Plot")

[Figure: Mcluster Plot. x-axis: Component 1; y-axis: Component 2.]
These two components explain 48.49 % of the point variability.

Including Plots

You can also embed plots, for example:


[Figure: plot of pressure (y-axis) against temperature (x-axis).]

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that
generated the plot.
