
Machine Learning Course - CS-433

K-Means Clustering

Nov 15, 2018

© Mohammad Emtiyaz Khan 2015


minor changes by Martin Jaggi 2016
minor changes by Martin Jaggi 2017
changes by Ruediger Urbanke 2018
Last updated: November 13, 2018
Clustering
Clusters are groups of points such that the distances within a cluster (group) are small compared to the distances between clusters (groups).
The clusters are typically defined by finding some “centers”
µ_1, µ_2, . . . , µ_K. Each such center then defines one cluster
via an assignment mapping z_n ∈ {1, 2, . . . , K} for all N
data vectors x_n ∈ R^D.

K-means clustering
Perhaps the best-known clustering algorithm is the K-means
algorithm. Assume that K is known. The optimization that
leads to the clusters is the following:
\min_{z,\mu} \; L(z,\mu) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \|x_n - \mu_k\|_2^2
\quad \text{s.t.} \quad \mu_k \in \mathbb{R}^D, \quad z_{nk} \in \{0,1\}, \quad \sum_{k=1}^{K} z_{nk} = 1,

where

z_n = [z_{n1}, z_{n2}, \ldots, z_{nK}]^\top, \quad z = [z_1, z_2, \ldots, z_N]^\top, \quad \mu = [\mu_1, \mu_2, \ldots, \mu_K]^\top.

Let us discuss this in some more detail. For fixed centers \mu_k,
the cost is minimized if we map each sample to its nearest
center, where we measure distance in terms of the Euclidean
distance. This is accomplished by means of the indicator
variables z_{nk} \in \{0,1\}, where for each sample n we require
\sum_{k=1}^{K} z_{nk} = 1. In words, each sample is assigned to exactly
one center (cluster), and this is indicated by setting the
corresponding indicator variable z_{nk} to 1 and all the other
ones z_{nk'} to 0. In addition, we still have to minimize over the
best choice of centers.
The above description leads very naturally to an algorithm.
Algorithm: Initialize µk ∀k.
Iterate:
1. For all n, compute zn given µ.
2. For all k, compute µk given z.
As we discussed, step (1) is clear. Once we have fixed the centers,
the best assignment is to map each point to the nearest
center. But step (2) is equally natural. If we have fixed assignments,
then it is easy to compute the best centers. This can
be seen formally by taking derivatives of the cost function
w.r.t. \mu_k and solving for the centers. We get:

\mu_k = \frac{\sum_{n=1}^{N} z_{nk} \, x_n}{\sum_{n=1}^{N} z_{nk}}
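In a bit more detail (a standard computation, written out here for completeness): for fixed z, setting the derivative with respect to \mu_k to zero gives

\frac{\partial L}{\partial \mu_k} = -2 \sum_{n=1}^{N} z_{nk} (x_n - \mu_k) = 0
\quad \Longrightarrow \quad
\mu_k \sum_{n=1}^{N} z_{nk} = \sum_{n=1}^{N} z_{nk} \, x_n,

which rearranges to the formula above.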

Hence, the best centers are just the means of each cluster,
which explains the name ‘K-means’.
Let us look at the following example. Figure (a) shows a set
of points. We have K = 2 and two initial centers are shown.
Step 1: Given the centers, in the first step we compute
the optimal assignments. I.e., for all n, compute zn given
µ. The result is shown in Figure (b) via a color-coding. We
map each point to its closest center.
(Figure: panel (a) shows the data points with the two initial centers; panel (b) shows each point colored according to its closest center.)

z_{nk} =
\begin{cases}
1 & \text{if } k = \arg\min_{j=1,2,\ldots,K} \|x_n - \mu_j\|_2^2, \\
0 & \text{otherwise.}
\end{cases}
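As a small illustration, a minimal NumPy sketch of this assignment step (the array names X and mu are assumptions, with X of shape (N, D) holding the samples as rows and mu of shape (K, D) holding the centers as rows):

import numpy as np

def assign_clusters(X, mu):
    # dists[n, k] = ||x_n - mu_k||_2^2 for all pairs, shape (N, K).
    dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # One-hot indicator matrix: z[n, k] = 1 iff mu_k is the closest center to x_n.
    z = np.zeros_like(dists)
    z[np.arange(X.shape[0]), dists.argmin(axis=1)] = 1
    return z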
Step 2: In the second step we now freeze the assignments
and re-compute the best center for each cluster. I.e., for all
k, compute \mu_k given z. As we mentioned, this computation
is just the computation of the mean of each cluster.

(Figure: panel (b) shows the current assignments; panel (c) shows the re-computed centers.)
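A matching sketch of this update step, under the same assumed array names as above (and assuming, for simplicity, that no cluster is empty):

def update_centers(X, z):
    # Number of points assigned to each center, shape (K,).
    counts = z.sum(axis=0)
    # Sum of the points in each cluster divided by the cluster size, shape (K, D).
    return (z.T @ X) / counts[:, None]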


Summary of K-means
Initialize µk ∀k, then iterate:
1. For all n, compute zn given µ.
z_{nk} =
\begin{cases}
1 & \text{if } k = \arg\min_{j} \|x_n - \mu_j\|_2^2, \\
0 & \text{otherwise.}
\end{cases}

2. For all k, compute µk given z.


\mu_k = \frac{\sum_{n=1}^{N} z_{nk} \, x_n}{\sum_{n=1}^{N} z_{nk}}

Convergence to a local optimum is assured since each step
decreases the cost (see Bishop, Exercise 9.1). But note that
we are not guaranteed to reach the globally optimal solution
with this iterative algorithm.
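Putting the two sketched steps together, a hypothetical minimal loop that also records the cost L(z, \mu) after every iteration, so that the monotone decrease (and the dependence on the initialization) can be checked numerically:

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the centers with K distinct data points chosen at random.
    mu = X[rng.choice(X.shape[0], size=K, replace=False)]
    costs = []
    for _ in range(max_iter):
        z = assign_clusters(X, mu)           # step 1: assignments given centers
        mu = update_centers(X, z)            # step 2: centers given assignments
        # z @ mu has the assigned center in row n, so this is exactly L(z, mu).
        costs.append(((X - z @ mu) ** 2).sum())
        # (A real implementation would stop once the assignments no longer change.)
    return z, mu, costs

Running this from several random seeds and keeping the result with the smallest final cost is a common way to mitigate the local-optimum issue.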

Coordinate descent
K-means is a coordinate descent algorithm, where, to find
\min_{z,\mu} L(z,\mu), we start with some \mu^{(0)} and repeat the following:

z^{(t+1)} := \arg\min_{z} \, L(z, \mu^{(t)}),
\mu^{(t+1)} := \arg\min_{\mu} \, L(z^{(t+1)}, \mu).
Examples
K-means for the “old-faithful” dataset (Bishop’s Figure 9.1)

(Figure: nine panels of Bishop’s Figure 9.1 on the Old Faithful data, showing the assignments and the centers over iterations 0 through 4, alternating between the assignment step and the center-update step.)

Consider a second example. In the figures below, the colors
contained in the photo were quantized to a small number
K = 2, 3, . . . , and these “center” colors were chosen by
clustering. This is a form of data compression; it is known
in the signal processing community as vector quantization.
(Figure: the image quantized with K = 2, K = 3, and K = 10 colors, next to the original image.)
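A hedged sketch of how such a quantized image could be produced with the kmeans routine sketched earlier (the array img is an assumption: an H × W × 3 array of RGB values, with every pixel treated as a point in R^3):

def quantize_colors(img, K):
    H, W, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)   # every pixel becomes a 3-d sample
    z, mu, _ = kmeans(pixels, K)                # cluster the pixel colors
    # Replace every pixel by the center color of its cluster.
    return (z @ mu).reshape(H, W, 3)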

Probabilistic model for K-means


So far we have presented K-means using a geometric reasoning.
But like so many other models in ML, it also has a
probabilistic interpretation.
Assume that, conditioned on a point being associated with cluster
k, we consider it a sample from the D-dimensional Gaussian
with mean \mu_k and covariance matrix I. I.e., the likelihood
of a sample x given the cluster assignment z and the centers
\mu = \{\mu_k\}_{k=1}^{K} is

p(x \mid z, \mu) = \prod_{k=1}^{K} \left[ \mathcal{N}(x \mid \mu_k, I) \right]^{z_k}.

Then the likelihood associated with the whole data set S_{\text{train}} = \{x_n\}_{n=1}^{N} is

p(X \mid z, \mu) = \prod_{n=1}^{N} \prod_{k=1}^{K} \left[ \mathcal{N}(x_n \mid \mu_k, I) \right]^{z_{nk}}.

Taking, as always, the log and multiplying by −1 in order
to get a minimization problem, this gives us

-\log p(X \mid z, \mu) = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \|x_n - \mu_k\|_2^2 + \text{const},

which, up to the additive constant and the overall factor of 1/2, corresponds exactly to our original formulation.
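For completeness, the intermediate step (standard Gaussian algebra, spelled out here): with covariance matrix I, each factor contributes

-\log \mathcal{N}(x_n \mid \mu_k, I) = \frac{1}{2} \|x_n - \mu_k\|_2^2 + \frac{D}{2} \log(2\pi),

so summing the terms selected by the indicators z_{nk} gives the expression above; the additive constant and the factor 1/2 do not change the minimizing z and \mu.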

K-means as a Matrix Factorization


K-means can be interpreted as a matrix factorization problem.
We will learn much more about the matrix factorization
problem in a future lecture, but the set-up is very natural.
We can write:
\min_{z,\mu} \; L(z,\mu) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \|x_n - \mu_k\|_2^2 = \|X^\top - M Z^\top\|_{\text{Frob}}^2
\quad \text{s.t.} \quad \mu_k \in \mathbb{R}^D, \quad z_{nk} \in \{0,1\}, \quad \sum_{k=1}^{K} z_{nk} = 1.

As always, X is the N × D data matrix whose rows are the
individual feature vectors; hence X^\top has dimensions D × N.
The matrix Z is the N × K matrix whose rows are the
indicator vectors z_n, and M is the real-valued matrix of
dimension D × K whose columns are the centers \mu_k. Note
that MZ^\top is a D × N matrix whose n-th column is one of
the K centers \mu_k, namely the center assigned to the n-th
sample. Therefore X^\top − MZ^\top is a D × N matrix that
contains in its n-th column the difference between the n-th
sample and the center that we assigned it to. The Frobenius
norm squares all these entries and sums them up; this is
hence exactly the K-means cost. In order to make this
Frobenius norm small, our task is to “factorize” the matrix
X^\top in the form MZ^\top, where the matrix Z is restricted
to entries from {0, 1} and each row of Z contains exactly a
single 1. This is why we say that this
is a matrix factorization formulation.
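A small numerical check of this identity, as a sketch reusing the assign_clusters routine and random data (all names here are assumptions, not part of the lecture):

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))                    # N = 50 samples in D = 2
mu = rng.normal(size=(3, 2))                    # K = 3 centers
Z = assign_clusters(X, mu)                      # N x K indicator matrix
M = mu.T                                        # D x K matrix of centers

cost_sum = ((X - Z @ mu) ** 2).sum()                        # double-sum form of L
cost_frob = np.linalg.norm(X.T - M @ Z.T, ord='fro') ** 2   # Frobenius-norm form
assert abs(cost_sum - cost_frob) < 1e-9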
Issues with K-means
1. Computation can be heavy for large N, D and K.
2. Clusters are forced to be spherical (e.g., they cannot be elliptical).
3. Each example can belong to only one cluster (“hard”
cluster assignments).
