
Gaussian Mixture Models

LE Thi Khuyen

February 2025


1 Context

2 Introduction of Gaussian Mixture model

3 Expectation-Maximization for GMM

4 GMM implementation from scratch and with scikit-learn

5 Advantages and disadvantages of GMMs


1 Context

Drawbacks of K-means clustering:

• It uses Euclidean distance to assign data points. What if the clusters have non-circular shapes?
• It assigns each data point to exactly one cluster (hard assignment). What if the clusters overlap?
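To make these drawbacks concrete, here is a minimal sketch (not from the original slides) comparing K-means' hard assignment with a GMM on stretched, non-circular clusters; the shear matrix and random seeds are illustrative choices:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Stretch isotropic blobs into elongated, non-circular clusters.
X, y = make_blobs(n_samples=600, centers=2, random_state=0)
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])  # anisotropic shear

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# With full covariances, the GMM typically tracks the true labels more closely.
print("K-means ARI:", adjusted_rand_score(y, km_labels))
print("GMM ARI    :", adjusted_rand_score(y, gmm_labels))
```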


2 Introduction of Gaussian Mixture model


Gaussian mixture model

A Gaussian mixture model is a weighted sum of K component Gaussian densities, given by the equation:

\[
p(x \mid \pi_k, \mu_k, \Sigma_k) = \sum_{k=1}^{K} \pi_k \, g(x \mid \mu_k, \Sigma_k), \tag{1}
\]

where x ∈ R^D, {πk}, k = 1, …, K, are the mixture weights with \(\sum_{k=1}^{K} \pi_k = 1\), and g(x | µk, Σk) is the Gaussian density function of the k-th cluster,

\[
g(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} |\Sigma_k|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right), \tag{2}
\]

with mean vector µk ∈ R^D and covariance matrix Σk ∈ R^{D×D}.

Source: www.robots.ox.ac.uk
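As a numeric illustration of equations (1) and (2), the mixture density can be evaluated with SciPy. This is a sketch with illustrative weights, means, and covariances, not part of the original slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    """Evaluate p(x) = sum_k pi_k * g(x | mu_k, Sigma_k), as in eq. (1)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(weights, means, covs))

weights = [0.4, 0.6]                         # mixture weights pi_k (sum to 1)
means = [np.zeros(2), np.array([3.0, 3.0])]  # mean vectors mu_k in R^2
covs = [np.eye(2), 2.0 * np.eye(2)]          # covariance matrices Sigma_k
print(gmm_density(np.array([1.0, 1.0]), weights, means, covs))
```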

The cost function for fitting a GMM

Given X = {x1, …, xN}, for each data point xi,

\[
p(x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k). \tag{3}
\]

Assuming {x1, …, xN} are independent, the likelihood of the GMM for the N points is:

\[
P(X \mid \theta) = \prod_{i=1}^{N} p(x_i) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k), \tag{4}
\]

where θ = {(πk, µk, Σk)}, k = 1, …, K.

Maximizing P(X | θ) is equivalent to minimizing the following cost function (the negative log-likelihood) with respect to (πk, µk, Σk):

\[
L(\theta) = -\sum_{i=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k). \tag{5}
\]


Latent variables

Given a data point xi, define:

\[
z_{ik} =
\begin{cases}
1 & \text{if } x_i \in C_k, \\
0 & \text{otherwise.}
\end{cases} \tag{6}
\]

zi = (zi1, zi2, …, ziK) is called a latent variable (it is unobserved), and zik ∼ Ber(πk), i.e. p(zik = 1) = πk.

Generative model

• The GMM is also a generative model.
• To generate a data point xi, we first generate the latent variable zi.
• Then, with k the component selected by zi, we generate xi ∼ N(x | µk, Σk).



Given xi, the conditional probability that xi belongs to cluster Ck is:

\[
\gamma_{ik} = P(z_{ik} = 1 \mid x_i)
= \frac{\overbrace{P(z_{ik}=1)}^{\pi_k} \, \overbrace{P(x_i \mid z_{ik}=1)}^{\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}}
       {\sum_{l=1}^{K} \underbrace{P(z_{il}=1)}_{\pi_l} \, \underbrace{P(x_i \mid z_{il}=1)}_{\mathcal{N}(x_i \mid \mu_l, \Sigma_l)}}. \tag{7}
\]

The derivative of L(θ) with respect to µk:

\[
\frac{\partial L(\theta)}{\partial \mu_k}
= -\sum_{i=1}^{N} \underbrace{\frac{\pi_k \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}}_{\gamma_{ik}} \, \Sigma_k^{-1} (x_i - \mu_k). \tag{8}
\]

\[
\frac{\partial L(\theta)}{\partial \mu_k} = 0 \iff \sum_{i=1}^{N} \gamma_{ik} x_i = \sum_{i=1}^{N} \gamma_{ik} \mu_k. \tag{9}
\]

We obtain:

\[
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} x_i, \tag{10}
\]

where

• γik = πk N(xi | µk, Σk) / Σ_{l=1}^{K} πl N(xi | µl, Σl) is called the responsibility of mixture component k for vector xi, with γik ∈ [0, 1].
• Nk = Σ_{i=1}^{N} γik is the effective number of vectors assigned to component k.




The derivative of L(θ) with respect to Σk gives:

\[
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T. \tag{11}
\]

To handle the constraint Σ_{k=1}^{K} πk = 1, define the Lagrangian L(θ, λ) = L(θ) + λ(Σ_{k=1}^{K} πk − 1). We obtain:

\[
\frac{\partial L(\theta, \lambda)}{\partial \pi_k}
= -\sum_{i=1}^{N} \frac{\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \mathcal{N}(x_i \mid \mu_l, \Sigma_l)} + \lambda = 0. \tag{12}
\]

Hence,

\[
\sum_{i=1}^{N} \frac{\gamma_{ik}}{\pi_k} = \lambda
\iff \sum_{k=1}^{K} \sum_{i=1}^{N} \gamma_{ik} = \sum_{k=1}^{K} \pi_k \lambda. \tag{13}
\]

Therefore, λ = N and πk = (Σ_{i=1}^{N} γik) / N = Nk / N.


3 Expectation-Maximization for GMM


Expectation Maximization (EM) for GMM

1. Initialize µk, Σk, πk.
2. (Expectation step) Compute the responsibilities using the current parameters µk, Σk, πk (assignment):

\[
\gamma_{ik} = \frac{\pi_k \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}. \tag{14}
\]

3. (Maximization step) Re-estimate the parameters (µk, Σk, πk) using the computed responsibilities:

\[
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T, \qquad
\pi_k = \frac{N_k}{N}, \ \text{where } N_k = \sum_{i=1}^{N} \gamma_{ik}.
\]

4. Repeat steps 2 and 3 until convergence.
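The loop above translates almost directly into NumPy. The following is a minimal 1D sketch of the algorithm (a reconstruction, not the author's implementation from the later slides), assuming scipy.stats.norm for the Gaussian density and a simple convergence test on the means:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, pi, mu, var, n_iter=200, tol=1e-8):
    """Fit a 1D GMM by EM; pi, mu, var are the initial parameters."""
    x = np.asarray(x, dtype=float)
    pi, mu, var = (np.asarray(a, dtype=float) for a in (pi, mu, var))
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ik, eq. (14); shape (N, K).
        dens = pi * norm.pdf(x[:, None], mu, np.sqrt(var))
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the responsibilities.
        Nk = gamma.sum(axis=0)                     # effective counts N_k
        mu_new = gamma.T @ x / Nk
        var_new = (gamma * (x[:, None] - mu_new) ** 2).sum(axis=0) / Nk
        pi_new = Nk / len(x)
        converged = np.max(np.abs(mu_new - mu)) < tol
        pi, mu, var = pi_new, mu_new, var_new
        if converged:
            break
    return pi, mu, var
```

For instance, em_gmm_1d([1, 2, 3, 10, 15, 20], [0.5, 0.5], [2, 12], [1, 4]) starts from the same setup as the worked exercise below.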


Exercise

Implement the EM algorithm for a GMM with K = 2 components on the following 1D dataset:

X = {1, 2, 3, 10, 15, 20}

Assume initial parameters:

• π1 = 0.5, µ1 = 2, σ1² = 1
• π2 = 0.5, µ2 = 12, σ2² = 4

Perform one iteration of the EM algorithm.


Solution

The density functions corresponding to the two initialized distributions:

\[
f(x \mid \mu_1, \sigma_1^2) = \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x-2)^2}{2}\right),
\]

\[
f(x \mid \mu_2, \sigma_2^2) = \frac{1}{\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right) = \frac{1}{2\sqrt{2\pi}} \exp\left(-\frac{(x-12)^2}{8}\right).
\]

E-step

Compute the responsibilities γik for each data point xi and each component k:

\[
\gamma_{ik} = \frac{\pi_k f(x_i \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{2} \pi_j f(x_i \mid \mu_j, \sigma_j^2)}.
\]

We obtain:
• γi1 ≈ 1 for i = 1, 2, 3 and γi1 ≈ 0 for i = 4, 5, 6;
• γi2 ≈ 0 for i = 1, 2, 3 and γi2 ≈ 1 for i = 4, 5, 6.




M-step

Update the parameters πk, µk, σk² using the responsibilities γik.

Update πk:
\[
\pi_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}
\]
• π1 = (1/6) Σ_{i=1}^{6} γi1 = 3/6 = 0.5.
• π2 = (1/6) Σ_{i=1}^{6} γi2 = 3/6 = 0.5.

Update µk:
\[
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} x_i = \frac{\sum_{i=1}^{N} \gamma_{ik} x_i}{\sum_{i=1}^{N} \gamma_{ik}}
\]
• µ1 = (1 + 2 + 3)/3 = 6/3 = 2.
• µ2 = (10 + 15 + 20)/3 = 45/3 = 15.

Update σk²:
\[
\sigma_k^2 = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)^2 = \frac{\sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)^2}{\sum_{i=1}^{N} \gamma_{ik}}
\]
• σ1² = ((1 − 2)² + (2 − 2)² + (3 − 2)²)/3 = 2/3.
• σ2² = ((10 − 15)² + (15 − 15)² + (20 − 15)²)/3 = 50/3.
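These hand computations can be checked numerically. The snippet below (a sketch, not from the slides) runs one E-step and one M-step on the same data and initial parameters; up to tiny responsibility leakage, it reproduces the values above:

```python
import numpy as np
from scipy.stats import norm

x = np.array([1.0, 2.0, 3.0, 10.0, 15.0, 20.0])
pi = np.array([0.5, 0.5])
mu = np.array([2.0, 12.0])
sd = np.array([1.0, 2.0])  # sigma_1 = 1, sigma_2 = 2 (variances 1 and 4)

dens = pi * norm.pdf(x[:, None], mu, sd)        # pi_k * f(x_i | mu_k, sigma_k^2)
gamma = dens / dens.sum(axis=1, keepdims=True)  # responsibilities (~0/1 here)
Nk = gamma.sum(axis=0)
mu_new = gamma.T @ x / Nk
var_new = (gamma * (x[:, None] - mu_new) ** 2).sum(axis=0) / Nk

print("pi :", Nk / len(x))  # ~ [0.5, 0.5]
print("mu :", mu_new)       # ~ [2, 15]
print("var:", var_new)      # ~ [2/3, 50/3]
```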


4 GMM implementation from scratch and with scikit-learn


Exercise

Train a GMM with the EM algorithm from scratch. Given three datasets following the normal distributions X1 ∼ N(0, 1), X2 ∼ N(−3, 1.5), X3 ∼ N(8, 2):

1. Write Python code to generate these datasets with 1000 samples in each group.
(Hint: use np.random.normal(loc=mean, scale=std, size=samples) or norm.rvs(loc=mean, scale=std, size=samples).)
2. Implement the EM algorithm to estimate the data distribution.
3. Apply the GaussianMixture class from the scikit-learn package to estimate the data distribution. Compare the results of the two approaches.


Solution

Original dataset: [the original slide showed a histogram of the three generated clusters]
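A possible solution to part 1, assuming (as the hint's scale argument suggests) that the second parameter of each N(·, ·) is a standard deviation; the seed and bin count are arbitrary:

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)  # arbitrary seed for reproducibility
x1 = np.random.normal(loc=0.0, scale=1.0, size=1000)   # X1 ~ N(0, 1)
x2 = np.random.normal(loc=-3.0, scale=1.5, size=1000)  # X2 ~ N(-3, 1.5)
x3 = np.random.normal(loc=8.0, scale=2.0, size=1000)   # X3 ~ N(8, 2)
X = np.concatenate([x1, x2, x3])

plt.hist(X, bins=60, density=True)
plt.title("Original dataset: mixture of three Gaussians")
plt.show()
```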


Expectation step / Maximization step

[The original slides showed the from-scratch implementation of these two steps as code screenshots; the EM sketch in the previous section follows the same structure.]


GMM with scikit-learn
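A sketch of part 3 (the original slide showed this as a screenshot, so the exact code here is an assumption): fit scikit-learn's GaussianMixture to the concatenated sample X generated above, reshaped to a column because the estimator expects 2D input:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# X is the 1D sample from the solution above; scikit-learn expects an
# array of shape (n_samples, n_features).
gm = GaussianMixture(n_components=3, random_state=0).fit(X.reshape(-1, 1))

print("weights:", gm.weights_)                       # ~ [1/3, 1/3, 1/3]
print("means  :", gm.means_.ravel())                 # ~ [0, -3, 8] (any order)
print("stds   :", np.sqrt(gm.covariances_).ravel())  # ~ [1, 1.5, 2]
```

Up to component ordering, the recovered weights, means, and standard deviations should closely match both the generating parameters and the from-scratch EM estimates.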


5 Advantages and disadvantages of GMMs


Advantages of GMMs
• Flexibility: GMMs can model a wide range of probability distributions, since many densities can be represented as a weighted sum of normal distributions.
• Interpretability: the parameters of a GMM (weights, means, covariances) have a clear interpretation, which is useful for understanding the underlying data structure.
• Handles overlapping clusters by assigning each point a probability of belonging to each cluster.
• Handles missing values: because EM estimates missing data probabilistically, GMMs can be used to model incomplete datasets.


Disadvantages of GMMs
• Sensitive to the initialization of the parameters (weights, means, covariances). Poor initialization may lead to a local optimum (bad clustering).
• Assumption of normality, which is not always true for real-world datasets.
• Computational complexity: fitting is expensive compared to K-means clustering, making GMMs less scalable for large datasets.
• The number of components K must be predefined (choosing a wrong value of K may lead to overfitting or underfitting); a common workaround is sketched below.
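As noted in the last bullet, K must be chosen in advance. A common remedy (a standard-practice sketch, not from the slides) is to fit models for several values of K and keep the one minimizing an information criterion such as BIC:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_bic(X, k_max=8, random_state=0):
    """Fit GMMs for K = 1..k_max and return the K minimizing BIC."""
    bics = [GaussianMixture(n_components=k, random_state=random_state)
            .fit(X).bic(X)
            for k in range(1, k_max + 1)]
    return int(np.argmin(bics)) + 1, bics

# e.g. best_k, bics = select_k_by_bic(X.reshape(-1, 1))
```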
