MLGS 2021 Endterm Solution
Department of Informatics
Technical University of Munich
Note:
• During the attendance check a sticker containing a unique code will be put on this exam.
• This code contains a unique number that associates this exam with your registration number.
• This number is printed both next to the code and to the signature field in the attendance check list.
Graded Exercise: IN2323 / Endterm
Date: Friday 30th July, 2021
Examiner: Prof. Dr. Stephan Günnemann
Time: 11:30 – 12:45
Working instructions

• This graded exercise consists of 26 pages with a total of 19 problems. Please make sure now that you received a complete copy of the answer sheet.
• The total amount of achievable credits in this graded exercise is 140 credits.
• Allowed resources:
  – all materials that you will use on your own (lecture slides, calculator, etc.)
  – any form of collaboration between examinees, as well as plagiarism, is not allowed
• You have to sign the code of conduct. (Typing your name is fine.)
• You have to either print this document and scan your solutions or paste scans/pictures of your handwritten solutions into the solution boxes in this PDF. Editing the PDF digitally is prohibited except for signing the code of conduct and answering multiple choice questions.
• Make sure that the QR codes are visible on every uploaded page. Otherwise, we cannot grade your submission.
• You must solve the specified version of each problem. Different problems may have different versions, e.g. Problem 1 (Version A), Problem 5 (Version C), etc. If you solve the wrong version you get zero points.
• Only write on the provided sheets; submitting your own additional sheets is not possible.
• The last two pages can be used as scratch paper.
• All sheets (including scratch paper) have to be submitted to the upload queue. Missing pages will be considered empty.
• Only use black or blue color (no red or green)! Pencils are allowed.
• Write your answers only in the provided solution boxes or on the scratch paper.
• For problems that say "Justify your answer" you only get points if you provide a valid explanation.
• For problems that say "derive" you only get points if you provide a valid derivation.
• If a problem does not say "Justify your answer" or "derive", it is sufficient to only provide the correct answer.
• Instructor announcements and clarifications will be posted on Piazza with email notifications.
Problem 1 (Version C)

We compute the density using the change of variables formula

$$p_2(x^{(0)}) = p_1\left(f^{-1}(x^{(0)})\right) \left|\det \frac{\partial f^{-1}(x^{(0)})}{\partial x}\right|.$$

Therefore,

$$p_1(f^{-1}(x^{(0)})) = \frac{1}{4}.$$

Since $f$ is a linear transformation, the Jacobian determinant is

$$\left|\det \frac{\partial f^{-1}(x^{(0)})}{\partial x}\right| = \left|\det(A^{-1})\right| = 8.$$

Putting everything together, we get

$$p_2(x^{(0)}) = \frac{1}{4} \cdot 8 = 2.$$
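As a numerical sanity check of this computation, the sketch below applies the same change-of-variables formula to a hypothetical example (a standard 2D Gaussian base density $p_1$ and a made-up matrix $A$, not the exam's values) and compares it against the known closed-form density of the linearly transformed distribution.

```python
import numpy as np

# Hypothetical setup: p1 = standard 2D Gaussian, f(z) = A z with an invertible A.
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
A_inv = np.linalg.inv(A)

def p1(z):
    # Standard bivariate normal density
    return np.exp(-0.5 * z @ z) / (2 * np.pi)

def p2_change_of_vars(x):
    # p2(x) = p1(f^{-1}(x)) * |det(d f^{-1} / d x)| = p1(A^{-1} x) * |det(A^{-1})|
    return p1(A_inv @ x) * abs(np.linalg.det(A_inv))

def p2_closed_form(x):
    # Cross-check: if z ~ N(0, I) and x = A z, then x ~ N(0, A A^T).
    S = A @ A.T
    return np.exp(-0.5 * x @ np.linalg.solve(S, x)) / (2 * np.pi * np.sqrt(np.linalg.det(S)))

x0 = np.array([0.3, -1.2])
print(p2_change_of_vars(x0), p2_closed_form(x0))  # the two values agree
```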
Problem 2 (Version A)

Our goal is to find a transformation $T_\phi : [0, 1] \to \mathbb{R}$ such that

$$F_x(a) = \Pr(x \le a) = \Pr(T_\phi(u) \le a) = \Pr(u \le T_\phi^{-1}(a)) = F_u(T_\phi^{-1}(a)) = T_\phi^{-1}(a)$$

where $F_u$ is the CDF of the Uniform([0, 1]) distribution.
We can rewrite the above equation as

$$\frac{1}{1 + \exp(-\phi a)} = T_\phi^{-1}(a).$$

Substituting $T_\phi(a)$ for $a$ and solving for $T_\phi(a)$:

$$\frac{1}{1 + \exp(-\phi T_\phi(a))} = T_\phi^{-1}(T_\phi(a)) = a$$

$$\exp(-\phi T_\phi(a)) = \frac{1}{a} - 1$$

$$T_\phi(a) = -\frac{1}{\phi} \log\left(\frac{1}{a} - 1\right)$$
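The derived $T_\phi$ is exactly inverse transform sampling for the logistic CDF. Below is a minimal sketch; the value $\phi = 2$ is an arbitrary illustration, not taken from the exam.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 2.0  # arbitrary positive value for illustration

def T(u, phi):
    # T_phi(u) = -(1/phi) * log(1/u - 1), as derived above
    return -np.log(1.0 / u - 1.0) / phi

u = rng.uniform(size=200_000)  # u ~ Uniform([0, 1])
x = T(u, phi)

# The empirical CDF of the samples should match F_x(a) = 1 / (1 + exp(-phi * a)).
for a in (-1.0, 0.0, 0.5):
    print((x <= a).mean(), 1.0 / (1.0 + np.exp(-phi * a)))
```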
Problem 3 (Version B)

a) No, since $N \neq M$ means that the transformation $f$ is not invertible. A normalizing flow model can only be defined using an invertible transformation.
b) Yes, since there are no restrictions on the decoder in a VAE. We might need to apply some nonlinearity to ensure that the parameters are valid (e.g., nonnegative).
c) Yes, since there are no restrictions on the generator in a GAN.
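To make the contrast concrete, here is a hypothetical one-hidden-layer decoder mapping an $N$-dimensional latent code to $M$-dimensional distribution parameters with $N \neq M$; all sizes and the softplus output nonlinearity are illustrative assumptions, not part of the exam problem. Such a non-invertible map is fine as a VAE decoder or GAN generator, but could not serve as a normalizing flow transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, H = 2, 5, 16  # hypothetical latent, output, and hidden dimensions

W1, b1 = rng.normal(size=(H, N)), np.zeros(H)
W2, b2 = rng.normal(size=(M, H)), np.zeros(M)

def decoder(z):
    # Maps R^N -> R^M with N != M, hence not invertible.
    h = np.tanh(W1 @ z + b1)
    out = W2 @ h + b2
    return np.log1p(np.exp(out))  # softplus keeps the output parameters nonnegative

print(decoder(rng.normal(size=N)))  # 5 nonnegative parameters from a 2D latent code
```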
Problem 4 (Version B)

We introduce a vector $q \in \{0, 1\}^D$ of binary variables indicating which input features are perturbed by the adversary.
We can then introduce $2 \cdot D$ constraints to express that $q_d = 0 \implies \tilde{x}_d = x_d$:

$$\tilde{x}_d - x_d \le q_d \epsilon \qquad \forall d$$
$$\tilde{x}_d - x_d \ge -q_d \epsilon \qquad \forall d$$

Finally, we bound the number of perturbed features:

$$\sum_{d=0}^{D-1} q_d \le \eta$$
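These constraints are linear in $(\tilde{x}, q)$, so they can be passed directly to a mixed-integer solver. The sketch below merely checks them numerically on made-up data; $D$, $\epsilon$, $\eta$, and the perturbation are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, eps, eta = 4, 0.1, 2     # hypothetical dimension, radius, and budget
x = rng.normal(size=D)      # clean input
x_tilde = x.copy()
x_tilde[:2] += 0.05         # adversary perturbs the first two features

def is_feasible(x, x_tilde, q, eps, eta):
    # -q_d * eps <= x_tilde_d - x_d <= q_d * eps for all d, and sum_d q_d <= eta
    diff = x_tilde - x
    return bool(np.all(diff <= q * eps) and np.all(diff >= -q * eps) and q.sum() <= eta)

q = (x_tilde != x).astype(int)  # indicators of perturbed features
print(is_feasible(x, x_tilde, q, eps, eta))  # True: 2 features changed, each within eps
```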
Problem 5 (Version B)

a) The only root of its characteristic polynomial is 1, which is not strictly outside the unit circle. Therefore the process is not stationary.
b) By plugging in the original process we see that

$$X_t' = (X_{t-1} + \varepsilon_t) - X_{t-1} = \varepsilon_t.$$

As such, the sequence elements are just i.i.d. noise variables, which directly fulfill the definition of stationarity: their mean is 0 and therefore constant, and their covariance is 0 for any nonzero lag (and equal to the constant noise variance at lag 0), and thus independent of $t$.
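A short simulation illustrates part b): differencing a simulated random walk recovers the i.i.d. noise sequence (the noise distribution here is an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.normal(size=10_000)  # i.i.d. noise eps_t
X = np.cumsum(eps)             # random walk X_t = X_{t-1} + eps_t (not stationary)

X_diff = np.diff(X)            # X'_t = X_t - X_{t-1}
print(np.allclose(X_diff, eps[1:]))  # True: differencing recovers eps_t exactly
```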
Problem 6 (Version D)

We are looking for the most likely latent state at a single point in time, $\arg\max_{Z_2} \Pr(Z_2 \mid X_{1:3})$, which we can get from the forward-backward algorithm.

$$\Pr(Z_2 \mid X_{1:3}) \propto \Pr(Z_2, X_{1:3}) = \Pr(Z_2, X_{1:2}) \cdot \Pr(X_3 \mid Z_2) = \alpha_2 \odot \beta_2$$

We compute the forward and backward variables $\alpha_t$ and $\beta_t$ as follows.

$$\alpha_1 = B_{:,2} \odot \pi \propto \begin{bmatrix} 2 \\ 6 \\ 0 \end{bmatrix}, \qquad \alpha_2 = B_{:,2} \odot A^T \alpha_1 \propto B_{:,2} \odot \begin{bmatrix} 16 \\ 14 \\ 10 \end{bmatrix} \propto \begin{bmatrix} 16 \\ 42 \\ 0 \end{bmatrix}$$

$$\beta_3 \propto \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad \beta_2 = A \left( B_{:,3} \odot \beta_3 \right) \propto A \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} \propto \begin{bmatrix} 9 \\ 7 \\ 13 \end{bmatrix}$$

In the end, we get that

$$\Pr(Z_2 \mid X_{1:3}) \propto \alpha_2 \odot \beta_2 = \begin{bmatrix} 144 \\ 294 \\ 0 \end{bmatrix}$$

and therefore the most likely latent state $Z_2$ is 2.
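The same recursions are easy to run in code. The HMM parameters below are hypothetical (the exam's $A$, $B$, and $\pi$ are not reproduced in this solution); the sketch implements the $\alpha$/$\beta$ updates used above.

```python
import numpy as np

# Hypothetical HMM: A[i, j] = Pr(Z_{t+1} = j | Z_t = i), B[i, k] = Pr(X_t = k | Z_t = i)
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.6, 0.3],
              [0.3, 0.2, 0.5]])
pi = np.array([0.5, 0.3, 0.2])
obs = [1, 1, 2]  # observed symbols X_1, X_2, X_3 (0-indexed)

# Forward pass: alpha_t = B[:, x_t] * (A^T alpha_{t-1})
alpha = [pi * B[:, obs[0]]]
for x in obs[1:]:
    alpha.append(B[:, x] * (A.T @ alpha[-1]))

# Backward pass: beta_t = A (B[:, x_{t+1}] * beta_{t+1}), with beta_T = 1
beta = [np.ones(3)]
for x in reversed(obs[1:]):
    beta.insert(0, A @ (B[:, x] * beta[0]))

gamma2 = alpha[1] * beta[1]  # Pr(Z_2 | X_{1:3}) up to normalization
print(np.argmax(gamma2), gamma2 / gamma2.sum())
```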
Problem 7 (Version D)

There are two equivalent ways to answer this question.

• We know that the inter-event times $\tau_i$ in a homogeneous Poisson process with rate $\mu$ follow the Exponential distribution with rate $\mu$ (i.e., mean $1/\mu$). Therefore, no event occurs in $[0, T]$ if and only if the first inter-event time exceeds $T$, which happens with probability $\Pr(\tau_1 > T) = \exp(-\mu T)$.

• Equivalently, we know from the properties of the Poisson process that the number of events $N$ in $[0, T]$ follows $\text{Poisson}\left(\int_0^T \mu \, dt\right) = \text{Poisson}(\mu T)$. Hence, we can use the probability mass function of the Poisson distribution to compute

$$\Pr(N = 0) = \frac{(\mu T)^0 \exp(-\mu T)}{0!} = \exp(-\mu T).$$
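Both arguments are easy to confirm by Monte Carlo simulation; the values of $\mu$ and $T$ below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, T, n = 1.5, 2.0, 200_000  # hypothetical rate, horizon, and number of runs

# Second bullet: N ~ Poisson(mu * T), so Pr(N = 0) = exp(-mu * T).
counts = rng.poisson(mu * T, size=n)
# First bullet: equivalently, no event occurs iff the first inter-event time exceeds T.
tau1 = rng.exponential(scale=1.0 / mu, size=n)  # Exponential with rate mu (mean 1/mu)

print((counts == 0).mean(), (tau1 > T).mean(), np.exp(-mu * T))  # all approx. 0.0498
```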
Problem 8: Graphs - Clustering (Versions A–D)

We compute the likelihood

$$P(A \mid \eta, z) = \prod_{ij} \frac{(\eta_{z_i z_j})^{A_{ij}}}{A_{ij}!} e^{-\eta_{z_i z_j}}$$

and take its logarithm:

$$\log P(A \mid \eta, z) = -\sum_{ij} \log(A_{ij}!) + \sum_{ij} A_{ij} \log \eta_{z_i z_j} - \sum_{ij} \eta_{z_i z_j}.$$

We make use of the shorthand notation $N_p = \sum_{i=1}^{N} \mathbb{1}(z_i = p)$ and $M_{pq} = \sum_{i=1}^{N} \sum_{j=1}^{N} A_{ij} \mathbb{1}(z_i = p, z_j = q)$, and rewrite the log-likelihood:

$$\log P(A \mid \eta, z) = \text{const} + \sum_{p,q} M_{pq} \log \eta_{pq} - \sum_{p,q} N_p N_q \eta_{pq}.$$

We compute the derivative with respect to $\eta_{pq}$ and set it to 0:

$$\frac{\partial \log P(A \mid \eta, z)}{\partial \eta_{pq}} = \frac{M_{pq}}{\eta_{pq}} - N_p N_q = 0 \quad \Longrightarrow \quad \eta_{pq} = \frac{M_{pq}}{N_p N_q}.$$
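The resulting estimator is simple to compute in closed form. In the sketch below, the cluster assignment and the Poisson-distributed adjacency matrix are hypothetical placeholders, not the exam's data.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 2
z = np.array([0, 0, 0, 1, 1, 1])   # hypothetical cluster assignment
A = rng.poisson(1.0, size=(N, N))  # hypothetical adjacency counts

one_hot = np.eye(K)[z]             # (N, K) indicator matrix of z
M = one_hot.T @ A @ one_hot        # M[p, q] = sum of A_ij over z_i = p, z_j = q
n_cl = one_hot.sum(axis=0)         # cluster sizes N_p

eta_hat = M / np.outer(n_cl, n_cl) # MLE derived above: eta_pq = M_pq / (N_p N_q)
print(eta_hat)
```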
Problem 9: Graphs - Ranking (Version A)

a) The stationary distribution of the random walk associated with $G$ is the vector $\pi^{(\infty)} = [1, 0, 0, 0]^\top$: it satisfies $A \pi^{(\infty)} = \pi^{(\infty)}$ and is normalized to sum to 1.
b) It is impossible to get from state 1 to the other states, i.e., state 1 is a dead end without out-links.
c) Node 1 is a dead end. Therefore, there are three options to make the graph $G'$ irreducible: add edge $(1, 2)$, $(1, 3)$, or $(1, 4)$.

The three systems of PageRank equations are, respectively:

$$r_1 = \tfrac{r_2}{2} + \tfrac{r_3}{2}, \quad r_2 = r_4 + r_1, \quad r_3 = \tfrac{r_2}{2}, \quad r_4 = \tfrac{r_3}{2} \qquad \text{(add } (1, 2) \text{)}$$

$$r_1 = \tfrac{r_2}{2} + \tfrac{r_3}{2}, \quad r_2 = r_4, \quad r_3 = \tfrac{r_2}{2} + r_1, \quad r_4 = \tfrac{r_3}{2} \qquad \text{(add } (1, 3) \text{)}$$

$$r_1 = \tfrac{r_2}{2} + \tfrac{r_3}{2}, \quad r_2 = r_4, \quad r_3 = \tfrac{r_2}{2}, \quad r_4 = \tfrac{r_3}{2} + r_1 \qquad \text{(add } (1, 4) \text{)}$$

where we also enforce $r_1 + r_2 + r_3 + r_4 = 1$. Solving the systems leads, respectively, to:

$$r_2 = \tfrac{4r_1}{3},\; r_3 = \tfrac{2r_1}{3},\; r_4 = \tfrac{r_1}{3}; \qquad r_2 = \tfrac{2r_1}{3},\; r_3 = \tfrac{4r_1}{3},\; r_4 = \tfrac{2r_1}{3}; \qquad r_2 = \tfrac{4r_1}{3},\; r_3 = \tfrac{2r_1}{3},\; r_4 = \tfrac{4r_1}{3}.$$

Taking into account the normalization constraint, we obtain $r_1 = \tfrac{3}{10}, r_2 = \tfrac{4}{10}, r_3 = \tfrac{2}{10}, r_4 = \tfrac{1}{10}$; then $r_1 = \tfrac{3}{11}, r_2 = \tfrac{2}{11}, r_3 = \tfrac{4}{11}, r_4 = \tfrac{2}{11}$; and $r_1 = \tfrac{3}{13}, r_2 = \tfrac{4}{13}, r_3 = \tfrac{2}{13}, r_4 = \tfrac{4}{13}$.

The best edge to add is $(1, 2)$, as it maximizes the rank of node 1.
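As a sanity check, power iteration on the column-stochastic transition matrix of $G'$ with edge $(1, 2)$ added reproduces the first solution; the analogous check works for the other options and problem versions.

```python
import numpy as np

# Column j holds the out-link distribution of node j (nodes 1..4 of G' plus edge (1, 2)).
P = np.array([[0.0, 0.5, 0.5, 0.0],   # r1 = r2/2 + r3/2
              [1.0, 0.0, 0.0, 1.0],   # r2 = r1 + r4
              [0.0, 0.5, 0.0, 0.0],   # r3 = r2/2
              [0.0, 0.0, 0.5, 0.0]])  # r4 = r3/2

r = np.full(4, 0.25)   # start from the uniform distribution
for _ in range(200):   # power iteration converges since G' is irreducible and aperiodic
    r = P @ r
print(r)  # -> [0.3, 0.4, 0.2, 0.1], i.e. (3/10, 4/10, 2/10, 1/10) as above
```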
Problem 9: Graphs - Ranking (Version B)

a) The stationary distribution of the random walk associated with $G$ is the vector $\pi^{(\infty)} = [0, 1, 0, 0]^\top$: it satisfies $A \pi^{(\infty)} = \pi^{(\infty)}$ and is normalized to sum to 1.
b) It is impossible to get from state 2 to the other states, i.e., state 2 is a dead end without out-links.
c) Node 2 is a dead end. Therefore, there are three options to make the graph $G'$ irreducible: add edge $(2, 1)$, $(2, 3)$, or $(2, 4)$.

The three systems of PageRank equations are, respectively:

$$r_1 = r_4 + r_2, \quad r_2 = \tfrac{r_1}{2} + \tfrac{r_3}{2}, \quad r_3 = \tfrac{r_1}{2}, \quad r_4 = \tfrac{r_3}{2} \qquad \text{(add } (2, 1) \text{)}$$

$$r_1 = r_4, \quad r_2 = \tfrac{r_1}{2} + \tfrac{r_3}{2}, \quad r_3 = \tfrac{r_1}{2} + r_2, \quad r_4 = \tfrac{r_3}{2} \qquad \text{(add } (2, 3) \text{)}$$

$$r_1 = r_4, \quad r_2 = \tfrac{r_1}{2} + \tfrac{r_3}{2}, \quad r_3 = \tfrac{r_1}{2}, \quad r_4 = \tfrac{r_3}{2} + r_2 \qquad \text{(add } (2, 4) \text{)}$$

where we also enforce $r_1 + r_2 + r_3 + r_4 = 1$. Solving the systems leads, respectively, to:

$$r_1 = \tfrac{4r_2}{3},\; r_3 = \tfrac{2r_2}{3},\; r_4 = \tfrac{r_2}{3}; \qquad r_1 = \tfrac{2r_2}{3},\; r_3 = \tfrac{4r_2}{3},\; r_4 = \tfrac{2r_2}{3}; \qquad r_1 = \tfrac{4r_2}{3},\; r_3 = \tfrac{2r_2}{3},\; r_4 = \tfrac{4r_2}{3}.$$

Taking into account the normalization constraint, we obtain $r_1 = \tfrac{4}{10}, r_2 = \tfrac{3}{10}, r_3 = \tfrac{2}{10}, r_4 = \tfrac{1}{10}$; then $r_1 = \tfrac{2}{11}, r_2 = \tfrac{3}{11}, r_3 = \tfrac{4}{11}, r_4 = \tfrac{2}{11}$; and $r_1 = \tfrac{4}{13}, r_2 = \tfrac{3}{13}, r_3 = \tfrac{2}{13}, r_4 = \tfrac{4}{13}$.

The best edge to add is $(2, 1)$, as it maximizes the rank of node 1.
Problem 9: Graphs - Ranking (Version C)

a) The stationary distribution of the random walk associated with $G$ is the vector $\pi^{(\infty)} = [0, 0, 1, 0]^\top$: it satisfies $A \pi^{(\infty)} = \pi^{(\infty)}$ and is normalized to sum to 1.
b) It is impossible to get from state 3 to the other states, i.e., state 3 is a dead end without out-links.
c) Node 3 is a dead end. Therefore, there are three options to make the graph $G'$ irreducible: add edge $(3, 1)$, $(3, 2)$, or $(3, 4)$.

The three systems of PageRank equations are, respectively:

$$r_1 = \tfrac{r_2}{2} + r_3, \quad r_2 = r_4, \quad r_3 = \tfrac{r_1}{2} + \tfrac{r_2}{2}, \quad r_4 = \tfrac{r_1}{2} \qquad \text{(add } (3, 1) \text{)}$$

$$r_1 = \tfrac{r_2}{2}, \quad r_2 = r_4 + r_3, \quad r_3 = \tfrac{r_1}{2} + \tfrac{r_2}{2}, \quad r_4 = \tfrac{r_1}{2} \qquad \text{(add } (3, 2) \text{)}$$

$$r_1 = \tfrac{r_2}{2}, \quad r_2 = r_4, \quad r_3 = \tfrac{r_1}{2} + \tfrac{r_2}{2}, \quad r_4 = \tfrac{r_1}{2} + r_3 \qquad \text{(add } (3, 4) \text{)}$$

where we also enforce $r_1 + r_2 + r_3 + r_4 = 1$. Solving the systems leads, respectively, to:

$$r_1 = \tfrac{4r_3}{3},\; r_2 = \tfrac{2r_3}{3},\; r_4 = \tfrac{2r_3}{3}; \qquad r_1 = \tfrac{2r_3}{3},\; r_2 = \tfrac{4r_3}{3},\; r_4 = \tfrac{r_3}{3}; \qquad r_1 = \tfrac{2r_3}{3},\; r_2 = \tfrac{4r_3}{3},\; r_4 = \tfrac{4r_3}{3}.$$

Taking into account the normalization constraint, we obtain $r_1 = \tfrac{4}{11}, r_2 = \tfrac{2}{11}, r_3 = \tfrac{3}{11}, r_4 = \tfrac{2}{11}$; then $r_1 = \tfrac{2}{10}, r_2 = \tfrac{4}{10}, r_3 = \tfrac{3}{10}, r_4 = \tfrac{1}{10}$; and $r_1 = \tfrac{2}{13}, r_2 = \tfrac{4}{13}, r_3 = \tfrac{3}{13}, r_4 = \tfrac{4}{13}$.

The best edge to add is $(3, 1)$, as it maximizes the rank of node 1.
Problem 9: Graphs - Ranking (Version D)

a) The stationary distribution of the random walk associated with $G$ is the vector $\pi^{(\infty)} = [0, 0, 0, 1]^\top$: it satisfies $A \pi^{(\infty)} = \pi^{(\infty)}$ and is normalized to sum to 1.
b) It is impossible to get from state 4 to the other states, i.e., state 4 is a dead end without out-links.
c) Node 4 is a dead end. Therefore, there are three options to make the graph $G'$ irreducible: add edge $(4, 1)$, $(4, 2)$, or $(4, 3)$.

The three systems of PageRank equations are, respectively:

$$r_1 = \tfrac{r_3}{2} + r_4, \quad r_2 = r_1, \quad r_3 = \tfrac{r_2}{2}, \quad r_4 = \tfrac{r_2}{2} + \tfrac{r_3}{2} \qquad \text{(add } (4, 1) \text{)}$$

$$r_1 = \tfrac{r_3}{2}, \quad r_2 = r_1 + r_4, \quad r_3 = \tfrac{r_2}{2}, \quad r_4 = \tfrac{r_2}{2} + \tfrac{r_3}{2} \qquad \text{(add } (4, 2) \text{)}$$

$$r_1 = \tfrac{r_3}{2}, \quad r_2 = r_1, \quad r_3 = \tfrac{r_2}{2} + r_4, \quad r_4 = \tfrac{r_2}{2} + \tfrac{r_3}{2} \qquad \text{(add } (4, 3) \text{)}$$

where we also enforce $r_1 + r_2 + r_3 + r_4 = 1$. Solving the systems leads, respectively, to:

$$r_1 = \tfrac{4r_4}{3},\; r_2 = \tfrac{4r_4}{3},\; r_3 = \tfrac{2r_4}{3}; \qquad r_1 = \tfrac{r_4}{3},\; r_2 = \tfrac{4r_4}{3},\; r_3 = \tfrac{2r_4}{3}; \qquad r_1 = \tfrac{2r_4}{3},\; r_2 = \tfrac{2r_4}{3},\; r_3 = \tfrac{4r_4}{3}.$$

Taking into account the normalization constraint, we obtain $r_1 = \tfrac{4}{13}, r_2 = \tfrac{4}{13}, r_3 = \tfrac{2}{13}, r_4 = \tfrac{3}{13}$; then $r_1 = \tfrac{1}{10}, r_2 = \tfrac{4}{10}, r_3 = \tfrac{2}{10}, r_4 = \tfrac{3}{10}$; and $r_1 = \tfrac{2}{11}, r_2 = \tfrac{2}{11}, r_3 = \tfrac{4}{11}, r_4 = \tfrac{3}{11}$.

The best edge to add is $(4, 1)$, as it maximizes the rank of node 1.
Problem 10: Graphs - Semi-Supervised Learning (Versions A–D)

a) This problem is equivalent to the minimum cut problem separating the two clusters. Optimal label assignments are $y_U = [1, 1, 1]^\top$ and $y_U = [0, 0, 0]^\top$; both achieve a minimum cost of 2.

b) We first write the Laplacians $L$ and $L'$ of the graphs $G$ and $G'$ in block form:

$$L = \begin{bmatrix} L_{SS} & L_{SU} \\ L_{US} & L_{UU} \end{bmatrix} \in \mathbb{R}^{5 \times 5} \qquad \text{and} \qquad L' = \begin{bmatrix} L'_{SS} & L'_{SU} \\ L'_{US} & L'_{UU} \end{bmatrix} \in \mathbb{R}^{6 \times 6}.$$

Given this notation, we can write the closed-form solution for both graphs:

$$y_U^* = -L_{UU}^{-1} L_{US} \hat{y}_S \qquad \text{and} \qquad y_U'^* = -(L'_{UU})^{-1} L'_{US} \hat{y}_S'.$$

We note that $L'_{UU} = L_{UU}$, $L'_{US} = \begin{bmatrix} 0 & L_{US} \end{bmatrix}$ and $\hat{y}_S' = \begin{bmatrix} 1 \\ \hat{y}_S \end{bmatrix}$.

Finally, we obtain:

$$y_U'^* = -(L'_{UU})^{-1} L'_{US} \hat{y}_S' = -L_{UU}^{-1} (0 \cdot 1 + L_{US} \hat{y}_S) = y_U^*,$$

and the final solution is $y'^* = \begin{bmatrix} 1 \\ y^* \end{bmatrix}$.
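The closed form from part b) is a one-liner with a linear solve. The graph below is a hypothetical 5-node example (not the exam's $G$), with nodes 0 and 1 labeled and nodes 2 through 4 unlabeled.

```python
import numpy as np

W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)  # symmetric adjacency (hypothetical)
L = np.diag(W.sum(axis=1)) - W                # graph Laplacian

S, U = [0, 1], [2, 3, 4]                      # labeled and unlabeled index sets
y_S = np.array([1.0, 0.0])                    # observed labels

# Harmonic closed-form solution used above: y_U* = -L_UU^{-1} L_US y_S
L_UU = L[np.ix_(U, U)]
L_US = L[np.ix_(U, S)]
y_U = -np.linalg.solve(L_UU, L_US @ y_S)
print(y_U)  # soft labels in [0, 1] for the unlabeled nodes
```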
Additional space for solutions. Clearly mark the (sub)problem your answers are related to and strike out invalid solutions.