ITU Computer and Informatics Faculty
BLG 454E Learning From Data, Spring 2018
Homework #1
Due March 07, 2018 10pm
HOMEWORK #1 SOLUTIONS
Important Note: If you see ANY mistakes please inform me via kivrakh@itu.edu.tr
1. (10 pts.) In general, the probability that it rains on Saturday is 25%.
Weekend rain has the following relationships:
• If it rains on Saturday, the probability that it rains on Sunday is 50%.
• If it does not rain on Saturday, the probability that it rains on Sunday is 25%.
Given that it rained on Sunday, what is the probability that it rained on Saturday?
Correct answer: 40%
Let T and U be the events of rain on Saturday and Sunday, respectively, and denote the event of no rain on Saturday as T̄. The problem gives us:

P(T) = 25%, P(U|T) = 50%, P(U|T̄) = 25%.

Via Bayes' Theorem:

P(T|U) = P(T) · P(U|T) / P(U)
       = P(T) · P(U|T) / [P(T) · P(U|T) + P(T̄) · P(U|T̄)]
       = (25% · 50%) / (25% · 50% + 75% · 25%)
       = 40%.
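The arithmetic above can be re-checked numerically; the sketch below simply re-evaluates the Bayes formula (the variable names are my own):

```python
# Sanity check of the Bayes computation above.
p_sat = 0.25            # P(T): rain on Saturday
p_sun_given_sat = 0.50  # P(U | T)
p_sun_given_dry = 0.25  # P(U | not T)

# Total probability of rain on Sunday.
p_sun = p_sat * p_sun_given_sat + (1 - p_sat) * p_sun_given_dry

# Bayes' theorem: P(T | U).
p_sat_given_sun = p_sat * p_sun_given_sat / p_sun
print(p_sat_given_sun)  # 0.4
```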
2. (20 pts.) A bug stands on a random point of the lattice below. Each point is equally likely to be the
starting point.
Every minute, the bug selects an adjacent point at random and moves to it. Each adjacent point is
equally likely to be chosen. For example, if the bug is on point B, then each probability to move to the
points A, C, or G is 1/3.
What is the probability that the bug reaches point A in 2 moves or less? Each point is equally likely to
be the bug’s starting point. Also, assume starting at A will “reach” the point in 0 moves.
Correct answer: 22/63
Let A2 be the event that the bug reaches point A in two moves or less.
Let A, B, C, D, E, F, and G be the events that the bug starts on the corresponding point. The probability of each of these events is 1/7.
If the bug starts on A, then it doesn’t have to move:

P(A2 | A) = 1.
If the bug starts on B, then it can get to A directly, or through G:

P(A2 | B) = 1/3 + (1/3) × (1/6) = 7/18.
Due to the symmetry of the lattice, the probability to get to A is the same starting from point F :
P(A2 | F) = P(A2 | B) = 7/18.
If the bug starts on C, then it can get to A through B or through G :
P(A2 | C) = (1/3) × (1/3) + (1/3) × (1/6) = 1/6.
Due to the symmetry of the lattice, the probability to get to A is the same starting from point E :
P(A2 | E) = P(A2 | C) = 1/6.
If the bug starts at D, then it can only go through point G :
P(A2 | D) = (1/3) × (1/6) = 1/18.
If the bug starts at G, it can go to A directly, or it can go through B or F :
P(A2 | G) = 1/6 + 2 × (1/6) × (1/3) = 5/18.
P(A2) is the sum of all these probabilities multiplied by 1/7:

P(A2) = (1/7) × [1 + 2 × (7/18) + 2 × (1/6) + 1/18 + 5/18]
      = 22/63.
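The case analysis above can be verified exactly by enumerating all one- and two-step paths. The sketch below assumes the lattice is a hexagon with vertices A–F around a center point G, which is consistent with the adjacencies used in the solution; the function name is my own:

```python
from fractions import Fraction

# Assumed lattice: hexagon vertices A-F, each adjacent to its two
# neighbours and to the centre G; G is adjacent to all six vertices.
adj = {
    'A': ['B', 'F', 'G'], 'B': ['A', 'C', 'G'], 'C': ['B', 'D', 'G'],
    'D': ['C', 'E', 'G'], 'E': ['D', 'F', 'G'], 'F': ['E', 'A', 'G'],
    'G': ['A', 'B', 'C', 'D', 'E', 'F'],
}

def p_reach_a_within_2(start):
    """Exact probability of visiting A in at most two random moves."""
    if start == 'A':
        return Fraction(1)
    p = Fraction(0)
    for first in adj[start]:
        step = Fraction(1, len(adj[start]))
        if first == 'A':
            p += step  # reached A on the first move
        else:
            # probability that the second move lands on A
            p += step * Fraction(adj[first].count('A'), len(adj[first]))
    return p

# Average over the uniformly random starting point.
total = sum(p_reach_a_within_2(s) for s in adj) / 7
print(total)  # 22/63
```

Using exact `Fraction` arithmetic avoids any floating-point rounding, so the result matches the hand computation exactly.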
3. (40 pts.) The idea for the maximum likelihood estimate (MLE) is to find the value of the pa-
rameter(s) for which the data has the highest probability. You are going to do this with the densities.
Suppose the 1-dimensional data points x1, x2, ..., xn given in the ”data.txt” file are drawn from a normal (Gaussian)
N(µ, σ²) distribution, where µ and σ are unknown.
• (20 pts.) Formulate the likelihood function and derive the equation to find the maximum likelihood
estimate for the pair (µ, σ 2 ).
• (20 pts.) Implement (write the code for) MLE in Matlab or Python and provide a plot
similar to Figure 1. You are not allowed to use any built-in functions except histogram
functions, which give you a quick view of the distribution of the data.
(a) Gaussian probability density function:

P(x_i) = (1/√(2πσ²)) · exp(−(x_i − µ)²/(2σ²)), −∞ < x_i < ∞

X ∼ N(µ, σ²)
Given N data samples, the likelihood is
L(µ, σ) = P(x_1:N | µ, σ) = ∏_{i=1}^N P(x_i | µ, σ) = ∏_{i=1}^N (1/√(2πσ²)) · exp(−(x_i − µ)²/(2σ²))
and the log-likelihood is
log L(µ, σ) = ∑_{i=1}^N [ log(1/√(2πσ²)) − (x_i − µ)²/(2σ²) ]
In order to find the estimate µ̂, we need to take the derivative of log L(µ, σ) with respect to µ and equate
it to zero.
∂/∂µ log L(µ, σ) = ∂/∂µ ∑_{i=1}^N log(1/√(2πσ²)) + ∂/∂µ ∑_{i=1}^N −(x_i − µ)²/(2σ²) = 0

⇒ (1/σ²) ∑_{i=1}^N (x_i − µ) = 0

⇒ ∑_{i=1}^N x_i − ∑_{i=1}^N µ = 0
Therefore, the estimate µ̂ is the sample mean of the given data:

µ̂ = (1/N) ∑_{i=1}^N x_i
For the estimate σ̂², we need to take the derivative of log L(µ, σ) with respect to σ² and equate it to
zero. In order to avoid possible mistakes, θ will be used instead of σ² throughout the equations.
∂/∂θ log L(µ, θ) = ∂/∂θ ∑_{i=1}^N log(1/√(2πθ)) + ∂/∂θ ∑_{i=1}^N −(x_i − µ)²/(2θ) = 0

⇒ −N/(2θ) + (1/(2θ²)) ∑_{i=1}^N (x_i − µ)² = 0

⇒ −Nθ + ∑_{i=1}^N (x_i − µ)² = 0
Thus, the estimate of θ, or σ², is

θ̂ = σ̂² = (1/N) ∑_{i=1}^N (x_i − µ̂)²
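As a quick numerical check, the closed-form estimates are exactly the biased sample moments. The sketch below uses synthetic data rather than ”data.txt” (not reproduced here), so the seed and distribution parameters are my own choices:

```python
import numpy as np

# Synthetic stand-in for data.txt: any sample would do for this check.
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.5, size=10_000)

# Closed-form MLE: sample mean and biased sample variance.
mu_hat = np.sum(x) / len(x)
var_hat = np.sum((x - mu_hat) ** 2) / len(x)

# They agree with NumPy's built-in moments (np.var uses ddof=0, i.e. the
# biased divisor N, by default).
assert np.isclose(mu_hat, np.mean(x))
assert np.isclose(var_hat, np.var(x))
```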
(b)
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("data.txt")

# MLE estimates: sample mean and biased sample variance.
mu = np.sum(data) / len(data)
variance = np.sum((data - mu) ** 2) / len(data)

# Plot the histogram (density=True normalizes it to a probability density).
plt.hist(data, bins=25, density=True, alpha=0.6, color='g', label="data")

# Plot the fitted Gaussian PDF over the visible x-range.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 885)
p = np.exp(-(x - mu) ** 2 / (2 * variance)) / np.sqrt(2 * np.pi * variance)
plt.plot(x, p, 'k', linewidth=2, label="MLE fitted distribution")

plt.title("MLE results: mu = %.2f, std = %.2f" % (mu, np.sqrt(variance)))
plt.legend(bbox_to_anchor=(0.65, 0.8), loc=2, borderaxespad=0.)
plt.show()
[Plot: histogram of the data with the fitted Gaussian PDF overlaid; MLE results: mu = 10.06, std = 2.57]

Figure 1: Data and fitted Gaussian distribution with MLE
4. (30 pts.) In Table 1 below, x1, x2, x3 ∈ {0, 1} are binary features (x_i denotes the i-th feature)
and y ∈ {+, −} is the class label.
(a) (15 pts.) Construct the Naive Bayes classifier for the given training dataset in Table 1.
Hint: Estimate the class-conditional probabilities for each feature x1, x2, x3.
(b) (5 pts.) Predict the class label for (x1 = 1, x2 = 1, x3 = 1) data using trained Naive Bayes approach
in part (a)
(c) (10 pts.) Calculate the probabilities of P (x1 = 1), P (x2 = 1), and P (x1 = 1, x2 = 1). Decide
whether x1 and x2 are independent or not.
(a) P(x1 = 1|+) = 3/5 = 0.6, P(x2 = 1|+) = 2/5 = 0.4, P(x3 = 1|+) = 4/5 = 0.8,
P(x1 = 0|+) = 2/5 = 0.4, P(x2 = 0|+) = 3/5 = 0.6, P(x3 = 0|+) = 1/5 = 0.2,
P(x1 = 1|−) = 2/5 = 0.4, P(x2 = 1|−) = 2/5 = 0.4, P(x3 = 1|−) = 1/5 = 0.2,
P(x1 = 0|−) = 3/5 = 0.6, P(x2 = 0|−) = 3/5 = 0.6, P(x3 = 0|−) = 4/5 = 0.8.
(b) Let R : (x1 = 1, x2 = 1, x3 = 1) be the test instance. To determine its class, we need to compute
P(+|R) and P(−|R). Using Bayes’ theorem:

P(+|R) = P(R|+)P(+) / P(R)  and  P(−|R) = P(R|−)P(−) / P(R).

Since P(+) = P(−) = 5/10 = 0.5 and P(R) is the same for both classes, R can be classified by
comparing P(R|+) and P(R|−).
Naive Bayes assumes the features are conditionally independent given the class, so we can write:

P(R|+) = P(x1 = 1|+) × P(x2 = 1|+) × P(x3 = 1|+) = 0.6 × 0.4 × 0.8 = 0.192
P(R|−) = P(x1 = 1|−) × P(x2 = 1|−) × P(x3 = 1|−) = 0.4 × 0.4 × 0.2 = 0.032

Since P(R|+) is larger, the record is assigned to the (+) class.
(c) From Table 1, P(x1 = 1) = 5/10 = 0.5, P(x2 = 1) = 4/10 = 0.4, and P(x1 = 1, x2 = 1) = 2/10 = 0.2.
Since P(x1 = 1) × P(x2 = 1) = 0.5 × 0.4 = 0.2 = P(x1 = 1, x2 = 1), x1 and x2 are independent.
Table 1: Training set for question 4
Instance x1 x2 x3 y
1 0 0 1 -
2 1 0 1 +
3 0 1 0 -
4 1 0 0 -
5 1 0 1 +
6 0 0 1 +
7 1 1 0 -
8 0 0 0 -
9 0 1 0 +
10 1 1 1 +
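Parts (a)–(c) can be reproduced directly from Table 1. A minimal sketch (the helper names `cond_prob` and `nb_score` are my own):

```python
# Training set from Table 1: rows are (x1, x2, x3, y).
data = [
    (0, 0, 1, '-'), (1, 0, 1, '+'), (0, 1, 0, '-'), (1, 0, 0, '-'),
    (1, 0, 1, '+'), (0, 0, 1, '+'), (1, 1, 0, '-'), (0, 0, 0, '-'),
    (0, 1, 0, '+'), (1, 1, 1, '+'),
]

def cond_prob(i, value, label):
    """Class-conditional probability P(x_{i+1} = value | y = label)."""
    rows = [r for r in data if r[3] == label]
    return sum(r[i] == value for r in rows) / len(rows)

def nb_score(label, x):
    """Prior times the product of class-conditional feature probabilities."""
    p = sum(r[3] == label for r in data) / len(data)  # P(label)
    for i, v in enumerate(x):
        p *= cond_prob(i, v, label)
    return p

x = (1, 1, 1)
# nb_score('+', x) ~ 0.096, nb_score('-', x) ~ 0.016 -> predict '+'
print(nb_score('+', x), nb_score('-', x))

# Part (c): joint vs product of marginals for x1 = 1 and x2 = 1.
p_x1 = sum(r[0] == 1 for r in data) / len(data)                   # 0.5
p_x2 = sum(r[1] == 1 for r in data) / len(data)                   # 0.4
p_joint = sum(r[0] == 1 and r[1] == 1 for r in data) / len(data)  # 0.2
print(p_joint == p_x1 * p_x2)
```

The scores equal P(R|±)P(±), i.e. 0.5 × 0.192 and 0.5 × 0.032; dividing both by P(R) would give the posteriors, but the comparison is unchanged.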
End of homework.