Maximum Likelihood Estimation
Santosh K. Vipparthi
School of AI & DE, IIT Ropar
Jan - May
Definition
Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution by maximizing the likelihood function.
AIM
In statistics, Maximum Likelihood Estimation (MLE) is a method used to estimate the parameters of an assumed probability distribution based on observed data.
This is done by maximizing the likelihood function, so that, under the assumed statistical model, the observed data are most probable.
Assumptions
Let (Ω, P) be a probability space, and let X be a random variable. Suppose f_X is the probability density function (pdf) and F_X is the (probability) distribution function of X.
Example
Suppose we have a bag that contains 3 balls, and we pick one ball at a time. Each ball is either red or blue, but we have no information beyond this. Let X1 indicate whether the first chosen ball is blue.
Step 1: Probability Setup
A bag contains 3 balls; each ball is either red or blue.
We do not know the number of blue balls, so we define an unknown parameter:
θ = number of blue balls in the bag.
Possible values of θ: θ ∈ {0, 1, 2, 3}.
Define X1 as:
X1 = 1 if the first drawn ball is blue, and X1 = 0 if the first drawn ball is red.
Step 2: Finding Probabilities
The probability of selecting a blue ball depends on θ:
If θ = 3, all balls are blue: P(X1 = 1) = 1, P(X1 = 0) = 0.
If θ = 2, two blue and one red ball: P(X1 = 1) = 2/3, P(X1 = 0) = 1/3.
If θ = 1, one blue and two red balls: P(X1 = 1) = 1/3, P(X1 = 0) = 2/3.
If θ = 0, no blue balls: P(X1 = 1) = 0, P(X1 = 0) = 1.
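These case-by-case probabilities are easy to tabulate in code. Below is a minimal sketch (plain Python, using fractions for exact arithmetic; the helper name pmf_blue is ours, not from the slides) that returns P(X1 = 1|θ) for each admissible θ.

```python
from fractions import Fraction

def pmf_blue(theta: int) -> Fraction:
    """P(X1 = 1 | theta): probability the first draw is blue
    when the bag of 3 balls contains `theta` blue balls."""
    assert theta in {0, 1, 2, 3}
    return Fraction(theta, 3)

for theta in range(4):
    print(f"theta = {theta}: P(X1=1) = {pmf_blue(theta)}, "
          f"P(X1=0) = {1 - pmf_blue(theta)}")
```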
Step 3: Probability Mass Function (PMF)
The probability mass function (PMF) describes how X1 is distributed based on θ:
f(X1 = 1|θ) = 1 if θ = 3; 2/3 if θ = 2; 1/3 if θ = 1; 0 if θ = 0.
Equivalently, f(X1 = 1|θ) = θ/3.
Key point: the PMF depends on θ, which is unknown.
Step 4: Likelihood Function
Likelihood function definition:
L(θ) = f(X1|θ)
What does this mean?
Instead of asking: "What is the probability of getting X1 given θ?" (PMF)
We now ask: "Given that we observed X1, which θ is most likely?" (Likelihood)
PMF: f(X1|θ) → given θ, what is the probability of drawing a blue ball?
P(X1 = 1|θ) = f(X1 = 1|θ)
Likelihood: L(θ|X1) → given that we drew a blue ball, which θ is most likely?
L(θ) = f(X1|θ)
Main difference:
In the PMF, θ is fixed and X1 varies.
In the likelihood, X1 is fixed (we observed it) and θ varies.
The likelihood measures how well a model explains the data.
MLE chooses the parameters that maximize this likelihood.
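The "same formula, different variable" point is easiest to see numerically. The sketch below (our own illustration, not from the slides) fixes the observation X1 = 1 and scans θ, then picks the θ with the largest likelihood: a one-observation preview of MLE.

```python
from fractions import Fraction

def likelihood(theta: int, x1: int) -> Fraction:
    """L(theta | x1) = f(x1 | theta) for the 3-ball bag."""
    p_blue = Fraction(theta, 3)
    return p_blue if x1 == 1 else 1 - p_blue

x1_observed = 1  # we drew a blue ball
for theta in range(4):
    print(theta, likelihood(theta, x1_observed))  # 0, 1/3, 2/3, 1

best = max(range(4), key=lambda t: likelihood(t, x1_observed))
print("MLE:", best)  # MLE: 3 (with a single blue draw, theta = 3 is most likely)
```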
Let (Ω, P) be a probability space, and let X1, X2, ..., Xn be random variables. Suppose fk is the probability density function (pdf) of Xk, for k = 1, 2, ..., n.
Suppose θ is an unknown parameter and fk(Xk|θ) is the probability density function at the true value of θ.
Definition
If X1, X2, ..., Xn are independent, the function f defined by
f(x1, x2, ..., xn|θ) = f1(x1|θ) · f2(x2|θ) ··· fn(xn|θ) = ∏_{k=1}^{n} fk(xk|θ)
is known as the joint pdf of X = (X1, X2, ..., Xn).
Definition
The function θ ↦ L(θ) = L(θ|X) defined by
L(θ|X = (X1, X2, ..., Xn)) = f(x1, x2, ..., xn|θ)
is the likelihood function of X = (X1, X2, ..., Xn).
Example. Suppose we have a bag that contains 3 balls, and we pick one ball at a time (with replacement), 4 times. Each ball is either red or blue, but we have no further information. Let Xi indicate whether the i-th chosen ball is blue, for i = 1, 2, 3, 4, that is
Xi = 1 if the i-th chosen ball is blue, and Xi = 0 if it is red.
Assume that θ is the number of blue balls in the bag. Then
P(Xi = xi|θ) = θ/3 if xi = 1, and 1 − θ/3 if xi = 0.
Since the draws are independent, the joint pmf gives
P(X1 = x1, X2 = x2, X3 = x3, X4 = x4|θ) = ∏_{i=1}^{4} P(Xi = xi|θ).
In particular, for (x1, x2, x3, x4) = (1, 0, 1, 1) we get
L(θ) = (θ/3)³ (1 − θ/3), so L(0) = 0, L(1) = 2/81, L(2) = 8/81, L(3) = 0.
The likelihood is largest at θ = 2, so the maximum likelihood estimate is θ̂ = 2.
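A minimal sketch of this computation (plain Python, exact arithmetic via fractions; helper names are ours): evaluate L(θ) for the observed draws (1, 0, 1, 1) and take the argmax.

```python
from fractions import Fraction

def draw_prob(x: int, theta: int) -> Fraction:
    """P(Xi = x | theta) for one draw from the 3-ball bag."""
    p_blue = Fraction(theta, 3)
    return p_blue if x == 1 else 1 - p_blue

observed = (1, 0, 1, 1)

def likelihood(theta: int) -> Fraction:
    L = Fraction(1)
    for x in observed:  # independence: multiply per-draw probabilities
        L *= draw_prob(x, theta)
    return L

for theta in range(4):
    print(theta, likelihood(theta))   # 0 0, 1 2/81, 2 8/81, 3 0
print("MLE:", max(range(4), key=likelihood))  # MLE: 2
```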
Likelihood Function
Let (Ω, P) be a probability space, and let X = (X1, X2, ..., Xn) be independent random variables with probability density functions (pdfs) fk for Xk, where k = 1, 2, ..., n.
Given an unknown parameter θ, the conditional pdf of Xk is fk(Xk|θ).
The joint pdf of X is
f(X|θ) = ∏_{k=1}^{n} fk(Xk|θ),
which defines the likelihood function:
L(θ|X) = f(X|θ).
Maximum Likelihood Estimate (MLE)
The Maximum Likelihood Estimate (MLE), denoted by θ̂, is the value of θ that maximizes the likelihood function:
θ̂ = arg max_θ L(θ|X).
Finding the Maximum Likelihood Estimate
Let (Ω, P) be a probability space, and let X = (X1, X2, ..., Xn) be independent random variables. Suppose fk is the pdf of Xk, for k = 1, 2, ..., n.
Suppose θ is an unknown parameter, fk(Xk|θ) is the pdf at the true value of θ, and f(X|θ) is the joint pdf of X. Let L(θ|X) be the likelihood function of X = (X1, X2, ..., Xn).
Using calculus: test for finding the maximum value. Suppose the likelihood function θ ↦ L(θ|X) is smooth (twice differentiable).
Step 1:
Find all θ such that dL/dθ = 0 (these are the critical points of L).
Step 2:
Among the θ's obtained above, keep those for which d²L/dθ² < 0 (the local maxima).
Step 3:
If more than one θ remains, choose the one at which L(θ) is largest.
Using the log-likelihood function. Sometimes it is difficult to differentiate the likelihood function L directly. In that case we work with its logarithm, known as the log-likelihood function, denoted by ℓ(θ) = log L(θ). Since log is strictly increasing, ℓ and L are maximized at the same θ, so we apply the same procedure to ℓ(θ).
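This three-step recipe can be carried out symbolically. Below is a minimal sketch using sympy (our tool choice; the slides do not prescribe one), applied to the Bernoulli log-likelihood ℓ(θ) = s log θ + (n − s) log(1 − θ), where s = Σ xi is the number of successes. It finds the critical point (Step 1) and checks the second-derivative condition (Step 2).

```python
import sympy as sp

theta, n, s = sp.symbols("theta n s", positive=True)

# Log-likelihood of n iid Bernoulli(theta) trials with s successes
ell = s * sp.log(theta) + (n - s) * sp.log(1 - theta)

# Step 1: solve d(ell)/d(theta) = 0 for the critical points
critical = sp.solve(sp.Eq(sp.diff(ell, theta), 0), theta)
print(critical)  # [s/n]

# Step 2: second-derivative test at the critical point
d2 = sp.diff(ell, theta, 2).subs(theta, critical[0])
print(sp.simplify(d2))  # -n**3/(s*(n - s)): negative, so theta = s/n is a maximum
```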
MLE for more than one unknown parameter. Let (Ω, P) be a probability space, and let X = (X1, X2, ..., Xn) be independent random variables. Suppose fk is the pdf of Xk, for k = 1, 2, ..., n.
Suppose θ = (θ1, θ2, ..., θm), with m ≥ 2, is a vector of unknown parameters, fk(Xk|θ) is the pdf at the true value of θ, and f(X|θ) is the joint pdf of X. Let ℓ(θ|X) be the log-likelihood function of X = (X1, X2, ..., Xn).
Using calculus: test for finding the maximum value. Suppose the log-likelihood ℓ(θ|X) is smooth (twice differentiable).
Step 1:
Find all θ = (θ1, ..., θm) such that ∂ℓ/∂θi = 0 for every i (these are the critical points of ℓ).
Step 2:
At each critical point, pick one of the θi's, say θ1, and check that ∂²ℓ/∂θ1² < 0.
Step 3:
Substituting this θ1, check next that ∂²ℓ/∂θ2² < 0.
Step 4:
Similarly check all the remaining θi's. (Strictly, the full second-order condition is that the Hessian matrix of ℓ be negative definite at the critical point.)
Using this we get the required MLEs, θ̂i.
Let X have a Bernoulli(θ) distribution and let x ∈ {0, 1} be an observed trial of X.
Probability mass function (PMF): P(X = x|θ) = θ^x (1 − θ)^(1−x)
Log-likelihood: ℓ(θ) = x log θ + (1 − x) log(1 − θ)
MLE method: find θ̂ such that dℓ/dθ = 0.
dℓ/dθ = x/θ − (1 − x)/(1 − θ) = 0
⇒ [x(1 − θ) − θ(1 − x)] / [θ(1 − θ)] = 0 ⇒ x(1 − θ) − θ(1 − x) = 0
⇒ x − xθ − θ + xθ = 0 ⇒ x − θ = 0 ⇒ θ = x
MLE: θ̂ = x
MLE for Bernoulli Trials
Let X = (X1, X2, ..., Xn) be iid Bernoulli(θ) random variables and x = (x1, x2, ..., xn) the observed trials (each xi ∈ {0, 1}).
Probability mass function (PMF):
P(X = x|θ) = ∏_{i=1}^{n} θ^(xi) (1 − θ)^(1−xi) = θ^(Σ xi) (1 − θ)^(n − Σ xi)
Log-likelihood: ℓ(θ) = Σ_{i=1}^{n} [xi log θ + (1 − xi) log(1 − θ)]
MLE method: find θ̂ such that dℓ/dθ = 0
⇒ Σ_{i=1}^{n} [xi/θ − (1 − xi)/(1 − θ)] = 0
MLE: θ̂ = (1/n) Σ_{i=1}^{n} xi
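A quick numerical check of θ̂ = (1/n) Σ xi, as a sketch (numpy assumed available; the grid search is only for illustration): simulate Bernoulli data and compare the closed-form estimate with a brute-force maximization of ℓ(θ).

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.3
x = rng.binomial(1, theta_true, size=1000)   # 1000 Bernoulli(0.3) trials

# Closed-form MLE: the sample mean
theta_hat = x.mean()

# Brute-force check: maximize the log-likelihood on a grid
grid = np.linspace(0.001, 0.999, 999)
loglik = x.sum() * np.log(grid) + (len(x) - x.sum()) * np.log(1 - grid)
theta_grid = grid[np.argmax(loglik)]

print(theta_hat, theta_grid)   # both close to 0.3, and to each other
```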
MLE for a Binomial Trial
Let X ∼ Bin(n, θ) be a Binomial random variable and let x be an observed trial of X.
Probability mass function (PMF): P(X = x|θ) = C(n, x) θ^x (1 − θ)^(n−x).
Log-likelihood: ℓ(θ) = log C(n, x) + x log θ + (n − x) log(1 − θ)
(the term log C(n, x) does not depend on θ, so it does not affect the maximization).
MLE method: find θ̂ such that dℓ/dθ = 0.
dℓ/dθ = x/θ − (n − x)/(1 − θ) = 0
⇒ [x(1 − θ) − θ(n − x)] / [θ(1 − θ)] = 0 ⇒ x(1 − θ) − θ(n − x) = 0
⇒ x − xθ − nθ + xθ = 0 ⇒ x − nθ = 0 ⇒ θ = x/n
MLE: θ̂ = x/n
MLE for Binomial Trials
Let X = (X1, X2, ..., Xn) be iid Binomial random variables with Xi ∼ Bin(m, θ) for each i (writing m for the number of trials per observation, to avoid clashing with the sample size n), and let x = (x1, x2, ..., xn) be the observed values.
Probability mass function (PMF):
P(X = x|θ) = ∏_{i=1}^{n} C(m, xi) θ^(xi) (1 − θ)^(m−xi)
Log-likelihood: ℓ(θ) = Σ_{i=1}^{n} [log C(m, xi) + xi log θ + (m − xi) log(1 − θ)]
MLE method: find θ̂ such that dℓ/dθ = 0
⇒ Σ_{i=1}^{n} [xi/θ − (m − xi)/(1 − θ)] = 0
MLE: θ̂ = (1/(nm)) Σ_{i=1}^{n} xi
(total successes over total trials; with m = 1 this reduces to the Bernoulli result θ̂ = (1/n) Σ xi).
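As a sanity check (our own sketch, numpy assumed): simulate n binomial observations with m trials each and compare θ̂ = Σ xi / (nm) against a grid maximization of ℓ(θ).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, theta_true = 10, 500, 0.7
x = rng.binomial(m, theta_true, size=n)   # n observations of Bin(m, theta)

# Closed-form MLE: total successes over total trials
theta_hat = x.sum() / (n * m)

# Grid check of the log-likelihood (the log C(m, xi) term is constant in theta)
grid = np.linspace(0.001, 0.999, 999)
loglik = x.sum() * np.log(grid) + (n * m - x.sum()) * np.log(1 - grid)

print(theta_hat, grid[np.argmax(loglik)])  # both near 0.7
```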
MLE for a Normal Distribution
Let X = (X1, X2, ..., Xn) be iid random variables with Xi ∼ N(µ, σ²) for each i, and let x = (x1, x2, ..., xn) be the observed values.
Probability density function (PDF): f(x|µ, σ²) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
Log-likelihood:
ℓ(µ, σ²) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σ_{i=1}^{n} (xi − µ)²
MLE method: find µ̂ and σ̂² such that ∂ℓ/∂µ = 0 = ∂ℓ/∂σ², with the second-order conditions holding (e.g. ∂²ℓ/∂µ² < 0).
Setting ∂ℓ/∂µ = (1/σ²) Σ (xi − µ) = 0 and ∂ℓ/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ (xi − µ)² = 0 gives:
MLE:
µ̂ = (1/n) Σ_{i=1}^{n} xi
σ̂² = (1/n) Σ_{i=1}^{n} (xi − µ̂)²
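A closing sketch (numpy assumed; scipy.optimize is our tool choice, as the slides do not name one): the closed-form µ̂ and σ̂² match a direct numerical maximization of the normal log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # sample from N(mu=5, sigma^2=4)

# Closed-form MLEs
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()          # note: 1/n, not 1/(n-1)

# Numerical check: minimize the negative log-likelihood over (mu, sigma2)
def nll(params):
    mu, sigma2 = params
    n = len(x)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + ((x - mu) ** 2).sum() / (2 * sigma2)

res = minimize(nll, x0=[0.0, 1.0], bounds=[(None, None), (1e-9, None)])
print(mu_hat, sigma2_hat)   # approximately 5.0 and 4.0
print(res.x)                # numerically the same maximizer
```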