Statistical Modeling, Lecture 4
Melike Efe
July 21, 2025
Sabancı University
Table of contents
1. Point Estimation
Efficient Estimators
Consistency
Sufficient Estimator
Point Estimation
Efficient Estimators
▶ Unbiasedness is a desirable property for an estimator: the estimator’s
expected value equals the true value of the parameter.
▶ There are usually many unbiased estimators of a parameter θ.
▶ In that case we look in the class of unbiased estimators for an
estimator with a variance as small as possible.
▶ This process leads to the MVUE, the minimum variance unbiased
estimator.
Definition: For a random sample X1 , X2 , . . . , Xn from a given
distribution with parameter θ, the estimator Θ̂ = h(X1 , X2 , . . . , Xn ) is a
minimum variance unbiased estimator of θ if Θ̂ is unbiased, that is, if
E(Θ̂) = θ (for all possible θ), and Var(Θ̂) is less than or equal to the
variance of any unbiased estimator of θ.
▶ We may use the variance as a measure to compare two unbiased
estimators of the same parameter.
▶ Obviously the estimator with the smallest variance is preferred.
Definition: If Θ̂1 and Θ̂2 are two unbiased estimators for the parameter
θ of a given population, we say that Θ̂1 is more efficient than Θ̂2 if the
relative efficiency
e(Θ̂1, Θ̂2) = Var(Θ̂2) / Var(Θ̂1) > 1.
The above ratio e(Θ̂1 , Θ̂2 ) is called the efficiency of Θ̂1 relative to Θ̂2 .
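To make this concrete, here is a minimal Monte Carlo sketch (Python/NumPy) comparing two unbiased estimators of a normal mean, the sample mean and the sample median; the parameter values, sample size, seed, and replication count are arbitrary illustrative choices, and the printed ratio approximates the relative efficiency defined above.

```python
import numpy as np

# Monte Carlo sketch: relative efficiency of the sample mean versus the sample
# median as estimators of the mean of a N(mu, sigma^2) population.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 50, 20_000      # illustrative values only

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)                   # unbiased estimator 1
medians = np.median(samples, axis=1)           # unbiased estimator 2 (by symmetry)

# e(mean, median) = Var(median) / Var(mean); a value > 1 says the mean is the
# more efficient of the two.
print("Var(mean)   ~", means.var())
print("Var(median) ~", medians.var())
print("e(mean, median) ~", medians.var() / means.var())   # near pi/2 for normal data
```

For normal data the ratio settles near π/2 ≈ 1.57, so the sample mean is the more efficient of the two estimators.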
Theorem: Let X1 , X2 , . . . , Xn be a random sample drawn from a
population with the density f (x) and Θ̂ be an unbiased estimator of θ.
Under reasonably general regularity conditions, it follows that
Var(Θ̂) ≥ 1 / ( n E[ (∂ ln f(X)/∂θ)² ] )
The right-hand side in the above equation is called the Cramér-Rao lower
bound (CRLB).
Theorem: If Θ̂ is an unbiased estimator of θ and
Var(Θ̂) = 1 / ( n E[ (∂ ln f(X)/∂θ)² ] ),
then Θ̂ is a minimum variance unbiased estimator of θ.
Example: Show that X̄ is a minimum variance unbiased estimator of the
mean µ of a normal population.
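One possible solution sketch: writing f(x) for the N(µ, σ²) density, ln f(x) = −ln(σ√(2π)) − (x − µ)²/(2σ²), so ∂ ln f(x)/∂µ = (x − µ)/σ² and E[(∂ ln f(X)/∂µ)²] = E[(X − µ)²]/σ⁴ = 1/σ². The Cramér-Rao lower bound is therefore σ²/n, which equals Var(X̄); since E(X̄) = µ, the sample mean attains the bound and is a minimum variance unbiased estimator of µ.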
Example: Let X be a random variable taken from a Binomial population
with parameters n and θ. Use the Cramér-Rao inequality to show that
Θ̂ = X /n is a minimum variance unbiased estimator of θ.
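One possible solution sketch: here a single X ∼ Bin(n, θ) is observed, so the bound takes the form Var(Θ̂) ≥ 1/E[(∂ ln f(X)/∂θ)²]. From ln f(x) = x ln θ + (n − x) ln(1 − θ) + (a term not involving θ), we get ∂ ln f(x)/∂θ = (x − nθ)/(θ(1 − θ)), so E[(∂ ln f(X)/∂θ)²] = Var(X)/[θ(1 − θ)]² = n/(θ(1 − θ)). The lower bound is θ(1 − θ)/n, which is exactly Var(X/n); since E(X/n) = θ, the estimator Θ̂ = X/n attains the bound.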
Understanding Mean Square Error (MSE)
Why Variance is Not Always Enough:
If an estimator is biased, variance alone is not a sufficient measure of its
performance.
Definition (Mean Square Error (MSE)): If Θ̂ is an estimator of θ, its
mean square error is:
MSE(Θ̂) := E[(Θ̂ − θ)²]
▶ The MSE accounts for both variance and bias.
▶ If the estimator is unbiased, MSE reduces to its variance.
Decomposition of MSE: The mean square error can be decomposed as:
MSE(Θ̂) = Var(Θ̂) + [b(Θ̂)]²
where b(Θ̂) = E(Θ̂) − θ is the bias of Θ̂.
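The decomposition follows by writing Θ̂ − θ = (Θ̂ − E(Θ̂)) + (E(Θ̂) − θ) and expanding the square: the cross term has expectation zero because E[Θ̂ − E(Θ̂)] = 0, which leaves E[(Θ̂ − E(Θ̂))²] + (E(Θ̂) − θ)² = Var(Θ̂) + [b(Θ̂)]².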
MSE and Biased Estimators
Key Insight:
▶ If we are not restricted to unbiased estimators, the decomposition
formula suggests that a biased estimator might have a lower MSE than
an unbiased one.
▶ This means biased estimators can sometimes be preferable if MSE is
the primary criterion.
Example: Let X1 , X2 , . . . , Xn be a random sample from a normal
population with µ = 0 and unknown variance θ = σ 2 .
(a) Show that Θ̂1 = (1/n) Σ_{i=1}^n Xi² is an unbiased estimator of θ with
variance 2θ²/n, and that it is a minimum variance unbiased estimator of θ.
(b) Let Θ̂2 = (α/n) Σ_{i=1}^n Xi² and find the value of α for which the MSE of
Θ̂2 is minimal.
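One possible solution sketch: for (a), E(Xi²) = θ and Var(Xi²) = E(Xi⁴) − θ² = 3θ² − θ² = 2θ², so Θ̂1 is unbiased with Var(Θ̂1) = 2θ²/n, which matches the Cramér-Rao lower bound for this density. For (b), Θ̂2 = αΘ̂1 has Var(Θ̂2) = 2α²θ²/n and bias (α − 1)θ, so MSE(Θ̂2) = θ²[2α²/n + (α − 1)²]; setting the derivative with respect to α to zero gives α = n/(n + 2), and then MSE(Θ̂2) = 2θ²/(n + 2), which is smaller than MSE(Θ̂1) = 2θ²/n.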
Key Takeaway: Tradeoff Between Bias and Variance
Important Insight:
▶ An estimator with a small bias can have a significantly lower MSE
than an unbiased estimator.
▶ MSE accounts for both variance and bias: MSE = Variance + Bias².
▶ This example demonstrates that in some cases, introducing bias can
improve the overall accuracy of an estimator.
▶ In practice, minimizing MSE is often preferable to insisting on
unbiased estimators.
▶ This is the foundation for regularization techniques in machine
learning, where small bias is introduced to significantly reduce variance.
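The tradeoff can also be seen numerically. The following minimal Python/NumPy simulation sketch compares the empirical MSE of the unbiased estimator Θ̂1 from the previous example with the shrunken estimator Θ̂2 that uses α = n/(n + 2); the values of θ, n, the seed, and the replication count are arbitrary illustrative choices.

```python
import numpy as np

# Simulation sketch: empirical MSE of the unbiased estimator (1/n) * sum(Xi^2)
# versus the shrunken estimator with alpha = n / (n + 2), for N(0, theta) data.
rng = np.random.default_rng(1)
theta, n, reps = 4.0, 10, 50_000            # illustrative values (theta = sigma^2)

x = rng.normal(0.0, np.sqrt(theta), size=(reps, n))
theta1 = (x ** 2).mean(axis=1)              # unbiased estimator (alpha = 1)
theta2 = (n / (n + 2)) * theta1             # biased estimator   (alpha = n/(n+2))

mse = lambda est: np.mean((est - theta) ** 2)
print("MSE(unbiased) ~", mse(theta1))       # close to 2*theta^2 / n       = 3.2
print("MSE(shrunken) ~", mse(theta2))       # close to 2*theta^2 / (n + 2) ~ 2.67
```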
What is Consistency?
Key Idea: An estimator is consistent if it gets closer and closer to the
true parameter as the sample size increases.
Definition: An estimator Θ̂ is consistent for a parameter θ if:
lim_{n→∞} P(|Θ̂ − θ| < ε) = 1, for every ε > 0.
This means that as n grows, the probability that Θ̂ is close to θ
approaches 1.
Intuition:
- If we keep collecting more data, our estimate should get closer to the
true value.
- The probability of making a big mistake should go to zero.
- Unbiasedness is not required, but it helps.
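A minimal Python/NumPy simulation sketch of the definition (the normal population, ε = 0.1, seed, and sample sizes below are arbitrary illustrative choices): the empirical probability that the sample mean falls within ε of µ climbs toward 1 as n grows.

```python
import numpy as np

# Sketch of consistency: P(|X_bar - mu| < eps) approaches 1 as the sample size grows.
rng = np.random.default_rng(2)
mu, sigma, eps, reps = 0.0, 1.0, 0.1, 2_000    # illustrative values only

for n in (10, 100, 1_000, 5_000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) < eps))
```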
Example: Let X1 , X2 , . . . , Xn be a random sample from a continuous
population with the density:
f(x) = 1/θ for 0 < x < θ, and 0 elsewhere,
where θ > 0 is an unknown parameter. Determine whether
Yn = max{X1 , X2 , . . . , Xn } is a consistent estimator of θ.
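One possible solution sketch: P(Yn ≤ y) = (y/θ)ⁿ for 0 ≤ y ≤ θ, and Yn ≤ θ with probability 1, so for 0 < ε < θ, P(|Yn − θ| < ε) = P(Yn > θ − ε) = 1 − ((θ − ε)/θ)ⁿ → 1 as n → ∞. Hence Yn is a consistent estimator of θ, even though it is biased.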
Example: Suppose X1 , X2 , . . . , Xn is a random sample from a
distribution with the density function
f(x) = e^{−(x−θ)} for x > θ, and 0 elsewhere.
Determine whether Y1 = min{X1 , X2 , . . . , Xn } is a consistent estimator of
θ.
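One possible solution sketch: P(Y1 > y) = P(X1 > y, . . . , Xn > y) = e^{−n(y−θ)} for y ≥ θ, and Y1 ≥ θ with probability 1, so for ε > 0, P(|Y1 − θ| ≥ ε) = P(Y1 ≥ θ + ε) = e^{−nε} → 0 as n → ∞. Hence Y1 is a consistent estimator of θ.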
A Sufficient Condition for Consistency
▶ Checking consistency directly from the definition is not always easy.
In such cases the following sufficient condition often proves useful:
Theorem: If Θ̂n is an estimator of θ and MSE (Θ̂n ) → 0 as n → ∞,
then Θ̂n is a consistent estimator of θ.
Proof: The proof follows from the Markov inequality, which says that for
a non-negative random variable Y,
P(Y ≥ ℓ) ≤ E(Y)/ℓ for any constant ℓ > 0.
Pick an ε > 0 and apply this with Y = |Θ̂n − θ|² and ℓ = ε². Then
P(|Θ̂n − θ| > ε) = P(|Θ̂n − θ|² > ε²) ≤ E(|Θ̂n − θ|²)/ε² = MSE(Θ̂n)/ε²,
since E(|Θ̂n − θ|²) = MSE(Θ̂n) by the definition of MSE. As MSE(Θ̂n) → 0,
the right-hand side tends to 0, so P(|Θ̂n − θ| < ε) → 1 and Θ̂n is consistent.
Remark: Recall the decomposition
MSE (Θ̂n ) = Var(Θ̂n ) + [b(Θ̂n )]2 .
If Θ̂n is an unbiased or asymptotically unbiased estimator of θ, then
[b(Θ̂n )]2 → 0 as n → ∞.
Hence, if Var(Θ̂n ) → 0 as n → ∞, then MSE (Θ̂n ) → 0 as n → ∞.
Example: For a random sample from a normal distribution, verify that
the sample variance is a consistent estimator of σ 2 .
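One possible solution sketch: for a normal sample, (n − 1)S²/σ² has a chi-square distribution with n − 1 degrees of freedom, so E(S²) = σ² and Var(S²) = 2σ⁴/(n − 1) → 0 as n → ∞. Since S² is unbiased, MSE(S²) = Var(S²) → 0, and the theorem above gives consistency.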
Example: For a random sample from a normal distribution, the
alternative estimator:
Θ̂n = (1/n) Σ_{i=1}^n (Xi − X̄)²
is also a consistent estimator of σ².
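One possible solution sketch: Θ̂n = ((n − 1)/n) S², so its bias is E(Θ̂n) − σ² = −σ²/n → 0 and its variance is ((n − 1)/n)² · 2σ⁴/(n − 1) = 2(n − 1)σ⁴/n² → 0. Hence MSE(Θ̂n) → 0 and Θ̂n is consistent, even though it is biased for every finite n.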