Decision Theory
(Largely parametric) Estimation
Prof. Dr. Matei Demetrescu
1 / 36
Stronger assumptions may actually help...
We've seen that
■ some estimators don't require any assumptions except the ones we
make about the sampling scheme, yet
■ others manage to estimate parameters on the basis of minimal
additional assumptions.
Neither the plug-in estimator(s) nor the method-of-moments approach makes any optimality/efficiency claim.
This may, however, be achieved in a more parametric setup!
2 / 36
Today's outline
(Largely parametric) Estimation
1 The MLE
2 MLE (asymptotic) properties
3 Bayesian inference
4 Up next
3 / 36
The MLE
Moving on to...
1 The MLE
2 MLE (asymptotic) properties
3 Bayesian inference
4 Up next
4 / 36
The MLE
R. A. Fisher: Likelihood estimation
The idea is to pick as estimates those parameter values that have most plausibly generated the observed sample, as quantified by the likelihood
$L(\theta; x) := f_{\vec X}(x; \theta).$
Formally, for a given sample x, our maximum likelihood estimate is
$\hat\theta = \arg\max_{\theta \in \Theta} L(\theta; x).$
Given random sampling (and measurability), we end up with the ML estimator
$\hat\theta = \arg\max_{\theta \in \Theta} L(\theta; \vec X) = \arg\max_{\theta \in \Theta} \log L(\theta; \vec X).$
5 / 36
The MLE
Different samples, different outcomes
[Figure: the likelihood (left) and the log-likelihood (right) as functions of mu, each drawn for several different samples.]
6 / 36
The MLE
Implementation of the MLE
For smooth log-likelihoods with the maximum in the interior of Θ,
■ Solve the first-order conditions (f.o.c.),
$\nabla \log L(\hat\theta; \vec X) = \nabla \ell(\hat\theta; \vec X) = \frac{\partial \ell(\hat\theta; \vec X)}{\partial \theta} = 0,$
where $\ell := \log L$ denotes the log-likelihood;
■ More often than not this needs to be done numerically.
■ The 2nd-order condition is a negative definite Hessian $\frac{\partial^2 \ell(\hat\theta; \vec X)}{\partial \theta\, \partial \theta'}$.
Even without interior solutions or smoothness of ℓ, a value θ̂ that solves
$\arg\max_{\theta \in \Theta} \ell(\theta; \vec X)$
is an ML estimate, no matter how it is obtained.
7 / 36
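To make the numerical route concrete, here is a minimal sketch (not part of the original slides) that maximizes the exponential log-likelihood from the next slide with a generic optimizer; the simulated data and the use of scipy are assumptions for illustration only.

```python
# Minimal sketch: numerical maximization of an exponential log-likelihood.
# The data are simulated for illustration; the closed-form MLE is the sample mean.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=200)   # assumed sample, true theta = 2

def neg_loglik(theta):
    # ell(theta; x) = -n*log(theta) - sum(x)/theta, so we minimize its negative
    return len(x) * np.log(theta) + x.sum() / theta

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, x.mean())   # numerical maximizer vs. the closed-form MLE X-bar
```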
The MLE
A hopefully rare situation
Take e.g. Θ = [0, ∞).
With no stationary points, we need to check the behavior at the boundaries!
(Should in fact do that anyway.)
In this case, θ̂ = 0.
[Figure: a log-likelihood on Θ = [0, ∞) without interior stationary points, attaining its maximum at the boundary θ̂ = 0.]
8 / 36
The MLE
Two very different cases
Example (Smooth... )
Let $\vec X = (X_1, \dots, X_n)$ be an iid sample from an exponential population,
$f(x; \theta) = \frac{1}{\theta} \exp\!\left(-\frac{x}{\theta}\right), \quad x \in (0, \infty),\ \theta \in \Theta = (0, \infty).$
The MLE of θ is $\bar X_n$.
Example (... and not so smooth)
Let $\vec X = (X_1, \dots, X_n)$ be an iid sample from a uniform population,
$f(x; a, b) = \frac{1}{b-a}\, \mathbb{I}_{[a,b]}(x), \quad -\infty < a < b < \infty.$
The MLEs are $\hat a = X_{[1]}$ and $\hat b = X_{[n]}$.
9 / 36
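For the smooth case, the standard derivation (a worked step added here, following the f.o.c. recipe above) runs as follows:
$\ell(\theta; \vec x) = \sum_{i=1}^n \log f(x_i; \theta) = -n\log\theta - \frac{1}{\theta}\sum_{i=1}^n x_i, \qquad \frac{\partial \ell}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^n x_i = 0 \;\Longrightarrow\; \hat\theta = \frac{1}{n}\sum_{i=1}^n x_i = \bar X_n,$
and the second derivative at $\hat\theta$ equals $n/\hat\theta^2 - 2n\hat\theta/\hat\theta^3 = -n/\hat\theta^2 < 0$, so this is indeed a maximum. In the uniform case the likelihood $(b-a)^{-n}$ is not differentiable in (a, b); it increases as the interval [a, b] shrinks, subject to covering all observations, which is why the extreme order statistics are the MLEs.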
The MLE
Concentrating out parameters
Let $\theta = (\theta_1', \theta_2')'$; $\theta_1$ is of interest and $\theta_2$ is a nuisance parameter...
Example (Normal population)
Let $\theta = (\mu, \sigma^2)'$, with $\sigma^2$ the nuisance parameter. Then
$\ell(\mu, \sigma^2; x) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \mu)^2,$
leading to the f.o.c.
$\nabla \ell = \begin{pmatrix} \frac{1}{\sigma^2}\sum_{i=1}^n (X_i - \mu) \\ -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (X_i - \mu)^2 \end{pmatrix} = 0.$
We may solve the f.o.c. for µ independently of σ²!
10 / 36
The MLE
Profile likelihood
Example (continued)
In fact, one may compute $\sigma^2(\mu) = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2$ and plug it into ℓ to obtain the concentrated or profile likelihood
$\ell = -\frac{n}{2}\log 2\pi - \frac{n}{2} - \frac{n}{2}\log\!\left(\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2\right),$
which is still optimized at $\hat\mu = \bar X$.
■ Even if we can't directly solve the f.o.c. for θ₁ only,
■ we may still solve for $\theta_2 = f(\theta_1)$ and plug this expression into ℓ to obtain the profile log-likelihood.
11 / 36
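A minimal numerical sketch of the profile-likelihood idea (not from the slides; the data and the search grid are assumptions): concentrate σ² out and maximize over µ only.

```python
# Minimal sketch: profile log-likelihood for mu in a normal model,
# with sigma^2 concentrated out as sigma^2(mu) = mean((x - mu)^2).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)   # assumed sample
n = len(x)

def profile_loglik(mu):
    s2 = np.mean((x - mu) ** 2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * n - 0.5 * n * np.log(s2)

grid = np.linspace(x.mean() - 1.0, x.mean() + 1.0, 2001)
mu_hat = grid[np.argmax([profile_loglik(m) for m in grid])]
print(mu_hat, x.mean())   # the profile maximizer agrees (numerically) with X-bar
```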
The MLE
Conditional likelihood I
Conditioning on a (sufficient) statistic S for θ₂ may also eliminate the
nuisance parameter.
A simple and intuitive form of this conditioning is available for
regression models with stochastic regressors.
Let f(x, y; θ) be the joint density of X and Y (both scalar for simplicity).
■ The likelihood is $L(\theta) = \prod_{i=1}^n f(x_i, y_i; \theta)$
■ ... which can be a complicated expression (even in log form).
But say you're only interested in the dependence of Y on X ...
12 / 36
The MLE
Conditional likelihood II
Express the joint pdf as
$f(x, y; \theta) = f_{Y|X}(y, x; \theta_1) \cdot f_X(x; \theta_2),$
leading to a log-likelihood
$\ell = \sum_{i=1}^n \log f_{Y|X}(Y_i, X_i; \theta_1) + \sum_{i=1}^n \log f_X(X_i; \theta_2),$
where maximizations w.r.t. θ1 and θ2 are entirely independent!
If the marginal behavior of X is not of interest, just specify $f_{Y|X}$!
(Or just the regression function, if not interested in more.)
13 / 36
The MLE
ML as MM
Let $X_i \overset{iid}{\sim} f(x; \theta)$ s.t. $\ln L(\theta; \vec X) = \sum_{i=1}^n \ln f(X_i; \theta)$.
Regularity conditions assumed, we have (in the continuous case)
$\mathrm{E}\!\left(\frac{\partial \ln f(X_i; \theta)}{\partial \theta}\right) = 0.$
Hence,
$\mathrm{E}_{\theta_0}\!\left(g(X_i, \theta_0)\right) = \mathrm{E}_{\theta_0}\!\left(\frac{\partial \ln f(X_i; \theta_0)}{\partial \theta}\right) = 0,$
which are moment conditions with sample counterparts
$\frac{1}{n}\sum_{i=1}^n g(X_i, \theta) = \frac{1}{n}\sum_{i=1}^n \frac{\partial \ln f(X_i; \theta)}{\partial \theta} = \frac{1}{n}\,\frac{\partial \ln L(\theta; \vec X)}{\partial \theta} = 0.$
14 / 36
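The zero-mean property of the score follows from differentiating the identity $\int f(x; \theta)\,dx = 1$ under the regularity conditions that allow interchanging differentiation and integration (a worked step added for clarity):
$0 = \frac{\partial}{\partial \theta}\int f(x; \theta)\,dx = \int \frac{\partial f(x; \theta)}{\partial \theta}\,dx = \int \frac{\partial \ln f(x; \theta)}{\partial \theta}\, f(x; \theta)\,dx = \mathrm{E}_{\theta}\!\left(\frac{\partial \ln f(X_i; \theta)}{\partial \theta}\right).$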
The MLE
Quasi likelihood
Often, we do not really know what the population distribution looks like.
■ But you still have an iid sample $X_1, \dots, X_n$, and
■ you have assumed something, say that X ∼ g (x, θ).
(Perhaps even knowing it's not the right density.)
This allows you to set up a likelihood,
$L(\theta; X_1, \dots, X_n) = f_{\vec X}(X_1, \dots, X_n; \theta) = \prod_{i=1}^n g(X_i, \theta).$
And you can easily compute the resulting ML estimator.
But if actually $X_i \sim f(x)$, how does θ̂ behave?
15 / 36
The MLE
Some simple linear regression
Example
Assume $Y|X = x \sim N(\alpha + \beta x, \sigma^2)$. The conditional likelihood is then all you have,
$L(\alpha, \beta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{Y_i - \alpha - \beta X_i}{\sigma}\right)^2},$
leading to the log-likelihood
$\ell(\alpha, \beta) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - \alpha - \beta X_i)^2.$
16 / 36
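Since the first two terms do not depend on (α, β) and σ² > 0 only scales the last term, maximizing ℓ over (α, β) reduces to minimizing the residual sum of squares; spelled out as a worked step (added here, anticipating the next slide):
$\arg\max_{\alpha, \beta}\ \ell(\alpha, \beta) = \arg\min_{\alpha, \beta}\ \sum_{i=1}^n (Y_i - \alpha - \beta X_i)^2,$
i.e. the Gaussian (conditional) MLE of (α, β) coincides with the least-squares estimator, whatever the value of σ².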
The MLE
Quasi-ML
Maximizing the Gaussian (conditional) likelihood is equivalent to
minimizing the residual sum of squares. But Least Squares is nice!
So Gaussian Quasi-ML also behaves nicely (in semiparametric ways):
■ The important part is that the parameters of interest are still identified!
■ Identification: the expected gradient of the quasi-log-likelihood still has the
true parameters as zeros, which amounts to
■ ... correctly specifying the conditional mean for the regression case.
From this perspective, Quasi-ML can be interpreted as a
method-of-moments estimator too.
17 / 36
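A minimal simulation sketch of this robustness claim (not from the slides; the data-generating process, the t-distributed errors, and all names are assumptions): the Gaussian quasi-ML estimates of (α, β) coincide with OLS and still recover the conditional-mean parameters.

```python
# Minimal sketch: Gaussian quasi-ML for (alpha, beta) equals OLS, even with
# non-Gaussian errors, as long as the conditional mean is correctly specified.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.standard_t(df=5, size=n)   # assumed non-Gaussian errors

def neg_quasi_loglik(params, sigma2=1.0):
    alpha, beta = params
    resid = y - alpha - beta * x
    # Gaussian log-likelihood with sigma^2 fixed; only the RSS term matters for (alpha, beta)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * np.sum(resid**2) / sigma2

qml = minimize(neg_quasi_loglik, x0=np.zeros(2)).x
ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]
print(qml, ols)   # both should be close to the true values (0.5, 1.5)
```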
MLE (asymptotic) properties
Moving on to...
1 The MLE
2 MLE (asymptotic) properties
3 Bayesian inference
4 Up next
18 / 36
MLE (asymptotic) properties
Sometimes we're (not) lucky
In many cases, MLEs are explicit functions of the sample, $\hat\theta = \theta(\vec X)$, and we may analyze them directly:
■ either derive the exact sampling distribution, or
■ provide some asymptotic approximation (using WLLN, CLT, delta
method).
Otherwise, asymptotics are the only chance:
■ Argue first that the ML estimator is consistent, and
■ (for smooth likelihoods) linearize the f.o.c. to establish normality.
We don't deal with this here (but see perhaps the additional material on Moodle).
19 / 36
MLE (asymptotic) properties
Need to talk about identification
Note that
■ ... identification is a blurred concept in the nonparametric case.
■ This changes in the (fully specified) parametric case!
Definition (model-based, global identification)
A parametric model is identified if and only if, for any sample $\vec X$,
$f_{\vec X}(\vec X \,|\, \theta_1) = f_{\vec X}(\vec X \,|\, \theta_2)$ implies $\theta_1 = \theta_2$.
If identification is given, then the MLE must be unique.
20 / 36
MLE (asymptotic) properties
The asymptotic take
If consistency is given, then
Proposition
Let θ = θ0 . Regularity conditions assumed, it then holds
$\sqrt{n}\,(\hat\theta - \theta_0) \overset{d}{\to} N\!\left(0,\ \frac{\mathrm{E}\!\left[\left(\partial \ln f(X_i; \theta_0)/\partial \theta\right)^2\right]}{\left[\mathrm{E}\!\left(\partial^2 \ln f(X_i; \theta_0)/\partial \theta^2\right)\right]^2}\right).$
21 / 36
MLE (asymptotic) properties
The information equality
Under additional regularity conditions,
$\mathrm{E}\!\left[\left(\frac{\partial \ln f(X_i; \theta)}{\partial \theta}\right)^2\right] = -\mathrm{E}\!\left(\frac{\partial^2 \ln f(X_i; \theta)}{\partial \theta^2}\right) = -I(\theta),$
so that
$\sqrt{n}\,(\hat\theta - \theta_0) \overset{d}{\to} N\!\left(0,\ -I^{-1}(\theta_0)\right) \equiv N\!\left(0,\ \left[-\mathrm{E}\!\left(\frac{\partial^2 \ln f(X_i; \theta)}{\partial \theta^2}\right)\right]^{-1}\right).$
In such cases the asymptotic variance of the MLE attains the so-called Cramér-Rao lower bound: the MLE is asymptotically efficient.
22 / 36
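As a concrete check (continuing the exponential example from earlier, added here): with $\ln f(x; \theta) = -\ln\theta - x/\theta$,
$\frac{\partial^2 \ln f(X_i; \theta)}{\partial \theta^2} = \frac{1}{\theta^2} - \frac{2X_i}{\theta^3}, \qquad -\mathrm{E}\!\left(\frac{\partial^2 \ln f(X_i; \theta_0)}{\partial \theta^2}\right) = -\frac{1}{\theta_0^2} + \frac{2\theta_0}{\theta_0^3} = \frac{1}{\theta_0^2},$
so that $\sqrt{n}\,(\hat\theta - \theta_0) = \sqrt{n}\,(\bar X_n - \theta_0) \overset{d}{\to} N(0, \theta_0^2)$, in line with the CLT, since $\mathrm{Var}(X_i) = \theta_0^2$ for the exponential distribution.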
MLE (asymptotic) properties
Estimating standard errors
Consistent estimates for the variance of the asymptotic distribution of
MLEs can be obtained by
$\widehat{\mathrm{E}}\!\left(\frac{\partial^2 \ln f(X_i; \theta_0)}{\partial \theta^2}\right) = \frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \ln f(X_i; \hat\theta)}{\partial \theta^2},$
$\widehat{\mathrm{E}}\!\left[\left(\frac{\partial \ln f(X_i; \theta_0)}{\partial \theta}\right)^2\right] = \frac{1}{n}\sum_{i=1}^n \left(\frac{\partial \ln f(X_i; \hat\theta)}{\partial \theta}\right)^2.$
This delivers the so-called asymptotic standard errors of MLEs.
23 / 36
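A minimal sketch of both estimators for the exponential MLE (not from the slides; the data are simulated for illustration), where the analytic asymptotic standard error is $\hat\theta/\sqrt{n}$:

```python
# Minimal sketch: Hessian-based and outer-product standard errors for the
# exponential MLE theta_hat = mean(x); both should be close to theta_hat/sqrt(n).
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=1000)            # assumed sample
n, theta_hat = len(x), x.mean()

score = -1.0 / theta_hat + x / theta_hat**2          # d ln f / d theta at theta_hat
hess = 1.0 / theta_hat**2 - 2.0 * x / theta_hat**3   # d^2 ln f / d theta^2 at theta_hat

se_hessian = np.sqrt(1.0 / (-hess.mean() * n))       # based on -E[Hessian]
se_opg = np.sqrt(1.0 / (np.mean(score**2) * n))      # based on E[score^2]
print(se_hessian, se_opg, theta_hat / np.sqrt(n))
```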
Bayesian inference
Moving on to...
1 The MLE
2 MLE (asymptotic) properties
3 Bayesian inference
4 Up next
24 / 36
Bayesian inference
A change of perspective
So far, we assumed that
■ our sample $\vec X$ is random with some joint distribution/pdf $f_{\vec X}(x; \theta)$, where θ is some fixed, though unknown, parameter.
■ The data are used to obtain an estimate for θ, and
■ the sampling distribution of the estimator describes the uncertainty,
■ which was analyzed in a (fictional?) repeated sampling framework.
In Bayesian statistics, θ itself is regarded as a random variable.
A random θ captures the uncertainty about the parameter,
... and allows for (different) inference from data!
25 / 36
Bayesian inference
Why would we want random parameters?
If θ is (regarded as) random, then $\vec X$ and θ have a joint distribution.
Moreover, conditional distributions have nice interpretations:
■ $f_{\vec X | \theta}$ is the distribution of the sample given specific parameter values (i.e. the data generating process)
■ $f_{\theta | \vec X}$ is the distribution of the parameters given the sample.
The so-called posterior distribution $f_{\theta | \vec X}$ then conveys all the information available about θ.
The goal of Bayesian statistics is to provide $f_{\theta | \vec X}$.
26 / 36
Bayesian inference
Setup
Assume that $\vec X = (X_1, \dots, X_n)$ is an iid sample with joint pdf $f_{\vec X}(x|\theta)$, defining the sample likelihood.
We have $f_{\vec X}(x|\theta) = L(\theta; x)$, but would like $f_{\theta|\vec X}$!
To get there, we need another ingredient:
■ Any information we have about θ before observing the data $\vec X$ is represented by the so-called prior density, denoted by f(θ).
■ This is a density which we need to specify in such a way that it reflects our prior knowledge or beliefs (or both).
27 / 36
Bayesian inference
The Bayes theorem
If we have a prior, then we may resort to Bayes' theorem.
The posterior density is then
$f(\theta|x) = \frac{f(\theta, x)}{f(x)} = \frac{f_{\vec X}(x|\theta)\, f(\theta)}{f(x)} = \frac{f_{\vec X}(x|\theta)\, f(\theta)}{\int f_{\vec X}(x|\theta)\, f(\theta)\, d\theta}.$
The posterior combines
■ the prior knowledge about θ summarized by the prior f (θ) with
■ the information about θ contained in the data $\vec X$, in the likelihood $L(\theta; \vec X)$, given as $f_{\vec X}(x|\theta)$ evaluated at $\vec X$.
28 / 36
Bayesian inference
Almost like learning
The posterior
■ represents our revised knowledge/beliefs about the distribution of θ after seeing the data $\vec X$
■ is obtained as a mixture of the prior information and current
information (data!)
■ can serve as the prior when the next body of data becomes available.
Essentially, we continuously update our knowledge about θ.
One other advantage over classical statistics is that
■ once you agree on prior and model,
■ you always get the same answer from the data.
29 / 36
Bayesian inference
Focus on the essential
In this setting, f (x) is referred to as the marginal data density, which
does not involve the parameter of interest. With ∝ meaning `is
proportional to',
$\underbrace{f(\theta|x)}_{\text{posterior}} \;\propto\; \underbrace{f_{\vec X}(x|\theta)}_{\text{likelihood}} \cdot \underbrace{f(\theta)}_{\text{prior}}.$
Note that the product $L(\theta; \vec X)\, f(\theta)$ does not define a proper density. It represents a so-called density kernel for the posterior density of θ.
■ Sometimes the kernel suffices (e.g. when finding the posterior mode).
■ At other times we need to compute the normalizing constant f(x), or find some other workaround.
30 / 36
Bayesian inference
The normal-normal example
Example
Let $\vec X = (X_1, \dots, X_n)$ be an iid sample from a $N(\mu, \sigma^2)$ population.
■ The variance σ 2 is assumed to be known.
■ Say the prior information can be represented by a µ ∼ N (m, v 2 )
prior distribution, where the prior parameters (m, v 2 ) are also known.
The posterior $f(\mu|\vec X)$ is then a $N(\mu_*, \sigma_*^2)$ density with
$\sigma_*^2 = \frac{\sigma^2 v^2}{n v^2 + \sigma^2}, \qquad \mu_* = \frac{n \bar X v^2 + m \sigma^2}{n v^2 + \sigma^2}, \qquad \text{where } \bar X = \tfrac{1}{n}\sum_i X_i.$
31 / 36
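A compressed derivation via the posterior kernel (a worked step added here): collecting the terms in µ in the exponent,
$f(\mu | \vec x) \;\propto\; \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right) \exp\!\left(-\frac{(\mu - m)^2}{2v^2}\right) \;\propto\; \exp\!\left(-\frac{1}{2}\left[\left(\frac{n}{\sigma^2} + \frac{1}{v^2}\right)\mu^2 - 2\left(\frac{n\bar x}{\sigma^2} + \frac{m}{v^2}\right)\mu\right]\right),$
which is the kernel of a normal density with precision $\frac{n}{\sigma^2} + \frac{1}{v^2} = \frac{n v^2 + \sigma^2}{\sigma^2 v^2}$ and mean $\left(\frac{n\bar x}{\sigma^2} + \frac{m}{v^2}\right)\Big/\left(\frac{n}{\sigma^2} + \frac{1}{v^2}\right) = \frac{n \bar x v^2 + m \sigma^2}{n v^2 + \sigma^2}$, matching $\sigma_*^2$ and $\mu_*$ above.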
Bayesian inference
The normal-normal example cont'd
[Figure: four panels of prior vs. posterior densities for µ: "Typical prior vs. posterior", "Likelihood dominance", "Flat likelihood", and "Data-prior conflict".]
32 / 36
Bayesian inference
Modelling choices
One problem with the Bayes approach is its dependence on a prior. (The
likelihood is also a matter of choice, actually, but we ignore that for now.)
■ While there are cases where a particular prior is well-justified,
■ the prior may seem entirely arbitrary in others.
Good Bayesians make choices transparent, and assess sensitivity of the
results w.r.t. the choices they make.
33 / 36
Bayesian inference
In practice
Applied researchers may consider
■ Priors chosen for computational convenience (e.g. conjugate priors);
■ So-called non-informative priors (e.g. uniform distributions);
■ Empirical Bayes (estimate the prior from data);
■ Hierarchical priors; or just
■ Other people's priors.
(For more interesting stuff, take a class dedicated to Bayesian Statistics.)
34 / 36
Up next
Moving on to...
1 The MLE
2 MLE (asymptotic) properties
3 Bayesian inference
4 Up next
35 / 36
Up next
Coming up
Review of testing
36 / 36