Advanced Econometrics I
Jürgen Meinecke
Lecture 3 of 12
Research School of Economics, Australian National University
Roadmap
  Ordinary Least Squares Estimation
     Basic Asymptotic Theory (part 2 of 2)
     Asymptotic Distribution of the OLS Estimator
     Asymptotic Variance Estimation
Let there be a probability space (Ω, ℱ , 𝑃)
  • Ω is the outcome space
  • ℱ collects events from Ω
  • 𝑃 is a probability measure on ℱ
Example (Only Looks Like Rolling a Die)
  • Ω = {1, 2, 3, 4, 5, 6}
  • ℱ = {{1, 3, 5} , {2, 4, 6} , Ω, ∅}
  • Consider all 𝐴 ∈ ℱ:
        𝑃(𝐴) = 0    if 𝐴 = ∅
        𝑃(𝐴) = 1/2  if 𝐴 = {1, 3, 5}
        𝑃(𝐴) = 1/2  if 𝐴 = {2, 4, 6}
        𝑃(𝐴) = 1    if 𝐴 = Ω
Notice that 𝑃({2}) is not specified
Definition (Random Variable—first attempt)
A random variable on (Ω, ℱ ) is a function 𝑍 ∶ Ω → R.
Example
    𝑋(𝜔) = 18 if 𝜔 even,  24 if 𝜔 odd
Induced probability Pr(𝑋 = 18) ∶= 𝑃({2, 4, 6}) = 1/2
Instead of writing Pr(𝑋 = 18) I will use 𝑃(𝑋 = 18)
Example
    𝑌(𝜔) = 2 if 𝜔 = 6,  7 if 𝜔 = 1
Induced probability Pr(𝑌 = 2) ∶= 𝑃({6}) = ?
The event {6} is not assigned a probability
Of course we have a reasonable suspicion that 𝑃({6}) should equal
1/6, but strictly speaking it was not defined two slides earlier
So we have to treat 𝑃({6}) as unknown
To make sure that our random variables are not ill-defined like this,
we need to rule out such situations
Here’s a more robust definition
Definition (Random Variable—second and final attempt)
A random variable on (Ω, ℱ ) is a function 𝑍 ∶ Ω → R such that
    {𝜔 ∈ Ω ∶ 𝑍(𝜔) ∈ 𝐵} ∈ ℱ       for all 𝐵 ∈ ℬ(R).
ℬ(R) is the 𝜎-algebra generated by the closed intervals [𝑎, 𝑏], for
𝑎, 𝑏 ∈ R
Intuition: ℬ(R) describes all events that can be created out of all
the points on the real line
ℬ(R) is a rich set containing pretty much every subset of R that we
will ever deal with (including intervals and points)
I don’t need you to understand all intricacies here
Bottom line is:
Each 𝐵 ∈ ℬ(R) gets pulled back to an element of ℱ for which
probabilities are well-defined
Using this more robust definition, 𝑌 is not a random variable
To see this, pick subsets from ℬ(R):
  • for 𝐵 = {2}: {𝜔 ∈ Ω ∶ 𝑌(𝜔) = 2} = {6} ∉ ℱ
  • the same problem arises for 𝐵 = {7}
The problem here is that 𝑌 is not ℱ -measurable
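On this finite outcome space we can verify measurability by brute force. Below is a minimal Python sketch of the two examples (the slides leave 𝑌 unspecified outside {1, 6}, so the 7-elsewhere choice is just a placeholder):

```python
# Brute-force measurability check on the finite example space
Omega = {1, 2, 3, 4, 5, 6}
F = [set(), {1, 3, 5}, {2, 4, 6}, Omega]   # the sigma-algebra from the example

def is_measurable(fn, values):
    # every preimage {w in Omega : fn(w) = b} must be an event in F
    return all({w for w in Omega if fn(w) == b} in F for b in values)

X = lambda w: 18 if w % 2 == 0 else 24
Y = lambda w: 2 if w == 6 else 7           # placeholder value outside {1, 6}

print(is_measurable(X, {18, 24}))          # True:  preimages are {2, 4, 6} and {1, 3, 5}
print(is_measurable(Y, {2, 7}))            # False: the preimage of 2 is {6}, not in F
```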
Definition (Distribution or Law)
Given a random variable 𝑍 on a probability space (Ω, ℱ , 𝑃), the
distribution or law of the random variable is the probability
measure defined by
    𝜇(𝐵) ∶= 𝑃(𝑍 ∈ 𝐵),       𝐵 ∈ ℬ(R).
We say that 𝜇 is the distribution of 𝑍, or ℒ(𝑍) is the law of 𝑍.
Definition (Distribution Function)
The distribution function of a random variable 𝑍 is defined by
   𝐹(𝑧) ∶= 𝜇((−∞, 𝑧]) = 𝑃(𝑍 ≤ 𝑧),      𝑧 ∈ R.
𝐹 is also referred to as cumulative distribution function or cdf.
There is a one-to-one correspondence between distributions and cdfs
So we use them interchangeably
Definition (Weak Convergence)
Let 𝐹 be a distribution function, and {𝐹𝑁 } be a sequence of
distribution functions. Then 𝐹𝑁 converges weakly to 𝐹 if
lim𝑁→∞ 𝐹𝑁 (𝑧) = 𝐹(𝑧) for each 𝑧 at which 𝐹 is continuous.
We write 𝐹𝑁 →ʷ 𝐹.
Equivalently we could say 𝜇𝑁 →ʷ 𝜇 for weak convergence
Definition (Convergence in Distribution)
Let 𝑍 be a random variable, and {𝑍𝑁 } be a sequence of random
variables. Then 𝑍𝑁 converges in distribution or law to 𝑍 if 𝐹𝑁 →ʷ 𝐹.
We write 𝑍𝑁 →ᵈ 𝑍.
Now we turn to a few practical results that will help us soon when we
derive the asymptotic distribution of 𝛽̂OLS
Theorem (Continuous Mapping Theorem)
If 𝑍𝑁 →ᵈ 𝑍 then 𝑔(𝑍𝑁) →ᵈ 𝑔(𝑍) for continuous 𝑔.
Corollary
If 𝑍𝑁 →ᵈ 𝑁(0, Ω) then
    𝐴𝑍𝑁 →ᵈ 𝑁(0, 𝐴Ω𝐴′)
    (𝐴 + o𝑝(1))𝑍𝑁 →ᵈ 𝑁(0, 𝐴Ω𝐴′),
and since 𝑍 ∼ 𝑁(0, Ω) ⇒ 𝑍′Ω⁻¹𝑍 ∼ 𝜒²(dim(𝑍)),
    𝑍′𝑁 Ω⁻¹ 𝑍𝑁 →ᵈ 𝜒²(dim(𝑍𝑁))
    𝑍′𝑁 (Ω + o𝑝(1))⁻¹ 𝑍𝑁 →ᵈ 𝜒²(dim(𝑍𝑁)).
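A quick simulation sketch of the last implication (my own toy setup, assuming iid exponential components so that 𝜇𝑍 = (1, 1)′ and Ω = 𝐼₂):

```python
# The quadratic form Z_N' Omega^{-1} Z_N is approximately chi-squared(dim(Z_N))
import numpy as np

rng = np.random.default_rng(0)
N, reps = 500, 5_000
stats = np.empty(reps)
for r in range(reps):
    Z = rng.exponential(scale=1.0, size=(N, 2))   # iid vectors, mean (1, 1)', variance I_2
    ZN = np.sqrt(N) * (Z.mean(axis=0) - 1.0)      # sqrt(N)(Zbar_N - mu_Z), approx N(0, I_2)
    stats[r] = ZN @ ZN                            # Omega = I_2, so this is Z_N' Omega^{-1} Z_N
print(stats.mean(), stats.var())                  # chi-squared(2) has mean 2 and variance 4
```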
Another important result for the sample average 𝑍̄𝑁 ∶= ∑ᵢ₌₁ᴺ 𝑍𝑖/𝑁.
Theorem (Central Limit Theorem (CLT))
Let 𝑍1, 𝑍2, … be a sequence of independent and identically
distributed random vectors with E‖𝑍𝑖‖² < ∞. Then
    √𝑁 (𝑍̄𝑁 − 𝜇𝑍) →ᵈ N(0, E((𝑍𝑖 − 𝜇𝑍)(𝑍𝑖 − 𝜇𝑍)′)),
where 𝜇𝑍 ∶= E𝑍𝑖.
Notice:
  • ‖𝑧‖ ∶= √(𝑧′𝑧) is the Euclidean norm here
  • E‖𝑍𝑖‖² < ∞ is an economical way of saying that all components
    of 𝑍𝑖 have finite means, variances, and covariances
The CLT is a remarkable result
From the WLLN we know that (𝑍̄𝑁 − 𝜇𝑍) →ᵖ 0
At the same time √𝑁 → ∞
Yet their product converges to a normal distribution!
The restrictions imposed by the CLT don’t seem very strong
For example, it does not matter what distribution the 𝑍𝑖 come from
(as long as E‖𝑍𝑖‖² < ∞)
The centered sample average, scaled by √𝑁, converges to a normal
distribution
Conventional terminology with regard to the result
    √𝑁 (𝑍̄𝑁 − 𝜇𝑍) →ᵈ N(0, Ω)
where Ω ∶= E((𝑍𝑖 − 𝜇𝑍)(𝑍𝑖 − 𝜇𝑍)′)
  • 𝑍̄𝑁 is asymptotically normally distributed
  • The large sample distribution of 𝑍̄𝑁 is normal
  • Ω is the asymptotic variance of √𝑁 (𝑍̄𝑁 − 𝜇𝑍)
  • Ω/𝑁 is the asymptotic variance of 𝑍̄𝑁
Primitive usage
  • when the sample size 𝑁 is large yet finite
  • the sample average 𝑍̄𝑁 almost has a normal distribution
  • around the population mean 𝜇𝑍
  • with variance Ω/𝑁
  • irrespective of the underlying distribution of the 𝑍1 , 𝑍2 , …
Practical meaning of CLT: for large sample sizes
    𝑍̄𝑁 ∼ᵃᵖᵖʳᵒˣ 𝑁(𝜇𝑍, Ω/𝑁)
Let’s sketch the proof for a scalar version of the CLT, where E𝑍𝑖 = 𝜇𝑍
and Var 𝑍𝑖 = 𝜎²𝑍
We know from undergrad that E𝑍̄𝑁 = 𝜇𝑍 and Var 𝑍̄𝑁 = 𝜎²𝑍/𝑁,
therefore the CLT says that
    √𝑁 (𝑍̄𝑁 − 𝜇𝑍) →ᵈ N(0, 𝜎²𝑍)
or, equivalently,
    √𝑁 (𝑍̄𝑁 − 𝜇𝑍)/𝜎𝑍 →ᵈ N(0, 1)
To prove this, we need a new concept
Definition (Moment Generating Function)
Let 𝑍 be a random variable. The moment generating function (mgf)
of 𝑍 is given by 𝑀𝑍(𝑡) = E(exp(𝑡𝑍)), where 𝑡 ∈ R.
Fun facts about the mgf
  • The curvature of the mgf at zero describes all moments:
    (𝑑ᵏ𝑀𝑍/𝑑𝑡ᵏ)(0) = E𝑍ᵏ
    the 𝑘th derivative evaluated at zero is equal to the 𝑘th moment
    (hence that name; see the sketch after this list)
  • not every random variable has a well-defined mgf
    (there exists a generalization, called the characteristic function,
    that overcomes this problem; the mgf is slightly less general
    but easier to work with)
  • for random variables whose mgf exists:
    two random variables have identical distributions if and only if
    their mgfs are the same
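As a small sanity check of the first fun fact, we can differentiate the mgf of a standard normal (see the lemma below) symbolically:

```python
# k-th derivative of the N(0,1) mgf at zero recovers the k-th moment
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t**2 / 2)                       # mgf of the standard normal, see lemma below
for k in range(1, 7):
    print(k, sp.diff(M, t, k).subs(t, 0))  # prints 0, 1, 0, 3, 0, 15
```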
The mgf can be a useful device for establishing limiting distributions
Lemma (Curtiss’ Continuity Theorem)
Let 𝑀𝑍 (𝑡) be the mgf of 𝑍 and let 𝑀𝑍𝑁 (𝑡) be the mgf of 𝑍𝑁 .
If lim𝑁→∞ 𝑀𝑍𝑁(𝑡) = 𝑀𝑍(𝑡) for every 𝑡, then 𝑍𝑁 →ᵈ 𝑍.
This is based on Lévy’s Continuity Theorem (1937)
We’re interested in showing √𝑁(𝑍̄𝑁 − 𝜇𝑍)/𝜎𝑍 →ᵈ N(0, 1)
Let’s consider the mgf of 𝑍̃𝑁 ∶= √𝑁(𝑍̄𝑁 − 𝜇𝑍)/𝜎𝑍
and show that its limit is equal to the mgf of a 𝑁(0, 1)
Wait! What is the mgf of the standard normal distribution?
Lemma
The mgf of the standard normal distribution is 𝑡 ↦ exp(𝑡²/2).
(Proof: see assignment)
Notice 𝑍̃𝑁 ∶= √𝑁(𝑍̄𝑁 − 𝜇𝑍)/𝜎𝑍 = (∑ᵢ₌₁ᴺ 𝑍𝑖 − 𝑁𝜇𝑍)/(𝜎𝑍√𝑁)

    𝑀𝑍̃𝑁(𝑡) = E(exp(𝑡𝑍̃𝑁)) = E(exp(𝑡 ∑ᵢ₌₁ᴺ (𝑍𝑖 − 𝜇𝑍)/(𝜎𝑍√𝑁)))
            = E(exp(𝑡(𝑍1 − 𝜇𝑍)/(𝜎𝑍√𝑁)) ⋅ exp(𝑡(𝑍2 − 𝜇𝑍)/(𝜎𝑍√𝑁)) ⋯ exp(𝑡(𝑍𝑁 − 𝜇𝑍)/(𝜎𝑍√𝑁)))
            = E(exp(𝑡(𝑍1 − 𝜇𝑍)/(𝜎𝑍√𝑁))) ⋯ E(exp(𝑡(𝑍𝑁 − 𝜇𝑍)/(𝜎𝑍√𝑁)))    (independence)
            = (E(exp(𝑡(𝑍1 − 𝜇𝑍)/(𝜎𝑍√𝑁))))ᴺ                              (identical distribution)
            = 𝑚𝑍1(𝑡/(𝜎𝑍√𝑁))ᴺ,
where we define 𝑚𝑍1(𝑡) ∶= E(exp(𝑡(𝑍1 − 𝜇𝑍)))
Copy and paste last line: 𝑚𝑍1(𝑡) ∶= E(exp(𝑡(𝑍1 − 𝜇𝑍)))
Notice that
  • 𝑚𝑍1(0) = 1
  • 𝑚′𝑍1(0) = E(𝑍1 − 𝜇𝑍) = 0
  • 𝑚″𝑍1(0) = E(𝑍1 − 𝜇𝑍)² = 𝜎²𝑍
Applying a second order Taylor approximation (at zero):
    𝑚𝑍1(𝑡) ≈ 𝑚𝑍1(0) + 𝑚′𝑍1(0) ⋅ 𝑡 + (1/2) 𝑚″𝑍1(0) ⋅ 𝑡² = 1 + (1/2) 𝜎²𝑍 ⋅ 𝑡²
and therefore,
    𝑚𝑍1(𝑡/(𝜎𝑍√𝑁)) ≈ 1 + 𝜎²𝑍 ⋅ 𝑡²/(2𝜎²𝑍𝑁) = 1 + (𝑡²/2)/𝑁
Connecting the dots:
    𝑀𝑍̃𝑁(𝑡) = 𝑚𝑍1(𝑡/(𝜎𝑍√𝑁))ᴺ ≈ (1 + (𝑡²/2)/𝑁)ᴺ
And finally, to evaluate the limit use this result:
Lemma
lim𝑁→∞ (1 + 𝑐/𝑁)ᴺ = 𝑒ᶜ.
It follows that
    lim𝑁→∞ 𝑀𝑍̃𝑁(𝑡) = lim𝑁→∞ (1 + (𝑡²/2)/𝑁)ᴺ = exp(𝑡²/2),
which is the mgf of a standard normal distribution
It follows that 𝑍̃𝑁 →ᵈ 𝑁(0, 1)
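A quick numerical check of this limit for a fixed 𝑡:

```python
# (1 + (t^2/2)/N)^N approaches exp(t^2/2) as N grows
import numpy as np

t = 1.5
for N in [10, 100, 1_000, 10_000]:
    print(N, (1 + (t**2 / 2) / N) ** N)    # approaches exp(1.125)
print(np.exp(t**2 / 2))                    # about 3.0802
```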
Illustration of CLT
The underlying distribution of 𝑍1 , … , 𝑍𝑁 is exponential
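A minimal simulation sketch in the spirit of this illustration (assuming rate-1 exponential draws, so 𝜇𝑍 = 𝜎𝑍 = 1):

```python
# Histograms of standardized sample means versus the N(0,1) density
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
reps = 10_000
for N in (5, 50, 500):
    Z = rng.exponential(scale=1.0, size=(reps, N))   # mu_Z = 1, sigma_Z = 1
    Ztilde = np.sqrt(N) * (Z.mean(axis=1) - 1.0)     # sqrt(N)(Zbar_N - mu_Z)/sigma_Z
    plt.hist(Ztilde, bins=60, density=True, alpha=0.4, label=f"N = {N}")

x = np.linspace(-4, 4, 200)
plt.plot(x, np.exp(-x**2 / 2) / np.sqrt(2 * np.pi), "k", label="N(0,1) density")
plt.legend()
plt.show()
```

Even for the heavily skewed exponential distribution, the histograms move toward the standard normal density as 𝑁 grows.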
Roadmap
  Ordinary Least Squares Estimation
     Basic Asymptotic Theory (part 2 of 2)
     Asymptotic Distribution of the OLS Estimator
     Asymptotic Variance Estimation
We know that 𝛽̂OLS ∈ 𝐿²
We would like to know the exact distribution of 𝛽̂OLS for finite
samples (so-called small sample distribution)
Remember
    𝛽̂OLS = 𝛽∗ + (∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹ ∑ᵢ₌₁ᴺ 𝑋𝑖𝑢𝑖
    𝛽∗ = E(𝑋𝑖𝑋𝑖′)⁻¹ E(𝑋𝑖𝑌𝑖)
We suspect that 𝛽̂OLS|𝑋𝑖 ∼ N(⋅, ⋅) if 𝑢𝑖 ∼ N(⋅, ⋅)
In the absence of such a restrictive assumption, we are unable to
determine the exact distribution of 𝛽̂OLS
We approximate the exact distribution by the asymptotic distribution
Our hope is that the asymptotic (aka large sample) distribution is a
good approximation
The CLT will be our main tool in deriving the asymptotic distribution
of 𝛽̂OLS
Big picture: we already know that 𝛽̂OLS − 𝛽∗ = o𝑝(1)
From what I said earlier, we may suspect that √𝑁(𝛽̂OLS − 𝛽∗) could
converge to a normal distribution
To derive this result, let’s recall the following representation of the
OLS estimator from last week:
    𝛽̂OLS = 𝛽∗ + ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹ ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑢𝑖)
Let’s re-arrange terms …
Copy and paste, for convenience:
    𝛽̂OLS = 𝛽∗ + ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹ ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑢𝑖)
Then isolating √𝑁(𝛽̂OLS − 𝛽∗):
    √𝑁 (𝛽̂OLS − 𝛽∗) = ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹ (√𝑁 ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑢𝑖))
Can you see how the CLT can now be applied to the second factor on
the rhs?
Let’s break the rhs up again into its bits and pieces
We’ve already shown last week that, given E(𝑋𝑖𝑋𝑖′) < ∞,
    ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹ = E(𝑋𝑖𝑋𝑖′)⁻¹ + o𝑝(1) = O𝑝(1)
For the second factor on the rhs, we know that E((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑢𝑖) = 0,
so applying the CLT is easy:
    √𝑁 ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑢𝑖) →ᵈ N(0, E(𝑢²𝑖 𝑋𝑖𝑋𝑖′))
Using our tools from basic asymptotic theory (part 2):
Proposition (Asymptotic Distribution of OLS Estimator)
    √𝑁 (𝛽̂OLS − 𝛽∗) = ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹ ((1/√𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑢𝑖) →ᵈ N(0, Ω),
where Ω ∶= E(𝑋𝑖𝑋𝑖′)⁻¹ E(𝑢²𝑖 𝑋𝑖𝑋𝑖′) E(𝑋𝑖𝑋𝑖′)⁻¹.
Ω is the asymptotic variance of √𝑁 (𝛽̂OLS − 𝛽∗)
Ω/𝑁 is the asymptotic variance of 𝛽̂OLS
We take this to mean that 𝛽̂OLS has an approximate normal
distribution with mean 𝛽∗ and variance Ω/𝑁
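A simulation sketch of the proposition (my own toy design: a constant plus one uniform regressor, heteroskedastic errors); the empirical variance of √𝑁(𝛽̂OLS − 𝛽∗) across replications should match the sandwich Ω:

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 1_000, 2_000
beta_star = np.array([1.0, 2.0])

def draw(n):
    x = rng.uniform(0.0, 2.0, n)
    X = np.column_stack([np.ones(n), x])     # constant plus one regressor
    u = (0.5 + x) * rng.standard_normal(n)   # heteroskedastic errors with E(u|X) = 0
    return X, u

draws = np.empty((reps, 2))
for r in range(reps):
    X, u = draw(N)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ (X @ beta_star + u))  # OLS
    draws[r] = np.sqrt(N) * (beta_hat - beta_star)

X, u = draw(1_000_000)                       # approximate population moments
A_inv = np.linalg.inv(X.T @ X / len(u))      # E(X_i X_i')^{-1}
B = (X * (u**2)[:, None]).T @ X / len(u)     # E(u_i^2 X_i X_i')
print(np.cov(draws, rowvar=False))           # empirical variance
print(A_inv @ B @ A_inv)                     # sandwich Omega; the two should be close
```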
Roadmap
  Ordinary Least Squares Estimation
     Basic Asymptotic Theory (part 2 of 2)
     Asymptotic Distribution of the OLS Estimator
     Asymptotic Variance Estimation
The asymptotic variance of √𝑁(𝛽̂OLS − 𝛽∗) is
    Ω ∶= E(𝑋𝑖𝑋𝑖′)⁻¹ E(𝑢²𝑖 𝑋𝑖𝑋𝑖′) E(𝑋𝑖𝑋𝑖′)⁻¹
The rhs is a function of unobserved population moments
How would we estimate Ω?
Clearly, we estimate E(𝑋𝑖𝑋𝑖′) by (1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′
But what about E(𝑢²𝑖 𝑋𝑖𝑋𝑖′)?
We don’t know 𝑢𝑖
If we observed 𝑢𝑖 then we would surely use (1/𝑁) ∑ᵢ₌₁ᴺ 𝑢²𝑖 𝑋𝑖𝑋𝑖′
That would be an unbiased variance estimator
But we don’t observe the errors 𝑢𝑖; instead we “observe” the
residuals 𝑢̂𝑖 ∶= 𝑌𝑖 − 𝑋𝑖′𝛽̂OLS
So how about using (1/𝑁) ∑ᵢ₌₁ᴺ 𝑢̂²𝑖 𝑋𝑖𝑋𝑖′ to estimate the middle piece?
While this is in principle the right idea, it results in a biased variance
estimator
Let’s try to understand the source of this bias
First some new tools
Let 𝑀𝑋 ∶= 𝐼𝑁 − 𝑃𝑋 with 𝑃𝑋 ∶= 𝑋(𝑋′𝑋)⁻¹𝑋′
Then 𝑢̂ = 𝑀𝑋 𝑢
Cool facts about 𝑀𝑋:
𝑀′𝑋 = 𝑀𝑋 (symmetric) and 𝑀𝑋𝑀𝑋 = 𝑀𝑋 (idempotent)
The trace of a 𝐾 × 𝐾 matrix is the sum of its diagonal elements:
tr 𝐴 ∶= ∑ᵢ₌₁ᴷ 𝑎𝑖𝑖
Savvy tricks: tr(𝐴𝐵) = tr(𝐵𝐴) and tr(𝐴 + 𝐵) = tr 𝐴 + tr 𝐵
Then
    𝜎̂²𝑢 ∶= ∑ᵢ₌₁ᴺ 𝑢̂²𝑖/𝑁 = tr(𝑢̂𝑢̂′)/𝑁 = tr(𝑢̂′𝑢̂)/𝑁 = tr((𝑀𝑋𝑢)′(𝑀𝑋𝑢))/𝑁
           = tr(𝑢′𝑀′𝑋𝑀𝑋𝑢)/𝑁 = tr(𝑢′𝑀𝑋𝑢)/𝑁 = tr(𝑀𝑋𝑢𝑢′)/𝑁
Aside: dim 𝑀𝑋 = 𝑁 × 𝑁 and dim(𝑢𝑢′ ) = 𝑁 × 𝑁
Now studying the conditional expectation:
    E(𝜎̂²𝑢|𝑋) = E(tr(𝑀𝑋𝑢𝑢′)|𝑋)/𝑁
             = tr(E(𝑀𝑋𝑢𝑢′|𝑋))/𝑁
             = tr(𝑀𝑋 E(𝑢𝑢′|𝑋))/𝑁
             = 𝜎²𝑢 ⋅ tr(𝑀𝑋)/𝑁
             = 𝜎²𝑢 (𝑁 − 𝐾)/𝑁
             < 𝜎²𝑢,
where in the fourth equality we simplified our lives by setting
E(𝑢𝑢′|𝑋) = 𝜎²𝑢 𝐼𝑁 (conditional homoskedasticity)
(The fifth equality will be justified in Assignment 3)
Big picture: 𝜎̂²𝑢 is downward biased, which is not good
Confidence intervals based on 𝜎̂²𝑢 would be too narrow
Statistical inference based on 𝜎̂²𝑢 would be too optimistic
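A quick simulation of the bias result, assuming conditional homoskedasticity as above (with 𝑁 = 30, 𝐾 = 5, 𝜎²𝑢 = 4, so E(𝜎̂²𝑢|𝑋) should be about 4 ⋅ 25/30 ≈ 3.33):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, sigma2, reps = 30, 5, 4.0, 20_000
X = rng.standard_normal((N, K))                      # hold X fixed across replications
M_X = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T   # annihilator matrix M_X

estimates = np.empty(reps)
for r in range(reps):
    u = np.sqrt(sigma2) * rng.standard_normal(N)     # homoskedastic errors
    u_hat = M_X @ u                                  # residuals u_hat = M_X u
    estimates[r] = (u_hat @ u_hat) / N               # sigma_hat^2

print(estimates.mean(), sigma2 * (N - K) / N)        # both close to 3.33
```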
There is an easy fix!
Use 𝑠²𝑢 ∶= (𝑁/(𝑁 − 𝐾)) 𝜎̂²𝑢 = (1/(𝑁 − 𝐾)) ∑ᵢ₌₁ᴺ 𝑢̂²𝑖 instead
Obviously 𝑠²𝑢 will be unbiased
I’m not particularly concerned about this bias
That’s because 𝑁 should be a much larger number than 𝐾
The whole idea of using asymptotic approximations to finite sample
distributions is to let 𝑁 → ∞ while 𝐾 is fixed
In other words, lim𝑁→∞ 𝜎̂²𝑢 = lim𝑁→∞ 𝑠²𝑢
(the asymptotic bias is the same)
Combining things, we propose the following asymptotic variance
estimator
Definition (Asymptotic Variance Estimator)
    Ω̂ = ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹ ((1/(𝑁 − 𝐾)) ∑ᵢ₌₁ᴺ 𝑢̂²𝑖 𝑋𝑖𝑋𝑖′) ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹
Stata calculates Ω̂ when you type something like
     regress lwage schooling experience, robust
Textbooks call Ω̂ the heteroskedasticity robust variance estimator
The standard errors derived from Ω̂ are sometimes referred to as
Eicker-Huber-White standard errors
(or some subset permutation of these names)
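A minimal Python sketch of Ω̂ as defined above, and the resulting standard errors (the simulated lwage/schooling/experience data are made up for illustration):

```python
import numpy as np

def ehw_standard_errors(X, Y):
    N, K = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)      # OLS coefficients
    u_hat = Y - X @ beta_hat                          # residuals
    A_inv = np.linalg.inv(X.T @ X / N)                # ((1/N) sum X_i X_i')^{-1}
    B = (X * (u_hat**2)[:, None]).T @ X / (N - K)     # (1/(N-K)) sum u_hat_i^2 X_i X_i'
    Omega_hat = A_inv @ B @ A_inv
    return beta_hat, np.sqrt(np.diag(Omega_hat) / N)  # avar(beta_hat) = Omega_hat/N

rng = np.random.default_rng(3)
N = 500
schooling = rng.uniform(8, 20, N)
experience = rng.uniform(0, 30, N)
X = np.column_stack([np.ones(N), schooling, experience])
lwage = X @ np.array([0.5, 0.08, 0.01]) + (0.2 + 0.01 * schooling) * rng.standard_normal(N)
print(ehw_standard_errors(X, lwage))
```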
Notice: Wooldridge, on page 61, proposes this version
Definition (Asymptotic Variance Estimator)
    Ω̂Wooldridge = ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹ ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑢̂²𝑖 𝑋𝑖𝑋𝑖′) ((1/𝑁) ∑ᵢ₌₁ᴺ 𝑋𝑖𝑋𝑖′)⁻¹
This is NOT what Stata implements
(to the best of my knowledge)
But from what I said earlier, the difference is tiny:
Ω̂ = (𝑁/(𝑁 − 𝐾)) Ω̂Wooldridge, and 𝑁/(𝑁 − 𝐾) → 1 as 𝑁 → ∞
Asymptotically they are all identical
(because 𝐾 is a finite number)