Econometric Analysis of Cross Section and Panel Data
Lecture 3: Finite-Sample Properties of the OLS Estimator
Zhian Hu
Central University of Finance and Economics
Fall 2024
1 / 25
This Lecture
▶ Hansen (2022): Chapter 4
▶ We investigate some finite-sample properties of the least squares estimator in the
linear regression model.
▶ In particular we calculate its finite-sample expectation and covariance matrix and
propose standard errors for the coefficient estimators.
2 / 25
Sample Mean in an Intercept-Only Model
▶ The intercept-only model: Y = µ + e, assuming that E[e] = 0 and E[e²] = σ².
▶ In this model µ = E[Y] is the expectation of Y. Given a random sample, the least squares estimator µ̂ = Ȳ equals the sample mean.
▶ We now calculate the expectation and variance of the estimator Ȳ .
" n # n Y
1X 1X
E[Ȳ ] = E Yi = E [Yi ] = µ
n n
i=1 i=1
▶ An estimator with the property that its expectation equals the parameter it is estimating is called unbiased.
▶ An estimator θ̂ for θ is unbiased if E[θ̂] = θ.
3 / 25
Sample Mean in an Intercept-Only Model
▶ We next calculate the variance of the estimator.
▶ Making the substitution Y_i = µ + e_i we find Ȳ − µ = n^{-1} Σ_{i=1}^{n} e_i.
$$\operatorname{var}[\bar{Y}] = E\left[(\bar{Y}-\mu)^2\right] = E\left[\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right)\left(\frac{1}{n}\sum_{j=1}^{n} e_j\right)\right] = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E[e_i e_j] = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 = \frac{1}{n}\sigma^2.$$
▶ The second-to-last equality holds because E[e_i e_j] = σ² for i = j but E[e_i e_j] = 0 for i ≠ j due to independence. (A simulation check of both results follows this slide.)
4 / 25
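As a numerical check of the two results above (not part of the original slides), here is a minimal Monte Carlo sketch in Python; the mean, variance, sample size, and number of replications are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 1.5, 50, 100_000   # illustrative values only

# reps samples of size n from the intercept-only model Y = mu + e
Y = mu + sigma * rng.standard_normal((reps, n))
Ybar = Y.mean(axis=1)                        # sample mean of each sample

print(Ybar.mean(), mu)                       # E[Ybar] = mu      (unbiasedness)
print(Ybar.var(), sigma**2 / n)              # var[Ybar] = sigma^2 / n
```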
Linear Regression Model
▶ We now consider the linear regression model.
▶ The variables (Y , X ) satisfy the linear regression equation
Y = X ′β + e
E[e | X ] = 0
The variables have finite second moments
E[Y²] < ∞, E[∥X∥²] < ∞,
and an invertible design matrix Q_XX = E[XX′] > 0.
▶ Homoskedastic Linear Regression Model: E[e² | X] = σ²(X) = σ² is independent of X.
5 / 25
Expectation of Least Squares Estimator
▶ The OLS estimator is unbiased in the linear regression model.
▶ In summation notation:
$$\begin{aligned}
E[\hat{\beta} \mid X_1,\ldots,X_n] &= E\left[\left(\sum_{i=1}^{n} X_i X_i'\right)^{-1}\left(\sum_{i=1}^{n} X_i Y_i\right)\,\middle|\, X_1,\ldots,X_n\right]\\
&= \left(\sum_{i=1}^{n} X_i X_i'\right)^{-1} E\left[\sum_{i=1}^{n} X_i Y_i \,\middle|\, X_1,\ldots,X_n\right]\\
&= \left(\sum_{i=1}^{n} X_i X_i'\right)^{-1} \sum_{i=1}^{n} E[X_i Y_i \mid X_1,\ldots,X_n]\\
&= \left(\sum_{i=1}^{n} X_i X_i'\right)^{-1} \sum_{i=1}^{n} X_i E[Y_i \mid X_i]\\
&= \left(\sum_{i=1}^{n} X_i X_i'\right)^{-1} \sum_{i=1}^{n} X_i X_i' \beta = \beta.
\end{aligned}$$
6 / 25
Expectation of Least Squares Estimator
▶ In matrix notation:
$$E[\hat{\beta} \mid X] = E\left[(X'X)^{-1}X'Y \mid X\right] = (X'X)^{-1}X'E[Y \mid X] = (X'X)^{-1}X'X\beta = \beta.$$
▶ In the linear regression model with i.i.d. sampling, E[β̂ | X] = β.
▶ Using the law of iterated expectations, we can further prove that E[β̂] = β. (A simulation sketch of this result follows this slide.)
7 / 25
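A minimal simulation sketch of this conditional unbiasedness result (not in the original slides): the regressor matrix X is held fixed across replications to mimic conditioning on X, and the design, coefficients, and error distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, reps = 100, 3, 20_000
beta = np.array([1.0, -2.0, 0.5])            # illustrative true coefficients

# Hold X fixed across replications to mimic conditioning on X
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])

betahat = np.empty((reps, k))
for r in range(reps):
    e = rng.standard_normal(n)               # E[e | X] = 0
    Y = X @ beta + e
    betahat[r] = np.linalg.solve(X.T @ X, X.T @ Y)   # OLS estimate

print(betahat.mean(axis=0))                  # close to beta: E[betahat | X] = beta
```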
Variance of Least Squares Estimator
▶ For any r × 1 random vector Z, define the r × r covariance matrix
$$\operatorname{var}[Z] = E\left[(Z - E[Z])(Z - E[Z])'\right] = E[ZZ'] - (E[Z])(E[Z])'$$
▶ For any pair (Z, X), define the conditional covariance matrix
$$\operatorname{var}[Z \mid X] = E\left[(Z - E[Z \mid X])(Z - E[Z \mid X])' \mid X\right]$$
▶ We define V_β̂ ≡ var[β̂ | X] as the conditional covariance matrix of the regression coefficient estimators.
▶ We now derive its form.
8 / 25
Variance of Least Squares Estimator
▶ The conditional covariance matrix of the n × 1 regression error e is the n × n matrix
$$\operatorname{var}[e \mid X] = E[ee' \mid X] \equiv D$$
▶ The ith diagonal element of D is
$$E[e_i^2 \mid X] = E[e_i^2 \mid X_i] = \sigma_i^2$$
▶ The ijth off-diagonal element of D is
$$E[e_i e_j \mid X] = E[e_i \mid X_i]\,E[e_j \mid X_j] = 0$$
9 / 25
Variance of Least Squares Estimator
▶ Thus D is a diagonal matrix with ith diagonal element σ_i²:
$$D = \operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_n^2\right) = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
▶ In the special case of the linear homoskedastic regression model, E[e_i² | X_i] = σ_i² = σ² and we have the simplification D = I_n σ². In general, however, D need not take this simplified form.
10 / 25
Variance of Least Squares Estimator
▶ For any n × r matrix A = A(X),
$$\operatorname{var}[A'Y \mid X] = \operatorname{var}[A'e \mid X] = A'DA$$
▶ In particular, we can write β̂ = A′Y where A = X(X′X)^{-1}, and thus
$$V_{\hat{\beta}} = \operatorname{var}[\hat{\beta} \mid X] = A'DA = (X'X)^{-1} X'DX (X'X)^{-1}$$
▶ It is useful to note that X′DX = Σ_{i=1}^{n} X_i X_i′ σ_i², a weighted version of X′X.
▶ In the special case of the linear homoskedastic regression model, D = I_n σ², so X′DX = X′X σ² and the covariance matrix simplifies to V_β̂ = (X′X)^{-1} σ². (A numerical check of the sandwich formula follows this slide.)
11 / 25
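The sandwich formula can be verified numerically. The sketch below (illustrative heteroskedastic design, with an arbitrarily chosen skedastic function; not from the slides) compares the Monte Carlo covariance of β̂, holding X fixed, with (X′X)^{-1} X′DX (X′X)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 20_000
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
sigma_i = 0.5 + np.abs(X[:, 1])              # illustrative conditional std. dev.

XtX_inv = np.linalg.inv(X.T @ X)
# Sandwich: (X'X)^{-1} (sum_i X_i X_i' sigma_i^2) (X'X)^{-1}
V_sandwich = XtX_inv @ ((X.T * sigma_i**2) @ X) @ XtX_inv

draws = np.empty((reps, 2))
for r in range(reps):
    Y = X @ beta + sigma_i * rng.standard_normal(n)
    draws[r] = XtX_inv @ (X.T @ Y)           # OLS for this replication

print(np.cov(draws.T))                       # Monte Carlo var[betahat | X]
print(V_sandwich)                            # matches the sandwich formula
```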
Gauss-Markov Theorem
▶ Write the homoskedastic linear regression model in vector format as
Y = Xβ + e
E[e | X ] = 0
var[e | X ] = In σ 2
▶ In this model we know that the least squares estimator is unbiased for β and has
covariance matrix σ 2 (X ′ X )−1 .
▶ Is there an alternative unbiased estimator βe which has a smaller covariance
matrix?
12 / 25
Gauss-Markov Theorem
▶ Take the homoskedastic linear regression model. If β̃ is an unbiased estimator of β, then var[β̃ | X] ≥ σ²(X′X)^{-1}.
▶ Since the variance of the OLS estimator exactly equals this bound, no unbiased estimator has a lower variance than OLS. Consequently, we describe OLS as efficient in the class of unbiased estimators.
▶ Let's restrict attention to linear estimators of β, which are estimators that can be written as β̃ = A′Y, where A = A(X) is an n × k function of the regressors X.
▶ This restriction gives rise to the description of OLS as the best linear unbiased estimator (BLUE).
13 / 25
Gauss-Markov Theorem
▶ For β̃ = A′Y we have
$$E[\tilde{\beta} \mid X] = A'E[Y \mid X] = A'X\beta$$
▶ Then β̃ is unbiased for all β if (and only if) A′X = I_k. Furthermore,
$$\operatorname{var}[\tilde{\beta} \mid X] = \operatorname{var}[A'Y \mid X] = A'DA = A'A\sigma^2$$
▶ the last equality using the homoskedasticity assumption. To establish the theorem we need to show that for any such matrix A,
$$A'A \geq (X'X)^{-1}$$
▶ Set C = A − X (X ′ X )−1 . Note that X ′ C = 0. We calculate that
$$\begin{aligned}
A'A - (X'X)^{-1} &= \left(C + X(X'X)^{-1}\right)'\left(C + X(X'X)^{-1}\right) - (X'X)^{-1}\\
&= C'C + C'X(X'X)^{-1} + (X'X)^{-1}X'C + (X'X)^{-1}X'X(X'X)^{-1} - (X'X)^{-1}\\
&= C'C \geq 0.
\end{aligned}$$
(A numerical check of this inequality follows this slide.)
14 / 25
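The key matrix inequality A′A ≥ (X′X)^{-1} can also be checked numerically for an arbitrary unbiased linear estimator. In the sketch below (not from the slides), the construction A = X(X′X)^{-1} + C with X′C = 0 is one arbitrary way to satisfy A′X = I_k; the design is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T            # annihilator matrix, X'M = 0

# Any A = X(X'X)^{-1} + C with X'C = 0 gives an unbiased linear estimator A'Y
C = M @ rng.standard_normal((n, k))
A = X @ XtX_inv + C

print(np.allclose(A.T @ X, np.eye(k)))       # unbiasedness condition A'X = I_k
gap = A.T @ A - XtX_inv                      # equals C'C in the proof above
print(np.linalg.eigvalsh(gap).min())         # >= 0 (up to rounding): PSD
```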
Generalized Least Squares
▶ Take the linear regression model in matrix format Y = X β + e
▶ Consider a generalized situation where the observation errors are possibly
correlated and/or heteroskedastic.
E[e | X ] = 0
var[e | X ] = Σσ 2
for some n × n matrix Σ > 0 (possibly a function of X ) and some scalar σ 2 . This
includes the independent sampling framework where Σ is diagonal but allows for
non-diagonal covariance matrices as well.
▶ Under these assumptions, we can calculate the expectation and variance of the
OLS estimator:
$$E[\hat{\beta} \mid X] = \beta, \qquad \operatorname{var}[\hat{\beta} \mid X] = \sigma^2 (X'X)^{-1} X'\Sigma X (X'X)^{-1}$$
15 / 25
Generalized Least Squares
▶ In this case, the OLS estimator is not efficient. Instead, we develop the
Generalized Least Squares (GLS) estimator of β.
▶ When Σ is known, take the linear model and pre-multiply by Σ^{-1/2}. This produces the equation Ỹ = X̃β + ẽ where Ỹ = Σ^{-1/2}Y, X̃ = Σ^{-1/2}X, and ẽ = Σ^{-1/2}e.
▶ Consider OLS estimation of β in this equation.
$$\begin{aligned}
\tilde{\beta}_{\mathrm{gls}} &= \left(\tilde{X}'\tilde{X}\right)^{-1}\tilde{X}'\tilde{Y}\\
&= \left(\left(\Sigma^{-1/2}X\right)'\left(\Sigma^{-1/2}X\right)\right)^{-1}\left(\Sigma^{-1/2}X\right)'\left(\Sigma^{-1/2}Y\right)\\
&= \left(X'\Sigma^{-1}X\right)^{-1}X'\Sigma^{-1}Y
\end{aligned}$$
▶ You can calculate that
$$E[\tilde{\beta}_{\mathrm{gls}} \mid X] = \beta, \qquad \operatorname{var}[\tilde{\beta}_{\mathrm{gls}} \mid X] = \sigma^2\left(X'\Sigma^{-1}X\right)^{-1}$$
(A numerical sketch of the GLS computation follows this slide.)
16 / 25
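A minimal sketch of the GLS computation when Σ is known (not from the slides). Here Σ is taken to be diagonal (pure heteroskedasticity) purely for illustration, so Σ^{-1} reduces to a vector of weights; the design and skedastic function are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
beta = np.array([1.0, -1.0])
X = np.column_stack([np.ones(n), rng.standard_normal(n)])

Sigma_diag = 0.2 + X[:, 1]**2                # illustrative known Sigma (diagonal)
e = np.sqrt(Sigma_diag) * rng.standard_normal(n)
Y = X @ beta + e

# GLS: betahat_gls = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y
w = 1.0 / Sigma_diag                         # Sigma^{-1} for a diagonal Sigma
beta_gls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * Y))
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

print(beta_gls)                              # efficient when Sigma is known
print(beta_ols)                              # still unbiased, but less efficient
```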
Residuals
▶ What are some properties of the residuals ê_i = Y_i − X_i′β̂ in the context of the linear regression model?
▶ Recall that ê = Me.
$$E[\hat{e} \mid X] = E[Me \mid X] = M E[e \mid X] = 0$$
$$\operatorname{var}[\hat{e} \mid X] = \operatorname{var}[Me \mid X] = M \operatorname{var}[e \mid X] M = MDM$$
▶ Under the assumption of conditional homoskedasticity,
$$\operatorname{var}[\hat{e} \mid X] = M\sigma^2$$
▶ In particular, for a single observation i we can find the variance of ê_i by taking the ith diagonal element. Since the ith diagonal element of M is 1 − h_ii, we obtain
$$\operatorname{var}[\hat{e}_i \mid X] = E[\hat{e}_i^2 \mid X] = (1 - h_{ii})\sigma^2$$
(A sketch computing the leverage values h_ii follows this slide.)
▶ Can you show the conditional expectation and variance of the prediction errors ẽ_i = Y_i − X_i′β̂_{(−i)}?
17 / 25
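A sketch computing the leverage values h_ii and the implied residual variances (1 − h_ii)σ² under homoskedasticity (illustrative design and error variance; not from the slides).

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma2 = 30, 1.0
X = np.column_stack([np.ones(n), rng.standard_normal(n)])

# Hat matrix P = X(X'X)^{-1}X'; the leverage h_ii is its i-th diagonal element
P = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(P)

print(h.sum())                 # equals k (here 2): tr(P) = k, so tr(M) = n - k
print((1 - h) * sigma2)        # var[ehat_i | X] = (1 - h_ii) sigma^2 < sigma^2
```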
Estimation of Error Variance
▶ The error variance σ² = E[e²] can be a parameter of interest.
▶ One estimator is the sample average of the squared residuals:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{e}_i^2$$
▶ We can calculate the expectation of σ̂²:
$$\hat{\sigma}^2 = \frac{1}{n} e'Me = \frac{1}{n}\operatorname{tr}\left(e'Me\right) = \frac{1}{n}\operatorname{tr}\left(Mee'\right)$$
▶ Then
$$\begin{aligned}
E[\hat{\sigma}^2 \mid X] &= \frac{1}{n}\operatorname{tr}\left(E[Mee' \mid X]\right)\\
&= \frac{1}{n}\operatorname{tr}\left(M E[ee' \mid X]\right)\\
&= \frac{1}{n}\operatorname{tr}(MD)\\
&= \frac{1}{n}\sum_{i=1}^{n}(1 - h_{ii})\sigma_i^2
\end{aligned}$$
18 / 25
Estimation of Error Variance
▶ Adding the assumption of conditional homoskedasticity,
$$E[\hat{\sigma}^2 \mid X] = \frac{1}{n}\operatorname{tr}\left(M\sigma^2\right) = \frac{n-k}{n}\sigma^2$$
since tr(M) = n − k, which means that σ̂² is biased towards zero.
▶ So we can define an unbiased estimator by rescaling:
$$s^2 = \frac{1}{n-k}\sum_{i=1}^{n} \hat{e}_i^2$$
▶ By the above calculation, E[s² | X] = σ² and E[s²] = σ². Hence the estimator s² is unbiased for σ². Consequently, s² is known as the bias-corrected estimator for σ², and in empirical practice s² is the most widely used estimator for σ². (A simulation comparing σ̂² and s² follows this slide.)
19 / 25
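The downward bias of σ̂² and the effect of the degrees-of-freedom correction in s² can be seen in a short simulation (homoskedastic errors; all design values are illustrative and not from the slides).

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma2, reps = 25, 4, 2.0, 50_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta = np.zeros(k)

sig2_hat = np.empty(reps)
s2 = np.empty(reps)
for r in range(reps):
    Y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
    ehat = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)     # OLS residuals
    sig2_hat[r] = ehat @ ehat / n                        # biased estimator
    s2[r] = ehat @ ehat / (n - k)                        # bias-corrected

print(sig2_hat.mean(), sigma2 * (n - k) / n)   # E[sigma2_hat] = (n-k)/n * sigma^2
print(s2.mean(), sigma2)                       # E[s^2] = sigma^2
```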
Covariance Matrix Estimation Under Homoskedasticity
▶ For inference we need an estimator of the covariance matrix Vβb of the least
squares estimator.
▶ Under homoskedasticity the covariance matrix takes the simple form
$$V_{\hat{\beta}}^{0} = (X'X)^{-1}\sigma^2$$
▶ Replacing σ² with its estimator s², we have
$$\hat{V}_{\hat{\beta}}^{0} = (X'X)^{-1}s^2$$
▶ Since s² is conditionally unbiased for σ², it is simple to calculate that V̂_β̂⁰ is conditionally unbiased for V_β̂ under the assumption of homoskedasticity:
$$E\left[\hat{V}_{\hat{\beta}}^{0} \mid X\right] = (X'X)^{-1} E[s^2 \mid X] = (X'X)^{-1}\sigma^2 = V_{\hat{\beta}}$$
(A computational sketch follows this slide.)
20 / 25
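A sketch of the homoskedastic covariance matrix estimator V̂_β̂⁰ = (X′X)^{-1}s² and the classical standard errors it implies (illustrative simulated data; not from the slides).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.standard_normal(n)   # homoskedastic errors

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ (X.T @ Y)
ehat = Y - X @ betahat
s2 = ehat @ ehat / (n - k)                   # bias-corrected error variance

V0 = XtX_inv * s2                            # (X'X)^{-1} s^2
se0 = np.sqrt(np.diag(V0))                   # classical standard errors
print(betahat)
print(se0)
```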
Covariance Matrix Estimation Under Homoskedasticity
▶ This was the dominant covariance matrix estimator in applied econometrics for
many years and is still the default method in most regression packages.
▶ Stata uses this covariance matrix estimator by default in linear regression unless an alternative is specified.
▶ However, the above covariance matrix estimator can be highly biased if
homoskedasticity fails.
21 / 25
Covariance Matrix Estimation Under Heteroskedasticity
▶ Recall that the general form for the covariance matrix is
$$V_{\hat{\beta}} = (X'X)^{-1} X'DX (X'X)^{-1}$$
▶ This depends on the unknown matrix D: D = diag(σ_1², …, σ_n²) = E[ee′ | X]
▶ An ideal but infeasible estimator is
$$\hat{V}_{\hat{\beta}}^{\mathrm{ideal}} = (X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i' e_i^2\right)(X'X)^{-1}$$
▶ You can verify that E[V̂_β̂^ideal | X] = V_β̂. However, the errors e_i² are unobserved.
22 / 25
Covariance Matrix Estimation Under Heteroskedasticity
▶ We can replace e_i² with the squared residuals ê_i².
$$\hat{V}_{\hat{\beta}}^{\mathrm{HC0}} = (X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i' \hat{e}_i^2\right)(X'X)^{-1}$$
▶ The label "HC" refers to "heteroskedasticity-consistent". The label "HC0" refers to this being the baseline heteroskedasticity-consistent covariance matrix estimator.
▶ We know, however, that ê_i² is biased towards zero. Recall that to estimate the variance σ² the unbiased estimator s² scales the moment estimator σ̂² by n/(n − k). We make the same adjustment here:
$$\hat{V}_{\hat{\beta}}^{\mathrm{HC1}} = \frac{n}{n-k}(X'X)^{-1}\left(\sum_{i=1}^{n} X_i X_i' \hat{e}_i^2\right)(X'X)^{-1}$$
(A computational sketch of HC0 and HC1 follows this slide.)
23 / 25
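A sketch of the HC0 and HC1 estimators (illustrative heteroskedastic data, not from the slides; the "meat" matrix is Σ_i X_i X_i′ ê_i²).

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
# illustrative heteroskedastic errors
Y = X @ np.array([1.0, 0.5, -0.5]) + (0.5 + np.abs(X[:, 1])) * rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
betahat = XtX_inv @ (X.T @ Y)
ehat = Y - X @ betahat

meat = (X * ehat[:, None]**2).T @ X          # sum_i X_i X_i' ehat_i^2
V_HC0 = XtX_inv @ meat @ XtX_inv
V_HC1 = V_HC0 * n / (n - k)                  # degrees-of-freedom adjustment

print(np.sqrt(np.diag(V_HC0)))               # HC0 standard errors
print(np.sqrt(np.diag(V_HC1)))               # HC1 standard errors
```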
Standard Errors
▶ A standard error for β̂_j is the square root of the jth diagonal element of a covariance matrix estimator:
$$s(\hat{\beta}_j) = \sqrt{\hat{V}_{\hat{\beta}_j}} = \sqrt{\left[\hat{V}_{\hat{\beta}}\right]_{jj}}$$
24 / 25
Measures of Fit
▶ As we described in the previous chapter, a commonly reported measure of regression fit is the regression R², defined as
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\hat{e}_i^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} = 1 - \frac{\hat{\sigma}^2}{\hat{\sigma}_Y^2}$$
where σ̂_Y² = n^{-1} Σ_{i=1}^{n} (Y_i − Ȳ)². R² is an estimator of the population parameter
$$\rho^2 = \frac{\operatorname{var}[X'\beta]}{\operatorname{var}[Y]} = 1 - \frac{\sigma^2}{\sigma_Y^2}$$
▶ However, σ̂² and σ̂_Y² are biased. Theil (1961) proposed replacing these by the unbiased versions s² and σ̃_Y² = (n − 1)^{-1} Σ_{i=1}^{n} (Y_i − Ȳ)², yielding what is known as R-bar-squared or adjusted R-squared:
$$\bar{R}^2 = 1 - \frac{s^2}{\tilde{\sigma}_Y^2} = 1 - \frac{(n-1)\sum_{i=1}^{n}\hat{e}_i^2}{(n-k)\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}$$
(A computational sketch of R² and R̄² follows.)
25 / 25
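A sketch computing R² and the adjusted R̄² directly from the formulas above (illustrative simulated data; not from the slides).

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
Y = X @ np.array([1.0, 0.5, -0.5]) + rng.standard_normal(n)

betahat = np.linalg.solve(X.T @ X, X.T @ Y)
ehat = Y - X @ betahat

ss_res = ehat @ ehat                         # sum of squared residuals
ss_tot = ((Y - Y.mean())**2).sum()           # sum of squared deviations of Y

R2 = 1 - ss_res / ss_tot
R2_adj = 1 - (ss_res / (n - k)) / (ss_tot / (n - 1))   # uses s^2 and (n-1)^{-1} sums
print(R2, R2_adj)
```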