
RS-11

Lecture 11
GLS

CLM: Review
• Recall the CLM Assumptions
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3) Var[ε|X] = σ2 IT
(A4) X has full column rank –rank(X) = k–, where T ≥ k.

• OLS estimation: b = (X′X)-1X′y
  Var[b|X] = σ2 (X′X)-1
  ⇒ b unbiased and efficient (MVUE)

• If (A5) ε|X ~ N(0, σ2 IT)  ⇒  b|X ~ N(β, σ2 (X′X)-1)

Now, b is also the MLE (consistency, efficiency, invariance, etc.). (A5)
gives us finite sample results for b (and for tests: t-test, F-test, Wald tests).


CLM: Review - Relaxing the Assumptions


• Relaxing the CLM Assumptions:
(1) (A1) – Lecture 5. Now, we allow for some non-linearities in the
DGP.
⇒ As long as we have intrinsic linearity, b keeps its nice properties.

(2) (A4) and (A5) – Lecture 7. Now, X is stochastic: {xi, εi} i = 1, 2, ...., T
is a sequence of independent observations. We require X to have finite
means and variances. Similar requirement for ε, but we also require
E[ε] = 0. Two new assumptions:
(A2’) plim (X′ε/T) = 0.
(A4’) plim (X′X/T) = Q.
⇒ We only get asymptotic results for b (consistency, asymptotic
normality). Tests only have large sample distributions. Bootstrapping or
simulations may give us better finite sample behavior.

CLM: Review - Relaxing the Assumptions


(3) (A2’) – Lecture 8. Now, a new estimator is needed: IVE/2SLS. We
need to find a set of l variables, Z, such that
(1) plim(Z′X/T) ≠ 0 (relevance condition)
(2) plim(Z′ε/T) = 0 (validity condition –or exogeneity)

b2SLS = (X̂′X̂)-1 X̂′y

bIV = (Z′X)-1 Z′y

⇒ We only get asymptotic results for b2SLS (consistency, asymptotic
normality). Tests only have asymptotic distributions. Small sample
behavior may be bad. Problem: Finding Z.

(4) (A1) again! – Lecture 9. Any functional form is allowed. General
estimation framework: M-estimation, with only asymptotic results. A
special case: NLLS. Numerical optimization is needed.


Generalized Regression Model


• Now, we go back to the CLM Assumptions:
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3) Var[ε|X] = σ2 IT
(A4) X has full column rank – rank(X) = k –, where T ≥ k.

• We will relax (A3). The CLM assumes that observations are
uncorrelated and all are drawn from a distribution with the same
variance, σ2. Instead, we will assume:
(A3’) Var[ε|X] = Σ = σ2 Ω, where Ω ≠ IT

• The generalized regression model (GRM) allows the variances to


differ across observations and allows correlation across observations.

Generalized Regression Model: Implications


• From (A3) Var[ε|X] = σ2 IT  ⇒  Var[b|X] = σ2 (X′X)-1.

• The true variance of b under (A3’) should be:
VarT[b|X] = E[(b – β)(b – β)′|X]
          = (X′X)-1 E[X′εε′X|X] (X′X)-1
          = (X′X)-1 X′ΣX (X′X)-1

• Under (A3’), the usual estimator of Var[b|X] –i.e., s2 (X′X)-1– is
biased. If we want to use OLS, we need to estimate VarT[b|X].

• To avoid biased inference based on OLS, we would like to
estimate the unknown Σ. But it has T(T+1)/2 parameters. Too many
to estimate with only T observations!

Note: We used (A3) to derive our test statistics. A revision is needed!


Generalized Regression Model – Pure cases


• The generalized regression model:
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3’) Var[ε|X] = Σ = σ2 Ω.
(A4) X has full column rank – rank(X) = k –, where T ≥ k.

• Leading Cases:
– Pure heteroscedasticity: E[εiεj|X] = σij = σi2   if i = j
                                             = 0      if i ≠ j
  ⇒ Var[εi|X] = σi2

– Pure autocorrelation:   E[εiεj|X] = σij    if i ≠ j
                                             = σ2    if i = j

Generalized Regression Model – Pure cases


• Under pure heteroscedasticity, LS gives each observation a
weight of 1/T. But, if the variances are not equal, then some
observations (low variance ones) are more informative than others.

[Figure: Y plotted against X1, ..., X5; the spread of Y differs across the X values.]


Generalized Regression Model – Pure cases


• Under pure autocorrelation, LS is based on simple sums, so the
information that one observation (today’s) might provide about
another (tomorrow’s) is never used.

Note: Heteroscedasticity and autocorrelation are different problems


and generally occur with different types of data. But, the implications
for OLS are the same.

GR Model: OLS Properties


• Unbiased
Given assumption (A2), the OLS estimator b is still unbiased. (The proof
does not rely on Σ):
E[b|X] = β + (X′X)-1 E[X′ε|X] = β.

• Consistency
We relax (A2). Now, we use (A2’) instead. To get consistency,
we need VarT[b|X] → 0 as T → ∞:
VarT[b|X] = (X′X)-1 X′ΣX (X′X)-1
          = (1/T) (X′X/T)-1 (X′ΣX/T) (X′X/T)-1
Assumptions:
- plim (X′X/T) = QXX, a pd matrix of finite elements
- plim (X′ΣX/T) = QXΣX, a finite matrix.

Under these assumptions, we get consistency for OLS.


GR Model: OLS Properties


• Asymptotic normality?
√T (b – β) = (X′X/T)-1 (X′ε/√T)

Asymptotic normality for OLS followed from the application of the
CLT to X′ε/√T:

b →a N(β, (1/T) QXX-1 QXΣX QXX-1)

where QXΣX = limT→∞ Var[T-½ Σt xt εt].

In the context of the GR Model:
- Easy to do for heteroscedastic data. We can use the Lindeberg-
Feller (assuming only independence) version of the CLT.

- Difficult for autocorrelated data, since X′ε/√T is no longer an
independent sum. We will need more assumptions to get asymptotic
normality.

GR Model: Robust Covariance Matrix


• Σ = σ2 Ω is unknown. It has T(T+1)/2 elements to estimate. Too
many! A solution? Be explicit about (A3’): we model Σ.

• But, models for autocorrelation and/or heteroscedasticity may be
incorrect. The robust approach estimates VarT[b|X] without
specifying (A3’) –i.e., a covariance robust to misspecifications of (A3’).

• We need to estimate VarT[b|X] = (X′X)-1 X′ΣX (X′X)-1

• It is important to notice a distinction between estimating
Σ, a (TxT) matrix  ⇒ difficult with T observations,
& estimating
X′ΣX = Σi Σj σij xi xj′, a (kxk) matrix  ⇒ easier!


GR Model: Robust Covariance Matrix


• We will not be estimating Σ = σ2 Ω. That is, we are not estimating
T(T+1)/2 elements. Impossible with T observations!

• We will estimate X′ΣX = Σi Σj σij xi xj′, a (kxk) matrix. That is, we
are estimating [kx(k+1)]/2 elements.

• This distinction is very important in modern applied econometrics:
– The White estimator
– The Newey-West estimator

• Both estimators produce a consistent estimator of VarT[b|X]. To get
consistency, they both rely on the OLS residuals, e. Since b
consistently estimates β, the OLS residuals, e, are also consistent
estimators of ε. We use e to consistently estimate X′ΣX.

GR Model: XΣX
• Q: How does XΣX look like? Time series intuition.
We look at the simple linear model, with only one regressor (in this
case, xii is just a scalar). Assume xii is covariance stationary (see Lecture
13) with autocovariances γj. Then, we derive XΣX:

XΣX = Var[X’ε/√T] = Var[(1/√T) (x11+ x22 + ... + xTT)]


= (1/T)[Tγ0+(T -1)(γ1+γ-1)+(T -2)(γ2+γ-2)+... +1 (γT-1+γ1-T)]
1 T 1
  0   (T  j )( j   j )
T j 1
T 1
1 T 1
   j   j ( j   j )
j   T 1 T j 1

where γj is the autocovariance of xii at lag j (γ0= σ2 = variance of xii).


GR Model: XΣX
Under some conditions (autocovariances are “l-summable”, so
j j|γj|<∞), then
 1 T  
X ' X  var  
 T t 1
( xt et )  

p

j  
j

Note: In the frequency domain, we define the spectrum of xe at


frequency ω as:
1 
S ( )    j e  i j
2 j   1

Then, Q*= 2π S(0) (Q* is called the long-run variance.)

Covariance Matrix: The White Estimator


• The White estimator simplifies the estimation since it assumes
heteroscedasticity only –i.e., γj = 0 (for j ≠ 0). That is, Σ is a diagonal
matrix, with diagonal elements σi2. Thus, we need to estimate:
Q* = (1/T) X′ΣX = (1/T) Σi σi2 xi xi′

• The OLS residuals, e, are consistent estimators of ε. This suggests
using ei2 to estimate σi2.

That is, we estimate (1/T) X′ΣX with S0 = (1/T) Σi ei2 xi xi′.

Note: The estimator is also called the sandwich estimator or
the White estimator (also known as the Eicker-Huber-White
estimator).
Halbert White (1950-2012, USA)


Covariance Matrix: The White Estimator


• White (1980) shows that a consistent estimator of Var[b|X] is
obtained if the squared residual in observation i –i.e., ei2– is used as
an estimator of σi2. Taking the square root, one obtains a
heteroscedasticity-consistent (HC) standard error.

• Sketch of proof.
Suppose we observe εi. Then, each element of Q* would be equal to
E[εi2 xi xi′|xi].
Then, by the LLN, plim (1/T) Σi εi2 xi xi′ = plim (1/T) Σi σi2 xi xi′

Q: Can we replace εi2 by ei2? Yes, since the residuals e are consistent.
Then, the estimated HC variance is:
Est. VarT[b|X] = (1/T) (X′X/T)-1 [Σi ei2 xi xi′/T] (X′X/T)-1

Covariance Matrix: The White Estimator


• Note that (A3’) was not specified. That is, the White estimator is
robust to potential misspecification of the form of heteroscedasticity in (A3’).

• The White estimator allows us to make inferences using the OLS
estimator b in situations where heteroscedasticity is suspected, but we
do not know enough to identify its nature.

• Since there are many refinements of the White estimator, the White
estimator is usually referred to as HC0 (or just “HC”):
HC0 = (X′X)-1 X′ Diag[ei2] X (X′X)-1


The White Estimator: Some Remarks


(1) The White estimator is consistent, but it may not perform well in
finite samples –see MacKinnon and White (1985). A good small-
sample adjustment, HC3, follows the logic of the analysis of outliers:
HC3 = (X′X)-1 X′ Diag[ei2/(1 – hii)2] X (X′X)-1
where hii = xi′(X′X)-1 xi.
HC3 is also recommended by Long and Ervin (2000). (A hand-coded
sketch of HC3 appears after the R code below.)
(2) The White estimator is biased (show it!). Bias corrections are
popular –see above & Wu (1986).
(3) In large samples, SEs, t-tests and F-tests are asymptotically valid.
(4) The OLS estimator remains inefficient. But inferences are
asymptotically correct.
(5) The HC standard errors can be larger or smaller than the OLS
ones. It can make a difference to the tests.

The White Estimator: Some Remarks


(6) It is used, along with the Newey-West estimator, in almost all papers.
It is included in all the packaged software programs. In R, you can use the
library “sandwich” to calculate White SEs. They are easy to program:

# White SE in R
White_f <- function(y, X, b) {
  T <- length(y); k <- length(b)
  yhat <- X %*% b
  e <- y - yhat                                 # OLS residuals
  hhat <- t(X) * as.vector(t(e))                # k x T matrix with columns x_i * e_i
  G <- hhat %*% t(hhat)                         # sum_i e_i^2 x_i x_i'
  F <- t(X) %*% X
  V <- solve(F) %*% G %*% solve(F)              # White (HC0) covariance matrix
  white_se <- sqrt(diag(V))
  ols_se <- sqrt(diag(solve(F) * drop((t(e) %*% e)) / (T - k)))
  l_se <- list(white_se, ols_se)
  return(l_se) }
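With b from OLS in hand, a call such as White_f(y, X, b) returns the White and OLS
standard errors side by side. As a cross-check on remark (1) above, here is a minimal
hand-rolled sketch of HC3 (my own illustration, not part of the lecture code), using the
leverages hii = xi′(X′X)-1 xi:

# Hand-rolled HC3 standard errors (illustrative sketch; y and X as in White_f above)
HC3_f <- function(y, X) {
  XtX_inv <- solve(t(X) %*% X)
  b <- XtX_inv %*% t(X) %*% y                   # OLS coefficients
  e <- drop(y - X %*% b)                        # OLS residuals
  h <- diag(X %*% XtX_inv %*% t(X))             # leverages h_ii
  V <- XtX_inv %*% t(X) %*% diag(e^2 / (1 - h)^2) %*% X %*% XtX_inv
  sqrt(diag(V))                                 # HC3 standard errors
}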


Application: 3 Factor Fama-French Model

• We estimate the 3 factor F-F model for IBM returns, using monthly
data Jan 1990 – Aug 2016 (T = 320):
(U) IBMRet - rf = β0 + β1 (MktRet - rf) + β2 SMB + β3 HML + ε

> b <- solve(t(x)%*% x)%*% t(x)%*%y # OLS regression


> t(b)
x1 x2 x3
[1,] -0.2331471 0.01018722 0.0009802843 -0.004445901
> SE_b
x1 x2 x3
0.011286100 0.002671466 0.003727686 0.003811820 ⟹ OLS SE

> library(sandwich)
> reg <- lm(y~x -1)
> VCHC <- vcovHC(reg, type = "HC0")
> sqrt(diag(VCHC))
x xx1 xx2 xx3
0.011389299 0.002724617 0.004054315 0.004223813 ⟹ White SE HC0

Application: 3 Factor Fama-French Model


> VCHC <- vcovHC(reg, type = "HC3")
> sqrt(diag(VCHC))
x xx1 xx2 xx3
0.011583411 0.002808216 0.004322115 0.004416238 ⟹ White SE HC3


Baltagi and Griffin’s Gasoline Data (Greene)


World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are

COUNTRY = name of country


YEAR = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG = log of real price of gasoline
LCARPCAP = log of per capita number of cars

See Baltagi (2001, p. 24) for analysis of these data. The article on
which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline
Demand in the OECD: An Application of Pooling and Testing
Procedures," European Economic Review, 22, 1983, pp. 117-
137. The data were downloaded from the website for Baltagi's text.

Groupwise Heteroscedasticity: Gasoline (Greene)

Countries are ordered by the standard deviation of their 19 residuals.

Regression of log of per capita gasoline use on log of per


capita income, gasoline price and number of cars per capita for
18 OECD countries for 19 years. The standard deviation varies
by country. The “solution” is “weighted least squares.”


White Estimator vs. Standard OLS (Greene)

BALTAGI & GRIFFIN DATA SET

Standard OLS
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]|
+--------+--------------+----------------+--------+--------+
Constant| 2.39132562 .11693429 20.450 .0000
LINCOMEP| .88996166 .03580581 24.855 .0000
LRPMG | -.89179791 .03031474 -29.418 .0000
LCARPCAP| -.76337275 .01860830 -41.023 .0000

| White heteroscedasticity robust covariance matrix |


+----------------------------------------------------+
Constant| 2.39132562 .11794828 20.274 .0000
LINCOMEP| .88996166 .04429158 20.093 .0000
LRPMG | -.89179791 .03890922 -22.920 .0000
LCARPCAP| -.76337275 .02152888 -35.458 .0000

Autocorrelated Residuals: Gasoline Demand

logG=β1 + β2logPg + β3logY + β4logPnc + β5logPuc + ε


Newey-West Estimator
• Now, we also have autocorrelation. We need to estimate
Q* = (1/T) X′ΣX = (1/T) Σi Σj σij xi xj′

• Newey and West (1987) follow White (1980) to produce a HAC
(Heteroscedasticity and Autocorrelation Consistent) estimator of Q*,
also referred to as the long-run variance (LRV): Use ei ej to estimate σij
⇒ natural estimator of Q*: (1/T) Σi Σj xi ei ej xj′

Or, using time series notation, estimator of Q*: (1/T) Σt Σs xt et es xs′

• That is, we have a sum of the estimated autocovariances of xtεt, Γj:

Q* = Σj=-(T-1),...,T-1 ΓT(j),   where ΓT(j) = E[xt εt εt-j xt-j′]

Whitney Newey, USA Kenneth D. West, USA

Newey-West Estimator
• Natural estimator of Q*: ST = (1/T) Σt Σs xt et es xs′.

Note: If the xtεt are serially uncorrelated, the autocovariances vanish. We
are left with the White estimator.

Under some conditions (autocovariances are “l-summable”), then

Q* = Var[(1/√T) Σt=1,...,T (xt et)]  →p  Σj=-∞,...,∞ ΓT(j)

• Natural estimator of Q*: ST = Σj Γ̂T(j)

• We can estimate Q* in two ways:


(1) parametrically, assuming a model to calculate γj.
(2) non-parametrically, using kernel estimation.
Note: (1) needs a specification of (A3’); while (2) does not.
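As a sketch of route (1), assuming an AR(2) is an adequate model for the scalar series
zt = xt et (my own illustration, not from the slides), the long-run variance implied by the
fitted AR coefficients is σ̂u2/(1 – φ̂1 – φ̂2)2:

# Parametric long-run variance of a scalar series z_t = x_t * e_t via an AR(2) fit (sketch)
parametric_lrv <- function(z) {
  fit <- ar(z, aic = FALSE, order.max = 2)   # Yule-Walker AR(2) fit
  phi <- fit$ar                              # AR coefficients phi_1, phi_2
  s2u <- fit$var.pred                        # innovation variance
  s2u / (1 - sum(phi))^2                     # implied long-run variance Q*
}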


Newey-West Estimator
• The parametric estimation uses an ARMA model – say, an AR(2) –
to calculate γj.

• The non-parametric estimation uses:

ST = Σj=-L,...,L k(j) Γ̂T(j),  where  Γ̂T(j) = (1/T) Σt=j+1,...,T xt et et-j xt-j′ = Γ̂T(-j)′  (j ≥ 0).

• Issues:
– Order of ARMA in parametric estimation or number of lags (L) in
non-parametric estimation.
– Choice of k(j) weights –i.e., kernel choice.
– The estimator, ST, needs to be psd.

• NW propose a robust –no model for (A3’) needed– non-parametric


estimator.

Newey-West Estimator

• Natural estimator of Q*: ST = Σj Γ̂T(j)

Issue 1: This sum has T2 terms. It is difficult to get convergence.

Solution: We need to make sure the sum converges. Cutting the sum
short is one way to do it, but we need to be careful: for consistency the
sum needs to grow as T → ∞ (we need to sum infinitely many Γj’s).

• Trick: Use a truncation lag, L, that grows with T but at a slower rate
–i.e., L = L(T); say, L = 0.75*T1/3 – 1. Then, as T → ∞ and L/T → 0:

QT* = Σj=-L(T),...,L(T) ΓT(j)  →p  Q*

• Replacing ΓT(j) by its estimate, we get ST, which would be consistent
for Q* provided that L(T) does not grow too fast with T.


Newey-West Estimator
• Issue 2 (& 3): ST needs to be psd to be a proper covariance matrix.

• Newey-West (1987): Based on a quadratic form and using the
Bartlett kernel, they produce a consistent psd estimator of Q*:

ST = Σj=-(T-1),...,T-1 k(j/L(T)) Γ̂T(j)

where k(j/L(T)) = 1 – |j|/(L + 1) is the Bartlett kernel or window,
and L(T) is its bandwidth.

• Intuition for Bartlett kernel: Use weights in the sum that imply that
the process becomes less autocorrelated as time goes by –i.e., the
terms have a lower weight in the sum as the difference between t and
s grows.
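To make the weights concrete, here is a small R sketch (my own illustration) of the
rule-of-thumb bandwidth L = 0.75*T1/3 – 1 and the Bartlett weights 1 – |j|/(L + 1):

# Bartlett weights for a given sample size T (illustrative sketch)
bartlett_weights <- function(T) {
  L <- floor(0.75 * T^(1/3) - 1)           # rule-of-thumb truncation lag
  j <- 0:L                                 # lags actually used
  w <- 1 - abs(j) / (L + 1)                # Bartlett kernel: linearly decaying weights
  data.frame(lag = j, weight = w)
}

bartlett_weights(320)                      # T = 320, as in the Fama-French application

For T = 320 this gives L = 4, the lag used with NeweyWest() later in this lecture.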

Newey-West Estimator
• Other kernels work too. Typical requirements for k(.):
– |k(x)| ≤ 1;
– k(0) = 1;
– k(x) = k(−x) for all x∈R,
– ∫|k(x)| dx <∞;
– k(.) is continuous at 0 and at all but a finite number of other points
in R, and


– ∫ k(x) e-iλx dx ≥ 0 for all λ ∈ R.

The last condition is a bit technical and ensures psd; see Andrews
(1991).


Newey-West Estimator
• Two components for the NW HAC estimator:
(1) Start with the Heteroscedasticity Component:
S0 = (1/T) Σi ei2 xi xi′  – the White estimator.

(2) Add the Autocorrelation Component:
ST = S0 + (1/T) Σl kL(l) Σt=l+1,...,T (xt-l et-l et xt′ + xt et et-l xt-l′)
where
kL(l) = 1 – |l|/(L + 1)  – the Bartlett kernel
⇒ linearly decaying weights.
Then,
Est. Var[b] = (1/T) (X′X/T)-1 ST (X′X/T)-1  – NW’s HAC Var.

• Under suitable conditions, as L, T → ∞, and L/T→ 0, ST → Q*.


Asymptotic inferences on β, based on OLS b, can be done with t-test
and Wald tests using N(0,1) and χ2 critical values, respectively.

NW Estimator: Alternative Computation


• The sum-of-covariance estimator can alternatively be computed in
the frequency domain as a weighted average of periodogram ordinates
(an estimator of the spectrum at frequency 2πj/T. To be discussed
in the Time Series lectures.):

STWP = 2π Σj=1,...,T-1 KT(2πj/T) Ixe,ex(2πj/T)

where KT(ω) = T-1 Σu=0,...,T-1 k(u/L) e-iωu, and Ixe,ex is the periodogram of xtet at
frequency ω:

Ixe,ex(ω) = dxe(ω) dxe(ω)′,  where dxe(ω) = (2πT)-1/2 Σt=1,...,T (xt et) e-iωt

• Under suitable conditions, as L & T → ∞ and L/T → 0,
STWP → Q*.


NW Estimator: Kernel Choice


• Other kernels, kL(l), besides the Bartlett kernel, can be used:

- Parzen kernel –Gallant (1987):
kL(l) = 1 – 6l2 + 6|l|3        for 0 ≤ |l| ≤ 1/2
      = 2 (1 – |l|)3           for 1/2 ≤ |l| ≤ 1
      = 0                      otherwise

- Quadratic spectral (QS) kernel –Andrews (1991):
kL(l) = 25/(12π2l2) [sin(6πl/5)/(6πl) – cos(6πl/5)]

- Daniell kernel –Ng and Perron (1996):
kL(l) = sin(πl)/(πl)

• These kernels are all symmetric about the vertical axis. The Bartlett
and Parzen kernels have a bounded support [−1, 1], but the other
two have unbounded support.
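As an illustration of the formulas above (my own sketch, not lecture code), the three
kernels can be written as simple R functions and evaluated on a grid of l values:

# Kernel weight functions used for HAC estimation (illustrative sketch)
parzen_k <- function(l) {
  al <- abs(l)
  ifelse(al <= 0.5, 1 - 6 * l^2 + 6 * al^3,
         ifelse(al <= 1, 2 * (1 - al)^3, 0))
}
qs_k <- function(l) {                      # quadratic spectral (Andrews, 1991)
  ifelse(l == 0, 1,
         25 / (12 * pi^2 * l^2) * (sin(6 * pi * l / 5) / (6 * pi * l) - cos(6 * pi * l / 5)))
}
daniell_k <- function(l) ifelse(l == 0, 1, sin(pi * l) / (pi * l))

l_grid <- seq(-2, 2, by = 0.1)
cbind(l = l_grid, parzen = parzen_k(l_grid), qs = qs_k(l_grid), daniell = daniell_k(l_grid))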

NW Estimator: Kernel Choice

[Figure: kernel weight functions kL(x) plotted against x.]
• Q: In practice –i.e., in finite samples– which kernel to use? And
L(T)? Asymptotic theory does not help us to determine them.

• Andrews (1991) finds optimal kernels and bandwidths by


minimizing the (asymptotic) MSE of the LRV. The QS kernel is
8.6% more efficient than the Parzen kernel; the Bartlett kernel is the
worst one. (BTW, different kernels have different optimal L.)


NW Estimator: Remarks
• Today, the HAC estimators are usually referred to as NW estimators,
regardless of the kernel used, if they produce a psd covariance matrix.

• All econometric packages (SAS, SPSS, EViews, etc.) calculate NW
SEs. In R, you can use the library “sandwich” to calculate NW SEs:
> NeweyWest(x, lag = NULL, order.by = NULL, prewhite = TRUE, adjust = FALSE,
diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", data = list(), verbose = FALSE)

Example:
## fit the 3 factor Fama-French model for IBM returns
fit <- lm(y ~ x -1)

## NeweyWest computes the NW SEs. It requires lag = L & suppression of prewhitening
NeweyWest(fit, lag = 4, prewhite = FALSE)

Note: It is usually found that the NW SEs are downward biased.

NW Estimator: Remarks
• You can also program the NW SEs yourself. In R:
# Newey-West SE in R
NW_f <- function(y, X, b, lag) {
  T <- length(y); k <- length(b)
  yhat <- X %*% b
  e <- y - yhat                                  # OLS residuals
  hhat <- t(X) * as.vector(t(e))                 # k x T matrix with columns x_t * e_t
  G <- matrix(0, k, k)
  w <- numeric(2 * lag + 1)
  a <- 0
  while (a <= lag) {
    Ta <- T - a
    w[lag + 1 + a] <- (lag + 1 - a) / (lag + 1)  # Bartlett weights
    za <- hhat[, (a + 1):T] %*% t(hhat[, 1:Ta])  # sum_t x_t e_t e_{t-a} x_{t-a}'
    ga <- if (a == 0) za else za + t(za)         # add Gamma(a) + Gamma(a)' for a > 0
    G <- G + w[lag + 1 + a] * ga
    a <- a + 1
  }
  F <- t(X) %*% X
  V <- solve(F) %*% G %*% solve(F)               # NW HAC covariance matrix
  nw_se <- sqrt(diag(V))
  ols_se <- sqrt(diag(solve(F) * drop((t(e) %*% e)) / (T - k)))
  l_se <- list(nw_se, ols_se)
  return(l_se)
}

NW_f(y, X, b, lag = 4)


NW Estimator: Example in R
Example: We estimate the 3 factor F-F model for IBM returns:
> library(sandwich)
> reg <- lm(y~x -1)
> reg$coefficients
x xx1 xx2 xx3
-0.2331470817 0.0101872239 0.0009802843 -0.0044459013 ⟹ OLS b

> SE_HC <- diag(sqrt(abs(vcovHC(reg, type= "HC3"))))


> SE_HC
x xx1 xx2 xx3
0.011583411 0.002808216 0.004322115 0.004416238 ⟹ White SE HC3

> NW <- NeweyWest(reg, lag = 4, prewhite = FALSE)


> SE_NW <- diag(sqrt(abs(NW)))
> SE_NW
x xx1 xx2 xx3
0.023629578 0.002798171 0.003895311 0.005431146 ⟹ NW SE

NW Estimator: Remarks
• Parametric estimators of Q* are simple and perform reasonably
well. But, we need to specify the ARMA model. Thus, they are not
robust to misspecification of (A3’). This is the appeal of White & NW.

• NW SEs perform poorly in Monte Carlo simulations:


- NW SEs tend to be downward biased.
- The finite-sample performance of tests using NW SE is not well
approximated by the asymptotic theory.
- Tests have serious size distortions.

• A key assumption in establishing consistency is that L → ∞ as
T → ∞, but L/T → 0. But, in practice, the fraction L/T is never equal
to 0; it approaches some positive fraction b (b ∈ (0,1]). Under this
situation, we need new asymptotics to derive the properties of the estimator.


NW Estimator: Remarks
• There are estimators of Q* that are not consistent, but with better
small sample properties. See Kiefer, Vogelsang and Bunzel (2000).

• The SEs based on these inconsistent estimators of Q* that are used
for testing are referred to as Heteroskedasticity-Autocorrelation Robust
(HAR) SEs.

• More on this topic in Lecture 13.

References: Müller (2014) & Sun (2014). There is a recent review (not
that technical) paper by Lazarus, Lewis, Stock & Watson (2016) with
recommendations on how to use these HAR estimators.

Autocorrelated Residuals: Gasoline Demand

logG=β1 + β2logPg + β3logY + β4logPnc + β5logPuc + ε


NW Estimator vs. Standard OLS (Greene)


BALTAGI & GRIFFIN DATA SET
--------+--------------------------------------------------
Variable| Coefficient Standard Error t-ratio P[|T|>t]
Standard OLS
-------+---------------------------------------------------
Constant| -21.2111*** .75322 -28.160 .0000
LP| -.02121 .04377 -.485 .6303
LY| 1.09587*** .07771 14.102 .0000
LPNC| -.37361** .15707 -2.379 .0215
LPUC| .02003 .10330 .194 .8471
--------+------------------------------------------------
--------+--------------------------------------------------
Variable| Coefficient Standard Error t-ratio P[|T|>t]
Robust VC Newey-West, Periods = 10
--------+--------------------------------------------------
Constant| -21.2111*** 1.33095 -15.937 .0000
LP| -.02121 .06119 -.347 .7305
LY| 1.09587*** .14234 7.699 .0000
LPNC| -.37361** .16615 -2.249 .0293
LPUC| .02003 .14176 .141 .8882
--------+--------------------------------------------------

Generalized Least Squares (GLS)


• Assumptions (A1), (A2), (A3’) & (A4) hold. That is,
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3’) Var[ε|X] = σ2 Ω   (recall Ω is symmetric ⇒ T′T = Ω for some T)
(A4) X has full column rank –i.e., rank(X) = k–, where T ≥ k.

Note: Ω is symmetric ⇒ there exists T ∋ T′T = Ω ⇒ T′-1 Ω T-1 = I

• We transform the linear model in (A1) using P = Ω-1/2.

P = Ω-1/2  ⇒  P′P = Ω-1
Py = PXβ + Pε    or
y* = X*β + ε*.
E[ε*ε*′|X*] = P E[εε′|X*] P′ = P E[εε′|X] P′ = σ2 P Ω P′
            = σ2 Ω-1/2 Ω Ω-1/2 = σ2 IT  ⇒ back to (A3)
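A minimal R sketch of this transformation (my own illustration; Omega is assumed
known here, e.g. built from a heteroscedasticity or AR(1) model as in the slides below):

# GLS via the transformation P = Omega^(-1/2), assuming Omega is known (illustrative sketch)
gls_by_transform <- function(y, X, Omega) {
  eo <- eigen(Omega, symmetric = TRUE)
  P  <- eo$vectors %*% diag(1 / sqrt(eo$values)) %*% t(eo$vectors)  # Omega^(-1/2)
  ystar <- P %*% y                      # transformed data: y* = P y, X* = P X
  Xstar <- P %*% X
  solve(t(Xstar) %*% Xstar) %*% t(Xstar) %*% ystar   # OLS on the transformed model = bGLS
}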


Generalized Least Squares (GLS)


• The transformed model is homoscedastic:
E[ε*ε*′|X*] = P E[εε′|X*] P′ = σ2 P Ω P′ = σ2 IT

• We have the CLM framework back  ⇒  we can use OLS!

• Key assumption: Ω is known, and, thus, P is also known; otherwise
we cannot transform the model.

• Q: Is Ω known?

Alexander C. Aitken (1895 –1967, NZ)

Generalized Least Squares (GLS)


• Aitken Theorem (1935): The Generalized Least Squares estimator.
Py = PXβ + Pε    or
y* = X*β + ε*.
E[ε*ε*′|X*] = σ2 IT
We can use OLS in the transformed model. It satisfies the G-M theorem.
Thus, the GLS estimator is:
bGLS = b* = (X*′X*)-1 X*′y* = (X′P′PX)-1 X′P′Py
     = (X′Ω-1X)-1 X′Ω-1y

Note I: bGLS ≠ b. bGLS is BLUE by construction, b is not.

Note II: Both are unbiased and consistent. In practice, the two estimators
will be different, but not that different. If they are very different,
something is rotten in Denmark.


Generalized Least Squares (GLS)


• Check unbiasedness:
bGLS = (X′Ω-1X)-1 X′Ω-1y = β + (X′Ω-1X)-1 X′Ω-1 ε
E[bGLS|X] = β

• Efficient Variance
Var[bGLS|X] = E[(bGLS – β)(bGLS – β)′|X]
            = E[(X′Ω-1X)-1 X′Ω-1 εε′ Ω-1X (X′Ω-1X)-1|X]
            = (X′Ω-1X)-1 X′Ω-1 E[εε′|X] Ω-1X (X′Ω-1X)-1
            = σ2 (X′Ω-1X)-1

Note: bGLS is BLUE. This “best” variance can be derived from
Var[bGLS|X] = σ2 (X*′X*)-1 = σ2 (X′Ω-1X)-1
Then, under (A3’) the OLS estimator is inefficient and the usual estimator
of its variance is biased!

Generalized Least Squares (GLS)


• If we relax the CLM assumptions (A2) and (A4), as we did in
Lecture 7, we only have asymptotic properties for GLS:
– Consistency - “well behaved data.”
– Asymptotic distribution under usual assumptions.
(easy for heteroscedasticity, complicated for autocorrelation.)
– Wald tests and F-tests with usual asymptotic χ2 distributions.


Consistency (Greene)
Use mean square convergence:

Var[β̂|X] = (σ2/n) (X′Ω-1X/n)-1  → 0?

Requires (X′Ω-1X/n) to be “well behaved”:
either converge to a constant matrix or diverge.

Heteroscedasticity case:
X′Ω-1X/n = (1/n) Σi=1,...,n (1/ωii) xi xi′

Autocorrelation case:
X′Ω-1X/n = (1/n) Σi=1,...,n Σj=1,...,n ωij xi xj′, with ωij the (i,j) element of Ω-1.
There are n2 terms. Convergence is unclear.

Consistency – Autocorrelation case (Greene)

X′Ω-1X/T = (1/T) Σj=1,...,T Σi=1,...,T ωij xi xj′  – a double sum over all T2 pairs,
whose behavior depends on the correlations ρts.

• If the {Xt} were uncorrelated –i.e., ρk = 0–, then Var[bGLS|X] → 0.

• We need to impose restrictions on the dependence among the Xt’s.
Usually, we require that the autocorrelation, ρk, gets weaker as t – s
grows (and the double sum becomes finite).


Asymptotic Normality (Greene)

√n (β̂ – β) = (X′Ω-1X/n)-1 (1/√n) X′Ω-1ε

Converge to normal with a stable variance O(1)?

– (X′Ω-1X/n)-1 → a constant matrix?
– (1/√n) X′Ω-1ε → a mean to which we can apply the
central limit theorem?

Heteroscedasticity case:
(1/√n) X′Ω-1ε = (1/√n) Σi=1,...,n xi εi/ωi.   Var[xi εi/ωi] = σ2 xi xi′/ωi;  xi is just data.
Apply Lindeberg-Feller. (Or assume xi/ωi is a draw from a common
distribution with mean and fixed variance – some recent treatments.)

Autocorrelation case?

Asymptotic Normality – Autocorrelation case

For the autocorrelation case:

(1/√n) X′Ω-1ε = (1/√n) Σi=1,...,n Σj=1,...,n ωij xi εj,  with ωij the (i,j) element of Ω-1.

Does the double sum converge? Uncertain. It requires the elements
of Ω-1 to become small as the distance between i and j increases.
(It has to resemble the heteroscedasticity case.)

• The dependence is usually broken by assuming {xt εt} form a mixing
sequence. The intuition behind mixing is simple; but the formal
details and its application to the CLT can get complicated.

• Intuition: {Zt} is a mixing sequence if any two groups of terms of the


sequence that are far apart from each other are approximately
independent --and the further apart, the closer to being independent.


Brief Detour: Time Series


• With autocorrelated data, we get dependent observations. Recall,
εt = ρ εt-1 + ut

• The independence assumption (A2’) is violated. The LLN and the
CLT cannot be easily applied in this context. We need new tools
and definitions.

• We will introduce the concepts of stationarity and ergodicity. The


ergodic theorem will give us a counterpart to the LLN.

• To get asymptotic distributions, we also need a CLT for dependent


variables, using the concept of mixing and stationarity. Or we can
rely on the martingale CLT. We will leave this as “coming attractions.”

Brief Detour: Time Series - Stationarity


• Consider the joint probability distribution of the collection of RVs:

F(zt1, zt2, ..., ztn) = P(Zt1 ≤ zt1, Zt2 ≤ zt2, ..., Ztn ≤ ztn)

Then, we say that a process is

1st-order stationary if F(zt1) = F(zt1+k) for any t1, k

2nd-order stationary if F(zt1, zt2) = F(zt1+k, zt2+k) for any t1, t2, k

Nth-order stationary if F(zt1, ..., ztn) = F(zt1+k, ..., ztn+k) for any t1, ..., tn, k

• Definition. A process is strongly (strictly) stationary if it is a Nth-order


stationary process for any N.


Brief Detour: Time Series – Moments


• The moments describe a distribution. We calculate the moments as
usual:

E(Zt) = μt = ∫ zt f(zt) dzt

Var(Zt) = σt2 = E(Zt – μt)2 = ∫ (zt – μt)2 f(zt) dzt

Cov(Zt1, Zt2) = E[(Zt1 – μt1)(Zt2 – μt2)]

ρ(t1, t2) = Cov(Zt1, Zt2) / √(σt12 σt22)

• Stationarity requires all these moments to be independent of time.

Brief Detour: Time Series – Moments


• For a strictly stationary process: μt = μ and σt2 = σ2,

because F(zt1) = F(zt1+k)  ⇒  μt1 = μt1+k = μ,

provided that E(Zt) < ∞ and E(Zt2) < ∞.
Then, F(zt1, zt2) = F(zt1+k, zt2+k)  ⇒
Cov(zt1, zt2) = Cov(zt1+k, zt2+k)  ⇒
ρ(t1, t2) = ρ(t1+k, t2+k)
Let t1 = t – k and t2 = t; then
ρ(t1, t2) = ρ(t – k, t) = ρ(t, t + k) = ρk

The correlation between any two RVs depends only on the time difference.


Brief Detour: Time Series – Weak Stationarity

• A process is said to be Nth-order weakly stationary if all its joint


moments up to order N exist and are time invariant.

• A Covariance stationary process (or 2nd order weakly stationary) has:


– constant mean
– constant variance
– covariance function depends on time difference between RV.

Brief Detour: Time Series – Ergodicity


• We want to allow as much dependence as the LLN allows us to.
• But, stationarity is not enough, as the following example shows:

• Example: Let {Ut} be a sequence of i.i.d. RVs uniformly distributed
on [0, 1] and let Z be N(0, 1), independent of {Ut}.
Define Yt = Z + Ut. Then, Yt is stationary (why?), but

Ȳn = (1/n) Σt=1,...,n Yt  does not converge to  E(Yt) = 1/2;
instead,  Ȳn →p Z + 1/2.

The problem is that there is too much dependence in the sequence {Yt}.
In fact, the correlation between Y1 and Yt is always positive for any
value of t.
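A quick simulation sketch of this example (my own illustration): the time average
settles near Z + 1/2, not near E(Yt) = 1/2, no matter how large n is:

# Simulate Y_t = Z + U_t and compare the time average with E(Y_t) = 1/2 (illustrative sketch)
set.seed(123)
Z <- rnorm(1)                     # one draw of Z, shared by the whole realization
n <- 100000
U <- runif(n)
Y <- Z + U
c(time_average = mean(Y), Z_plus_half = Z + 0.5, expected_value = 0.5)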


Brief Detour: Time Series – Ergodicity of mean


• We want to estimate the mean of the process {Zt}, μ(Zt). But, we
need to distinguish between the ensemble average and the time average:

– Ensemble Average:   z̄ = (1/m) Σi=1,...,m Zi

– Time Series Average:   z̄ = (1/n) Σt=1,...,n Zt

Q: Which estimator is the most appropriate?
A: The Ensemble Average, the average of re-runs of identical experiments.
But, it is impossible to calculate. We only observe one experiment: Zt.

• Q: Under which circumstances can we use the time average (only one
realization of {Zt})? Is the time average an unbiased and consistent
estimator of the mean? The Ergodic Theorem gives us the answer.

Brief Detour: Time Series – Ergodicity of mean


• Recall the sufficient conditions for consistency of an estimator: the
estimator is asymptotically unbiased and its variance asymptotically
collapses to zero.

1. Q: Is the time average asymptotically unbiased? Yes.

E(z̄) = (1/n) Σt E(Zt) = (1/n) Σt μ = μ

2. Q: Is the variance going to zero as T grows? It depends.

Var(z̄) = (1/n2) Σt=1,...,n Σs=1,...,n Cov(Zt, Zs) = (γ0/n2) Σt=1,...,n Σs=1,...,n ρt-s
       = (γ0/n2) [(ρ0 + ρ1 + ⋯ + ρn-1) + (ρ1 + ρ0 + ρ1 + ⋯ + ρn-2)
                  + ⋯ + (ρ-(n-1) + ρ-(n-2) + ⋯ + ρ0)]


Brief Detour: Time Series – Ergodicity of mean


n 1
0 0
  (1 
k
var( z )  2
( n  k ) k  ) k
n k   ( n  1)
n k
n

0
 (1 
k ?
lim var( z )  lim ) k 
 0
n  n  n n
k

• If Zt were uncorrelated, the variance of the time average would be


O(n-1). Since independent random variables are necessarily
uncorrelated (but not vice versa), we have just recovered a form of the
LLN for independent data.

Q: How can we make the remaining part, the sum over the upper
triangle of the covariance matrix, go to zero as well?
A: We need to impose conditions on ρk. Conditions weaker than "they
are all zero;" but, strong enough to exclude the sequence of identical
copies.

Brief Detour: Time Series – Ergodicity of mean


• We use two inequalities to put upper bounds on the variance of the
time average:

| Σt=1,...,n-1 Σk=1,...,n-t ρk |  ≤  Σt=1,...,n-1 Σk=1,...,n-t |ρk|  ≤  Σt=1,...,n-1 Σk=1,...,∞ |ρk|

Covariances can be negative, so we upper-bound the sum of the actual
covariances by the sum of their magnitudes. Then, we extend the inner
sum so it covers all lags. This might of course be infinite (sequence-of-
identical-copies).

• Definition: A covariance-stationary process is ergodic for the mean if
plim z̄ = E(Zt) = μ

Ergodicity Theorem: A sufficient condition for ergodicity for
the mean is
γk → 0 as k → ∞


Brief Detour: Time Series – Ergodicity of 2nd moments

• A sufficient condition to ensure ergodicity for second moments is:

Σk |γk| < ∞

A process which is ergodic in the first and second moments is usually
referred to as ergodic in the wide sense.

• Ergodicity under the Gaussian Distribution
If {Zt} is a stationary Gaussian process, Σk |γk| < ∞
is sufficient to ensure ergodicity for all moments.

Note: Recall that only the first two moments are needed to describe
the normal distribution.

Test Statistics (Assuming Known Ω) (Greene)

• Back to GLS. From (A1)-(A4), we get:
bGLS = (X*′X*)-1 X*′y* = (X′Ω-1X)-1 X′Ω-1y
Var[bGLS|X] = σ2 (X*′X*)-1 = σ2 (X′Ω-1X)-1

• With known Ω, apply all familiar results to the transformed model:
- With normality, (A5) holds; t- and F-statistics apply to least squares
based on Py and PX.

- Without normality, we rely on asymptotic results, where we get
asymptotic normality for bGLS. We use Wald statistics and the chi-
squared distribution, still based on the transformed model.

• Key step to do GLS: Derive the transformation matrix P = Ω-1/2.


(Weighted) GLS: Pure Heteroscedasticity

• Key step to do GLS: Derive the transformation matrix P = Ω-1/2.

Var[ε] = σ2 Ω = σ2 diag(ω1, ω2, ..., ωn)

Ω-1/2 = diag(1/√ω1, 1/√ω2, ..., 1/√ωn)

β̂ = (X′Ω-1X)-1 (X′Ω-1y) = [Σi=1,...,n (1/ωi) xi xi′]-1 [Σi=1,...,n (1/ωi) xi yi]

σ̂2 = [Σi=1,...,n (yi – xi′β̂)2/ωi] / (n – K)

⇒ WLS: Think of [ωi]-1/2 as weights.
We do OLS with the weighted data.
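A minimal R sketch of this WLS recipe (my own illustration; w plays the role of the
known ωi):

# WLS as OLS on weighted data, assuming the variance weights w = omega_i are known (sketch)
wls_by_hand <- function(y, X, w) {
  ystar <- y / sqrt(w)                    # multiply each observation by 1/sqrt(omega_i)
  Xstar <- X / sqrt(w)                    # rows of X get the same weight
  solve(t(Xstar) %*% Xstar) %*% t(Xstar) %*% ystar
}
# Equivalent built-in route: lm(y ~ X - 1, weights = 1/w)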

GLS: First-order Autocorrelation -AR(1)- Case

• Let εt = ρ εt-1 + ut (a first-order autocorrelated process).
Let ut = non-autocorrelated, white noise error ~ D(0, σu2)

• Then, εt = ρ εt-1 + ut  (the autoregressive form)
         = ρ(ρ εt-2 + ut-1) + ut
         = ...
         = ut + ρ ut-1 + ρ2 ut-2 + ρ3 ut-3 + ...
         = Σj=0,...,∞ ρj ut-j  (a moving average)

• Var[εt] = Σj=0,...,∞ ρ2j Var[ut-j] = Σj=0,...,∞ ρ2j σu2
          = σu2 /(1 – ρ2)   –we need to assume |ρ| < 1.

• Easier:
Var[εt] = ρ2 Var[εt-1] + Var[ut]  ⇒  Var[εt] = σu2 /(1 – ρ2)


GLS: AR(1) Case - Autocovariances

Continuing...
Cov[εt, εt-1] = Cov[ρ εt-1 + ut, εt-1]
             = ρ Cov[εt-1, εt-1] + Cov[ut, εt-1]
             = ρ Var[εt-1] = ρ Var[εt]
             = ρ σu2/(1 – ρ2)

Cov[εt, εt-2] = Cov[ρ εt-1 + ut, εt-2]
             = ρ Cov[εt-1, εt-2] + Cov[ut, εt-2]
             = ρ Cov[εt, εt-1]
             = ρ2 σu2/(1 – ρ2),  and so on.

GLS: AR(1) Case - Autocorrelation Matrix


• Now, we get Σ = σ2 Ω:

Σ = σu2/(1 – ρ2) ×
    [ 1       ρ       ρ2      ⋯   ρT-1
      ρ       1       ρ       ⋯   ρT-2
      ρ2      ρ       1       ⋯   ρT-3
      ⋮       ⋮       ⋮       ⋱   ⋮
      ρT-1    ρT-2    ρT-3    ⋯   1   ]

(Note: trace Ω = T, as required.)
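In R, this Ω is easy to build; a one-line sketch (my own illustration) exploits the fact
that the (t, s) element is ρ|t-s|:

# AR(1) correlation matrix Omega with (t,s) element rho^|t-s| (illustrative sketch)
ar1_omega <- function(T, rho) rho^abs(outer(1:T, 1:T, "-"))
ar1_omega(5, 0.6)    # small example: 5 x 5 Omega for rho = 0.6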


GLS: First-order Autocorrelation Case

• Then, we can get the transformation matrix P = Ω-1/2:

Ω-1/2 = [ √(1 – ρ2)   0    0   ⋯   0   0
          -ρ          1    0   ⋯   0   0
          0          -ρ    1   ⋯   0   0
          ⋮           ⋮    ⋮   ⋱   ⋮   ⋮
          0           0    0   ⋯  -ρ   1 ]

Ω-1/2 y = [ √(1 – ρ2) y1
            y2 – ρ y1
            y3 – ρ y2
            ⋯
            yT – ρ yT-1 ]   ⇒ GLS: Transformed y*.

GLS: The Autoregressive Transformation


• With AR models, sometimes it is easier to transform the data by
taking pseudo differences.

• For the AR(1) model, we have:

yt = xt′β + εt,   with εt = ρ εt-1 + ut
ρ yt-1 = ρ xt-1′β + ρ εt-1

yt – ρ yt-1 = (xt – ρ xt-1)′β + (εt – ρ εt-1)
yt – ρ yt-1 = (xt – ρ xt-1)′β + ut

(Where did the first observation go?)
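A short R sketch of this transformation (my own illustration; ρ is treated as known, and
the first observation gets the √(1 – ρ2) weighting shown above):

# Pseudo-differencing for the AR(1) model, with the Prais-Winsten first observation (sketch)
ar1_transform <- function(y, X, rho) {
  T <- length(y)
  ystar <- c(sqrt(1 - rho^2) * y[1], y[2:T] - rho * y[1:(T - 1)])
  Xstar <- rbind(sqrt(1 - rho^2) * X[1, ], X[2:T, ] - rho * X[1:(T - 1), ])
  list(ystar = ystar, Xstar = Xstar)      # then run OLS on (ystar, Xstar)
}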


GLS: Unknown Ω (Greene)

• Problem with GLS: Ω is unknown.

• Solution: Estimate Ω.  ⇒ Feasible GLS (FGLS).

• For now, we will consider two methods of estimation:
– Two-step, or Feasible estimation. Estimate Ω first, then do GLS.
Emphasize the same logic as White and Newey-West: We do not need
to estimate Ω. We need to find a matrix that behaves the same as
(1/T) X′Ω-1X.
– Nice asymptotic properties of the FGLS estimator.

• ML estimation of β, σ2, and Ω all at the same time.
– Joint estimation of all parameters. Fairly rare. Some generalities.
– We will examine two applications:
- Harvey’s model of heteroscedasticity
- Beach-MacKinnon on the AR(1) model (see Lecture 13).

GLS: Specification of Ω (Greene)

• Ω must be specified first.
• A full unrestricted Ω contains T(T+1)/2 - 1 parameters. (Why
minus 1? Remember, tr(Ω) = T, so one element is determined.)
• Ω is generally specified (modeled) in terms of a few parameters.
Thus, Ω = Ω(θ) for some small parameter vector θ. It becomes a
question of estimating θ.

• Examples:
(1) Var[εi|X] = σ2 exp(θ zi). Variance a function of θ and some
variable zi (say, firm size or country).

(2) εt with an AR(1) process. We have already derived σ2 Ω as a
function of ρ.


Harvey’s Model of Heteroscedasticity (Greene)

• The variance for observation i is a function of zi:
Var[εi|X] = σ2 exp(θ zi)
But, errors are not auto/cross correlated:
Cov[εi, εj|X] = 0

• The driving variable, z, can be firm size, or a set of dummy variables –
for example, for countries. This example is the one used for the
estimation of the previous groupwise heteroscedasticity model.

• Then, we have a functional form for Σ = σ2 Ω:
Σ = diagonal[exp(α + θ zi)],   α = log(σ2)
Once we specify θ (it can be estimated), GLS is feasible.

GLS: AR(1) Model of Autocorrelation (Greene)

• We have already derived Σ = σ2 Ω for the AR(1) case:

Σ = σu2/(1 – ρ2) ×
    [ 1       ρ       ρ2      ⋯   ρT-1
      ρ       1       ρ       ⋯   ρT-2
      ρ2      ρ       1       ⋯   ρT-3
      ⋮       ⋮       ⋮       ⋱   ⋮
      ρT-1    ρT-2    ρT-3    ⋯   1   ]

• Now, if we estimate σu2 and ρ, we can do FGLS.


Estimated AR(1) Model (Greene)


AR(1) Model: e(t) = rho * e(t-1) + u(t)
Initial value of rho = .87566
Maximum iterations = 1
Method = Prais - Winsten
Iter= 1, SS= .022, Log-L= 127.593
Final value of Rho = .959411
Std. Deviation: e(t) = .076512
Std. Deviation: u(t) = .021577
Autocorrelation: u(t) = .253173
N[0,1] used for significance levels
--------+-------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z]
--------+-------------------------------------------------
Constant| -20.3373*** .69623 -29.211 .0000
LP| -.11379*** .03296 -3.453 .0006
LY| .87040*** .08827 9.860 .0000
LPNC| .05426 .12392 .438 .6615
LPUC| -.04028 .06193 -.650 .5154
RHO| .95941*** .03949 24.295 .0000
--------+-------------------------------------------------
Standard OLS
Constant| -21.2111*** .75322 -28.160 .0000
LP| -.02121 .04377 -.485 .6303
LY| 1.09587*** .07771 14.102 .0000
LPNC| -.37361** .15707 -2.379 .0215
LPUC| .02003 .10330 .194 .8471

Two-Step Estimation (Greene)

• The general result for estimation when Ω is estimated.

• GLS uses [X′Ω-1X]-1 X′Ω-1y, which converges in probability to β.

• We seek a vector which converges to the same thing that this does.
Call it “Feasible GLS” or FGLS, based on [X′Ω̂-1X]-1 X′Ω̂-1y.

• The object is to find a set of parameters such that
[X′Ω̂-1X]-1 X′Ω̂-1y – [X′Ω-1X]-1 X′Ω-1y → 0


Feasible GLS (Greene)

For FGLS estimation, we do not seek an estimator of Ω such that

Ω̂ – Ω → 0

This makes no sense, since Ω̂ is nxn and does not “converge” to
anything. We seek a matrix Ω̂ such that

(1/n) X′Ω̂-1X – (1/n) X′Ω-1X → 0

For the asymptotic properties, we will require that

(1/√n) X′Ω̂-1ε – (1/√n) X′Ω-1ε → 0

Note in this case, these are two random vectors, which we require
to converge to the same random vector.

Two-Step FGLS (Greene)

• Theorem 8.5: To achieve full efficiency, we do not need an efficient
estimate of the parameters in Ω, only a consistent one.

• Q: Why?


Harvey’s Model (Greene)

• Examine Harvey’s model once again.
Estimation:
(1) Two-step FGLS: Use OLS to estimate θ. Then, use
{X′[Ω(θ)]-1X}-1 X′[Ω(θ)]-1y to estimate β.

(2) Full ML estimation. Estimate all parameters simultaneously.
A handy result due to Oberhofer and Kmenta –the “zig-zag”
approach.

Examine a model of groupwise heteroscedasticity.

Andrew C. Harvey, England

Harvey’s Model: Groupwise Heteroscedasticity

• We have a sample, yig, xig, ..., with
N groups, each with Tg observations.
Each group variance: Var[εig] = σg2

• Define a group dummy variable:
dig = 1 if observation ig is in group g,
    = 0 otherwise.
Then, model the variances as:
Var[εig] = σg2 = σ2 exp(θ2 d2 + … + θN dN)
Var1 = σ2            –normalized variance (remember the dummy trap!)
Var2 = σ2 exp(θ2)
... etc.


Harvey’s Model: Two-Step Procedure (Greene)

• OLS is still consistent. Do OLS and keep the residuals e.

Step 1. Using e, calculate the group variances. That is,
- Est.Var1 = e1′e1/T1 estimates σ2
- Est.Var2 = e2′e2/T2 estimates σ2 exp(θ2)
- Estimator of θ2 is ln[(e2′e2/T2)/(e1′e1/T1)]
- .... etc.
Step 2. Now, use FGLS –weighted least squares. Keep the WLS residuals.
Step 3. Using the WLS residuals, recompute the variance estimators.
Iterate until convergence between steps 2 and 3.
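A compact R sketch of this two-step/iterated procedure (my own illustration; group is
assumed to be a factor identifying the N groups):

# Two-step / iterated FGLS for groupwise heteroscedasticity (illustrative sketch)
groupwise_fgls <- function(y, X, group, iter = 10) {
  w <- rep(1, length(y))                           # start from OLS (equal weights)
  for (r in 1:iter) {
    b <- solve(t(X / w) %*% X) %*% t(X / w) %*% y  # weighted LS with weights 1/w
    e <- drop(y - X %*% b)
    s2g <- tapply(e^2, group, mean)                # group variances e_g'e_g / T_g
    w <- as.numeric(s2g[as.integer(group)])        # each observation gets its group variance
  }
  list(coef = b, group_var = s2g)
}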

GLS: General Remarks


• GLS is great (BLUE) if we know Ω. Very rare case.
• It needs the specification of Ω –i.e., the functional form of the
autocorrelation and heteroscedasticity.
• If the specification is bad ⇒ estimates are biased.
• In general, GLS is used for larger samples, because more
parameters need to be estimated.
• Feasible GLS is not BLUE (unlike GLS); but, it is consistent and
asymptotically more efficient than OLS.
• We use GLS for inference and/or efficiency. OLS is still unbiased
and consistent.
• OLS and GLS estimates will be different due to sampling error. But,
if they are very different, then it is likely that some other CLM
assumption is violated –likely, (A2’).


Baltagi and Griffin’s Gasoline Data (Greene)


World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are

COUNTRY = name of country


YEAR = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG = log of real price of gasoline
LCARPCAP = log of per capita number of cars

See Baltagi (2001, p. 24) for analysis of these data. The article on
which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline
Demand in the OECD: An Application of Pooling and Testing
Procedures," European Economic Review, 22, 1983, pp. 117-
137. The data were downloaded from the website for Baltagi's text.

Baltagi and Griffin’s Gasoline Data (Greene) – ANOVA


White Estimator vs. Standard OLS (Greene)

BALTAGI & GRIFFIN DATA SET

Standard OLS
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]|
+--------+--------------+----------------+--------+--------+
Constant| 2.39132562 .11693429 20.450 .0000
LINCOMEP| .88996166 .03580581 24.855 .0000
LRPMG | -.89179791 .03031474 -29.418 .0000
LCARPCAP| -.76337275 .01860830 -41.023 .0000

| White heteroscedasticity robust covariance matrix |


+----------------------------------------------------+
Constant| 2.39132562 .11794828 20.274 .0000
LINCOMEP| .88996166 .04429158 20.093 .0000
LRPMG | -.89179791 .03890922 -22.920 .0000
LCARPCAP| -.76337275 .02152888 -35.458 .0000

Baltagi and Griffin’s Gasoline Data (Greene) – Harvey’s Model
----------------------------------------------------------------------
Multiplicative Heteroskedastic Regression Model...
Ordinary least squares regression ............
LHS=LGASPCAR Mean = 4.29624
Standard deviation = .54891
Number of observs. = 342
Model size Parameters = 4
Degrees of freedom = 338
Residuals Sum of squares = 14.90436
Wald statistic [17 d.f.] = 699.43 (.0000) (Large)
B/P LM statistic [17 d.f.] = 111.55 (.0000) (Large)
Cov matrix for b is sigma^2*inv(X'X)(X'WX)inv(X'X)
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
Constant| 2.39133*** .20010 11.951 .0000
LINCOMEP| .88996*** .07358 12.094 .0000 -6.13943
LRPMG| -.89180*** .06119 -14.574 .0000 -.52310
LCARPCAP| -.76337*** .03030 -25.190 .0000 -9.04180
--------+-------------------------------------------------------------


Baltagi and Griffin’s Gasoline Data (Greene) – Variance Estimates = log[e(i)’e(i)/T]
Sigma| .48196*** .12281 3.924 .0001
D1| -2.60677*** .72073 -3.617 .0003 .05556
D2| -1.52919** .72073 -2.122 .0339 .05556
D3| .47152 .72073 .654 .5130 .05556
D4| -3.15102*** .72073 -4.372 .0000 .05556
D5| -3.26236*** .72073 -4.526 .0000 .05556
D6| -.09099 .72073 -.126 .8995 .05556
D7| -1.88962*** .72073 -2.622 .0087 .05556
D8| .60559 .72073 .840 .4008 .05556
D9| -1.56624** .72073 -2.173 .0298 .05556
D10| -1.53284** .72073 -2.127 .0334 .05556
D11| -2.62835*** .72073 -3.647 .0003 .05556
D12| -2.23638*** .72073 -3.103 .0019 .05556
D13| -.77641 .72073 -1.077 .2814 .05556
D14| -1.27341* .72073 -1.767 .0773 .05556
D15| -.57948 .72073 -.804 .4214 .05556
D16| -1.81723** .72073 -2.521 .0117 .05556
D17| -2.93529*** .72073 -4.073 .0000 .05556

Baltagi and Griffin’s Gasoline Data (Greene) – OLS vs. Iterative FGLS
--------+-------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z]
--------+-------------------------------------------------
|Ordinary Least Squares
|Cov matrix for b is sigma^2*inv(X'X)(X'WX)inv(X'X)
Constant| 2.39133*** .20010 11.951 .0000
LINCOMEP| .88996*** .07358 12.094 .0000
LRPMG| -.89180*** .06119 -14.574 .0000
LCARPCAP| -.76337*** .03030 -25.190 .0000
--------+------------------------------------------------
|FGLS - Regression (mean) function
Constant| 1.56909*** .06744 23.267 .0000
LINCOMEP| .60853*** .02097 29.019 .0000
LRPMG| -.61698*** .01902 -32.441 .0000
LCARPCAP| -.66938*** .01116 -59.994 .0000

• It looks like a substantial gain in reduced standard errors. OLS and
GLS estimates are a bit different ⇒ problems?
