
RS-11

Lecture 11
GLS

CLM: Review
• Recall the CLM Assumptions
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3) Var[ε|X] = σ2 IT
(A4) X has full column rank –rank(X) = k–, where T ≥ k.

• OLS estimation: b = (X′X)-1X′y
  Var[b|X] = σ2 (X′X)-1
  ⇒ b unbiased and efficient (MVUE)

• If (A5) ε|X ~ N(0, σ2 IT)  ⇒  b|X ~ N(β, σ2 (X′X)-1)

Now, b is also the MLE (consistency, efficiency, invariance, etc.). (A5)
gives us finite sample results for b (and for tests: t-test, F-test, Wald tests).


CLM: Review - Relaxing the Assumptions


• Relaxing the CLM Assumptions:
(1) (A1) – Lecture 5. Now, we allow for some non-linearities in the
DGP.
⇒ As long as we have intrinsic linearity, b keeps its nice properties.

(2) (A4) and (A5) – Lecture 7. Now, X is stochastic: {xi, εi} i = 1, 2, ...., T
is a sequence of independent observations. We require X to have finite
means and variances. Similar requirement for ε, but we also require
E[ε] = 0. Two new assumptions:
(A2’) plim (X′ε/T) = 0.
(A4’) plim (X′X/T) = Q.
⇒ We only get asymptotic results for b (consistency, asymptotic
normality). Tests only have large sample distributions. Bootstrapping or
simulations may give us better finite sample behavior.

CLM: Review - Relaxing the Assumptions


(3) (A2’) – Lecture 8. Now, a new estimator is needed: IVE/2SLS. We
need to find a set of l variables, Z, such that
(1) plim(Z′X/T) ≠ 0 (relevance condition)
(2) plim(Z′ε/T) = 0 (validity condition –or exogeneity)

b2SLS = (X̂′X̂)-1 X̂′y

bIV = (Z′X)-1 Z′y

⇒ We only get asymptotic results for b2SLS (consistency, asymptotic
normality). Tests only have asymptotic distributions. Small sample
behavior may be bad. Problem: Finding Z.

(4) (A1) again! – Lecture 9. Any functional form is allowed. General
estimation framework: M-estimation, with only asymptotic results. A
special case: NLLS. Numerical optimization is needed.


Generalized Regression Model


• Now, we go back to the CLM Assumptions:
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3) Var[ε|X] = σ2 IT
(A4) X has full column rank – rank(X) = k –, where T ≥ k.

• We will relax (A3). The CLM assumes that observations are
uncorrelated and all are drawn from a distribution with the same
variance, σ2. Instead, we will assume:
(A3’) Var[ε|X] = Σ = σ2 Ω, where Ω ≠ IT

• The generalized regression model (GRM) allows the variances to


differ across observations and allows correlation across observations.

Generalized Regression Model: Implications


• From (A3) Var[ε|X] = σ2 IT  ⇒  Var[b|X] = σ2 (X′X)-1.

• The true variance of b under (A3’) should be:
VarT[b|X] = E[(b – β)(b – β)′|X]
          = (X′X)-1 E[X′εε′X|X] (X′X)-1
          = (X′X)-1 X′ΣX (X′X)-1

• Under (A3’), the usual estimator of Var[b|X] –i.e., s2 (X′X)-1– is
biased. If we want to use OLS, we need to estimate VarT[b|X].

• To avoid biased inference based on OLS, we would like to
estimate the unknown Σ. But it has T(T+1)/2 parameters. Too many
to estimate with only T observations!

Note: We used (A3) to derive our test statistics. A revision is needed!


Generalized Regression Model – Pure cases


• The generalized regression model:
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3’) Var[ε|X] = Σ = σ2 Ω.
(A4) X has full column rank – rank(X) = k –, where T ≥ k.

• Leading Cases:
– Pure heteroscedasticity: E[εiεj|X] = σij = σi2   if i = j
                                             = 0      if i ≠ j
  ⇒ Var[εi|X] = σi2

– Pure autocorrelation:   E[εiεj|X] = σij    if i ≠ j
                                             = σ2    if i = j

Generalized Regression Model – Pure cases


• Under pure heteroscedasticity, LS gives each observation a
weight of 1/T. But, if the variances are not equal, then some
observations (low variance ones) are more informative than others.

[Figure: Y plotted against X1, ..., X5; the spread of Y differs across the X values.]


Generalized Regression Model – Pure cases


• Under pure autocorrelation, LS is based on simple sums, so the
information that one observation (today’s) might provide about
another (tomorrow’s) is never used.

Note: Heteroscedasticity and autocorrelation are different problems


and generally occur with different types of data. But, the implications
for OLS are the same.

GR Model: OLS Properties


• Unbiased
Given assumption (A2), the OLS estimator b is still unbiased. (The proof
does not rely on Σ):
E[b|X] = β + (X′X)-1 E[X′ε|X] = β.

• Consistency
We relax (A2). Now, we use (A2’) instead. To get consistency,
we need VarT[b|X] → 0 as T → ∞:
VarT[b|X] = (X′X)-1 X′ΣX (X′X)-1
          = (1/T) (X′X/T)-1 (X′ΣX/T) (X′X/T)-1
Assumptions:
- plim (X′X/T) = QXX, a pd matrix of finite elements
- plim (X′ΣX/T) = QXΣX, a finite matrix.

Under these assumptions, we get consistency for OLS.


GR Model: OLS Properties


• Asymptotic normality?
√T (b – β) = (X′X/T)-1 (X′ε/√T)

Asymptotic normality for OLS followed from the application of the
CLT to X′ε/√T:

b →a N(β, (1/T) QXX-1 QXΣX QXX-1)

where QXΣX = limT→∞ Var[T-½ Σt xt εt].

In the context of the GR Model:
- Easy to do for heteroscedastic data. We can use the Lindeberg-
Feller (assuming only independence) version of the CLT.

- Difficult for autocorrelated data, since X′ε/√T is no longer an
independent sum. We will need more assumptions to get asymptotic
normality.

GR Model: Robust Covariance Matrix


• Σ = σ2 Ω is unknown. It has T(T+1)/2 elements to estimate. Too
many! A solution? Be explicit about (A3’): we model Σ.

• But, models for autocorrelation and/or heteroscedasticity may be
incorrect. The robust approach estimates VarT[b|X] without
specifying (A3’) –i.e., a covariance robust to misspecifications of (A3’).

• We need to estimate VarT[b|X] = (X′X)-1 X′ΣX (X′X)-1

• It is important to notice a distinction between estimating
Σ, a (TxT) matrix  ⇒ difficult with T observations,
& estimating
X′ΣX = Σi Σj σij xi xj′, a (kxk) matrix  ⇒ easier!


GR Model: Robust Covariance Matrix


• We will not be estimating Σ = σ2 Ω. That is, we are not estimating
T(T+1)/2 elements. Impossible with T observations!

• We will estimate X′ΣX = Σi Σj σij xi xj′, a (kxk) matrix. That is, we
are estimating [kx(k+1)]/2 elements.

• This distinction is very important in modern applied econometrics:
– The White estimator
– The Newey-West estimator

• Both estimators produce a consistent estimator of VarT[b|X]. To get
consistency, they both rely on the OLS residuals, e. Since b
consistently estimates β, the OLS residuals, e, are also consistent
estimators of ε. We use e to consistently estimate X′ΣX.

GR Model: XΣX
• Q: How does XΣX look like? Time series intuition.
We look at the simple linear model, with only one regressor (in this
case, xii is just a scalar). Assume xii is covariance stationary (see Lecture
13) with autocovariances γj. Then, we derive XΣX:

XΣX = Var[X’ε/√T] = Var[(1/√T) (x11+ x22 + ... + xTT)]


= (1/T)[Tγ0+(T -1)(γ1+γ-1)+(T -2)(γ2+γ-2)+... +1 (γT-1+γ1-T)]
1 T 1
  0   (T  j )( j   j )
T j 1
T 1
1 T 1
   j   j ( j   j )
j   T 1 T j 1

where γj is the autocovariance of xii at lag j (γ0= σ2 = variance of xii).


GR Model: XΣX
Under some conditions (autocovariances are “l-summable”, so
j j|γj|<∞), then
 1 T  
X ' X  var  
 T t 1
( xt et )  

p

j  
j

Note: In the frequency domain, we define the spectrum of xe at


frequency ω as:
1 
S ( )    j e  i j
2 j   1

Then, Q*= 2π S(0) (Q* is called the long-run variance.)

Covariance Matrix: The White Estimator


• The White estimator simplifies the estimation since it assumes
heteroscedasticity only –i.e., γj = 0 (for j ≠ 0). That is, Σ is a diagonal
matrix, with diagonal elements σi2. Thus, we need to estimate:
Q* = (1/T) X′ΣX = (1/T) Σi σi2 xi xi′

• The OLS residuals, e, are consistent estimators of ε. This suggests
using ei2 to estimate σi2.

That is, we estimate (1/T) X′ΣX with S0 = (1/T) Σi ei2 xi xi′.

Note: The estimator is also called the sandwich estimator or
the White estimator (also known as the Eicker-Huber-White
estimator).
Halbert White (1950-2012, USA)


Covariance Matrix: The White Estimator


• White (1980) shows that a consistent estimator of Var[b|X] is
obtained if the squared residual in observation i –i.e., ei2– is used as
an estimator of σi2. Taking the square root, one obtains a
heteroscedasticity-consistent (HC) standard error.

• Sketch of proof.
Suppose we observe εi. Then, each element of Q* would be equal to
E[εi2 xi xi′|xi].
Then, by the LLN, plim (1/T) Σi εi2 xi xi′ = plim (1/T) Σi σi2 xi xi′

Q: Can we replace εi2 by ei2? Yes, since the residuals e are consistent.
Then, the estimated HC variance is:
Est. VarT[b|X] = (1/T) (X′X/T)-1 [Σi ei2 xi xi′/T] (X′X/T)-1

Covariance Matrix: The White Estimator


• Note that (A3’) was not specified. That is, the White estimator is
robust to potential misspecification of the form of heteroscedasticity in (A3’).

• The White estimator allows us to make inferences using the OLS
estimator b in situations where heteroscedasticity is suspected, but we
do not know enough to identify its nature.

• Since there are many refinements of the White estimator, the White
estimator is usually referred to as HC0 (or just “HC”):
HC0 = (X′X)-1 X′ Diag[ei2] X (X′X)-1


The White Estimator: Some Remarks


(1) The White estimator is consistent, but it may not perform well in
finite samples –see MacKinnon and White (1985). A good small-
sample adjustment, HC3, follows the logic of the analysis of outliers:
HC3 = (X′X)-1 X′ Diag[ei2/(1 – hii)2] X (X′X)-1
where hii = xi′(X′X)-1 xi.
HC3 is also recommended by Long and Ervin (2000). (A hand-coded
sketch of HC3 appears after the R code below.)
(2) The White estimator is biased (show it!). Bias corrections are
popular –see above & Wu (1986).
(3) In large samples, SEs, t-tests and F-tests are asymptotically valid.
(4) The OLS estimator remains inefficient. But inferences are
asymptotically correct.
(5) The HC standard errors can be larger or smaller than the OLS
ones. It can make a difference to the tests.

The White Estimator: Some Remarks


(6) It is used, along with the Newey-West estimator, in almost all papers.
It is included in all the packaged software programs. In R, you can use the
library “sandwich” to calculate White SEs. They are easy to program:

# White SE in R
White_f <- function(y, X, b) {
  T <- length(y); k <- length(b)
  yhat <- X %*% b
  e <- y - yhat                                 # OLS residuals
  hhat <- t(X) * as.vector(t(e))                # k x T matrix with columns x_i * e_i
  G <- hhat %*% t(hhat)                         # sum_i e_i^2 x_i x_i'
  F <- t(X) %*% X
  V <- solve(F) %*% G %*% solve(F)              # White (HC0) covariance matrix
  white_se <- sqrt(diag(V))
  ols_se <- sqrt(diag(solve(F) * drop((t(e) %*% e)) / (T - k)))
  l_se <- list(white_se, ols_se)
  return(l_se) }
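With b from OLS in hand, a call such as White_f(y, X, b) returns the White and OLS
standard errors side by side. As a cross-check on remark (1) above, here is a minimal
hand-rolled sketch of HC3 (my own illustration, not part of the lecture code), using the
leverages hii = xi′(X′X)-1 xi:

# Hand-rolled HC3 standard errors (illustrative sketch; y and X as in White_f above)
HC3_f <- function(y, X) {
  XtX_inv <- solve(t(X) %*% X)
  b <- XtX_inv %*% t(X) %*% y                   # OLS coefficients
  e <- drop(y - X %*% b)                        # OLS residuals
  h <- diag(X %*% XtX_inv %*% t(X))             # leverages h_ii
  V <- XtX_inv %*% t(X) %*% diag(e^2 / (1 - h)^2) %*% X %*% XtX_inv
  sqrt(diag(V))                                 # HC3 standard errors
}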


Application: 3 Factor Fama-French Model

• We estimate the 3 factor F-F model for IBM returns, using monthly
data Jan 1990 – Aug 2016 (T = 320):
(U) IBMRet - rf = β0 + β1 (MktRet - rf) + β2 SMB + β3 HML + ε

> b <- solve(t(x)%*% x)%*% t(x)%*%y # OLS regression


> t(b)
x1 x2 x3
[1,] -0.2331471 0.01018722 0.0009802843 -0.004445901
> SE_b
x1 x2 x3
0.011286100 0.002671466 0.003727686 0.003811820 ⟹ OLS SE

> library(sandwich)
> reg <- lm(y~x -1)
> VCHC <- vcovHC(reg, type = "HC0")
> sqrt(diag(VCHC))
x xx1 xx2 xx3
0.011389299 0.002724617 0.004054315 0.004223813 ⟹ White SE HC0

Application: 3 Factor Fama-French Model


> VCHC <- vcovHC(reg, type = "HC3")
> sqrt(diag(VCHC))
x xx1 xx2 xx3
0.011583411 0.002808216 0.004322115 0.004416238 ⟹ White SE HC3


Baltagi and Griffin’s Gasoline Data (Greene)


World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are

COUNTRY = name of country


YEAR = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG = log of real price of gasoline
LCARPCAP = log of per capita number of cars

See Baltagi (2001, p. 24) for analysis of these data. The article on
which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline
Demand in the OECD: An Application of Pooling and Testing
Procedures," European Economic Review, 22, 1983, pp. 117-
137. The data were downloaded from the website for Baltagi's text.

Groupwise Heteroscedasticity: Gasoline (Greene)

Countries are ordered by the standard deviation of their 19 residuals.

Regression of log of per capita gasoline use on log of per


capita income, gasoline price and number of cars per capita for
18 OECD countries for 19 years. The standard deviation varies
by country. The “solution” is “weighted least squares.”


White Estimator vs. Standard OLS (Greene)

BALTAGI & GRIFFIN DATA SET

Standard OLS
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]|
+--------+--------------+----------------+--------+--------+
Constant| 2.39132562 .11693429 20.450 .0000
LINCOMEP| .88996166 .03580581 24.855 .0000
LRPMG | -.89179791 .03031474 -29.418 .0000
LCARPCAP| -.76337275 .01860830 -41.023 .0000

| White heteroscedasticity robust covariance matrix |


+----------------------------------------------------+
Constant| 2.39132562 .11794828 20.274 .0000
LINCOMEP| .88996166 .04429158 20.093 .0000
LRPMG | -.89179791 .03890922 -22.920 .0000
LCARPCAP| -.76337275 .02152888 -35.458 .0000

Autocorrelated Residuals: Gasoline Demand

logG=β1 + β2logPg + β3logY + β4logPnc + β5logPuc + ε


Newey-West Estimator
• Now, we also have autocorrelation. We need to estimate
Q* = (1/T) X′ΣX = (1/T) Σi Σj σij xi xj′

• Newey and West (1987) follow White (1980) to produce a HAC
(Heteroscedasticity and Autocorrelation Consistent) estimator of Q*,
also referred to as the long-run variance (LRV): Use ei ej to estimate σij
⇒ natural estimator of Q*: (1/T) Σi Σj xi ei ej xj′

Or, using time series notation, estimator of Q*: (1/T) Σt Σs xt et es xs′

• That is, we have a sum of the estimated autocovariances of xtεt, Γj:

Q* = Σj=-(T-1),...,T-1 ΓT(j),   where ΓT(j) = E[xt εt εt-j xt-j′]

Whitney Newey, USA Kenneth D. West, USA

Newey-West Estimator
• Natural estimator of Q*: ST = (1/T) Σt Σs xt et es xs′.

Note: If the xtεt are serially uncorrelated, the autocovariances vanish. We
are left with the White estimator.

Under some conditions (autocovariances are “l-summable”), then

Q* = Var[(1/√T) Σt=1,...,T (xt et)]  →p  Σj=-∞,...,∞ ΓT(j)

• Natural estimator of Q*: ST = Σj Γ̂T(j)

• We can estimate Q* in two ways:


(1) parametrically, assuming a model to calculate γj.
(2) non-parametrically, using kernel estimation.
Note: (1) needs a specification of (A3’); while (2) does not.
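As a sketch of route (1), assuming an AR(2) is an adequate model for the scalar series
zt = xt et (my own illustration, not from the slides), the long-run variance implied by the
fitted AR coefficients is σ̂u2/(1 – φ̂1 – φ̂2)2:

# Parametric long-run variance of a scalar series z_t = x_t * e_t via an AR(2) fit (sketch)
parametric_lrv <- function(z) {
  fit <- ar(z, aic = FALSE, order.max = 2)   # Yule-Walker AR(2) fit
  phi <- fit$ar                              # AR coefficients phi_1, phi_2
  s2u <- fit$var.pred                        # innovation variance
  s2u / (1 - sum(phi))^2                     # implied long-run variance Q*
}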


Newey-West Estimator
• The parametric estimation uses an ARMA model – say, an AR(2) –
to calculate γj.

• The non-parametric estimation uses:

ST = Σj=-L,...,L k(j) Γ̂T(j),  where  Γ̂T(j) = (1/T) Σt=j+1,...,T xt et et-j xt-j′ = Γ̂T(-j)′  (j ≥ 0).

• Issues:
– Order of ARMA in parametric estimation or number of lags (L) in
non-parametric estimation.
– Choice of k(j) weights –i.e., kernel choice.
– The estimator, ST, needs to be psd.

• NW propose a robust –no model for (A3’) needed– non-parametric


estimator.

Newey-West Estimator

• Natural estimator of Q*: ST = Σj Γ̂T(j)

Issue 1: This sum has T2 terms. It is difficult to get convergence.

Solution: We need to make sure the sum converges. Cutting the sum
short is one way to do it, but we need to be careful: for consistency the
sum needs to grow as T → ∞ (we need to sum infinitely many Γj’s).

• Trick: Use a truncation lag, L, that grows with T but at a slower rate
–i.e., L = L(T); say, L = 0.75*T1/3 – 1. Then, as T → ∞ and L/T → 0:

QT* = Σj=-L(T),...,L(T) ΓT(j)  →p  Q*

• Replacing ΓT(j) by its estimate, we get ST, which would be consistent
for Q* provided that L(T) does not grow too fast with T.


Newey-West Estimator
• Issue 2 (& 3): ST needs to be psd to be a proper covariance matrix.

• Newey-West (1987): Based on a quadratic form and using the
Bartlett kernel, they produce a consistent psd estimator of Q*:

ST = Σj=-(T-1),...,T-1 k(j/L(T)) Γ̂T(j)

where k(j/L(T)) = 1 – |j|/(L + 1) is the Bartlett kernel or window,
and L(T) is its bandwidth.

• Intuition for Bartlett kernel: Use weights in the sum that imply that
the process becomes less autocorrelated as time goes by –i.e., the
terms have a lower weight in the sum as the difference between t and
s grows.
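To make the weights concrete, here is a small R sketch (my own illustration) of the
rule-of-thumb bandwidth L = 0.75*T1/3 – 1 and the Bartlett weights 1 – |j|/(L + 1):

# Bartlett weights for a given sample size T (illustrative sketch)
bartlett_weights <- function(T) {
  L <- floor(0.75 * T^(1/3) - 1)           # rule-of-thumb truncation lag
  j <- 0:L                                 # lags actually used
  w <- 1 - abs(j) / (L + 1)                # Bartlett kernel: linearly decaying weights
  data.frame(lag = j, weight = w)
}

bartlett_weights(320)                      # T = 320, as in the Fama-French application

For T = 320 this gives L = 4, the lag used with NeweyWest() later in this lecture.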

Newey-West Estimator
• Other kernels work too. Typical requirements for k(.):
– |k(x)| ≤ 1;
– k(0) = 1;
– k(x) = k(−x) for all x∈R,
– ∫|k(x)| dx <∞;
– k(.) is continuous at 0 and at all but a finite number of other points
in R, and


– ∫ k(x) e-iλx dx ≥ 0 for all λ ∈ R.

The last condition is a bit technical and ensures psd; see Andrews
(1991).


Newey-West Estimator
• Two components for the NW HAC estimator:
(1) Start with the Heteroscedasticity Component:
S0 = (1/T) Σi ei2 xi xi′  – the White estimator.

(2) Add the Autocorrelation Component:
ST = S0 + (1/T) Σl kL(l) Σt=l+1,...,T (xt-l et-l et xt′ + xt et et-l xt-l′)
where
kL(l) = 1 – |l|/(L + 1)  – the Bartlett kernel
⇒ linearly decaying weights.
Then,
Est. Var[b] = (1/T) (X′X/T)-1 ST (X′X/T)-1  – NW’s HAC Var.

• Under suitable conditions, as L, T → ∞, and L/T→ 0, ST → Q*.


Asymptotic inferences on β, based on OLS b, can be done with t-test
and Wald tests using N(0,1) and χ2 critical values, respectively.

NW Estimator: Alternative Computation


• The sum-of-covariance estimator can alternatively be computed in
the frequency domain as a weighted average of periodogram ordinates
(an estimator of the spectrum at frequency 2πj/T. To be discussed
in the Time Series lectures.):

STWP = 2π Σj=1,...,T-1 KT(2πj/T) Ixe,ex(2πj/T)

where KT(ω) = T-1 Σu=0,...,T-1 k(u/L) e-iωu, and Ixe,ex is the periodogram of xtet at
frequency ω:

Ixe,ex(ω) = dxe(ω) dxe(ω)′,  where dxe(ω) = (2πT)-1/2 Σt=1,...,T (xt et) e-iωt

• Under suitable conditions, as L & T → ∞ and L/T → 0,
STWP → Q*.


NW Estimator: Kernel Choice


• Other kernels, kL(l), besides the Bartlett kernel, can be used:

- Parzen kernel –Gallant (1987):
kL(l) = 1 – 6l2 + 6|l|3        for 0 ≤ |l| ≤ 1/2
      = 2 (1 – |l|)3           for 1/2 ≤ |l| ≤ 1
      = 0                      otherwise

- Quadratic spectral (QS) kernel –Andrews (1991):
kL(l) = 25/(12π2l2) [sin(6πl/5)/(6πl) – cos(6πl/5)]

- Daniell kernel –Ng and Perron (1996):
kL(l) = sin(πl)/(πl)

• These kernels are all symmetric about the vertical axis. The Bartlett
and Parzen kernels have a bounded support [−1, 1], but the other
two have unbounded support.
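As an illustration of the formulas above (my own sketch, not lecture code), the three
kernels can be written as simple R functions and evaluated on a grid of l values:

# Kernel weight functions used for HAC estimation (illustrative sketch)
parzen_k <- function(l) {
  al <- abs(l)
  ifelse(al <= 0.5, 1 - 6 * l^2 + 6 * al^3,
         ifelse(al <= 1, 2 * (1 - al)^3, 0))
}
qs_k <- function(l) {                      # quadratic spectral (Andrews, 1991)
  ifelse(l == 0, 1,
         25 / (12 * pi^2 * l^2) * (sin(6 * pi * l / 5) / (6 * pi * l) - cos(6 * pi * l / 5)))
}
daniell_k <- function(l) ifelse(l == 0, 1, sin(pi * l) / (pi * l))

l_grid <- seq(-2, 2, by = 0.1)
cbind(l = l_grid, parzen = parzen_k(l_grid), qs = qs_k(l_grid), daniell = daniell_k(l_grid))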

NW Estimator: Kernel Choice

[Figure: kernel weight functions kL(x) plotted against x.]
• Q: In practice –i.e., in finite samples– which kernel to use? And
L(T)? Asymptotic theory does not help us to determine them.

• Andrews (1991) finds optimal kernels and bandwidths by


minimizing the (asymptotic) MSE of the LRV. The QS kernel is
8.6% more efficient than the Parzen kernel; the Bartlett kernel is the
worst one. (BTW, different kernels have different optimal L.)


NW Estimator: Remarks
• Today, the HAC estimators are usually referred to as NW estimators,
regardless of the kernel used, if they produce a psd covariance matrix.

• All econometric packages (SAS, SPSS, EViews, etc.) calculate NW
SEs. In R, you can use the library “sandwich” to calculate NW SEs:
> NeweyWest(x, lag = NULL, order.by = NULL, prewhite = TRUE, adjust = FALSE,
diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", data = list(), verbose = FALSE)

Example:
## fit the 3 factor Fama-French model for IBM returns
fit <- lm(y ~ x -1)

## NeweyWest computes the NW SEs. It requires lag = L & suppression of prewhitening
NeweyWest(fit, lag = 4, prewhite = FALSE)

Note: It is usually found that the NW SEs are downward biased.

NW Estimator: Remarks
• You can also program the NW SEs yourself. In R:
# Newey-West SE in R
NW_f <- function(y, X, b, lag) {
  T <- length(y); k <- length(b)
  yhat <- X %*% b
  e <- y - yhat                                  # OLS residuals
  hhat <- t(X) * as.vector(t(e))                 # k x T matrix with columns x_t * e_t
  G <- matrix(0, k, k)
  w <- numeric(2 * lag + 1)
  a <- 0
  while (a <= lag) {
    Ta <- T - a
    w[lag + 1 + a] <- (lag + 1 - a) / (lag + 1)  # Bartlett weights
    za <- hhat[, (a + 1):T] %*% t(hhat[, 1:Ta])  # sum_t x_t e_t e_{t-a} x_{t-a}'
    ga <- if (a == 0) za else za + t(za)         # add Gamma(a) + Gamma(a)' for a > 0
    G <- G + w[lag + 1 + a] * ga
    a <- a + 1
  }
  F <- t(X) %*% X
  V <- solve(F) %*% G %*% solve(F)               # NW HAC covariance matrix
  nw_se <- sqrt(diag(V))
  ols_se <- sqrt(diag(solve(F) * drop((t(e) %*% e)) / (T - k)))
  l_se <- list(nw_se, ols_se)
  return(l_se)
}

NW_f(y, X, b, lag = 4)


NW Estimator: Example in R
Example: We estimate the 3 factor F-F model for IBM returns:
> library(sandwich)
> reg <- lm(y~x -1)
> reg$coefficients
x xx1 xx2 xx3
-0.2331470817 0.0101872239 0.0009802843 -0.0044459013 ⟹ OLS b

> SE_HC <- diag(sqrt(abs(vcovHC(reg, type= "HC3"))))


> SE_HC
x xx1 xx2 xx3
0.011583411 0.002808216 0.004322115 0.004416238 ⟹ White SE HC3

> NW <- NeweyWest(reg, lag = 4, prewhite = FALSE)


> SE_NW <- diag(sqrt(abs(NW)))
> SE_NW
x xx1 xx2 xx3
0.023629578 0.002798171 0.003895311 0.005431146 ⟹ NW SE

NW Estimator: Remarks
• Parametric estimators of Q* are simple and perform reasonably
well. But, we need to specify the ARMA model. Thus, they are not
robust to misspecification of (A3’). This is the appeal of White & NW.

• NW SEs perform poorly in Monte Carlo simulations:


- NW SEs tend to be downward biased.
- The finite-sample performance of tests using NW SE is not well
approximated by the asymptotic theory.
- Tests have serious size distortions.

• A key assumption in establishing consistency is that L → ∞ as
T → ∞, but L/T → 0. But, in practice, the fraction L/T is never equal
to 0; it approaches some positive fraction b (b ∈ (0,1]). Under this
situation, we need new asymptotics to derive the properties of the estimator.


NW Estimator: Remarks
• There are estimators of Q* that are not consistent, but with better
small sample properties. See Kiefer, Vogelsang and Bunzel (2000).

• The SEs based on these inconsistent estimators of Q* that are used
for testing are referred to as Heteroskedasticity-Autocorrelation Robust
(HAR) SEs.

• More on this topic in Lecture 13.

References: Müller (2014) & Sun (2014). There is a recent review (not
that technical) paper by Lazarus, Lewis, Stock & Watson (2016) with
recommendations on how to use these HAR estimators.

Autocorrelated Residuals: Gasoline Demand

logG=β1 + β2logPg + β3logY + β4logPnc + β5logPuc + ε


NW Estimator vs. Standard OLS (Greene)


BALTAGI & GRIFFIN DATA SET
--------+--------------------------------------------------
Variable| Coefficient Standard Error t-ratio P[|T|>t]
Standard OLS
-------+---------------------------------------------------
Constant| -21.2111*** .75322 -28.160 .0000
LP| -.02121 .04377 -.485 .6303
LY| 1.09587*** .07771 14.102 .0000
LPNC| -.37361** .15707 -2.379 .0215
LPUC| .02003 .10330 .194 .8471
--------+------------------------------------------------
--------+--------------------------------------------------
Variable| Coefficient Standard Error t-ratio P[|T|>t]
Robust VC Newey-West, Periods = 10
--------+--------------------------------------------------
Constant| -21.2111*** 1.33095 -15.937 .0000
LP| -.02121 .06119 -.347 .7305
LY| 1.09587*** .14234 7.699 .0000
LPNC| -.37361** .16615 -2.249 .0293
LPUC| .02003 .14176 .141 .8882
--------+--------------------------------------------------

Generalized Least Squares (GLS)


• Assumptions (A1), (A2), (A3’) & (A4) hold. That is,
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3’) Var[ε|X] = σ2 Ω   (recall Ω is symmetric ⇒ T′T = Ω for some T)
(A4) X has full column rank –i.e., rank(X) = k–, where T ≥ k.

Note: Ω is symmetric ⇒ there exists T ∋ T′T = Ω ⇒ T′-1 Ω T-1 = I

• We transform the linear model in (A1) using P = Ω-1/2.

P = Ω-1/2  ⇒  P′P = Ω-1
Py = PXβ + Pε    or
y* = X*β + ε*.
E[ε*ε*′|X*] = P E[εε′|X*] P′ = P E[εε′|X] P′ = σ2 P Ω P′
            = σ2 Ω-1/2 Ω Ω-1/2 = σ2 IT  ⇒ back to (A3)
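A minimal R sketch of this transformation (my own illustration; Omega is assumed
known here, e.g. built from a heteroscedasticity or AR(1) model as in the slides below):

# GLS via the transformation P = Omega^(-1/2), assuming Omega is known (illustrative sketch)
gls_by_transform <- function(y, X, Omega) {
  eo <- eigen(Omega, symmetric = TRUE)
  P  <- eo$vectors %*% diag(1 / sqrt(eo$values)) %*% t(eo$vectors)  # Omega^(-1/2)
  ystar <- P %*% y                      # transformed data: y* = P y, X* = P X
  Xstar <- P %*% X
  solve(t(Xstar) %*% Xstar) %*% t(Xstar) %*% ystar   # OLS on the transformed model = bGLS
}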


Generalized Least Squares (GLS)


• The transformed model is homoscedastic:
E[ε*ε*′|X*] = P E[εε′|X*] P′ = σ2 P Ω P′ = σ2 IT

• We have the CLM framework back  ⇒  we can use OLS!

• Key assumption: Ω is known, and, thus, P is also known; otherwise
we cannot transform the model.

• Q: Is Ω known?

Alexander C. Aitken (1895 –1967, NZ)

Generalized Least Squares (GLS)


• Aitken Theorem (1935): The Generalized Least Squares estimator.
Py = PXβ + Pε    or
y* = X*β + ε*.
E[ε*ε*′|X*] = σ2 IT
We can use OLS in the transformed model. It satisfies the G-M theorem.
Thus, the GLS estimator is:
bGLS = b* = (X*′X*)-1 X*′y* = (X′P′PX)-1 X′P′Py
     = (X′Ω-1X)-1 X′Ω-1y

Note I: bGLS ≠ b. bGLS is BLUE by construction, b is not.

Note II: Both are unbiased and consistent. In practice, the two estimators
will be different, but not that different. If they are very different,
something is rotten in Denmark.


Generalized Least Squares (GLS)


• Check unbiasedness:
bGLS = (X′Ω-1X)-1 X′Ω-1y = β + (X′Ω-1X)-1 X′Ω-1 ε
E[bGLS|X] = β

• Efficient Variance
Var[bGLS|X] = E[(bGLS – β)(bGLS – β)′|X]
            = E[(X′Ω-1X)-1 X′Ω-1 εε′ Ω-1X (X′Ω-1X)-1|X]
            = (X′Ω-1X)-1 X′Ω-1 E[εε′|X] Ω-1X (X′Ω-1X)-1
            = σ2 (X′Ω-1X)-1

Note: bGLS is BLUE. This “best” variance can be derived from
Var[bGLS|X] = σ2 (X*′X*)-1 = σ2 (X′Ω-1X)-1
Then, under (A3’) the OLS estimator is inefficient and the usual estimator
of its variance is biased!

Generalized Least Squares (GLS)


• If we relax the CLM assumptions (A2) and (A4), as we did in
Lecture 7, we only have asymptotic properties for GLS:
– Consistency - “well behaved data.”
– Asymptotic distribution under usual assumptions.
(easy for heteroscedasticity, complicated for autocorrelation.)
– Wald tests and F-tests with usual asymptotic χ2 distributions.


Consistency (Greene)
Use mean square convergence:

Var[β̂|X] = (σ2/n) (X′Ω-1X/n)-1  → 0?

Requires (X′Ω-1X/n) to be “well behaved”:
either converge to a constant matrix or diverge.

Heteroscedasticity case:
X′Ω-1X/n = (1/n) Σi=1,...,n (1/ωii) xi xi′

Autocorrelation case:
X′Ω-1X/n = (1/n) Σi=1,...,n Σj=1,...,n ωij xi xj′, with ωij the (i,j) element of Ω-1.
There are n2 terms. Convergence is unclear.

Consistency – Autocorrelation case (Greene)

X′Ω-1X/T = (1/T) Σj=1,...,T Σi=1,...,T ωij xi xj′  – a double sum over all T2 pairs,
whose behavior depends on the correlations ρts.

• If the {Xt} were uncorrelated –i.e., ρk = 0–, then Var[bGLS|X] → 0.

• We need to impose restrictions on the dependence among the Xt’s.
Usually, we require that the autocorrelation, ρk, gets weaker as t – s
grows (and the double sum becomes finite).


Asymptotic Normality (Greene)

√n (β̂ – β) = (X′Ω-1X/n)-1 (1/√n) X′Ω-1ε

Converge to normal with a stable variance O(1)?

– (X′Ω-1X/n)-1 → a constant matrix?
– (1/√n) X′Ω-1ε → a mean to which we can apply the
central limit theorem?

Heteroscedasticity case:
(1/√n) X′Ω-1ε = (1/√n) Σi=1,...,n xi εi/ωi.   Var[xi εi/ωi] = σ2 xi xi′/ωi;  xi is just data.
Apply Lindeberg-Feller. (Or assume xi/ωi is a draw from a common
distribution with mean and fixed variance – some recent treatments.)

Autocorrelation case?

Asymptotic Normality – Autocorrelation case

For the autocorrelation case:

(1/√n) X′Ω-1ε = (1/√n) Σi=1,...,n Σj=1,...,n ωij xi εj,  with ωij the (i,j) element of Ω-1.

Does the double sum converge? Uncertain. It requires the elements
of Ω-1 to become small as the distance between i and j increases.
(It has to resemble the heteroscedasticity case.)

• The dependence is usually broken by assuming {xt εt} form a mixing
sequence. The intuition behind mixing is simple; but the formal
details and its application to the CLT can get complicated.

• Intuition: {Zt} is a mixing sequence if any two groups of terms of the


sequence that are far apart from each other are approximately
independent --and the further apart, the closer to being independent.


Brief Detour: Time Series


• With autocorrelated data, we get dependent observations. Recall,
εt = ρ εt-1 + ut

• The independence assumption (A2’) is violated. The LLN and the
CLT cannot be easily applied in this context. We need new tools
and definitions.

• We will introduce the concepts of stationarity and ergodicity. The


ergodic theorem will give us a counterpart to the LLN.

• To get asymptotic distributions, we also need a CLT for dependent


variables, using the concept of mixing and stationarity. Or we can
rely on the martingale CLT. We will leave this as “coming attractions.”

Brief Detour: Time Series - Stationarity


• Consider the joint probability distribution of the collection of RVs:

F(zt1, zt2, ..., ztn) = P(Zt1 ≤ zt1, Zt2 ≤ zt2, ..., Ztn ≤ ztn)

Then, we say that a process is

1st-order stationary if F(zt1) = F(zt1+k) for any t1, k

2nd-order stationary if F(zt1, zt2) = F(zt1+k, zt2+k) for any t1, t2, k

Nth-order stationary if F(zt1, ..., ztn) = F(zt1+k, ..., ztn+k) for any t1, ..., tn, k

• Definition. A process is strongly (strictly) stationary if it is a Nth-order


stationary process for any N.


Brief Detour: Time Series – Moments


• The moments describe a distribution. We calculate the moments as
usual:

E(Zt) = μt = ∫ zt f(zt) dzt

Var(Zt) = σt2 = E(Zt – μt)2 = ∫ (zt – μt)2 f(zt) dzt

Cov(Zt1, Zt2) = E[(Zt1 – μt1)(Zt2 – μt2)]

ρ(t1, t2) = Cov(Zt1, Zt2) / √(σt12 σt22)

• Stationarity requires all these moments to be independent of time.

Brief Detour: Time Series – Moments


• For a strictly stationary process: μt = μ and σt2 = σ2,

because F(zt1) = F(zt1+k)  ⇒  μt1 = μt1+k = μ,

provided that E(Zt) < ∞ and E(Zt2) < ∞.
Then, F(zt1, zt2) = F(zt1+k, zt2+k)  ⇒
Cov(zt1, zt2) = Cov(zt1+k, zt2+k)  ⇒
ρ(t1, t2) = ρ(t1+k, t2+k)
Let t1 = t – k and t2 = t; then
ρ(t1, t2) = ρ(t – k, t) = ρ(t, t + k) = ρk

The correlation between any two RVs depends only on the time difference.


Brief Detour: Time Series – Weak Stationarity

• A process is said to be Nth-order weakly stationary if all its joint


moments up to order N exist and are time invariant.

• A Covariance stationary process (or 2nd order weakly stationary) has:


– constant mean
– constant variance
– covariance function depends on time difference between RV.

Brief Detour: Time Series – Ergodicity


• We want to allow as much dependence as the LLN allows us to.
• But, stationarity is not enough, as the following example shows:

• Example: Let {Ut} be a sequence of i.i.d. RVs uniformly distributed
on [0, 1] and let Z be N(0, 1), independent of {Ut}.
Define Yt = Z + Ut. Then, Yt is stationary (why?), but

Ȳn = (1/n) Σt=1,...,n Yt  does not converge to  E(Yt) = 1/2;
instead,  Ȳn →p Z + 1/2.

The problem is that there is too much dependence in the sequence {Yt}.
In fact, the correlation between Y1 and Yt is always positive for any
value of t.
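A quick simulation sketch of this example (my own illustration): the time average
settles near Z + 1/2, not near E(Yt) = 1/2, no matter how large n is:

# Simulate Y_t = Z + U_t and compare the time average with E(Y_t) = 1/2 (illustrative sketch)
set.seed(123)
Z <- rnorm(1)                     # one draw of Z, shared by the whole realization
n <- 100000
U <- runif(n)
Y <- Z + U
c(time_average = mean(Y), Z_plus_half = Z + 0.5, expected_value = 0.5)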


Brief Detour: Time Series – Ergodicity of mean


• We want to estimate the mean of the process {Zt}, μ(Zt). But, we
need to distinguish between the ensemble average and the time average:

– Ensemble Average:   z̄ = (1/m) Σi=1,...,m Zi

– Time Series Average:   z̄ = (1/n) Σt=1,...,n Zt

Q: Which estimator is the most appropriate?
A: The Ensemble Average, the average of re-runs of identical experiments.
But, it is impossible to calculate. We only observe one experiment: Zt.

• Q: Under which circumstances can we use the time average (only one
realization of {Zt})? Is the time average an unbiased and consistent
estimator of the mean? The Ergodic Theorem gives us the answer.

Brief Detour: Time Series – Ergodicity of mean


• Recall the sufficient conditions for consistency of an estimator: the
estimator is asymptotically unbiased and its variance asymptotically
collapses to zero.

1. Q: Is the time average asymptotically unbiased? Yes.

E(z̄) = (1/n) Σt E(Zt) = (1/n) Σt μ = μ

2. Q: Is the variance going to zero as T grows? It depends.

Var(z̄) = (1/n2) Σt=1,...,n Σs=1,...,n Cov(Zt, Zs) = (γ0/n2) Σt=1,...,n Σs=1,...,n ρt-s
       = (γ0/n2) [(ρ0 + ρ1 + ⋯ + ρn-1) + (ρ1 + ρ0 + ρ1 + ⋯ + ρn-2)
                  + ⋯ + (ρ-(n-1) + ρ-(n-2) + ⋯ + ρ0)]


Brief Detour: Time Series – Ergodicity of mean


n 1
0 0
  (1 
k
var( z )  2
( n  k ) k  ) k
n k   ( n  1)
n k
n

0
 (1 
k ?
lim var( z )  lim ) k 
 0
n  n  n n
k

• If Zt were uncorrelated, the variance of the time average would be


O(n-1). Since independent random variables are necessarily
uncorrelated (but not vice versa), we have just recovered a form of the
LLN for independent data.

Q: How can we make the remaining part, the sum over the upper
triangle of the covariance matrix, go to zero as well?
A: We need to impose conditions on ρk. Conditions weaker than "they
are all zero;" but, strong enough to exclude the sequence of identical
copies.

Brief Detour: Time Series – Ergodicity of mean


• We use two inequalities to put upper bounds on the variance of the
time average:

| Σt=1,...,n-1 Σk=1,...,n-t ρk |  ≤  Σt=1,...,n-1 Σk=1,...,n-t |ρk|  ≤  Σt=1,...,n-1 Σk=1,...,∞ |ρk|

Covariances can be negative, so we upper-bound the sum of the actual
covariances by the sum of their magnitudes. Then, we extend the inner
sum so it covers all lags. This might of course be infinite (sequence-of-
identical-copies).

• Definition: A covariance-stationary process is ergodic for the mean if
plim z̄ = E(Zt) = μ

Ergodicity Theorem: A sufficient condition for ergodicity for
the mean is
γk → 0 as k → ∞


Brief Detour: Time Series – Ergodicity of 2nd moments

• A sufficient condition to ensure ergodicity for second moments is:

Σk |γk| < ∞

A process which is ergodic in the first and second moments is usually
referred to as ergodic in the wide sense.

• Ergodicity under the Gaussian Distribution
If {Zt} is a stationary Gaussian process, Σk |γk| < ∞
is sufficient to ensure ergodicity for all moments.

Note: Recall that only the first two moments are needed to describe
the normal distribution.

Test Statistics (Assuming Known Ω) (Greene)

• Back to GLS. From (A1)-(A4), we get:
bGLS = (X*′X*)-1 X*′y* = (X′Ω-1X)-1 X′Ω-1y
Var[bGLS|X] = σ2 (X*′X*)-1 = σ2 (X′Ω-1X)-1

• With known Ω, apply all familiar results to the transformed model:
- With normality, (A5) holds; t- and F-statistics apply to least squares
based on Py and PX.

- Without normality, we rely on asymptotic results, where we get
asymptotic normality for bGLS. We use Wald statistics and the chi-
squared distribution, still based on the transformed model.

• Key step to do GLS: Derive the transformation matrix P = Ω-1/2.


(Weighted) GLS: Pure Heteroscedasticity

• Key step to do GLS: Derive the transformation matrix P = Ω-1/2.

Var[ε] = σ2 Ω = σ2 diag(ω1, ω2, ..., ωn)

Ω-1/2 = diag(1/√ω1, 1/√ω2, ..., 1/√ωn)

β̂ = (X′Ω-1X)-1 (X′Ω-1y) = [Σi=1,...,n (1/ωi) xi xi′]-1 [Σi=1,...,n (1/ωi) xi yi]

σ̂2 = [Σi=1,...,n (yi – xi′β̂)2/ωi] / (n – K)

⇒ WLS: Think of [ωi]-1/2 as weights.
We do OLS with the weighted data.
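A minimal R sketch of this WLS recipe (my own illustration; w plays the role of the
known ωi):

# WLS as OLS on weighted data, assuming the variance weights w = omega_i are known (sketch)
wls_by_hand <- function(y, X, w) {
  ystar <- y / sqrt(w)                    # multiply each observation by 1/sqrt(omega_i)
  Xstar <- X / sqrt(w)                    # rows of X get the same weight
  solve(t(Xstar) %*% Xstar) %*% t(Xstar) %*% ystar
}
# Equivalent built-in route: lm(y ~ X - 1, weights = 1/w)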

GLS: First-order Autocorrelation -AR(1)- Case

• Let εt = ρ εt-1 + ut (a first-order autocorrelated process).
Let ut = non-autocorrelated, white noise error ~ D(0, σu2)

• Then, εt = ρ εt-1 + ut  (the autoregressive form)
         = ρ(ρ εt-2 + ut-1) + ut
         = ...
         = ut + ρ ut-1 + ρ2 ut-2 + ρ3 ut-3 + ...
         = Σj=0,...,∞ ρj ut-j  (a moving average)

• Var[εt] = Σj=0,...,∞ ρ2j Var[ut-j] = Σj=0,...,∞ ρ2j σu2
          = σu2 /(1 – ρ2)   –we need to assume |ρ| < 1.

• Easier:
Var[εt] = ρ2 Var[εt-1] + Var[ut]  ⇒  Var[εt] = σu2 /(1 – ρ2)


GLS: AR(1) Case - Autocovariances

Continuing...
Cov[εt, εt-1] = Cov[ρ εt-1 + ut, εt-1]
             = ρ Cov[εt-1, εt-1] + Cov[ut, εt-1]
             = ρ Var[εt-1] = ρ Var[εt]
             = ρ σu2/(1 – ρ2)

Cov[εt, εt-2] = Cov[ρ εt-1 + ut, εt-2]
             = ρ Cov[εt-1, εt-2] + Cov[ut, εt-2]
             = ρ Cov[εt, εt-1]
             = ρ2 σu2/(1 – ρ2),  and so on.

GLS: AR(1) Case - Autocorrelation Matrix


• Now, we get Σ = σ2 Ω:

Σ = σu2/(1 – ρ2) ×
    [ 1       ρ       ρ2      ⋯   ρT-1
      ρ       1       ρ       ⋯   ρT-2
      ρ2      ρ       1       ⋯   ρT-3
      ⋮       ⋮       ⋮       ⋱   ⋮
      ρT-1    ρT-2    ρT-3    ⋯   1   ]

(Note: trace Ω = T, as required.)
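In R, this Ω is easy to build; a one-line sketch (my own illustration) exploits the fact
that the (t, s) element is ρ|t-s|:

# AR(1) correlation matrix Omega with (t,s) element rho^|t-s| (illustrative sketch)
ar1_omega <- function(T, rho) rho^abs(outer(1:T, 1:T, "-"))
ar1_omega(5, 0.6)    # small example: 5 x 5 Omega for rho = 0.6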


GLS: First-order Autocorrelation Case

• Then, we can get the transformation matrix P = Ω-1/2:

Ω-1/2 = [ √(1 – ρ2)   0    0   ⋯   0   0
          -ρ          1    0   ⋯   0   0
          0          -ρ    1   ⋯   0   0
          ⋮           ⋮    ⋮   ⋱   ⋮   ⋮
          0           0    0   ⋯  -ρ   1 ]

Ω-1/2 y = [ √(1 – ρ2) y1
            y2 – ρ y1
            y3 – ρ y2
            ⋯
            yT – ρ yT-1 ]   ⇒ GLS: Transformed y*.

GLS: The Autoregressive Transformation


• With AR models, sometimes it is easier to transform the data by
taking pseudo differences.

• For the AR(1) model, we have:

yt = xt′β + εt,   with εt = ρ εt-1 + ut
ρ yt-1 = ρ xt-1′β + ρ εt-1

yt – ρ yt-1 = (xt – ρ xt-1)′β + (εt – ρ εt-1)
yt – ρ yt-1 = (xt – ρ xt-1)′β + ut

(Where did the first observation go?)
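A short R sketch of this transformation (my own illustration; ρ is treated as known, and
the first observation gets the √(1 – ρ2) weighting shown above):

# Pseudo-differencing for the AR(1) model, with the Prais-Winsten first observation (sketch)
ar1_transform <- function(y, X, rho) {
  T <- length(y)
  ystar <- c(sqrt(1 - rho^2) * y[1], y[2:T] - rho * y[1:(T - 1)])
  Xstar <- rbind(sqrt(1 - rho^2) * X[1, ], X[2:T, ] - rho * X[1:(T - 1), ])
  list(ystar = ystar, Xstar = Xstar)      # then run OLS on (ystar, Xstar)
}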


GLS: Unknown Ω (Greene)

• Problem with GLS: Ω is unknown.

• Solution: Estimate Ω.  ⇒ Feasible GLS (FGLS).

• For now, we will consider two methods of estimation:
– Two-step, or Feasible estimation. Estimate Ω first, then do GLS.
Emphasize the same logic as White and Newey-West: We do not need
to estimate Ω. We need to find a matrix that behaves the same as
(1/T) X′Ω-1X.
– Nice asymptotic properties of the FGLS estimator.

• ML estimation of β, σ2, and Ω all at the same time.
– Joint estimation of all parameters. Fairly rare. Some generalities.
– We will examine two applications:
- Harvey’s model of heteroscedasticity
- Beach-MacKinnon on the AR(1) model (see Lecture 13).

GLS: Specification of Ω (Greene)

• Ω must be specified first.
• A full unrestricted Ω contains T(T+1)/2 - 1 parameters. (Why
minus 1? Remember, tr(Ω) = T, so one element is determined.)
• Ω is generally specified (modeled) in terms of a few parameters.
Thus, Ω = Ω(θ) for some small parameter vector θ. It becomes a
question of estimating θ.

• Examples:
(1) Var[εi|X] = σ2 exp(θ zi). Variance a function of θ and some
variable zi (say, firm size or country).

(2) εt with an AR(1) process. We have already derived σ2 Ω as a
function of ρ.


Harvey’s Model of Heteroscedasticity (Greene)

• The variance for observation i is a function of zi:
Var[εi|X] = σ2 exp(θ zi)
But, errors are not auto/cross correlated:
Cov[εi, εj|X] = 0

• The driving variable, z, can be firm size, or a set of dummy variables –
for example, for countries. This example is the one used for the
estimation of the previous groupwise heteroscedasticity model.

• Then, we have a functional form for Σ = σ2 Ω:
Σ = diagonal[exp(α + θ zi)],   α = log(σ2)
Once we specify θ (it can be estimated), GLS is feasible.

GLS: AR(1) Model of Autocorrelation (Greene)

• We have already derived Σ = σ2 Ω for the AR(1) case:

Σ = σu2/(1 – ρ2) ×
    [ 1       ρ       ρ2      ⋯   ρT-1
      ρ       1       ρ       ⋯   ρT-2
      ρ2      ρ       1       ⋯   ρT-3
      ⋮       ⋮       ⋮       ⋱   ⋮
      ρT-1    ρT-2    ρT-3    ⋯   1   ]

• Now, if we estimate σu2 and ρ, we can do FGLS.


Estimated AR(1) Model (Greene)


AR(1) Model: e(t) = rho * e(t-1) + u(t)
Initial value of rho = .87566
Maximum iterations = 1
Method = Prais - Winsten
Iter= 1, SS= .022, Log-L= 127.593
Final value of Rho = .959411
Std. Deviation: e(t) = .076512
Std. Deviation: u(t) = .021577
Autocorrelation: u(t) = .253173
N[0,1] used for significance levels
--------+-------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z]
--------+-------------------------------------------------
Constant| -20.3373*** .69623 -29.211 .0000
LP| -.11379*** .03296 -3.453 .0006
LY| .87040*** .08827 9.860 .0000
LPNC| .05426 .12392 .438 .6615
LPUC| -.04028 .06193 -.650 .5154
RHO| .95941*** .03949 24.295 .0000
--------+-------------------------------------------------
Standard OLS
Constant| -21.2111*** .75322 -28.160 .0000
LP| -.02121 .04377 -.485 .6303
LY| 1.09587*** .07771 14.102 .0000
LPNC| -.37361** .15707 -2.379 .0215
LPUC| .02003 .10330 .194 .8471

Two-Step Estimation (Greene)

• The general result for estimation when Ω is estimated.

• GLS uses [X′Ω-1X]-1 X′Ω-1y, which converges in probability to β.

• We seek a vector which converges to the same thing that this does.
Call it “Feasible GLS” or FGLS, based on [X′Ω̂-1X]-1 X′Ω̂-1y.

• The object is to find a set of parameters such that
[X′Ω̂-1X]-1 X′Ω̂-1y – [X′Ω-1X]-1 X′Ω-1y → 0


Feasible GLS (Greene)

For FGLS estimation, we do not seek an estimator of Ω such that

Ω̂ – Ω → 0

This makes no sense, since Ω̂ is nxn and does not “converge” to
anything. We seek a matrix Ω̂ such that

(1/n) X′Ω̂-1X – (1/n) X′Ω-1X → 0

For the asymptotic properties, we will require that

(1/√n) X′Ω̂-1ε – (1/√n) X′Ω-1ε → 0

Note in this case, these are two random vectors, which we require
to converge to the same random vector.

Two-Step FGLS (Greene)

• Theorem 8.5: To achieve full efficiency, we do not need an efficient
estimate of the parameters in Ω, only a consistent one.

• Q: Why?


Harvey’s Model (Greene)

• Examine Harvey’s model once again.
Estimation:
(1) Two-step FGLS: Use OLS to estimate θ. Then, use
{X′[Ω(θ)]-1X}-1 X′[Ω(θ)]-1y to estimate β.

(2) Full ML estimation. Estimate all parameters simultaneously.
A handy result due to Oberhofer and Kmenta –the “zig-zag”
approach.

Examine a model of groupwise heteroscedasticity.

Andrew C. Harvey, England

Harvey’s Model: Groupwise Heteroscedasticity

• We have a sample, yig, xig, ..., with
N groups, each with Tg observations.
Each group variance: Var[εig] = σg2

• Define a group dummy variable:
dig = 1 if observation ig is in group g,
    = 0 otherwise.
Then, model the variances as:
Var[εig] = σg2 = σ2 exp(θ2 d2 + … + θN dN)
Var1 = σ2            –normalized variance (remember the dummy trap!)
Var2 = σ2 exp(θ2)
... etc.


Harvey’s Model: Two-Step Procedure (Greene)

• OLS is still consistent. Do OLS and keep the residuals e.

Step 1. Using e, calculate the group variances. That is,
- Est.Var1 = e1′e1/T1 estimates σ2
- Est.Var2 = e2′e2/T2 estimates σ2 exp(θ2)
- Estimator of θ2 is ln[(e2′e2/T2)/(e1′e1/T1)]
- .... etc.
Step 2. Now, use FGLS –weighted least squares. Keep the WLS residuals.
Step 3. Using the WLS residuals, recompute the variance estimators.
Iterate until convergence between steps 2 and 3.
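A compact R sketch of this two-step/iterated procedure (my own illustration; group is
assumed to be a factor identifying the N groups):

# Two-step / iterated FGLS for groupwise heteroscedasticity (illustrative sketch)
groupwise_fgls <- function(y, X, group, iter = 10) {
  w <- rep(1, length(y))                           # start from OLS (equal weights)
  for (r in 1:iter) {
    b <- solve(t(X / w) %*% X) %*% t(X / w) %*% y  # weighted LS with weights 1/w
    e <- drop(y - X %*% b)
    s2g <- tapply(e^2, group, mean)                # group variances e_g'e_g / T_g
    w <- as.numeric(s2g[as.integer(group)])        # each observation gets its group variance
  }
  list(coef = b, group_var = s2g)
}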

GLS: General Remarks


• GLS is great (BLUE) if we know Ω. Very rare case.
• It needs the specification of Ω –i.e., the functional form of the
autocorrelation and heteroscedasticity.
• If the specification is bad ⇒ estimates are biased.
• In general, GLS is used for larger samples, because more
parameters need to be estimated.
• Feasible GLS is not BLUE (unlike GLS); but, it is consistent and
asymptotically more efficient than OLS.
• We use GLS for inference and/or efficiency. OLS is still unbiased
and consistent.
• OLS and GLS estimates will be different due to sampling error. But,
if they are very different, then it is likely that some other CLM
assumption is violated –likely, (A2’).


Baltagi and Griffin’s Gasoline Data (Greene)


World Gasoline Demand Data, 18 OECD Countries, 19 years
Variables in the file are

COUNTRY = name of country


YEAR = year, 1960-1978
LGASPCAR = log of consumption per car
LINCOMEP = log of per capita income
LRPMG = log of real price of gasoline
LCARPCAP = log of per capita number of cars

See Baltagi (2001, p. 24) for analysis of these data. The article on
which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline
Demand in the OECD: An Application of Pooling and Testing
Procedures," European Economic Review, 22, 1983, pp. 117-
137. The data were downloaded from the website for Baltagi's text.

Baltagi and Griffin’s Gasoline Data (Greene) – ANOVA


White Estimator vs. Standard OLS (Greene)

BALTAGI & GRIFFIN DATA SET

Standard OLS
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient | Standard Error |t-ratio |P[|T|>t]|
+--------+--------------+----------------+--------+--------+
Constant| 2.39132562 .11693429 20.450 .0000
LINCOMEP| .88996166 .03580581 24.855 .0000
LRPMG | -.89179791 .03031474 -29.418 .0000
LCARPCAP| -.76337275 .01860830 -41.023 .0000

| White heteroscedasticity robust covariance matrix |


+----------------------------------------------------+
Constant| 2.39132562 .11794828 20.274 .0000
LINCOMEP| .88996166 .04429158 20.093 .0000
LRPMG | -.89179791 .03890922 -22.920 .0000
LCARPCAP| -.76337275 .02152888 -35.458 .0000

Baltagi and Griffin’s Gasoline Data (Greene) – Harvey’s Model
----------------------------------------------------------------------
Multiplicative Heteroskedastic Regression Model...
Ordinary least squares regression ............
LHS=LGASPCAR Mean = 4.29624
Standard deviation = .54891
Number of observs. = 342
Model size Parameters = 4
Degrees of freedom = 338
Residuals Sum of squares = 14.90436
Wald statistic [17 d.f.] = 699.43 (.0000) (Large)
B/P LM statistic [17 d.f.] = 111.55 (.0000) (Large)
Cov matrix for b is sigma^2*inv(X'X)(X'WX)inv(X'X)
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
Constant| 2.39133*** .20010 11.951 .0000
LINCOMEP| .88996*** .07358 12.094 .0000 -6.13943
LRPMG| -.89180*** .06119 -14.574 .0000 -.52310
LCARPCAP| -.76337*** .03030 -25.190 .0000 -9.04180
--------+-------------------------------------------------------------


Baltagi and Griffin’s Gasoline Data (Greene) – Variance Estimates = log[e(i)’e(i)/T]
Sigma| .48196*** .12281 3.924 .0001
D1| -2.60677*** .72073 -3.617 .0003 .05556
D2| -1.52919** .72073 -2.122 .0339 .05556
D3| .47152 .72073 .654 .5130 .05556
D4| -3.15102*** .72073 -4.372 .0000 .05556
D5| -3.26236*** .72073 -4.526 .0000 .05556
D6| -.09099 .72073 -.126 .8995 .05556
D7| -1.88962*** .72073 -2.622 .0087 .05556
D8| .60559 .72073 .840 .4008 .05556
D9| -1.56624** .72073 -2.173 .0298 .05556
D10| -1.53284** .72073 -2.127 .0334 .05556
D11| -2.62835*** .72073 -3.647 .0003 .05556
D12| -2.23638*** .72073 -3.103 .0019 .05556
D13| -.77641 .72073 -1.077 .2814 .05556
D14| -1.27341* .72073 -1.767 .0773 .05556
D15| -.57948 .72073 -.804 .4214 .05556
D16| -1.81723** .72073 -2.521 .0117 .05556
D17| -2.93529*** .72073 -4.073 .0000 .05556

Baltagi and Griffin’s Gasoline Data (Greene) – OLS vs. Iterative FGLS
--------+-------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z]
--------+-------------------------------------------------
|Ordinary Least Squares
|Cov matrix for b is sigma^2*inv(X'X)(X'WX)inv(X'X)
Constant| 2.39133*** .20010 11.951 .0000
LINCOMEP| .88996*** .07358 12.094 .0000
LRPMG| -.89180*** .06119 -14.574 .0000
LCARPCAP| -.76337*** .03030 -25.190 .0000
--------+------------------------------------------------
|FGLS - Regression (mean) function
Constant| 1.56909*** .06744 23.267 .0000
LINCOMEP| .60853*** .02097 29.019 .0000
LRPMG| -.61698*** .01902 -32.441 .0000
LCARPCAP| -.66938*** .01116 -59.994 .0000

• It looks like a substantial gain in reduced standard errors. OLS and
GLS estimates are a bit different ⇒ problems?
