CLM: Review: - OLS Estimation
Lecture 11
GLS
CLM: Review
• Recall the CLM Assumptions
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3) Var[ε|X] = σ2 IT
(A4) X has full column rank –rank(X)=k–, where T ≥ k.
RS-11
(2) (A4) and (A5) – Lecture 7. Now, X is stochastic: {xi,εi}, i = 1, 2, ..., T,
is a sequence of independent observations. We require X to have finite
means and variances. There is a similar requirement for ε, but we also require
E[ε]=0. Two new assumptions:
(A2’) plim (X’ε/T) = 0.
(A4’) plim (X’X/T) = Q.
We only get asymptotic results for b (consistency, asymptotic
normality). Tests only have large-sample distributions. Bootstrapping or
simulations may give us better finite-sample behavior.
• Leading Cases:
– Pure heteroscedasticity: E[εiεj|X] = σij = σi2 if i=j
= 0 if i ≠ j
⟹ Var[εi|X] = σi2
• Consistency
We relax (A2) and use (A2’) instead. To get consistency,
we need VarT[b|X] → 0 as T → ∞:
VarT[b|X] = (X’X)-1 X’ΣX (X’X)-1
= (1/T) (X’X/T)-1 (X’ΣX/T) (X’X/T)-1
Assumptions:
- plim (X’X/T) = QXX, a pd matrix of finite elements
- plim (X’ΣX/T) = QXΣX, a finite matrix.
Under these assumptions, the (1/T) factor drives VarT[b|X] to zero.
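The two ways of writing the sandwich variance can be checked numerically. The following is a minimal sketch in R, assuming a known diagonal Σ and simulated regressors (the data, the variance function and all variable names are illustrative, not from the lecture):

```r
# Sketch: Var[b|X] in the GR model with a known, illustrative Sigma.
set.seed(1)
T <- 200
X <- cbind(1, rnorm(T))                 # T x k regressor matrix (hypothetical data)
sig2_i <- exp(0.5 * X[, 2])             # assumed heteroscedastic variances
Sigma <- diag(sig2_i)                   # T x T error covariance matrix

# Direct form: (X'X)^{-1} X' Sigma X (X'X)^{-1}
XtX_inv <- solve(crossprod(X))
V_sandwich <- XtX_inv %*% t(X) %*% Sigma %*% X %*% XtX_inv

# Scaled form: (1/T) (X'X/T)^{-1} (X' Sigma X/T) (X'X/T)^{-1}
Q_xx  <- crossprod(X) / T
Q_xsx <- t(X) %*% Sigma %*% X / T
V_scaled <- (1 / T) * solve(Q_xx) %*% Q_xsx %*% solve(Q_xx)

isTRUE(all.equal(V_sandwich, V_scaled)) # the two forms agree
```

The scaled form is the one used in the consistency argument: the two averages settle down to finite limits, so the leading 1/T shrinks the whole expression.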
• We will estimate X’ΣX = Σi Σj σij xi xj’, a (kxk) matrix. Since it is
symmetric, we are estimating [kx(k+1)]/2 elements.
GR Model: X’ΣX
• Q: What does X’ΣX look like? Time series intuition.
We look at the simple linear model, with only one regressor (in this
case, xiεi is just a scalar). Assume xiεi is covariance stationary (see Lecture
13) with autocovariances γj. Then, we derive X’ΣX:
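Under some conditions (the autocovariances are “l-summable”, so
Σj j|γj| < ∞),
$$\frac{1}{T}\, X'\Sigma X = \frac{1}{T}\,\mathrm{var}\Big(\sum_{t=1}^{T} x_t \varepsilon_t\Big) \xrightarrow{p} \sum_{j=-\infty}^{\infty} \gamma_j$$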
• Sketch of proof.
Suppose we observe εi. Then, each element of Q* would be equal to
E[εi2 xi xi’|xi].
Then, by the LLN, plim (1/T) Σi εi2 xi xi’ = plim (1/T) Σi σi2 xi xi’
Q: Can we replace εi2 by ei2? Yes, since the residuals e are consistent estimates of ε.
Then, the estimated HC variance is:
Est. VarT[b|X] = (1/T) (X’X/T)-1 [Σi ei2 xi xi’/T] (X’X/T)-1
• Since there are many refinements of the White estimator, the White
estimator is usually referred to as HC0 (or just “HC”):
HC0 = (X’X)-1 X’ Diag[ei2] X (X’X)-1
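The HC0 formula can be computed without forming the T x T diagonal matrix. A minimal sketch in base R, assuming simulated heteroscedastic data (the DGP and variable names are hypothetical):

```r
# Sketch: HC0 two ways on simulated data.
set.seed(42)
T <- 100
X <- cbind(1, rnorm(T))                             # hypothetical regressors
y <- X %*% c(1, 2) + rnorm(T) * (1 + abs(X[, 2]))   # heteroscedastic errors
b <- solve(crossprod(X), crossprod(X, y))           # OLS coefficients
e <- as.vector(y - X %*% b)                         # OLS residuals

XtX_inv <- solve(crossprod(X))
meat <- crossprod(X * e)                # sum_i e_i^2 x_i x_i' (row i of X scaled by e_i)
HC0 <- XtX_inv %*% meat %*% XtX_inv
hc0_se <- sqrt(diag(HC0))

# Same matrix using the explicit Diag[e_i^2], exactly as in the formula
HC0_diag <- XtX_inv %*% t(X) %*% diag(e^2) %*% X %*% XtX_inv
isTRUE(all.equal(HC0, HC0_diag))
```

Scaling the rows of X by the residuals avoids the T x T matrix, which matters only for large T; both routes give the same estimate.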
# White SE in R
White_f <- function(y,X,b) {
  T <- length(y); k <- length(b)
  yhat <- X%*%b
  e <- y-yhat                          # OLS residuals
  hhat <- t(X*as.vector(e))            # k x T: column t is x_t*e_t
  G <- hhat%*%t(hhat)                  # sum over all t of e_t^2 x_t x_t'
  F <- t(X)%*%X
  V <- solve(F)%*%G%*%solve(F)         # HC0 sandwich
  white_se <- sqrt(diag(V))
  ols_se <- sqrt(diag(solve(F)*drop((t(e)%*%e))/(T-k)))
  l_se <- list(white_se,ols_se)
  return(l_se) }
• We estimate the 3 factor F-F model for IBM returns, using monthly
data Jan 1990 – Aug 2016 (T=320):
(U) IBMRet - rf = β0 + β1 (MktRet - rf) + β2 SMB + β3 HML + ε
> library(sandwich)
> reg <- lm(y~x -1)
> VCHC <- vcovHC(reg, type = "HC0")
> sqrt(diag(VCHC))
x xx1 xx2 xx3
0.011389299 0.002724617 0.004054315 0.004223813 ⟹ White SE HC0
See Baltagi (2001, p. 24) for analysis of these data. The article on
which the analysis is based is Baltagi, B. and Griffin, J., "Gasoline
Demand in the OECD: An Application of Pooling and Testing
Procedures," European Economic Review, 22, 1983, pp. 117-137.
The data were downloaded from the website for Baltagi's text.
[Figure: Countries are ordered by the standard deviation of their 19 residuals.]
Standard OLS
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]|
+--------+--------------+----------------+--------+--------+
|Constant|    2.39132562|      0.11693429|  20.450|  0.0000|
|LINCOMEP|    0.88996166|      0.03580581|  24.855|  0.0000|
|LRPMG   |   -0.89179791|      0.03031474| -29.418|  0.0000|
|LCARPCAP|   -0.76337275|      0.01860830| -41.023|  0.0000|
+--------+--------------+----------------+--------+--------+
Newey-West Estimator
• Now, we also have autocorrelation. We need to estimate
Q* = (1/T) X’ΣX = (1/T) Σi Σj σij xi xj’
• Natural estimator of Q*, using time series notation: ST = (1/T) Σt Σs xt et es xs’.
Grouping terms by the lag j = t − s:
$$S_T = \sum_{j=-(T-1)}^{T-1} \hat\Gamma_T(j), \qquad \hat\Gamma_T(j) = \frac{1}{T}\sum_{t=j+1}^{T} x_t e_t e_{t-j} x_{t-j}' \ \ (j \ge 0), \qquad \hat\Gamma_T(-j) = \hat\Gamma_T(j)'$$
Newey-West Estimator
• The parametric estimation uses an ARMA model – say, an AR(2) –
to calculate γj.
Newey-West Estimator
• Natural estimator of Q*: $S_T = \sum_j \hat\Gamma_T(j)$
Solution: We need to make sure the sum converges. Cutting short the
sum is one way to do it, but we need to be careful: for consistency, the
number of terms in the sum needs to grow as T → ∞ (we need to sum infinitely many Γj’s).
• Trick: Use a truncation lag, L, that grows with T but at a slower rate
–i.e., L=L(T); say, L = 0.75*T^(1/3) − 1. Then, as T → ∞ and L/T → 0:
$$Q_T^* = \sum_{j=-L(T)}^{L(T)} \hat\Gamma_T(j) \xrightarrow{p} Q^*$$
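To see how slowly such a truncation lag grows, here is a small R sketch of the rule of thumb quoted above. The function name `trunc_lag` is hypothetical, and rounding down to an integer is an assumption (the rule itself does not say how to round):

```r
# Sketch: rule-of-thumb truncation lag L(T) = 0.75*T^(1/3) - 1, rounded down (assumption).
trunc_lag <- function(T) floor(0.75 * T^(1/3) - 1)

trunc_lag(320)                                   # sample size of the IBM example

# L grows with T, but L/T shrinks toward zero
T_grid <- c(100, 1000, 10000)
L_over_T <- sapply(T_grid, function(T) trunc_lag(T) / T)
L_over_T
```

With T = 320 this rule gives 4 after rounding down, which matches the `lag=4` used in the NW_f call in these notes.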
Newey-West Estimator
• Issue 2 (& 3): ST needs to be psd to be a proper covariance matrix.
• Intuition for Bartlett kernel: Use weights in the sum that imply that
the process becomes less autocorrelated as time goes by –i.e., the
terms have a lower weight in the sum as the difference between t and
s grows.
Newey-West Estimator
• Other kernels work too. Typical requirements for k(.):
– |k(x)| ≤ 1;
– k(0) = 1;
– k(x) = k(−x) for all x∈R,
– ∫|k(x)| dx <∞;
– k(.) is continuous at 0 and at all but a finite number of other points
in R, and
– $\int k(x)\, e^{-ixt}\, dx \ge 0$ for all t∈R.
The last condition is a bit technical and ensures psd; see Andrews
(1991).
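As an illustration of the requirements above, the sketch below codes two common kernels in R and checks k(0) = 1, symmetry and |k(x)| ≤ 1 numerically. The function names are hypothetical; the Bartlett weights use the convention x = j/(L+1), which is an assumption consistent with the weights (lag+1-a)/(lag+1) in the R code of these notes:

```r
# Sketch: Bartlett and Parzen kernels, both with support [-1, 1].
bartlett <- function(x) ifelse(abs(x) <= 1, 1 - abs(x), 0)
parzen <- function(x) {
  ax <- abs(x)
  ifelse(ax <= 0.5, 1 - 6 * ax^2 + 6 * ax^3,
         ifelse(ax <= 1, 2 * (1 - ax)^3, 0))
}

# Bartlett weights for lags j = 0, ..., L with truncation lag L = 4
L <- 4
j <- 0:L
w_bartlett <- bartlett(j / (L + 1))    # 1.0 0.8 0.6 0.4 0.2

# Numerical checks of some of the listed requirements on a grid
grid <- seq(-2, 2, by = 0.1)
c(k0 = parzen(0),
  symmetric = isTRUE(all.equal(parzen(grid), parzen(-grid))),
  bounded = all(abs(parzen(grid)) <= 1))
```

The weights decline linearly in the lag, so autocovariances at longer lags contribute less to the estimate.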
Newey-West Estimator
• Two components for the NW HAC estimator:
(1) Start with Heteroscedasticity Component:
S0 = (1/T) Σi ei2 xi xi’ – the White estimator.
• These kernels are all symmetric about the vertical axis. The Bartlett
and Parzen kernels have a bounded support [−1, 1], but the other
two have unbounded support.
[Figure: kernel functions kL(x) plotted against x]
• Q: In practice –i.e., in finite samples– which kernel to use? And
L(T)? Asymptotic theory does not help us to determine them.
NW Estimator: Remarks
• Today, HAC estimators are usually referred to as NW estimators,
regardless of the kernel used, if they produce a psd covariance matrix.
Example:
## fit the 3 factor Fama-French model for IBM returns
fit <- lm(y ~ x -1)
NW Estimator: Remarks
• You can also program the NW SEs yourself. In R:
NW_f <- function(y,X,b,lag) {
  T <- length(y); k <- length(b)
  yhat <- X%*%b
  e <- y - yhat                           # OLS residuals
  hhat <- t(X*as.vector(e))               # k x T: column t is x_t*e_t
  G <- matrix(0,k,k)
  a <- 0
  while (a <= lag) {
    Ta <- T - a
    w <- (lag+1-a)/(lag+1)                # Bartlett weight for lag a
    za <- hhat[,(a+1):T,drop=FALSE] %*% t(hhat[,1:Ta,drop=FALSE])  # sum_t x_t e_t e_(t-a) x_(t-a)'
    ga <- if (a == 0) za else za + t(za)  # lags a and -a enter together
    G <- G + w*ga
    a <- a+1
  }
  F <- t(X)%*%X
  V <- solve(F)%*%G%*%solve(F)            # NW sandwich
  nw_se <- sqrt(diag(V))
  ols_se <- sqrt(diag(solve(F)*drop((t(e)%*%e))/(T-k)))
  l_se <- list(nw_se,ols_se)
  return(l_se)
}
NW_f(y,X,b,lag=4)
NW Estimator: Example in R
Example: We estimate the 3 factor F-F model for IBM returns:
> library(sandwich)
> reg <- lm(y~x -1)
> reg$coefficients
x xx1 xx2 xx3
-0.2331470817 0.0101872239 0.0009802843 -0.0044459013 ⟹ OLS b
NW Estimator: Remarks
• Parametric estimators of Q* are simple and perform reasonably
well. But, we need to specify the ARMA model. Thus, they are not
robust to misspecification of (A3’). This is the appeal of White & NW.
NW Estimator: Remarks
• There are estimators of Q* that are not consistent, but with better
small sample properties. See Kiefer, Vogelsang and Bunzel (2000).
References: Müller (2014) & Sun (2014). There is a recent review (not
that technical) paper by Lazarus, Lewis, Stock & Watson (2016) with
recommendations on how to use these HAR estimators.
• Q: Is Ω known?
• Efficient Variance
Var[bGLS |X] = E[(bGLS - β)(bGLS - β)’|X]
= E[(X’Ω-1X)-1 X’Ω-1 εε’ Ω-1X (X’Ω-1X)-1|X]
= (X’Ω-1X)-1 X’Ω-1 E[εε’|X] Ω-1X (X’Ω-1X)-1
= σ2(X’Ω-1X)-1 (using E[εε’|X] = σ2Ω)
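A minimal R sketch of GLS with a known Ω, assuming a diagonal Ω and simulated data (the DGP and all names are illustrative). It computes bGLS from the matrix formula, checks that it equals OLS on the weighted data, and evaluates the variance formula above with σ2 replaced by an estimate:

```r
# Sketch: GLS with a known (here diagonal) Omega.
set.seed(7)
T <- 50
X <- cbind(1, rnorm(T))
omega <- exp(X[, 2])                          # assumed known diagonal of Omega
y <- X %*% c(1, 2) + rnorm(T) * sqrt(omega)   # hypothetical DGP
Omega_inv <- diag(1 / omega)

# b_GLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
XtOiX <- t(X) %*% Omega_inv %*% X
b_gls <- solve(XtOiX, t(X) %*% Omega_inv %*% y)

# Same estimator: OLS on data weighted by omega_i^(-1/2)
Xs <- X / sqrt(omega)
ys <- y / sqrt(omega)
b_wls <- solve(crossprod(Xs), crossprod(Xs, ys))

# Var[b_GLS|X] = sigma^2 (X' Omega^{-1} X)^{-1}, sigma^2 estimated from weighted residuals
s2 <- sum((ys - Xs %*% b_wls)^2) / (T - ncol(X))
V_gls <- s2 * solve(XtOiX)
isTRUE(all.equal(b_gls, b_wls))
```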
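Consistency (Greene)
Use mean square:
$$\mathrm{Var}[\hat\beta \mid X] = \frac{\sigma^2}{n}\left(\frac{X'\Omega^{-1}X}{n}\right)^{-1} \to 0\,?$$
Requires $X'\Omega^{-1}X/n$ to be "well behaved": either converge to a constant matrix or diverge.
Heteroscedasticity case:
$$\frac{X'\Omega^{-1}X}{n} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\omega_{ii}}\, x_i x_i'$$
Autocorrelation case ($\omega^{ij}$ denotes the (i,j) element of $\Omega^{-1}$):
$$\frac{X'\Omega^{-1}X}{n} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \omega^{ij}\, x_i x_j'$$
This has n2 terms. Convergence is unclear.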
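$$F(z_{t_1}, z_{t_2}, \ldots, z_{t_n}) = P(Z_{t_1} \le z_{t_1}, Z_{t_2} \le z_{t_2}, \ldots, Z_{t_n} \le z_{t_n})$$
$$\mathrm{Var}(Z_t) = \sigma_t^2 = E(Z_t - \mu_t)^2 = \int (z_t - \mu_t)^2 f(z_t)\, dz_t$$
$$\mathrm{Cov}(Z_{t_1}, Z_{t_2}) = E[(Z_{t_1} - \mu_{t_1})(Z_{t_2} - \mu_{t_2})]$$
$$\rho(t_1, t_2) = \frac{\mathrm{cov}(Z_{t_1}, Z_{t_2})}{\sqrt{\sigma_{t_1}^2}\,\sqrt{\sigma_{t_2}^2}}$$
The mean is constant because $F(z_{t_1}) = F(z_{t_1+k}) \Rightarrow \mu_{t_1} = \mu_{t_1+k} = \mu$,
provided that $E(|Z_t|) < \infty$, $E(Z_t^2) < \infty$.
Then, $F(z_{t_1}, z_{t_2}) = F(z_{t_1+k}, z_{t_2+k})$
$\Rightarrow \mathrm{cov}(z_{t_1}, z_{t_2}) = \mathrm{cov}(z_{t_1+k}, z_{t_2+k})$
$\Rightarrow \rho(t_1, t_2) = \rho(t_1+k, t_2+k)$
Let $t_1 = t-k$ and $t_2 = t$; then
$$\rho(t_1, t_2) = \rho(t-k, t) = \rho(t, t+k) = \rho_k$$
The correlation between any two RVs depends only on the time difference.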
– Ensemble Average: $\bar z = \frac{1}{m}\sum_{i=1}^{m} Z_i$
– Time Series Average: $\bar z = \frac{1}{n}\sum_{t=1}^{n} Z_t$
• Q: Under which circumstances can we use the time average (only one
realization of {Zt})? Is the time average an unbiased and consistent
estimator of the mean? The Ergodic Theorem gives us the answer.
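$$\begin{aligned}
\mathrm{var}(\bar z) &= \frac{1}{n^2}\sum_{t=1}^{n}\sum_{s=1}^{n} \mathrm{cov}(Z_t, Z_s) = \frac{\gamma_0}{n^2}\sum_{t=1}^{n}\sum_{s=1}^{n} \rho_{t-s} \\
&= \frac{\gamma_0}{n^2}\sum_{t=1}^{n} (\rho_{t-1} + \rho_{t-2} + \cdots + \rho_{t-n}) \\
&= \frac{\gamma_0}{n^2}\big[(\rho_0 + \rho_1 + \cdots + \rho_{n-1}) + (\rho_1 + \rho_0 + \rho_1 + \cdots + \rho_{n-2}) \\
&\qquad\quad + \cdots + (\rho_{-(n-1)} + \rho_{-(n-2)} + \cdots + \rho_0)\big]
\end{aligned}$$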
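$$\lim_{n\to\infty} \mathrm{var}(\bar z) = \lim_{n\to\infty} \frac{\gamma_0}{n}\sum_{k=-(n-1)}^{n-1}\Big(1 - \frac{|k|}{n}\Big)\rho_k \ \overset{?}{\to}\ 0$$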
Q: How can we make the remaining part, the sum over the upper
triangle of the covariance matrix, go to zero as well?
A: We need to impose conditions on ρk. Conditions weaker than "they
are all zero;" but, strong enough to exclude the sequence of identical
copies.
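For a concrete case where the conditions on ρk hold, the sketch below evaluates var(z̄) for autocorrelations that decay geometrically, ρk = φ^|k| (an assumed AR(1)-type pattern with illustrative values of φ and γ0). It compares the weighted-sum formula with the direct double sum over the covariance matrix and shows the variance shrinking with n:

```r
# Sketch: var(z-bar) = (gamma0/n) * sum_k (1 - |k|/n) rho_k, with rho_k = phi^|k| (assumed).
phi <- 0.6
gamma0 <- 1

var_mean <- function(n) {
  k <- -(n - 1):(n - 1)
  (gamma0 / n) * sum((1 - abs(k) / n) * phi^abs(k))
}

# Direct check: (1/n^2) times the sum of all entries of the n x n covariance matrix
var_mean_direct <- function(n) {
  Gamma <- gamma0 * phi^abs(outer(1:n, 1:n, "-"))
  sum(Gamma) / n^2
}

isTRUE(all.equal(var_mean(25), var_mean_direct(25)))
v <- sapply(c(10, 100, 1000), var_mean)
v                                   # decreasing in n
```

By contrast, with ρk = 1 for all k (identical copies) the sum equals n and var(z̄) stays at γ0 for every n.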
Note: Recall that only the first two moments are needed to describe
the normal distribution.
$$\hat\sigma^2 = \frac{1}{n-K}\sum_{i=1}^{n} \frac{(y_i - x_i'\hat\beta)^2}{\omega_i}$$
WLS: Think of [ωi]-1/2 as weights.
We do OLS with the weighted data.
• Easier:
Var[εt] = ρ2 Var[εt-1] + Var[ut] ⟹ Var[εt] = σu2 /(1- ρ2)
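Continuing...
$$\begin{aligned}
\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-1}] &= \mathrm{Cov}[\rho\varepsilon_{t-1} + u_t,\ \varepsilon_{t-1}] \\
&= \rho\,\mathrm{Cov}[\varepsilon_{t-1}, \varepsilon_{t-1}] + \mathrm{Cov}[u_t, \varepsilon_{t-1}] \\
&= \rho\,\mathrm{Var}[\varepsilon_{t-1}] = \rho\,\mathrm{Var}[\varepsilon_t] \\
&= \frac{\rho\,\sigma_u^2}{1-\rho^2} \\
\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-2}] &= \mathrm{Cov}[\rho\varepsilon_{t-1} + u_t,\ \varepsilon_{t-2}] \\
&= \rho\,\mathrm{Cov}[\varepsilon_{t-1}, \varepsilon_{t-2}] + \mathrm{Cov}[u_t, \varepsilon_{t-2}] \\
&= \rho\,\mathrm{Cov}[\varepsilon_t, \varepsilon_{t-1}] \\
&= \frac{\rho^2\sigma_u^2}{1-\rho^2} \quad \text{and so on.}
\end{aligned}$$
$$\sigma^2\Omega = \frac{\sigma_u^2}{1-\rho^2}
\begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{bmatrix}$$
(Note, trace Ω = T as required.)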
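The AR(1) covariance structure is easy to build in R, since element (t,s) of Ω is ρ^|t−s|. A minimal sketch, with a hypothetical helper name `ar1_omega` and illustrative parameter values:

```r
# Sketch: Omega for AR(1) errors, Omega[t, s] = rho^|t - s|.
ar1_omega <- function(T, rho) rho^abs(outer(1:T, 1:T, "-"))

rho <- 0.5
sigma_u2 <- 2
T <- 5
Omega <- ar1_omega(T, rho)
Sigma <- sigma_u2 / (1 - rho^2) * Omega     # Var[eps|X] = sigma^2 * Omega

sum(diag(Omega))                            # trace equals T
Sigma[1, 2]                                 # rho * sigma_u^2 / (1 - rho^2)
Sigma[1, 3]                                 # rho^2 * sigma_u^2 / (1 - rho^2)
```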
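$$\Omega^{-1/2} =
\begin{bmatrix}
\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\
-\rho & 1 & 0 & \cdots & 0 \\
0 & -\rho & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\rho & 1
\end{bmatrix},
\qquad
\Omega^{-1/2}\, y =
\begin{bmatrix}
\sqrt{1-\rho^2}\; y_1 \\
y_2 - \rho y_1 \\
y_3 - \rho y_2 \\
\vdots \\
y_T - \rho y_{T-1}
\end{bmatrix}$$
GLS: Transformed y*.
$$\begin{aligned}
y_t &= x_t'\beta + \varepsilon_t, \qquad \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t \\
\rho\, y_{t-1} &= \rho\, x_{t-1}'\beta + \rho\,\varepsilon_{t-1} \\
y_t - \rho\, y_{t-1} &= (x_t - \rho\, x_{t-1})'\beta + (\varepsilon_t - \rho\,\varepsilon_{t-1}) \\
y_t - \rho\, y_{t-1} &= (x_t - \rho\, x_{t-1})'\beta + u_t
\end{aligned}$$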
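The transformation can be verified numerically. The sketch below (hypothetical helper `ar1_transform`, illustrative data) builds the transformation matrix P shown above and checks that P Ω P’ is proportional to the identity, so the transformed errors are homoscedastic and uncorrelated. With Ω normalized to have ones on the diagonal, the constant of proportionality is (1 − ρ2), which is absorbed into σ2:

```r
# Sketch: AR(1) GLS (Prais-Winsten type) transformation matrix.
ar1_transform <- function(T, rho) {
  P <- diag(T)
  P[1, 1] <- sqrt(1 - rho^2)
  P[cbind(2:T, 1:(T - 1))] <- -rho      # -rho on the first sub-diagonal
  P
}

rho <- 0.5
T <- 6
P <- ar1_transform(T, rho)
Omega <- rho^abs(outer(1:T, 1:T, "-"))

# P Omega P' = (1 - rho^2) I
isTRUE(all.equal(P %*% Omega %*% t(P), (1 - rho^2) * diag(T)))

y <- c(1, 3, 2, 5, 4, 6)                # hypothetical data
y_star <- as.vector(P %*% y)            # first element scaled, the rest quasi-differenced
```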
• Examples:
(1) Var[εi|X] = σ2 exp(γzi). Variance a function of γ and some
variable zi (say, firm size or country).
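$$\sigma^2\Omega = \frac{\sigma_u^2}{1-\rho^2}
\begin{bmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{bmatrix}$$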
• We seek a vector that converges to the same limit as the GLS estimator.
Call it “Feasible GLS” or FGLS, based on [X’Ω̂-1X]-1 X’Ω̂-1y, where Ω̂ is an estimate of Ω.
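A two-step FGLS procedure for AR(1) errors can be sketched as follows. This is only an illustration under assumed conditions: the data are simulated, ρ is estimated by regressing the OLS residuals on their own lag, and the helper `pw` applies the AR(1) transformation with the estimated ρ (all names and parameter values are hypothetical):

```r
# Sketch: two-step FGLS with AR(1) errors on simulated data.
set.seed(123)
T <- 2000
rho <- 0.7
beta <- c(1, 2)                                   # hypothetical true coefficients
x <- rnorm(T)
u <- rnorm(T)
eps <- as.vector(stats::filter(u, rho, method = "recursive"))  # eps_t = rho*eps_(t-1) + u_t
X <- cbind(1, x)
y <- as.vector(X %*% beta + eps)

# Step 1: OLS and residuals
b_ols <- solve(crossprod(X), crossprod(X, y))
e <- as.vector(y - X %*% b_ols)

# Step 2: estimate rho from a regression of e_t on e_(t-1)
rho_hat <- sum(e[-1] * e[-T]) / sum(e[-T]^2)

# Step 3: transform the data with rho_hat, then run OLS on the transformed data
pw <- function(z) c(sqrt(1 - rho_hat^2) * z[1], z[-1] - rho_hat * z[-length(z)])
Xs <- apply(X, 2, pw)
ys <- pw(y)
b_fgls <- solve(crossprod(Xs), crossprod(Xs, ys))
```

Because ρ is estimated, the finite-sample properties of b_fgls differ from those of GLS with known Ω; the justification for the procedure is asymptotic.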
• Q: Why?