Instrumental Variables and Two-Stage Least Squares
Generalised Least Squares
Professor Menelaos Karanasos
December 2011
Generalised Least Squares:
Assume that the postulated model is
$$y = Xb + e, \qquad (1)$$
where
$$e \sim N(0, \sigma^2 \Omega)$$
and $\Omega$ is a positive definite matrix. This implies that $\Omega^{-1}$ is also a positive definite matrix. Thus it is possible to find a nonsingular matrix $P$ such that
$$\Omega^{-1} = P'P, \quad \text{or} \quad \Omega = (P'P)^{-1} = P^{-1}(P')^{-1}.$$
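Numerically, one convenient choice of $P$ is the (transposed) Cholesky factor of $\Omega^{-1}$. A minimal numpy sketch, with a made-up $3 \times 3$ $\Omega$ purely for illustration:

```python
import numpy as np

# Hypothetical positive definite Omega, chosen only for illustration.
Omega = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
Omega_inv = np.linalg.inv(Omega)

# Cholesky gives a lower-triangular L with L L' = Omega_inv;
# taking P = L' then satisfies P'P = Omega_inv.
L = np.linalg.cholesky(Omega_inv)
P = L.T

print(np.allclose(P.T @ P, Omega_inv))                             # True
print(np.allclose(Omega, np.linalg.inv(P) @ np.linalg.inv(P.T)))   # True
```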
Premultiply the linear model in equation (1) by $P$ to obtain
$$Py = PXb + Pe. \qquad (2)$$
Denote $Pe$ by $u$. Then
$$\mathrm{Var}(u) = E(uu') = E(Pee'P') = P\,\underbrace{E(ee')}_{\sigma^2\Omega}\,P' = \sigma^2 P\underbrace{\Omega}_{P^{-1}(P')^{-1}}P' = \sigma^2 PP^{-1}(P')^{-1}P' = \sigma^2 I.$$
Thus the transformed variables in equation (2) satisfy the conditions under which OLS is BLUE.
The coefficient vector estimated by OLS regression of $Py$ on $PX$ is the generalized least squares (GLS) estimator:
$$\beta_G = [(PX)'PX]^{-1}(PX)'Py = (X'\underbrace{P'P}_{\Omega^{-1}}X)^{-1}X'P'Py = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$
From the OLS theory it follows that
$$\mathrm{Var}(\beta_G) = \sigma^2[(PX)'PX]^{-1} = \sigma^2(X'\underbrace{P'P}_{\Omega^{-1}}X)^{-1} = \sigma^2(X'\Omega^{-1}X)^{-1}.$$
An unbiased estimator of the unknown $\sigma^2$ is readily obtained from the application of OLS to the transformed model. It is
$$s^2 = \hat{u}'\hat{u}/(N-k) = (Py - PX\beta_G)'(Py - PX\beta_G)/(N-k) = (y - X\beta_G)'\underbrace{P'P}_{\Omega^{-1}}(y - X\beta_G)/(N-k) = (y - X\beta_G)'\Omega^{-1}(y - X\beta_G)/(N-k),$$
where $\beta_G = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$.
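The derivation translates directly into code. The following numpy sketch (simulated data; the AR(1)-style $\Omega$ and all parameter values are invented for illustration) computes $\beta_G$ both as OLS on the transformed variables and from the closed form, and then forms $s^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 3
X = rng.normal(size=(N, k))
b_true = np.array([1.0, -2.0, 0.5])

# Hypothetical Omega: AR(1)-style correlation, with sigma^2 = 2.
rho = 0.6
Omega = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
y = X @ b_true + rng.multivariate_normal(np.zeros(N), 2.0 * Omega)

# P with P'P = Omega^{-1}, via the Cholesky factorisation.
P = np.linalg.cholesky(np.linalg.inv(Omega)).T
Py, PX = P @ y, P @ X

# OLS on the transformed model ...
beta_transformed = np.linalg.solve(PX.T @ PX, PX.T @ Py)

# ... coincides with the closed-form GLS estimator.
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(np.allclose(beta_transformed, beta_gls))   # True

# Unbiased estimator of sigma^2 from the transformed residuals.
u_hat = Py - PX @ beta_gls
s2 = u_hat @ u_hat / (N - k)
print(s2)                                        # close to 2
```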
Note that the procedures outlined so far imply knowledge of $\Omega$. In practice $\Omega$ is unknown, and it is important to develop feasible generalized least squares (FGLS).
Finally, note that if $\sigma^2\Omega = V$, where $V$ is a positive definite variance-covariance matrix, then it follows directly that
$$\beta_G = (X'\underbrace{\Omega^{-1}}_{\sigma^2 V^{-1}}X)^{-1}X'\underbrace{\Omega^{-1}}_{\sigma^2 V^{-1}}y = (X'V^{-1}X)^{-1}X'V^{-1}y,$$
and
$$\mathrm{Var}(\beta_G) = \sigma^2(X'\underbrace{\Omega^{-1}}_{\sigma^2 V^{-1}}X)^{-1} = \sigma^2\sigma^{-2}(X'V^{-1}X)^{-1} = (X'V^{-1}X)^{-1}.$$
Feasible GLS procedure:
Let
$$V = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_N^2 \end{pmatrix}.$$
Further, we hypothesize that
$$\sigma_i^2 = a_0 + a_1 z_i^{a_2}, \qquad i = 1, \ldots, N,$$
where $z_i$ is a single variable, possibly one of the regressors, thought to determine the heteroscedasticity.
Because the OLS residuals $\hat{e} = y - X\hat{\beta}$ are consistent estimates of $e$, one can run the nonlinear regression
$$\hat{e}_i^2 = a_0 + a_1 z_i^{a_2} + v_i.$$
Estimates of the disturbance variances are then
$$\hat{\sigma}_i^2 = \hat{a}_0 + \hat{a}_1 z_i^{\hat{a}_2}, \qquad i = 1, \ldots, N.$$
These estimates give the $\hat{V}$ matrix and a feasible GLS procedure.
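A sketch of the full feasible procedure (simulated heteroscedastic data; the parameter values are invented for illustration), using scipy's curve_fit for the nonlinear regression in step 2:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
N = 200
z = rng.uniform(1.0, 5.0, size=N)
X = np.column_stack([np.ones(N), z])
b_true = np.array([2.0, 1.0])

# True disturbance variances: sigma_i^2 = a0 + a1 * z_i**a2.
sigma2 = 0.5 + 0.8 * z ** 1.5
y = X @ b_true + rng.normal(scale=np.sqrt(sigma2))

# Step 1: OLS residuals e_hat = y - X b_ols.
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ b_ols

# Step 2: nonlinear regression of squared residuals on z.
f = lambda z, a0, a1, a2: a0 + a1 * z ** a2
(a0, a1, a2), _ = curve_fit(f, z, e_hat ** 2, p0=[1.0, 1.0, 1.0])
sigma2_hat = f(z, a0, a1, a2)

# Step 3: feasible GLS with V_hat = diag(sigma2_hat); for a diagonal V
# this is just weighted least squares with weights 1 / sigma2_hat.
w = 1.0 / sigma2_hat
b_fgls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print(b_fgls)                                    # close to (2, 1)
```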
Instrumental variables (IV) estimators:
Consider the relation
$$y_i = bx_i + e_i, \qquad (3)$$
where for simplicity the constant term has been dropped. Suppose, however, that the observed value $x_i$ can be represented as the sum of the true value $\tilde{x}_i$ and a random measurement error $v_i$, that is,
$$x_i = \tilde{x}_i + v_i.$$
In this case the appropriate relation may be
$$y_i = b\tilde{x}_i + e_i. \qquad (4)$$
If we assume that equation (4) is the maintained specification but that observations are only available on $x_i$ and not on $\tilde{x}_i$, what happens if we use OLS?
The OLS slope is
$$\hat{\beta} = \frac{\sum yx}{\sum x^2} = \frac{\sum x(b\tilde{x} + e)}{\sum x^2} = b\,\frac{\sum x\tilde{x}}{\sum x^2} + \frac{\sum xe}{\sum x^2},$$
so that
$$E(\hat{\beta}) = b\,E\!\left(\frac{\sum x\tilde{x}}{\sum x^2}\right) \neq b.$$
Indeed, if $\tilde{x}$ and $v$ are uncorrelated, $\sum x\tilde{x}/\sum x^2$ converges in probability to $\sigma_{\tilde{x}}^2/(\sigma_{\tilde{x}}^2 + \sigma_v^2) < 1$, so the OLS slope is attenuated towards zero.
Thus OLS is biased. This is an example of specification error.
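A short simulation makes the bias visible; all numbers here are invented, and $\tilde{x}$ enters the data only through the observed $x = \tilde{x} + v$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, b = 10_000, 2.0

x_true = rng.normal(size=N)              # unobserved true regressor
v = rng.normal(scale=0.8, size=N)        # measurement error
x = x_true + v                           # what is actually observed
y = b * x_true + rng.normal(size=N)

# OLS of y on the mismeasured x is attenuated towards zero:
# plim of the slope is b * Var(x_true) / (Var(x_true) + Var(v)).
beta_ols = (x @ y) / (x @ x)
print(beta_ols)                          # roughly 2 / 1.64 ≈ 1.22, not 2
```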
In this case one should make use of instrumental variables, which are also commonly referred to as instruments.
Suppose that it is possible to find a data matrix $Z$ of order $N \times l$ ($l \geq k$), which possesses two vital properties:
1. The variables in $Z$ are correlated with those of $X$.
2. The variables in $Z$ are (in the limit) uncorrelated with the disturbance term $e$.
Premultiplying the general relation by $Z'$ gives
$$\underbrace{Z'y}_{y^*} = \underbrace{Z'X}_{X^*}b + \underbrace{Z'e}_{e^*},$$
with
$$\mathrm{Var}(e^*) = \mathrm{Var}(Z'e) = E[(Z'e)(Z'e)'] = E(Z'ee'Z) = Z'E(ee')Z = \sigma^2 Z'Z.$$
This suggests the use of GLS. The resultant estimator is
$$\beta_{GLS} = \beta_{IV} = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y = (X'P_Z X)^{-1}X'P_Z y, \qquad (5)$$
where $P_Z = Z(Z'Z)^{-1}Z'$.
The variance-covariance matrix is
$$\mathrm{Var}(\beta_{IV}) = \sigma^2[X'Z(Z'Z)^{-1}Z'X]^{-1} = \sigma^2(X'P_Z X)^{-1},$$
and the disturbance variance may be estimated consistently from
$$\hat{\sigma}^2 = (y - X\beta_{IV})'(y - X\beta_{IV})/N.$$
Note that the use of $N$ or $N-k$ or $N-l$ in the divisor does not matter asymptotically.
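A numpy sketch of the estimator in equation (5), continuing the measurement-error example with a hypothetical instrument $z$ that is correlated with $\tilde{x}$ but independent of both the measurement error and the disturbance:

```python
import numpy as np

rng = np.random.default_rng(3)
N, b = 1_000, 2.0

z = rng.normal(size=N)                              # the instrument
x_true = 0.9 * z + rng.normal(scale=0.5, size=N)    # correlated with z
x = x_true + rng.normal(scale=0.8, size=N)          # observed with error
y = b * x_true + rng.normal(size=N)

Z, X = z[:, None], x[:, None]                       # here l = k = 1

# beta_IV = [X'Z(Z'Z)^{-1}Z'X]^{-1} X'Z(Z'Z)^{-1}Z'y = (X'P_Z X)^{-1} X'P_Z y
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)              # projection matrix P_Z
beta_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
print(beta_iv)                                      # close to 2, unlike OLS

# Consistent estimate of the disturbance variance and of Var(beta_IV).
resid = y - X @ beta_iv
sigma2_hat = resid @ resid / N
var_iv = sigma2_hat * np.linalg.inv(X.T @ Pz @ X)
```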
Special case:
When $l = k$, that is, when $Z$ contains the same number of columns as $X$, we have a special case of the foregoing results.
Now $X'Z$ is $k \times k$ and nonsingular. This implies that
$$[(X'Z)(Z'Z)^{-1}(Z'X)]^{-1} = (Z'X)^{-1}(Z'Z)(X'Z)^{-1}.$$
Thus the estimator in equation (5) reduces to
$$\beta_{IV} = (Z'X)^{-1}(Z'Z)(X'Z)^{-1}(X'Z)(Z'Z)^{-1}Z'y = (Z'X)^{-1}Z'y.$$
Moreover, $\mathrm{Var}(\beta_{IV})$ simplifies to
$$\mathrm{Var}(\beta_{IV}) = \sigma^2[X'Z(Z'Z)^{-1}Z'X]^{-1} = \sigma^2(Z'X)^{-1}(Z'Z)(X'Z)^{-1}.$$
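This collapse is easy to verify numerically; a sketch with invented data in which $l = k = 2$:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500
Z = rng.normal(size=(N, 2))                         # l = 2 instruments
X = Z @ np.array([[1.0, 0.3],
                  [0.2, 1.0]]) + rng.normal(scale=0.5, size=(N, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(size=N)

# General IV formula ...
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_general = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

# ... reduces to the simple form (Z'X)^{-1} Z'y when l = k.
beta_simple = np.linalg.solve(Z.T @ X, Z.T @ y)
print(np.allclose(beta_general, beta_simple))       # True
```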
Two-stage least squares (2SLS)
The IV estimator may also be seen as the result of a double application of least squares:
Stage (i): Regress each variable in the $X$ matrix on $Z$ ($X = Z\delta + u$) to obtain the matrix of fitted values $\hat{X}$:
$$\hat{X} = Z\hat{\delta} = Z\underbrace{(Z'Z)^{-1}Z'X}_{\hat{\delta}} = \underbrace{Z(Z'Z)^{-1}Z'}_{P_Z}X = P_Z X.$$
Stage (ii): Regress $y$ on $\hat{X}$ to obtain the 2SLS estimated $\beta$ vector:
$$\beta_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y = (X'P_Z'P_Z X)^{-1}X'P_Z'y = (X'P_Z X)^{-1}X'P_Z y = \beta_{IV},$$
since
$$P_Z' = [Z(Z'Z)^{-1}Z']' = Z(Z'Z)^{-1}Z' = P_Z$$
and
$$P_Z'P_Z = P_Z^2 = Z(Z'Z)^{-1}Z'Z(Z'Z)^{-1}Z' = Z(Z'Z)^{-1}Z' = P_Z.$$
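The equivalence of the two routes can be checked directly. In the following numpy sketch (invented data with $l = 3$ instruments and $k = 2$ regressors), the two stages reproduce the one-shot IV formula:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # l = 3
x = Z @ np.array([0.5, 1.0, -0.8]) + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])                        # k = 2
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

# Stage (i): fitted values X_hat = Z delta_hat = P_Z X.
delta_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ delta_hat

# Stage (ii): OLS of y on X_hat.
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# The one-shot IV formula gives the same vector.
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
print(np.allclose(beta_2sls, beta_iv))                      # True
```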
Thus the IV estimator can be obtained by a two-stage least-squares procedure.
The variance-covariance matrix and the estimated disturbance variance are given by
$$\mathrm{Var}(\beta_{IV}) = \sigma^2[X'Z(Z'Z)^{-1}Z'X]^{-1} = \sigma^2(X'P_Z X)^{-1}$$
and
$$\hat{\sigma}^2 = (y - X\beta_{IV})'(y - X\beta_{IV})/N.$$
Choice of instruments:
The crucial question is, where do we find the instruments?
Some of them are often variables from the $X$ matrix itself.
Any variables that are thought to be exogenous and independent of the disturbance are retained to serve in the $Z$ matrix.
When some of the $X$ variables are used as instruments, we may partition $X$ and $Z$ as
$$X = [X_1 \; X_2], \qquad Z = [X_1 \; Z_1],$$
where $X_1$ is of order $N \times r$ ($r < k$), $X_2$ is $N \times (k - r)$, and $Z_1$ is $N \times (l - r)$.
It can be shown that $\hat{X}$, the matrix of regressors in the second-stage regression, is then
$$\hat{X} = [X_1 \; \hat{X}_2],$$
where
$$\hat{X}_2 = \underbrace{Z(Z'Z)^{-1}Z'}_{P_Z}X_2;$$
that is, $\hat{X}_2$ contains the fitted values of $X_2$ obtained from the regression of $X_2$ on the full set of instruments: $X_2 = Z\gamma + v$, and
$$\hat{X}_2 = Z\hat{\gamma}, \qquad \hat{\gamma} = (Z'Z)^{-1}Z'X_2.$$
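In code the partitioning is immediate: the exogenous $X_1$ columns pass through the projection unchanged, since $P_Z X_1 = X_1$ whenever $X_1$ is part of $Z$. A small sketch with invented dimensions:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 300
X1 = np.column_stack([np.ones(N), rng.normal(size=N)])   # exogenous, r = 2
Z1 = rng.normal(size=(N, 2))                             # outside instruments
X2 = 0.7 * Z1[:, :1] + 0.4 * X1[:, 1:] + rng.normal(size=(N, 1))  # endogenous

Z = np.column_stack([X1, Z1])     # full instrument set
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)

X2_hat = Pz @ X2                  # fitted values of X2 on all instruments
X_hat = np.column_stack([X1, X2_hat])

# X1 lies in the column space of Z, so it projects onto itself.
print(np.allclose(Pz @ X1, X1))   # True
```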
There still remains the question of how many instruments to use.
The minimum number is $k$.
The asymptotic efficiency increases with the number of instruments.
However, the small-sample bias also increases with the number of instruments.
If, in fact, we select $N$ instruments, it is simple to show that $P_Z = I$, in which case the IV estimator is simply the OLS estimator, which is biased and inconsistent.
If, on the other hand, we use the minimum, or close to the minimum, number of instruments, the results may also be poor.
It has been shown that the $m$th moment of the 2SLS estimator exists if and only if $m < l - k + 1$.
Thus, if there are just as many instruments as explanatory variables, the 2SLS estimator will not have a mean.
With one more instrument there will be a mean but no variance, and so forth.