Efficiency and BLUEness
Under assumptions 5 and 6, we can prove that the OLS estimators are the most efficient
among all unbiased linear estimators. Thus we can conclude that the OLS procedure
yields BLU estimators.
The proof that the OLS estimators are BLU estimators is relatively complicated. It entails a procedure that works in the opposite direction from the one followed so far: we start the estimation from the beginning, deriving a BLU estimator of β by imposing the properties of linearity, unbiasedness and minimum variance one by one, and then we check whether the BLU estimator derived by this procedure is the same as the OLS estimator.
Thus, we want to derive the BLU estimator of β, say $\breve{\beta}$, concentrating first on the property of linearity. For $\breve{\beta}$ to be linear we need to have:

$$\breve{\beta} = \delta_1 Y_1 + \delta_2 Y_2 + \cdots + \delta_n Y_n = \sum_t \delta_t Y_t \quad (3.45)$$
where the δt terms are constants, the values of which are to be determined.
Proceeding with the property of unbiasedness, for $\breve{\beta}$ to be unbiased we must have $E(\breve{\beta}) = \beta$. We know that:

$$E(\breve{\beta}) = E\left(\sum_t \delta_t Y_t\right) = \sum_t \delta_t E(Y_t) \quad (3.46)$$
Substituting $E(Y_t) = a + \beta X_t$ (because $Y_t = a + \beta X_t + u_t$, and also because $X_t$ is non-stochastic and $E(u_t) = 0$, given by the basic assumptions of the model), we get:

$$E(\breve{\beta}) = \sum_t \delta_t (a + \beta X_t) = a \sum_t \delta_t + \beta \sum_t \delta_t X_t \quad (3.47)$$

and therefore, in order to have an unbiased $\breve{\beta}$, we need:

$$\sum_t \delta_t = 0 \quad \text{and} \quad \sum_t \delta_t X_t = 1 \quad (3.48)$$
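As a quick forward check (using the deviation notation $x_t = X_t - \bar{X}$ from earlier in the chapter), the OLS weights $z_t = x_t / \sum_t x_t^2$ of Equation (3.34) satisfy both of these conditions, which anticipates the result derived below:

$$\sum_t z_t = \frac{\sum_t x_t}{\sum_t x_t^2} = 0 \quad \text{(since } \textstyle\sum_t (X_t - \bar{X}) = 0\text{)}$$

$$\sum_t z_t X_t = \frac{\sum_t x_t (x_t + \bar{X})}{\sum_t x_t^2} = \frac{\sum_t x_t^2 + \bar{X}\sum_t x_t}{\sum_t x_t^2} = 1$$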
Next, we proceed by deriving an expression for the variance (which we need to minimize) of $\breve{\beta}$:

$$\begin{aligned}
\mathrm{Var}(\breve{\beta}) &= E\left[\breve{\beta} - E(\breve{\beta})\right]^2 \\
&= E\left[\sum_t \delta_t Y_t - E\left(\sum_t \delta_t Y_t\right)\right]^2 \\
&= E\left[\sum_t \delta_t Y_t - \sum_t \delta_t E(Y_t)\right]^2 \\
&= E\left[\sum_t \delta_t \big(Y_t - E(Y_t)\big)\right]^2 \quad (3.49)
\end{aligned}$$
In this expression we can use $Y_t = a + \beta X_t + u_t$ and $E(Y_t) = a + \beta X_t$ to give:

$$\begin{aligned}
\mathrm{Var}(\breve{\beta}) &= E\left[\sum_t \delta_t \big(a + \beta X_t + u_t - (a + \beta X_t)\big)\right]^2 \\
&= E\left[\sum_t \delta_t u_t\right]^2 \\
&= E\big(\delta_1^2 u_1^2 + \delta_2^2 u_2^2 + \delta_3^2 u_3^2 + \cdots + \delta_n^2 u_n^2 \\
&\qquad + 2\delta_1\delta_2 u_1 u_2 + 2\delta_1\delta_3 u_1 u_3 + \cdots\big) \\
&= \delta_1^2 E(u_1^2) + \delta_2^2 E(u_2^2) + \delta_3^2 E(u_3^2) + \cdots + \delta_n^2 E(u_n^2) \\
&\qquad + 2\delta_1\delta_2 E(u_1 u_2) + 2\delta_1\delta_3 E(u_1 u_3) + \cdots \quad (3.50)
\end{aligned}$$
Using assumptions 5 ($\mathrm{Var}(u_t) = \sigma^2$) and 6 ($\mathrm{Cov}(u_t, u_s) = E(u_t u_s) = 0$ for all $t \neq s$) we obtain:

$$\mathrm{Var}(\breve{\beta}) = \sigma^2 \sum_t \delta_t^2 \quad (3.51)$$
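A minimal numerical sketch of this identity (NumPy; the value of σ and the δ weights are arbitrary illustrative choices, not taken from the text): with uncorrelated, homoskedastic errors the cross-product terms average out, so the simulated variance of $\sum_t \delta_t u_t$ should be close to $\sigma^2 \sum_t \delta_t^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                                     # illustrative error s.d. (assumption)
delta = np.array([0.3, -0.1, 0.5, 0.2, -0.4])   # arbitrary constant weights

# Many samples of n = 5 uncorrelated errors with Var(u_t) = sigma^2
u = rng.normal(0.0, sigma, size=(200_000, delta.size))
combo = u @ delta                               # each row: sum_t delta_t * u_t

print(combo.var())                              # simulated Var(sum_t delta_t u_t)
print(sigma**2 * (delta**2).sum())              # sigma^2 * sum_t delta_t^2, as in (3.51)
```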
We now need to choose the $\delta_t$ in the linear estimator (Equation (3.45)) so as to minimize the variance (Equation (3.51)) subject to the constraints (Equation (3.48)) which ensure unbiasedness, so that the resulting estimator is linear, unbiased and of minimum variance. We formulate the Lagrangian function:

$$L = \sigma^2 \sum_t \delta_t^2 - \lambda_1 \sum_t \delta_t - \lambda_2 \left(\sum_t \delta_t X_t - 1\right) \quad (3.52)$$

where $\lambda_1$ and $\lambda_2$ are Lagrange multipliers.
Following the regular procedure, which is to take the first-order conditions (that is, the partial derivatives of $L$ with respect to $\delta_t$, $\lambda_1$ and $\lambda_2$) and set them equal to zero, and after rearrangement and mathematical manipulation (we omit the details of the derivation because it is lengthy and tedious, and because it does not use any of the assumptions of the model in any case), we obtain the optimal $\delta_t$ as:

$$\delta_t = \frac{x_t}{\sum_t x_t^2} \quad (3.53)$$
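For readers who want the omitted steps, a brief sketch (standard constrained-minimization algebra, not reproduced from the text):

$$\frac{\partial L}{\partial \delta_t} = 2\sigma^2 \delta_t - \lambda_1 - \lambda_2 X_t = 0 \;\Rightarrow\; \delta_t = \frac{\lambda_1 + \lambda_2 X_t}{2\sigma^2}$$

Imposing $\sum_t \delta_t = 0$ gives $n\lambda_1 + \lambda_2 \sum_t X_t = 0$, so $\lambda_1 = -\lambda_2 \bar{X}$ and hence $\delta_t = \lambda_2 (X_t - \bar{X})/(2\sigma^2) = \lambda_2 x_t / (2\sigma^2)$. Imposing $\sum_t \delta_t X_t = 1$ and using $\sum_t x_t X_t = \sum_t x_t^2$ then yields $\lambda_2 = 2\sigma^2 / \sum_t x_t^2$, which gives (3.53).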
Therefore we have that $\delta_t = z_t$ of the OLS expression given by Equation (3.34). So, substituting this into our linear estimator $\breve{\beta}$, we have:

$$\begin{aligned}
\breve{\beta} &= \sum_t \delta_t Y_t = \sum_t z_t Y_t \\
&= \sum_t z_t (Y_t - \bar{Y} + \bar{Y}) \qquad \text{(adding and subtracting } \bar{Y}\text{)} \\
&= \sum_t z_t (Y_t - \bar{Y}) + \bar{Y} \sum_t z_t \\
&= \sum_t z_t y_t \qquad \text{(since } \textstyle\sum_t z_t = 0\text{)} \\
&= \frac{\sum_t x_t y_t}{\sum_t x_t^2} = \hat{\beta} \quad (3.54)
\end{aligned}$$
Thus, the OLS estimator $\hat{\beta}$ is BLU.
The advantage of the BLUEness condition is that it provides us with an expression for the variance, by substituting the optimal $\delta_t$ given in Equation (3.53) into Equation (3.51) to give:

$$\mathrm{Var}(\breve{\beta}) = \mathrm{Var}(\hat{\beta}) = \sum_t \left(\frac{x_t}{\sum_t x_t^2}\right)^2 \sigma^2 = \frac{\sum_t x_t^2}{\left(\sum_t x_t^2\right)^2}\,\sigma^2 = \sigma^2 \frac{1}{\sum_t x_t^2} \quad (3.55)$$
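As an illustration of the efficiency claim, here is a minimal simulation sketch (NumPy; the values of $a$, $\beta$, $\sigma$ and the fixed X series are made up for the example). It compares the Monte Carlo variance of OLS with that of an alternative linear unbiased estimator, built so that its weights still satisfy the constraints (3.48), and checks OLS against the formula $\sigma^2/\sum_t x_t^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, a, beta, sigma = 50, 1.0, 2.0, 1.5   # made-up parameters for the sketch
X = np.linspace(0.0, 10.0, n)           # fixed (non-stochastic) regressors
x = X - X.mean()
z = x / (x**2).sum()                    # OLS weights: satisfy (3.48)

# Alternative unbiased weights: perturb z by a component orthogonal to
# both the constant and X, so sum(delta) = 0 and sum(delta * X) = 1 still hold.
w = rng.normal(size=n)
Q = np.column_stack([np.ones(n), X])
w -= Q @ np.linalg.lstsq(Q, w, rcond=None)[0]   # project out span{1, X}
delta = z + 0.02 * w

reps = 100_000
u = rng.normal(0.0, sigma, size=(reps, n))
Y = a + beta * X + u
print((Y @ z).var())                    # simulated Var(OLS beta_hat)
print(sigma**2 / (x**2).sum())          # theoretical sigma^2 / sum(x_t^2)
print((Y @ delta).var())                # alternative estimator: larger variance
```

Both estimators have mean β, but any admissible deviation from the OLS weights adds a non-negative term to $\sigma^2 \sum_t \delta_t^2$, which is the efficiency result proved above.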
Consistency
Consistency is the idea that, as the sample becomes infinitely large, the parameter estimate given by a procedure such as OLS converges on the true parameter value. For OLS this follows from the results above: the estimator is unbiased, and its variance $\sigma^2/\sum_t x_t^2$ shrinks towards zero as the sample grows, so the distribution of $\hat{\beta}$ collapses onto the true $\beta$. In this sense consistency is a weaker requirement than unbiasedness. However, the proof above rests on our assumption 3, that the X variables are fixed. If we relax this assumption it is no longer possible to prove the unbiasedness of OLS, but we can still establish that it is a consistent estimator. That is, when we relax assumption 3, OLS is no longer a BLU estimator, but it is still consistent.
We showed in Equation (3.31) that $\hat{\beta} = \beta + \mathrm{Cov}(X, u)/\mathrm{Var}(X)$. Dividing the top and the bottom of the last term by $n$ gives:

$$\hat{\beta} = \beta + \frac{\mathrm{Cov}(X, u)/n}{\mathrm{Var}(X)/n} \quad (3.56)$$
Using the law of large numbers, we know that $\mathrm{Cov}(X, u)/n$ converges to its expectation, which is $\mathrm{Cov}(X_t, u_t)$. Similarly, $\mathrm{Var}(X)/n$ converges to $\mathrm{Var}(X_t)$. So, as $n \to \infty$, $\hat{\beta} \to \beta + \mathrm{Cov}(X_t, u_t)/\mathrm{Var}(X_t)$, which is equal to the true population parameter $\beta$ if $\mathrm{Cov}(X_t, u_t) = 0$ (that is, if $X_t$ and $u_t$ are uncorrelated). Thus $\hat{\beta}$ is a consistent estimator of the true population parameter $\beta$.
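A minimal sketch of this convergence (NumPy; the data-generating values are made up, and X is drawn randomly so that assumption 3 is relaxed, as in the discussion above):

```python
import numpy as np

rng = np.random.default_rng(2)
a, beta = 1.0, 2.0                       # made-up true parameters

for n in (50, 500, 5_000, 50_000):
    X = rng.normal(0.0, 1.0, n)          # stochastic regressor
    u = rng.normal(0.0, 1.0, n)          # error uncorrelated with X
    Y = a + beta * X + u
    x, y = X - X.mean(), Y - Y.mean()
    beta_hat = (x * y).sum() / (x**2).sum()
    print(n, beta_hat)                   # approaches beta = 2.0 as n grows
```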