Omitted Variable Bias:
The Simple Case
Ingredients
Suppose that we omit a variable that actually belongs in the
true (or population) model.
This is often called the problem of excluding a relevant
variable or under-specifying the model.
This problem generally causes the OLS estimators to be
biased.
Deriving the bias caused by omitting an important variable
is an example of misspecification analysis.
Let us begin by assuming that the true population model is
y = β0 + β1 x1 + β2 x2 + u
and that this model satisfies Assumptions MLR.1–MLR.4.
Primary interest: β1 , the partial effect of x1 on y.
Example: y is log of hourly wage, x1 is education, and x2
is a measure of innate ability. To get an unbiased estimator
of β1 , we should run a regression of y on x1 and x2 (which
gives unbiased estimators of β0 , β1 and β2 ).
However, due to our ignorance or data unavailability, we
estimate the model by excluding x2 .
In other words, we perform a simple regression of y on x1
only, obtaining the equation
ỹ = β̃0 + β̃1 x1
We use a tilde ( ˜ ) rather than a hat ( ˆ ) to emphasize that
β̃1 comes from an underspecified model.
We can derive the algebraic relationship
β̃1 = β̂1 + β̂2 δ̃
where β̂1 and β̂2 are the slope estimators (if we could
compute them) from the multiple regression
yi on xi1 , xi2 i = 1, . . . , n,
and δ̃ is the slope from the simple regression
xi2 on xi1 i = 1, . . . , n.
Because δ̃ depends only on the independent variables in the
sample, we treat it as fixed (nonrandom) when computing
E(β̃1 ).
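The algebraic identity β̃1 = β̂1 + β̂2 δ̃ holds exactly in any sample, not just on average, so it can be verified numerically. A minimal sketch in Python (the data-generating values and variable names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Multiple regression of y on (1, x1, x2): slope estimators beta1_hat, beta2_hat
X = np.column_stack([np.ones(n), x1, x2])
b0_hat, b1_hat, b2_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Simple regression of y on x1 only: beta1_tilde
X1 = np.column_stack([np.ones(n), x1])
_, b1_tilde = np.linalg.lstsq(X1, y, rcond=None)[0]

# Simple regression of x2 on x1: delta_tilde
_, delta_tilde = np.linalg.lstsq(X1, x2, rcond=None)[0]

# The identity beta1_tilde = beta1_hat + beta2_hat * delta_tilde holds exactly
print(b1_tilde, b1_hat + b2_hat * delta_tilde)
```

The two printed numbers agree to machine precision, for any sample.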
Bias size
Under Assumptions MLR.1–MLR.4, β̂1 and β̂2 are unbiased for β1 and β2 .
Therefore,
E(β̃1 ) = E(β̂1 + β̂2 δ̃)
= E(β̂1 ) + E(β̂2 )δ̃ = β1 + β2 δ̃
which implies that the bias in β̃1 is
Bias(β̃1 ) = E(β̃1 ) − β1 = β2 δ̃.
Because the bias in this case arises from omitting the
explanatory variable x2 , the term on the right-hand side of
the above equation (β2 δ̃) is often called the omitted variable
bias.
It is easy to see that Bias(β̃1 ) = 0 when
1 β2 = 0
The omitted variable x2 is not in the “true” model.
2 δ̃ = 0
Recall that δ̃ is the slope from the simple regression
xi2 on xi1 i = 1, . . . , n,
which is directly related to the sample correlation between
x1 and x2 . Therefore, when x1 and x2 are uncorrelated in
the sample, omitting x2 does NOT lead to a biased estimator
of β1 , regardless of the value of β2 .
Summary of the direction of the bias in β̃1 :

           Corr(x1 , x2 ) > 0    Corr(x1 , x2 ) < 0
β2 > 0     Positive bias         Negative bias
β2 < 0     Negative bias         Positive bias
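The sign pattern in the table can be checked by Monte Carlo simulation. A hedged sketch (the function name, sample sizes, and parameter values are all chosen for illustration; the true β1 is 1 throughout):

```python
import numpy as np

def mean_simple_slope(beta2, corr_sign, reps=500, n=400, seed=1):
    """Average slope from regressing y on x1 only, with x2 omitted."""
    rng = np.random.default_rng(seed)
    slopes = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        # Corr(x1, x2) takes the sign of corr_sign
        x2 = corr_sign * 0.8 * x1 + rng.normal(size=n)
        # True model: beta1 = 1, beta2 as given
        y = 1.0 + 1.0 * x1 + beta2 * x2 + rng.normal(size=n)
        slopes[r] = np.polyfit(x1, y, 1)[0]   # slope of short regression
    return slopes.mean()

print(mean_simple_slope(+2.0, +1))  # beta2 > 0, Corr > 0: above 1 (positive bias)
print(mean_simple_slope(+2.0, -1))  # beta2 > 0, Corr < 0: below 1 (negative bias)
print(mean_simple_slope(-2.0, +1))  # beta2 < 0, Corr > 0: below 1 (negative bias)
```

Each average slope deviates from the true value 1 in exactly the direction the table predicts.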
Wage example
More ability ⇒ higher productivity ⇒ higher wages ⇒
β2 > 0 in
wage = β0 + β1 educ + β2 abil + u.
Conjecture: educ and abil are positively correlated
On average, individuals with more innate ability choose
higher levels of education.
Consequence: the OLS estimate of β1 from the simple regression
wage = β0 + β1 educ + u
is on average too large, since Bias(β̃1 ) = β2 δ̃ > 0.
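The wage example can be illustrated with simulated data. A minimal sketch (the data are artificial and the coefficients, e.g. a true return to education of 0.08, are invented for illustration; no real wage data are used):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
abil = rng.normal(size=n)
educ = 12 + 2 * abil + rng.normal(size=n)    # more ability -> more education
# True model: log(wage) with return to education 0.08 and beta2 = 0.10 > 0
logwage = 0.5 + 0.08 * educ + 0.10 * abil + rng.normal(scale=0.3, size=n)

# Short regression: log(wage) on educ only (abil omitted)
b1_tilde = np.polyfit(educ, logwage, 1)[0]

# Long regression: log(wage) on educ and abil
X = np.column_stack([np.ones(n), educ, abil])
b1_hat = np.linalg.lstsq(X, logwage, rcond=None)[0][1]

print(f"short-regression slope: {b1_tilde:.3f}")  # biased upward, above 0.08
print(f"long-regression slope:  {b1_hat:.3f}")    # close to the true 0.08
```

Because abil is omitted and positively correlated with educ, the short-regression slope overstates the return to education, exactly as the bias formula predicts.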