Omitted Variable Bias: The Simple Case

Ingredients

Suppose that we omit a variable that actually belongs in the true (or population) model.

This is often called the problem of excluding a relevant variable, or under-specifying the model.

This problem generally causes the OLS estimators to be biased.

Deriving the bias caused by omitting an important variable is an example of misspecification analysis.

Let us begin by assuming that the true population model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$

and that this model satisfies Assumptions MLR.1–MLR.4.

Primary interest: $\beta_1$, the partial effect of $x_1$ on $y$.

Example: $y$ is the log of hourly wage, $x_1$ is education, and $x_2$ is a measure of innate ability. To get an unbiased estimator of $\beta_1$, we should run a regression of $y$ on $x_1$ and $x_2$ (which gives unbiased estimators of $\beta_0$, $\beta_1$ and $\beta_2$).

However, due to our ignorance or data unavailability, we estimate the model by excluding $x_2$. In other words, we perform a simple regression of $y$ on $x_1$ only, obtaining the equation

$$\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1$$

We use the symbol "~" rather than "^" to emphasize that $\tilde\beta_1$ comes from an underspecified model.

We can derive the algebraic relationship

$$\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta$$

where $\hat\beta_1$ and $\hat\beta_2$ are the slope estimators (if we could have them) from the multiple regression

$y_i$ on $x_{i1}, x_{i2}$, $i = 1, \dots, n$,

and $\tilde\delta$ is the slope from the simple regression

$x_{i2}$ on $x_{i1}$, $i = 1, \dots, n$.

Because $\tilde\delta$ depends only on the independent variables in the sample, we treat it as fixed (nonrandom) when computing $E(\tilde\beta_1)$.

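The algebraic relationship above holds exactly in any sample, not only on average. A minimal numerical check, assuming NumPy is available; the data-generating numbers and the helper `ols` are hypothetical and chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data-generating process (all numbers are illustrative).
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # x2 is correlated with x1
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + u


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


_, b1_hat, b2_hat = ols(y, x1, x2)   # multiple regression of y on x1 and x2
_, b1_tilde = ols(y, x1)             # simple regression of y on x1 only
_, delta_tilde = ols(x2, x1)         # simple regression of x2 on x1

# The two sides of the identity should agree up to floating-point error.
gap = b1_tilde - (b1_hat + b2_hat * delta_tilde)
print("beta1_tilde:                        ", b1_tilde)
print("beta1_hat + beta2_hat * delta_tilde:", b1_hat + b2_hat * delta_tilde)
print("difference:                         ", gap)
```

The printed difference should be zero up to rounding, whatever seed or coefficients are used, because the identity is a property of least squares rather than of the assumed population model.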
Bias size

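Under Assumptions MLR.1–MLR.4, $\hat\beta_1$ and $\hat\beta_2$ are unbiased for $\beta_1$ and $\beta_2$. Therefore,

$$E(\tilde\beta_1) = E(\hat\beta_1 + \hat\beta_2 \tilde\delta) = E(\hat\beta_1) + E(\hat\beta_2)\tilde\delta = \beta_1 + \beta_2 \tilde\delta$$

which implies that the bias in $\tilde\beta_1$ is

$$\operatorname{Bias}(\tilde\beta_1) = E(\tilde\beta_1) - \beta_1 = \beta_2 \tilde\delta.$$

Because the bias in this case arises from omitting the explanatory variable $x_2$, the term on the right-hand side of the above equation ($\beta_2 \tilde\delta$) is often called the omitted variable bias.
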
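The bias formula can be illustrated with a small Monte Carlo experiment. The sketch below assumes NumPy and uses hypothetical population values. The regressors are drawn once and held fixed across replications, which mirrors treating the slope of x2 on x1 as nonrandom in the derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 5000
beta0, beta1, beta2 = 1.0, 0.5, 2.0   # hypothetical population values

# Regressors are drawn once and kept fixed, so delta_tilde is a constant.
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)

x1c = x1 - x1.mean()                      # demeaned x1
delta_tilde = (x1c @ x2) / (x1c @ x1c)    # slope from regressing x2 on x1

b1_tilde_draws = np.empty(reps)
for r in range(reps):
    u = rng.normal(size=n)                # fresh error term each replication
    y = beta0 + beta1 * x1 + beta2 * x2 + u
    b1_tilde_draws[r] = (x1c @ y) / (x1c @ x1c)   # slope of y on x1 only

simulated_bias = b1_tilde_draws.mean() - beta1
predicted_bias = beta2 * delta_tilde
print("simulated bias:", simulated_bias)
print("predicted bias:", predicted_bias)
```

With enough replications the two printed numbers should be close; the remaining gap is simulation noise that shrinks as `reps` grows.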
It is easy to see that $\operatorname{Bias}(\tilde\beta_1) = 0$ when

1. $\beta_2 = 0$
   The omitted variable $x_2$ is not in the “true” model.

2. $\tilde\delta = 0$
   Recall that $\tilde\delta$ is the slope from the simple regression

   $x_{i2}$ on $x_{i1}$, $i = 1, \dots, n$,

   which has the same sign as the sample correlation between $x_1$ and $x_2$ and is zero exactly when that correlation is zero. Therefore, when $x_1$ and $x_2$ are uncorrelated in the sample, omitting $x_2$ does not lead to a biased estimate of $\beta_1$, regardless of the value of $\beta_2$.
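
The link between the slope of x2 on x1 and the correlation can be written out explicitly. The notation $r_{12}$ (sample correlation) and $s_1$, $s_2$ (sample standard deviations of $x_1$ and $x_2$) is introduced here for illustration and is not part of the original slides:

```latex
\tilde\delta
  = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)}
         {\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}
  = r_{12}\,\frac{s_2}{s_1},
\qquad s_1 > 0 .
```

Since $s_2 / s_1$ is positive whenever $x_2$ varies in the sample, $\tilde\delta$ and $r_{12}$ share the same sign, and $\tilde\delta = 0$ exactly when $r_{12} = 0$.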

Sign of the bias   | Corr(x1, x2) > 0 | Corr(x1, x2) < 0
β2 > 0             | Positive bias    | Negative bias
β2 < 0             | Negative bias    | Positive bias
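
The table reduces to one rule: the bias has the sign of the product of β2 and the correlation between x1 and x2. A minimal sketch of that rule; the function name `omitted_variable_bias_sign` is hypothetical and used only for illustration:

```python
def omitted_variable_bias_sign(beta2: float, corr_x1_x2: float) -> str:
    """Direction of the omitted variable bias in the slope on x1.

    The bias is beta2 times the slope of x2 on x1, and that slope has the
    same sign as the correlation between x1 and x2, so only the sign of
    the product matters.
    """
    product = beta2 * corr_x1_x2
    if product > 0:
        return "positive"
    if product < 0:
        return "negative"
    return "none"   # beta2 = 0 or x1 and x2 uncorrelated


# The four cells of the table, plus one zero-bias case.
print(omitted_variable_bias_sign(1.0, 0.5))    # beta2 > 0, Corr > 0
print(omitted_variable_bias_sign(1.0, -0.5))   # beta2 > 0, Corr < 0
print(omitted_variable_bias_sign(-1.0, 0.5))   # beta2 < 0, Corr > 0
print(omitted_variable_bias_sign(-1.0, -0.5))  # beta2 < 0, Corr < 0
print(omitted_variable_bias_sign(0.0, 0.5))    # x2 not in the true model
```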
Wage example
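More ability ⇒ higher productivity ⇒ higher wages ⇒ $\beta_2 > 0$ in

$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{abil} + u.$$

Conjecture: educ and abil are positively correlated. On average, individuals with more innate ability choose higher levels of education.

Consequence: the OLS estimates of $\beta_1$ from the simple regression

$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + u$$

are on average too large, because the omitted variable bias $\beta_2 \tilde\delta$ is positive.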
