Development Economics I
Dr. Elisabetta Gentile
orientation tutorial
Recall: correlation does not imply
causation!
A positive correlation between two variables, X and
Y, could be explained as:
X causes Y: the causation may be direct, or it may
operate through a chain of causal links;
Y causes X (reverse causation): one might think that X
causes Y when it's really the opposite;
Causation runs in both directions;
There is no causal relationship between X and Y: some
third variable Z causes both X and Y.
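The last case can be illustrated with simulated data. The sketch below is only an illustration under assumed values (standard normal noise, a coefficient of 1 from Z to both X and Y); none of these numbers come from the lecture.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
n = 20_000


def corr(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# Z drives both X and Y; by construction X has no effect on Y, nor Y on X.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

# Raw correlation: clearly positive even though there is no causal link.
corr_xy = corr(x, y)

# Strip out the part of X and Y that is due to Z (true coefficient is 1 here).
x_net = [xi - zi for xi, zi in zip(x, z)]
y_net = [yi - zi for yi, zi in zip(y, z)]
corr_net = corr(x_net, y_net)  # close to zero once Z is accounted for

print(f"corr(X, Y) = {corr_xy:.3f}, after removing Z = {corr_net:.3f}")
```

In real data Z is usually unobserved, so this subtraction is not available; that is what motivates the methods discussed next.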
Endogenous explanatory variables
Consider the following simple regression equation:
$y = \beta_0 + \beta_1 x + u$
The OLS estimate of $\beta_1$ will be biased if:
1. there is reverse causation: x → y and y → x (e.g., police → crime
and crime → police), leading to simultaneity bias;
2. there is some omitted variable v that affects both y and x
(e.g., education → wage, but unobserved
ability → education and ability → wage);
3. one or more of the explanatory variables is measured
with error, leading to attenuation bias.
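Attenuation bias (case 3) can be seen in a small simulation. This is a minimal sketch with assumed values: a true slope of 2, and measurement error with the same variance as the true regressor, in which case the classical errors-in-variables result says the estimated slope shrinks by the factor Var(x*) / (Var(x*) + Var(e)) = 1/2.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
n = 20_000


def ols_slope(x, y):
    """Slope of a simple regression of y on x: Cov(x, y) / Var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    s_xx = sum((xi - mx) ** 2 for xi in x)
    return s_xy / s_xx


# True model (assumed for illustration): y = 1 + 2 * x_true + u
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xt + random.gauss(0, 1) for xt in x_true]

# We only observe x_true plus classical measurement error e, with Var(e) = 1.
x_obs = [xt + random.gauss(0, 1) for xt in x_true]

slope_true = ols_slope(x_true, y)  # close to the true slope of 2
slope_obs = ols_slope(x_obs, y)    # pulled toward zero, close to 2 * 1/2 = 1

print(f"slope with true x: {slope_true:.3f}, with mismeasured x: {slope_obs:.3f}")
```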
Addressing endogeneity
When faced with the prospect of unobserved endogeneity,
we can:
1. ignore the problem, and suffer the consequences of biased and
inconsistent estimators;
2. try to find and use a suitable proxy variable for the unobserved
variable (e.g., IQ is a good proxy for ability in the regression of
wage on education);
3. assume that the unobservables do not change over time, and use
fixed effects or first-differencing;
4. leave the unobserved variable in the error term, but use
instrumental variables estimation (a.k.a. two-stage least squares,
or 2SLS), which recognizes the presence of the omitted variable.
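Option 3 can be sketched with a simulated two-period panel. All numbers are assumptions made for illustration: a true slope of 1.5 and an unobserved, time-invariant effect `a` that is correlated with x. Differencing the two periods removes `a`, because it does not change over time.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible
n = 10_000      # number of units, each observed in two periods


def ols_slope(x, y):
    """Slope of a simple regression of y on x: Cov(x, y) / Var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    s_xx = sum((xi - mx) ** 2 for xi in x)
    return s_xy / s_xx


pooled_x, pooled_y, dx, dy = [], [], [], []
for _ in range(n):
    a = random.gauss(0, 1)          # unobserved, constant over time
    x_t1 = a + random.gauss(0, 1)   # x is correlated with a in both periods
    x_t2 = a + random.gauss(0, 1)
    y_t1 = 1.5 * x_t1 + a + random.gauss(0, 1)
    y_t2 = 1.5 * x_t2 + a + random.gauss(0, 1)
    pooled_x += [x_t1, x_t2]
    pooled_y += [y_t1, y_t2]
    dx.append(x_t2 - x_t1)          # a cancels out in the differences
    dy.append(y_t2 - y_t1)

slope_pooled = ols_slope(pooled_x, pooled_y)  # biased upward by the omitted a
slope_fd = ols_slope(dx, dy)                  # close to the true 1.5

print(f"pooled OLS: {slope_pooled:.3f}, first differences: {slope_fd:.3f}")
```

If the unobservable did change over time, it would not cancel in the differences and the bias would remain.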
Instrumental Variable (IV)
estimation
Consider the simple regression equation again:
$y = \beta_0 + \beta_1 x + u$
Violation of the assumption that $\mathrm{Cov}(x,u) = 0$ has
serious consequences for the OLS estimator:
when one or more of the explanatory variables is
correlated with the error term u, we have both $E(u \mid x) \neq 0$
and $E(xu) \neq 0$, so the OLS estimator will be both biased
and inconsistent.
Suppose that we have an observable variable z
that satisfies the following assumptions:
1. Exogeneity: $\mathrm{Cov}(z,u) = 0$; i.e. z should have no partial
effect on y (after x and omitted variables have been
controlled for), and z should be uncorrelated with the
omitted variable;
2. Relevance: $\mathrm{Cov}(z,x) \neq 0$; i.e. z must be related, either
positively or negatively, to the endogenous explanatory
variable x;
Then we call z an instrument for x.
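Under these two assumptions, the slope in the simple regression can be recovered as Cov(z,y) / Cov(z,x), the standard simple IV estimator (this formula is a textbook result, not stated on the slides). The sketch below compares it with OLS on simulated data; the data-generating values (true slope 2, an unobserved `ability` term) are assumptions chosen for illustration.

```python
import random

random.seed(4)  # fixed seed so the sketch is reproducible
n = 20_000


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


ability = [random.gauss(0, 1) for _ in range(n)]  # unobserved, ends up in u
z = [random.gauss(0, 1) for _ in range(n)]        # instrument, independent of ability

# x depends on the instrument and on ability, so x is endogenous.
x = [zi + ai + random.gauss(0, 1) for zi, ai in zip(z, ability)]
# True model (assumed): y = 1 + 2 * x + u, with u = ability + noise.
y = [1.0 + 2.0 * xi + ai + random.gauss(0, 1) for xi, ai in zip(x, ability)]

beta_ols = cov(x, y) / cov(x, x)  # biased: x is correlated with u
beta_iv = cov(z, y) / cov(z, x)   # close to the true slope of 2

print(f"OLS: {beta_ols:.3f}, IV: {beta_iv:.3f}")
```

In the simulation exogeneity holds by construction; with real data it can only be argued for, since `ability` is not observed.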
Testing for instrument relevance
Because the exogeneity assumption involves the
covariance between z and the unobserved error u, we
cannot test it, but rather we make an argument for it;
The relevance assumption can (and will) be tested in all
empirical papers using IV estimation:
Estimate a simple regression of x on z:
$x = \pi_0 + \pi_1 z + v$
z fulfills the relevance assumption if we are able to reject the null
hypothesis
$H_0: \pi_1 = 0$ at a sufficiently small significance level (e.g., 5% or 1%).
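The relevance test can be sketched by hand as follows. The data are simulated with an assumed first-stage slope of 0.5, and the critical values 1.96 and 2.576 are the two-sided 5% and 1% values of the standard normal, which is a large-sample approximation to the t distribution.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible
n = 1_000

# Hypothetical first stage: x = 0.2 + 0.5 * z + v (values assumed for illustration)
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.2 + 0.5 * zi + random.gauss(0, 1) for zi in z]

mean_z, mean_x = sum(z) / n, sum(x) / n
s_zz = sum((zi - mean_z) ** 2 for zi in z)
s_zx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))

# OLS estimates of the slope and intercept
slope_hat = s_zx / s_zz
intercept_hat = mean_x - slope_hat * mean_z

# Standard error of the slope under homoskedasticity
residuals = [xi - intercept_hat - slope_hat * zi for zi, xi in zip(z, x)]
sigma2_hat = sum(r ** 2 for r in residuals) / (n - 2)
se_slope = (sigma2_hat / s_zz) ** 0.5

# t statistic for the null hypothesis that the slope on z is zero
t_stat = slope_hat / se_slope
reject_at_5pct = abs(t_stat) > 1.96
reject_at_1pct = abs(t_stat) > 2.576

print(f"slope = {slope_hat:.3f}, se = {se_slope:.3f}, t = {t_stat:.2f}")
print(f"reject H0 at 5%: {reject_at_5pct}, at 1%: {reject_at_1pct}")
```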
How IV estimation works
Given the linear regression:
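$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$
Assume $\mathrm{Cov}(x_1,u) \neq 0$, but we have an instrument z:
Stage 1: regress $x_1$ on z, $x_2$ and $x_3$, to obtain the fitted values $\hat{x}_1$:
$x_1 = \pi_0 + \pi_1 z + \pi_2 x_2 + \pi_3 x_3 + v$
Stage 2: plug the fitted values $\hat{x}_1$ into the original linear
regression equation:
$y = \beta_0 + \beta_1 \hat{x}_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
where $\varepsilon$ is a composite error term that is uncorrelated
with $\hat{x}_1$, $x_2$ and $x_3$.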
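The two stages can be sketched on simulated data as follows. Everything here is an assumption made for illustration: the helper `ols` (a small normal-equations solver written for this example), the coefficient values, and the unobserved `ability` term that makes the first regressor endogenous.

```python
import random


def ols(columns, y):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)]
         + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        A[p] = [v / A[p][p] for v in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [vr - factor * vp for vr, vp in zip(A[r], A[p])]
    return [A[r][k] for r in range(k)]


random.seed(5)  # fixed seed so the sketch is reproducible
n = 20_000
ability = [random.gauss(0, 1) for _ in range(n)]  # unobserved, part of u
z = [random.gauss(0, 1) for _ in range(n)]        # instrument
x2 = [random.gauss(0, 1) for _ in range(n)]       # exogenous controls
x3 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * z[i] + 0.5 * x2[i] - 0.3 * x3[i] + ability[i] + random.gauss(0, 1)
      for i in range(n)]
# True model (assumed): y = 1 + 2*x1 + 1*x2 - 1*x3 + u, with u = ability + noise
y = [1.0 + 2.0 * x1[i] + x2[i] - x3[i] + ability[i] + random.gauss(0, 1)
     for i in range(n)]

# Stage 1: regress x1 on z, x2 and x3, then compute the fitted values.
pi = ols([z, x2, x3], x1)
x1_hat = [pi[0] + pi[1] * z[i] + pi[2] * x2[i] + pi[3] * x3[i] for i in range(n)]

# Stage 2: regress y on the fitted values and the controls.
beta_2sls = ols([x1_hat, x2, x3], y)

# Naive OLS on the endogenous x1, for comparison.
beta_ols = ols([x1, x2, x3], y)

print(f"coefficient on x1: OLS = {beta_ols[1]:.3f}, 2SLS = {beta_2sls[1]:.3f}")
```

The point estimates from this manual procedure match 2SLS, but the standard errors from an ordinary second-stage regression are not valid, because they ignore that the fitted values were estimated. Dedicated IV routines in statistical packages correct for this.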
Sample table
Table X: the title will specify what regressions we are looking at
                                   Model (1)          Model (2)          Model (3)
Independent variable of interest   Coefficient        Coefficient        Coefficient
                                   (standard error)   (standard error)   (standard error)
Control 1                          Coefficient        Coefficient        Coefficient
                                   (standard error)   (standard error)   (standard error)
Control 2                          Coefficient        Coefficient        Coefficient
                                   (standard error)   (standard error)   (standard error)
Number of observations             Number             Number             Number
Fixed effects                      Y/N                Y/N                Y/N
Statistical significance
The most common convention to convey statistical
significance is to place a number of stars next to
the estimated coefficient as follows:
* for 10% significance;
** for 5% significance;
*** for 1% significance;
Sometimes the authors do not highlight statistically
significant coefficients in their tables.
In such cases, it is up to you to point out statistical
significance, at least for the independent variable of
interest.
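When a table reports only coefficients and standard errors, significance can be worked out from the ratio of the two. The sketch below assumes a large sample, so that the t statistic is approximately standard normal; the function names are hypothetical and the example numbers are made up.

```python
from statistics import NormalDist


def significance_stars(p_value):
    """Map a p-value to the star convention: * 10%, ** 5%, *** 1%."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


def p_value_from_estimate(coef, se):
    """Two-sided p-value for H0: coefficient = 0, using a normal approximation."""
    t_stat = coef / se
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


def format_cell(coef, se):
    """Format a table cell as the coefficient with stars over (standard error)."""
    stars = significance_stars(p_value_from_estimate(coef, se))
    return f"{coef:.3f}{stars}\n({se:.3f})"


# Example with made-up numbers: t = 0.5 / 0.2 = 2.5, significant at 5% but not 1%.
print(format_cell(0.5, 0.2))
```

With small samples, the critical values of the t distribution with the appropriate degrees of freedom should be used instead of the normal approximation.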
Questions?