Instrumental Variable
Rus’an Nasrudin
November 16, 2021
Rus’an Nasrudin Instrumental Variable November 16, 2021 1 / 29
Theory
Table of Contents
1 Theory
2 Implementation: Two-stage least squares
3 Asymptotic 2SLS inference
Introduction
When unobservable factors significantly drive the nonrandom assignment to treatment, consistent estimates of average treatment effects can no longer be recovered from observables alone.
In this case, a conditioning strategy is ineffective.
Ideally, we would rely on Randomised Controlled Trial (RCT) data to elicit the true causal effect.
One technique for establishing causality with observational data is the “instrumental-variable” (IV) method. Its modern interpretation owes much to the LATE analysis recognised by the 2021 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel.
Historical origins: the simultaneous equations model (SEM) of Wright (1928), and the analysis of measurement-error bias in regression by Wald (1940) and Durbin (1954).
Selection-on-unobservables
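Recall the selection-on-observables setting we learned earlier. The long regression takes the form:
$$Y_i = \alpha + \rho S_i + A_i'\gamma + \varepsilon_i$$
and the corresponding short regression is:
$$Y_i = \alpha + \rho S_i + \mu_i$$
where the error $\mu_i = A_i'\gamma + \varepsilon_i$ now contains the omitted $A_i$.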
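A small simulation makes the long/short contrast concrete. This is a sketch under assumed parameter values (α = 1, ρ = 0.5, γ = 2, and an $S_i$ that loads on $A_i$ with coefficient 0.8), using a hypothetical helper `ols` built on NumPy only.

```python
import numpy as np

# Simulated illustration; all parameter values are assumptions, not from the slides.
rng = np.random.default_rng(0)
n = 100_000
alpha, rho, gamma = 1.0, 0.5, 2.0

A = rng.normal(size=n)                    # confounder (unobserved in practice)
S = 0.8 * A + rng.normal(size=n)          # treatment correlated with A
Y = alpha + rho * S + gamma * A + rng.normal(size=n)

def ols(y, *regressors):
    """Hypothetical helper: OLS coefficients of y on a constant and the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rho_long = ols(Y, S, A)[1]    # long regression: controls for A
rho_short = ols(Y, S)[1]      # short regression: A is left inside mu_i

print(f"long:  {rho_long:.3f}")   # close to the true rho
print(f"short: {rho_short:.3f}")  # biased upward
```

With these assumed values the short-regression slope exceeds ρ by roughly γ·cov(S, A)/V(S) = 2 × 0.8/1.64 ≈ 0.98, the familiar omitted-variable bias.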
Selection-on-unobservables
The problem we want to tackle is how to estimate ρ when $A_i$ is unobserved.
The selection-on-observables approach reduces bias by exploiting variation within cells of the included controls, which explain selection into treatment.
There is another route: use only the part of the treatment variable that is exogenously induced by an ‘instrumental’ variable, called Z, so that ρ has a causal interpretation.
We call this approach ‘selection-on-unobservables’: when we do not observe $A_i$, we use Z to purge the non-random component of S, so that the relationship between the remaining (Z-induced) variation in S and Y can be interpreted causally.
DAG of IV
Suppose we want to establish the causal effect of D on Y in the presence of an unobserved confounder U.
With an instrumental variable Z, the causal effect is identified.
[Figure: DAG with arrows Z → D, D → Y, U → D and U → Y]
The back-door path D ← U → Y remains open, because U cannot be conditioned on; yet Z is a valid instrument that identifies the causal effect of D on Y.
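The DAG can be simulated directly. In this sketch the coefficients are illustrative assumptions: U opens the back door, so the OLS slope of Y on D is biased, while the covariance ratio using Z draws only on the variation in D that Z induces. The helper `cov` is hypothetical.

```python
import numpy as np

# Simulate the IV DAG; coefficients are illustrative assumptions.
rng = np.random.default_rng(1)
n = 200_000
effect = 2.0                                   # true causal effect of D on Y

U = rng.normal(size=n)                         # unobserved confounder
Z = rng.normal(size=n)                         # instrument, independent of U
D = 0.7 * Z + 1.0 * U + rng.normal(size=n)     # Z -> D <- U
Y = effect * D + 1.5 * U + rng.normal(size=n)  # D -> Y <- U, no direct Z -> Y arrow

def cov(a, b):
    """Sample covariance (ddof=1) between two 1-D arrays."""
    return np.cov(a, b)[0, 1]

ols_slope = cov(Y, D) / np.var(D, ddof=1)      # contaminated by the open back door
iv_ratio = cov(Y, Z) / cov(D, Z)               # uses only Z-driven variation in D

print(f"OLS: {ols_slope:.3f}, IV: {iv_ratio:.3f}")
```

Under these assumptions the OLS bias is about 1.5 × cov(D, U)/V(D) = 1.5/2.49 ≈ 0.60.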
What is Z ?
Z is correlated with the causal variable of interest, $S_i$, but uncorrelated with any other determinants of $Y_i$; equivalently,
Z is uncorrelated with both $A_i$ and $\varepsilon_i$, i.e. with $\mu_i$.
In other words, the only reason Z affects $Y_i$ is through $S_i$.
Exclusion restriction
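Under the exclusion restriction, we can define¹:
$$\rho = \frac{\operatorname{cov}(Y_i, Z_i)}{\operatorname{cov}(S_i, Z_i)} = \frac{\operatorname{cov}(Y_i, Z_i)/V(Z_i)}{\operatorname{cov}(S_i, Z_i)/V(Z_i)} \qquad (1)$$
The second equality divides numerator and denominator by the variance of Z, which expresses ρ as a ratio of two regression coefficients: the slope from regressing $Y_i$ on $Z_i$ over the slope from regressing $S_i$ on $Z_i$.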
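¹ Note that $\operatorname{cov}(Z_i, Y_i) = \operatorname{cov}(Z_i, \alpha + \rho S_i + \mu_i) = \alpha \cdot \operatorname{cov}(Z_i, 1) + \rho \cdot \operatorname{cov}(Z_i, S_i) + \operatorname{cov}(Z_i, \mu_i)$. Since $\operatorname{cov}(Z_i, 1)$ and $\operatorname{cov}(Z_i, \mu_i)$ are equal to zero, we find that $\rho = \operatorname{cov}(Z_i, Y_i)/\operatorname{cov}(Z_i, S_i)$.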
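Equation (1) can be checked numerically: dividing both covariances by V(Z) leaves the ratio unchanged, so the covariance ratio equals the reduced-form slope over the first-stage slope. A minimal sketch on simulated data; the binary Z, the coefficients, and the helper `slope` are assumptions for illustration.

```python
import numpy as np

# Simulated data; all coefficients are illustrative assumptions.
rng = np.random.default_rng(2)
n = 50_000
Z = rng.binomial(1, 0.5, size=n).astype(float)       # binary instrument
mu = rng.normal(size=n)                               # structural error
S = 1.0 + 0.6 * Z + 0.8 * mu + rng.normal(size=n)     # S is endogenous (depends on mu)
Y = 3.0 + 1.5 * S + mu                                # true rho = 1.5

def slope(y, x):
    """Bivariate OLS slope of y on x: cov(y, x) / V(x)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

cov_ratio = np.cov(Y, Z)[0, 1] / np.cov(S, Z)[0, 1]   # first expression in (1)
slope_ratio = slope(Y, Z) / slope(S, Z)               # second expression in (1)
```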
IV estimand
Definition 1.1
The coefficient of interest ρ is the ratio of the population regression coefficient of $Y_i$ on $Z_i$ (the reduced form) to the population regression coefficient of $S_i$ on $Z_i$ (the first stage).
Terminology
First stage: the regression of the endogenous variable on the instrument.
$$S_i = \pi_{10} + \pi_{11} Z_i + \xi_{1i}$$
Reduced form: the regression of the outcome variable on the instrument.
$$Y_i = \pi_{20} + \pi_{21} Z_i + \xi_{2i}$$
Structural equation: the regression of the outcome variable on the endogenous variable.
$$Y_i = \alpha + \rho S_i + \mu_i$$
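As a sample analogue of these three regressions, one can estimate the first stage and the reduced form by OLS and take the ratio of their slopes, $\hat\pi_{21}/\hat\pi_{11}$. The sketch below uses simulated data with assumed coefficients (true ρ = 2) and a hypothetical `ols` helper; it also runs the structural equation by OLS to show the bias that IV avoids.

```python
import numpy as np

# Simulated data; coefficients are assumptions chosen for the sketch.
rng = np.random.default_rng(3)
n = 100_000
Z = rng.normal(size=n)
mu = rng.normal(size=n)
S = 0.5 + 0.4 * Z + 0.9 * mu + rng.normal(size=n)   # first stage; S correlated with mu
Y = 1.0 + 2.0 * S + mu                               # structural equation, rho = 2

def ols(y, x):
    """Hypothetical helper: (intercept, slope) from regressing y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

pi10, pi11 = ols(S, Z)     # first stage
pi20, pi21 = ols(Y, Z)     # reduced form
rho_iv = pi21 / pi11       # IV estimate: reduced form / first stage
rho_ols = ols(Y, S)[1]     # structural equation by OLS: biased here
```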
IV estimator
Definition 1.2
The IV estimator is the sample analogue of the population IV estimand: the population covariances in (1) are replaced by their sample counterparts.
Three assumptions
Exogenous instrument: the instrument is as good as randomly assigned.
Exclusion restriction: the instrument has no effect on the outcome other than through the first-stage channel.
Relevance: the instrument must have a clear effect on the endogenous variable, i.e. a non-zero first stage.
LATE Interpretation
If an instrument is as good as randomly assigned, affects the outcome through a single known channel, has a first stage, and affects the causal channel of interest in only one direction (monotonicity), it can be used to estimate the average causal effect on the group whose treatment it changes (the compliers).
This is known as the LATE (Local Average Treatment Effect).
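The LATE result can be illustrated with hypothetical compliance types. In this sketch the type shares, the type-specific effects, and the binary Z are all assumptions. There are no defiers, so Z only moves D from 0 to 1, and the Wald ratio of reduced form to first stage recovers the average effect among compliers (1.0), not the population average effect (about 1.3).

```python
import numpy as np

# Hypothetical compliance types and heterogeneous effects (all assumptions).
rng = np.random.default_rng(4)
n = 1_000_000
ALWAYS, NEVER, COMPLIER = 0, 1, 2
types = rng.choice(3, size=n, p=[0.2, 0.3, 0.5])    # type shares
Z = rng.integers(0, 2, size=n)                       # randomly assigned binary instrument

# Monotonicity (no defiers): Z switches D from 0 to 1, and only for compliers.
D = np.where(types == ALWAYS, 1, np.where(types == NEVER, 0, Z))

effect = np.array([4.0, 0.0, 1.0])[types]           # effect by type: always, never, complier
Y0 = rng.normal(size=n) + 2.0 * (types == ALWAYS)    # untreated outcome also differs by type
Y = Y0 + effect * D

first_stage = D[Z == 1].mean() - D[Z == 0].mean()   # share of compliers
reduced_form = Y[Z == 1].mean() - Y[Z == 0].mean()
wald = reduced_form / first_stage

late = effect[types == COMPLIER].mean()              # 1.0 by construction
ate = effect.mean()                                  # about 0.2*4 + 0.5*1 = 1.3
```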
Implementation: Two-stage least squares
Derivation
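The reduced form can be derived by substituting the first-stage equation into the structural equation. Doing so recovers the IV estimand, mirroring the covariance formulation above.
With covariates $X_i$, write the structural equation as $Y_i = \alpha' X_i + \rho S_i + \eta_i$ and the first stage as $S_i = X_i'\pi_{10} + \pi_{11} Z_i + \xi_{1i}$. Substituting, we have:
$$\begin{aligned}
Y_i &= \alpha' X_i + \rho\,[X_i'\pi_{10} + \pi_{11} Z_i + \xi_{1i}] + \eta_i \\
&= X_i'[\alpha + \rho\pi_{10}] + \rho\pi_{11} Z_i + [\rho\xi_{1i} + \eta_i] \\
&= X_i'\pi_{20} + \pi_{21} Z_i + \xi_{2i}
\end{aligned}$$
This shows that $\rho = \pi_{21}/\pi_{11}$.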
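The result ρ = π21/π11 can be checked on simulated data with one covariate. A minimal sketch, assuming the coefficients shown (true α = (1.0, −0.4) for the constant and $x$, ρ = 1.2); it also checks that the reduced-form covariate coefficients line up with α + ρπ10 from the second line of the derivation.

```python
import numpy as np

# Simulated data with one covariate; all values are assumptions for the sketch.
rng = np.random.default_rng(5)
n = 200_000
x = rng.normal(size=n)
Z = 0.3 * x + rng.normal(size=n)              # instrument may be correlated with X
eta = rng.normal(size=n)                      # structural error
S = 0.2 + 0.5 * x + 0.6 * Z + 0.7 * eta + rng.normal(size=n)   # first stage
Y = 1.0 - 0.4 * x + 1.2 * S + eta             # structural equation, rho = 1.2

X = np.column_stack([np.ones(n), x])          # X_i includes the constant
W = np.column_stack([X, Z])                   # regressors for first stage and reduced form

pi1 = np.linalg.lstsq(W, S, rcond=None)[0]    # [pi10 (2 entries), pi11]
pi2 = np.linalg.lstsq(W, Y, rcond=None)[0]    # [pi20 (2 entries), pi21]
rho_hat = pi2[-1] / pi1[-1]                   # rho = pi21 / pi11

alpha = np.array([1.0, -0.4])                 # true alpha in this simulation
implied_pi20 = alpha + rho_hat * pi1[:2]      # should match pi20 = alpha + rho * pi10
```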
The regression on the fitted values
Rearranging the equation above yields:
$$Y_i = \alpha' X_i + \rho\,[X_i'\pi_{10} + \pi_{11} Z_i] + \xi_{2i}$$
$[X_i'\pi_{10} + \pi_{11} Z_i]$ is the population fitted value from the first-stage regression of $S_i$ on $X_i$ and $Z_i$.
Since $Z_i$ and $X_i$ are uncorrelated with the reduced-form error $\xi_{2i}$, the coefficient on $[X_i'\pi_{10} + \pi_{11} Z_i]$ in the population regression of $Y_i$ on $X_i$ and $[X_i'\pi_{10} + \pi_{11} Z_i]$ equals ρ.
The Two-Stage Least Squares (2SLS) estimator of ρ
In a sample, the population fitted values are estimated by:
$$\hat{s}_i = X_i'\hat\pi_{10} + \hat\pi_{11} Z_i$$
where $\hat\pi_{10}$ and $\hat\pi_{11}$ are OLS estimates of the corresponding population coefficients.
The resulting estimator of ρ is called the Two-Stage Least Squares (2SLS) estimator. It is the coefficient on $\hat{s}_i$ in the regression of $Y_i$ on $X_i$ and $\hat{s}_i$.
Why is it called 2SLS?
2SLS estimates can be constructed by OLS estimation of the “second-stage equation”:
$$Y_i = \alpha' X_i + \rho\hat{s}_i + [\eta_i + \rho(S_i - \hat{s}_i)]$$
It is called 2SLS because it can be done in two steps: first, estimate $\hat{s}_i$ from the first-stage equation; second, estimate the equation above by OLS.
The resulting estimator is consistent for ρ because (i) the first-stage estimates are consistent and (ii) the covariates $X_i$ and instrument $Z_i$ are uncorrelated with both $\eta_i$ and $S_i - \hat{s}_i$.
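The two steps can be written as a short function. `two_stage` is a hypothetical helper, and the simulated data use assumed coefficients (true α = (2.0, 0.3), ρ = 0.7). The sketch returns coefficients only, because the standard errors from the second-step OLS are not the correct 2SLS standard errors.

```python
import numpy as np

def two_stage(y, X, Z, s):
    """Hypothetical two-step 2SLS helper (coefficients only).

    y: outcome (n,); X: exogenous covariates incl. constant (n, k);
    Z: excluded instrument(s) (n,) or (n, m); s: endogenous regressor (n,).
    Returns (coefficients on X, rho).
    """
    W = np.column_stack([X, Z])
    # Step 1: first-stage OLS of s on X and Z; keep the fitted values s_hat.
    s_hat = W @ np.linalg.lstsq(W, s, rcond=None)[0]
    # Step 2: OLS of y on X and s_hat (the second-stage equation).
    coef = np.linalg.lstsq(np.column_stack([X, s_hat]), y, rcond=None)[0]
    return coef[:-1], coef[-1]

# Simulated example (assumed values): S is endogenous through eta.
rng = np.random.default_rng(6)
n = 100_000
x = rng.normal(size=n)
Z = rng.normal(size=n)
eta = rng.normal(size=n)
S = 0.5 * x + 0.8 * Z + 0.6 * eta + rng.normal(size=n)
Y = 2.0 + 0.3 * x + 0.7 * S + eta
X = np.column_stack([np.ones(n), x])

alpha_hat, rho_hat = two_stage(Y, X, Z, S)
rho_ols = np.linalg.lstsq(np.column_stack([X, S]), Y, rcond=None)[0][-1]  # biased
```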
Note on standard error
We do not usually construct 2SLS estimates in two separate steps, because the standard errors from the second-step OLS regression are wrong.
Specialised 2SLS routines, such as those in Stata, compute the standard errors correctly.
However, computing 2SLS as a sequence of OLS regressions is a useful way to remember why it works. Intuitively, conditional on covariates, 2SLS retains only the variation in $S_i$ that is generated by quasi-experimental variation, i.e. by the instrument $Z_i$.
Multi-instrument case
Assuming that each instrument captures the same causal effect, one can combine the alternative IV estimates into a single, more precise estimate.
2SLS does this by combining multiple instruments into a single instrument: the first-stage fitted value, which is a linear combination of them.
Suppose that there are three instrumental variables, $Z_{1i}$, $Z_{2i}$, and $Z_{3i}$. In Angrist and Krueger (1991), these are dummies for first-, second- and third-quarter births.
The first-stage equation becomes:
$$S_i = X_i'\pi_{10} + \pi_{11} Z_{1i} + \pi_{12} Z_{2i} + \pi_{13} Z_{3i} + \xi_{1i}$$
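A sketch of the multi-instrument first stage: three quarter-of-birth dummies (quarter 4 omitted) enter the first stage, and the fitted value $\hat{s}_i$ is the single combined instrument used in the second stage. The data-generating process (quarter effects on schooling, a return to schooling of 0.08, the unobserved `ability` term) is invented for illustration and is not the Angrist and Krueger (1991) data.

```python
import numpy as np

# Hypothetical data in the spirit of quarter-of-birth instruments (all values assumed).
rng = np.random.default_rng(7)
n = 200_000
quarter = rng.integers(1, 5, size=n)                          # quarter of birth, 1..4
Zs = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3)])
ability = rng.normal(size=n)                                  # unobserved confounder
schooling = 12.0 + Zs @ np.array([-0.6, -0.4, -0.2]) + ability + rng.normal(size=n)
log_wage = 1.0 + 0.08 * schooling + 0.1 * ability + 0.3 * rng.normal(size=n)

X = np.ones((n, 1))                                           # covariates: constant only
W = np.column_stack([X, Zs])                                  # first-stage regressors
pi1 = np.linalg.lstsq(W, schooling, rcond=None)[0]            # [pi10, pi11, pi12, pi13]
s_hat = W @ pi1                                               # one combined instrument
coef = np.linalg.lstsq(np.column_stack([X, s_hat]), log_wage, rcond=None)[0]
rho_2sls = coef[-1]

# OLS for comparison: biased upward because ability raises both schooling and wages.
rho_ols = np.linalg.lstsq(np.column_stack([X, schooling]), log_wage, rcond=None)[0][-1]
```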
Multi-instrument case...
The 2SLS second stage, and the interpretation of the 2SLS estimator, are the same as in the single-instrument case above.
The exclusion restriction is that all of the quarter-of-birth dummies are uncorrelated with $\eta_i$.
Adding interactions of these instruments with year of birth might improve precision, for example when the effect of quarter of birth on schooling differs across cohorts.
Example from Angrist and Krueger (1991)
Asymptotic 2SLS inference
Inference
Let $V_i \equiv [X_i' \;\; \hat{s}_i]'$ denote the vector of regressors in the 2SLS second stage.
The 2SLS estimator can be written as
$$\hat\Gamma_{2SLS} \equiv \Big[\sum_i V_i V_i'\Big]^{-1} \sum_i V_i Y_i$$
where $\Gamma \equiv [\alpha' \;\; \rho]'$ is the corresponding coefficient vector.
Inference..
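Note that
$$\begin{aligned}
\hat\Gamma_{2SLS} &= \Gamma + \Big[\sum_i V_i V_i'\Big]^{-1} \sum_i V_i\,[\eta_i + \rho(S_i - \hat{s}_i)] \\
&= \Gamma + \Big[\sum_i V_i V_i'\Big]^{-1} \sum_i V_i \eta_i
\end{aligned}$$
The second equality comes from the fact that the first-stage residuals $(S_i - \hat{s}_i)$ are orthogonal to $V_i$ in the sample.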
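In a simulation $\eta_i$ is known, so both equalities above can be checked directly. A minimal sketch with assumed coefficients: it computes $\hat\Gamma_{2SLS}$ from the matrix formula, verifies that $\sum_i V_i (S_i - \hat{s}_i)$ is numerically zero, and confirms that $\hat\Gamma_{2SLS}$ equals $\Gamma + [\sum_i V_i V_i']^{-1} \sum_i V_i \eta_i$.

```python
import numpy as np

# Simulated data (assumed DGP); eta is observable here only because we simulate it.
rng = np.random.default_rng(8)
n = 5_000
x = rng.normal(size=n)
Z = rng.normal(size=n)
eta = rng.normal(size=n)
S = 0.4 * x + 0.9 * Z + 0.5 * eta + rng.normal(size=n)
Gamma = np.array([1.0, 0.5, 2.0])                   # [alpha (const, x), rho]
X = np.column_stack([np.ones(n), x])
Y = X @ Gamma[:2] + Gamma[2] * S + eta

W = np.column_stack([X, Z])
s_hat = W @ np.linalg.lstsq(W, S, rcond=None)[0]    # first-stage fitted values
V = np.column_stack([X, s_hat])                     # row i is V_i' = [X_i', s_hat_i]

VtV_inv = np.linalg.inv(V.T @ V)                    # [sum_i V_i V_i']^{-1}
Gamma_hat = VtV_inv @ (V.T @ Y)                     # 2SLS estimator
orthogonality = V.T @ (S - s_hat)                   # sum_i V_i (S_i - s_hat_i), ~0
Gamma_via_eta = Gamma + VtV_inv @ (V.T @ eta)       # last line of the derivation
```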
Inference..
The limiting distribution of the 2SLS coefficient vector is therefore the limiting distribution of $\big[\sum_i V_i V_i'\big]^{-1} \sum_i V_i \eta_i$.
This quantity is a little harder to work with than the corresponding OLS quantity, because the regressors here involve estimated fitted values, $\hat{s}_i$.
A Slutsky argument shows that we get the same limiting distribution when the estimated fitted values are replaced by the corresponding population fitted values (i.e. replacing $\hat{s}_i$ with $X_i'\pi_{10} + \pi_{11} Z_i$).
Inference..
It therefore follows that $\hat\Gamma_{2SLS}$ has an asymptotically normal distribution, with probability limit Γ and covariance matrix estimated consistently by
$$\Big[\sum_i V_i V_i'\Big]^{-1} \Big[\sum_i V_i V_i' \eta_i^2\Big] \Big[\sum_i V_i V_i'\Big]^{-1}.$$
As with OLS, if $\eta_i$ is conditionally homoskedastic given covariates and instruments, the consistent covariance matrix estimator simplifies to
$$\Big[\sum_i V_i V_i'\Big]^{-1} \sigma_\eta^2.$$
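Both covariance estimators can be computed in a few lines of NumPy. This sketch assumes a simulated design in which the variance of $\eta_i$ rises with $|Z_i|$, so the heteroskedasticity-robust (sandwich) standard error for ρ should exceed the homoskedastic one. Residuals are built from $S_i$, and the homoskedastic version uses the mean squared residual without a degrees-of-freedom correction, which is a simplifying assumption of the sketch.

```python
import numpy as np

# Simulated heteroskedastic design (all values are assumptions).
rng = np.random.default_rng(9)
n = 50_000
x = rng.normal(size=n)
Z = rng.normal(size=n)
eta = rng.normal(size=n) * (1.0 + 0.5 * np.abs(Z))   # variance rises with |Z|
S = 0.3 * x + 0.8 * Z + 0.5 * eta + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Y = X @ np.array([1.0, 0.2]) + 1.5 * S + eta

W = np.column_stack([X, Z])
s_hat = W @ np.linalg.lstsq(W, S, rcond=None)[0]     # first-stage fitted values
V = np.column_stack([X, s_hat])                      # second-stage regressors V_i
bread = np.linalg.inv(V.T @ V)                       # [sum_i V_i V_i']^{-1}
Gamma_hat = bread @ (V.T @ Y)                        # [alpha_hat', rho_hat]

# Structural residuals use the original S_i, not s_hat_i.
eta_hat = Y - np.column_stack([X, S]) @ Gamma_hat
meat = (V * eta_hat[:, None] ** 2).T @ V             # sum_i V_i V_i' eta_hat_i^2
cov_robust = bread @ meat @ bread                    # sandwich (robust) estimator
cov_homosk = bread * np.mean(eta_hat ** 2)           # bread * sigma_eta^2 estimate

se_robust = np.sqrt(np.diag(cov_robust))
se_homosk = np.sqrt(np.diag(cov_homosk))
```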
Inference: be careful
It seems natural to construct 2SLS estimates manually by first estimating the first stage and then plugging the fitted values into the second stage (the main equation).
This procedure is fine for the coefficient estimates, but the resulting standard errors will be incorrect.
Conventional regression software does not recognise that we are trying to construct a 2SLS estimate.
The residual variance estimator that goes into the standard formulas will therefore be incorrect.
When constructing standard errors, the software will estimate the residual variance of the equation:
$$Y_i - [\alpha' X_i + \rho\hat{s}_i] = [\eta_i + \rho(S_i - \hat{s}_i)]$$
Correct way
The correct residual variance estimator, however, uses the original endogenous regressor to construct the residuals, not the first-stage fitted values $\hat{s}_i$.
In other words, the residual you want is $Y_i - [\alpha' X_i + \rho S_i] = \eta_i$, so as to consistently estimate $\sigma_\eta^2$, and not $\eta_i + \rho(S_i - \hat{s}_i)$.
Therefore, use software designed for 2SLS, such as ivreg or ivreg2 in Stata.
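The difference between the two residual definitions is easy to see in a simulation. In this sketch (assumed coefficients, true $\sigma_\eta^2 = 1$, ρ = 2), the second-stage residuals estimate the variance of $\eta_i + \rho(S_i - \hat{s}_i)$, about 2.6² + 2² ≈ 10.8 here, while residuals built from $S_i$ recover $\sigma_\eta^2$. Both standard errors use the homoskedastic formula; the coefficients themselves are identical.

```python
import numpy as np

# Simulated data; all parameter values are assumptions.
rng = np.random.default_rng(10)
n = 50_000
Z = rng.normal(size=n)
eta = rng.normal(size=n)                            # true sigma_eta^2 = 1
S = 0.6 * Z + 0.8 * eta + rng.normal(size=n)
Y = 1.0 + 2.0 * S + eta
X = np.ones((n, 1))

W = np.column_stack([X, Z])
s_hat = W @ np.linalg.lstsq(W, S, rcond=None)[0]
V = np.column_stack([X, s_hat])
Gamma_hat = np.linalg.lstsq(V, Y, rcond=None)[0]    # 2SLS coefficients: fine
bread = np.linalg.inv(V.T @ V)
k = V.shape[1]

# Naive: second-stage residuals, i.e. eta + rho * (S - s_hat).
resid_naive = Y - V @ Gamma_hat
# Correct: residuals built with the original endogenous regressor S.
resid_correct = Y - np.column_stack([X, S]) @ Gamma_hat

sigma2_naive = resid_naive @ resid_naive / (n - k)
sigma2_correct = resid_correct @ resid_correct / (n - k)
se_naive = np.sqrt(np.diag(bread) * sigma2_naive)
se_correct = np.sqrt(np.diag(bread) * sigma2_correct)
```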