Quantitative Social Science Methods, I,
Lecture Notes: Research Designs for Causal Inference

Gary King (GaryKing.org)
Institute for Quantitative Social Science
Harvard University

August 17, 2020

Components of Causal Estimation Error

Research Designs

Issues in Ideal Designs



Reference

• Kosuke Imai, Gary King, and Elizabeth Stuart. Misunderstandings among Experimentalists and Observationalists: Balance Test Fallacies in Causal Inference. Journal of the Royal Statistical Society, Series A, 171, Part 2 (2008): 1–22.
• http://j.mp/MisExpObs



Notation

• Sample 𝑛 units from a finite population of size 𝑁 (typically 𝑁 ≫ 𝑛)
• Observed outcome variable: 𝑌𝑖
• Sample selection: 𝐼𝑖 = 1 if selected, 0 otherwise
• Treatment assignment: 𝑇𝑖 = 1 if treated group, 0 if control
• (Assume: treated and control groups are each of size 𝑛/2)
• Potential outcomes: 𝑌𝑖 (1) and 𝑌𝑖 (0), the values 𝑌𝑖 takes when 𝑇𝑖 is 1 or 0
• Fundamental problem of causal inference. Only one potential
outcome is ever observed:
If 𝑇𝑖 = 0, 𝑌𝑖 (0) = 𝑌𝑖 𝑌𝑖 (1) = ?
If 𝑇𝑖 = 1, 𝑌𝑖 (0) = ? 𝑌𝑖 (1) = 𝑌𝑖
• (𝐼𝑖 , 𝑇𝑖 , 𝑌𝑖 ) are random; 𝑌𝑖 (1) and 𝑌𝑖 (0) are fixed.
• Quiz: How can 𝑌𝑖 be random when 𝑌𝑖 (0) and 𝑌𝑖 (1) are fixed?
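
The notation above can be made concrete with a small sketch. It assumes a hypothetical population of eight units with made-up potential outcomes (the lists `y0` and `y1` and the helper names are illustrative, not from the lecture). It shows that the observed outcome changes from one random assignment to the next even though both potential outcomes stay fixed, which is one way to approach the quiz.

```python
import random

# Hypothetical fixed potential outcomes for 8 units (illustrative values).
y0 = [3, 5, 2, 7, 4, 6, 1, 8]  # Y_i(0)
y1 = [4, 7, 2, 9, 5, 9, 3, 8]  # Y_i(1)


def random_assignment(rng, n):
    """Assign exactly n/2 units to treatment (T_i = 1), the rest to control."""
    t = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(t)
    return t


def observe(t):
    """Observed outcome: Y_i = Y_i(1) if T_i = 1, else Y_i(0)."""
    return [y1[i] if t[i] == 1 else y0[i] for i in range(len(t))]


rng = random.Random(0)
t_a = random_assignment(rng, len(y0))
t_b = random_assignment(rng, len(y0))
y_a = observe(t_a)
y_b = observe(t_b)

print("assignment A:", t_a, "observed:", y_a)
print("assignment B:", t_b, "observed:", y_b)
```

Only the assignment vector differs between the two runs, so any difference between `y_a` and `y_b` comes from 𝑇𝑖 alone.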



Quantities of Interest

• Treatment Effect (for unit 𝑖):

TE𝑖 ≡ 𝑌𝑖 (1) − 𝑌𝑖 (0)

• Population Average Treatment Effect

$$\text{PATE} \equiv \frac{1}{N} \sum_{i=1}^{N} \text{TE}_i$$

• Sample Average Treatment Effect

$$\text{SATE} \equiv \frac{1}{n} \sum_{i \in \{I_i = 1\}} \text{TE}_i$$
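
As a rough illustration of these definitions, the sketch below computes PATE and SATE on simulated data. Everything about the data is an assumption made for the example: the population size, the sample size, and the normally distributed unit-level effects. Both quantities are computable here only because the simulation reveals both potential outcomes for every unit, which real data never do.

```python
import random
from statistics import mean

rng = random.Random(1)
N, n = 1000, 100  # assumed population and sample sizes

# Simulated (hypothetical) potential outcomes; TE_i = Y_i(1) - Y_i(0).
y0 = [rng.gauss(0, 1) for _ in range(N)]
te = [rng.gauss(2, 1) for _ in range(N)]
y1 = [a + b for a, b in zip(y0, te)]

# Sample selection: I_i = 1 for the n randomly chosen units.
sample_ids = rng.sample(range(N), n)

pate = mean(y1[i] - y0[i] for i in range(N))    # average over the population
sate = mean(y1[i] - y0[i] for i in sample_ids)  # average over the sample

print(f"PATE = {pate:.3f}, SATE = {sate:.3f}")
```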



Decomposition of Causal Effect Estimation Error

• Difference in means estimator

$$D \equiv \bar{Y}_1 - \bar{Y}_0 = \left( \frac{1}{n/2} \sum_{i \in \{I_i = 1,\, T_i = 1\}} Y_i \right) - \left( \frac{1}{n/2} \sum_{i \in \{I_i = 1,\, T_i = 0\}} Y_i \right)$$

• Pretreatment confounders: observed 𝑋; unobserved 𝑈

• Decomposition

$$\begin{aligned} \Delta &\equiv \text{PATE} - D \quad \text{(estimation error)} \\ &= \Delta_S + \Delta_T \\ &= (\Delta_{S_X} + \Delta_{S_U}) + (\Delta_{T_X} + \Delta_{T_U}) \end{aligned}$$

Error due to: Δ𝑆 (sample selection), Δ𝑇 (treatment imbalance), and each due to observed (𝑋𝑖) and unobserved (𝑈𝑖) covariates
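
The first level of the decomposition can be checked numerically. The sketch below uses simulated potential outcomes (all values and sizes are assumptions for illustration). It takes Δ𝑆 ≡ PATE − SATE, as these notes define selection error, and sets Δ𝑇 = SATE − 𝐷, which is what the identity Δ = Δ𝑆 + Δ𝑇 then implies. Splitting each term further into its 𝑋 and 𝑈 parts would need a model of the outcome and is not attempted here.

```python
import random
from statistics import mean

rng = random.Random(2)
N, n = 2000, 200  # assumed population and sample sizes

# Simulated (hypothetical) potential outcomes.
y0 = [rng.gauss(0, 1) for _ in range(N)]
y1 = [y + rng.gauss(1, 0.5) for y in y0]

sample = rng.sample(range(N), n)            # units with I_i = 1
treated = set(rng.sample(sample, n // 2))   # units with T_i = 1

# Observed outcome for each sampled unit.
y_obs = {i: (y1[i] if i in treated else y0[i]) for i in sample}

pate = mean(y1[i] - y0[i] for i in range(N))
sate = mean(y1[i] - y0[i] for i in sample)
d = (mean(y_obs[i] for i in sample if i in treated)
     - mean(y_obs[i] for i in sample if i not in treated))

delta = pate - d      # total estimation error
delta_s = pate - sate # sample selection error
delta_t = sate - d    # treatment imbalance error (implied by the identity)

print(f"delta = {delta:.4f}, delta_s + delta_t = {delta_s + delta_t:.4f}")
```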



Decomposing Selection Error
Δ = Δ𝑆 + Δ𝑇 = (Δ𝑆𝑋 + Δ𝑆𝑈) + Δ𝑇
• Definition

$$\Delta_S \equiv \text{PATE} - \text{SATE} = \frac{N - n}{N} (\text{NATE} - \text{SATE}), \quad \text{NATE: nonsample ATE}$$

• Δ𝑆 vanishes if
  • The sample is a census (𝐼𝑖 = 1 for all observations and 𝑛 = 𝑁);
  • SATE = NATE (i.e., nothing to correct)
  • Switch quantity of interest from PATE to SATE (recommended!)
• Δ𝑆𝑋 = 0 when empirical distribution of (observed) 𝑋 is identical in population and sample:

$$\tilde{F}(X \mid I = 0) = \tilde{F}(X \mid I = 1)$$

• Δ𝑆𝑈 = 0 when empirical distribution of (unobserved) 𝑈 is identical in population and sample:

$$\tilde{F}(U \mid I = 0) = \tilde{F}(U \mid I = 1)$$

• Unverifiable: 𝑋 unobserved out of sample; 𝑈 unobserved
• Δ𝑆𝑋: vanishes if weighting on 𝑋 (and examples exist in sample)
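
The second equality in the definition of Δ𝑆 follows in a few lines. The derivation below assumes NATE is the average of TE𝑖 over the 𝑁 − 𝑛 units not in the sample, which is how "nonsample ATE" is read here.

```latex
\begin{aligned}
\text{NATE} &\equiv \frac{1}{N-n} \sum_{i \in \{I_i = 0\}} \text{TE}_i \\
\text{PATE} &= \frac{1}{N}\Big(\sum_{i \in \{I_i = 1\}} \text{TE}_i + \sum_{i \in \{I_i = 0\}} \text{TE}_i\Big)
            = \frac{n}{N}\,\text{SATE} + \frac{N-n}{N}\,\text{NATE} \\
\Delta_S &= \text{PATE} - \text{SATE}
          = \frac{N-n}{N}\,\text{NATE} - \Big(1 - \frac{n}{N}\Big)\text{SATE}
          = \frac{N-n}{N}\,(\text{NATE} - \text{SATE})
\end{aligned}
```

Both ways for Δ𝑆 to vanish are visible in the last line: the factor (𝑁 − 𝑛)/𝑁 is zero for a census, and the second factor is zero when SATE = NATE.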
Decomposing Treatment Imbalance
Δ = Δ𝑆 + Δ𝑇 = Δ𝑆 + (Δ𝑇𝑋 + Δ𝑇𝑈)

• Δ𝑇𝑋 = 0: when 𝑋 balanced between treateds and controls:

$$\tilde{F}(X \mid T = 1, I = 1) = \tilde{F}(X \mid T = 0, I = 1)$$

Verifiable; generated ex ante by blocking or ex post via matching or modeling
• Δ𝑇𝑈 = 0: when 𝑈 balanced between treateds and controls:

$$\tilde{F}(U \mid T = 1, I = 1) = \tilde{F}(U \mid T = 0, I = 1)$$

Unverifiable; Achieved only by assumption or, on average, by random treatment assignment
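
Because the balance condition on 𝑋 is stated in terms of empirical distributions in the sample, it can be checked directly. The sketch below is one possible check, not a method prescribed by these notes: it computes the largest gap between the empirical CDFs of a single covariate in the treated and control groups, using made-up values. The function names are hypothetical. No analogous check exists for 𝑈, since 𝑈 is unobserved.

```python
def ecdf(values, point):
    """Share of values less than or equal to point (empirical CDF)."""
    return sum(v <= point for v in values) / len(values)


def max_ecdf_gap(x_treated, x_control):
    """Largest absolute gap between the two empirical CDFs.

    A gap of 0 means the empirical distributions of X are identical
    in the two groups.
    """
    points = sorted(set(x_treated) | set(x_control))
    return max(abs(ecdf(x_treated, p) - ecdf(x_control, p)) for p in points)


# Made-up covariate values for illustration.
balanced = max_ecdf_gap([1, 2, 3], [3, 1, 2])    # same values, different order
imbalanced = max_ecdf_gap([1, 2, 3], [2, 3, 4])  # controls shifted up by 1

print(balanced, imbalanced)
```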



Alternative Quantities of Interest: For Matching
• Population average treatment effect on the treated

$$\text{PATT} \equiv \frac{1}{N^*} \sum_{i \in \{T_i = 1\}} \text{TE}_i$$

($N^* = \sum_{i=1}^{N} T_i$: number of treated units in population)
• Sample average treatment effect on the treated

$$\text{SATT} \equiv \frac{1}{n/2} \sum_{i \in \{I_i = 1,\, T_i = 1\}} \text{TE}_i$$

• Analogous estimation error decomposition holds:

$$\Delta' = \text{PATT} - D = (\Delta'_{S_X} + \Delta'_{S_U}) + (\Delta'_{T_X} + \Delta'_{T_U})$$

• Quiz: Why PATT and SATT rather than PATE and SATE for
matching?
• Quiz: How do they differ in randomized experiments?
Effects of Design Components on Estimation Error
Δ = Δ𝑆 + Δ𝑇 = (Δ𝑆𝑋 + Δ𝑆𝑈 ) + (Δ𝑇𝑋 + Δ𝑇𝑈 )

Design Choice                          Δ𝑆𝑋      Δ𝑆𝑈      Δ𝑇𝑋      Δ𝑇𝑈
Random sampling                        avg= 0   avg= 0
Complete stratified random sampling    = 0      avg= 0
Focus on SATE rather than PATE         = 0      = 0
Weighting for nonrandom sampling       = 0      = ?
Large sample size                      → ?      → ?      → ?      → ?
Random treatment assignment                              avg= 0   avg= 0
Complete blocking                                        = 0      = ?
Exact matching                                           = 0      = ?
Assumption
No selection bias                      avg= 0   avg= 0
Ignorability                                                      avg= 0
No omitted variables                                              = 0

(avg= 0: zero on average, i.e., 𝐸(𝑄) = 0)



Comparing Blocking (i.e., before) and Matching (i.e., after)
• Adding blocking (on pretreatment vars related to outcome) to
random assignment: as or more efficient, and never biased
• Blocking: like regression adjustment, where functional form
and the parameter values are known
• Matching is like blocking, except:
• to avoid selection error: change QOI from PATE to PATT/SATT
• random treatment assignment following matching:
impossible
• Exact matching, unlike blocking: dependent on good matches
in already-collected data
• Worst case scenario: matching on wrong vars (like regression
adjustment) can increase bias
• Adding matching to a parametric model: reduces model
dependence and bias, and sometimes variance too
• Quiz: Which is preferable: Matching or Blocking?
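
To make the blocking idea concrete, here is a minimal sketch of blocked random assignment next to complete randomization. It assumes a single discrete pretreatment covariate `x` with an even number of units at each value, so blocks are simply groups of units sharing the same value. The function names and data are hypothetical. With these assumptions, blocking balances `x` exactly, while complete randomization balances it only on average.

```python
import random
from statistics import mean


def complete_randomization(rng, n):
    """Assign n/2 of n units to treatment, ignoring covariates."""
    t = [1] * (n // 2) + [0] * (n // 2)
    rng.shuffle(t)
    return t


def blocked_randomization(rng, x):
    """Within each block (units sharing the same x), treat a random half."""
    t = [0] * len(x)
    blocks = {}
    for i, xi in enumerate(x):
        blocks.setdefault(xi, []).append(i)
    for members in blocks.values():
        rng.shuffle(members)
        for i in members[: len(members) // 2]:
            t[i] = 1
    return t


def imbalance(x, t):
    """Difference in mean x between treated and control units."""
    return (mean(xi for xi, ti in zip(x, t) if ti == 1)
            - mean(xi for xi, ti in zip(x, t) if ti == 0))


rng = random.Random(3)
x = [0] * 10 + [1] * 6 + [2] * 4  # made-up covariate, even block sizes

t_block = blocked_randomization(rng, x)
t_complete = complete_randomization(rng, len(x))

print("blocked imbalance: ", imbalance(x, t_block))
print("complete imbalance:", imbalance(x, t_complete))
```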

Research Designs
The Benefits of Major Research Designs: Overview
Design                                                  Δ𝑆𝑋    Δ𝑆𝑈    Δ𝑇𝑋      Δ𝑇𝑈
Ideal experiment                                        → 0    → 0    = 0      → 0
Randomized clinical trials
  (Limited or no blocking)                              ≠ 0    ≠ 0    avg= 0   avg= 0
Randomized clinical trials
  (Full blocking)                                       ≠ 0    ≠ 0    = 0      avg= 0
Social Science Field Experiment
  (Limited or no blocking)                              ≠ 0    ≠ 0    → 0      → 0
Survey Experiment
  (Limited or no blocking)                              → 0    → 0    → 0      → 0
Observational Study
  (Representative data set, well-matched)               ≈ 0    ≈ 0    ≈ 0      ≠ 0
Observational Study
  (Unrepresentative but partially correctable data,
  well-matched)                                         ≈ 0    ≠ 0    ≈ 0      ≠ 0
Observational Study
  (Unrepresentative data set, well-matched)             ≠ 0    ≠ 0    ≈ 0      ≠ 0

• → 0: 𝐸(𝑄) = 0 & lim𝑛→∞ Var(𝑄) = 0
• avg= 0: 𝐸(𝑄) = 0

The Ideal Experiment (according to the paper)

• Random selection from well-defined population


• large 𝑛
• blocking on all known confounders
• random treatment assignment within blocks
• 𝐸(Δ𝑆𝑋 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑆𝑋 ) = 0
• 𝐸(Δ𝑆𝑈 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑆𝑈 ) = 0
• Δ𝑇𝑋 = 0
• 𝐸(Δ𝑇𝑈 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑇𝑈 ) = 0
• Quiz: Is there an even more ideal experiment?
• Hint: How can we make Δ𝑆𝑋 = 0?

An Even More Ideal Experiment (not in the paper)

• Begin with a well-defined population


• New feature: Define sampling strata based on
cross-classification of all known confounders
• Random sampling within strata
• (if strata sample ∝ population size, no weights needed)
• large 𝑛
• blocking on all known confounders
• random treatment assignment within blocks
• Δ𝑆𝑋 = 0
• 𝐸(Δ𝑆𝑈 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑆𝑈 ) = 0
• Δ𝑇𝑋 = 0
• 𝐸(Δ𝑇𝑈 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑇𝑈 ) = 0
• Wait, why wasn’t this in the paper?
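
The "new feature" in this design can be sketched as follows: sample within strata defined by a known confounder, with each stratum's sample size proportional to its population size. The example assumes one categorical confounder with made-up stratum sizes, and the helper names are hypothetical. Under proportional allocation the sample reproduces the population distribution of 𝑋 exactly, which is the condition given earlier for Δ𝑆𝑋 = 0, and no weights are needed.

```python
import random
from collections import Counter


def stratified_sample(rng, x, fraction):
    """Draw the same fraction of units at random from every stratum of x."""
    strata = {}
    for i, xi in enumerate(x):
        strata.setdefault(xi, []).append(i)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample


def shares(values):
    """Share of units falling in each category."""
    counts = Counter(values)
    return {k: c / len(values) for k, c in counts.items()}


rng = random.Random(4)
x = ["a"] * 500 + ["b"] * 300 + ["c"] * 200  # made-up confounder, N = 1000

sample = stratified_sample(rng, x, 0.1)       # n = 100
pop_shares = shares(x)
sample_shares = shares([x[i] for i in sample])

print("population:", pop_shares)
print("sample:    ", sample_shares)
```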

Randomized Clinical Trials (Little or no Blocking)

• nonrandom selection
• small 𝑛
• little or no blocking
• random treatment assignment
• Δ𝑆𝑋 ≠ 0
• Δ𝑆𝑈 ≠ 0
• 𝐸(Δ𝑇𝑋 ) = 0
• 𝐸(Δ𝑇𝑈 ) = 0

Randomized Clinical Trials (Full Blocking)

• nonrandom selection
• small 𝑛
• Full blocking
• random treatment assignment
• Δ𝑆𝑋 ≠ 0
• Δ𝑆𝑈 ≠ 0
• Δ𝑇𝑋 = 0
• 𝐸(Δ𝑇𝑈 ) = 0

Social Science Field Experiment

• nonrandom selection
• large 𝑛
• limited or no blocking
• random treatment assignment
• Δ𝑆𝑋 ≠ 0 or change PATE to SATE and Δ𝑆𝑋 = 0
• Δ𝑆𝑈 ≠ 0 or change PATE to SATE and Δ𝑆𝑈 = 0
• 𝐸(Δ𝑇𝑋 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑇𝑋 ) = 0
• 𝐸(Δ𝑇𝑈 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑇𝑈 ) = 0

Survey Experiment

• random selection
• large 𝑛
• limited or no blocking
• random treatment assignment
• (only treatments: question wording changes)
• 𝐸(Δ𝑆𝑋 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑆𝑋 ) = 0
• 𝐸(Δ𝑆𝑈 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑆𝑈 ) = 0
• 𝐸(Δ𝑇𝑋 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑇𝑋 ) = 0
• 𝐸(Δ𝑇𝑈 ) = 0, lim𝑛→∞ 𝑉 (Δ𝑇𝑈 ) = 0

Observational Study, well-matched

• no stratification, nonrandom selection


• large 𝑛
• no blocking, nonrandom treatment assignment
• Δ𝑆𝑋 ≈ 0 if representative, corrected by weighting, or for
estimating SATE; or ≠ 0 otherwise
• Δ𝑆𝑈 ≠ 0
• Δ𝑇𝑋 ≈ 0 (due to matching well)
• Δ𝑇𝑈 ≠ 0 except by assumption

Issues in Ideal Designs


What is the Best Design?

• Ideal design: rarely feasible


• Effort in experimental studies: random assignment
• Effort in observational studies: knowing, measuring, and
adjusting for 𝑋 (via matching or modeling)
• Achilles heel of experiments: Δ𝑆, small 𝑛
• Achilles heel of observational studies: Δ𝑇
• Each design: best suited to its own applications
• Quiz: Astronomers never randomize; is astronomy a science?



Fallacies in Experimental Research

• Failure to block on all available confounders


• incorrectly seen as requiring fewer assumptions (about what
to block on)
• In fact, blocking helps (except in strange situations)
• Blocking on relevant covariates is better, so choose carefully.
• “Block what you can and randomize what you cannot”
• t-test to check balance after random treatment assignment
• blocking vars: balance exactly after treatment assignment; if
you’re checking, you missed an opportunity to increase
efficiency
• if vars become available after treatment assignment: t-test
checks if randomization was done appropriately
• randomization balances on average: any one random assignment is not balanced exactly (which is why it's better to block)
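
The last point, that randomization balances only on average, can be seen in a short simulation. The setup is assumed for illustration: 40 units with one simulated covariate, re-randomized 2,000 times with half the units treated each time. The imbalance averaged over assignments is close to zero, but the typical imbalance in any single assignment is not.

```python
import random
from statistics import mean

rng = random.Random(5)
x = [rng.gauss(0, 1) for _ in range(40)]  # simulated covariate, fixed once


def one_imbalance(rng, x):
    """Difference in mean x (treated minus control) for one random assignment."""
    t = [1] * (len(x) // 2) + [0] * (len(x) // 2)
    rng.shuffle(t)
    return (mean(xi for xi, ti in zip(x, t) if ti)
            - mean(xi for xi, ti in zip(x, t) if not ti))


draws = [one_imbalance(rng, x) for _ in range(2000)]

average_imbalance = mean(draws)                  # balance "on average"
typical_imbalance = mean(abs(d) for d in draws)  # imbalance in one assignment

print(f"average over assignments: {average_imbalance:.4f}")
print(f"typical single assignment: {typical_imbalance:.4f}")
```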



The Balance Test Fallacy in Matching Research

[Figure: two panels, each with x-axis "Number of Controls Randomly Dropped" (0 to 400). One panel plots the t-statistic and marks a "statistical insignificance" region; the other plots imbalance in math test score, measured by the difference in means and by the QQ plot mean deviation.]

Quiz: randomly dropping observations reduces imbalance??
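
The pattern behind the quiz can be reproduced with synthetic data. This sketch does not use the data behind the figure: the covariate values, group sizes, and number of repetitions are all assumptions. It randomly drops control units and tracks two quantities, the difference in means (a measure of imbalance) and the t-statistic for that difference. The t-statistic falls as controls are dropped even though the imbalance itself is, on average, unchanged.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t-statistic (unequal variances) for mean(a) - mean(b)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Synthetic covariate: treated units sit 2 points above controls on average.
treated = [(i % 20) + 2 for i in range(100)]
controls = [i % 20 for i in range(400)]


def after_dropping(rng, n_drop, reps=500):
    """Average difference in means and t-statistic over random drops."""
    diffs, tstats = [], []
    for _ in range(reps):
        kept = rng.sample(controls, len(controls) - n_drop)
        diffs.append(mean(treated) - mean(kept))
        tstats.append(t_statistic(treated, kept))
    return mean(diffs), mean(tstats)


rng = random.Random(6)
results = {n_drop: after_dropping(rng, n_drop) for n_drop in (0, 200, 380)}

for n_drop, (diff, t) in results.items():
    print(f"dropped {n_drop:3d}: mean difference = {diff:.2f}, t = {t:.2f}")
```

A balance test based on the t-statistic would therefore report "improvement" from discarding data at random, because dropping observations reduces power rather than imbalance.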



The Balance Test Fallacy: Explanation

• Hypothesis tests: a function of both balance and power; we only want balance
• Balance is observed: no need for a superpopulation or inference
• Simple linear model (for intuition):
• Suppose $E(Y \mid T, X) = \theta + T\beta + X\gamma$
• Bias in coefficient on 𝑇 from regressing 𝑌 on 𝑇 (without 𝑋): $E(\hat{\beta} - \beta \mid T, X) = G\gamma$ (where 𝐺 are coefficients from a regression of 𝑋 on a constant and 𝑇)
• Imbalance: 𝐺, Importance: 𝛾
• If 𝐺 = 0, bias=0
• If 𝐺 ≠ 0, bias can be any size (due to 𝛾 )
• To reduce bias: reduce 𝐺 without limit
• No threshold level is safe
• But prune too much and variance increases
• Quiz: Should we match on vars that do not influence 𝑌 ?
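
The bias formula can be verified on a toy example. The sketch below assumes a binary treatment and a single covariate, with made-up parameter values and noise-free outcomes so that 𝑌 equals its conditional expectation. With a binary 𝑇, the coefficient from regressing any variable on a constant and 𝑇 is just the treated-minus-control difference in means, so both the short-regression coefficient and 𝐺 can be computed without a regression library. The names `theta`, `beta`, `gamma`, and `G` mirror the symbols in the bullets above.

```python
from statistics import mean


def diff_in_means(values, t):
    """Coefficient on a binary T from regressing values on a constant and T."""
    treated = [v for v, ti in zip(values, t) if ti == 1]
    control = [v for v, ti in zip(values, t) if ti == 0]
    return mean(treated) - mean(control)


def bias_and_imbalance(t, x, theta=1.0, beta=2.0, gamma=3.0):
    """Return (bias of the short regression, G, G * gamma) for given data."""
    # Noise-free outcomes: Y = theta + T*beta + X*gamma (assumed values).
    y = [theta + ti * beta + xi * gamma for ti, xi in zip(t, x)]
    beta_short = diff_in_means(y, t)  # regress Y on T, omitting X
    G = diff_in_means(x, t)           # imbalance in X
    return beta_short - beta, G, G * gamma


# Imbalanced data: treated units have higher X on average.
t_full = [1, 1, 1, 1, 0, 0, 0, 0]
x_full = [2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0]
bias_full, G_full, predicted_full = bias_and_imbalance(t_full, x_full)

# After pruning the two unmatched units (treated X=5, control X=1).
t_pruned = [1, 1, 1, 0, 0, 0]
x_pruned = [2.0, 3.0, 4.0, 2.0, 3.0, 4.0]
bias_pruned, G_pruned, predicted_pruned = bias_and_imbalance(t_pruned, x_pruned)

print("full data:  bias =", bias_full, " G*gamma =", predicted_full)
print("pruned data: bias =", bias_pruned, " G*gamma =", predicted_pruned)
```

In the full data the bias equals 𝐺𝛾 with 𝐺 = 1; after pruning to exact balance, 𝐺 = 0 and the bias disappears whatever the value of 𝛾.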
