CS334: Principles and
Techniques of Data Science
Lecture 11
Mobin Javed
Slides partially adapted from DS100 at UC Berkeley and Ihsan Qazi
Announcements
Midterm 2 on March 26
Today
● Counterfactual Reasoning
● Potential Outcomes Framework
● Unobserved Confounding
● Non/Quasi-Experimental Methods for Causal Inference
A Practical Definition of Causality
Definition: X causes Y iff
changing X leads to a change in Y,
keeping everything else constant
The causal effect is the magnitude by which Y is changed by a
unit change in X
Called the “interventionist” interpretation of causality
http://plato.stanford.edu/entries/causation-mani/
Keeping everything else constant
Imagine a counterfactual world
What if the patient had not taken the cholesterol lowering drug?
What if an algorithm was not changed?
Counterfactual
● The counterfactual represents the state of the world that
program participants would have experienced in the absence of
the program
● Problem: Counterfactual cannot be observed
● Solution: We need to “mimic” or construct the counterfactual
Constructing the Counterfactual
● Usually done by selecting a group of individuals that did not
participate in the program
− This group is usually referred to as the control group or comparison group
● How this group is selected is a key decision in the design of any
impact evaluation
Selecting the comparison group
● Idea: Comparability
● Goal of impact evaluation: Attribution
Today
● Counterfactual Reasoning
● Potential Outcomes Framework
● Unobserved Confounding
● Quasi-Experimental Methods for Causal Inference
Treatments, Controls, and Outcomes
● We will assess causality with respect to a binary variable 𝑇, and
refer to its values as the treatment and the control
− We use variable 𝑖 ∈ {1, … , 𝑛} to denote the experimental units
● To each unit, we apply either the treatment or the control, but
not both
− $\{T_i = 1\}$: denotes the event that unit $i$ is given the treatment and
− $\{T_i = 0\}$: denotes the event that unit $i$ is given the control
● The observed outcome for unit $i$ is denoted by $Y_i$
Potential Outcomes & Counterfactuals
$$Y_i = \begin{cases} Y_i(1) & \text{if } T_i = 1 \\ Y_i(0) & \text{if } T_i = 0 \end{cases}$$
● $Y_i$: observed outcome for unit $i$
● $Y_i(1)$: potential outcome if person $i$ received treatment
● $Y_i(0)$: potential outcome if person $i$ did not receive treatment
Potential Outcomes & Counterfactuals
Individual causal effect of treatment on unit $i$:
$$\tau_i = Y_i(1) - Y_i(0)$$
● $Y_i(1)$: potential outcome if person $i$ received treatment
● $Y_i(0)$: potential outcome if person $i$ did not receive treatment
Fundamental Problem of Causal Inference
● We cannot observe the counterfactual!
− i.e., we can only observe one of the potential outcomes!
Person $i$   $T_i$   $Y_i(1)$   $Y_i(0)$
1            1       0.6        0.3
2            0       0.2        0.6
3            1       0.4        0.2
4            0       0.3        0.1
5            1       0.7        0.5
6            0       0.6        0.4
7            0       0.3        0.1
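To make the missing-data view concrete, here is a minimal Python sketch that applies the rule "observe $Y_i(1)$ if $T_i = 1$, otherwise $Y_i(0)$" to the toy table above. The names `units` and `observed_view` are illustrative and not from the lecture; `None` is used here to mark the unobserved counterfactual.

```python
# Toy data copied from the table above: (person, T_i, Y_i(1), Y_i(0)).
# Listing both potential outcomes is only possible in a made-up example.
units = [
    (1, 1, 0.6, 0.3),
    (2, 0, 0.2, 0.6),
    (3, 1, 0.4, 0.2),
    (4, 0, 0.3, 0.1),
    (5, 1, 0.7, 0.5),
    (6, 0, 0.6, 0.4),
    (7, 0, 0.3, 0.1),
]


def observed_view(units):
    """Return what an analyst would actually see for each unit."""
    rows = []
    for person, t, y1, y0 in units:
        y_obs = y1 if t == 1 else y0  # Y_i = Y_i(1) if T_i = 1, else Y_i(0)
        rows.append({
            "person": person,
            "T": t,
            "Y": y_obs,
            "Y(1)": y1 if t == 1 else None,  # None marks the missing counterfactual
            "Y(0)": y0 if t == 0 else None,
        })
    return rows


observed = observed_view(units)
for row in observed:
    print(row)
```

Every printed row has exactly one of `Y(1)` and `Y(0)` filled in, which is the fundamental problem stated on this slide.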
Fundamental Problem of Causal Inference
How can we find the individual causal effect?
$\tau_i = Y_i(1) - Y_i(0)$
Option-1: Assume that $\{Y_i(1), Y_i(0)\}$ is the same for all individuals? (unit homogeneity), i.e., the causal effect is the same across units
[Figure: three units with potential outcomes $\{Y_1(1), Y_1(0)\}$, $\{Y_2(1), Y_2(0)\}$, $\{Y_3(1), Y_3(0)\}$]
$\tau_1 = Y_1(1) - Y_2(0)$
$\tau_2 = Y_1(1) - Y_2(0)$
$\tau_3 = Y_3(1) - Y_2(0)$
Fundamental Problem of Causal Inference
How can we find the individual causal effect?
$\tau_i = Y_i(1) - Y_i(0)$
Option-1: Assume that $\{Y_i(1), Y_i(0)\}$ is the same for all individuals? (unit homogeneity), i.e., the causal effect is the same across units
Option-2: Assume that $\{Y_i(1), Y_i(0)\}$ is constant across time (though the effect may differ across units)
[Figure: three units with potential outcomes $\{Y_1(1), Y_1(0)\}$, $\{Y_2(1), Y_2(0)\}$, $\{Y_3(1), Y_3(0)\}$; each $Y_i(0)$ is measured before treatment is given]
$\tau_1 = Y_1(1) - Y_1(0)$
$\tau_2 = Y_2(1) - Y_2(0)$
$\tau_3 = Y_3(1) - Y_3(0)$
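As a quick check on how much Option-1 asks for, the sketch below applies unit homogeneity to persons 1 and 2 from the toy table shown earlier in the lecture. The variable names are illustrative, and the comparison with the true effect is only possible because the toy table lists both potential outcomes.

```python
# Observed outcomes for person 1 (treated) and person 2 (control) in the toy table.
y1_person1 = 0.6  # Y_1(1), observed because T_1 = 1
y0_person2 = 0.6  # Y_2(0), observed because T_2 = 0

# Option-1 (unit homogeneity): borrow person 2's Y(0) as person 1's counterfactual.
tau1_homogeneity = y1_person1 - y0_person2

# True effect for person 1, using the Y_1(0) = 0.3 listed in the toy table.
tau1_true = 0.6 - 0.3

print(tau1_homogeneity, round(tau1_true, 2))
```

In this toy data the borrowed counterfactual gives an estimated effect of zero while the true effect for person 1 is 0.3, which illustrates why the assumption is considered strong.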
Fundamental Problem of Causal Inference
● These are strong assumptions that rarely hold
− Without any further information about the missing potential outcomes, we
cannot estimate the individual causal effect
● So, what should we do?
− Instead, we aim to estimate the average causal effect (a.k.a. average
treatment effect)
− That is, we consider groups of individuals rather than one person and work
with the average treatment effect
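Average Treatment Effect (ATE)
● Average over all individual causal effects: $\mathbb{E}[\tau_i] = \mathbb{E}[Y_i(1) - Y_i(0)]$
● Can we estimate $\mathbb{E}[\tau_i]$ from data?
[Figure (UC Berkeley slide): the causal inference problem as a table with columns $T_i$, $Y_i(0)$, $Y_i(1)$. For units with $T_i = 0$, $Y_i(0)$ is observed and $Y_i(1)$ is missing (?); for units with $T_i = 1$, $Y_i(1)$ is observed and $Y_i(0)$ is missing (?). The figure asks whether we can infer something about the difference between treatment and control when we never observe them together.]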
Difference-in-Means Estimator for ATE
D: Difference in means (a.k.a. associational difference)
$$D = \mathbb{E}[Y_i(1) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 0]$$
Finding Average Causal Effects
Is ATE = D?
Is $\mathbb{E}[Y_i(1) - Y_i(0)] = \mathbb{E}[Y_i(1) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 0]$?
Let’s Work Through an Example
Suppose we could observe both potential outcomes for all individuals!
Person $i$   $T_i$   $Y_i(1)$   $Y_i(0)$   $\tau_i$
1            1       0.6        0.3        0.3
2            0       0.2        0.6        -0.4
3            1       0.4        0.2        0.2
4            0       0.3        0.1        0.2
5            1       0.7        0.5        0.2
6            0       0.6        0.4        0.2
7            0       0.3        0.1        0.2
What are $\mathbb{E}[Y_i(1)]$, $\mathbb{E}[Y_i(0)]$, and $\tau = \mathbb{E}[Y_i(1) - Y_i(0)]$?
Averaging each column over all seven individuals:
$\mathbb{E}[Y_i(1)] = 0.44$, $\mathbb{E}[Y_i(0)] = 0.31$, $\tau = 0.13$
But we don’t know the $\tau_i$’s!
Let’s find the average of the observed outcomes in the two groups.
What are $\mathbb{E}[Y_i(1) \mid T_i = 1]$, $\mathbb{E}[Y_i(0) \mid T_i = 0]$, and $D$?
$\mathbb{E}[Y_i(1) \mid T_i = 1] = 0.57$, $\mathbb{E}[Y_i(0) \mid T_i = 0] = 0.3$, $D = 0.27$
Take-away
$\mathbb{E}[\tau_i] = 0.13 \neq D = 0.27$
● In general, D is NOT the same as ATE!
● Why? Because association is not causation!
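The two numbers in this take-away can be reproduced with a few lines of Python. This is a minimal sketch on the toy table from the worked example; the names `units`, `ate`, and `d` are illustrative.

```python
from statistics import mean

# Toy table from the worked example: (person, T_i, Y_i(1), Y_i(0)).
units = [
    (1, 1, 0.6, 0.3),
    (2, 0, 0.2, 0.6),
    (3, 1, 0.4, 0.2),
    (4, 0, 0.3, 0.1),
    (5, 1, 0.7, 0.5),
    (6, 0, 0.6, 0.4),
    (7, 0, 0.3, 0.1),
]

# ATE: average of the individual effects, which needs both potential outcomes.
ate = mean(y1 - y0 for _, _, y1, y0 in units)

# D: difference between group means, which uses observed outcomes only.
treated_mean = mean(y1 for _, t, y1, _ in units if t == 1)
control_mean = mean(y0 for _, t, _, y0 in units if t == 0)
d = treated_mean - control_mean

print(f"ATE = {ate:.2f}, D = {d:.2f}")
```

Rounded to two decimals, this gives the 0.13 and 0.27 shown above.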
So what do we need for D to be
equal to ATE?
Key Assumptions
● Ignorability: Potential outcomes are independent of the
treatment assignment: $\{Y_i(1), Y_i(0)\} \perp T$
● Stable Unit Treatment Value Assumption (SUTVA) (a.k.a. no
interference assumption): one unit's outcome does not depend on the
treatment given to other units
− Ex. of settings where it can fail: vaccination, communication
● No different versions of treatment (a.k.a. stability, consistency)
− Violations may occur due to variable levels of treatment or technical
errors
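When is D = ATE?
(1) $D = \mathbb{E}[Y_i(1) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 0]$
(2) $= \mathbb{E}[Y_i(1) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 1] + \mathbb{E}[Y_i(0) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 0]$
(3) $= (\mathbb{E}[Y_i(1) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 1]) + (\mathbb{E}[Y_i(0) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 0])$
If $\{Y_i(1), Y_i(0)\} \perp T$ for all $i$, then $\mathbb{E}[Y_i(0) \mid T = 1] = \mathbb{E}[Y_i(0) \mid T = 0]$, so the second bracket is zero:
(4) $= \mathbb{E}[Y_i(1) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 1]$
(5) $= \mathbb{E}[Y_i(1) - Y_i(0)]$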
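The split in step (3) can be checked numerically on the toy table from the worked example. In the sketch below, the first bracket is labelled `att` (effect among the treated) and the second `selection_bias`; these labels are common in the literature but are not used in the slides, so treat them as assumptions of this example.

```python
from statistics import mean

# Toy table from the worked example: (person, T_i, Y_i(1), Y_i(0)).
units = [
    (1, 1, 0.6, 0.3),
    (2, 0, 0.2, 0.6),
    (3, 1, 0.4, 0.2),
    (4, 0, 0.3, 0.1),
    (5, 1, 0.7, 0.5),
    (6, 0, 0.6, 0.4),
    (7, 0, 0.3, 0.1),
]
treated = [u for u in units if u[1] == 1]
control = [u for u in units if u[1] == 0]

# Step (1): difference in observed group means.
d = mean(y1 for _, _, y1, _ in treated) - mean(y0 for _, _, _, y0 in control)

# First bracket of step (3): E[Y(1) | T=1] - E[Y(0) | T=1].
att = mean(y1 - y0 for _, _, y1, y0 in treated)

# Second bracket of step (3): E[Y(0) | T=1] - E[Y(0) | T=0].
selection_bias = mean(y0 for *_, y0 in treated) - mean(y0 for *_, y0 in control)

print(round(d, 4), round(att, 4), round(selection_bias, 4))
```

In this toy data the second bracket is not zero, so ignorability does not hold for this particular assignment and D differs from the ATE.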
Causal Inference: Missing Data Problem
● Causal inference is the problem of estimating the counterfactual
− Estimate missing data values using various methods
− $Y_i(0)$ now becomes an estimated quantity, based on outcomes of other
people who did not receive treatment
Various Methodologies
● Randomized Control Trials (RCTs)
− Gold standard
● Quasi-Experimental Methods
− Examples: cross-sectional approaches, before-after comparison, matching,
difference-in-differences, regression discontinuity, and instrumental
variables
RCTs are the Gold Standard
● RCTs have long been a gold standard in the medical field, for
example in establishing whether a new drug works
● They have also become increasingly common in other fields such
as economics
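One way to see why randomization earns the "gold standard" label is to enumerate every way of assigning treatment in a small example. The sketch below reuses the seven toy units from the worked example and assumes a design in which exactly 3 of the 7 units are treated, with every such assignment equally likely; that design, and the helper `diff_in_means`, are assumptions of this illustration rather than part of the lecture.

```python
from itertools import combinations
from statistics import mean

# Potential outcomes of the seven toy units (known only in a made-up example).
y1 = [0.6, 0.2, 0.4, 0.3, 0.7, 0.6, 0.3]
y0 = [0.3, 0.6, 0.2, 0.1, 0.5, 0.4, 0.1]
n = len(y1)
ate = mean(a - b for a, b in zip(y1, y0))


def diff_in_means(treated_idx):
    """Difference in means D for one assignment, using observed outcomes only."""
    treated = set(treated_idx)
    t_mean = mean(y1[i] for i in treated)
    c_mean = mean(y0[i] for i in range(n) if i not in treated)
    return t_mean - c_mean


# Every way of choosing which 3 of the 7 units receive the treatment.
all_d = [diff_in_means(idx) for idx in combinations(range(n), 3)]

print(len(all_d), round(mean(all_d), 4), round(ate, 4))
```

Any single assignment can give a D that is far from the ATE, but averaged over all equally likely assignments D matches the ATE, which is what randomization buys.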
Caveats in RCTs
● Sometimes, people’s responses in an experiment are influenced
by their knowing which group they are in
− So, you might want to run a blind experiment…
− To make this work, you will have to give the control group a placebo,
which is something…
● External validity (or generalizability) is always an important
question
But RCTs may not always be possible!
● RCTs can be expensive, infeasible, or unethical
− For example, suppose you want to study the effects of smoking and you
randomly assign some people to your “smoking” group
● What can we do when an experiment is not possible?
− In such situations you will almost invariably be conducting an observational
study, not an experiment
− Be alert for confounding factors
Observational Data
Let’s now turn to the setting in which we don’t have
the ability to assign treatments randomly to units.
A lot of observational data is out there
● Social media logs
● Credit card transactions
● Website/apps logs
● Satellite images
● …
● Can we do causal inference using observational data? What can
go wrong?
Today
● Counterfactual Reasoning
● Potential Outcomes Framework
● Unobserved Confounding
● Quasi-Experimental Methods for Causal Inference
Is the New Drug Effective?
Goal: Evaluate the effectiveness of a new drug (treatment) in
improving the recovery rate (outcome) from a disease
● T=0 → Control group (those who do not take the drug)
● T=1 → Treatment group (those who take the new drug)
We record the recovery rates of 700 patients who were given
access to the drug. A total of 350 patients chose to take the drug
and 350 patients did not.
Is the New Drug Effective?
                 Drug (T=1)       No Drug (T=0)     N
Men              93% (81/87)      87% (234/270)     357
Women            73% (192/263)    69% (55/80)       343
Combined Data    78% (273/350)    83% (289/350)     700
Is this new drug effective? Should we prescribe it to
women? men? everyone?
The answer depends on the causal mechanism or
the data generating process!
• Suppose we knew an additional fact: a women-only hormone has a
negative effect on recovery, so women are less likely to recover
than men, regardless of the drug.
• In addition, as we can see from the data, women are significantly
more likely to take the drug than men are.
Key Insight
● The reason the drug appears to be harmful overall is:
− If we select a drug user at random, that person is more likely
to be a woman and hence less likely to recover than a random
person who does not take the drug
− Put differently, being a woman is a common cause of both
drug taking and failure to recover!
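[Figure: causal graph with nodes G (gender), T (drug), and Y (recovery); G is a common cause of both T and Y]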
Insight
● Thus, to assess the effectiveness of the drug, we need
to compare subjects of the same gender
− This will ensure that any difference in recovery rates can be
attributed to the drug
− Thus, we should consult the segregated data, which shows
unequivocally that the drug is helpful
− This matches our intuition, which tells us that the segregated
data is “more specific,” hence more informative, than the
unsegregated data
This was an example of Simpson’s Paradox
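The reversal can be reproduced from the counts in the drug table. The sketch below computes the per-gender and combined recovery rates, and then, as one possible way of "comparing subjects of the same gender", a gender-weighted average of the per-gender rates. That weighted average is a standard adjustment but is not spelled out in the slides, so treat it and the function names as assumptions of this example.

```python
# (recovered, total) counts copied from the drug table; key = (gender, T).
counts = {
    ("men", 1): (81, 87),
    ("men", 0): (234, 270),
    ("women", 1): (192, 263),
    ("women", 0): (55, 80),
}
genders = ("men", "women")


def rate(gender, t):
    """Recovery rate within one gender for treatment value t."""
    recovered, total = counts[(gender, t)]
    return recovered / total


def combined_rate(t):
    """Recovery rate for treatment value t, ignoring gender."""
    recovered = sum(counts[(g, t)][0] for g in genders)
    total = sum(counts[(g, t)][1] for g in genders)
    return recovered / total


def gender_adjusted_rate(t):
    """Per-gender rates averaged with each gender's share of all 700 patients."""
    n_all = sum(total for _, total in counts.values())
    adjusted = 0.0
    for g in genders:
        n_g = counts[(g, 0)][1] + counts[(g, 1)][1]
        adjusted += rate(g, t) * n_g / n_all
    return adjusted


for g in genders:
    print(g, round(rate(g, 1), 2), round(rate(g, 0), 2))
print("combined", round(combined_rate(1), 2), round(combined_rate(0), 2))
print("adjusted", round(gender_adjusted_rate(1), 2), round(gender_adjusted_rate(0), 2))
```

Within each gender the drug group recovers more often, the combined rows reverse that ordering, and the gender-weighted rates again favor the drug, in line with the insight above.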