
Lec 11

Lecture 11 of CS334 focuses on counterfactual reasoning and the potential outcomes framework in causal inference. It discusses the challenges of unobserved confounding and methods for estimating causal effects, emphasizing the importance of constructing comparison groups. The lecture also introduces the concept of average treatment effect (ATE) and the fundamental problem of causal inference, which is the inability to observe counterfactual outcomes.

Uploaded by

salaarmasood321

CS334: Principles and Techniques of Data Science


Lecture 11
Mobin Javed

Slides partially adapted from DS100 at UC Berkeley and Ihsan Qazi


Announcements

Midterm 2 on March 26
Today
● Counterfactual Reasoning
● Potential Outcomes Framework
● Unobserved Confounding
● Non/Quasi-Experimental Methods for Causal Inference
A Practical Definition of Causality

Definition: X causes Y iff changing X leads to a change in Y, keeping everything else constant

The causal effect is the magnitude by which Y is changed by a unit change in X

Called the “interventionist” interpretation of causality

http://plato.stanford.edu/entries/causation-mani/
Keeping everything else constant
Imagine a counterfactual world

What if the patient had not taken the cholesterol-lowering drug?
What if an algorithm had not been changed?
Counterfactual
● The counterfactual represents the state of the world that
program participants would have experienced in the absence of
the program
● Problem: Counterfactual cannot be observed
● Solution: We need to “mimic” or construct the counterfactual
Constructing the Counterfactual
● Usually done by selecting a group of individuals that did not
participate in the program
− This group is usually referred to as the control group or comparison group
● How this group is selected is a key decision in the design of any
impact evaluation
Selecting the comparison group
● Idea: Comparability

● Goal of impact evaluation: Attribution


Today
● Counterfactual Reasoning
● Potential Outcomes Framework
● Unobserved Confounding
● Quasi-Experimental Methods for Causal Inference
Treatments, Controls, and Outcomes
● We will assess causality with respect to a binary variable $T$, and refer to its values as the treatment and the control
− We use the index $i \in \{1, \dots, n\}$ to denote the experimental units
● To each unit, we apply either the treatment or the control, but not both
− $\{T_i = 1\}$ denotes the event that unit $i$ is given the treatment
− $\{T_i = 0\}$ denotes the event that unit $i$ is given the control
● The observed outcome for unit $i$ is denoted by $Y_i$


Potential Outcomes & Counterfactuals
$$
Y_i = \begin{cases} Y_i(1) & \text{if } T_i = 1 \\ Y_i(0) & \text{if } T_i = 0 \end{cases}
$$
● $Y_i$: observed outcome for unit $i$
● $Y_i(1)$: potential outcome if person $i$ received treatment
● $Y_i(0)$: potential outcome if person $i$ did not receive treatment
Potential Outcomes & Counterfactuals
Individual causal effect of treatment on unit $i$:

$$
\tau_i = Y_i(1) - Y_i(0)
$$
● $Y_i(1)$: potential outcome if person $i$ received treatment
● $Y_i(0)$: potential outcome if person $i$ did not receive treatment
Fundamental Problem of Causal Inference
● We cannot observe the counterfactual!
− i.e., we can only observe one of the potential outcomes!
Person i   T_i   Y_i(1)   Y_i(0)
1          1     0.6      0.3
2          0     0.2      0.6
3          1     0.4      0.2
4          0     0.3      0.1
5          1     0.7      0.5
6          0     0.6      0.4
7          0     0.3      0.1
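As a small illustration of the point above, the following Python sketch takes the seven rows of the table and shows what an analyst would actually see: the observed outcome $Y_i$ and exactly one of the two potential outcomes. The helper name `reveal` and the use of `None` for the unobserved counterfactual are choices made for this sketch, not part of the lecture.

```python
# (T_i, Y_i(1), Y_i(0)) for each person in the table above.
units = [
    (1, 0.6, 0.3),
    (0, 0.2, 0.6),
    (1, 0.4, 0.2),
    (0, 0.3, 0.1),
    (1, 0.7, 0.5),
    (0, 0.6, 0.4),
    (0, 0.3, 0.1),
]


def reveal(t, y1, y0):
    """Return (Y_i, Y_i(1), Y_i(0)) as seen by the analyst.

    None marks the counterfactual, which is never observed.
    """
    return (y1, y1, None) if t == 1 else (y0, None, y0)


observed = [reveal(t, y1, y0) for t, y1, y0 in units]

# Print one row per person: index, treatment, observed outcome, and the
# two potential-outcome columns with the counterfactual blanked out.
for person, ((t, _, _), (y, y1_seen, y0_seen)) in enumerate(zip(units, observed), start=1):
    print(person, t, y, y1_seen, y0_seen)
```

Every row has exactly one `None`, which is the fundamental problem in miniature.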
Fundamental Problem of Causal Inference
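How can we find the individual causal effect?
$\tau_i = Y_i(1) - Y_i(0)$

Option-1: Assume that $\{Y_i(1), Y_i(0)\}$ is the same for all individuals (unit homogeneity), i.e., the causal effect is the same across units. The missing potential outcome of one unit can then be replaced by the observed outcome of another unit:
$\tau_i = Y_i(1) - Y_j(0)$
$\tau_k = Y_k(1) - Y_j(0)$
$\tau_j = Y_i(1) - Y_j(0)$

[Figure: three units $i$, $j$, $k$, labeled $\{Y_i(1), Y_i(0)\}$, $\{Y_j(1), Y_j(0)\}$, $\{Y_k(1), Y_k(0)\}$]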
Fundamental Problem of Causal Inference
How can we find the individual causal effect?
$\tau_i = Y_i(1) - Y_i(0)$

Option-1: Assume that $\{Y_i(1), Y_i(0)\}$ is the same for all individuals (unit homogeneity), i.e., the causal effect is the same across units
Option-2: Assume that $\{Y_i(1), Y_i(0)\}$ is constant across time (though the effect may differ across units), with $Y(0)$ measured before treatment is given:
$\tau_i = Y_i(1) - Y_i(0)$
$\tau_k = Y_k(1) - Y_k(0)$
$\tau_j = Y_j(1) - Y_j(0)$

[Figure: three units $i$, $j$, $k$, labeled $\{Y_i(1), Y_i(0)\}$, $\{Y_j(1), Y_j(0)\}$, $\{Y_k(1), Y_k(0)\}$]
Fundamental Problem of Causal Inference
● These assumptions are strong and rarely hold
− Without any further information about the missing potential outcomes, we
cannot estimate the individual causal effect
● So, what should we do?
− Instead, we aim to estimate the average causal effect (a.k.a. average
treatment effect)
− That is, we consider groups of individuals rather than one person and work
with the average treatment effect
Average Treatment Effect (ATE)
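● Average over all individual causal effects: $\mathbb{E}[\tau_i] = \mathbb{E}[Y_i(1) - Y_i(0)]$
● Can we estimate $\mathbb{E}[\tau_i]$ from data?

[Figure, University of California, Berkeley: "Causal Inference Problem". A table with columns $T_i$, $Y_i(0)$, $Y_i(1)$. For units with $T_i = 0$, $Y_i(0)$ is observed and $Y_i(1)$ is missing (?); for units with $T_i = 1$, $Y_i(1)$ is observed and $Y_i(0)$ is missing (?).]
Can we infer something about the difference between treatment and control when we never observe them together?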
Differences-in-Means Estimator for ATE

D: Difference in means (aka associational difference)

$$
D = \mathbb{E}[Y_i(1) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 0]
$$


Finding Average Causal Effects

Is ATE = D?

Is $\mathbb{E}[Y_i(1) - Y_i(0)] = \mathbb{E}[Y_i(1) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 0]$?


Let’s Work Through an Example
Suppose we could observe both potential outcomes for all individuals!
Person i   T_i   Y_i(1)   Y_i(0)   τ_i
1          1     0.6      0.3       0.3
2          0     0.2      0.6      -0.4
3          1     0.4      0.2       0.2
4          0     0.3      0.1       0.2
5          1     0.7      0.5       0.2
6          0     0.6      0.4       0.2
7          0     0.3      0.1       0.2

$\mathbb{E}[Y_i(1)] = 0.44$, $\mathbb{E}[Y_i(0)] = 0.31$, $\tau = \mathbb{E}[Y_i(1) - Y_i(0)] = 0.13$
But we don’t know the $\tau_i$’s!
Let’s find the average of the observed outcomes in the two groups.

Person i   T_i   Y_i(1)   Y_i(0)   τ_i
1          1     0.6      0.3       0.3
2          0     0.2      0.6      -0.4
3          1     0.4      0.2       0.2
4          0     0.3      0.1       0.2
5          1     0.7      0.5       0.2
6          0     0.6      0.4       0.2
7          0     0.3      0.1       0.2

$\mathbb{E}[Y_i(1) \mid T_i = 1] = 0.57$, $\mathbb{E}[Y_i(0) \mid T_i = 0] = 0.3$, $D = 0.27$
Take-away

$\mathbb{E}[\tau_i] = 0.13 \neq D = 0.27$

● In general, D is NOT the same as ATE!
● Why? Because association is not causation!
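The two numbers in this take-away can be reproduced with a few lines of Python. This is only a sketch over the seven-person example table; the variable names are illustrative, and the ATE line is computable here only because the example pretends both potential outcomes are known.

```python
from statistics import mean

# (T_i, Y_i(1), Y_i(0)) for the seven people in the example table.
units = [
    (1, 0.6, 0.3),
    (0, 0.2, 0.6),
    (1, 0.4, 0.2),
    (0, 0.3, 0.1),
    (1, 0.7, 0.5),
    (0, 0.6, 0.4),
    (0, 0.3, 0.1),
]

# ATE: average of the individual effects, which needs BOTH potential
# outcomes for every person (never available in real data).
ate = mean([y1 - y0 for _, y1, y0 in units])

# Difference in means: uses only the outcome each person actually reveals.
treated_mean = mean([y1 for t, y1, _ in units if t == 1])
control_mean = mean([y0 for t, _, y0 in units if t == 0])
d = treated_mean - control_mean

print(f"ATE = {ate:.2f}")
print(f"E[Y(1)|T=1] = {treated_mean:.2f}, E[Y(0)|T=0] = {control_mean:.2f}")
print(f"D   = {d:.2f}")
```

The gap between the two values comes entirely from who ended up in which group, since the same potential outcomes feed both calculations.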
So what do we need for D to be equal to ATE?
Key Assumptions
● Ignorability: potential outcomes are independent of the treatment assignment, $\{Y_i(1), Y_i(0)\} \perp T$
● Stable Unit Treatment Value Assumption (SUTVA) (a.k.a. no interference assumption): the outcome of one unit does not depend on the treatment given to other units
− Ex. of settings where it can fail: vaccination (one person's vaccination protects others), communication between units
● No different versions of treatment (a.k.a. stability, consistency)
− Violations may occur due to variable levels of treatment or technical errors
When is D = ATE?
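$$
\begin{aligned}
D &= \mathbb{E}[Y_i(1) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 0] \\
  &= \mathbb{E}[Y_i(1) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 1] + \mathbb{E}[Y_i(0) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 0] \\
  &= \big(\mathbb{E}[Y_i(1) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 1]\big) + \big(\mathbb{E}[Y_i(0) \mid T = 1] - \mathbb{E}[Y_i(0) \mid T = 0]\big)
\end{aligned}
$$
If $\{Y_i(1), Y_i(0)\} \perp T$ for all $i$, then $\mathbb{E}[Y_i(0) \mid T = 1] = \mathbb{E}[Y_i(0) \mid T = 0]$, so the second bracket is zero, and the conditioning on $T = 1$ can be dropped in the first bracket:
$$
D = \mathbb{E}[Y_i(1) - Y_i(0)]
$$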
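The derivation can be checked numerically with a small simulation. The sketch below uses a hypothetical population (Gaussian baseline outcomes and a constant effect of 0.2, both invented for illustration) and compares the difference in means under a randomized assignment, where ignorability holds, with an assignment that depends on the baseline outcome, where it does not.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: baseline Y(0) plus a constant effect of 0.2.
n = 20000
y0 = [random.gauss(0.3, 0.1) for _ in range(n)]
y1 = [v + 0.2 for v in y0]
ate = mean([a - b for a, b in zip(y1, y0)])


def diff_in_means(treat):
    """D = mean observed outcome of treated minus mean observed outcome of controls."""
    treated = [y1[i] for i in range(n) if treat[i] == 1]
    control = [y0[i] for i in range(n) if treat[i] == 0]
    return mean(treated) - mean(control)


# Randomized assignment: T is independent of the potential outcomes.
t_random = [random.randint(0, 1) for _ in range(n)]

# Confounded assignment: units with a high baseline are more likely treated,
# so E[Y(0) | T = 1] differs from E[Y(0) | T = 0].
t_confounded = [
    1 if random.random() < (0.8 if y0[i] > 0.3 else 0.2) else 0
    for i in range(n)
]

d_random = diff_in_means(t_random)
d_confounded = diff_in_means(t_confounded)

print(f"ATE           = {ate:.3f}")
print(f"D, randomized = {d_random:.3f}")
print(f"D, confounded = {d_confounded:.3f}")
```

Under randomization D lands close to the ATE, while the confounded assignment inflates D by the size of the second bracket in the derivation.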
Causal Inference: Missing Data Problem
● Causal inference is the problem of estimating the counterfactual
− Estimate missing data values using various methods
− $Y_i(0)$ now becomes an estimated quantity, based on outcomes of other
people who did not receive treatment
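One very simple way to read this missing-data view is sketched below in Python: the unobserved $Y_i(0)$ of each treated person is filled in with the mean outcome of the untreated people. Using the control-group mean as the imputation rule is an assumption made for this sketch; the observed values are those of the seven-person example.

```python
from statistics import mean

# Observed data only: (T_i, Y_i) for the seven people in the running example.
observed = [(1, 0.6), (0, 0.6), (1, 0.4), (0, 0.1), (1, 0.7), (0, 0.4), (0, 0.1)]

# Estimate of the missing Y_i(0): mean outcome among people who were not treated.
control_mean = mean([y for t, y in observed if t == 0])

# Impute Y_i(0) for every treated person and form an estimated individual effect.
imputed_effects = [y - control_mean for t, y in observed if t == 1]
estimate = mean(imputed_effects)

print(f"imputed Y(0) for treated units = {control_mean:.2f}")
print(f"estimated average effect       = {estimate:.2f}")
```

With this particular imputation rule the estimate coincides with the difference in means, so it inherits the same bias whenever the control group is not comparable to the treated group.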
Various Methodologies
● Randomized Control Trials (RCTs)
− Gold standard

● Quasi-Experimental Methods
− Examples: cross-sectional approaches, before-after comparison, matching,
difference-in-differences, regression discontinuity, and instrumental
variables
RCTs are the Gold Standard
● RCTs have long been a gold standard in the medical field, for
example in establishing whether a new drug works
● They have also become increasingly common in other fields such
as economics
Caveats in RCTs
● Sometimes, people’s responses in an experiment are influenced
by their knowing which group they are in
− So, you might want to run a blind experiment…
− To make this work, you will have to give the control group a placebo,
which is something…
● External validity (or generalizability) is always an important
question
But RCTs may not always be possible!
● RCTs can be expensive, infeasible, or unethical
− For example, suppose you want to study the effects of smoking and you
randomly assign some people to your “smoking” group
● What can we do when an experiment is not possible?
− In such situations you will almost invariably be conducting an observational
study, not an experiment
− Be alert for confounding factors
Observational Data
Let’s now turn to the setting in which we don’t have
the ability to assign treatments randomly to units.
A lot of observational data is out there
● Social media logs
● Credit card transactions
● Website/apps logs
● Satellite images
● …
● Can we do causal inference using observational data? What can
go wrong?
Today
● Counterfactual Reasoning
● Potential Outcomes Framework
● Unobserved Confounding
● Quasi-Experimental Methods for Causal Inference
Is the New Drug Effective?
Goal: Evaluate the effectiveness of a new drug (treatment) in
improving the recovery rate (outcome) from a disease
● T=0 → Control group (those who do not take the drug)
● T=1 → Treatment group (those who take the new drug)

We record the recovery rates of 700 patients who were given access to the drug. A total of 350 patients chose to take the drug and 350 patients did not.
Is the New Drug Effective?

                Drug (T=1)       No Drug (T=0)    N
Men             93% (81/87)      87% (234/270)    357
Women           73% (192/263)    69% (55/80)      343
Combined Data   78% (273/350)    83% (289/350)    700

Is this new drug effective? Should we prescribe it to women? men? everyone?
Is the New Drug Effective?

The answer depends on the causal mechanism or the data generating process!
Is the New Drug Effective?

• Suppose we knew an additional fact: a women-only hormone has a negative effect on recovery, so women are less likely to recover than men, regardless of the drug.
• In addition, as we can see from the data, women are significantly more likely to take the drug than men are.
Key Insight
● The reason the drug appears to be harmful overall is:
− If we select a drug user at random, that person is more likely
to be a woman and hence less likely to recover than a random
person who does not take the drug
− Put differently, being a woman is a common cause of both
drug taking and failure to recover!
[Figure: causal graph with nodes G (gender), T (drug), and Y (recovery); G is a common cause of T and Y]
Insight
● Thus, to assess the effectiveness of the drug, we need
to compare subjects of the same gender
− This will ensure that any difference in recovery rates can be
attributed to the drug
− Thus, we should consult the segregated data, which shows
unequivocally that the drug is helpful
− This matches our intuition, which tells us that the segregated
data is “more specific,” hence more informative, than the
unsegregated data
This was an example of Simpson’s Paradox
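The reversal can be verified directly from the counts in the drug table. The short Python sketch below recomputes the recovery rates within each gender and for the combined data; the dictionary layout and function names are illustrative only.

```python
# (recovered, total) counts taken from the drug table.
counts = {
    "Men": {"drug": (81, 87), "no_drug": (234, 270)},
    "Women": {"drug": (192, 263), "no_drug": (55, 80)},
}


def rate(recovered, total):
    """Recovery rate as a fraction."""
    return recovered / total


# Within each gender, compare recovery with and without the drug.
within = {
    group: (rate(*arms["drug"]), rate(*arms["no_drug"]))
    for group, arms in counts.items()
}


def pooled(arm):
    """Recovery rate for one arm after pooling men and women."""
    recovered = sum(counts[g][arm][0] for g in counts)
    total = sum(counts[g][arm][1] for g in counts)
    return recovered / total


pooled_drug = pooled("drug")
pooled_no_drug = pooled("no_drug")

for group, (with_drug, without_drug) in within.items():
    print(f"{group}: drug {with_drug:.0%}, no drug {without_drug:.0%}")
print(f"Combined: drug {pooled_drug:.0%}, no drug {pooled_no_drug:.0%}")
```

The drug group does better within each gender but worse in the pooled data, because 263 of the 350 drug takers are women, the group with the lower recovery rate.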
