
Association and Causation

Dr. Segun Bello


Intended Learning Objectives
▪ Describe exposure and outcomes in specific scenarios

▪ Describe Bradford Hill’s criteria for disease causation

▪ Measure exposure and outcome

▪ Describe confounding and effect modification

▪ Use p-value and confidence intervals to describe associations


Historical Perspective
▪ Experience gained from overcoming communal health challenges,
especially those involving infectious disease control
▪ Example
– Contributions of John Snow during the 1854 cholera epidemic in London, England
Lessons learnt from John Snow
▪ Impactful public health actions may require only simple
interventions
▪ Power of observation
– Distribution of health events
– Descriptive epidemiology: a health problem that is well understood is half-solved
Epidemiology - key concepts
The study of
the distribution
and determinants
of health-related states or events
in specified human populations
and its application to the control of health problems
– Dictionary of Epidemiology

Porta M, ed. A Dictionary of Epidemiology, 5th ed. New York: Oxford University Press, 2008
Distribution and determinant
▪ To understand a health services research question
▪ Five Ws of journalism
– WHAT is the problem (and how much/many)
– WHO (Person characteristics)
– WHERE (Place)
– WHEN (Time)
– WHY / HOW (Cause, risk factors)
Application of epidemiology
▪ Assessing the community’s health
– Determining the magnitude, trends, patterns of disease
▪ Searching for causes
– Identifying agents, vehicles, risk factors, susceptibility
– Identifying preventive factors
▪ Making individual decisions
▪ Completing the clinical picture
▪ Evaluating the impact of control, prevention measures
Tools in Epidemiology
▪ Counts
▪ Ratio
▪ Rate
▪ Prevalence
▪ Incidence
Measures of disease frequency
▪ Key initial information for investigating aetiology
▪ How frequently does the disease occur?
▪ Big picture about a disease
▪ Framing public health questions and guiding resource allocation
▪ Describes the absolute risk of a disease
▪ Can be categorized, stratified by person, place and/or time
characteristics to gain insight into disease mechanism
▪ Two most commonly used measures – prevalence/incidence
Exposure and outcome
▪ The term ‘exposure’ is carried over from infectious disease
epidemiology
▪ An exposure is any factor or characteristic that may explain or predict the
presence of an outcome
▪ An outcome is any entity that is being predicted
▪ Most times an outcome is a disease
▪ An outcome may also be a biological measurement (e.g. blood
pressure, cholesterol level) or a health-related state or behaviour
(e.g. vaccination status, medication use)
Examples
▪ A study examines whether vaccination against pneumococcal
pneumonia is effective in preventing the disease

▪ Questions
– Specify the exposure
– Specify the outcome
– How can you elicit the exposure?
– How can you measure the outcome?
Examples
Potential answers

▪ Any previous pneumococcal vaccination – Yes/No


▪ Recent pneumococcal vaccination – Yes/No
▪ The number of years since last pneumococcal vaccination
Distinguishing causation from association
▪ There is no way of directly observing an exposure causing a disease

▪ Separating association from causation is important in epidemiology
for prevention, treatment and control decisions
Causal inference
▪ Multiple exposures often influence a single disease outcome

▪ Many exposures take a long time to influence the outcome

▪ E.g. the influence of diet on the risk of cancer


Bradford Hill’s criteria for disease causation
▪ Strength of association
– Percival Pott’s work on scrotal cancer among chimney sweeps
▪ Consistency
– Has the association been repeatedly observed by different persons, in
different places, circumstances and times?
▪ Specificity
– Association is specific to persons of certain characteristics and/or body sites
▪ Temporality
– Which is the cart and which the horse? E.g. does a particular diet lead to
disease or does the early disease lead to dietary habits?
▪ Dose-response gradient
– Between exposure and outcome e.g. cancer death rate and number of cigarettes smoked
Bradford Hill’s criteria for disease causation
▪ Plausibility
– Is the association biologically plausible?
▪ Coherence
– Cause-effect interpretation should not conflict with the general knowledge
about the natural history/biology of the disease
▪ Experiment
– Do preventive actions change the frequency of the events? E.g. does
cancer frequency reduce in people who stop smoking?
▪ Analogy
– Thalidomide and birth defects
Confounding
▪ A third factor may be responsible for an observed relationship
between ‘exposure’ and an outcome
▪ Confounding is an important limitation of observational studies
▪ Alters the interpretation of study results, obscures whether
exposure is a cause of the outcome
▪ A factor
– Associated with the exposure
– Associated with the outcome
– Not in the causal pathway of association
Confounding
▪ Study data are used to judge whether a potential confounder is
associated with the exposure and the outcome
▪ Biological and clinical knowledge are used to judge whether a
potential confounder is in the causal pathway of association
Example – exposure risk measurement
Example – background characteristics
Example – potential confounders
Evaluation of a potential confounding factor
▪ Association with exposure
– Statistical and plausible association between two independent variables
– E.g. supramycin use and history of eczema
▪ Association with outcome
– ‘bivariate’ analysis
– Univariate model
▪ Not in causal pathway
– Causal pathway represents the mechanisms connecting exposure with
outcome
– For example, autoantibody response and mast cell degranulation are
associated with both supramycin and rash, but they cannot be called confounders
because they are the mechanisms through which supramycin causes rash
Methods to control for confounding
▪ Restriction
– Remove all participants with previous history of eczema
Methods to control for confounding
▪ Stratification
– Separate study subjects according to their ‘history of eczema’ status
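A minimal sketch of stratification, assuming a cohort design. The counts below are invented for illustration (they are not from the lecture) and are chosen so that history of eczema is associated with both supramycin use and rash. The function name `risk_ratio` and the `strata` layout are likewise hypothetical.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of rash in the exposed divided by risk of rash in the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# Hypothetical counts per stratum:
# (rash among supramycin users, supramycin users,
#  rash among non-users, non-users)
strata = {
    "history of eczema": (40, 80, 10, 20),
    "no history of eczema": (2, 20, 8, 80),
}

# Risk ratio within each stratum of the potential confounder
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

# Crude risk ratio: add the strata together and ignore eczema history
crude_counts = [sum(column) for column in zip(*strata.values())]
crude_rr = risk_ratio(*crude_counts)

print("Crude RR:", round(crude_rr, 2))
for name, rr in stratum_rr.items():
    print(f"RR among participants with {name}:", round(rr, 2))
```

With these invented counts the crude risk ratio is well above 1 while both stratum-specific risk ratios equal 1, which is the pattern expected when the crude association is produced entirely by the confounder.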
Other methods to control for confounding
▪ Matching
▪ Regression
▪ Randomization
Effect modification
▪ Present when the size of an association differs by another factor
▪ Presence of effect modification can suggest synergy between
exposure variables
▪ Effect modification is a different concept from confounding: a
particular characteristic can function as a confounder, an effect
modifier, both, or neither
▪ The likelihood ratio test evaluates whether the size of an
association is statistically different across two or more categories of
another factor
▪ The p-value (for interaction) from the likelihood ratio test represents
the probability of finding a difference in the size of an association
across subgroups at least as large as the one observed, if the true
size of the association were the same in every subgroup
Examples
▪ Association of smoking and heavy alcohol use with laryngeal cancer

▪ Chronic alcohol use is thought to damage the protective mucosal
layer of the larynx, thereby enhancing the carcinogenic effects of
smoking
P-value & Confidence Interval
                 ILL
VANILLA    |    +      -   | Total
-----------+---------------+------
     +     |   43     11   |   54
     -     |    3     18   |   21
-----------+---------------+------
     Total |   46     29   |   75

Single Table Analysis

Odds ratio                                          23.45
Cornfield 95% confidence limits for OR              5.07 < OR < 125.19*

RISK RATIO (RR) (Outcome: ILL=+; Exposure: VANILLA=+)   5.57
95% confidence limits for RR                        1.94 < RR < 16.03
Ignore risk ratio if case control study

                   Chi-Squares   P-values
                   -----------   ----------
Uncorrected:          27.22      0.00000018 <---
Mantel-Haenszel:      26.86      0.00000022 <---
Yates corrected:      24.54      0.00000073 <---
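The point estimates and chi-square statistics in the output above can be reproduced by hand from the four cell counts. The sketch below uses the standard textbook 2×2 formulas, which are an assumption about how the software computes them. The function name `two_by_two` and the cell labels a, b, c, d are introduced here for illustration.

```python
import math

def two_by_two(a, b, c, d):
    """a, b = ill / not ill among exposed; c, d = ill / not ill among unexposed."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    cross = a * d - b * c

    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))

    chi2 = {
        "uncorrected": n * cross ** 2 / margins,
        # Mantel-Haenszel uses n - 1 in place of n
        "mantel_haenszel": (n - 1) * cross ** 2 / margins,
        # Yates continuity correction subtracts n / 2 from |ad - bc|
        "yates": n * (abs(cross) - n / 2) ** 2 / margins,
    }
    # Upper-tail p-value of a chi-square statistic with 1 degree of freedom
    p_values = {k: math.erfc(math.sqrt(v / 2)) for k, v in chi2.items()}
    return odds_ratio, risk_ratio, chi2, p_values

# Vanilla (exposure) by illness (outcome), counts taken from the table above
odds_ratio, rr, chi2, p_values = two_by_two(43, 11, 3, 18)
print(round(odds_ratio, 2), round(rr, 2))
print({k: round(v, 2) for k, v in chi2.items()})
```

The confidence limits are not reproduced here because the Cornfield limits for the odds ratio require an iterative calculation.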
Confidence Intervals

Risk ratio = 2.1


95% confidence interval = 1.1–4.0

RR & 95% CI = 2.1 (1.1–4.0)


RR & 95% CLs = 2.1 (1.1, 4.0)
What is Estimation?

▪ In epidemiology, the process of making a
quantitative inference about a population from
a sample
– Point estimate
• Characteristic of a population
• Exposure’s effect on disease
– Interval estimate
• Range of values rather than single value
What is a Confidence Interval?
▪ A range of values that quantifies the uncertainty
around a point estimate of a measure, such as the
proportion of children vaccinated or the effect of
exposure on disease.
▪ Provides interval estimate, reflects precision (or
imprecision) of the point estimate
▪ Examples of point estimates and 95% confidence
intervals:
– Vaccine coverage = 70% (95% CI = 65%-75%)
– Risk ratio = 4.0 (95% CI = 2.0–8.0)
Confidence Intervals

▪ Required or recommended by many medical and
public health journals
▪ Most relevant when goal is to estimate…
– Characteristic of a population
– Exposure’s effect on disease
▪ Less relevant for finding single responsible factor
What is a Confidence Interval?
Statistical Definition

The interval calculated from a random sample
by a procedure which, if applied to an infinite
number of random samples of the same size,
would, in 95% (or other specified level) of
instances, contain the true value in the population.
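[Figure: a population with vaccine coverage = 60%, from which samples 1, 2, …, 999, 1000 are drawn; each sample gives its own estimate of vaccine coverage (VC = ?)]
95% Confidence Intervals from 20 Samples
[Figure: 95% confidence intervals from 20 samples]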
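The repeated-sampling idea behind the statistical definition can be sketched with a small simulation. The population coverage of 60% and the 1000 samples come from the slide. The sample size of 200, the random seed, and the use of the simple normal-approximation (Wald) interval are assumptions made for illustration.

```python
import random
from statistics import NormalDist

def wald_ci(successes, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    half_width = z * (p * (1 - p) / n) ** 0.5
    return p - half_width, p + half_width

def coverage(true_p=0.60, n=200, n_samples=1000, seed=1):
    """Fraction of simulated samples whose 95% CI contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one sample of n children and count how many are vaccinated
        vaccinated = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_ci(vaccinated, n)
        hits += low <= true_p <= high
    return hits / n_samples

print("Share of intervals containing 60%:", coverage())
```

The share printed should be close to, but not exactly, 0.95: the confidence level describes the long-run behaviour of the procedure, not any single interval.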
What is a Confidence Interval?
Epidemiologic Definition

The range of values that are compatible with
the data under the standard interpretation of
statistical significance.
- Rothman
P-value Function?
RR & 95% CI = 2.1 (1.1–4.0)
[Figure: candidate graph for this estimate; x-axis: risk ratio, y-axis: probability; null value RR = 1.0 marked]
Is every value in the confidence interval equally likely?
P-value Function?
RR & 95% CI = 2.1 (1.1–4.0)
[Figure: candidate graph for this estimate; x-axis: risk ratio, y-axis: probability; null value RR = 1.0 marked]
No! This graph is wrong.
P-value Function
RR & 95% CI = 2.1 (1.1–4.0)
[Figure: P-value function; x-axis: risk ratio, y-axis: probability; null value RR = 1.0 marked]
This graph is correct.
P-value Function
RR & 95% CI = 2.1 (1.1–4.0)
[Figure: P-value function; x-axis: risk ratio, y-axis: probability; RR = 1.0, RR = 1.1, RR = 2.1 and RR = 4.0 marked; 95% CI = 1.1–4.0]
P-value Function
RR & 95% CI = 2.1 (0.8–5.5)
[Figure: P-value function; x-axis: risk ratio, y-axis: probability; RR = 0.8, RR = 1.0, RR = 2.1 and RR = 5.5 marked; 95% CI = 0.8–5.5]
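A P-value function can be approximated from a reported risk ratio and its 95% limits. The sketch below assumes a normal approximation on the log scale and recovers the standard error from the width of the interval. Both are assumptions for illustration, and the function name `p_value_function` is hypothetical.

```python
import math

def p_value_function(rr, lower, upper, rr_hypothesised, z_crit=1.96):
    """Two-sided p-value for the hypothesis that the true RR equals rr_hypothesised."""
    # Standard error of ln(RR), recovered from the width of the 95% CI
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = (math.log(rr) - math.log(rr_hypothesised)) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Evaluate the function over a grid of hypothesised risk ratios
for hypothesised in (0.8, 1.0, 1.1, 2.1, 4.0, 5.5):
    narrow = p_value_function(2.1, 1.1, 4.0, hypothesised)
    wide = p_value_function(2.1, 0.8, 5.5, hypothesised)
    print(f"RR = {hypothesised}: p = {narrow:.3f} (CI 1.1-4.0), p = {wide:.3f} (CI 0.8-5.5)")
```

The p-value peaks at the point estimate and falls to about 0.05 at each 95% limit, so values near the centre of the interval are more compatible with the data than values near its ends. For the wider interval, RR = 1.0 has a p-value above 0.05, in line with the interval including the null value.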
Width of a Confidence Interval

▪ An interval or range of values that reflects the
precision of the point estimate
▪ Example:
– Study 1: RR = 4.0 (95% CI = 2.0–8.0)
– Study 2: RR = 4.0 (95% CI = 1.01–16.0)

▪ Narrower confidence interval means more
precise estimate
▪ Wider confidence interval means less precise
estimate
Formulas for Confidence Intervals
▪ Based on the “measure” ± Z multiplied by the
standard error (measure), where “measure”
represents RR, OR, etc.
▪ Different formula for each measure
– Mean, proportion, RR, OR, etc.
▪ Some measures have multiple formulas
▪ Symmetrical for mean, proportion;
asymmetrical for RR, OR
▪ Let the computer do the computation!
General Structure for Confidence Intervals

Arithmetic Scale Measure

$$\text{Point estimate} \pm t_{\alpha} \sqrt{\text{variance}}$$

Multiplicative Scale Measure

$$\text{Point estimate} \times e^{\left[\pm Z_{\alpha} \sqrt{\text{variance}}\right]}$$

$$e^{\left[\ln(\text{point estimate}) \pm Z_{\alpha} \sqrt{\text{variance}}\right]}$$
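Confidence Interval for a Mean

$$\text{mean} \pm t_{\alpha} \sqrt{\frac{\sum (x_i - \bar{x})^2}{n(n-1)}}$$

$$\bar{x} \pm t_{\alpha} \times s / \sqrt{n}$$

$$\text{mean} \pm t_{\alpha} \times (\text{standard error of the mean})$$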
Confidence Interval for a Proportion

$$\text{proportion} \pm Z_{\alpha} \sqrt{pq/n}$$

For 95% CI, $Z_{\alpha} = 1.96$
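The two formulas above can be sketched in a few lines. The sample size of 323 and the ten blood-pressure readings are invented for illustration. The critical value t = 2.262 is the tabulated two-sided 95% value for 9 degrees of freedom and is passed in by hand because the Python standard library has no t distribution. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def proportion_ci(p, n, level=0.95):
    """proportion ± Z * sqrt(pq / n), with q = 1 - p."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def mean_ci(values, t_crit):
    """mean ± t * s / sqrt(n), where s is the sample standard deviation."""
    standard_error = stdev(values) / math.sqrt(len(values))
    centre = mean(values)
    return centre - t_crit * standard_error, centre + t_crit * standard_error

# Vaccine coverage of 70% in a hypothetical sample of 323 children
print(proportion_ci(0.70, 323))

# Ten hypothetical systolic blood pressure readings (mmHg), df = 9
readings = [118, 122, 125, 130, 127, 121, 119, 133, 128, 124]
print(mean_ci(readings, t_crit=2.262))
```

With 323 children the interval for 70% coverage comes out at roughly 65% to 75%, and both intervals are symmetrical around their point estimates.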


Output from Epi Info 7 (Classic Analysis)
[Screenshot: Epi Info 7 output]
Output from Epi Info 7 (Classic Analysis)
[Screenshot: Epi Info 7 output with results annotated “Use”, “Ignore”, “Use”]
Confidence Interval for a Risk Ratio
(“Taylor series”)

For 95% CI, Zα = 1.96


Confidence Interval for an Odds Ratio
(“Taylor series”)
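The formulas on these two slides did not survive as text. The sketch below uses the commonly taught Taylor series (log-scale) limits for a 2×2 table, which is an assumption about what the slides showed. The cell labels a, b, c, d (ill / not ill among exposed, ill / not ill among unexposed) and the function names are introduced here for illustration, and the counts are those of the vanilla-by-illness table earlier in the lecture.

```python
import math

def taylor_ci_rr(a, b, c, d, z=1.96):
    """Risk ratio with limits exp(ln RR ± z * SE), SE from the Taylor series."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

def taylor_ci_or(a, b, c, d, z=1.96):
    """Odds ratio with limits exp(ln OR ± z * SE), SE from the Taylor series."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se)

rr, rr_low, rr_high = taylor_ci_rr(43, 11, 3, 18)
or_, or_low, or_high = taylor_ci_or(43, 11, 3, 18)
print(f"RR = {rr:.2f} ({rr_low:.2f}, {rr_high:.2f})")
print(f"OR = {or_:.2f} ({or_low:.2f}, {or_high:.2f})")
```

The risk ratio limits agree with the 1.94 and 16.03 shown in the single table analysis. The odds ratio limits differ from the Cornfield limits shown there, because Cornfield is a different method. In both cases the limits are asymmetrical around the point estimate on the original scale.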
Comments on the Width of a Confidence Interval

▪ Width reflects the precision of the point estimate
OR & 95% CI = 3.0 (1.0 – 9.0)
OR & 95% CI = 3.0 (2.0 – 4.5)

▪ The more confidence you want (95 vs. 99%), the
wider the interval

[Figure: 95% and 99% confidence limits around the OR, with the 99% limits lying outside the 95% limits]
Confidence Interval is Affected by…

▪ Level of confidence
– 99% wider than 95% CI
▪ Sample size
– Larger study → narrower CI because SE
decreases as the sample size increases
▪ For some measures, variation in the data
▪ For RR and OR, strength of association
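The effect of confidence level and sample size on width can be sketched with the interval for a proportion. The coverage of 70% echoes the earlier vaccine coverage example. The sample sizes and the function name `ci_width` are assumptions for illustration.

```python
import math
from statistics import NormalDist

def ci_width(p, n, level):
    """Full width of the normal-approximation CI for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * math.sqrt(p * (1 - p) / n)

# Width for each combination of sample size and confidence level
widths = {
    (n, level): ci_width(0.70, n, level)
    for n in (100, 400, 1600)
    for level in (0.95, 0.99)
}

for (n, level), width in widths.items():
    print(f"n = {n:>4}, {level:.0%} CI width = {width:.3f}")
```

Because the standard error shrinks with the square root of the sample size, quadrupling the sample size halves the width of the interval.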
CI vs. P-value – Similarities

▪ Both can assess statistical significance
▪ Statistically equivalent (more or less)
▪ Neither accounts for bias
▪ Multiple formulas, including exact methods
CI vs. P-value – Differences

▪ CI provides same information as a statistical
test, plus more
▪ CI reminds reader of variability
▪ CI provides range of compatible values
(interval estimation)
▪ CI more clearly shows influence of sample
size
Interpreting a Point Estimate and Confidence Interval
Risk ratio = 2.1 (95% CI = 1.1–4.0)
▪ Where is the point estimate?
(RR > 1, RR ≈ 1, RR < 1)
▪ Is the range of values wide or narrow?
▪ What values are consistent with this study
(included)? What values are inconsistent
(excluded)?
▪ Is this study consistent with the null hypothesis
(no difference)?
Interpret the Findings (Studies 1–6)

Study   P value      RR      95% CI          Interpretation
1       0.007        2.0     1.2 – 3.3
2       0.03         7.0     1.2 – 40.8
3       0.08         7.0     0.8 – 61.3
4       0.65         0.98    0.9 – 1.07
5       0.0001       0.98    0.97 – 0.99
6       8 × 10⁻¹¹    0.6     0.5 – 0.7


Interpret the Findings (Studies 7–11)

Study   P value      RR      95% CI          Interpretation
7       0.060        4.0     0.94 – 17.0
8       0.052        4.0     0.99 – 16.2
9       0.048        4.0     1.01 – 15.8
10      0.00009      4.0     2.0 – 8.0
11      0.9          4.0     0.2 – 80.0
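One way to work through the exercise is to ask the same questions of every study: where the point estimate lies, whether the interval excludes the null value of 1.0, and how wide the interval is. The sketch below does this for the eleven studies in the two tables above. The function name `summarise`, the use of the upper-to-lower limit ratio as a measure of width, and the 0.05 threshold are assumptions for illustration, not the lecture's own answer key.

```python
# (P value, RR, lower 95% limit, upper 95% limit), copied from the two tables
STUDIES = {
    1: (0.007, 2.0, 1.2, 3.3),
    2: (0.03, 7.0, 1.2, 40.8),
    3: (0.08, 7.0, 0.8, 61.3),
    4: (0.65, 0.98, 0.9, 1.07),
    5: (0.0001, 0.98, 0.97, 0.99),
    6: (8e-11, 0.6, 0.5, 0.7),
    7: (0.060, 4.0, 0.94, 17.0),
    8: (0.052, 4.0, 0.99, 16.2),
    9: (0.048, 4.0, 1.01, 15.8),
    10: (0.00009, 4.0, 2.0, 8.0),
    11: (0.9, 4.0, 0.2, 80.0),
}

def summarise(p, rr, low, high, alpha=0.05):
    """Answer the interpretation questions for one study."""
    return {
        "direction": "RR > 1" if rr > 1 else "RR < 1" if rr < 1 else "RR = 1",
        "ci_excludes_null": not (low <= 1.0 <= high),
        "p_below_alpha": p < alpha,
        # Ratio of upper to lower limit: larger means a less precise estimate
        "limit_ratio": high / low,
    }

summaries = {study: summarise(*row) for study, row in STUDIES.items()}
for study, s in summaries.items():
    print(study, s["direction"], "| CI excludes 1.0:", s["ci_excludes_null"],
          "| upper/lower:", round(s["limit_ratio"], 1))
```

Studies 7 to 9 are a reminder that nearly identical results can fall on either side of the 0.05 line, and study 5 shows that a very small P value can accompany a risk ratio very close to 1.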


Final Comments on CI

▪ Confidence interval is range of values
compatible with the data in a study
▪ Quantifies uncertainty around a point estimate
(indicates precision)
▪ Most relevant for interval estimation
▪ Can be used as test of statistical significance
(95% CI is equivalent to alpha = 0.05)
▪ Does not address bias!
