Association and Causation
Dr. Segun Bello
                                   Intended Learning Objectives
▪ Describe exposure and outcomes in specific scenarios
▪ Describe the Bradford Hill criteria for disease causation
▪ Measure exposure and outcome
▪ Describe confounding and effect modification
▪ Use p-value and confidence intervals to describe associations
                                              Historical Perspective
▪ Experience gained from overcoming communal health challenges,
  especially the control of infectious diseases
▪ Example
  – Contributions of John Snow during the cholera epidemic in England, 1854
                                 Lessons learnt from John Snow
▪ Impactful public health actions may require only simple
  interventions
▪ Power of observation
  – Distribution of health events
  – Descriptive epidemiology; a health problem is half-solved if well understood
                                     Epidemiology - key concepts
The study of
the distribution
and determinants
of health-related states or events
in specified human populations
and its application to the control of health problems
           – Dictionary of Epidemiology
                                          Porta M, ed. Dictionary of Epidemiology, 5th
                                          ed. New York: Oxford U. Press, 2008
                                  Distribution and determinant
▪ To understand a health services research question
▪ Five Ws of journalism
  –   WHAT is the problem (and how much/many)
  –   WHO (Person characteristics)
  –   WHERE (Place)
  –   WHEN (Time)
  –   WHY / HOW (Cause, risk factors)
                                          Application of epidemiology
▪ Assessing the community’s health
  – Determining the magnitude, trends, patterns of disease
▪ Searching for causes
  – Identifying agents, vehicles, risk factors, susceptibility
  – Identifying preventive factors
▪ Making individual decisions
▪ Completing the clinical picture
▪ Evaluating the impact of control, prevention measures
                 Tools in Epidemiology
▪   Counts
▪   Ratio
▪   Rate
▪   Prevalence
▪   Incidence
                               Measures of disease frequency
▪ Key initial information for investigating aetiology
▪ How frequently does the disease occur?
▪ Big picture about a disease
▪ Framing public health questions and guiding resource allocation
▪ Describes the absolute risk of a disease
▪ Can be categorized, stratified by person, place and/or time
  characteristics to gain insight into disease mechanism
▪ Two most commonly used measures – prevalence/incidence
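As a minimal sketch of the two measures named above, the snippet below computes a point prevalence and an incidence rate. The counts and the function names are hypothetical and chosen only for illustration.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the population that has the disease at one point in time."""
    return existing_cases / population


def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases per unit of person-time at risk."""
    return new_cases / person_time


# Hypothetical figures, for illustration only.
prev = prevalence(existing_cases=150, population=10_000)
inc = incidence_rate(new_cases=30, person_time=9_850)  # person-years at risk

print(f"Prevalence: {prev:.1%}")
print(f"Incidence rate: {inc * 1000:.2f} per 1,000 person-years")
```

Prevalence counts existing cases against everyone in the population, while the incidence rate counts only new cases against the time people spent at risk.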
                                          Exposure and outcome
▪ The term ‘exposure’ is carried over from infectious disease
  epidemiology
▪ Any factor or characteristic that may explain or predict the
  presence of an outcome
▪ An outcome is any entity that is being predicted
▪ Most times an outcome is a disease
▪ An outcome may also be a biological measurement (e.g. blood
  pressure, cholesterol level) or another characteristic such as
  vaccination status or medication use
                                                     Examples
▪ A study examines whether vaccination against pneumococcal
  pneumonia is effective in preventing the disease
▪ Questions
  –   Specify the exposure
  –   Specify the outcome
  –   How can you elicit the exposure?
  –   How can you measure the outcome?
                                                       Examples
Potential answers
▪ Any previous pneumococcal vaccination – Yes/No
▪ Recent pneumococcal vaccination – Yes/No
▪ The number of years since last pneumococcal vaccination
                 Distinguishing causation from association
▪ No way of directly observing an exposure causing a disease
▪ Separating association from causation is important in Epidemiology
  for prevention, treatment and control decisions
                                                     Causal inference
▪ Multiple exposures often influence a single disease outcome
▪ Many exposures take a long time to influence the outcome
▪ E.g. the influence of diet on the risk of cancer
                   Bradford Hill criteria for disease causation
▪ Strength of association:
   – Percival Pott’s work on scrotal cancer among chimney sweeps
▪ Consistency
   – Has the association been repeatedly observed by different persons, in
     different places, circumstances and times?
▪ Specificity
   – Association is specific to persons of certain characteristics and/or body sites
▪ Temporality
   – Which is the cart and which the horse? E.g. does a particular diet lead to
     disease or does the early disease lead to dietary habits?
▪ Dose-response gradient
   – Between exposure and outcome, e.g. cancer deaths rise with the number of
     cigarettes smoked
                   Bradford Hill criteria for disease causation
▪ Plausibility
   – Is the association biologically plausible?
▪ Coherence
   – Cause-effect interpretation should not conflict with the general knowledge
     about the natural history/biology of the disease
▪ Experiment
   – Do preventive actions change the frequency of the events? E.g. does
     cancer frequency reduce in people who stop smoking?
▪ Analogy
   – Thalidomide and birth defects
                                                       Confounding
▪ A third factor may be responsible for an observed relationship
  between ‘exposure’ and an outcome
▪ Confounding is an important limitation of observational studies
▪ Alters the interpretation of study results, obscures whether
  exposure is a cause of the outcome
▪ A factor
  – Associated with the exposure
  – Associated with the outcome
  – Not in the causal pathway of association
                                                     Confounding
▪ Study data are used to judge whether a potential confounder is
  associated with the exposure and the outcome
▪ Biological and clinical knowledge are used to judge whether a
  potential confounder is in the causal pathway of association
Example – exposure risk measurement
Example – background characteristics
Example – potential confounders
                 Evaluation of a potential confounding factor
▪ Association with exposure
  – Statistical and plausible association between two independent variables
  – E.g. supramycin use and history of eczema
▪ Association with outcome
  – ‘bivariate’ analysis
  – Univariate model
▪ Not in causal pathway
  – Causal pathway represents the mechanisms connecting exposure with
    outcome
  – For example, autoantibody response and mast cell degranulation are
    associated with both supramycin and rash but cannot be called confounders
    because supramycin causes rash through these mechanisms
                              Methods to control for confounding
▪ Restriction
   – Remove all participants with previous history of eczema
                              Methods to control for confounding
▪ Stratification
   – Separate study subjects according to their ‘history of eczema’ status
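A minimal sketch of stratification, using the supramycin, rash and eczema example from the slides. All counts are hypothetical and were chosen so that the crude risk ratio is pulled away from 1 by eczema alone. The Mantel-Haenszel summary is one common way to pool strata; the slides do not name a specific pooling method.

```python
# Hypothetical counts, for illustration only. Each stratum holds:
# (rash among supramycin users, total users, rash among non-users, total non-users)
strata = {
    "history of eczema": (40, 100, 20, 50),
    "no history of eczema": (5, 50, 20, 200),
}


def risk_ratio(a, n1, c, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)


# Crude analysis: collapse the strata into a single 2x2 table.
totals = [sum(column) for column in zip(*strata.values())]
crude_rr = risk_ratio(*totals)

# Stratified analysis: one risk ratio per eczema stratum.
stratum_rr = {name: risk_ratio(*cells) for name, cells in strata.items()}

# Mantel-Haenszel pooled risk ratio across strata.
numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
mh_rr = numerator / denominator

print(f"Crude RR: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR, {name}: {rr:.2f}")
print(f"Mantel-Haenszel RR: {mh_rr:.2f}")
```

With these made-up numbers the crude risk ratio is well above 1 while each stratum-specific ratio is 1, which is the pattern expected when history of eczema confounds the association.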
                  Other methods to control for confounding
▪ Matching
▪ Regression
▪ Randomization
                                                 Effect modification
▪ Present when the size of an association differs by another factor
▪ Presence of effect modification can suggest synergy between
  exposure variables
▪ Effect modification is a different concept from confounding: a
  particular characteristic can function as a confounder, an effect
  modifier, both, or neither
▪ The likelihood ratio test evaluates whether the size of an
  association is statistically different across two or more categories of
  another factor
▪ The p-value (for interaction) from the likelihood ratio test is the
  probability of seeing a difference in the size of an association across
  subgroups at least as large as the one observed, if the true association
  were the same in every subgroup
                                                          Examples
▪ Association of smoking and heavy alcohol use with laryngeal cancer
▪ Chronic alcohol use is thought to damage the protective mucosal
  layer of the larynx, thereby enhancing the carcinogenic effects of
  smoking
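The sketch below illustrates how effect modification might be checked in the smoking and alcohol example. The counts are hypothetical. Instead of the likelihood ratio test named in the slides, it uses a simpler Woolf-type test of homogeneity of the odds ratios, which needs no model fitting; with two strata the statistic has 1 degree of freedom.

```python
import math

# Hypothetical case-control counts, for illustration only. Each stratum holds:
# (smokers with cancer, smokers without, non-smokers with cancer, non-smokers without)
strata = {
    "heavy alcohol use": (60, 20, 20, 40),
    "no heavy alcohol use": (40, 40, 30, 60),
}


def log_or_and_variance(a, b, c, d):
    """Log odds ratio and its approximate (Woolf) variance."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


estimates = {name: log_or_and_variance(*cells) for name, cells in strata.items()}
weights = {name: 1 / variance for name, (_, variance) in estimates.items()}

# Inverse-variance weighted average of the log odds ratios.
pooled = sum(weights[k] * estimates[k][0] for k in strata) / sum(weights.values())

# Homogeneity statistic: weighted squared deviations from the pooled value.
chi2 = sum(weights[k] * (estimates[k][0] - pooled) ** 2 for k in strata)

# Upper-tail probability of a chi-square with 1 df (two strata only).
p_interaction = math.erfc(math.sqrt(chi2 / 2))

for name, (log_or, _) in estimates.items():
    print(f"OR, {name}: {math.exp(log_or):.1f}")
print(f"Homogeneity chi-square = {chi2:.2f}, p = {p_interaction:.3f}")
```

A small p-value here suggests that the smoking odds ratio differs between drinkers and non-drinkers, which is what effect modification means on the multiplicative scale.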
P-value & Confidence Interval
                           ILL
     VANILLA    |     +     - | Total
     -----------+-------------+------
              + |    43    11 |    54
              - |     3    18 |    21
     -----------+-------------+------
          Total |    46    29 |    75
                      Single Table Analysis
Odds ratio                                              23.45
Cornfield 95% confidence limits for OR    5.07 < OR < 125.19*
RISK RATIO(RR)(Outcome:ILL=+; Exposure:VANILLA=+)          5.57
95% confidence limits for RR             1.94 < RR <      16.03
            Ignore risk ratio if case control study
                         Chi-Squares     P-values
                         -----------     --------
        Uncorrected:        27.22       0.00000018 <---
        Mantel-Haenszel:    26.86       0.00000022 <---
        Yates corrected:    24.54       0.00000073 <---
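The figures in this output can be reproduced by hand from the four cell counts. The sketch below recomputes the odds ratio, risk ratio and uncorrected chi-square, and adds Taylor-series confidence limits. The Taylor-series limits for the odds ratio are a different method from the Cornfield limits printed above, so they are not expected to match exactly.

```python
import math

# Cell counts from the table above: rows = VANILLA (+/-), columns = ILL (+/-).
a, b = 43, 11  # ate vanilla: ill, not ill
c, d = 3, 18   # did not eat vanilla: ill, not ill
n = a + b + c + d
z = 1.96       # multiplier for a 95% interval

odds_ratio = (a * d) / (b * c)
risk_ratio = (a / (a + b)) / (c / (c + d))

# Uncorrected (Pearson) chi-square for a 2x2 table.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Taylor-series limits are computed on the log scale and then exponentiated.
se_ln_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
rr_low = math.exp(math.log(risk_ratio) - z * se_ln_rr)
rr_high = math.exp(math.log(risk_ratio) + z * se_ln_rr)

se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_low = math.exp(math.log(odds_ratio) - z * se_ln_or)
or_high = math.exp(math.log(odds_ratio) + z * se_ln_or)

print(f"OR = {odds_ratio:.2f} (Taylor 95% CI {or_low:.2f} to {or_high:.2f})")
print(f"RR = {risk_ratio:.2f} (Taylor 95% CI {rr_low:.2f} to {rr_high:.2f})")
print(f"Uncorrected chi-square = {chi2:.2f}")
```

The risk ratio limits agree with the 1.94 and 16.03 shown in the output, which suggests the program used the same Taylor-series approach for that line.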
                 Confidence Intervals
        Risk ratio = 2.1
95% confidence interval = 1.1–4.0
  RR & 95% CI = 2.1 (1.1–4.0)
 RR & 95% CLs = 2.1 (1.1, 4.0)
                           What is Estimation?
▪ In epidemiology, the process of making a
  quantitative inference about a population from
  a sample
  – Point estimate
      Characteristic of a population
      Exposure’s effect on disease
  – Interval estimate
        Range of values rather than single value
                What is a Confidence Interval?
▪ A range of values that quantifies the uncertainty
  around a point estimate of a measure, such as the
  proportion of children vaccinated or the effect of
  exposure on disease.
▪ Provides interval estimate, reflects precision (or
  imprecision) of the point estimate
▪ Examples of point estimates and 95% confidence
  intervals :
   – Vaccine coverage = 70% (95% CI = 65%-75%)
   – Risk ratio = 4.0 (95% CI = 2.0–8.0)
                              Confidence Intervals
▪ Required or recommended by many medical and
  public health journals
▪ Most relevant when goal is to estimate…
   – Characteristic of a population
   – Exposure’s effect on disease
▪ Less relevant for finding single responsible factor
            What is a Confidence Interval?
                      Statistical Definition
The interval calculated from a random sample
by a procedure which, if applied to an infinite
number of random samples of the same size,
would, in 95% (or other specified level) of
instances, contain the true value in the population.
[Figure: random samples numbered 1, 2, …, 999, 1000 drawn from a population with vaccine coverage = 60%; each sample yields its own estimate of vaccine coverage (VC = ?)]
95% Confidence Intervals from 20 Samples
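The statistical definition can be checked by simulation. The sketch below draws 1,000 random samples from a population with 60% vaccine coverage and counts how often the 95% interval contains the true value. The sample size of 200, the seed, and the simple normal-approximation interval are assumptions made for this illustration.

```python
import math
import random

random.seed(1)  # fixed seed so repeated runs give the same result

TRUE_COVERAGE = 0.60  # population vaccine coverage, as in the figure
N_SAMPLES = 1000      # number of repeated samples
SAMPLE_SIZE = 200     # assumed; the slides do not give a sample size
Z = 1.96              # multiplier for a 95% interval

intervals_containing_truth = 0
for _ in range(N_SAMPLES):
    # Each sampled child is vaccinated with probability TRUE_COVERAGE.
    vaccinated = sum(random.random() < TRUE_COVERAGE for _ in range(SAMPLE_SIZE))
    p = vaccinated / SAMPLE_SIZE
    half_width = Z * math.sqrt(p * (1 - p) / SAMPLE_SIZE)
    if p - half_width <= TRUE_COVERAGE <= p + half_width:
        intervals_containing_truth += 1

coverage_fraction = intervals_containing_truth / N_SAMPLES
print(f"{coverage_fraction:.1%} of the intervals contain the true coverage")
```

The fraction should land close to 95%. Any single interval either contains the true value or does not; the 95% describes the procedure over many samples.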
          What is a Confidence Interval?
                Epidemiologic Definition
The range of values that are compatible with
the data under the standard interpretation of
statistical significance.
                          - Rothman
                                     P-value Function?
[Figure: plot for RR & 95% CI = 2.1 (1.1–4.0); x-axis: risk ratio; y-axis: probability; reference line at RR = 1.0]
Is every value in the confidence interval equally likely?
                                     P-value Function?
[Figure: plot for RR & 95% CI = 2.1 (1.1–4.0); x-axis: risk ratio; y-axis: probability; reference line at RR = 1.0]
No! This graph is wrong.
                                     P-value Function
[Figure: plot for RR & 95% CI = 2.1 (1.1–4.0); x-axis: risk ratio; y-axis: probability; reference line at RR = 1.0]
This graph is correct.
                                     P-value Function
[Figure: plot for RR & 95% CI = 2.1 (1.1–4.0); x-axis: risk ratio; y-axis: probability; marked values RR = 1.0, RR = 1.1, RR = 2.1 and RR = 4.0; 95% CI = 1.1–4.0]
                                     P-value Function
[Figure: plot for RR & 95% CI = 2.1 (0.8–5.5); x-axis: risk ratio; y-axis: probability; marked values RR = 0.8, RR = 1.0, RR = 2.1 and RR = 5.5; 95% CI = 0.8–5.5]
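One way to see why values inside a confidence interval are not equally compatible with the data is to compute a two-sided p-value for each candidate risk ratio. The sketch below does this for RR = 2.1 (95% CI = 1.1–4.0), assuming a normal approximation on the log scale and recovering the standard error from the published limits. The plots in the slides may have been drawn with a different method.

```python
import math
from statistics import NormalDist

rr_hat, lower, upper = 2.1, 1.1, 4.0

# Standard error of ln(RR), recovered from the width of the 95% interval.
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)


def p_value(rr_null: float) -> float:
    """Two-sided p-value for the hypothesis that the true RR equals rr_null."""
    z = abs(math.log(rr_hat) - math.log(rr_null)) / se
    return 2 * (1 - NormalDist().cdf(z))


for rr in (1.0, 1.1, 1.5, 2.1, 3.0, 4.0):
    print(f"RR = {rr:.1f}: p = {p_value(rr):.3f}")
```

The p-value peaks at the point estimate and falls to about 0.05 at the two confidence limits, so values near 2.1 are more compatible with the data than values near 1.1 or 4.0.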
              Width of a Confidence Interval
▪ An interval or range of values that reflects the
  precision of the point estimate
▪ Example:
   – Study 1: RR = 4.0 (95% CI = 2.0–8.0)
   – Study 2: RR = 4.0 (95% CI = 1.01–16.0)
▪ Narrower confidence interval means more
  precise estimate
▪ Wider confidence interval means less precise
  estimate
       Formulas for Confidence Intervals
▪ Based on the “measure” ± Z multiplied by the
  standard error (measure), where “measure”
  represents RR, OR, etc.
▪ Different formula for each measure
  – Mean, proportion, RR, OR, etc.
▪ Some measures have multiple formulas
▪ Symmetrical for mean, proportion;
  asymmetrical for RR, OR
▪ Let the computer do the computation!
General Structure for Confidence Intervals
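Arithmetic Scale Measure
$$\text{point estimate} \pm t_{\alpha}\sqrt{\text{variance}}$$
Multiplicative Scale Measure
$$\text{point estimate} \times e^{\left[\pm Z_{\alpha}\sqrt{\text{variance}}\right]}$$
$$e^{\left[\ln(\text{point estimate}) \pm Z_{\alpha}\sqrt{\text{variance}}\right]}$$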
           Confidence Interval for a Mean
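$$\text{mean} \pm t_{\alpha}\sqrt{\frac{\sum (x_i - \bar{x})^2}{n(n-1)}}$$
$$\bar{x} \pm t_{\alpha} \times s/\sqrt{n}$$
$$\text{mean} \pm t_{\alpha} \times (\text{standard error of the mean})$$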
 Confidence Interval for a Proportion
$$\text{proportion} \pm Z_{\alpha}\sqrt{pq/n}$$
    For 95% CI, Zα = 1.96
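A short sketch of the proportion formula, where q = 1 − p. It uses the vaccine coverage example from earlier in the slides (70%); the sample size of 323 is an assumption, picked because it reproduces the 65%–75% interval quoted there.

```python
import math


def proportion_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    q = 1 - p
    standard_error = math.sqrt(p * q / n)
    return p - z * standard_error, p + z * standard_error


# Hypothetical survey: 70% of 323 sampled children were vaccinated.
low, high = proportion_ci(p=0.70, n=323)
print(f"Vaccine coverage = 70% (95% CI = {low:.0%}-{high:.0%})")
```

The interval is symmetrical around the point estimate because the proportion is an arithmetic-scale measure.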
Output from Epi Info 7 (Classic Analysis)
[Screenshots: two Epi Info 7 output screens, annotated “Use”, “Ignore”, “Use”]
Confidence Interval for a Risk Ratio
                    (“Taylor series”)
   For 95% CI, Zα = 1.96
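The formula itself did not survive on this slide. As a reminder, the standard Taylor-series (log-scale) interval for a risk ratio is usually written as below. The cell labels are an assumption: a and b are the exposed who are ill and not ill, c and d are the unexposed who are ill and not ill.

$$RR = \frac{a/(a+b)}{c/(c+d)}, \qquad SE[\ln RR] = \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}}$$

$$95\%\ CI = e^{\left[\ln(RR) \pm 1.96 \times SE[\ln RR]\right]}$$

For the vanilla table (a = 43, b = 11, c = 3, d = 18) this gives RR = 5.57 with limits 1.94 and 16.03, matching the program output shown earlier.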
Confidence Interval for an Odds Ratio
                      (“Taylor series”)
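The formula is missing from this slide as well. The standard Taylor-series (Woolf) interval for an odds ratio is usually written as below, assuming the same cell labels: a and b are the exposed who are ill and not ill, c and d are the unexposed who are ill and not ill.

$$OR = \frac{ad}{bc}, \qquad SE[\ln OR] = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

$$95\%\ CI = e^{\left[\ln(OR) \pm 1.96 \times SE[\ln OR]\right]}$$

For the vanilla table (a = 43, b = 11, c = 3, d = 18) this gives OR = 23.45 with limits of roughly 5.8 and 94. These differ from the Cornfield limits in the program output because Cornfield limits are computed by a different method.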
                       Comments on the Width
                        of a Confidence Interval
▪ Width reflects the precision of the point estimate
              OR & 95% CI = 3.0 (1.0 – 9.0)
              OR & 95% CI = 3.0 (2.0 – 4.5)
▪ The more confidence you want (95 vs. 99%), the
  wider the interval
[Figure: interval around the OR, with the 95% limits nested inside the 99% limits]
   Confidence Interval is Affected by…
▪ Level of confidence
  – 99% wider than 95% CI
▪ Sample size
  – Larger study → narrower CI because SE
    decreases as the sample size increases
▪ For some measures, variation in the data
▪ For RR and OR, strength of association
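The first two influences can be illustrated with a proportion. The sketch below uses an assumed proportion of 0.70 and hypothetical sample sizes, and shows the half-width of the interval shrinking as the sample grows and widening as the confidence level rises.

```python
import math
from statistics import NormalDist


def half_width(p: float, n: int, level: float = 0.95) -> float:
    """Half-width of a normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes; quadrupling n halves the width.
for n in (100, 400, 1600):
    print(f"n = {n:>4}: 95% CI = 0.70 ± {half_width(0.70, n):.3f}")

# Same data, higher confidence level: the interval gets wider.
print(f"n =  100: 99% CI = 0.70 ± {half_width(0.70, 100, level=0.99):.3f}")
```

Because the standard error depends on the square root of n, a study needs four times as many participants to halve the width of its interval.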
                 CI vs. P-value – Similarities
▪   Both can assess statistical significance
▪   Statistically equivalent (more or less)
▪   Neither accounts for bias
▪   Multiple formulas, including exact methods
                 CI vs. P-value – Differences
▪ CI provides same information as a statistical
  test, plus more
▪ CI reminds reader of variability
▪ CI provides range of compatible values
  (interval estimation)
▪ CI more clearly shows influence of sample
  size
         Interpreting a Point Estimate and
                        Confidence Interval
 Risk ratio = 2.1 (95% CI = 1.1– 4.0)
▪ Where is the point estimate?
  (RR > 1, RR ≈ 1, RR < 1)
▪ Is the range of values wide or narrow?
▪ What values are consistent with this study
  (included)? What values are inconsistent
  (excluded)?
▪ Is this study consistent with the null hypothesis
  (no difference)?
                    Interpret the Findings (Studies 1–6)
Study   P value      RR      95% CI       Interpretation
 1       0.007       2.0     1.2 – 3.3
 2        0.03       7.0     1.2 – 40.8
 3        0.08       7.0     0.8 – 61.3
 4        0.65       0.98   0.9 – 1.07
 5      0.0001       0.98   0.97 – 0.99
 6      8 × 10⁻¹¹    0.6     0.5 – 0.7
                  Interpret the Findings (Studies 7–11)
Study   P value     RR      95% CI       Interpretation
 7       0.060       4.0   0.94 – 17.0
 8       0.052       4.0   0.99 – 16.2
 9       0.048       4.0   1.01 – 15.8
 10     0.00009      4.0    2.0 – 8.0
 11       0.9        4.0   0.2 – 80.0
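The questions from the interpretation slide can be applied to each row mechanically. The sketch below is a study aid rather than the lecturer's answer key: for each study it reports whether the 95% CI excludes RR = 1, how wide the interval is (upper limit divided by lower limit), and an approximate p-value recovered from the interval under a log-normal assumption. Because the published limits are rounded, the recovered p-values only roughly match the tabulated ones.

```python
import math
from statistics import NormalDist

# (RR, lower 95% limit, upper 95% limit) for studies 1-11, from the two tables.
studies = {
    1: (2.0, 1.2, 3.3),
    2: (7.0, 1.2, 40.8),
    3: (7.0, 0.8, 61.3),
    4: (0.98, 0.9, 1.07),
    5: (0.98, 0.97, 0.99),
    6: (0.6, 0.5, 0.7),
    7: (4.0, 0.94, 17.0),
    8: (4.0, 0.99, 16.2),
    9: (4.0, 1.01, 15.8),
    10: (4.0, 2.0, 8.0),
    11: (4.0, 0.2, 80.0),
}


def summarize(rr: float, low: float, high: float) -> dict:
    """Answer the interpretation questions for one study."""
    se = (math.log(high) - math.log(low)) / (2 * 1.96)
    z = abs(math.log(rr)) / se
    return {
        "excludes_null": not (low <= 1.0 <= high),
        "limit_ratio": high / low,  # larger ratio = less precise estimate
        "approx_p": 2 * (1 - NormalDist().cdf(z)),
    }


results = {study: summarize(*values) for study, values in studies.items()}

for study, r in results.items():
    verdict = "excludes" if r["excludes_null"] else "includes"
    print(f"Study {study:>2}: CI {verdict} RR = 1, "
          f"upper/lower = {r['limit_ratio']:.1f}, p ≈ {r['approx_p']:.2g}")
```

Studies 7 to 9 show how little separates an interval that just includes 1 from one that just excludes it, while studies 4 and 5 show that a precise estimate close to 1 can still be statistically significant.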
                       Final Comments on CI
▪ Confidence interval is range of values
  compatible with the data in a study
▪ Quantifies uncertainty around a point estimate
  (indicates precision)
▪ Most relevant for interval estimation
▪ Can be used as test of statistical significance
  (95% CI is equivalent to alpha = 0.05)
▪ Does not address bias!