Selection and information bias in
Epidemiological Studies
Dr. Nadira Sultana Kakoly
Every epidemiological study should be
viewed as a measurement exercise
K Rothman
Definition of bias
Any systematic error in an epidemiological
study that results in an incorrect estimate
of the association between exposure and
risk of disease
Three types of bias
• Confounding (Population characteristics)
• Selection bias
• Information bias
- Confounding may lead to errors in the conclusion of
a study, but, when confounding variables are
known, the effect can be adjusted for in the analysis
- Bias is a systematic error in a study and generally
cannot be fixed in the analysis
Selection bias
• Errors in the process of identifying the study
population
• Preferential selection of subjects related to
their
- case/control status
- exposure status
Direction of Bias
• A bias that overestimates an association is
called bias away from the null
• A bias that underestimates an association is
called bias toward the null
• What is null? No association: RR or OR = 1
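The two definitions above can be sketched in a few lines, taking the null value of a ratio measure (RR or OR) as 1. This is only an illustration: the function name `bias_direction` is hypothetical, it assumes the true ratio is known (in a real study it is not), and it assumes both ratios lie on the same side of the null. The first two calls use RR values from the worked examples in these slides; the third is made up.

```python
import math


def bias_direction(true_ratio: float, observed_ratio: float) -> str:
    """Label a bias by comparing distances from the null (ratio = 1).

    Distances are taken on the log scale so that RR = 2 and RR = 0.5
    are equally far from the null. Simplifying assumption: both ratios
    are on the same side of 1.
    """
    true_distance = abs(math.log(true_ratio))
    observed_distance = abs(math.log(observed_ratio))
    if math.isclose(true_distance, observed_distance):
        return "no bias"
    if observed_distance > true_distance:
        return "away from the null"
    return "toward the null"


print(bias_direction(9.0, 13.4))  # away from the null
print(bias_direction(5.0, 1.7))   # toward the null
print(bias_direction(0.5, 0.8))   # toward the null (protective effect weakened)
```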
Selection bias
• Sampling bias (Purposive sampling)
• Ascertainment bias
- Surveillance (asymptomatic cases are missed)
- Referral, admission (Only referred and admitted cases
are included)
- Diagnostic (Exposed cases are more likely to be
diagnosed)
• Participation bias
- Self-selection (volunteerism)
- Non-response, refusal
- Healthy worker effect, survival
Selection bias in
case-control studies
Selection bias
                        Cases               Controls A
                        (liver cirrhosis)   (trauma ward)
Heavy alcohol use       80                  40
Light/no alcohol use    20                  60
OR = 6
How representative are hospitalised trauma patients of the population
which gave rise to the cases?
Selection bias
                        Cases               Controls A      Controls B
                        (liver cirrhosis)   (trauma ward)   (non-trauma)
Heavy alcohol use       80                  40              10
Light/no alcohol use    20                  60              90
                                            OR = 6          OR = 36
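The two odds ratios can be reproduced from the counts in the table. The sketch below uses the usual cross-product formula OR = ad/bc; the function name `odds_ratio` is an assumption made for illustration. The cases are the same in both calculations; only the choice of control group changes.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-product odds ratio for a 2x2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


# Cases: 80 heavy drinkers, 20 light/non-drinkers in both comparisons.
or_trauma = odds_ratio(a=80, b=40, c=20, d=60)      # Controls A (trauma ward)
or_non_trauma = odds_ratio(a=80, b=10, c=20, d=90)  # Controls B (non-trauma)

print(or_trauma)      # 6.0
print(or_non_trauma)  # 36.0
```

Heavy alcohol use is more common among the trauma controls (40 of 100) than among the non-trauma controls (10 of 100), which is why the same cases give a six-fold smaller OR with Controls A.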
Diagnostic bias
Diagnostic approach related to knowing exposure status
                                    Cases              Controls
                                    (uterine cancer)
Takes oral contraceptives           a                  b
Does not take oral contraceptives   c                  d
• OC use → breakthrough bleeding → increased chance of
detecting uterine cancer
• Overestimation of “a” → overestimation of OR
Non-response bias
Papanicolaou test    Cases of            Controls
                     cervical cancer
Did not have test    a                   b
Had test             c                   d
Total                1000                1060
• Controls chosen among women at their homes: 13000 homes
contacted → 1060 controls
• Controls mainly housewives with lower chance of having
test than women gainfully employed
• Overestimation of “b” → underestimation of OR
Selection bias in
cohort studies
Non-response bias
                  Lung cancer
                  yes     no      Total
Smoker            9       91      100
Non-smoker        1       99      100
RR = (9/100) / (1/100) = 9
                    Lung cancer
                    yes     no      Total
Sportive smoker*    0       7       7
Unhealthy smoker    9       51      60
Non-smoker          1       99      100
*33 sportive smokers do not participate in study as too embarrassed
about admitting to their smoking
RR = (9/67) / (1/100) = 13.4
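The two relative risks above follow directly from the counts. A minimal sketch, with `risk_ratio` as a hypothetical helper name: the first call uses all 100 smokers, the second only the 67 smokers who took part (7 sportive plus 60 unhealthy).

```python
def risk_ratio(cases_exposed: float, n_exposed: float,
               cases_unexposed: float, n_unexposed: float) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# All 100 smokers take part.
rr_full = risk_ratio(9, 100, 1, 100)

# 33 sportive smokers (none of the lung cancer cases) do not take part,
# so the smoker denominator shrinks to 7 + 60 = 67 while the 9 cases remain.
rr_observed = risk_ratio(0 + 9, 7 + 60, 1, 100)

print(f"{rr_full:.1f}")      # 9.0
print(f"{rr_observed:.1f}")  # 13.4
```

The numerator is unchanged; the bias away from the null comes entirely from the healthier smokers missing from the denominator.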
Loss to follow-up
• Bias due to differences in completeness of
follow-up between comparison groups
• Example
- Study of disease risk in migrants
- Migrants more likely to return to place of origin when
they develop disease
→ lost to follow-up
→ lower disease rate among exposed (= migrants)
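The migrant example can be made concrete with a small calculation. All numbers below are hypothetical and chosen only for illustration (they are not from any study): both groups have the same true risk, but diseased migrants are lost to follow-up more often than healthy ones. The helper name `observed_risk` is likewise an assumption.

```python
def observed_risk(n: float, true_risk: float,
                  loss_if_diseased: float, loss_if_healthy: float) -> float:
    """Risk seen among subjects who remain under follow-up."""
    diseased = n * true_risk
    healthy = n - diseased
    diseased_followed = diseased * (1 - loss_if_diseased)
    healthy_followed = healthy * (1 - loss_if_healthy)
    return diseased_followed / (diseased_followed + healthy_followed)


# Hypothetical cohort: 1000 migrants and 1000 non-migrants, true risk 10% in both.
# Half of the diseased migrants return home; only 10% of healthy migrants leave.
risk_migrants = observed_risk(1000, 0.10, loss_if_diseased=0.50, loss_if_healthy=0.10)
risk_non_migrants = observed_risk(1000, 0.10, loss_if_diseased=0.0, loss_if_healthy=0.0)

print(f"{risk_migrants:.3f}")                      # 0.058
print(f"{risk_non_migrants:.3f}")                  # 0.100
print(f"{risk_migrants / risk_non_migrants:.2f}")  # 0.58
```

With a true RR of 1, the observed RR of about 0.58 makes migration look protective purely because of differential loss to follow-up.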
Minimising selection bias
• Clear definition of study population
• Explicit case and control definitions
• Cases and controls from same population
• Selection of exposed and non-exposed without
knowing disease status (retrospective cohort)
Information bias
• Systematic error in the measurement of
information on exposure or outcome
• Differences in accuracy
- of exposure data between cases and controls
- of outcome data between different exposure groups
Misclassification
Measurement error leads to assigning wrong
exposure or outcome category
Non-differential                    Differential
• Random error                      • Systematic error
• Unrelated to exposure or          • Related to exposure or
  outcome status                      outcome status
• Not a bias                        • Bias
• Weakens measure of                • Measure of association
  association                         distorted in any direction
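The contrast between the two columns can be illustrated with expected counts in a case-control table. The counts and sensitivities below are hypothetical, and the helper names `classify` and `odds_ratio` are assumptions. Exposure is under-reported (sensitivity below 1, specificity kept at 1) either equally in both groups or in one group only.

```python
def classify(exposed: float, unexposed: float,
             sensitivity: float, specificity: float = 1.0) -> tuple:
    """Expected (exposed, unexposed) counts after imperfect exposure classification."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


def odds_ratio(cases: tuple, controls: tuple) -> float:
    """cases = (a, c), controls = (b, d); returns ad / bc."""
    (a, c), (b, d) = cases, controls
    return (a * d) / (b * c)


# Hypothetical true table: (exposed, unexposed)
true_cases, true_controls = (40, 60), (20, 80)

or_true = odds_ratio(true_cases, true_controls)
# Non-differential: both groups report only half of their true exposure.
or_nondiff = odds_ratio(classify(*true_cases, 0.5), classify(*true_controls, 0.5))
# Differential: only controls under-report.
or_controls_forget = odds_ratio(classify(*true_cases, 1.0), classify(*true_controls, 0.5))
# Differential: only cases under-report.
or_cases_forget = odds_ratio(classify(*true_cases, 0.5), classify(*true_controls, 1.0))

print(f"{or_true:.2f}")             # 2.67
print(f"{or_nondiff:.2f}")          # 2.25
print(f"{or_controls_forget:.2f}")  # 6.00
print(f"{or_cases_forget:.2f}")     # 1.00
```

In this sketch the non-differential error pulls the OR toward the null, while the differential errors push it in either direction depending on which group misreports.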
Two main types of information bias
• Reporting bias
- Recall bias
• Observer bias
- Interviewer bias
- Biased follow-up
Recall bias
Cases remember exposure differently than controls
                               Mothers of children     Controls
                               with malformation
Took tobacco, alcohol, drugs   a                       b
Did not take                   c                       d
• Mothers of children with malformations will remember past
exposures better than mothers with healthy children
• Overestimation of “a” → overestimation of OR
Interviewer bias
Investigator asks cases and controls differently about exposure
                           Cases of listeriosis   Controls
Eats soft cheese           a                      b
Does not eat soft cheese   c                      d
• Investigator may probe listeriosis cases about consumption
of soft cheese
• Overestimation of “a” → overestimation of OR
Biased follow-up
Unexposed are less likely to be diagnosed with disease
than exposed
• Example
- Cohort study to investigate risk factors for
mesothelioma
- Difficult histological diagnosis
- Histologist more likely to diagnose specimen as
mesothelioma if asbestos exposure known
Non-differential misclassification
• Misclassification does not depend on values of
other variables
- Exposure classification unrelated to disease status,
or
- Disease classification unrelated to exposure status
• Consequence
- Weakening of measure of association
(“bias towards the null”)
Nondifferential misclassification
• Cohort study: Alcohol → laryngeal cancer
                               Incidence     RR
                               per mill
No misclassification
  1,000,000 drinkers           50            5.0
  500,000 nondrinkers          10
50% drinkers misclassified
  500,000 drinkers             50            1.7
  1,000,000 “nondrinkers”      30
Nondifferential misclassification
• Cohort study: Alcohol → laryngeal cancer
                               Incidence     RR
                               per mill
No misclassification
  1,000,000 drinkers           50            5.0
  500,000 nondrinkers          10
50% drinkers & 33%
nondrinkers misclassified
  666,667 “drinkers”           40            1.2
  833,333 “nondrinkers”        34
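The incidences and RRs in the two misclassification scenarios above can be reproduced by mixing the true groups. A minimal sketch, with `observed_incidences` as a hypothetical helper name: the incidence in each labelled group is the person-weighted mean of the true incidences (50 for drinkers, 10 for nondrinkers, in the units used on the slide).

```python
def observed_incidences(n_drinkers: float, n_nondrinkers: float,
                        inc_drinkers: float, inc_nondrinkers: float,
                        frac_drinkers_wrong: float,
                        frac_nondrinkers_wrong: float = 0.0) -> tuple:
    """Return (incidence in labelled drinkers, in labelled nondrinkers, RR)."""
    drinkers_kept = n_drinkers * (1 - frac_drinkers_wrong)
    drinkers_moved = n_drinkers * frac_drinkers_wrong
    nondrinkers_kept = n_nondrinkers * (1 - frac_nondrinkers_wrong)
    nondrinkers_moved = n_nondrinkers * frac_nondrinkers_wrong

    # Labelled "drinkers" = true drinkers kept + nondrinkers wrongly moved in.
    inc_labelled_drinkers = (
        (drinkers_kept * inc_drinkers + nondrinkers_moved * inc_nondrinkers)
        / (drinkers_kept + nondrinkers_moved)
    )
    # Labelled "nondrinkers" = true nondrinkers kept + drinkers wrongly moved in.
    inc_labelled_nondrinkers = (
        (drinkers_moved * inc_drinkers + nondrinkers_kept * inc_nondrinkers)
        / (drinkers_moved + nondrinkers_kept)
    )
    return (inc_labelled_drinkers, inc_labelled_nondrinkers,
            inc_labelled_drinkers / inc_labelled_nondrinkers)


scenarios = {
    "no misclassification": (0.0, 0.0),
    "50% drinkers": (0.5, 0.0),
    "50% drinkers & 33% nondrinkers": (0.5, 1 / 3),
}
results = {}
for label, (frac_d, frac_n) in scenarios.items():
    results[label] = observed_incidences(1_000_000, 500_000, 50, 10, frac_d, frac_n)
    inc_d, inc_n, rr = results[label]
    print(f"{label}: {inc_d:.0f} vs {inc_n:.0f}, RR = {rr:.1f}")
# no misclassification: 50 vs 10, RR = 5.0
# 50% drinkers: 50 vs 30, RR = 1.7
# 50% drinkers & 33% nondrinkers: 40 vs 34, RR = 1.2
```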
Minimising information bias
• Standardise measurement instruments
• Administer instruments equally to cases and controls
(exposed/unexposed)
• Use multiple sources of information
- Questionnaires
- Direct measurements
- Registries
- Case records
Bias in randomised controlled trials
• Gold-standard: randomised, placebo-
controlled, double-blinded study
• Least biased
- Exposure randomly allocated to subjects -
minimises selection bias
- Masking of exposure status in subjects and study
staff - minimises information bias
Bias in prospective cohort studies
• Loss to follow up
- The major source of bias in cohort studies
- Assume that all do / do not develop outcome?
• Ascertainment and interviewer bias
- Some concern: Knowing exposure may influence how
outcome determined
• Non-response, refusals
- Little concern: Bias arises only if related to both exposure
and outcome
• Recall bias
- No problem: Exposure determined at time of enrolment
Bias in retrospective cohort &
case-control studies
• Ascertainment bias, participation bias,
interviewer bias
- Exposure and disease have already occurred
→ differential selection / interviewing of compared
groups possible
• Recall bias
- Cases (or ill) may remember exposures differently
than controls (or healthy)
Control of bias
• Careful study design: some adjustments can
be made during the analysis, but a well-designed
study minimises the potential for bias at every
level, from general considerations to specific
features of the data collection process
Choice of study population
• For a case-control study it is always better to
take controls from the population from which the
cases come. This decreases the likelihood of
non-response, selection and recall bias.
Choice of study population
• For cohort studies the population should be well
defined with respect to employment,
occupation and residence so that bias due to
loss to follow-up can be minimised. The same
applies to clinical trials.
Data collection
Data obtained should be informative and
interpretable.
There are two major aspects of the data collection
procedure that can minimise bias.
1. The construction of specific instruments to
obtain information e.g. questionnaires,
interviews, physical examination
2. Administration of those instruments by study
personnel
Blinding
• The investigator should be blinded to exposure
or outcome status, depending on the study
design
• The same applies to subjects
Training
• It is important that the investigators are
properly trained
• They should follow specific procedures that
are identical for all subjects (in completing
forms, examining subjects etc.)
Source of exposure and disease
information
• Pre-existing records are the least biased
source of data
• The problem is that they may not contain adequate
information on all factors of interest
• But it is possible to gather information from
multiple sources
Conclusion
• Bias should be considered as a possible alternative
explanation of any statistical association, as should
confounding and effect modification
• Bias should be dealt with during the design phase,
unlike confounding and effect modification,
which can be dealt with during the analysis
• It is difficult to deal with bias in the analysis
• But if bias is present, estimate its direction
• Misclassification can be corrected using a small
validation study
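One common way to use a validation study is to estimate the sensitivity and specificity of the exposure measurement and then back-calculate the true counts. The sketch below assumes that approach; the slides do not specify a method. The function name and all numbers are hypothetical. It inverts observed = Se × true + (1 − Sp) × (N − true).

```python
def corrected_exposed(observed_exposed: float, total: float,
                      sensitivity: float, specificity: float) -> float:
    """Back-calculate the true number exposed from the observed number."""
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_exposed - (1 - specificity) * total) / denominator


# Hypothetical validation study: Se = 0.8, Sp = 0.9 in cases and controls alike.
se, sp = 0.8, 0.9
n_cases, n_controls = 100, 100
obs_cases_exposed, obs_controls_exposed = 38, 24  # hypothetical observed counts

a = corrected_exposed(obs_cases_exposed, n_cases, se, sp)
b = corrected_exposed(obs_controls_exposed, n_controls, se, sp)

or_observed = (obs_cases_exposed * (n_controls - obs_controls_exposed)) / (
    obs_controls_exposed * (n_cases - obs_cases_exposed))
or_corrected = (a * (n_controls - b)) / (b * (n_cases - a))

print(f"{or_observed:.2f}")   # 1.94
print(f"{or_corrected:.2f}")  # 2.67
```

The corrected estimate is only as good as the sensitivity and specificity fed into it, so a small validation sample adds uncertainty of its own.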
Thanks