
EBMH NOTEBOOK

Guidelines for evaluating prevalence studies


As stated in the first issue of Evidence-Based Mental Health, we are planning to widen the scope of the journal to include studies answering additional types of clinical questions. One of our first priorities has been to develop criteria for studies providing information about the prevalence of psychiatric disorders, both in the population and in specific clinical settings. We invited the following editorial from Dr Michael Boyle to highlight the key methodological issues involved in the critical appraisal of prevalence studies. The next stage is to develop valid and reliable criteria for selecting prevalence studies for inclusion in the journal. We welcome our readers' contributions to this process.

You are a geriatric psychiatrist providing consultation and care to elderly residents living in several nursing homes. The previous 3 patients referred to you have met criteria for depression, and you are beginning to wonder if the prevalence of this disorder is high enough to warrant screening. Alternatively, you are a child youth worker on a clinical service for disruptive behaviour disorders. It seems that all of the children being treated by the team come from economically disadvantaged families. Rather than treating these children on a case by case basis, the team has discussed developing an experimental community initiative in a low income area of the city. You are beginning to wonder if the prevalence of disruptive behaviour disorders is high enough in poor areas to justify such a programme.

Prevalence studies of psychiatric disorder take a sample of respondents to estimate the frequency and distribution of these conditions in larger groups. All of these studies involve sampling, cross sectional assessments of disorder, the collection of ancillary information, and data analysis. Interest in prevalence may extend from a particular clinical setting (a narrow focus) to an entire nation (a broad focus). In the examples given above, the geriatric psychiatrist needs evidence from an institution based study (narrow focus), whereas the child youth worker needs evidence from a general population study (broad focus).

In recent years, concern for the mental health needs of individuals in clinical settings has been broadening to include whole populations. This population health perspective has stimulated numerous prevalence studies of psychiatric disorder which are intended to inform programme planning, evaluation, and resource allocation. In general, the quality of these prevalence studies has been improving as a direct result of drawing on advances in survey methodology. In this note, the guidelines for evaluating prevalence studies arise from criteria applicable to community surveys.

Guidelines for evaluating prevalence studies
The validity of prevalence studies is a function of sampling, measurement, and analysis. Answers to the following questions (see box below) can serve as criteria for assessing these features.

Box. Questions for evaluating prevalence studies

Sampling
(1) Does the survey design yield a sample of respondents representative of a defined target population?
Is the target population defined clearly?
Was probability sampling used to identify potential respondents?
Do the characteristics of respondents match the target population?

Measurement
(2) Do the survey instruments yield reliable and valid measures of psychiatric disorder and other key concepts?
Are the data collection methods standardised?
Are the survey instruments reliable?
Are the survey instruments valid?

Analysis
(3) Were special features of the sampling design accounted for in the analysis?
(4) Do the reports include confidence intervals for statistical estimates?

Sampling
(1) Does the survey design yield a sample of respondents representative of a defined target population?
A valid study enlists a sample that accurately represents a defined target population. Representativeness is a quality associated with the use of statistical sampling methods and careful evaluation of respondent characteristics.

Is the target population defined clearly?
A sample provides the means to obtain information about a larger group, called the target population. The target population must be defined by shared characteristics assessed and measured accurately. Some of these characteristics include age, sex, language, ethnicity, income, and residency. Invariably, subsets of the target population are too expensive or difficult to enlist because, for example, they live in places that are inaccessible to surveys (eg, remote areas, native reserves, military bases, shelters) or they speak languages not accommodated by data collection. These excluded individuals need to be described and their number estimated as a proportion of the target population. The requirements to define the target population and to identify systematic exclusions are necessary to give research consumers a basis for judging the applicability of a study to their question.

Was probability sampling used to identify potential respondents?
Probability sampling relies on the principle of randomisation to ensure that each eligible respondent has a known chance of selection; it requires that members of the target population be identified through a sampling frame or listing of potential respondents. This listing must provide access to all members of the defined target population except for exclusions acknowledged by the study authors. Probability sampling comes in a variety of forms, from simple to complex. In simple random sampling, a predetermined number of units (individuals, families, households) is selected from the sampling frame so that each unit has an equal chance of being chosen. More complex methods may include stratified sampling, in which a population is divided into relatively homogeneous subgroups, called strata, and samples are selected independently and with known probability from each stratum; cluster sampling, in which a population is divided into affiliated units or clusters, such as neighbourhoods or households, and a sample of clusters is selected with known probability; multistage sampling, in which samples are selected with known probability in hierarchical order (for example, a sample of neighbourhoods, then a sample of households, then a sample of individuals); or multiphase sampling, in which sampled individuals are screened and subsets are selected with known probability for more intensive assessment. The use of probability sampling is a basic requirement in prevalence studies.
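To make these sampling designs concrete, here is a minimal Python sketch, with an invented sampling frame, strata, and sampling fraction, that draws a simple random sample and a stratified random sample from a hypothetical listing of residents. Cluster and multistage designs extend the same logic by first sampling groups of units with known probability.

```python
import random

random.seed(1)  # reproducible example

# Hypothetical sampling frame: 5000 listed residents, each tagged with a stratum (eg, age group).
frame = [{"id": i, "stratum": random.choice(["18-39", "40-64", "65+"])} for i in range(5000)]

# Simple random sampling: every unit has an equal, known chance of selection.
simple_sample = random.sample(frame, k=500)

# Stratified sampling: sample independently within each stratum with a known probability.
def stratified_sample(frame, fraction):
    by_stratum = {}
    for unit in frame:
        by_stratum.setdefault(unit["stratum"], []).append(unit)
    sample = []
    for stratum, units in by_stratum.items():
        k = round(len(units) * fraction)   # known selection probability = fraction
        sample.extend(random.sample(units, k))
    return sample

stratified = stratified_sample(frame, fraction=0.10)
print(len(simple_sample), len(stratified))
```

In a complex design each unit's known selection probability should be recorded, because it is needed later as a survey weight when the data are analysed.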
Do the characteristics of respondents match the target population?
Non-response is the failure to enlist sampled individuals. If non-response is extensive and influenced by variables central to study objectives, it can lead to selection bias and estimates that deviate systematically from population values. When information is available on non-respondents, methods exist and should be used to evaluate selection bias.1 In the absence of such information, sample representativeness must be evaluated by comparing the sociodemographic characteristics of respondents with those of the target population derived from a census or other relevant databases.

In clinical studies of treatment, prevention, prognosis, and quality improvement, >80% response has become the recommended minimum for follow up.2 Although apparently fixed, this minimum standard is, in fact, variable because it fails to account for study to study variation in non-response at inception. The threshold for minimally acceptable response in prevalence studies should be set at 70% as long as the report shows that respondents and non-respondents, and/or the study sample and the target population, have similar important sociodemographic characteristics. Without evidence of comparability between respondents and non-respondents and/or the study sample and the target population, the minimum standard should be set at 80%.
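As a rough illustration of the comparison described above, the following sketch (all numbers are hypothetical) computes a response rate and contrasts the sociodemographic composition of respondents with census figures for the target population.

```python
# Hypothetical survey: 1000 sampled individuals, 760 respondents.
sampled, responded = 1000, 760
response_rate = responded / sampled
print(f"Response rate: {response_rate:.0%}")  # 76%, above the 70% threshold discussed above

# Proportions of key sociodemographic characteristics among respondents v the census.
respondents = {"female": 0.58, "age 65+": 0.21, "low income": 0.18}
census      = {"female": 0.52, "age 65+": 0.19, "low income": 0.24}

for characteristic in respondents:
    diff = respondents[characteristic] - census[characteristic]
    print(f"{characteristic}: respondents {respondents[characteristic]:.0%}, "
          f"census {census[characteristic]:.0%}, difference {diff:+.0%}")
```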
Measurement
(2) Do the survey instruments yield reliable and valid measures of psychiatric disorder and other key concepts?
A valid study uses instruments that provide reliable and valid measurement. These are qualities that arise from the use of standardised data collection methods and that are confirmed empirically by measurement evaluation studies.

Are the data collection methods standardised?
Prevalence studies collect information for purposes of estimation (eg, frequency and distribution of psychiatric disorder) and hypothesis testing (eg, association between disorder and other variables of interest). To achieve these purposes, identical methods of assessment and data collection must be used with all respondents so that the information for analysis is completely comparable. Any deviation from a standard data collection protocol applicable to all respondents creates the potential for biased comparisons. Standardisation of method refers not only to eliciting information from respondents but also to interviewer training, supervision, enlistment of respondents, and processing of data.

Are the survey instruments reliable?
Reliability establishes the extent to which an instrument can discriminate between individuals. To evaluate reliability, data are collected to separate individual differences that are real or actual (true variation) from ones that are unreal or artifacts of the measurement process (random variation). An informative empirical test of instrument reliability in prevalence studies is to give the survey instrument on two occasions, about 7-10 days apart (test-retest design), and to examine levels of agreement using kappa (κ), for cross classified data, and the intraclass correlation coefficient, for dimensional data. Instrument reliabilities must be based on a sample derived from, or at least similar to, study respondents; they also need to include effects for all major sources of unwanted random variation. Respondent effects due to temporal fluctuations in memory, mood, and motivation are invariably present. There may also be interviewer effects arising from differences in presentation, competence, and impact, and setting effects stemming from variability in the location and circumstances of data collection. If all 3 sources of unwanted variation were applicable in a study, then the test-retest design described above should take them into account.

Although there is no consensus on minimum standards for reliability, a good reason exists for setting them. Random variation in measurement leads to attenuation of effects (bias towards the null). Tolerating large differences in reliability between measures creates an unequal basis for comparing effects, and in the same study, this practice can lead to extremely biased inferences. To prevent the mindless analysis and reporting of associations for poorly measured variables, minimum reliability standards should be set at 0.60 (based on κ) for cross classified data and 0.70 (based on the intraclass correlation coefficient) for dimensional data.
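For readers who want to see the agreement statistic in action, here is a minimal sketch of a test-retest kappa calculation for a dichotomous classification, using invented counts; the intraclass correlation coefficient would play the analogous role for dimensional scores.

```python
# Hypothetical test-retest data: 100 respondents classified as case/non-case on two occasions.
# Cross classification: rows = occasion 1, columns = occasion 2.
a, b = 18, 7    # case/case, case/non-case
c, d = 5, 70    # non-case/case, non-case/non-case
n = a + b + c + d

p_observed = (a + d) / n                                       # observed agreement
p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2    # agreement expected by chance
kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"kappa = {kappa:.2f}")  # about 0.67 here; compare with the 0.60 minimum suggested above
```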
Are the survey instruments valid?
Validity establishes the extent to which an instrument makes discriminations between individuals that are meaningful and useful. Evaluating instrument validity is analogous to testing hypotheses on substantive associations between measured variables, with one important difference: validity testing is done to confirm, not to add to, existing theory and knowledge. In the measurement of psychiatric disorder, this theory and knowledge come from clinical and epidemiological studies that have focused on aetiology, course, and response to treatment. Although the need to present evidence on instrument validity extends to all key variables, it is the assessment of psychiatric disorder which provides the focus here.

Efforts to validate structured interviews for classifying psychiatric disorder have been remarkably circumscribed. This is true in children for a variety of interviews3 and in adults for the current recommended standard, the Composite International Diagnostic Interview.4 The best of these studies usually compare assessment data generated by lay interviewers versus clinicians.

There has been no commentary on minimum validity standards for psychiatric instruments used in prevalence studies. The following are recommended here: (1) instrument content for measuring disorder (items and questions) should map into the operational criteria and symptoms contained in existing nosological systems (International Classification of Diseases and Diagnostic and Statistical Manual); (2) classifications of disorder should be based on compound criteria, including symptoms and evidence of impairment, distress, or disadvantage; (3) the identification of cases should derive from an explicit rationale that includes an external criterion and decision rules for discriminating between test positives and test negatives5; and (4) evidence should exist from head to head comparisons with independent assessment data that the instrument meets specificity criteria (ability to distinguish among different categories of disorder).


Analysis
(3) Were special features of the sampling design accounted for in the analysis?
Complex sampling methods mean that eligible respondents will have different probabilities of selection. These methods introduce design effects, a term used by survey researchers to indicate that the sampling method will have an impact on the calculation of variance estimates for testing hypotheses and determining confidence intervals. Complex sampling methods require the use of special statistical methods to obtain estimates that are unbiased and associated with the correct statistical precision.
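One common way to see what a design effect does is the approximation for equal-sized clusters, deff = 1 + (m - 1)ρ, where m is the cluster size and ρ the intracluster correlation. The sketch below uses invented numbers to show how clustering inflates the standard error of a prevalence estimate; a real analysis would use design-based survey estimators rather than this shortcut.

```python
import math

# Hypothetical cluster sample: prevalence estimated from 30 neighbourhoods of 20 respondents each.
p = 0.12        # estimated prevalence of disorder
n = 30 * 20     # total respondents
m = 20          # respondents per cluster
rho = 0.05      # assumed intracluster correlation

se_srs = math.sqrt(p * (1 - p) / n)     # naive standard error (simple random sampling)
deff = 1 + (m - 1) * rho                # approximate design effect for equal-sized clusters
se_design = se_srs * math.sqrt(deff)    # standard error accounting for the sampling design

print(f"Design effect: {deff:.2f}")
print(f"SE ignoring clustering: {se_srs:.3f}; SE accounting for clustering: {se_design:.3f}")
```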
(4) Do the reports include confidence intervals for statistical estimates?
A primary objective of prevalence studies is to produce frequency estimates of disorder overall and for population subgroups. The usefulness of these estimates derives from the expected closeness between the unobserved value in the target population and the observed value in the sample. Confidence intervals quantify this closeness by telling us the chance, for example 95%, that the unobserved target population value will fall within a certain range of the observed sample value. Estimates in prevalence studies must be accompanied by confidence intervals or the information needed to calculate them.
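As a minimal sketch of the interval being asked for, the following code computes a prevalence estimate and an approximate 95% confidence interval using the normal approximation and invented counts; in a complex survey the standard error would first be adjusted for the design effect, as above.

```python
import math

# Hypothetical result: 72 of 600 respondents meet criteria for disorder.
cases, n = 72, 600
p = cases / n

se = math.sqrt(p * (1 - p) / n)             # standard error of the estimated prevalence
lower, upper = p - 1.96 * se, p + 1.96 * se  # approximate 95% confidence interval
print(f"Prevalence {p:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```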
Comment
The criteria presented in this commentary identify guidelines to evaluate the basic elements of prevalence studies: sampling, measurement, and analysis. The objective is to help the research consumer make informed judgments about the validity of a particular report. Basic guidelines are set to stimulate debate and further study. Although the criteria arise mostly from experience with prevalence studies done in general population settings, they extend to studies done in clinical settings, with one important caveat. In clinical settings, the question "does the survey design yield a sample of respondents representative of a defined target population?" is largely unanswerable. It is difficult, if not impossible, to define the target populations that give rise to respondents sampled from clinical settings. The idiosyncrasies of referral to mental health services render suspect the general applicability of prevalence estimates from one setting to the next. This issue needs further clarification as it raises an important question about the usefulness of publishing prevalence estimates from studies done in clinical settings.

MICHAEL H BOYLE, PhD
Department of Psychiatry, McMaster University, Hamilton, Ontario, Canada

1 Boyle MH. Sampling in epidemiological studies. In: Verhulst FH, Koot HM, editors. The epidemiology of child and adolescent psychopathology. Oxford: Oxford University Press, 1995.
2 Purpose and procedure. Evidence-Based Mental Health 1998 Feb;1:2-3.
3 Hodges K. J Child Psychol Psychiatry 1993;34:49-68.
4 Robins LN, Sartorius N. International Journal of Methods in Psychiatric Research 1993;3:63-141.
5 Zarin DA, Earls F. Am J Psychiatry 1993;150:197-206.

Some useful concepts and terms used in articles about treatment
One of the important principles of practising evidence-based mental health is that the results of research studies should be used to influence clinical decisions about a particular patient. The best quality evidence for making decisions about treatment comes from randomised controlled trials or from overviews of several randomised controlled trials such as a meta-analysis. The reason that a randomised controlled trial provides the best evidence is that, in most circumstances, randomisation avoids any systematic tendency to produce an unequal distribution of prognostic factors between the experimental and control treatments that could influence the outcome. It is important to remember that not all methods of allocation which are described as random are truly random, and even with true randomisation there may still be important differences at baseline between the groups due to small sample sizes. It is also important that those who are assessing outcome are blind to whether the patient received the experimental or control treatments. If there is a statistically significant difference in the rate of a favourable outcome or in change scores from baseline in the experimental group compared with the control group, then it is concluded that the treatment is effective.

Evidence-Based Mental Health will only abstract treatment studies if the method of allocation is random, if there was adequate follow up of subjects entered into the trial, and if clinically important outcomes were reported. Unfortunately, there may not be a randomised controlled trial for each clinical question. If that is the case, then clinical decisions must be made on the basis of the best available evidence, taking all relevant factors into account. Frequent replication of the intervention using different samples and outcome tools can add to the weight of the evidence in non-experimental designs.

Statistical significance versus clinical importance
Given evidence from a randomised controlled trial, statistical significance is not the only criterion for deciding whether to apply the results of a study. Statistical significance depends on the size of the difference between the groups, the amount of variation in outcome within the groups, and on the number of patients. Clinically trivial differences can be statistically significant if the sample size is sufficiently large. Conversely, clinically important differences can be statistically non-significant if the sample size is too small, that is, if the study lacks power. Clinicians need to evaluate statistical significance and clinical importance in interpreting the results of randomised controlled trials and meta-analyses.

Measures of clinical importance
How does one measure clinical importance? The usual estimate of clinical importance is the effect size: the size of the difference between the experimental and control groups. Whether the outcome is measured in a categorical way (eg, the prevention or treatment of disorders or the appearance of specific side effects) or in a continuous way (eg, mean symptom scores), the effect size reflects the difference between the experimental and control groups. Effect sizes tend to be smaller in randomised controlled trials than in non-experimental designs and smaller when there is adequate blinding or concealment of the intervention from the assessors of outcome.1


A common way of expressing effect size for categorical data is the relative risk (RR) or relative benefit (RB, depending on whether one is assessing a negative or positive outcome). The study by Kendall et al in this issue of Evidence-Based Mental Health (p 43) provides a good illustration of these points. Kendall et al report the results of a randomised controlled trial comparing cognitive behaviour therapy (CBT) with a waiting list control for children with anxiety disorders. 60 children were randomised to CBT and 34 to a waiting list control. After 8 weeks of treatment, 53% (32 of 60) of the children receiving CBT no longer met diagnostic criteria for their primary anxiety disorder compared with 6% (2 of 34) in the control group (p < 0.001). This difference is certainly statistically significant, but the p value tells us nothing about its clinical importance. One measure of clinical importance is the RB; that is, the probability of being free of anxiety disorder after 8 weeks of CBT compared with the probability of being free of anxiety disorder in the control group. Using data from the article, we can calculate that the RB is (32/60)/(2/34) = 9.1. In other words, anxious children receiving CBT are 9.1 times more likely to be free of anxiety disorder than children on the waiting list after 8 weeks. An alternative but similar statistic is to calculate the relative benefit increase (RBI), which is the proportional increase in rates of a good outcome between the experimental and control patients in the trial. It is calculated as the experimental group event rate (EER) minus the control group event rate (CER), divided by the CER, or (EER - CER)/CER. In this case, the RBI is (32/60 - 2/34)/(2/34), or 8.07. In other words, there is roughly an 8 fold increase (800%) in rates of being free of anxiety disorder in the experimental compared with the control group. Attentive readers will notice that RBI = RB - 1 (the difference is due to rounding), a relation that always holds.
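The arithmetic in this example translates directly into a few lines of code; the sketch below simply reproduces the RB and RBI calculations reported above.

```python
# Rates of being free of the primary anxiety disorder after 8 weeks (Kendall et al).
eer = 32 / 60   # experimental (CBT) event rate
cer = 2 / 34    # control (waiting list) event rate

rb = eer / cer            # relative benefit
rbi = (eer - cer) / cer   # relative benefit increase
print(f"RB = {rb:.1f}, RBI = {rbi:.2f}")   # RB = 9.1, RBI = 8.07 (RBI = RB - 1)
```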
This is a legitimate and popular way of reporting effect sizes but it has one serious limitation; it ignores the base rates in a study, which could have a profound influence on the clinical application. Consider a situation in which the rate of improvement in CBT was 9% compared with 1% in the control group. With a large enough sample size, this difference could be statistically significant. The RB still equals 9 and the RBI is still roughly 8 or 800%. However, most would agree that the magnitude of the difference between the experimental and control groups is quite trivial, particularly if the treatment was expensive, difficult to deliver, or required considerable training (as CBT does, for example). In view of these limitations, it has been argued that RB and the similar RBI are not user friendly and do not provide the most clinically important information.

An alternative to these statistics that does take account of base rates is to consider the absolute benefit increase (ABI) and, from this, the number needed to treat (NNT). The ABI is the absolute arithmetic difference in rates of good outcomes between the experimental and control patients and refers to the number of patients who benefit per 100 treated. It is simply calculated as the rate of a good outcome in the experimental group minus the rate in the control group (in the study by Kendall et al, the ABI is 53% - 6% = 47%). Going one step further, the reciprocal of the ABI is the NNT, that is, the number of patients who need to be treated to achieve 1 additional good outcome. It is calculated as 1/ABI, and in the study by Kendall et al the NNT is 1/0.47 = 2.13, which is rounded up to 3. In other words, 3 children need to be treated with CBT to achieve 1 additional good outcome over having a patient on a waiting list.
In Evidence-Based Mental Health (as in Evidence-Based Medicine), we have preferred to use terms such as ABI and NNT to capture the essence of clinically important differences.

What is a clinically important NNT?
The answer to this depends on the burden of suffering of the disorder as measured by prevalence, morbidity, and outcome; the economics and difficulty of the treatment procedure; and, finally, the cost of not treating the disorder. It is useful to compare the NNT in the study by Kendall et al with values obtained in other areas in medicine and in mental health. For example, in a meta-analysis by Hotopf et al,2 42 patients need to be treated with a serotonin specific reuptake inhibitor to prevent 1 additional discontinuation of treatment with a tricyclic antidepressant, presumably due to side effects. Based on the data from Essali et al,3 37 patients need to be treated with clozapine to prevent 1 additional relapse on a typical neuroleptic; however, only 6 patients had to be treated with clozapine to have 1 additional patient experience a clinically important improvement. Thus the NNT in the study by Kendall et al is really quite impressive and, if replicated, means that an effective form of psychotherapy is now available for children with anxiety disorders.

So far we have just considered ways of expressing effect size using categorical data. Most treatment studies in mental health report changes in symptoms over time and between patient groups. With continuous data, the issue is more complicated but it is still possible to convert continuous measures into NNT. (More about this in a forthcoming issue of the glossary.)

Uncertainty and confidence intervals
One final point needs to be made. The statistics outlined above to estimate effect sizes are just that; they are estimates derived from a particular sample. The true value may or may not be exactly the same as the estimated value. There is a degree of uncertainty associated with these estimates and we can quantify that degree of uncertainty using confidence intervals. Altman provides a useful definition of confidence intervals as "the range of values within which we can be 95% sure that the population value lies".4 In the example used above from the study by Kendall et al, we can be 95% certain that the true NNT is between 2 and 4 to produce one more child free of anxiety disorder using CBT. In the study by Hotopf et al,2 the 95% CI is between 24 and 148. Because the degree of uncertainty is such an important variable in comparing results from different studies, we will also provide CIs around estimates of ABI and NNT even if these are not provided in the article itself.

We hope that these tools derived from clinical epidemiology will be helpful to clinicians in translating the results of treatment interventions into clinical practice. Future issues of the notebook will explain terms used in prognosis studies and studies of causation and cost effectiveness, among others. We welcome the feedback of our readers on these and other topics.

PETER SZATMARI, MD
Editor, Evidence-Based Mental Health

1 Schulz KF, Chalmers I, Hayes RJ, et al. JAMA 1995;273:408-12.
2 Evidence-Based Mental Health 1998 Feb;1:21. Abstract of: Hotopf M, Hardy R, Lewis G. Discontinuation rates of SSRIs and tricyclic antidepressants: a meta-analysis and investigation of heterogeneity. Br J Psychiatry 1997 Feb;170:120-7.
3 Evidence-Based Mental Health 1998 Feb;1:17. Abstract of: Essali MA, Rezk E, Wahlbeck K, et al. Clozapine v typical neuroleptic medication for schizophrenia. In: Cochrane Database of Systematic Reviews [updated 04 March 1997]. In: the Cochrane Library [database on disk and CD-ROM]. The Cochrane Collaboration; issue 2. Oxford: Update Software, 1997.
4 Altman DG. Evidence-Based Medicine 1996 May-Jun;1:102-4.
