Empirically Supported Treatment: Recommendations for a New Model
David F. Tolin, The Institute of Living and Yale University School of Medicine
Dean McKay, Fordham University
Evan M. Forman, Drexel University
E. David Klonsky, University of British Columbia
Brett D. Thombs, Jewish General Hospital and McGill University

Over the 20 years since the criteria for empirically supported treatments (ESTs) were published, standards for synthesizing evidence have evolved and more systematic approaches to reviewing the findings from intervention trials have emerged. Currently, the APA is planning the development of treatment guidelines, a process that will likely take many years. As an intermediate step, we recommend a revised set of criteria for ESTs that draws on existing systematic reviews of all of the available literature, together with recommendations that address the methodological quality, outcomes, populations, and treatment settings included in the literature.

Key words: clinical significance, empirically supported treatment, GRADE tool, systematic reviews. [Clin Psychol Sci Prac, 2015]

Address correspondence to David F. Tolin, Ph.D., Anxiety Disorders Center, The Institute of Living, 200 Retreat Avenue, Hartford, CT 06106. E-mail: david.tolin@hhchealth.org.

doi:10.1111/cpsp.12122

© 2015 American Psychological Association. Published by Wiley Periodicals, Inc., on behalf of the American Psychological Association. All rights reserved. For permissions, please email: permissions@wiley.com.

CONSIDERATIONS IN THE EVALUATION OF EMPIRICALLY SUPPORTED TREATMENTS: ARE EMPIRICALLY SUPPORTED TREATMENTS STILL RELEVANT?

Over two decades ago, the Society of Clinical Psychology (Division 12 of the American Psychological Association [APA]), under the direction of then President David Barlow, first published criteria for what were initially termed “empirically validated psychological treatments” (Task Force on Promotion and Dissemination of Psychological Procedures, 1993) and later termed “empirically supported psychological treatments” (Chambless & Hollon, 1998; Chambless & Ollendick, 2001). The identification of empirically supported treatments (ESTs) has had substantial impact in psychology and related mental health disciplines. One immediately tangible effect of the movement to identify ESTs has been the validation of procedures for specific psychological problems, and the dissemination of that information to practitioners, consumers, and other stakeholders on the web (www.psychologicaltreatments.org).

Since (and perhaps in part due to) that early work, the quantity of treatment outcome studies has increased dramatically. A search of PsycINFO for the terms “randomized controlled trial” or “randomised controlled trial” (November 23, 2014) yielded only 20 citations for the year 1995, compared to 123 in 2000, 427 in 2005, and 950 in 2010. Amid this increase in randomized controlled trials (RCTs), a wide range of therapeutic approaches are being evaluated for efficacy. Since the development of the original list of ESTs, most of which were cognitive-behavioral treatments, efficacy trials for psychodynamic therapy (Milrod et al., 2007), transference-focused psychotherapy (Yeomans, Levy, & Caligor, 2013), family-based therapy (Lock et al., 2010), and interpersonal psychotherapy (e.g., Parker, Parker, Brotchie, & Stuart, 2006) have
appeared, to name a diverse few. The result has been a greater emphasis on empiricism among approaches that previously lacked a history of accumulating research support. This increase in diverse outcome research has shifted the debate among practitioners of different theoretical persuasions from mere assertions of theory to a consideration of empirical evidence.

The quality of available research evidence has also increased substantially over the past 20 years. Detailed and stringent guidelines have now been published regarding the execution and reporting of methodologically sound treatment outcome studies (Moher, Schulz, & Altman, 2001), and leading psychology journals such as the Journal of Consulting and Clinical Psychology require that manuscripts adhere to such guidelines (retrieved November 23, 2014, from http://www.apa.org/pubs/journals/ccp/index.aspx). These changes have led to a greater emphasis on study quality. Given the emphasis on establishing procedures as empirically supported, guidebooks have been published that carefully document how to design sound therapy research investigations (e.g., Arean & Kraemer, 2013). Recently, a review of trials of psychodynamic and cognitive-behavioral therapies, using a rating scale of various aspects of methodological quality and study reporting (Kocsis et al., 2010), concluded that study quality and reporting have been significantly improving over the past four decades (Thoma et al., 2012).

The EST movement has led to changes in how students are trained in clinical practice. Although training programs still have a wide degree of latitude, EST lists help guide curricula and inform syllabi. Most prominently, the APA Commission on Accreditation’s Guidelines and Procedures (2013) encourages programs to train students in assessment and treatment procedures based on empirically supported methods, encourages placement in training settings that employ empirically supported approaches, and encourages internship training sites to include methods of demonstrating that interns possess intermediate to expert-level knowledge in ESTs.

Finally, the development of lists of ESTs has resulted in greater protections for the public. Because lists of established and empirically supported interventions now exist, treatment-seeking individuals are better able to learn about and seek out information on well-validated treatments for specific disorders and problem areas. This increased consumer education encourages clinicians who might otherwise not have practiced in an empirically supported manner to acquire the necessary skills to begin offering scientifically based treatments. Perhaps the most ambitious illustration of the impact of the movement toward scientifically tested treatments on clinical practice is the set of National Institute for Clinical Excellence standards in the United Kingdom (NICE; Baker & Kleijnen, 2000), established to ensure that clinicians practice specific and accepted empirically based interventions for different psychological conditions (see http://guidance.nice.org.uk/Topic/MentalHealthBehavioural). Similarly, the Veterans Health Administration, which serves nearly 6 million veterans in the United States, has undertaken a complete overhaul of its mental health practices and is implementing a systemwide dissemination of empirically based treatments for posttraumatic stress disorder, depression, and serious mental illness (Ruzek, Karlin, & Zeiss, 2012).

Notably, the early work on ESTs was an important catalyst for the APA’s relatively recent emphasis on evidence-based practice (EBP). EBP is a broad template of activities, including assessment, case formulation, relationship factors, and treatment decisions, that assists the clinician in working with a patient to achieve the best possible outcome. In 2006, a Presidential Task Force of the American Psychological Association (APA Presidential Task Force on Evidence-Based Practice, 2006) adapted the Institute of Medicine’s (2001) definition of evidence-based medicine, defining EBP as practice that integrates three sources of information: patient characteristics, clinical expertise, and the best available research evidence.

It might well be asked, given the broad movement in psychology and other health disciplines toward EBP, whether identification of ESTs is still a necessary task. We argue that it is, perhaps now more than ever. The “three-legged stool” of research evidence, patient characteristics, and clinician expertise leaves room for debate about the relative importance of each; however, we suggest that EBP is best approached as starting from the perspective of ESTs (that is, asking which treatment or treatments have proven efficacious for a given problem). This scientific information is then interpreted and potentially adapted based on clinician expertise and
patient characteristics. Thus, where treatment selection is concerned, EBP might be thought of as an approach to ESTs, filtering that scientific information through the clinician’s and patient’s “lenses” (Djulbegovic & Guyatt, 2014; Tolin, 2014).

As a brief example, a clinician may want to select a treatment approach for an impoverished African American man with a presenting complaint of depression, as well as a significant drinking problem. Most likely, no published list of ESTs will match this situation perfectly. However, using the “filter system” of EBP may lead to a helpful solution. Examination of the available ESTs for depression alerts the clinician to the fact that behavioral activation has strong empirical support in the treatment of depression (Lejuez, Hopko, & Hopko, 2001; Lewinsohn, Biglan, & Zeiss, 1976; Martell, Addis, & Jacobson, 2001). The contributing research, however, did not address the present patient’s characteristics, such as socioeconomic status, race, and the presence of a co-occurring substance use disorder. The clinician would therefore rely on expertise and additional research to understand how an EST such as behavioral activation might be adapted to address these issues successfully. These modifications might include specific cultural adaptations (Benish, Quintana, & Wampold, 2011; Griner & Smith, 2006; van Loon, van Schaik, Dekker, & Beekman, 2013) or the addition (either concurrently or sequentially) of an EST for drinking problems, such as behavioral couples therapy (O’Farrell, Cutter, Choquette, Floyd, & Bayog, 1992) or contingency management (Petry, Martin, Cooney, & Kranzler, 2000). The treatment(s) must also be delivered competently and in a way that engages the patient, which requires a high level of clinical competency and cross-cultural awareness. The process starts, however, with identification of a specific EST. To make informed decisions, patients and clinicians must be aware of the available scientific evidence and of the degree of confidence that can be placed in that evidence.

WHY DOES THE LIST NEED TO BE REVISED?

Many authors, including those broadly in agreement with the EST concept in theory, have raised significant concerns about how ESTs are currently defined. Many of the critiques of the EST movement point to problems in how research evidence is synthesized and evaluated. The original Division 12 report on ESTs delineated specific criteria (see Table 1) by which a treatment would be regarded as “probably efficacious” or “well established” (Chambless & Hollon, 1998; Chambless & Ollendick, 2001; Task Force on Promotion and Dissemination of Psychological Procedures, 1993), and these criteria are still in use today. In brief, to meet the highest standard of “well established,” a treatment must be supported by (a) at least two independently conducted, well-designed studies or (b) a large series of well-designed and carefully controlled single-case design experiments. To meet the standard of “probably efficacious,” a treatment must be supported by at least one well-designed study or a small series of single-case design experiments.

Table 1. Current definitions of “well established” and “probably efficacious” treatments (adapted from Chambless et al., 1998)

Well Established
I. At least two good between-group design experiments demonstrating efficacy in one or more of the following ways:
   A. Superior (based on statistical significance alone) to pill or psychological placebo or to another treatment.
   B. Equivalent to an already established treatment in experiments with adequate statistical power, considered to be approximately 30 per group.
OR
II. A large series of single-case design experiments (n > 9) demonstrating efficacy. These experiments must have:
   A. Used good experimental designs and
   B. Compared the intervention to another treatment as in I.A.
Further criteria for both I and II:
III. Experiments must be conducted with treatment manuals.
IV. Characteristics of the client samples must be clearly specified.
V. Effects must have been demonstrated by at least two different investigators or investigating teams.

Probably Efficacious
I. Two experiments showing the treatment is superior (based on statistical significance alone) to a waiting-list control group.
OR
II. One or more experiments meeting all criteria for well-established treatments except V (demonstration by independent investigator teams).
OR
III. A small series of single-case design experiments (n > 3) meeting well-established treatment criteria II, III, and IV.

Given the proliferation of clinical research over the past two decades, the improved quality of clinical research, and the adoption of more sophisticated methods for research synthesis and evaluation, we concur with many critics who have suggested that the current criteria are outdated (see Table 2). An evaluation based on two studies sets an unacceptably low bar for efficacy, may not account for mixed findings, and risks creating a misrepresentative and highly selective impression of efficacy (Borkovec & Castonguay, 1998; Henry, 1998; Herbert, 2003). For example, if two studies find evidence that a given treatment is efficacious, five studies find the treatment is no better than placebo, and 10 studies find that the treatment is worse than placebo, the current criteria for a designation of a “well-established” EST would be satisfied. This is not a hypothetical scenario: many bodies of treatment evidence include some studies with statistically significant results favoring a treatment and other studies that report null or even negative findings. This problem occurs across areas of research, and its influence has been well documented in the evidence on pharmaceutical products, where a clear bias toward trials favorable to a sponsored product has been demonstrated (Lexchin, Bero, Djulbegovic, & Clark, 2003; Lundh, Sismondo, Lexchin, Busuioc, & Bero, 2012). Registration of clinical trials (e.g., at www.clinicaltrials.gov) is increasingly emphasized to address this problem, not only for pharmaceutical studies but also for studies of psychological interventions, although poor adherence to registration policies and the poor quality of trial registrations have been problematic (Riehm, Azar, & Thombs, 2015).

Table 2. Common critiques of the EST movement and suggested changes

Area: Concerns about the strength of treatment
  Critiques:
  • Inadequate attention to null or negative findings
  • Reliance on statistical, rather than clinical, significance
  • Inadequate attention to long-term outcomes
  • Potentially significant variability in study quality
  Proposed changes:
  • Emphasize systematic reviews rather than individual studies
  • Separate strength of effect from strength of evidence
  • Grade quality of studies
  • Consider clinical significance in addition to statistical significance
  • Consider long-term efficacy in addition to short-term efficacy

Area: Concerns about selecting among multiple treatment options
  Critiques:
  • Within a given EST category, there is little basis for choosing one over another
  • Lack of clarity about whether empirical support translates to a recommendation
  Proposed changes:
  • Present quantitative information about treatment strength
  • Make specific recommendations based on clinical outcomes and the quality of the available research

Area: Concerns about the relevance of findings
  Critiques:
  • Inadequate attention to functional outcomes
  • Inadequate attention to effectiveness in nonresearch settings or with diverse populations
  Proposed changes:
  • Include functional or other health-related outcomes as well as symptom outcomes
  • Address generalization of research findings to nonresearch settings and diverse populations

Area: Concern about unclear active treatment ingredients and the proliferation of manuals for specific diagnoses
  Critiques:
  • Listing of packaged treatments rather than empirically supported principles of change
  • Emphasis on specific psychiatric diagnoses
  Proposed changes:
  • Evaluate and encourage dismantling research to identify empirically supported principles of change
  • De-emphasize diagnoses and emphasize syndromes/mechanisms of psychopathology

The exclusive focus on symptom reduction risks ignoring other potentially important clinical outcomes, such as functional impairment (Dobson & Beshai, 2013), despite the fact that functional concerns are a leading reason for individuals to seek treatment (Hunt & McKenna, 1993). Although symptom reduction and improvements in functioning are significantly correlated, there can be a mismatch after treatment (see Vatne & Bjorkly, 2008, for review). Thus, it is possible that a treatment is highly effective at reducing specific target symptoms, and yet the patient fails to achieve desired clinical outcomes such as improved social or occupational functioning. Therefore, a number of scholars have cautioned against overreliance on symptom-based evaluations of efficacy and have instead urged consideration of wellness, quality of life, well-being, and functionality (Cowen, 1991; Hayes, 2004; Seligman, 1995). We propose that symptom reduction no longer be considered the sine qua non of treatment outcome. Symptom reduction is important in determining the efficacy of a treatment, but the value of symptom reduction is greatly diminished if functional improvement is not also demonstrated. Functional outcomes address domains of psychosocial functioning, which may include work attendance or performance, school attendance or performance, social engagement, or family functioning. Several measures of such functional outcomes have been published, including the Sheehan Disability Scale (Sheehan, 2008), the Liebowitz Self-Rating Disability Scale (Schneier et al., 1994), the Work and Social Adjustment Scale (Mundt, Marks, Shear, & Greist, 2002), the Range of Impaired Functioning Tool (Leon et al., 1999), and the functional subscales of the Outcomes Questionnaire (Lambert et al., 1996), in addition to a wide array of performance-based functional tests from disciplines such as industrial/organizational psychology. The value of specific measures in the evidence review will depend on their psychometric properties and direct relevance to the clinical problem being treated.

Quality of life (QOL) is a less well-defined construct (Gill & Feinstein, 1994), which is problematic for many trials of psychological treatment, given its apparently strong overlap with depression (Keltner et al., 2012). We therefore concur with Muldoon, Barger, Flory, and Manuck (1998) that objective functioning and subjective appraisals of well-being should be considered separately. Nevertheless, there is increasing interest in QOL as an outcome measure in trials of psychological treatments, particularly in the United Kingdom (e.g., Layard & Clark, 2014), and its inclusion in treatment guidelines should be considered carefully going forward.

There is, at present, no clear way to establish whether a treatment has proven effective with diverse populations or in more clinically representative settings (Beutler, 1998; Goldfried & Wolfe, 1996, 1998; Gonzales & Chambers, 2002; Norcross, 1999; Seligman, 1996). Concerns about the transportability of treatment include the possibilities that patients seen in routine clinical practice are more complex or heterogeneous than those in efficacy-oriented RCTs, that willingness to be randomized to treatments is a confounding factor that diminishes sample representativeness, and that the therapists used in efficacy RCTs are more highly trained, specialized, monitored, or structured than are those working in routine clinical settings. The issue of treatment generalizability is complex. Patients seen in clinical settings do not necessarily appear more complex or severe than those seen in clinical trials; in one study of clinical outpatients deemed ineligible for depression research trials, the most common reasons for exclusion were partial remission of symptoms at intake and insufficient severity or duration of symptoms. Importantly, of those meeting criteria for major depression, none were excluded due to Axis I or Axis II comorbidity (Stirman, DeRubeis, Crits-Christoph, & Rothman, 2005).

Evidence for differential efficacy of treatments administered in research versus clinical settings is mixed. In some cases, randomized and nonrandomized patients receiving similar treatments appear to do equally well (Franklin, Abramowitz, Kozak, Levitt, & Foa, 2000), whereas in other cases, treatments administered in a research setting yield outcomes superior to the same treatments administered in a clinical setting (Gibbons, Stirman, DeRubeis, Newman, & Beck, 2013; Kushner, Quilty, McBride, & Bagby, 2009). The reasons for a possibly stronger response in research trials are unclear, but could include factors such as therapist training and fidelity monitoring, setting time limits for treatment, and providing feedback to clinicians and patients on treatment progress.

Many have called for a greater emphasis on effectiveness research, which focuses primarily on the generalizability of the treatment to more clinically representative situations. We therefore suggest that the evaluation of ESTs attend not only to a treatment’s efficacy in research settings but also to its effectiveness in nonresearch settings. Criteria that could be considered include effectiveness with more diagnostically complex patients, effectiveness with nonrandomized patients, effectiveness when used by nonacademic practitioners, and utility in open-ended, flexible practice.

The internal validity and degree of research bias in clinical trials are not adequately addressed, potentially making the results prone to false positives (Luborsky et al., 1999; Wachtel, 2010). Internal validity refers to the degree to which a given trial is likely to answer the research question being evaluated correctly and free from bias. Bias is systematic error that can lead to underestimation or overestimation of true treatment
effects (Higgins & Green, 2008). It is not usually possible to know with precision the degree to which design flaws may have influenced results in a given treatment trial, but elements of trial design have been shown to be related to bias. In RCTs generally, design weaknesses related to allocation concealment, blinding, and randomization methods may be expected to influence effect estimates, particularly when outcomes are subjective (Savovic et al., 2012), which is the case in most trials of psychological treatments (Wood et al., 2008). An additional example is the researcher allegiance effect (Gaffan, Tsaousis, & Kemp-Wheeler, 1995; Luborsky et al., 1999). The presence of researcher allegiance does not necessarily imply bias (Hollon, 1999; Leykin & DeRubeis, 2009); however, it is a risk factor that has been shown empirically to be associated with some probability of bias. Financial conflict of interest, a demonstrated source of publication bias in pharmaceutical studies (Friedman & Richter, 2004; Lexchin et al., 2003; Perlis et al., 2005), may also be considered in rating risk of bias (Bero, 2013; Roseman et al., 2011, 2012), although conflict of interest may be harder to identify and quantify in studies of psychological treatments.

DOES THE WORLD NEED ANOTHER LIST OF ESTS?

Even though, as we argue, it remains of vital importance to identify ESTs, one might ask whether another list would be beneficial to the field. We suggest that a well-designed list could be of great import, filling noticeable gaps in the available knowledge. Three alternative systems with which readers are likely to be familiar are the NICE standards in the United Kingdom (Baker & Kleijnen, 2000), the Practice Guidelines published by the American Psychiatric Association (e.g., 2009, 2010), and the Veterans Administration/Department of Defense Clinical Practice Guidelines (e.g., Veterans Health Administration, 2004, 2009). These systems are available immediately and have the advantage of addressing both psychological and pharmacological treatments. However, the breadth of these systems is also a limitation for psychologists. As broad guidelines, they lack the depth of information that clinical psychologists or other psychotherapy practitioners would need to make informed treatment decisions. For example, in the NICE guidelines for panic disorder (National Institute for Clinical Excellence, 2011), clinicians are advised to use cognitive-behavioral therapy (CBT). We would not disagree with this recommendation; however, NICE provides little means for understanding what kind of CBT is most helpful or the strength of various interventions. Thus, although existing guidelines are comprehensive and immediately available, we argue that there is room for an alternative source of information for consumers of research on psychological treatments. As the Society of Clinical Psychology has been at the forefront of identifying and disseminating ESTs for the past two decades and is one of the most prominent organizations dedicated to psychological ESTs in particular, it is logical for this group to take the lead in this next phase of treatment evaluation.

In recent years, the APA Advisory Steering Committee for the Development of Clinical Practice Guidelines was formed to provide research-based recommendations for the psychological treatment of particular disorders (Hollon et al., 2014). When in place, guideline development panels, under the direction of the Steering Committee, will oversee the development of clinical practice guidelines. Several of the steps proposed by the Advisory Steering Committee to generate patient-focused, scientifically based, clinically useful guidance point the way toward a much-needed update of EST standards. Two of them in particular should be central to modernizing EST standards: (a) the evaluation of all existing evidence via high-quality systematic reviews, which include (i) evaluation of relevance to clinical practice, including treatment fidelity; (ii) an assessment of risk of bias; and (iii) other considerations, including evaluation of multiple clinical outcomes, covering functional as well as symptom outcomes; and (b) a committee-based appraisal of the evidence, using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system (Atkins et al., 2004; Guyatt et al., 2006, 2008) to assess the quality of relevant evidence and the degree to which benefits are established in excess of potential harms.

The process proposed by the APA Advisory Steering Committee for the Development of Clinical Practice Guidelines represents an important step forward in initiating a disorder-based guideline development process
for psychological treatments. This process, which paral- for reasons other than those proposed by the treatment
lels that used by the Institute of Medicine (2011a, developers (Lohr, Tolin, & Lilienfeld, 1998).
2011b), is expected to result in a transparent system of An emphasis on identifying the active ingredients of
treatment recommendations for practitioners and con- change need not exclude factors associated with the
sumers. However, it is an expensive and extremely therapeutic relationship. Indeed, many have suggested
time-consuming process, and it is unlikely that the Task that the therapeutic relationship accounts for greater
Force will develop recommendations for a wide range variance in clinical outcomes than do those aspects of
of clinical problems in the immediate future. Indeed, the therapy that are described as “techniques” (Blatt &
the APA initiated a process for producing guidelines in Zuroff, 2005; Henry, 1998; Lambert & Barley, 2001;
2010 and announced panels to develop guidelines for Norcross, 1999). Relationship-oriented therapist
the treatment of obesity and posttraumatic stress disor- behaviors are themselves subject to empirical scrutiny.
der in 2012 and for depressive disorders in 2013, but A pressing question, however, is whether there is
has not yet generated any finished guidelines. Thus, enough research to date to make meaningful recommen-
there is an immediate need for dissemination of up-to- dations to practitioners, consumers, and other stakehold-
date, evidence-based guidance that can not only com- ers based solely on empirically supported principles of
plement the work of the APA Task Force, but also pro- change. We suggest that the field is approaching that tar-
vide practitioners with clear information about the get, but has not yet arrived. Certainly, there is much
strength of ESTs and the degree of confidence that can work being done in this area (e.g., Castonguay & Beutler,
be derived from the available evidence. 2006); however, in our opinion, the field has not yet
TO WHAT EXTENT SHOULD WE FOCUS ON ESTABLISHED TREATMENTS, VERSUS PRINCIPLES OF CHANGE?

Over time, the field would likely benefit from a shift away from “named” or “packaged” treatments. The current EST list includes more recent multicomponent treatments that combine many different interventions within one treatment “package.” CBT for fibromyalgia, one treatment currently identified as well established, is described as including education, relaxation, graded behavioral activation, pleasant activity scheduling, sleep hygiene, stress management, goal setting, structured problem solving, reframing, and communication skills (Bernardy, Fuber, Kollner, & Hauser, 2010). Although evaluating such treatment packages is a necessary step in identifying what works, this research does not allow a determination of which aspects of the treatment are responsible for change (Borkovec & Castonguay, 1998; Gonzales & Chambers, 2002; Henry, 1998). That is, within a given treatment package, there is no way to determine which components are therapeutically active and which are inert. As a result, practitioners are often unable to make informed decisions about which treatments to use (Herbert, 2003; Rosen & Davison, 2003; Westen, Novotny, & Thompson-Brenner, 2004), and many treatments may be weakened by ineffective components and/or work

amassed a body of evidence that would adequately address the multiple concerns of patients seen in clinical settings. As just one example, a recent review concluded that the mechanisms of prolonged exposure (PE), a well-studied and fairly straightforward treatment for posttraumatic stress disorder (PTSD), remain unclear (Zalta, 2015). It would therefore be difficult to evaluate only mechanism-based processes at this time, although we believe that such research should be emphasized going forward.

HOW SHOULD WE HANDLE TREATMENTS WITH CONFLICTING EVIDENCE?

As noted previously, a primary limitation of the existing criteria is that they allow reviewers to select two positive studies while potentially ignoring studies with null or even negative outcomes. In our view, the only defensible strategy is a systematic (quantitative) review that takes into account all of the available research evidence, rather than a limited number of selected positive studies. This is the approach proposed by the APA Advisory Steering Committee for the Development of Clinical Practice Guidelines (Hollon et al., 2014). Twenty years ago, in many cases there were not enough controlled research trials for such a process to be feasible. Today, however, the field has seen a marked increase in published research, making larger-scale reviews possible.
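
As a minimal sketch of why pooling all available trials differs from counting two positive ones, the Python example below computes a fixed-effect (inverse-variance) meta-analytic estimate. The function names `pooled_effect` and `summarize` and all effect sizes and variances are hypothetical, invented for illustration; a fixed-effect model is used only for brevity, and a random-effects model is usually more appropriate when trials differ in populations or procedures.

```python
import math


def pooled_effect(effects, variances):
    """Fixed-effect (inverse-variance) pooled estimate and approximate 95% CI."""
    if not effects or len(effects) != len(variances):
        raise ValueError("need one variance for every effect size")
    weights = [1.0 / v for v in variances]        # more precise trials count more
    total_weight = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)            # standard error of the pooled estimate
    return estimate, (estimate - 1.96 * se, estimate + 1.96 * se)


# Hypothetical (effect size d, sampling variance) pairs, invented for illustration only.
trials = [
    (0.80, 0.04),   # positive trial
    (0.70, 0.05),   # positive trial
    (0.05, 0.03),   # null trial
    (-0.10, 0.04),  # negative trial
]


def summarize(label, subset):
    """Pool a subset of trials, print the result, and return the point estimate."""
    est, (lo, hi) = pooled_effect([d for d, _ in subset], [v for _, v in subset])
    print(f"{label}: d = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
    return est


two_positive = summarize("two positive trials only", trials[:2])
all_trials = summarize("all trials", trials)
```

With these made-up numbers, the two positive trials alone pool to roughly d = 0.76, whereas all four trials pool to roughly d = 0.32, which shows how selecting only favorable studies can overstate a treatment’s effect.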

EMPIRICALLY SUPPORTED TREATMENTS  TOLIN ET AL. 7


HOW MUCH WEIGHT SHOULD WE AFFORD IMMEDIATE VERSUS LONG-TERM EFFICACY OF TREATMENTS?

Both short-term and long-term outcomes of psychological treatment are important. Short-term outcomes are frequently the strongest and give the best estimate of the immediate efficacy of the treatment. However, a given treatment may be effective in the short term but not at a time point well after treatment discontinuation (i.e., participants show signs of relapse). In some cases, this might reflect a basic weakness of the treatment, suggesting that its effects are not durable. In other cases, it could be argued that the treatment is effective only so long as one remains in it; provided that the treatment can feasibly be delivered on a maintenance basis, this is not necessarily a fatal flaw. For example, while many have pointed out that gold-standard cognitive-behavioral treatments for obesity have only short-term effects (most people eventually gain back their lost weight), others point out that a continuous care model is both feasible and better suited to the problem of overeating (Perri, Sears, & Clark, 1993). In still other cases, a lack of long-term efficacy may reflect competing issues (e.g., chronic psychosocial stressors) that complicate the long-term prognosis despite an apparently successful treatment, suggesting the need for supplemental intervention. Alternatively, a treatment might show only modest clinical effects at immediate posttreatment, with outcomes becoming stronger after treatment discontinuation (sleeper effects) owing to memory consolidation, skill practice, naturalistic reinforcement, or other factors. Consumers, practitioners, and policymakers should be able to evaluate both short- and long-term treatment effects as part of a systematic review.

HOW SHOULD WE ADDRESS EFFICACY VERSUS EFFECTIVENESS?

Many authors have questioned whether the results of RCTs conducted in clinical research settings will translate to more clinically representative settings such as private practice, community mental health centers, and hospitals (Beutler, 1998; Goldfried & Wolfe, 1996, 1998; Gonzales & Chambers, 2002; Norcross, 1999; Seligman, 1996). Concerns about the transportability of treatment include the possibilities that patients seen in routine clinical practice are more complex or heterogeneous than those in efficacy-oriented RCTs, that willingness to be randomized to treatments is a confounding factor that diminishes sample representativeness, and that the therapists in efficacy RCTs are more highly trained, specialized, monitored, or structured than those working in routine clinical settings. Many have therefore called for a greater emphasis on effectiveness research, which focuses primarily on the generalizability of the treatment to more clinically representative situations.

We suggest that treatments should be evaluated from both an efficacy and an effectiveness perspective. Specifically, it is important to identify treatments that are not only efficacious in research-based settings but have also demonstrated evidence of effectiveness in more typical clinical settings. Criteria that could be considered include effectiveness with more diagnostically complex patients, effectiveness with nonrandomized patients, effectiveness when used by nonacademic practitioners, and utility in open-ended, flexible practice.

HOW SHOULD TREATMENT COSTS AND BENEFITS BE WEIGHED?

There is, unfortunately, no quantitative “gold standard” for determining whether a treatment is cost-effective. Nevertheless, cost-effectiveness considerations must be taken into account. Two treatments may show similar clinical effects, but if one is clearly more costly to consumers, third-party payers, or society (e.g., because it requires a very large number of sessions, a long duration, or hospitalization), this should be taken into consideration. It would be prohibitive to conduct a full cost-benefit analysis of every psychological treatment, but a reasonable panel of reviewers should be able to upgrade or downgrade a treatment based on obvious strengths or weaknesses in cost or patient burden.

WHAT STRENGTH OF EFFECT SHOULD BE CONSIDERED “GOOD”?

Various cutoffs for defining “good response” have been proposed. Cohen (1988), for example, suggested that effect sizes (d) of 0.2, 0.5, and 0.8 be considered small, moderate, and large effects, respectively. Others have proposed varying definitions of treatment

CLINICAL PSYCHOLOGY: SCIENCE AND PRACTICE 8


response and remission (Andreasen et al., 2005; Doyle & Pollack, 2003; Frank et al., 1991; McIntyre, Fallu, & Konarski, 2006; Simpson, Huppert, Petkova, Foa, & Liebowitz, 2006), usually operationalized as a cutoff score on a standardized measure. Similarly, many have called for the use of reliable change (a demonstration that the reduction on a measure is greater than would be expected to occur by chance) and clinically significant change (variously described as posttreatment scores no longer in the pathological range, posttreatment scores in the normal range, or posttreatment scores closer to the normal range than to the pathological range) as outcome criteria (Jacobson, Follette, & Revenstorf, 1984; Lambert & Bailey, 2012). Some have used the criterion of good end-state functioning (e.g., Feeny, Zoellner, & Foa, 2002), reflecting scores in the normal range on a variety of different measures, not solely measures of the disorder being treated. From a population-based perspective, some have suggested the use of statistics such as the number needed to treat (NNT), the number of patients who must be treated to observe one additional improvement relative to the comparison condition.

These methods (many of which overlap considerably) all have their individual strengths and weaknesses. Ultimately, however, there is no clear consensus in the field on how strong an effect must be observed before we pronounce a treatment efficacious. In our view, the degree to which treatment effects are considered clinically meaningful is highly dependent on contextual factors such as the disorder being treated and the goals of treatment. In the case of, for example, mild depression treated on an outpatient basis, full remission and good end-state functioning might be considered appropriate targets, and one might be skeptical of a treatment that fails to achieve those goals. On the other hand, for chronically psychotic patients seen in residential or day treatment, improvements in psychosocial functioning, regardless of the presence of psychotic symptoms, might be considered an appropriate goal, and full remission would not reasonably be expected. Brief inpatient interventions for suicidality may aim to reduce suicidal ideation and behavior, but not necessarily to bring about remission of depression. Interventions with medical populations might aim to improve compliance with treatment regimens without necessarily addressing the symptoms of a psychological disorder directly. Thus, a “clinically meaningful” treatment result for one group and purpose might not be suitable for another group and purpose. Concluding that a treatment is “efficacious” is therefore a subjective process that requires human decision-making.

A PROPOSED SYSTEM OF TREATMENT EVALUATION FOR THE SOCIETY OF CLINICAL PSYCHOLOGY

As described previously, the process of systematic evaluation proposed by the APA Advisory Steering Committee for the Development of Clinical Practice Guidelines represents a clear move in the right direction. However, we argue that, both because the APA process is time-consuming and because clinical psychologists and consumers have specific needs for evidence-based decision-making, there remains a need for the Society of Clinical Psychology to create a new system by which scientific evidence of treatment efficacy can be evaluated and disseminated in a clear, transparent, and cost-effective manner that prioritizes the empirical basis of psychological treatments.

The system we propose here is consistent with the methods that will be used by the APA Task Force (Hollon et al., 2014) but requires less time and can therefore provide more rapid dissemination of findings and recommendations. The most time-consuming aspect of the APA Task Force’s work will be the systematic review of research findings. That process could be greatly sped up by using existing, published systematic reviews of the literature. Since the original EST criteria were developed, systematic reviews and meta-analyses have become available for most interventions, and for many of these, the Task Force will be able to use high-quality reviews that have already been published in order to expedite its work.

We note as well that although many of the existing clinical trials and systematic reviews are based on participants selected according to diagnostic criteria (e.g., those listed in the Diagnostic and Statistical Manual of Mental Disorders [5th ed.; DSM-5; American Psychiatric Association, 2013]), there is no requirement that they do so. Indeed, the reliability and validity of the DSM and the medical taxonomy implied therein have been critiqued as a basis for psychotherapy research (Fensterheim & Raw, 1996; Henry, 1998). Over the coming
years, we encourage clinical psychology researchers to focus on distinct, empirically derived syndromes of psychopathology (which can range from mild to severe) rather than on categorical diagnoses. Such a shift would comport well with the Research Domain Criteria (RDoC) project currently underway within the National Institute of Mental Health (Insel et al., 2010), although the specific RDoC dimensions may or may not be those chosen as targets for psychotherapy research. That shift would also likely decrease the EST movement’s reliance on a large number of treatment manuals, a process to which many authors, even those supportive of the broad EST movement, object (e.g., Fonagy, 1999; Goldfried & Eubanks-Carter, 2004; Levant, 2004; Norcross, 1999; Wachtel, 2010; Westen et al., 2004). Understanding the core dimensions of pathology and the treatments that target these dimensions would create a much simpler, more intuitive, and more practitioner-friendly system.

The proposed system takes into account the recommendations of APA work groups (American Psychological Association, 1995, 2002) that treatment guidelines should (a) be based on broad and careful consideration of the relevant empirical literature, (b) take into consideration the level of methodological rigor and clinical sophistication of the research, (c) take comparison conditions into account, (d) consider available evidence regarding patient-treatment matching, (e) specify the outcomes the intervention is intended to produce, (f) identify known patient variables that influence the utility of the intervention, (g) take the setting of the treatment into account, (h) note possible adverse effects, and (i) take treatment cost into account.

STEP 1: EXAMINATION OF SYSTEMATIC RESEARCH REVIEWS

We propose that candidate treatments be evaluated on the basis of existing (published or unpublished) quantitative reviews by a Task Force operating under the direction of the Committee on Science and Practice, the group that has overseen the identification of ESTs over the past two decades. The process of member selection should be transparent, with an open nominating process, public listing of member names, and organizational measures to ensure diversity of member backgrounds. Following the recommendations of the APA work groups (American Psychological Association, 1995, 2002), review panels should (a) be composed of individuals with a broad range of documented expertise, (b) disclose actual and potential conflicts of interest, (c) maintain a climate of openness and free exchange of views, and (d) have clearly defined processes and methods.

When an individual nominates a treatment for evaluation, the nominator may provide existing reviews or may create a new review for this purpose, although all reviews will be evaluated carefully for thoroughness and risk of bias (see below). Published or unpublished systematic reviews that are not deemed to meet rigorous quality standards will not be considered for EST designation. Recently conducted reviews (i.e., within the past 2 years) will be required unless the evidence in an older review is robust and a strong case can be made that recent developments are unlikely to influence the evaluation of the body of evidence for or against a treatment. The effectiveness of a given treatment can be evaluated (a) by comparison with known and quantifiable inactive control conditions, including (i) wait list, (ii) pill placebo, and (iii) psychological placebo, or (b) by comparison with alternative psychological treatments.

Evaluating the Quality of Systematic Reviews

There are a number of ways to determine whether a systematic review has been conducted with sufficient transparency and rigor to provide confidence that its results are comprehensive and reflect the best possible evidence. The Cochrane Handbook (Higgins & Green, 2008) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Liberati et al., 2009) are well-respected systems for evaluation; the Task Force will use, at least in its initial efforts, the AMSTAR checklist (Shea, Bouter, et al., 2007; Shea, Grimshaw, et al., 2007; Shea et al., 2009), as described above and shown in the online supplement. The AMSTAR checklist is not scored, so there is no cutoff at which a review is considered reliable; rather, the items on the checklist will inform the group’s subjective decision about whether a systematic review is of sufficient quality, and reported sufficiently well, to be used by the Division 12 Task Force (Table 3).
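
The two-step procedure summarized in Table 3 can be pictured as a record kept for each nominated review. The Python sketch below is hypothetical: the class and function names (`ReviewSubmission`, `step1_issues`, and so on) and the choice of which checks to automate are assumptions, not part of the proposal. In keeping with the text, it computes no AMSTAR score, risk-of-bias rating, or GRADE recommendation; those are entered by reviewers, and only the bookkeeping is checked (a PICOTS statement is present, risk of bias has been rated, and either the 2-year recency rule is met or the committee has granted an exception).

```python
# Hypothetical record-keeping sketch; all names are illustrative, not from the proposal.
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class RiskOfBias(Enum):
    LOW = "low"
    UNCLEAR = "unclear"
    HIGH = "high"


@dataclass
class PICOTS:
    population: str
    intervention: str
    comparison: str
    outcomes: str
    timeline: str
    setting: str


@dataclass
class ReviewSubmission:
    treatment: str
    completed: date
    picots: Optional[PICOTS]
    # Entered by reviewers after applying the Cochrane tool; never computed here.
    risk_of_bias: Optional[RiskOfBias] = None
    # Set by the committee when an older review is judged unlikely to be outdated.
    older_review_exception: bool = False


def years_before(day: date, years: int) -> date:
    """Same calendar date `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def step1_issues(sub: ReviewSubmission, today: date, max_age_years: int = 2) -> List[str]:
    """List reasons a submission is not yet ready for the Step 2 committee review.

    AMSTAR items are deliberately not scored here: they inform a human decision.
    """
    issues = []
    if sub.picots is None:
        issues.append("no explicit PICOTS statement")
    too_old = sub.completed < years_before(today, max_age_years)
    if too_old and not sub.older_review_exception:
        issues.append(f"review older than {max_age_years} years without an exception")
    if sub.risk_of_bias is None:
        issues.append("risk of bias not yet rated")
    return issues


# Hypothetical example values, for illustration only.
example = ReviewSubmission(
    treatment="example treatment",
    completed=date(2013, 5, 1),
    picots=PICOTS("adults", "example treatment", "wait list",
                  "symptoms; functioning", "posttreatment and 6 months", "outpatient"),
    risk_of_bias=RiskOfBias.LOW,
)
print(step1_issues(example, today=date(2016, 1, 15)))
# prints ['review older than 2 years without an exception']
```

Keeping the exception as an explicit flag set by the committee mirrors the allowance for robust older reviews, rather than silently widening the 2-year window.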

Table 3. Summary of the proposed Division 12 procedure for evaluating empirically supported treatments

Step 1: Systematic review
• Treatment is nominated
• Existing systematic review is evaluated according to:
  o PICOTS (population, intervention, comparison, outcomes, timeline, setting)
  o Risk of bias (low, unclear, high)
Step 2: Committee-based evidence review
• GRADE (Grading of Recommendations Assessment, Development, and Evaluation) recommendation by committee: very strong, strong, weak

In some cases, a systematic review may combine trials in which the treatments differed from each other in one or more ways, such as the manner in which an intervention was applied, the characteristics of the provider, or the length of treatment or follow-up. In others, elements of treatment might be added to or subtracted from certain studies. Such modifications across studies create a dilemma for the reviewers, who must determine whether the studies are similar enough to be considered tests of the same essential treatment. Some degree of clinical heterogeneity must be anticipated and allowed, or else there would be very few meaningful groupings of studies for review. However, the degree to which clinical heterogeneity undermines the interpretability of a single quantitative result must be carefully weighed before a meta-analysis is accepted by the Task Force (Ioannidis, 2008). A standard part of the review should be agreement among reviewers that all of the selected studies are similar enough to be considered to reflect a single treatment.

The use of systematic reviews does not preclude the inclusion of single-case designs, as these designs, when using appropriate experimental control, can establish causality (Horner et al., 2005) in a manner comparable to RCTs, although the smaller number of subjects may limit the generalizability of findings. Methods have been developed for calculating effect sizes for such studies and for conducting Bayesian and multilevel modeling (see Shadish, 2014, for a summary). Assessment of the quality of single-subject designs could employ published quality indicators (Horner et al., 2005; Smith et al., 2007), in a manner that parallels the procedures used to evaluate RCTs. Consistent with current approaches to evidence synthesis, however, we do not recommend that evidence from single-subject designs alone be used as the basis of recommendations, which should rely largely on synthesis of data from larger clinical trials.

Evaluation of Relevance to Clinical Practice

An important component for ensuring the external validity of systematic reviews is the definition of structured review questions. The mnemonic PICOTS refers to the explicit delineation of trials that are eligible for consideration in the systematic review based on the population that received the treatment (P); the intervention delivered (I); the comparison, such as another active treatment or an inactive control (C); the outcomes assessed (O); the timeline (e.g., 12 weeks, 6 months, or long-term) (T); and the setting of treatment, for example, inpatient psychiatry versus outpatient community clinics (S). To ensure external validity or generalizability, the Task Force should insist that the systematic review include a clear PICOTS statement, clearly defining the population of interest, the intervention, the comparisons considered, the outcomes examined, and the timing of outcome assessment.

In addition, the systematic review should evaluate the degree to which the trials it includes took steps to ensure treatment fidelity. Bellg et al. (2004) provide a thorough discussion of the elements of treatment fidelity and of steps that can be taken to enhance treatment fidelity in behavior change studies. In the context of systematic reviews, there are no standard instruments for assessing the steps taken to ensure treatment fidelity in included trials. Elements that were included in Chambless and Hollon’s (1998) original EST definition, and that continue to be evaluated in evidence reviews, are therapist qualifications and training, the use of a treatment manual, and monitoring of the degree to which the treatment is implemented according to the manual.

Assessing Risk of Bias

The original EST criteria (Chambless et al., 1998) operationalized methodological adequacy as including the use of a treatment manual, a well-characterized sample, and random assignment to treatment and
control conditions. Since these criteria were published, however, standards for evaluating both the external and internal validity of treatment trials have evolved substantially, and there are now several widely accepted methods of determining methodological adequacy that should be considered. We recommend that authors of systematic reviews assess validity using the Cochrane Risk of Bias Tool (Higgins et al., 2011). This tool, widely regarded as the standard for evaluating risk of bias in RCTs included in systematic reviews, provides a rating system and criteria by which individual RCTs are evaluated according to the potential sources of bias related to (a) adequate allocation sequence generation; (b) concealment of allocation to conditions; (c) blinding of participants, personnel, and outcome assessors; (d) incomplete outcome data; (e) selective outcome reporting; and (f) other sources of bias (see online supplement). Adequate sequence generation ensures that study participants were appropriately randomized to study conditions. Allocation concealment means that the random assignment is implemented in a way that cannot be predicted by participants or key study personnel. Blinding of key study personnel and outcome assessors ensures that those in a position to affect outcome data are unaware of participants’ study condition. Blinding of participants means that the participants themselves are unaware of their study condition. Blinding of participants is not commonly used (and is often not possible) in trials of psychotherapy. In many cases, such as when a treatment group is compared with a no-treatment group, this would be reflected as a methodological limitation common to studies of psychological treatments. However, the Cochrane system allows a “low risk of bias” determination on this item when the outcome and outcome measurement are not likely to be influenced by lack of blinding, or when outcome assessment was blinded and the nonblinding of participants was unlikely to introduce bias. Blinding of participants, at least to study aims and hypotheses, would be possible in trials comparing two psychological treatments; full blinding of participants has been noted in some studies of computerized cognitive bias modification training (e.g., Amir, Beard, Burns, & Bomyea, 2009). Appropriate handling of incomplete (missing) outcome data due to attrition during the study or to exclusions from the analysis helps ensure that the outcome analyses adequately represent the outcomes of the sample. Examination of selective outcome reporting helps identify whether important (possibly nonsignificant) findings were omitted from the report of the study (Higgins & Green, 2008). Determining whether clinical trials were registered and, if so, whether published outcomes are consistent with registered outcomes is an important step in a systematic review (Milette, Roseman, & Thombs, 2011; Thombs, Kwakkenbos, & Coronado-Montoya, 2014).

Across all dimensions, trials are rated as high risk of bias, unclear risk of bias, or low risk of bias. Cochrane advocates that systematic reviews assess the potential influence of each of these dimensions on outcomes separately, and it recommends against attempting to generate a single score or rating of overall bias (Higgins & Green, 2008). Summary scores tend to confound the quality of reporting with the quality of trial conduct, to assign weights to different items in ways that are difficult to justify, and to yield inconsistent and unpredictable associations with intervention effect estimates (Greenland & O’Rourke, 2001; Juni, Witschi, Bloch, & Egger, 1999).

Both individual trials and systematic reviews can be judged as having low, unclear, or high risk of bias (see online supplement). A systematic review would be graded as being at low risk of bias when its conclusions are based on evidence judged to be at low risk of bias, according to the GRADE dimensions described above. Note that this grading system differs markedly from those originally proposed by the Division 12 Task Force (e.g., Chambless et al., 1998). Two well-conducted studies are no longer considered sufficient; this system would now require that the conclusions of the systematic review be based on studies deemed to be of high quality.

Assessment of risk of bias requires human judgment (Higgins et al., 2011), and, unfortunately, there is no quantitative algorithm that will consistently lead to reliable and valid assessment. Thus, there will always be room for disagreement and debate about the merits of individual studies and about the quality of research across studies for a given treatment. Assessing whether a particular methodological concern in a trial creates a risk of bias requires both knowledge of the trial methods and a judgment about whether those
methods are likely to have introduced bias. The Cochrane Risk of Bias Tool, at least, makes the decision process transparent and provides accessible guidance for how decisions should be made (Higgins & Green, 2008).

Additional Considerations for the Evaluation of Systematic Reviews and Recommendations for Implementation

Systematic reviews will be examined for both short-term and long-term outcomes. Long-term outcomes will generally be defined as outcomes collected some time after treatment discontinuation; however, we recognize that some treatments may include a low-intensity “maintenance” phase that continues for a long time after the more acute phase, and outcomes during the maintenance phase might be appropriate for consideration as long-term effects. Effects on both symptom reduction and functional outcomes will be coded, relying on validated measures that are appropriate for the population and treatment under study. Finally, the review will note whether the treatment has demonstrated effectiveness (e.g., with more diagnostically complex patients, with nonrandomized patients, when used by nonacademic practitioners, and in open-ended, flexible practice) in addition to efficacy.

STEP 2: COMMITTEE-BASED EVIDENCE REVIEW USING THE GRADE TOOL

The systematic review, having been graded for risk of bias, must then be translated into practical recommendations that will address the concerns of a broad range of patients, presenting problems, clinicians, and clinical settings. Because it is unlikely that any statistical algorithm will ever be able to provide such guidance consistently, the process of recommending treatments must ultimately be a product of human judgment. The systematic review will provide raw information about the strength of clinical effects, as well as the risk of bias of the studies evaluating the treatment. Beyond those basic assessments, a determination of whether psychological treatments should be recommended to clinicians, consumers, and other stakeholders must be based on the strength and quality of existing evidence and a comparison of the likely benefits versus the burden, cost, and potential harms of the treatment. The best strategy one can use in such a situation is to provide a clear framework to guide the decision-making process, and to make the process as transparent as possible so that the public can understand how these judgments were made.

A number of different strategies have been employed by guideline developers to make clear the strength of evidence and recommendations, although the most widely used system is the GRADE system (Atkins et al., 2004; Guyatt et al., 2008). The aim of the GRADE system is to rate the quality of evidence and the strength of recommendations in a manner that is explicit, comprehensive, transparent, and pragmatic. Factors taken into account in making these decisions include the methodological quality of the evidence that supports estimates of benefits, costs, and harms; the importance of the outcome that the treatment improves; the magnitude of the treatment effect and the precision of its estimate; the burden, costs, and potential risks associated with the therapy; and other consumer values that might be expected to influence consumers’ decisions.

Using the GRADE System for Treatment Recommendations

The GRADE system rates evidence quality as high, moderate, or low. Evidence is judged to be high quality when reviewers can be highly confident that the true effect lies close to the estimate of the effect. For example, evidence is judged as high quality if all of the following apply:
1. There is a wide range of studies included in the analyses with no major limitations.
2. There is little variation between studies.
3. The summary estimate has a narrow confidence interval.

Evidence is judged to be moderate quality when reviewers conclude that the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. For example, evidence is judged as moderate quality if any of the following applies:
1. There are only a few studies, and some have limitations but not major flaws.
2. There is some variation between studies, or the confidence interval of the summary estimate is wide.

Evidence is judged to be low quality when the true effect may be substantially different from the estimate of the effect. For example, evidence is judged as low quality if any of the following applies:
1. The studies have major flaws.
2. There is important variation between studies.
3. The confidence interval of the summary estimate is very wide.

In the GRADE system, evidence based on RCTs begins as high-quality evidence, but such evidence can be downgraded based on concerns such as study limitations, inconsistency of results, indirectness of evidence, imprecision, and reporting bias. Other types of studies begin as lower-quality evidence but may be upgraded if merited on a case-by-case basis.

The GRADE process typically results in a weak or a strong recommendation. For the psychotherapy evaluation, we suggest that the GRADE system be modified to include a third category. A three-tier system would better correspond to the current reality that few existing trials of psychological treatments have assessed functional and disability outcomes, even though such outcomes may be more important than symptom outcomes. Thus, based on evidence from the submitted systematic review and meta-analysis, we recommend that the Task Force use an adapted GRADE process and make one of three recommendations for the empirical support of a psychological treatment: weak, strong, or very strong. Treatments not meriting at least a weak recommendation (e.g., no systematic review is available, or the outcomes of treatment studies do not satisfy the minimal criteria for a weak recommendation) will be described simply as lacking sufficient evidence of efficacy. The criteria for these recommendations are shown in Table 4.

Table 4. Modified GRADE recommendations for psychological treatments based on systematic reviews (adapted from Guyatt et al., 2008)

Very strong recommendation (all of the following):
• There is high-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated.
• There is high-quality evidence that the treatment produces a clinically meaningful effect on functional outcomes.
• There is high-quality evidence that the treatment produces a clinically meaningful effect on symptoms and/or functional outcomes at least 3 months after treatment discontinuation.
• At least one well-conducted study has demonstrated effectiveness in nonresearch settings.
Strong recommendation (at least one of the following):
• There is moderate- to high-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated.
• There is moderate- to high-quality evidence that the treatment produces a clinically meaningful effect on functional outcomes.
Weak recommendation (any of the following):
• There is only low- or very low-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated.
• There is only low- or very low-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated as well as on functional outcomes.
• There is moderate- to high-quality evidence that the effect of the treatment, although statistically significant, may not be of a magnitude that is clinically meaningful.

The GRADE recommendations are hierarchical; treatments are ranked according to the highest level of recommendation obtained. A very strong recommendation is made when there is high-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated, as well as a clinically meaningful effect on functional outcomes, with significant improvement noted at immediate posttreatment and at a follow-up (treatment discontinuation) interval of not less than 3 months, with relatively little risk of harm and reasonable resource use. In addition, at least one well-conducted study must have demonstrated the effectiveness of that treatment in nonresearch settings (e.g., settings that provide routine clinical care, such as community mental health centers, inpatient or outpatient treatment facilities, health maintenance organizations, or private practices). We recognize that this level of recommendation may be largely aspirational at this time, although some treatments will merit a very strong recommendation at present. In other cases, the establishment of this level of recommendation sets a bar for the planning of future treatment outcome studies.

A strong recommendation, which will be more readily attainable for many treatments at this time, requires the presence of moderate- to high-quality evidence
that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated, or on functional outcomes, again, with a clear positive balance in consideration of benefits versus possible harms and resource use. Evidence of external effectiveness or generalizability is not required for this level of recommendation.

Weak recommendations, which are not necessarily intended to discourage the use of treatments, are made when there is only low- or very low-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated and/or functional outcomes, or when the evidence suggests that the effects of the treatment may not be clinically meaningful (although they may be statistically significant). In the case of a weak recommendation, it is not clear that gains from treatment warrant the resources involved, and patient preferences will be central in determining whether engaging in the treatment is the best possible decision.

Taking Contextual Factors Into Account

It would be prohibitive, on several levels, for the Task Force to explicitly require comparative effectiveness analyses of all possible treatments or analyses of cost-effectiveness. However, when there are obvious concerns, the committee should be able to incorporate them into the recommendation. This might occur, for instance, in contextualizing the clinical meaningfulness of a treatment effect when there are other psychological treatments that have well-documented and much larger effects. Similarly, if a treatment generates an effect similar to that of other well-studied treatments but requires a very large number of sessions or length of time to generate the same effect at a much higher cost, then the Task Force may take this into consideration.

The Task Force may take into account the purported mechanism or active ingredient(s) of treatment and may upgrade or downgrade the recommendation based on the quality of evidence supporting that mechanism or ingredient(s). It is conceptually difficult to standardize this consideration into the criteria, as admittedly the mechanisms of many efficacious treatments are unclear. However, to the extent that a given treatment is based on a specific purported mechanism or relies strongly on a particular treatment ingredient, the board can and should consider whether those assertions are supported. Single-case designs are often particularly useful for such purposes. Such consideration would help reduce the risk of allocating resources to elements of treatment that are inert or worse. Below, we describe a longer-term plan for identifying active therapeutic ingredients.

Although most ESTs appear effective when applied to minority groups with specific disorders (e.g., Miranda et al., 2005), it cannot be automatically assumed that an EST that is effective for the majority population will be equally effective among minority groups. Therefore, it is important that research on treatment efficacy and effectiveness attend to the generalizability of effects across diverse populations. At this time, it would be difficult to require documentation of efficacy or effectiveness across minority groups, given the many nuances associated with assessing, treating, and modifying treatments for different populations. Furthermore, it would likely be counterproductive to identify a treatment as appropriate for minority populations unless all such populations had been studied. We therefore recommend that nominators of treatments identify specific studies demonstrating efficacy or effectiveness within a particular underrepresented group and that such findings be highlighted in the presentation of the treatment and by the Task Force when recommendations are made.

CONCLUSIONS AND FUTURE DIRECTIONS

The EST movement has, overall, provided positive direction for clinical psychology. However, several valid criticisms of the process have been offered. In this article, we propose a new approach for identifying ESTs and for recommending specific psychological treatments to practitioners, consumers, and other stakeholders. Twenty years after the original Division 12 Task Force report, such an update is long overdue. Although clinical psychology once led the way in articulating how a treatment should be determined to be empirically supported (and although many other healthcare fields still look to those original criteria for guidance), advances in the field of evidence-based medicine have rendered the old criteria obsolete.

In this article, we propose a two-stage process by which the Society of Clinical Psychology/Division 12
may help bridge the gap between the current, outdated EST criteria and the planned treatment guidelines from APA. The aim is to begin to evaluate treatments in a manner that parallels and will support the methods proposed by APA, but in a manner that lends itself to more rapid dissemination of scientific findings to those who would benefit most from them. We propose that the process of identifying one or two positive studies for a treatment cease, and that in its place we begin evaluating systematic reviews of the treatment outcome literature, weighting them according to the risk of bias in the studies contributing to the review. We further recommend that instead of labeling treatments as “well established” or “probably efficacious,” as is done under the current system, we translate the research findings into clear recommendations of very strong, strong, or weak, using well-established, widely accepted, and transparent grading guidelines. These steps, which can be implemented immediately, will greatly improve the quality of information that is disseminated.

As mentioned earlier, the APA Presidential Task Force on Evidence-Based Practice (2006) defines EBP as consisting of three components of information: best available research evidence, clinical expertise, and patient characteristics. In our view, these three components play different critical roles in clinical decision-making (e.g., Tolin, 2014), in which the best available research evidence forms the basis of clinical decisions and is interpreted, adjusted, and implemented through clinical expertise and patient characteristics. A skilled evidence-based practitioner will first identify the EST that most closely matches the concerns presented by a given patient. One EST is selected over the others by examining the available research evidence that shows the strength of the treatment and the quality of evidence. ESTs may also need to be adapted or augmented, based on patient characteristics such as comorbid psychopathology, situational factors, or cultural and demographic features. Such selection, adaptation, and augmentation procedures derive from the expertise of the clinician, guided wherever possible by the best scientific evidence (with the understanding that such research will rarely line up perfectly with the clinical problem). In this model, clinical expertise and patient characteristics do not trump the best available research evidence, nor should all three factors be considered an “either-or” selection. That is, skillful EBP does not involve selecting a treatment based on research evidence or on the clinician’s expertise or on patient characteristics. Rather, the best available research evidence (including ESTs) forms the basis of clinical judgment, with additional selection and modification based on clinical expertise and patient characteristics. We hope that the changes proposed in this article to how ESTs are evaluated and disseminated will help EBP practitioners reach appropriate conclusions based on the best available clinical science.

REFERENCES
American Psychiatric Association. (2009). Practice guideline for the treatment of patients with panic disorder (2nd ed.). Washington, DC: Author.
American Psychiatric Association. (2010). Practice guideline for the treatment of patients with major depressive disorder (3rd ed.). Washington, DC: Author.
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.
American Psychological Association. (1995). Template for developing guidelines: Interventions for mental disorders and psychosocial aspects of physical disorders. Washington, DC: Author.
American Psychological Association. (2002). Criteria for evaluating treatment guidelines. American Psychologist, 57, 1052–1059. doi:10.1037//0003-066X.57.12.1052
Amir, N., Beard, C., Burns, M., & Bomyea, J. (2009). Attention modification program in individuals with generalized anxiety disorder. Journal of Abnormal Psychology, 118(1), 28–33. doi:10.1037/a0012589
Andreasen, N. C., Carpenter, W. T., Jr., Kane, J. M., Lasser, R. A., Marder, S. R., & Weinberger, D. R. (2005). Remission in schizophrenia: Proposed criteria and rationale for consensus. American Journal of Psychiatry, 162, 441–449. doi:10.1176/appi.ajp.162.3.441
APA Presidential Task Force on Evidence-Based Practice. (2006). Evidence-based practice in psychology. American Psychologist, 61, 271–285. doi:10.1037/0003-066X.61.4.271
Arean, P. A., & Kraemer, H. C. (2013). High-quality psychotherapy research. New York, NY: Oxford University Press.
Atkins, D., Eccles, M., Flottorp, S., Guyatt, G. H., Henry, D., Hill, S., . . . GRADE Working Group. (2004). Systems for grading the quality of evidence and the strength of recommendations I: Critical appraisal of existing approaches: The GRADE Working Group. BMC Health Services Research, 4(1), 38. doi:10.1186/1472-6963-4-38
Baker, M., & Kleijnen, J. (2000). The drive towards evidence-based health care. In N. Rowland & S. Goss (Eds.), Evidence-based counseling and psychological therapies: Research and applications (pp. 13–29). New York, NY: Routledge.
Bellg, A. J., Borrelli, B., Resnick, B., Hecht, J., Minicucci, D. S., Ory, M., . . . Treatment Fidelity Workgroup of the NIH-BCC. (2004). Enhancing treatment fidelity in health behavior change studies: Best practices and recommendations from the NIH Behavior Change Consortium. Health Psychology, 23, 443–451. doi:10.1037/0278-6133.23.5.443
Benish, S. G., Quintana, S., & Wampold, B. E. (2011). Culturally adapted psychotherapy and the legitimacy of myth: A direct-comparison meta-analysis. Journal of Counseling Psychology, 58, 279–289. doi:10.1037/a0023626
Bernardy, K., Fuber, N., Kollner, V., & Hauser, W. (2010). Efficacy of cognitive-behavioral therapies in fibromyalgia syndrome—A systematic review and meta-analysis of randomized controlled trials. Journal of Rheumatology, 37, 1991–2005. doi:10.3899/jrheum.100104
Bero, L. A. (2013). Why the Cochrane risk of bias tool should include funding source as a standard item. Cochrane Database Systematic Review, 12, ED000075. doi:10.1002/14651858.ED000075
Beutler, L. E. (1998). Identifying empirically supported treatments: What if we didn’t? Journal of Consulting and Clinical Psychology, 66, 113–120. doi:10.1037/0022-006X.66.1.113
Blatt, S. J., & Zuroff, D. C. (2005). Empirical evaluation of the assumptions in identifying evidence based treatments in mental health. Clinical Psychology Review, 25, 459–486. doi:10.1016/j.cpr.2005.03.001
Borkovec, T. D., & Castonguay, L. G. (1998). What is the scientific meaning of empirically supported therapy? Journal of Consulting and Clinical Psychology, 66, 136–142. doi:10.1037/0022-006X.66.1.136
Castonguay, L. G., & Beutler, L. E. (2006). Principles of therapeutic change that work. New York, NY: Oxford University Press.
Chambless, D. L., Baker, M. J., Baucom, D. H., Beutler, L. E., Calhoun, K. S., Crits-Christoph, P., . . . Woody, S. R. (1998). Update on empirically validated therapies. II. The Clinical Psychologist, 51(1), 3–16.
Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7–18. doi:10.1037/0022-006X.66.1.7
Chambless, D. L., & Ollendick, T. H. (2001). Empirically supported psychological interventions: Controversies and evidence. Annual Review of Psychology, 52, 685–716. doi:10.1146/annurev.psych.52.1.685
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Commission on Accreditation. (2013). Guidelines and principles for accreditation of programs in professional psychology. Washington, DC: American Psychological Association.
Cowen, E. L. (1991). In pursuit of wellness. American Psychologist, 46, 404. doi:10.1037/0003-066X.46.4.40
Djulbegovic, B., & Guyatt, G. H. (2014). Evidence-based practice is not synonymous with delivery of uniform health care. JAMA, 312, 1293–1294. doi:10.1001/jama.2014.10713
Dobson, K., & Beshai, S. (2013). The theory-practice gap in cognitive behavioral therapy: Reflections and a modest proposal to bridge the gap. Behavior Therapy, 44, 559–567. doi:10.1016/j.beth.2013.03.002
Doyle, A. C., & Pollack, M. H. (2003). Establishment of remission criteria for anxiety disorders. Journal of Clinical Psychiatry, 64(Suppl. 15), 40–45.
Feeny, N. C., Zoellner, L. A., & Foa, E. B. (2002). Treatment outcome for chronic PTSD among female assault victims with borderline personality characteristics: A preliminary examination. Journal of Personality Disorders, 16(1), 30–40. doi:10.1521/pedi.16.1.30.22555
Fensterheim, H., & Raw, S. D. (1996). Psychotherapy research is not psychotherapy practice. Clinical Psychology: Science and Practice, 3, 168–171. doi:10.1111/j.1468-2850.1996.tb00067.x
Fonagy, P. (1999). Achieving evidence-based psychotherapy practice: A psychodynamic perspective on the general acceptance of treatment manuals. Clinical Psychology: Science and Practice, 6, 442–444. doi:10.1093/clipsy.6.4.442
Frank, E., Prien, R. F., Jarrett, R. B., Keller, M. B., Kupfer, D. J., Lavori, P. W., . . . Weissman, M. M. (1991). Conceptualization and rationale for consensus definitions of terms in major depressive disorder: Remission, recovery, relapse, and recurrence. Archives of General Psychiatry, 48, 851–855. doi:10.1001/archpsyc.1991.01810330075011
Franklin, M. E., Abramowitz, J. S., Kozak, M. J., Levitt, J. T., & Foa, E. B. (2000). Effectiveness of exposure and ritual prevention for obsessive-compulsive disorder: Randomized compared with nonrandomized samples. Journal of Consulting and Clinical Psychology, 68, 594–602. doi:10.1037/0022-006X.68.4.59
Friedman, L. S., & Richter, E. D. (2004). Relationship between conflicts of interest and research results. Journal of General Internal Medicine, 19(1), 51–56. doi:10.1111/j.1525-1497.2004.30617.x
Gaffan, E. A., Tsaousis, I., & Kemp-Wheeler, S. M. (1995). Researcher allegiance and meta-analysis: The case of cognitive therapy for depression. Journal of Consulting and Clinical Psychology, 63, 966–980. doi:10.1037/0022-006X.63.6.966
Gibbons, C. R., Stirman, S. W., DeRubeis, R. J., Newman, C. F., & Beck, A. T. (2013). Research setting versus clinic setting: Which produces better outcomes in cognitive therapy for depression? Cognitive Therapy and Research, 37, 605–612. doi:10.1007/s10608-012-9499-7
Gill, T. M., & Feinstein, A. R. (1994). A critical appraisal of the quality of quality-of-life measurements. JAMA, 272, 619–626. doi:10.1001/jama.1994.03520080061045
Goldfried, M. R., & Eubanks-Carter, C. (2004). On the need for a new psychotherapy research paradigm: Comment on Westen, Novotny, and Thompson-Brenner (2004). Psychological Bulletin, 130, 669–673; author reply 677–683. doi:10.1037/0033-2909.130.4.669
Goldfried, M. R., & Wolfe, B. E. (1996). Psychotherapy practice and research: Repairing a strained alliance. American Psychologist, 51, 1007–1016. doi:10.1037//0003-066X.51.10.1007
Goldfried, M. R., & Wolfe, B. E. (1998). Toward a more clinically valid approach to therapy research. Journal of Consulting and Clinical Psychology, 66, 143–150. doi:10.1037/0022-006X.66.1.143
Gonzales, J. J., & Chambers, D. A. (2002). The tangled and thorny path of science to practice: Tensions in interpreting and applying “evidence.” Clinical Psychology: Science and Practice, 9, 204–209. doi:10.1093/clipsy.9.2.204
Greenland, S., & O’Rourke, K. (2001). On the bias produced by quality scores in meta-analysis, and a hierarchical view of proposed solutions. Biostatistics, 2, 463–471. doi:10.1093/biostatistics/2.4.463
Griner, D., & Smith, T. B. (2006). Culturally adapted mental health intervention: A meta-analytic review. Psychotherapy (Chic), 43, 531–548. doi:10.1037/0033-3204.43.4.531
Guyatt, G. H., Gutterman, D., Baumann, M. H., Addrizzo-Harris, D., Hylek, E. M., Phillips, B., . . . Schunemann, H. (2006). Grading strength of recommendations and quality of evidence in clinical guidelines: Report from an American College of Chest Physicians task force. Chest, 129(1), 174–181. doi:10.1378/chest.129.1.174
Guyatt, G. H., Oxman, A. D., Vist, G. E., Kunz, R., Falck-Ytter, Y., Alonso-Coello, P., . . . GRADE Working Group. (2008). GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. British Medical Journal, 336, 924–926. doi:10.1136/bmj.39489.470347.AD
Hayes, S. C. (2004). Acceptance and commitment therapy, relational frame theory, and the third wave of behavioral and cognitive therapies. Behavior Therapy, 35, 639–665. doi:10.1016/S0005-7894(04)80013-3
Henry, W. P. (1998). Science, politics, and the politics of science: The use and misuse of empirically validated treatment research. Psychotherapy Research, 8(2), 126–140. doi:10.1093/ptr/8.2.126
Herbert, J. D. (2003). The science and practice of empirically supported treatments. Behavior Modification, 27, 412–430. doi:10.1177/0145445503253836
Higgins, J. P., Altman, D. G., Gotzsche, P. C., Juni, P., Moher, D., Oxman, A. D., . . . Cochrane Statistical Methods Group. (2011). The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. British Medical Journal, 343, d5928. doi:10.1136/bmj.d5928
Higgins, J. P., & Green, S. (Eds.). (2008). Cochrane handbook for systematic reviews of interventions. Hoboken, NJ: Wiley-Blackwell.
Hollon, S. D. (1999). Allegiance effects in treatment research: A commentary. Clinical Psychology: Science and Practice, 6, 107–112. doi:10.1093/clipsy.6.1.107
Hollon, S. D., Arean, P. A., Craske, M. G., Crawford, K. A., Kivlahan, D. R., Magnavita, J. J., . . . Kurtzman, H. (2014). Development of clinical practice guidelines. Annual Review of Clinical Psychology, 10, 213–241. doi:10.1146/annurev-clinpsy-050212-185529
Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Child, 71(2), 165–179. doi:10.1177/001440290507100203
Hunt, S. M., & McKenna, S. P. (1993). Measuring quality of life in psychiatry. In S. R. Walker & R. M. Rossner (Eds.), Quality of life assessment: Key issues in the 1990s (pp. 343–354). Boston: Kluwer Academic.
Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., . . . Wang, P. (2010). Research domain criteria (RDoC): Toward a new classification framework for research on mental disorders. American Journal of Psychiatry, 167, 748–751. doi:10.1176/appi.ajp.2010.09091379
Institute of Medicine. (2001). Crossing the quality chasm: A new health system for the 21st century. Washington, DC: National Academies Press.
Institute of Medicine. (2011a). Clinical practice guidelines we can trust. Washington, DC: National Academies Press.
Institute of Medicine. (2011b). Finding what works in health care: Standards for systematic reviews. Washington, DC: National Academies Press.
Ioannidis, J. P. (2008). Interpretation of tests of heterogeneity and bias in meta-analysis. Journal of Evaluation in Clinical Practice, 14, 951–957. doi:10.1111/j.1365-2753.2008.00986.x
Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Toward a standard definition of clinically significant change. Behavior Therapy, 17, 308–311. doi:10.1016/S0005-7894(86)80061-2
Juni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring the quality of clinical trials for meta-analysis. JAMA, 282, 1054–1060. doi:10.1001/jama.282.11.1054
Keltner, J. R., Vaida, F., Ellis, R. J., Moeller-Bertram, T., Fitzsimmons, C., Duarte, N. A., . . . CHARTER Group. (2012). Health-related quality of life ‘well-being’ in HIV distal neuropathic pain is more strongly associated with depression severity than with pain intensity. Psychosomatics, 53, 380–386. doi:10.1016/j.psym.2012.05.002
Kocsis, J. H., Gerber, A. J., Milrod, B., Roose, S. P., Barber, J., Thase, M. E., . . . Leon, A. C. (2010). A new scale for assessing the quality of randomized clinical trials of psychotherapy. Comprehensive Psychiatry, 51, 319–324. doi:10.1016/j.comppsych.2009.07.001
Kushner, S. C., Quilty, L. C., McBride, C., & Bagby, R. M. (2009). A comparison of depressed patients in randomized versus nonrandomized trials of antidepressant medication and psychotherapy. Depression and Anxiety, 26, 666–673. doi:10.1002/da.20566
Lambert, M. J., & Bailey, R. J. (2012). Measures of clinically significant change. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology, Vol. 3: Data analysis and research publication (pp. 147–160). Washington, DC: American Psychological Association.
Lambert, M. J., & Barley, D. E. (2001). Research summary on the therapeutic relationship and psychotherapy outcome. Psychotherapy, 38, 357–361. doi:10.1037/0033-3204.38.4.357
Lambert, M. J., Hansen, N. B., Umphress, V., Lunnen, K. M., Okiishi, J., Burlingame, G. M., . . . Reisinger, C. R. (1996). Administration and scoring manual for the Outcome Questionnaire (OQ-45.2). Wilmington, DE: American Professional Credentialing Services.
Layard, R., & Clark, D. M. (2014). Thrive: The power of evidence-based psychological therapies. London, UK: Penguin.
Lejuez, C. W., Hopko, D. R., & Hopko, S. D. (2001). A brief behavioral activation treatment for depression: Treatment manual. Behavior Modification, 25, 255–286. doi:10.1177/0145445501252005
Leon, A. C., Solomon, D. A., Mueller, T. I., Turvey, C. L., Endicott, J., & Keller, M. B. (1999). The Range of Impaired Functioning Tool (LIFE-RIFT): A brief measure of functional impairment. Psychological Medicine, 29, 869–878. doi:10.1017/S0033291799008570
Levant, R. F. (2004). The empirically validated treatments movement: A practitioner/educator perspective. Clinical Psychology: Science and Practice, 11, 219–224. doi:10.1093/clipsy.bph075
Lewinsohn, P. M., Biglan, A., & Zeiss, A. M. (1976). Behavioral treatment for depression. In P. O. Davidson (Ed.), Behavioral management of anxiety, depression, and pain (pp. 91–146). New York, NY: Brunner/Mazel.
Lexchin, J., Bero, L. A., Djulbegovic, B., & Clark, O. (2003). Pharmaceutical industry sponsorship and research outcome and quality: Systematic review. BMJ, 326, 1167–1170. doi:10.1136/bmj.326.7400.1167
Leykin, Y., & DeRubeis, R. J. (2009). Allegiance in psychotherapy outcome research: Separating association from bias. Clinical Psychology: Science and Practice, 16, 54–65. doi:10.1111/j.1468-2850.2009.01143.x
Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gotzsche, P. C., Ioannidis, J. P., . . . Moher, D. (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. PLoS Medicine, 6(7), e1000100. doi:10.1371/journal.pmed.1000100
Lock, J., Le Grange, D., Agras, W. S., Moye, A., Bryson, S. W., & Jo, B. (2010). Randomized clinical trial comparing family-based treatment with adolescent-focused individual therapy for adolescents with anorexia nervosa. Archives of General Psychiatry, 67, 1025–1032. doi:10.1001/archgenpsychiatry.2010.128
Lohr, J. M., Tolin, D. F., & Lilienfeld, S. O. (1998). Efficacy of eye movement desensitization and reprocessing: Implications for behavior therapy. Behavior Therapy, 29, 123–156. doi:10.1016/S0005-7894(98)80035-X
van Loon, A., van Schaik, A., Dekker, J., & Beekman, A. (2013). Bridging the gap for ethnic minority adult outpatients with depression and anxiety disorders by culturally adapted treatments. Journal of Affective Disorders, 147(1–3), 9–16. doi:10.1016/j.jad.2012.12.014
Luborsky, L., Diguer, L., Seligman, D. A., Rosenthal, R., Krause, E. D., Johnson, S., . . . Schweizer, E. (1999). The researcher’s own therapy allegiances: A “wild card” in comparisons of treatment efficacy. Clinical Psychology: Science and Practice, 6, 95–106. doi:10.1093/clipsy.6.1.95
Lundh, A., Sismondo, S., Lexchin, J., Busuioc, O. A., & Bero, L. (2012). Industry sponsorship and research outcome. Cochrane Database Systematic Review, 12, MR000033. doi:10.1002/14651858.MR000033.pub2
Martell, C. R., Addis, M. E., & Jacobson, N. S. (2001). Depression in context: Strategies for guided action. New York, NY: W. W. Norton.
McIntyre, R. S., Fallu, A., & Konarski, J. Z. (2006). Measurable outcomes in psychiatric disorders: Remission as a marker of wellness. Clinical Therapeutics, 28, 1882–1891. doi:10.1016/j.clinthera.2006.11.007
Milette, K., Roseman, M., & Thombs, B. D. (2011). Transparency of outcome reporting and trial registration of randomized controlled trials in top psychosomatic and behavioral health journals: A systematic review. Journal of Psychosomatic Research, 70, 205–217. doi:10.1016/j.jpsychores.2010.09.015
Milrod, B., Leon, A. C., Busch, F., Rudden, M., Schwalberg, M., Clarkin, J., . . . Shear, M. K. (2007). A randomized controlled clinical trial of psychoanalytic psychotherapy for panic disorder. American Journal of Psychiatry, 164, 265–272. doi:10.1176/appi.ajp.164.2.265
Miranda, J., Bernal, G., Lau, A., Kohn, L., Hwang, W. C., & LaFromboise, T. (2005). State of the science on psychosocial interventions for ethnic minorities. Annual Review of Clinical Psychology, 1, 113–142. doi:10.1146/annurev.clinpsy.1.102803.143822
Moher, D., Schulz, K. F., & Altman, D. G. (2001). The CONSORT statement: Revised recommendations for improving the quality of reports of parallel group randomized trials. BMC Medical Research Methodology, 1, 2. doi:10.7326/0003-4819-134-8-200104170-00011
Muldoon, M. F., Barger, S. D., Flory, J. D., & Manuck, S. B. (1998). What are quality of life measurements measuring? British Medical Journal, 316, 542–545. doi:10.1136/bmj.316.7130.542
Mundt, J. C., Marks, I. M., Shear, M. K., & Greist, J. H. (2002). The Work and Social Adjustment Scale: A simple measure of impairment in functioning. British Journal of Psychiatry, 180, 461–464. doi:10.1192/bjp.180.5.46
National Institute for Clinical Excellence. (2011). Generalised anxiety disorder and panic disorder (with or without agoraphobia) in adults: Management in primary, secondary and community care (National Clinical Practice Guideline number CG113). London: British Psychological Society and Royal College of Psychiatrists.
Norcross, J. C. (1999). Collegially validated limitations of empirically validated treatments. Clinical Psychology: Science and Practice, 6, 472–476. doi:10.1093/clipsy.6.4.472
O’Farrell, T. J., Cutter, H. S. G., Choquette, K. A., Floyd, F. J., & Bayog, R. D. (1992). Behavioral marital therapy for male alcoholics: Marital and drinking adjustment during the two years after treatment. Behavior Therapy, 23, 529–549. doi:10.1016/S0005-7894(05)80220-5
Parker, G., Parker, I., Brotchie, H., & Stuart, S. (2006). Interpersonal psychotherapy for depression? The need to define its ecological niche. Journal of Affective Disorders, 95(1–3), 1–11. doi:10.1016/j.jad.2006.03.019
Perlis, R. H., Perlis, C. S., Wu, Y., Hwang, C., Joseph, M., & Nierenberg, A. A. (2005). Industry sponsorship and financial conflict of interest in the reporting of clinical trials in psychiatry. American Journal of Psychiatry, 162, 1957–1960. doi:10.1176/appi.ajp.162.10.1957
Perri, M. G., Sears, S. F., Jr., & Clark, J. E. (1993). Strategies for improving maintenance of weight loss: Toward a continuous care model of obesity management. Diabetes Care, 16(1), 200–209. doi:10.2337/diacare.16.1.200
Petry, N. M., Martin, B., Cooney, J. L., & Kranzler, H. R. (2000). Give them prizes, and they will come: Contingency management for treatment of alcohol dependence. Journal of Consulting and Clinical Psychology, 68, 250–257. doi:10.1037//0022-006X.68.2.250
Riehm, K. E., Azar, M., & Thombs, B. D. (2015). Transparency of outcome reporting and trial registration of randomized controlled trials in top psychosomatic and behavioral health journals: A 5-year follow-up. Journal of Psychosomatic Research, 79(1), 1–12. doi:10.1016/j.jpsychores.2015.04.010
Roseman, M., Milette, K., Bero, L. A., Coyne, J. C., Lexchin, J., Turner, E. H., & Thombs, B. D. (2011). Reporting of conflicts of interest in meta-analyses of trials of pharmacological treatments. JAMA, 305, 1008–1017. doi:10.1001/jama.2011.257
Roseman, M., Turner, E. H., Lexchin, J., Coyne, J. C., Bero, L. A., & Thombs, B. D. (2012). Reporting of conflicts of interest from drug trials in Cochrane reviews: Cross sectional study. British Medical Journal, 345, e5155. doi:10.1136/bmj.e5155
Rosen, G. M., & Davison, G. C. (2003). Psychology should list empirically supported principles of change (ESPs) and not credential trademarked therapies or other treatment packages. Behavior Modification, 27, 300–312. doi:10.1177/0145445503027003003
Ruzek, J. I., Karlin, B. E., & Zeiss, A. M. (2012). Implementation of evidence-based psychological treatments in the Veterans Health Administration. In R. K. McHugh & D. H. Barlow (Eds.), Dissemination of evidence-based psychological treatments (pp. 78–96). New York, NY: Oxford University Press.
Savovic, J., Jones, H. E., Altman, D. G., Harris, R. J., Juni, P., Pildal, J., . . . Sterne, J. A. (2012). Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Annals of Internal Medicine, 157, 429–438. doi:10.7326/0003-4819-157-6-201209180-00537
Schneier, F. R., Heckelman, L. R., Garfinkel, R., Campeas, R., Fallon, B. A., Gitow, A., . . . Liebowitz, M. R. (1994). Functional impairment in social phobia. Journal of Clinical Psychiatry, 55, 322–331.
Seligman, M. E. (1995). The effectiveness of psychotherapy: The Consumer Reports Study. American Psychologist, 50, 965–974. doi:10.1037/0003-066X.50.12.965
Seligman, M. E. (1996). Science as an ally of practice. American Psychologist, 51, 1072–1079. doi:10.1037/0003-066X.51.10.1072
Shadish, W. R. (2014). Statistical analyses of single-case designs: The shape of things to come. Current Directions in Psychological Science, 23, 139–146. doi:10.1177/0963721414524773
Shea, B. J., Bouter, L. M., Peterson, J., Boers, M., Andersson, N., Ortiz, Z., . . . Grimshaw, J. M. (2007). External validation of a measurement tool to assess systematic reviews (AMSTAR). PLoS ONE, 2, e1350. doi:10.1371/journal.pone.0001350
studies on psychosocial interventions in autism. Journal of Autism and Developmental Disorders, 37, 354–366. doi:10.1007/s10803-006-0173-3
Stirman, S. W., DeRubeis, R. J., Crits-Christoph, P., & Rothman, A. (2005). Can the randomized controlled trial literature generalize to nonrandomized patients? Journal of Consulting and Clinical Psychology, 73, 127–135. doi:10.1037/0022-006X.73.1.127
Task Force on Promotion and Dissemination of Psychological Procedures. (1993). Training in and dissemination of empirically-validated psychological treatments: Report and recommendation. The Clinical Psychologist, 48, 3–23.
Thoma, N. C., McKay, D., Gerber, A. J., Milrod, B. L., Edwards, A. R., & Kocsis, J. H. (2012). A quality-based review of randomized controlled trials of cognitive-behavioral therapy for depression: An assessment and metaregression. American Journal of Psychiatry, 169, 22–30. doi:10.1176/appi.ajp.2011.11030433
Thombs, B. D., Kwakkenbos, L., & Coronado-Montoya, S. (2014). Trial registration in rheumatology: The next step. Arthritis Care and Research, 66, 1435–1437. doi:10.1002/acr.22335
Tolin, D. F. (2014). Evidence-based practice: Three-legged stool or filter system? The Clinical Psychologist, 67(3), 1–3.
Vatne, S., & Bjorkly, S. (2008). Empirical evidence for using
Shea, B. J., Grimshaw, J. M., Wells, G. A., Boers, M., subjective quality of life as an outcome variable in clinical
Andersson, N., Hamel, C., . . . Bouter, L. M. (2007). studies: A meta-analysis of correlates and predictors in
Development of AMSTAR: A measurement tool to assess persons with a major mental disorder living in the
the methodological quality of systematic reviews. BMC community. Clinical Psychology Review, 28, 869–889.
Medical Research Methodology, 7, 10. doi:10.1186/1471-2288- doi:10.1016/j.cpr.2008.01.001
7-10 Veterans Health Administration, Department of Defense.
Shea, B. J., Hamel, C., Wells, G. A., Bouter, L. M., (2004). VA/DoD clinical practice guideline for the
Kristjansson, E., Grimshaw, J., . . . Boers, M. (2009). management of post-traumatic stress. Version 1.0.
AMSTAR is a reliable and valid measurement tool to Retrieved January 2004.
assess the methodological quality of systematic reviews. Veterans Health Administration, Department of Defense.
Journal of Clinical Epidemiology, 62, 1013–1020. (2009). VA/DoD clinical practice guideline for the
doi:10.1016/j.jclinepi.2008.10.009 management of major depressive disorder. Version 2.0.
Sheehan, D. V. (2008). Sheehan Disability Scale. In A. Rush, Retrieved June 2015.
M. First, & D. Blacker, (Eds.), Handbook of psychiatric Wachtel, P. L. (2010). Beyond “ESTs:” Problematic
measures (2nd. ed., pp. 100–102). Washington, DC: assumptions in the pursuit of evidence-based practice.
American Psychiatric Publishing. Psychoanalytic Psychology, 27, 251–272. doi:10.1037/
Simpson, H. B., Huppert, J. D., Petkova, E., Foa, E. B., & a0020532
Liebowitz, M. R. (2006). Response versus remission in Westen, D., Novotny, C. M., & Thompson-Brenner, H.
obsessive-compulsive disorder. Journal of Clinical Psychiatry, (2004). The empirical status of empirically supported
67, 269–276. doi:10.4088/jcp.v67n0214 psychotherapies: Assumptions, findings, and reporting in
Smith, T., Scahill, L., Dawson, G., Guthrie, D., Lord, C., controlled clinical trials. Psychological Bulletin, 130, 631–
Odom, S., . . . Wagner, A. (2007). Designing research 663. doi:10.1037/0033-2909.130.4.63

Wood, L., Egger, M., Gluud, L. L., Schulz, K. F., Juni, P., Altman, D. G., . . . Sterne, J. A. (2008). Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: Meta-epidemiological study. British Medical Journal, 336, 601–605. doi:10.1136/bmj.39465.451748.AD
Yeomans, F. E., Levy, K. N., & Caligor, E. (2013). Transference-focused psychotherapy. Psychotherapy (Chic), 50, 449–453. doi:10.1037/a0033417
Zalta, A. K. (2015). Psychological mechanisms of effective cognitive-behavioral treatments for PTSD. Current Psychiatry Reports, 17, 560. doi:10.1007/s11920-015-0560-6

Received April 4, 2015; revised June 15, 2015; accepted June 19, 2015.

SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article:
Table S1. Domains for determining risk of bias in systematic reviews of psychotherapy outcome studies.
Table S2. Summary assessments of risk of bias.
Table S3. Scoring criteria for the Assessment of Multiple Systematic Reviews System.
