Robey et al. 1999
6, 445–473
Review
Single-subject clinical-outcome research:
designs, data, effect sizes, and analyses
Abstract
In the last 20 years, single-subject research designs have become important forms of aphasia-treatment research for assessing the effectiveness of treatment on a subject-by-subject (or patient-by-patient) basis. In that time, several important developments in the statistical literature centring on the reliability and validity of single-subject research have occurred. This work assesses the state of aphasia-treatment single-subject research in the context of that scholarship through a tutorial-like presentation: it details the analysis of published single-subject results and proposes recommendations concerning future applications of single-subject designs. The work focuses on four domains: designs, data, effect sizes, and analyses. The findings indicate that aphasia-treatment single-subject studies, which are well designed for the most part, yield short series of autocorrelated data manifesting generally large treatment effects. However, only one analysis satisfactorily controlled Type I and Type II errors under typical clinical-aphasiology applications. That procedure, ITSACORR, is easily accomplished and it expresses outcome in familiar terms. To facilitate understanding, the review promotes a hands-on understanding of the various analysis options through worked examples and clarifies the (in)appropriateness of each procedure for clinical applications. Although the focus of the work is treatment for aphasia, the central thesis has general application across disorder categories.
Introduction
Communication disorders scientists and practitioners currently experience a transition of great potential. The movement is toward providing evidence arising out of broadly-accepted forms of experimentation for testing that treatments are effective, and away from idiosyncratic and less effective forms of experimentation and evidence (Robey and Schultz 1998). The outcome of the transition will be acceptance by (a) the public, (b) those influential in the creation of public policy, and (c) public and private reimbursers of therapy services, that treatments for communication disorders are demonstrably effective as evaluated through stringent
Address correspondence to: Randall R. Robey, Communication Disorders Program, University of Virginia, Suite 202, 2205 Fontaine Avenue, Charlottesville, VA 22903, USA.
domains: (a) design (e.g. What types of design should (not) be used to assess clinical outcomes of behavioural treatments?); (b) data (e.g. Are behavioural single-subject data autocorrelated and, if so, does the observed degree of autocorrelation affect the validity or reliability of analysis decisions?); (c) effect size (e.g. Can the results of a single-subject design be quantified to yield an estimate of effect size and so make the application of meta-analysis procedures possible?); and (d) analysis (e.g. Is the visual analysis of single-subject data valid and reliable? Are statistical analyses of single-subject data valid and reliable?). Scholarship addressing these issues in the statistical literature has direct application in aphasia-treatment single-subject experimentation and provides answers for timely and focused questions in each of the four domains.
• Given that treatments for aphasia impart a carry-over effect when they are successful, what single-subject research designs are prescribed and proscribed by this threat to internal validity?
• Are single-subject aphasia-treatment data autocorrelated and, if so, to what extent? What is the impact of this potential threat to statistical conclusion validity?
• Can aphasia-treatment single-subject outcomes be quantified for calculating estimates of effect size? If so, do single-subject designs produce appreciable effect sizes? and
• How are aphasia-treatment single-subject data validly and reliably analysed?
Purpose
This work centres on the designs, data, effect sizes, and analyses of single-subject aphasia-treatment research. The purpose of the work is to accomplish two objectives in each of these four domains: (a) review current scholarship regarding single-subject experimentation reported in the statistical and clinical research literatures; and (b) assess the state of single-subject aphasia-treatment experimentation. The assessment of available effect sizes constitutes an elementary meta-analysis. Recommendations for optimizing future applications of aphasia-treatment single-subject research are proposed throughout.
Methods
The first objective was accomplished through a review of relevant statistical literature. Since writings on the analysis of single-subject data are much more numerous than writings in the other three domains, that section of this work is more extensive than the others. To facilitate communication, the analysis section contains a series of step-by-step worked examples and illustrations.
An extensive search of aphasia-treatment literature reported by Robey (1998)
[Footnote 1] Autocorrelation refers to the relationship (i.e. the degree of predictability, or lack of independence) between an observation of a subject and observations of the same subject made at later points in time. See the Data section of this work for a technical definition and example.
[Footnote 2] A carry-over effect occurs when the effect of one treatment is not washed out (i.e. behaviour does not return to baseline) prior to the administration of a second treatment. As a result, the observation of behaviour in the second period will result from an interaction of the first and second treatments (see Robey and Schultz 1998).
[Footnote 3] The reader is referred to Cook and Campbell (1979) and to Robey and Schultz (1993) for explanations of internal validity as well as statistical conclusion validity mentioned in the next point.
Designs
Levin (1992) describes two classes of motivation for single-subject research designs: those generating hypotheses and those testing hypotheses once generated. The former is appropriate in exploratory research when the focus of inquiry is exploration of the several dimensions of a new treatment: amount of treatment, materials, protocols, and contingencies (see Phase I and Phase II research in Robey and Schultz 1998). These early and purposeful considerations are necessary for specifying the null hypothesis to be tested in the efficacy stage of clinical-outcome research (see Phase III research in Robey and Schultz 1998). The second class applies to confirmatory research wherein the focus is an already particularized null hypothesis. That is, hypotheses are selected for single-subject research to determine effectiveness of a specific treatment administered to patients having one or more specific attributes through some particular means of service delivery (see Phase IV research in Robey and Schultz 1998).
Focused hypotheses for testing treatment effectiveness must be replicated. Hilliard (1993) explains that the necessary sequence is first direct replication (additional experiments using similar subjects and similar circumstances to determine the reliability of the effectiveness of treatment) and then systematic replication (i.e. replications with thoughtfully selected differences to determine limits on the generality of effectiveness). Said differently, testing the effectiveness of treatments encompasses a programme of serial testing of focused, theory-driven, single-subject null hypotheses (Kearns and Thompson 1991b) in addition to similarly focused group studies.
The difficulty in effectiveness research is that one cannot compare multiple treatments in a single individual without suffering the negative consequences of a carry-over effect in the analysis of outcome (Shapiro et al. 1982, Kratochwill and Williams 1988, Senn 1993, Franklin et al. 1996, Fukkink 1996, Backman et al. 1997). By way of explanation, a carry-over effect occurs when the effect of a prior treatment influences a measurement made following the next (or a later) treatment (Senn 1993). That is, the effect of an early treatment persists and influences measurements made after the administration of (an)other treatment(s). It would be unreasonable to expect the effect of aphasia treatment to wash out between experimental periods and undesirable if it occurred. When successful, treatment of aphasia imparts permanent change; a carry-over effect is the expected and necessary outcome. Therefore, two treatments applied to one person cannot be compared on the basis of a common baseline of performance unaltered by treatment. Moreover, it is unreasonable to expect the effects of treatments to be linearly additive; that is, one cannot expect that if treatment 1 brings about a units of change, and treatment 2 brings about b units of change, administering treatment 1 and then treatment 2 would yield a total magnitude of a plus b units of change. As a result, direct comparisons of two treatments administered to the same subject or subjects often yield ambiguous findings (Kazdin 1986). The most direct solution is to test one treatment per subject (Eick and Kofoed 1994, Fukkink 1996).
Table 1. Catalogue of reported designs and their frequencies

Design                                    Number    Multiple baseline
A, B                                          9            9
A, B, A                                      34           29
A, B, A, B                                    3            1
A, B, A, B, A                                 3            1
A, B, A, C, A                                 1            0
A, B, A, C, B                                 1            0
A, B+C, A                                     1            1
A, B, C                                       1            1
A, B, C, A                                    3            0
A, B, C, B, C                                 1            0
A, B, C, D, E, A                              1            0
A, B/C, A (alternating treatments)            4            2
A, B+C, B, B+C, A, B, B+C, B, A               1            1
the period was likewise coded A. Periods labelled 'follow up' and 'maintenance' to designate post-treatment periods of no treatment were also coded A.
Some authors have used the term 'reversal period' to connote a period of no treatment (e.g. Connors and Wells 1982, Yaden 1995) and these were coded A. Classically, the terms 'withdrawal' and 'reversal' are distinct. Bloom and Fischer (1982), Hersen and Barlow (1976), Kazdin (1982), and Kratochwill (1978) point out that a reversal period is one in which active treatment persists. In a reversal period, either the reinforcement-contingency protocol changes or treatment is directed toward a different target behaviour.
The first period of active treatment was always coded B; all later periods in which that same treatment was again implemented were also coded B. Any period for implementing a second and distinct treatment was coded C; periods dedicated to a third treatment were coded D, and so forth.
Nine of the 63 studies reported a classic AB design with multiple baseline controls. Thirty-four studies reported a withdrawal control in the form of an ABA design (withdrawal design). Twenty-nine of those ABA designs also incorporated multiple baseline controls. In all, 45 studies reported some form of a multiple baseline design (e.g. across behaviours, across subjects; see Kearns and Thompson 1991a for a similar finding). In addition, 38 of the 63 designs incorporated some form of generalization probe.
Forty-nine of the 63 studies compared one or more periods of a single treatment with one or more periods of no treatment. The remaining 14 studies compared two or more treatments in sequence. The number of subjects reported in each of the 63 studies ranged from 1 to 10 with a mean of 3 and a standard deviation of 2. The most frequently reported n was 1 (26 studies). A catalogue of the reported designs and their frequencies is listed in table 1.
Because the number of baseline observations and the number of treatment observations influence the validity and the reliability of conclusions, these design attributes of the 63 studies were indexed. In multiple-baseline designs, the numbers of observations per period were averaged for each subject and those values were then averaged across subjects so that each study contributed one value for each period (all fractions were rounded up to whole numbers, as per Cohen 1988).
Data
The property of time-series data known as autocorrelation (also known as autoregression, serial dependence, and serial correlation) biases conclusions drawn from analyses of those data. Positive autocorrelation leads to liberally biased errors and negative autocorrelation leads to conservatively biased errors (Crosbie 1987). By way of explanation, autocorrelation indexes the relationship between observations taken at one point in time and observations on the same subject taken at a different time. The autocorrelation coefficient is nothing more than the Pearson product-moment correlation coefficient for two vectors of the same data. For a first-order autocorrelation coefficient, the first vector contains the observations as they were collected in series. The first element of the second vector is set empty; the second element is the first observation; the third element is the second observation, and so forth. The second vector is said to be a lag-1 variate. A correlation coefficient is then calculated for the two vectors of data. The lag-1 autocorrelation coefficient indexes the relationship between data points and those
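The lag-1 computation just described is easy to sketch in code (a minimal illustration, not part of the original article; the example series used below is hypothetical):

```python
import statistics

def lag1_autocorrelation(series):
    """Pearson correlation between the series and its lag-1 copy,
    computed over the overlapping pairs (the empty first element of
    the lagged vector is simply dropped)."""
    x = series[:-1]                  # observations 1 .. n-1
    y = series[1:]                   # the lag-1 variate: observations 2 .. n
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den
```

A steadily rising series is perfectly predictable one step ahead, so its lag-1 autocorrelation is 1.0; note that some time-series texts divide by the full-series variance instead, which gives slightly different values for short series.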
Effect sizes
The quantification and synthesis of single-subject outcomes was first proposed by Scruggs and Mastropieri (1987). Their approach has received much criticism (e.g. White 1987, Allison and Gorman 1993), causing Scruggs and Mastropieri (1994) to acknowledge that the procedure suffers some shortcomings. Recently, Busk and Serlin (1992) proposed algorithms for estimating single-subject effect sizes; the Busk and Serlin algorithms are grounded in the principles of conventional meta-analysis. An effect size indexes the magnitude of departure from the null state (i.e. no meaningful change) in a set of experimental observations; it has no associated probability. Because an effect size is a scale-free index, its value is independent of measurement scales and so its absolute size is meaningful and, as a result, it is possible to compare directly the effect sizes taken from different studies.
The estimation of effect sizes (ES) for a meta-analysis of single-subject research is described by Busk and Serlin (1992, pp. 197–198):

ES = (x̄B − x̄A) / sA   (1)

where B and A designate treatment and baseline periods respectively, x̄ is the mean of the data collected in a period, and s is the corresponding standard deviation. Two comments are germane. First, this is the most liberal of Busk and Serlin's estimates; the 3:1 variance ratio of aphasia-treatment data (treatment:baseline) cannot justify application of their more conservative algorithms. Also important is that meta-analysis procedures for single-subject research are elementary compared to those for group studies, and the two expressions of effect size are not comparable. Nevertheless, the Busk and Serlin calculations quantify the magnitude of change brought about by treatment and permit relative comparisons across single-subject studies. It should also be noted that when there is no variability in baseline data (i.e. baseline values are all equal), ES cannot be calculated.
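Equation (1) is simple to apply directly. The sketch below uses hypothetical baseline and treatment scores, not data from any of the reviewed studies:

```python
import statistics

def busk_serlin_es(baseline, treatment):
    """Busk and Serlin (1992) effect size: (mean of treatment period
    minus mean of baseline period) divided by the baseline SD."""
    sd_a = statistics.stdev(baseline)     # sample SD of the baseline data
    if sd_a == 0:
        raise ValueError("ES undefined: no variability in baseline data")
    return (statistics.mean(treatment) - statistics.mean(baseline)) / sd_a
```

For example, a baseline of [10, 20, 20, 30] and a treatment period of [60, 70, 80, 90] give ES ≈ 6.74; the guard clause reflects the point above that ES cannot be calculated when the baseline has no variability.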
[Table 2: effect sizes by study, with columns Study, Design, N, Treatment 1, Treatment 2, and Effect size; the body of the table is not recoverable here.]
Note: A designates a no-treatment period; B designates a treatment period; C designates a different treatment; (m) designates a multiple baseline control; C-VIC = computer-based visual-communication treatment; HELPSS = Helm's Elicited Language Training Program for Syntax Stimulation (Helm-Estabrooks 1981); A-V = audio-visual.
¹ This design was also reversed in the form of A, C, BC, 0.
² One subject received a second treatment.
³ Subjects 1–3 in table 5.
multiple baselines, an effect size was calculated for each target behaviour and the set of effect sizes was averaged to achieve a single index for each study. The subjects described in the 12 studies were mostly non-fluent aphasic individuals in the third through eighth decades of life. With few exceptions, subjects were in the chronic stage of recovery and demonstrated marked aphasia.
Notably, each effect size in table 2 is a large-sized effect (Glass 1976). Overall, the magnitude of effect-size estimates in table 2 compares favourably to analogous effects reported in the counselling literature (Busse et al. 1995). The extraordinarily large effect size for Raymer and Thompson (1991) is attributed to the very small variance from a floor effect in those baseline data (many zero-correct observations). The reader is reminded that, because of differences in the mathematics, these effect sizes cannot be compared to those from multiple-subject meta-analyses (Robey 1994, 1998). Furthermore, the equation used to obtain these effect sizes produces liberal estimates. Nevertheless, table 2 demonstrates that single-subject treatment effects can be quantified and that treatments for aphasia bring about appreciable change. It is therefore unfortunate that table 2 can contain effects reported by only 19% of the available studies.
Analysis
Single-subject research designs yield sequences of performance-over-time data called time-series data. An interruption in the stream of data occurs through experimenter manipulation of time as an independent variable (e.g. a period of no treatment followed by a period of treatment). As a consequence, primary analyses of single-subject aphasia-treatment clinical-outcome research centre on the tenabilities of two null hypotheses (H0 subscripted seriatim):

H01: b_slope(A) ≥ b_slope(B)
H02: b_level(A) ≥ b_level(B)

Here, b_slope is a coefficient indexing the slope of the performance-over-time plot of the data points within a certain period (e.g. the period of treatment observations). Similarly, b_level is a coefficient indexing the magnitude of overall performance within a period. The subscripts A and B designate no-treatment and active-treatment periods respectively. The reader is referred to Bloom and Fischer (1982, pp. 428–441) for a thorough discussion of changes in slope and level.
The research hypothesis associated with H01 asserts that the slope observed in a no-treatment period accelerates in the following active-treatment period. Likewise, the research hypothesis for H02 asserts that the overall level of performance increases from a no-treatment period to the following active-treatment period. When a subject is in the acute stage of recovery and spontaneous recovery causes performance to increase over time, drawing the conclusion that treatment is effective requires a rejection of H01 applied to the slope data as well as a rejection of H02; that is, treatment must accelerate the rate of recovery and increase the overall level of performance. When stability of baseline is evident, effectiveness may be demonstrated with rejection of H02 only.
Only two analysis procedures providing a direct test of H01 and H02 are designed to be insensitive to the negative effects of autocorrelation on statistical conclusion validity: interrupted time-series analysis (ITSA) and an improved interrupted time-series analysis (ITSACORR). For that reason, the presentation of these two analyses is ordered first in this review. As will be seen, the latter is more applicable
[Figure 1. Example data plotted as per cent correct (ordinate, 0–100) against sessions (abscissa, 0–20) in eight panels, A–H; each panel shows a baseline period followed by a treatment period.]
Ŷ = XB
E = Y − Ŷ
SSE = E′E

to obtain their respective sums-of-squares-due-to-error terms (SSE). Because Y is a column vector in each case, SSE (i.e. SS1 and SS0) is always a scalar.
The overall F test is given by

F = [(SS0 − SS1)/2] / (SS1/m)

where m is N − 4 − 2p degrees of freedom. Here, N is the total number of observations in the two periods being contrasted and p is the number of autoregressive parameters in the model. The value of F is evaluated on 2 and m degrees of freedom. If statistical significance is achieved, the elements of B in the SS1 model indexing the period-1 and period-2 levels (i.e. elements B1,1 and B1,3) are contrasted by means of a t test (see Crosbie 1995, p. 378, equation 12.20). Similarly, the elements of B indexing the period-1 and period-2 slopes (i.e. elements B1,2 and B1,4) are contrasted. Lastly, these same four elements of B are entered into Gottman's (1981, pp. 371–372) equations for generating a straight-line level-slope function characterizing the fit of the autoregressive model to the obtained data.
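The overall F reduces to simple arithmetic on the two error sums of squares. The sketch below reproduces the worked example's F of 4.837 under the assumption, inferred from the reported 13 error degrees of freedom with p = 1, that the example series contains N = 19 observations:

```python
def itsa_overall_f(ss0, ss1, n_obs, p):
    """Overall ITSA F test contrasting two periods:
    F = ((SS0 - SS1) / 2) / (SS1 / m), where SS0 and SS1 are the error
    sums of squares of the restricted and full models respectively and
    m = N - 4 - 2p is the error degrees of freedom."""
    m = n_obs - 4 - 2 * p
    return ((ss0 - ss1) / 2) / (ss1 / m), m

# worked-example values from the text; n_obs = 19 is an assumption
f, m = itsa_overall_f(ss0=1070.486, ss1=613.746, n_obs=19, p=1)
```

The resulting F is evaluated on 2 and m degrees of freedom, matching the 2 and 13 reported for the example.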
Preliminary analyses of the example data suggested a single autoregressive parameter (i.e. p = 1) as a best fit. The elemental structures of the X and Y matrices giving the values of SS1 (i.e. 613.746) and SS0 (i.e. 1070.486) are displayed in figure 2. The resulting value of the overall F was 4.837. On 2 and 13 degrees of freedom, the test is significant at the 0.05 level but not at the 0.01 level. However, neither the separate t test for a difference in level (t = −0.053) nor the t test for a difference in slope (t = −0.408) achieved statistical significance. If the Type I error tolerance for the overall F test is set at 0.05, the analysis suggests an omnibus difference from period
ITSACORR
Crosbie (1993) determined that ITSA algorithms underestimate positive autocorrelation with short series of data and altered them accordingly. Crosbie (1993) conducted Monte Carlo simulations to test the resulting Type I and Type II error characteristics of the optimized algorithms. In general, the improved ITSA, named ITSACORR, maintained Type I error at or below the nominal level with satisfactory statistical power in the analysis of short-series data (i.e. fewer than 50 observations per period). The exception occurred when autocorrelation exceeded 0.6 and sample sizes were less than 20. Crosbie (1993) recommended a general minimum of 10 observations per period, although the greater the number of observations, the more accurately autocorrelation is estimated.
ITSACORR is simpler in use than ITSA because it does not require an initial fitting of autoregressive parameters. Like ITSA, ITSACORR yields a test of overall change, a test of change in slope, and a test of change in level. The first is a preliminary test for preserving Type I error control: if the overall test does not achieve statistical significance, the tests of H01: b_slope(A) ≥ b_slope(B) and H02: b_level(A) ≥ b_level(B) are not interpreted. With multiple-baseline studies in which ITSACORR would be applied to each iteration of the basic design, preserving Type I error control would require the additional step of setting an experiment-wise (see Maxwell and Delaney 1990) Type-I-error tolerance (α) to be divided equally among each of the ITSACORR applications. It should be noted that the directional null hypotheses make possible the use of one-tailed α levels.
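The gatekeeping logic just described, interpreting the slope and level tests only after a significant overall test and dividing the experiment-wise α equally across applications, can be sketched as follows; the function name and dictionary layout are illustrative, not part of ITSACORR itself:

```python
def itsacorr_decisions(overall_p, slope_p, level_p,
                       experimentwise_alpha=0.05, n_applications=1):
    """Interpret ITSACORR p-values with Type I error control: each
    application tests at alpha / n_applications, and the slope and
    level tests are read only if the overall test is significant."""
    alpha = experimentwise_alpha / n_applications
    if overall_p >= alpha:
        # overall test not significant: component tests not interpreted
        return {"overall": False, "slope": None, "level": None}
    return {"overall": True,
            "slope": slope_p < alpha,
            "level": level_p < alpha}
```

With the figure 1(D) results (overall p = 0.014, level p = 0.005, slope p = 0.855) and a single application at α = 0.05, this yields a significant overall change and a significant change in level, but no change in slope.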
The ITSACORR F test for the null hypothesis of overall change in the example data is 1.650 on 2 and 14 degrees of freedom, which does not achieve statistical significance. The ITSACORR fit for the example data is displayed in figure 1(C). If the example data are made more consistent with outcomes observed in the aphasia-treatment literature by lowering the baseline observations to centre around the 10%-correct level and altering the treatment observations to culminate at the 90%-correct level (see figure 1(D)), the sensitivity of ITSACORR to appreciable changes is made evident. For these data, the overall F is 5.910 with an exact probability of 0.014. The t test for a change in level is also significant (i.e. t = 3.341, p = 0.005); the t test for a change in slope does not achieve statistical significance (i.e. t = 0.187, p = 0.855). That is, the overall change from period 1 to period 2 in figure 1(D) is particularized as a change in level but not in slope.
The superiority of ITSACORR among the various analysis alternatives is given by three large and important advantages. First, ITSACORR provides scientifically valid evidence on the tenability of change using numbers of observations that are
dividing line. The median of the data in each half of the period is then calculated. Each half of the period is halved by another vertical dividing line so that the period is segmented into quarters. The value of the first median (i.e. of the data in the first half of the period) is plotted on the first-quarter reference line; the second median is plotted on the third-quarter reference line. A straight line extending throughout the entire period is drawn through the two median data points. If an equal number of data points do not appear on either side of the connecting line, a second line, parallel to the first and achieving the even split, is drawn. The equal dividing line is termed the split-middle trend line. It is extrapolated into the next period (e.g. the treatment period) as a dashed or dotted line (see Kazdin 1982, p. 311f).
A split-middle trend line fitted for the original example data is found in figure 1(E). In this example, the baseline data are separated into two equal portions, each consisting of four data points. The median of each portion was 20.0. The first-quarter reference line occurs at 2.5 on the abscissa; the third-quarter reference line occurs at 6.5. In this case, the split-middle line is a horizontal line through 20.0 on the ordinate which extends into the treatment period as a dotted line. To contrast the level and slope of the two adjacent periods of data, a split-middle line was drawn for the second-period data (i.e. the solid line through the treatment period). The split-middle analysis of the example data supports the conclusion that change occurred.
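The construction can be sketched numerically: halve the period, take each half's median, and anchor those medians at the first- and third-quarter session positions. This sketch assumes an even number of observations, as in the eight-point baseline of the worked example; the data values in the usage example are hypothetical:

```python
import statistics

def split_middle_line(period):
    """Return (slope, intercept) of the split-middle trend line for a
    period of observations whose sessions are numbered 1..n."""
    n = len(period)
    half = n // 2
    m1 = statistics.median(period[:half])     # median of first half
    m2 = statistics.median(period[half:])     # median of second half
    x1 = (half + 1) / 2                       # first-quarter position
    x2 = half + (n - half + 1) / 2            # third-quarter position
    slope = (m2 - m1) / (x2 - x1)
    return slope, m1 - slope * x1
```

For an eight-point baseline whose half-medians are both 20.0, the quarter positions fall at sessions 2.5 and 6.5 and the fitted line is horizontal through 20.0, matching the worked example. (The further step of shifting the line to split the data points evenly is omitted here.)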
Hojem and Ottenbacher (1988) examined the value of adding the split-middle trend line to graphs of single-subject data. Their results suggested that adding the line may improve inter-rater reliability of visual analysis somewhat. Johnson and Ottenbacher (1991) examined the effect of adding a split-middle trend line to graphs of single-subject data on inter-rater agreement and found only marginal utility in adding the trend line; overall inter-rater agreement rose to an average of 0.76 when a split-middle line was included in the graph. Similarly, Ottenbacher and Cusick (1991) found that the inconsistency of visual analyses without benefit of reference lines (i.e. inter-rater reliability of 0.54) marginally diminished with the presence of a split-middle trend line (i.e. inter-rater reliability rose to 0.67), but remained unacceptable.
Binomial tests
Two applications of the well-known binomial test (Siegel 1956) have been proposed as easily obtained tests of the null hypothesis of no change in level across period boundaries. One application is based upon the split-middle trend line and the other is based upon the celeration trend line. In each case, the binomial test is carried out by comparing the proportion of second-period data points falling above/below the extrapolation of the baseline trend line.
In figures 1(E) (i.e. split-middle trend line) and (F) (i.e. celeration trend line), only one data point falls below the dotted line extending through the treatment period. In each case then, the proportion of data points below the extended line is 0.0909 and the proportion above is 0.9091. As a result, the following calculations apply for both figures 1(E) and (F).
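Under the null hypothesis of no change, each treatment-period point is equally likely to fall above or below the extended baseline line, so the exact probability of the observed split comes from the binomial distribution. A sketch, using the eleven treatment points of the example (ten above, one below):

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of
    n treatment-period points fall above the extended baseline trend
    line when treatment has no effect."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))
```

Here binomial_upper_tail(10, 11) gives 12/2048 ≈ 0.0059, so a split of ten or more points above the line out of eleven is very unlikely under the null hypothesis.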
Cohen (1988) points out that the density of the distribution of proportions is not uniform, and so the area under the sampling-distribution curve between, say, 0.60 and 0.65 (near the centre of the distribution) is not equivalent to the area between 0.90 and 0.95 (in the extremity of the tail). Therefore, Cohen recommends that the test of the null hypothesis that one population proportion (p) equals another is transformed from

H0: p_baseline = p_treatment

to

H0: φ_baseline = φ_treatment

where

φ_x = 2 arcsin √p_x
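Cohen's variance-stabilizing transformation is a one-liner; a sketch applied to the example proportions (the difference between two transformed proportions is Cohen's h effect size, which is not discussed further in the text):

```python
import math

def arcsine_transform(p):
    """Cohen's (1988) transformation phi = 2 * arcsin(sqrt(p)), which
    equalizes the detectability of proportion differences across the
    whole [0, 1] scale."""
    return 2 * math.asin(math.sqrt(p))

# example proportions above the extended line: 0.9091 vs. 0.0909
h = arcsine_transform(0.9091) - arcsine_transform(0.0909)
```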
Analysis of variance
For behavioural scientists, an intuitive approach to the analysis of single-subject data might be the application of a t test (i.e. √F) to compare data from two different periods for a mean difference. However, this approach has two inherent problems. First, the analysis of variance (ANOVA) model assumes that observations are independent of one another (i.e. the errors are not correlated). Since the data points all derive from a single subject, the assumption is not tenable. In addition, Scheffé (1959) showed mathematically that F is not robust to violations of the independence assumption. Furthermore, Phillips (1983), Toothaker et al. (1983), Sharpley and Alavosius (1988), and Suen et al. (1990) have all demonstrated through Monte Carlo simulation experiments that ANOVA is not at all robust when the data are even slightly autocorrelated. Autocorrelation causes the test to be prohibitively liberal in terms of Type I error control. The value of t for the example data is 4.06 on 17 degrees of freedom, yielding a probability of p < 0.001.
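The inflation is easy to reproduce by simulation: generate null AR(1) series (no true change between "periods"), apply an ordinary two-sample t test, and count rejections. This is an illustrative sketch, not one of the cited Monte Carlo studies; the critical value 2.101 is the two-tailed 0.05 cut-off for 18 degrees of freedom:

```python
import random
import statistics

def ar1_series(n, phi, rng):
    """AR(1) series with lag-1 autocorrelation phi and no level change."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / \
        (sp2 * (1 / na + 1 / nb)) ** 0.5

def type_i_rate(phi, n_per_period=10, reps=2000, crit=2.101):
    """Proportion of null 'experiments' falsely declared significant."""
    rng = random.Random(42)
    hits = 0
    for _ in range(reps):
        s = ar1_series(2 * n_per_period, phi, rng)
        if abs(two_sample_t(s[:n_per_period], s[n_per_period:])) > crit:
            hits += 1
    return hits / reps
```

With independent data (phi = 0) the false-rejection rate lands near the nominal 0.05, while positive autocorrelation (e.g. phi = 0.6) inflates it well beyond the nominal level, mirroring the liberal bias reported by the simulation studies cited above.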
Randomization tests
Several variations of the randomization test have been proposed as solutions to the analysis difficulties in single-subject research (Edgington 1987). Revusky's Rn test is one such test. Originally, the algorithm tested only for changes in level (Franzen and Iverson 1990). Wolery and Billingsley (1982) added mathematics to test for changes in slope. However, both the original and the enhanced procedures require an experiment that is rarely of interest: each of four subjects must receive treatments administered in random order. As a result, the test is beside the point for most clinical single-subject research applications, including the example data; the test is meant for a different design altogether.
The C statistic
Tryon (1982) applied the C statistic to single-subject data to contrast the slope of the baseline data with the slope of the data obtained during the treatment period. The value of C is given by

C = 1 − [ Σ(i=1 to n−1) (x_i − x_{i+1})² ] / [ 2 Σ(i=1 to n) (x_i − x̄)² ]

where x_i is the ith data point in the combined stream of n data points in periods 1 and 2. The ratio of C over its standard error yields the standard unit-normal deviate Z, which gives the probability value for assessing the tenability of the null hypothesis. The value of C for the example data is −0.2025 (Z = −0.8058), which does not achieve statistical significance (Tryon 1982). Crosbie (1989) conducted a Monte Carlo analysis examining the Type I error characteristics of the C statistic; the actual Type I error rates of the C statistic were prohibitively high when applied to autocorrelated data.
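The C statistic itself is straightforward to compute from the combined data stream. A minimal sketch (the example series below is hypothetical, and the conversion of C to Z via its standard error is left to Tryon 1982):

```python
def c_statistic(data):
    """Tryon's (1982) C: 1 minus the sum of squared successive
    differences over twice the sum of squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    num = sum((data[i] - data[i + 1]) ** 2 for i in range(n - 1))
    den = 2 * sum((x - mean) ** 2 for x in data)
    return 1 - num / den
```

A perfectly trending series such as [1, 2, 3, 4, 5] gives C = 0.8, reflecting strong one-step predictability, whereas white noise gives C near zero.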
Discussion
For the most part, single-subject aphasia-treatment research designs are hypothesis-driven and well controlled. Seventy-eight per cent (i.e. 49) of the 63 studies tested only one treatment, and thereby precluded a carry-over effect. Every one of these 49 studies included multiple-baseline controls, or withdrawal controls, or both. The great advantage of including a withdrawal period in the sequence is the opportunity to test two important effects: direct treatment and maintenance.
Typically, the number of initial baseline observations has been insufficient for
conducting valid analyses of change. Single-subject research must capitalize on the
extended baseline series that are possible with multiple-baseline designs. It is highly
desirable, for example, to re-order the multiple target behaviours for each subject
whenever possible. By changing the order in which target behaviours are
introduced in each replication of the basic design, an extended baseline is
established for each target.
The elementary estimates of effect sizes in table 2 clearly establish that
quantitatively-based analysis can capture the outcomes of single-subject quasi-
experiments and can express them in standard scientific terms. Although changes
brought about by aphasia treatment in single-subject quasi-experiments appear
robust, few studies reported the mathematical details required for calculating an
effect size. When calculable, effect sizes were large. The lesson is clear: single-subject
aphasia-treatment studies merit and require the quantified outcomes expected by
the greater clinical-outcome research community, including reimbursers. The
interests of the profession and practitioners will be well served by demonstrably
valid and reliable applications of hypothesis-testing logic.
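One elementary single-subject effect size of the kind discussed here divides the change in phase means by the standard deviation of the baseline phase, in the spirit of Busk and Serlin (1992). A minimal sketch, with hypothetical data:

```python
import statistics

def single_subject_d(baseline, treatment):
    """Elementary single-subject effect size: the difference between
    treatment-phase and baseline-phase means, standardized by the
    sample standard deviation of the baseline phase."""
    sd_a = statistics.stdev(baseline)
    return (statistics.mean(treatment) - statistics.mean(baseline)) / sd_a

# Hypothetical per-session scores for phases A (baseline) and B (treatment).
d = single_subject_d([10, 12, 11, 13, 12], [20, 24, 22, 25, 23])
```

Values of this magnitude are far larger than the group-design benchmarks of Cohen (1988), which is why single-subject effect sizes must be interpreted on their own scale.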
Certainly, visual analyses should not be abandoned. As Franklin et al. (1996)
point out, visual analyses of single-subject data are necessary descriptive tools and
statistical analyses are necessary inferential tools. Taken together, conservative and
consistent visual analyses combined with valid statistical analyses capture clinical
significance as well as statistical significance; interpretation requires both (Bloom
and Fischer 1982, Johnston et al. 1995, Franklin et al. 1996).
Single-subject aphasia-treatment data are clearly autocorrelated. Therefore, the
means for testing the two null hypotheses of aphasia-treatment outcome must be
insensitive to autocorrelation. That restriction narrows the field of analysis
alternatives to two: ITSA and ITSACORR. Because the number of observations
per period is typically low in single-subject aphasia-treatment studies, many fewer
than 50 per period, ITSACORR is preferred to ITSA. Until it is surpassed by
future developments, ITSACORR should be the procedure-of-choice, and
essentially the standard, for applying hypothesis-testing logic to single-subject data.
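The autocorrelation at issue is conventionally gauged with the lag-1 estimate r1, although Huitema and McKean (1991) show that such estimates are biased in the small samples typical of this literature. A sketch with a hypothetical trending series:

```python
def lag1_autocorrelation(x):
    """Conventional lag-1 autocorrelation estimate r1:
    the mean-centred cross-product of adjacent observations,
    divided by the total sum of squares."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((xi - mean) ** 2 for xi in x)
    return num / den

# A short, steadily rising hypothetical series shows marked positive r1.
r1 = lag1_autocorrelation([3, 4, 4, 5, 6, 6, 7, 8])
```

A positive r1 of this size is exactly the condition under which t tests, ANOVA, and the C statistic inflate Type I error.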
A comment and a caveat regarding ITSACORR are necessary. As in all areas of
research, there is likely to be an evolution of ITSACORR algorithms or
programming or both. The user and consumer of ITSACORR should therefore
expect further developments in the analysis of short-series single-subject data. The
caveat relates to the F and t tests reported by ITSACORR. It is important to realize
that the rejection of an idiographic null hypothesis (Grossman 1986, Ottenbacher
1990a), say H₀₂: b_level A = b_level B, warrants only the inference that change has
occurred in one human being; it provides no basis for an inference regarding a
clinical population. For that reason, single-subject research designs are the
preferred means for testing the effectiveness of treatment provided to a particular
individual. The generality of treatment effectiveness for a population must be
assessed through a synthesis of many similarly focused and similarly designed
single-subject quasi-experiments, a meta-analysis. The same statement regarding
generality applies when single-subject designs are used to assess treatment
effectiveness in terms of, for example, variations in treatment methods, sub-
populations, or different service-delivery models (see Robey and Schultz 1998).
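For intuition only, the two quantities that ITSACORR subjects to F and t tests, the change in level and the change in slope across the phase boundary, can be estimated descriptively by fitting a separate least-squares line to each phase. The sketch below uses hypothetical data and omits the autocorrelation correction that makes ITSACORR valid for short series, so it yields point estimates, not tests:

```python
def fit_line(t, y):
    """Ordinary least-squares intercept and slope for one phase."""
    n = len(t)
    tbar, ybar = sum(t) / n, sum(y) / n
    slope = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
             / sum((ti - tbar) ** 2 for ti in t))
    return ybar - slope * tbar, slope

# Hypothetical baseline (phase A) and treatment (phase B) observations.
a_t, a_y = [1, 2, 3, 4, 5], [10, 11, 10, 12, 11]
b_t, b_y = [6, 7, 8, 9, 10], [15, 17, 18, 20, 21]

int_a, slope_a = fit_line(a_t, a_y)
int_b, slope_b = fit_line(b_t, b_y)

# Descriptive analogues of the two idiographic hypotheses: the jump in
# predicted level at the phase change (t = 6) and the change in slope.
level_change = (int_b + slope_b * 6) - (int_a + slope_a * 6)
slope_change = slope_b - slope_a
```

ITSACORR itself models the autocorrelation before testing these contrasts; this sketch only shows what is being contrasted.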
Writings on single-subject research often assert that a compelling concert of
evidence will result through the absolutely vital practice of systematic replication.
Most often, however, those writings do not mention how this evidence is to be
formed through combination and synthesis to achieve a broadly based, and so
general, conclusion. That is, given extensive and successful systematic replication,
how are the individual outcomes combined to form a generalized finding? By
definition, one would not combine single-subject data from several studies to
obtain a 'group' estimate of central tendency, nor would one desire to do so. What
needs combining is not single-subject data but the outcomes of single-subject
quasi-experiments bearing on a common research question.
A 'vote counting' approach applied to a series of subjective decisions (i.e. visual
analyses) could yield nothing more than a collective subjective conclusion formed
out of many individual subjective decisions. It could not satisfy the demand that
conclusions be scientifically rigorous in applying current statistics and probability
theory. Acceptable procedures for combining experimental results require
estimates of effect size that can be meaningfully combined and synthesized (Robey
1997). The accepted means for synthesizing research outcomes bearing on a
research question is meta-analysis: a set of mathematical procedures for
estimating an average effect size, with its associated confidence interval, from all
available evidence (Hunter and Schmidt 1990, Hall et al. 1994, Petitti 1994).
Increasingly, researchers in other clinical sciences recognize the value of meta-
analysis for objectively synthesizing individual results to achieve generality in
conclusion (e.g. Wilson et al. 1996, Wurthmann et al. 1996). Certainly, the
procedures for conducting meta-analyses of single-subject research are elementary
at this point but, as certainly, the technology will advance. Aphasiologists will
capitalize on these advances and the associated benefits only if the products of
future single-subject research are quantified. Only one of three measures is
necessary to meet this criterion: (a) report statistical tests such as those obtained
through ITSACORR; (b) report raw data; or (c) construct performance-over-time
plots with sufficient resolution to permit data retrieval.
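The core meta-analytic computation named here, an average effect size with its confidence interval, can be sketched in a few lines. Inverse-variance weighting is one common scheme (the effect sizes and weights below are hypothetical, and the normal-theory interval is a simplification of the estimators discussed by Hunter and Schmidt 1990):

```python
import math

def weighted_mean_effect(effects, weights):
    """Weighted mean effect size with an approximate 95% confidence
    interval, assuming the weights are inverse variances so that the
    variance of the mean is 1 / sum(weights)."""
    total_w = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / total_w
    se = math.sqrt(1 / total_w)
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

# Hypothetical effect sizes from five similarly designed quasi-experiments.
mean_d, ci = weighted_mean_effect([2.1, 1.8, 2.5, 1.9, 2.2], [4, 5, 3, 6, 4])
```

If the interval excludes zero, the synthesized evidence supports a generalized treatment effect; no single quasi-experiment could license that inference on its own.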
There is a great and increasing need for studies testing that interventions are
applicable to persons other than those tested, with the reasonable expectation
that the interventions will be generally effective. It seems obvious that, for purposes
of documenting or justifying the effectiveness of therapeutic intervention, it is in
the professional interest to make the most powerful arguments that one can, using
both group and individual study data wherever possible. Results that can be
generalized are required, then, both from studies of groups and of single subjects.
For either class, individual or group, care is required in design and in
implementation so that the results can be entered into meaningful combinations
with other studies and so increase the potency of the results. Incorporating valid
quantitative analyses of single-subject data in research reports will position the
profession to assess the generality of individual outcomes and so optimize the
warrant for asserting treatment effectiveness.
Summary
It is clear that the large majority of aphasia-treatment single-subject quasi-
experiments are hypothesis-driven tests of single treatments utilizing multiple-
baseline controls or withdrawal controls or both. It is also clear that single-subject
aphasia-treatment designs yield short series of autocorrelated data embodying
appreciable effects of the independent variable: the crossing over from no
Acknowledgement
The authors thank Drs Robert S. Barcikowski, John W. Lloyd, and Tonya R.
Moon for their helpful comments regarding drafts of this manuscript.
References
Allison, D. B. and Gorman, B. S. 1993, Calculating effect sizes for meta-analysis: The case of the single-case. Behaviour Research and Therapy, 31, 621–631.
Backman, C. L., Harris, S. R., Chisholm, J.-A. M. and Monette, A. D. 1997, Single-subject research in rehabilitation: A review of studies using AB, withdrawal, multiple baseline, and alternating treatments designs. Archives of Physical Medicine and Rehabilitation, 78, 1145–1153.
Bloom, M. and Fischer, J. 1982, Evaluating Practice: Guidelines for the Accountable Professional (Englewood Cliffs, NJ: Prentice-Hall).
Bobrovitz, C. D. and Ottenbacher, K. J. 1998, Comparison of visual inspection and statistical analysis of single-subject data in rehabilitation research. American Journal of Physical Medicine and Rehabilitation, 77, 94–102.
Busk, P. L. and Marascuilo, L. A. 1988, Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment, 10, 229–242.
Busk, P. L. and Serlin, R. C. 1992, Meta-analysis for single-case research. In T. R. Kratochwill and J. R. Levin (Eds) Single-Case Research Design and Analysis (Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.), pp. 197–198.
Busse, R. T., Kratochwill, T. R. and Elliott, S. N. 1995, Meta-analysis for single-case consultation outcomes: Applications to research and practice. Journal of School Psychology, 33, 269–285.
Campbell, D. T. and Stanley, J. C. 1963, Experimental and Quasi-Experimental Designs for Research (Boston: Houghton Mifflin Co.).
Coelho, C. A. 1991, Manual sign acquisition and use in two aphasic subjects. In M. L. Lemme (Ed.) Clinical Aphasiology, vol. 19 (Austin, TX: Pro-Ed), pp. 209–218.‹
Cohen, J. 1988, Statistical Power Analysis for the Behavioral Sciences (2nd edn) (Hillsdale, NJ: Lawrence Erlbaum).
Conlon, C. P. and McNeil, M. R. 1991, The efficacy of treatment for two globally aphasic adults using visual action therapy. In M. L. Lemme (Ed.) Clinical Aphasiology, vol. 19 (Austin, TX: Pro-Ed), pp. 185–195.‹
Connell, P. J. and Thompson, C. K. 1986, Flexibility of single-subject experimental designs. Part III: Using flexibility to design or modify experiments. Journal of Speech and Hearing Disorders, 51, 214–225.
Conners, C. K. and Wells, K. C. 1982, Single-case designs in psychopharmacology. In A. E. Kazdin and A. H. Tuma (Eds) Single-Case Research Designs (San Francisco, CA: Jossey-Bass, Inc.), pp. 61–77.
Cook, T. D. and Campbell, D. T. 1979, Quasi-Experimentation: Design and Analysis Issues for Field Settings (Boston: Houghton Mifflin).
Cook, T. D. and Shadish, W. R. 1994, Social experiments: Some developments over the past fifteen years. Annual Review of Psychology, 45, 545–580.
Crosbie, J. 1987, The inability of the binomial test to control Type I error with single-subject data. Behavioral Assessment, 9, 141–150.
Crosbie, J. 1989, The inappropriateness of the C statistic for assessing stability or treatment effects with single-subject data. Behavioral Assessment, 11, 315–325.
Crosbie, J. 1993, Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology, 61, 966–974.
Crosbie, J. 1995, Interrupted time-series analysis with short series: Why it is problematic; how it can be improved. In J. M. Gottman (Ed.) The Analysis of Change (Mahwah, NJ: Lawrence Erlbaum Associates), pp. 361–395.
Davis, G. A. 1978, The clinical application of withdrawal, single-case research designs. In R. H. Brookshire (Ed.) Clinical Aphasiology, vol. 8 (Minneapolis, MN: BRK Publishers), pp. 11–19.
DeProspero, A. and Cohen, S. 1979, Inconsistent visual analysis of intrasubject data. Journal of Applied Behavior Analysis, 12, 573–579.
Doyle, P. J. and Goldstein, H. 1985, Experimental analysis of acquisition and generalization of syntax in Broca's aphasia. In R. H. Brookshire (Ed.) Clinical Aphasiology, vol. 15 (Minneapolis, MN: BRK Publishers), pp. 205–213.*
Doyle, P. J., Goldstein, H. and Bourgeois, M. S. 1987, Experimental analysis of syntax training in Broca's aphasia: A generalization and social validation study. Journal of Speech and Hearing Disorders, 52, 143–155.‹
Edgington, E. S. 1987, Randomized single-subject experiments and statistical tests. Journal of Counseling Psychology, 34, 437–442.
Eick, T. J. and Kofoed, L. 1994, An unusual indication for a single-subject clinical trial. The Journal of Nervous and Mental Disease, 182, 587–590.
Franklin, R. D., Gorman, B. S., Beasley, T. M. and Allison, D. B. 1996, Graphical display and visual analysis. In R. D. Franklin, D. B. Allison and B. S. Gorman (Eds) Design and Analysis of Single-Case Research (Mahwah, NJ: Lawrence Erlbaum), pp. 119–158.
Franzen, M. D. and Iverson, G. L. 1990, Applications of single subject design to cognitive rehabilitation. In A. M. Horton (Ed.) Neuropsychology Across the Life-Span: Assessment and Treatment (New York: Springer Publishing Co.), pp. 155–174.
Frattali, C. M. 1998, Outcomes measurement: definitions, dimensions, and perspectives. In C. M. Frattali (Ed.) Measuring Outcomes in Speech-Language Pathology (New York: Thieme), pp. 1–27.
Fukkink, R. 1996, The internal validity of aphasiological single-subject studies. Aphasiology, 10, 741–754.
Furlong, M. J. and Wampold, B. E. 1982, Intervention effects and relative variation as dimensions in experts' use of visual inference. Journal of Applied Behavior Analysis, 15, 415–421.
Glass, G. V. 1976, Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Gottman, J. M. 1981, Time-Series Analysis: A Comprehensive Introduction for Social Scientists (Cambridge: Cambridge University Press).
Greenwood, K. M. and Matyas, T. A. 1990, Problems with the application of interrupted time series analysis for brief single-subject data. Behavioral Assessment, 12, 355–370.
Grossman, K. E. 1986, From idiographic approaches to nomothetic hypotheses: Stern, Allport, and the biology of knowledge, exemplified by an exploration of sibling relationships. In J. Valsiner (Ed.) The Individual Subject in Scientific Psychology (New York: Plenum Press), pp. 37–69.
Hall, J. A., Rosenthal, R., Tickle-Degnen, L. and Mosteller, F. 1994, Hypotheses and problems in research synthesis. In H. Cooper and L. V. Hedges (Eds) The Handbook of Research Synthesis (New York: Russell Sage Foundation), pp. 17–28.
Harrop, J. W. and Velicer, W. F. 1985, A comparison of alternative approaches to the analysis of interrupted time-series. Multivariate Behavioral Research, 20, 27–44.
Helm-Estabrooks, N. 1981, Helm's Elicited Language Program for Syntax Stimulation (HELPSS) (Austin, TX: Exceptional Resources, Inc.).
Hersen, M. and Barlow, D. H. 1976, Single-Case Experimental Designs: Strategies for Studying Behavior Change (New York: Pergamon Press).
Hilliard, R. B. 1993, Single-case methodology in psychotherapy process and outcome research. Journal of Consulting and Clinical Psychology, 61, 373–380.
Hojem, M. A. and Ottenbacher, K. J. 1988, Empirical investigation of visual-inspection versus trend-line analysis of single-subject data. Physical Therapy, 68, 983–988.
Hopkins, A. 1998, The measurement of outcomes of health care research. In M. Swash (Ed.) Outcomes in Neurological and Neurosurgical Disorders (Cambridge, UK: Cambridge University Press).
Huitema, B. E. 1985, Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment, 7, 107–118.
Huitema, B. E. 1988, Autocorrelation: 10 years of confusion. Behavioral Assessment, 10, 252–294.
Huitema, B. E. and McKean, J. W. 1991, Autocorrelation estimation and inference with small samples. Psychological Bulletin, 110, 291–304.
Hunter, J. E. and Schmidt, F. L. 1990, Methods of Meta-Analysis: Correcting Error and Bias in Research Findings (Newbury Park, CA: Sage Publications).
Johnson, M. B. and Ottenbacher, K. J. 1991, Trend line influence on visual analysis of single-subject data in rehabilitation research. International Disabilities Studies, 13, 55–59.
Johnston, M. V., Ottenbacher, K. J. and Reichardt, C. S. 1995, Strong quasi-experimental designs for research on the effectiveness of rehabilitation. American Journal of Physical Medicine and Rehabilitation, 74, 383–392.
Jones, R. R., Weinrott, M. R. and Vaught, R. S. 1978, Effects of serial dependency on the agreement between visual and statistical inference. Journal of Applied Behavior Analysis, 11, 277–283.
Kane, R. L. 1997, Approaching the outcomes question. In R. L. Kane (Ed.) Understanding Health Care Outcome Research (Gaithersburg, MD: Aspen Publishers).
Kazdin, A. E. 1982, Single-Case Research Designs: Methods for Clinical and Applied Settings (New York: Oxford University Press).
Kazdin, A. E. 1986, Comparative outcome studies of psychotherapy: methodological issues and strategies. Journal of Consulting and Clinical Psychology, 54, 95–105.
Kearns, K. P. 1985, Response elaboration training for patient initiated utterances. In R. H. Brookshire (Ed.) Clinical Aphasiology, vol. 15 (Minneapolis, MN: BRK Publishers), pp. 196–204.‹
Kearns, K. 1986, Flexibility of single-subject experimental designs. Part II: Design selection and arrangement of experimental phases. Journal of Speech and Hearing Disorders, 51, 204–214.
Kearns, K. P. and Salmon, S. J. 1984, An experimental analysis of auxiliary and copula verb generalization in aphasia. Journal of Speech and Hearing Disorders, 49, 152–163.*
Kearns, K. P. and Thompson, C. K. 1991a, Analytical and technical directions in applied aphasia analysis: The Midas touch. In T. E. Prescott (Ed.) Clinical Aphasiology, vol. 19 (Austin, TX: Pro-Ed), pp. 40–54.
Kearns, K. P. and Thompson, C. K. 1991b, Technical drift and conceptual myopia: The Merlin effect. In T. E. Prescott (Ed.) Clinical Aphasiology, vol. 19 (Austin, TX: Pro-Ed), pp. 31–40.
Kratochwill, T. R. 1978, Single Subject Research: Strategies for Evaluating Change (New York: Academic Press).
Kratochwill, T. R. and Williams, B. L. 1988, Perspectives on pitfalls and hassles in single-subject research. Journal of the Association for Persons with Severe Handicaps, 13, 147–154.
Krishef, C. H. 1991, Fundamental Approaches to Single Subject Design and Analysis (Malabar, FL: Krieger Publishing Co.).
LaPointe, L. L. 1978, Multiple baseline design. In R. H. Brookshire (Ed.) Clinical Aphasiology, vol. 8 (Minneapolis, MN: BRK Publishers), pp. 20–29.
Levin, J. R. 1992, Single-case research design and analysis: Comments and concerns. In T. R. Kratochwill and J. R. Levin (Eds) Single-Case Research Design and Analysis (Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.), pp. 213–224.
Matyas, T. A. and Greenwood, K. M. 1990, Visual analysis of single-case time series: Effects of variability, serial dependence, and magnitude of intervention effects. Journal of Applied Behavior Analysis, 23, 341–351.
Maxwell, S. E. and Delaney, H. D. 1990, Designing Experiments and Analyzing Data: A Model Comparison Perspective (Belmont, CA: Wadsworth Publishing).
McReynolds, L. V. and Kearns, K. P. 1983, Single-Subject Experimental Designs in Communicative Disorders (Baltimore, MD: University Park Press).
McReynolds, L. V. and Thompson, C. K. 1986, Flexibility of single-subject experimental designs. Part I: Review of the basics of single-subject designs. Journal of Speech and Hearing Disorders, 51, 194–203.
Ottenbacher, K. J. 1990a, Clinically relevant designs for rehabilitation research: The idiographic model. American Journal of Physical Medicine and Rehabilitation, 69, 287–292.
Ottenbacher, K. J. 1990b, Visual inspection of single-subject data: An empirical analysis. Mental Retardation, 28, 283–290.
Ottenbacher, K. J. 1993, Interrater agreement of visual analysis in single-subject decisions: Quantitative review and analysis. American Journal of Mental Retardation, 98, 135–142.
Ottenbacher, K. J. and Cusick, A. 1991, An empirical investigation of interrater agreement for single-subject data using graphs with and without trend lines. Journal of the Association for Persons with Severe Handicaps, 16, 48–55.
Parsonson, B. S. and Baer, D. M. 1992, The visual analysis of data, and current research into the stimuli controlling it. In T. R. Kratochwill and J. R. Levin (Eds) Single-Case Research Design and Analysis (Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.), pp. 15–40.
Petitti, D. B. 1994, Meta-Analysis, Decision Analysis, and Cost-Effectiveness Analysis (New York: Oxford University Press).
Phillips, J. P. N. 1983, Serially correlated errors in some single-subject designs. British Journal of Mathematical and Statistical Psychology, 36, 269–280.
Portney, L. G. and Watkins, M. P. 1993, Foundations of Clinical Research: Application to Practice (Norwalk, CT: Appleton and Lange).
Potter, R. E. and Goodman, N. J. 1983, The implementation of laughter as a therapy facilitator with adult aphasics. Journal of Communication Disorders, 16, 41–48.*
Raymer, A. and Thompson, C. K. 1991, Effects of verbal plus gestural treatment in a patient with aphasia and severe apraxia of speech. In M. L. Lemme (Ed.) Clinical Aphasiology, vol. 20 (Austin, TX: Pro-Ed), pp. 285–297.*
Raymer, A. M., Thompson, C. K., Jacobs, B. and Le Grand, H. R. 1993, Phonological treatment of naming deficits in aphasia: Model-based generalization analysis. Aphasiology, 7, 27–53.*
Reichardt, C. S. and Mark, M. M. 1998, Quasi-experimentation. In L. Bickman and D. J. Rog (Eds) Handbook of Applied Research Methods (Thousand Oaks, CA: Sage Publications Inc.), pp. 193–228.
Robey, R. R. 1994, The efficacy of treatment for aphasic persons: A meta-analysis. Brain and Language, 47, 582–608.
Robey, R. R. 1997, Meta-analysis of clinical outcome research. Paper presented at the annual meeting of the American Speech-Language-Hearing Association, Boston.
Robey, R. R. 1998, A meta-analysis of clinical outcomes in the treatment of aphasia. Journal of Speech, Language, and Hearing Research, 41, 172–187.
Robey, R. R. and Schultz, M. C. 1993, Optimizing Theories and Experiments (San Diego, CA: Singular Publishing Group).
Robey, R. R. and Schultz, M. C. 1998, A model for conducting clinical outcome research: An adaptation of the standard protocol for use in aphasiology. Aphasiology, 12, 787–810.
Salvatore, A. 1976, Training an aphasic adult to respond appropriately to spoken commands by fading pause duration within commands. In R. H. Brookshire (Ed.) Clinical Aphasiology, vol. 6 (Minneapolis, MN: BRK Publishers), pp. 172–191.‹
SAS Institute Inc. 1994, SAS System Under Microsoft Windows, Release 6.10 (Cary, NC: SAS Institute Inc.).
Scheffé, H. 1959, The Analysis of Variance (New York: John Wiley and Sons).
Scruggs, T. E. and Mastropieri, M. A. 1994, The utility of the PND statistic: A reply to Allison and Gorman. Behaviour Research and Therapy, 32, 879–883.
Sederer, L. I., Dickey, B. and Hermann, R. C. 1996, The imperative of outcomes assessment in psychiatry. In L. I. Sederer and B. Dickey (Eds) Outcomes Assessment in Clinical Practice (Baltimore, MD: Williams and Wilkins), pp. 1–7.
Senn, S. 1993, Suspended judgment: N-of-1 trials. Controlled Clinical Trials, 14, 1–5.
Shapiro, E. S., Kazdin, A. E. and McGonigle, J. J. 1982, Multiple-treatment interference in the simultaneous- or alternating-treatment design. Behavioral Assessment, 4, 105–115.
Sharpley, C. F. 1987, Time-series analysis of behavioural data: An update. Behaviour Change, 4, 40–45.
Sharpley, C. F. and Alavosius, M. P. 1988, Autocorrelation in behavioral data: An alternative perspective. Behavioral Assessment, 10, 243–251.
Shewhart, W. A. 1931, Economic Control of Quality of Manufactured Product (New York: Van Nostrand Reinhold).
Siegel, S. 1956, Nonparametric Statistics for the Behavioral Sciences (New York: McGraw-Hill).
Sokal, R. R. and Rohlf, F. J. 1981, Biometry: The Principles and Practice of Statistics in Biological Research (2nd edn) (New York: W. H. Freeman and Co.).
SPSS Inc. 1994, SPSS for Windows, Release 6.1 (Chicago: SPSS Inc.).
Starch, S. A. and Marshall, R. C. 1986, Who's on first? A treatment approach for name recall with aphasic patients. In R. H. Brookshire (Ed.) Clinical Aphasiology, vol. 16 (Minneapolis, MN: BRK Publishers), pp. 73–79.‹
Steele, R. D., Weinrich, M., Wertz, R. T., Kleczewska, M. K. and Carlson, G. S. 1989, Computer-based visual communication in aphasia. Neuropsychologia, 27, 409–426.*
Stocks, J. T. and Williams, M. 1995, Evaluation of single subject data using statistical hypothesis tests versus visual inspection of charts with and without celeration lines. Journal of Social Service Research, 20, 105–126.
Suen, H. K., Lee, P. S. C. and Owen, S. V. 1990, Effects of autocorrelation on single-subject single-facet crossed-design generalizability assessment. Behavioral Assessment, 12, 305–315.
Sullivan, M. P., Fisher, B. and Marshall, R. C. 1986, Treating the repetition deficit in conduction aphasia. In R. H. Brookshire (Ed.) Clinical Aphasiology, vol. 16 (Minneapolis, MN: BRK Publishers), pp. 172–180.‹
Thompson, C. K. 1983, An experimental analysis of the effects of two treatments on Wh interrogative production in agrammatic aphasia. Doctoral dissertation, University of Kansas, USA.‹
Thompson, C. K. and Byrne, M. E. 1984, Across setting generalization of social conventions in aphasia: An experimental analysis of 'loose training'. In R. H. Brookshire (Ed.) Clinical Aphasiology, vol. 14 (Minneapolis, MN: BRK Publishers).*
Thompson, C. K. and McReynolds, L. V. 1986, Wh interrogative production in agrammatic aphasia: An experimental analysis of auditory-visual stimulation and direct-production treatment. Journal of Speech and Hearing Research, 29, 193–206.*
Thompson, C. K. and Shapiro, L. P. 1994, A linguistic-specific approach to treatment of sentence production deficits in aphasia. In M. L. Lemme (Ed.) Clinical Aphasiology, vol. 22 (Austin, TX: Pro-Ed), pp. 307–323.*
Thompson, C. K., Hall, H. R. and Sison, C. E. 1986, Effects of hypnosis and imagery training on naming behavior in aphasia. Brain and Language, 28, 141–153.*
Thompson, C. K., Raymer, A. and le Grand, H. 1991, Effects of phonologically based treatment on aphasic naming deficits: A model driven approach. In T. E. Prescott (Ed.) Clinical Aphasiology, vol. 20 (Austin, TX: Pro-Ed).*
Thompson, C. K., Shapiro, L. P., Tait, M. E., Jacobs, B. J. and Schneider, S. L. 1996, Training wh-question production in agrammatic aphasia: Analysis of argument and adjunct movement. Brain and Language, 52, 175–228.‹
Timm, N. H. 1975, Multivariate Analysis with Applications in Education and Psychology (Monterey, CA: Brooks-Cole).
Toothaker, L. E., Banz, M., Noble, C., Camp, J. and Davis, D. 1983, N = 1 designs: The failure of ANOVA-based tests. Journal of Educational Statistics, 8, 289–309.
Tryon, W. W. 1982, A simplified time-series analysis for evaluating treatment interventions. Journal of Applied Behavior Analysis, 15, 423–429.
Velicer, W. F. and McDonald, R. P. 1984, Time series analysis without model identification. Multivariate Behavioral Research, 19, 33–47.
References marked with * indicate studies that were included in the meta-analysis and contributed estimates of autocorrelation. References marked with ‹ contributed estimates of autocorrelation.