ISRN Education
Volume 2013, Article ID 958530, 29 pages
http://dx.doi.org/10.1155/2013/958530
Review Article
Curriculum-Based Measurement: A Brief History of
Nearly Everything from the 1970s to the Present
          Gerald Tindal
          College of Education, University of Oregon, Eugene, OR 97403, USA
          Copyright © 2013 Gerald Tindal. This is an open access article distributed under the Creative Commons Attribution License, which
          permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This paper provides a description of 30 years of research conducted on curriculum-based measurement. In this time span, several subject matter areas have been studied—reading, writing, mathematics, and secondary content (subject) areas—in developing technically adequate measures of student performance and progress. This research has been conducted by scores of scholars across the United States using a variety of methodologies with widely differing populations. Nevertheless, little of this research has moved from a “measurement” paradigm to a “training in data use and decision making” paradigm. The paper concludes with a program of research that is needed over the next 30 years.
in an academic domain. Finally, the last stage (three) was reached when the focus of research was on the relevance of the data for making instructional decisions (including teacher acceptance in using these data).

While Fuchs’ heuristic is certainly useful and viable, the manner in which research on CBM is analyzed in this paper is quite different. Rather than thinking of the research in a sequential manner (with the earlier phases considered more basic and the last phase more applied), the depiction in this paper is more iterative and targeted at various and more encompassing critical dimensions of a measurement system within a decision-making framework, with some dimensions more critical in CBM, given its design for classroom use by teachers in monitoring teaching and learning. Finally, in the last two sections, CBM is depicted in the context of response to intervention (RTI) for evaluating instructional programs and within a theoretical context using an argumentative approach to the process of validation.

The big idea in this document is to summarize the research and findings from the past 30 years in order to redirect the focus of curriculum-based measurement for the next 30 years. In short, the past 30 years of research on curriculum-based measurement privileged measurement (as noted in the summary of findings in the five subject area sections) over an argument-based validation scheme on training and data-based decision making (as noted in the last two sections).

2. CBM: The Beginnings at the University of Minnesota

The early research on curriculum-based measurement at the Institute for Research on Learning Disabilities at the University of Minnesota established two major lines of reports focusing on (a) technical adequacy and (b) teacher decision making. Following is a brief summary of this research, most of which still stands today as relevant and of value in considering continued research that needs to be done in both explicating and extending the findings.

Although not specifically based on the Standards for Educational and Psychological Testing [4], the original research addressed the reliability and validity components as defined at that time. In addition to this focus on technical adequacy, several other characteristics were demanded of the measurement system so teachers could use the measures on a regular basis. Deno and Mirkin [5] emphasized specific features of these measures to allow alternate forms so that a time series data display could be used to determine if programs were working or needed to be adjusted. For example, the measures had to be easy to create, quick to administer, usable by all, drawn from the curriculum, and of sufficient technical adequacy. In the end, they laid the groundwork for an experimental view of teaching in which any intervention was considered “a hypothesis that needed vindication” [6, p. 32]. This last component was deemed critical to formative evaluation [7, 8], development of Individualized Education Programs (IEPs) [9], and assessment of short- and long-term goals [10].

2.1. Institute for Research on Learning Disabilities with Curriculum-Based Measurement. Curriculum-based measurement (CBM) was originally coined in the mid-1970s by Deno and Mirkin [5], who articulated two systems for teachers to use in systematically monitoring students’ progress over time: (a) mastery monitoring (MM) and (b) general outcome measurement (GOM). To this day, both forms are recognized by the National Center on Response to Intervention (RTI) (http://www.rti4success.org/).

MM is the traditional model in which teachers provide instruction in a specific skill unit and then test students’ mastery. As a result, MM has two unique features from a measurement perspective: (a) the domain for sampling items is limited to those taught, and (b) the graphic display reflects a stair-step function with time on the x-axis and mastery of a unit (the dependent variable) on the y-axis. The steeper the steps, the faster students are mastering units.

In contrast, GOM is based on comparable alternate forms from long-range goal (LRG) material (skills to be taught over an extended period of time). Because CBM was developed within special education, the LRG is often drawn from the Individualized Education Program (IEP). The two unique features of this model are that (a) the domain for sampling items spans the material being taught over an extended time, with items that both preview and review skills for any given student, and (b) the forms are considered comparable to each other, so the graphic display can include slope and variability as the primary dependent variables.
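To make this contrast concrete, the following minimal sketch (Python, with invented weekly scores rather than data from any study cited here) shows why slope and variability are computable dependent variables under GOM, whereas MM yields only a cumulative count of units mastered.

    import numpy as np

    weeks = np.arange(1, 11)

    # Mastery monitoring: 1 = unit test passed that week, 0 = not yet passed.
    unit_passed = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 1])
    units_mastered = np.cumsum(unit_passed)  # stair-step display (y-axis)

    # General outcome measurement: words correct per minute on alternate forms.
    wcpm = np.array([22, 25, 24, 29, 31, 30, 35, 34, 38, 41])

    # Because GOM forms are comparable, a trend line is meaningful.
    slope, intercept = np.polyfit(weeks, wcpm, 1)  # WCPM gained per week
    residual_sd = np.std(wcpm - (intercept + slope * weeks), ddof=2)
    print(f"GOM slope: {slope:.2f} WCPM/week; variability (residual SD): {residual_sd:.2f}")

The stair-step series supports only counts of mastered units, which is why the slope-based research questions reviewed below presuppose GOM.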
These early features are important to consider, as the eventual development of CBM moved on to a more norm-referenced perspective for comparing students to each other in order to identify those at risk of failing to learn. Because of the preferred psychometric properties of alternate forms over time (leading to slope and variability as outcomes), as well as the difficulty in defining and defending “mastery,” the field has generally focused on GOM methodologies.

The immediate aftermath of the original publication by Deno and Mirkin [5] was a five-year program of research conducted at the Institute for Research on Learning Disabilities (IRLD) at the University of Minnesota from 1979 through 1983. The focus of the institute was to (a) investigate current practices in identification of learning disabilities and (b) document the technical characteristics of CBM for monitoring student progress. In total, 23 monographs and 144 technical reports were published through the IRLD, with six monographs and 62 research reports written on CBM. The monographs established the critical dimensions of a systematic formative classroom measurement system [11, 12].

At that time, few alternative options were present for teachers to use in formatively evaluating their instructional programs. Basal series had mastery tests with little technical adequacy (reliability and validity) in decision making and actually misinformed teachers about skills being learned and practiced by students [13–17]. Informal reading inventories and readability-based estimates of passage difficulty were not (and are still not) reliable [18, 19].

CBMs and time series data on student progress are effective setting events for influencing teachers’ decision making and practices, as well as for managing Individualized Education Programs (IEPs) [7, 8, 10, 20–30]. Some of this research was done because of the unique (statistical) issues present in time series data [31–34], the effects of which are summarized by Marston et al. [35].
Training and structure of instruction are important components of effective data utilization for systematic use in school settings [30, 36, 37].

The practice of implementing CBM should be planned in its adoption in the schools and may successfully be used by both teachers and students to improve performance [27, 29, 38–40].

CBMs can be effectively used in screening students at risk of failure to learn basic skills [41, 42]. Programs and decisions may be appropriately evaluated at a systems level in a manner that comprehensively ties different decisions together [35, 42–44].

Domain sampling (breadth and representation) in the curriculum is an important issue to consider in developing progress monitoring probes, influencing sensitivity to change and rate of progress [45–48].

CBMs are reliable, are related to other critical measures of basic academic skills, and can serve as adequate measures of progress in learning basic skills [13, 20–22, 49–56].

In the end, the research was summarized in three monographs focusing on establishing and monitoring progress on IEP goals [57], systemic use of student classroom performance and progress in making multiple decisions [28, 29, 57, 58], and a review of the research over the five years of funding establishing curriculum-based evaluation in reading, spelling, and written expression [59].

The early research also spawned three books in the area of systematic classroom assessment, each addressing unique aspects of this behavioral measurement approach. The books synthesized the research from the institute and the researchers responsible for the initial research as follows: (a) Shinn [60], (b) Shinn [61], and (c) Tindal and Marston [62]. The first two books summarized the research completed during the funding of the IRLD and its immediate aftermath, while the latter book extended CBM into a larger, test-driven model for measurement development.

2.2. Other Models: Non-General Outcome Measurement. From the beginning, the term invited debate, and several alternative terms were offered. In a special issue of Exceptional Children, the term CBM was subjected to a debate. For Gickling and Thompson [63], the term curriculum-based assessment was used to denote administration of mastery monitoring tasks and then using the ratio of known (correct items) to unknown (incorrect items) to diagnostically determine what to teach. At about the same time, Howell [64] developed curriculum-based evaluation, in which two types of assessments were administered: survey level and specific level. In addition to the models just noted, these authors added criterion-referenced curriculum-based assessment [65] to describe a mastery-monitoring model with sequentially arranged objectives. Eventually, various models were compared [66], and since those early days, most of the research has been on various dimensions of measurement and decision making. In this early review, the models were compared on several dimensions: relationship to assessment and decision making, underlying premises about the relationship of assessment data to instruction (primarily addressing the prescriptive nature of recommendations to teach), type of student response, focus of material for monitoring student progress (the “shelf life” of the materials), test format (degree of standardization, length, and test construction skills), and technical adequacy (reliability and validity).

With the exception of curriculum-based measurement (CBM), these models have had a limited presence in the research literature in special education. CBM, however, has been extensively researched over the past 30 years in both general and special education. It has become a term used widely in the special education literature and has become the backbone of response to intervention (RTI), serving as the preferred form of progress monitoring used to evaluate instructional programs. Its staying power has been a function of (testimony to) its compatibility with traditional measurement development and the application of standards for educational and psychological testing [4]. One of the core focuses of CBM was the need for technical adequacy, something that was not embedded into the fiber of the other systems. Indeed, the proliferation of investigations on CBM has been most recently acknowledged in a Festschrift, A Measure of Success: How Curriculum-Based Measurement Has Influenced Education and Learning [67].

And with this core attention to traditional measurement development, a number of tractable issues can be pursued that would not be possible with non-GOM systems. For example, it is difficult to address slope with mastery monitoring or with ratios of correct to incorrect. In contrast, with a solid measurement system based on general outcomes, further research can be conducted on slope, which is addressed next in the context of oral reading fluency. Similarly, major developmental trajectories in skill development are difficult to document with measurement systems covering the brief periods of time typical of non-GOM systems. Yet, with the research on early literacy, it is possible to measure the development of early skills (in letter names and sounds) and their transformation into decoding and fluent reading. With non-GOM systems, research on administration and metrics for scoring and reporting outcomes is restricted, an issue addressed in the area of written expression based on a GOM approach. Ironically, with restricted domains from a non-GOM approach, it is nearly impossible to research the effects of domains; this issue is addressed in the coverage of mathematics. Finally, with secondary school measures (of access and target content skills), a non-GOM approach can be used, but it reflects the traditional manner in which measurement is conducted in the classroom. In fact, a GOM approach in these settings has only a tentative foothold, accomplished in the past 20 years. These are the topics for the next several sections, after which two major issues are addressed: what we know and need to know in implementing a measurement system for decision making in classrooms, and how a validity argument can be deployed in the development of curriculum-based measurement.
3. Oral Reading Fluency CBM

Oral reading fluency (ORF) has been extensively studied over the past three decades since the original study [50]. In a conference paper published by Tindal and Nese [68], the history of research on slope of improvement for ORF is summarized. In the following section, critical text from that publication is summarized. In addition, the topics of generalizability and standard error of measurement are addressed.

3.1. Within-Year Average Slope (Growth). The earliest study on average growth in oral reading fluency was by Fuchs et al. [69]. They documented that weekly growth in words correct per minute was 1.5 and 2.0 in first and second grades, 1.0 and 1.5 in third grade, 0.85 and 1.1 in fourth grade, 0.5 and 0.8 in fifth grade, and 0.3 and 0.63 in sixth grade. In 2001, Deno et al. [70] documented growth from nearly 3,000 students tested from four regions of the country. They reported 2.0 words correct per minute growth per week until students achieved 30 WRCM; thereafter, students in the general education population improved at least 1.0 word correct per minute per week.

In the past six years, seven studies have been published on slope of progress in oral reading fluency [71–77]. This recent rise in publications has led to various findings and conclusions.

More growth occurs from the beginning of the year to the middle of the year than from the middle to the end of the year, with overall growth being moderate [74], and more specifically from fall to spring than winter to spring [71]. When quadratic models cannot be computed (because of the limited number of measurement occasions), a piecewise model of growth can be conducted; it also fits better than a linear model, with all slopes being positive but negatively accelerated and decreasing as grade level increases [73]. And, as confirmed by Nese et al. [77], slopes are nonlinear for students in grades 3–5 taking benchmark easyCBM measures, again with more growth occurring in the fall than in the winter (at least for students in grades three and four) and more growth for students in earlier grades; students of color, students with a disability (SWD), and students eligible for free and reduced price lunch perform considerably lower at the beginning of the year, with slope only significant for SWD.

Most of this research involves measurement on only two or three occasions, either in the fall and winter or in the fall, winter, and spring. Rarely is instruction considered or progress monitoring used. In other research on slope that is more oriented to teacher use in the classroom, two explanations have been invoked for different findings on slope.

(1) Steeper slopes may occur because of the lower performance obtained when only one (versus four) baseline measure is used in calculating slope. Furthermore, more accurate estimates occur when measurement is either pre-post or every three weeks [75].

(2) Instructional changes are made more often as a function of weekly goal ambitiousness (1 to 1.5 words per week) but interact with frequency of measurement; when using slope to prompt change, more consistent progress monitoring occurs across goal conditions [76].

As Tindal and Nese [68] note, the sampling plans for students have been very inconsistent, primarily representing convenience samples for researchers, considerably different in size (from several dozen to thousands), and with different demographic characteristics documented. The results have been reported by grade level and, if special education was noted, it was dummy coded. The measures used in these studies have been as varied as the students measured, described as generic, standard, or grade appropriate. Likewise, frequency of measurement has been inconsistent, from weekly to triannually, with seasonal effects widely interpreted. In general, they report the following.

(1) Growth is not linear within a year but likely quadratic. Even in the early research by Deno et al. [70], a large percentage of students showed nonlinear growth.

(2) Growth also is not linear across years but reflects much greater improvement in the early grades that tapers off in the later elementary years.

(3) Students receiving special education services enter at a lower level (intercept) and generally grow at a slower rate.

(4) Within-year (weekly) growth in oral reading fluency ranges from .50 to 2.0 words correct per minute per week.

An important issue behind this research is the growing sophistication of the estimates of growth. In the original research, growth was simply calculated in a linear manner, either as raw score gains or using an ordinary least squares (OLS) regression estimate. Furthermore, this growth is artificially converted to a weekly average by taking the values and dividing by the number of weeks within the span (e.g., year or season). In only two studies were students actually measured on a weekly basis (see the two studies by Jenkins et al. [75] and Jenkins and Terjeson [76]).
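The two estimates can diverge. The sketch below (invented benchmark scores, not data from the studies cited) contrasts a raw gain converted to a weekly average with an OLS slope fit to all occasions.

    import numpy as np

    weeks = np.array([1, 10, 19, 28, 36])   # fall, ..., spring occasions
    wcpm = np.array([35, 52, 61, 66, 69])   # words correct per minute

    # (1) Raw-gain estimate: total change divided by weeks elapsed.
    raw_weekly = (wcpm[-1] - wcpm[0]) / (weeks[-1] - weeks[0])

    # (2) OLS estimate: least squares slope across every occasion.
    ols_weekly = np.polyfit(weeks, wcpm, 1)[0]

    print(f"raw gain: {raw_weekly:.2f} WCPM/week; OLS: {ols_weekly:.2f} WCPM/week")

Both are linear summaries; with decelerating data like these, either one masks the fall-versus-spring difference in growth discussed next.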
In the more recent research, hierarchical linear modeling (HLM) is being used [78] for making growth estimates conditioned on (nested in) various levels [79]. Typically, time is considered level one, student characteristics are considered level two, and teacher or school is considered level three. This kind of estimation provides a more precise estimate because the confounding effects of various levels are controlled. For example, the study by Nese et al. [77] specifically models time in both a linear and a nonlinear fashion. This finding of nonlinear growth (within years or over grades) has profound implications for establishing aimlines (or long-range goals) and for interpreting results. If indeed students grow more from the fall to the winter than they do from the winter to the spring, then teachers would be wise to use this information as mid-year checks on their decision making. More progress should be expected of students in the first four to five months; otherwise, teachers would be making false positive decisions about growth and not changing instruction because they are assuming linear growth. What appears to be an effective program actually may not be effective, particularly if the teacher is using a normative reference of growth (e.g., how other students are doing).
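As a rough illustration of the nonlinear-growth point, a piecewise (fall/spring) growth model of the kind described above can be sketched as follows (simulated scores; a full HLM would let the two slopes vary across students and schools):

    import numpy as np

    rng = np.random.default_rng(1)
    weeks = np.arange(0, 37, 4)      # measurement occasions across a year
    winter = 18                      # assumed mid-year knot (week 18)
    wcpm = (35 + 1.6 * np.minimum(weeks, winter)
               + 0.6 * np.maximum(weeks - winter, 0)
               + rng.normal(0, 2, weeks.size))

    # Design matrix: intercept, fall-to-winter slope, winter-to-spring slope.
    X = np.column_stack([np.ones(weeks.size),
                         np.minimum(weeks, winter),
                         np.maximum(weeks - winter, 0)])
    b, *_ = np.linalg.lstsq(X, wcpm, rcond=None)
    print(f"fall slope: {b[1]:.2f}; spring slope: {b[2]:.2f} (WCPM/week)")

A single linear slope fit to the same scores would understate fall growth and overstate spring growth, which is exactly the false positive risk just described.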
3.2. Generalizability Theory and Standard Error of Measurement. Separate from research on slope and expected growth for students within the school year, another line of research has developed in the post-IRLD era. In the original IRLD research, the focus was on traditional analyses of reliability (test-retest, inter-judge, and alternate form). In the past decade, however, advancement in the research continues to focus on reliability but with an emphasis on the conditions under which oral reading fluency is measured. Specifically, two areas of research (with ORF measures) are G-theory and the influence of various data characteristics on the standard error of measurement (SEM).

3.2.1. G-Theory Studies of ORF. In generalizability theory (G-theory), reliability is analyzed in terms of the effects of various measurement facets on estimates of performance [80]. Rather than considering a score on a performance assessment to be comprised of observed score and error, in G-theory the error is further parsed into constituent components or facets. For example, performance can be a function of the task (or passage, in the context of oral reading fluency), the occasion (conditions), or the rater (administrator). These three facets are most typically studied in G-theory; however, the consistent finding is that raters are rarely influential whereas tasks are nearly always influential [81]. Generally, this research involves two phases in which coefficients for these facets are estimated in a G-study, and then a D-study is conducted to ascertain the number of such facets needed to make a reliable decision (e.g., how many tasks, occasions, or raters are needed?).
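For readers unfamiliar with the mechanics, the following toy G-study (a fully crossed person × passage design with simulated scores, not data from any study reviewed here) estimates the variance components from the usual random-effects ANOVA identities.

    import numpy as np

    rng = np.random.default_rng(7)
    n_p, n_i = 30, 4                                # persons, passages
    scores = (100 + rng.normal(0, 15, (n_p, 1))     # person effect
                  + rng.normal(0, 5, (1, n_i))      # passage (task) effect
                  + rng.normal(0, 8, (n_p, n_i)))   # residual error

    ms_p = n_i * scores.mean(axis=1).var(ddof=1)    # mean square: persons
    ms_i = n_p * scores.mean(axis=0).var(ddof=1)    # mean square: passages
    resid = scores - scores.mean(1, keepdims=True) - scores.mean(0) + scores.mean()
    ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

    var_res = ms_res                                # sigma^2(pi, e)
    var_p = (ms_p - ms_res) / n_i                   # sigma^2(p)
    var_i = (ms_i - ms_res) / n_p                   # sigma^2(i)
    print(f"person: {var_p:.1f}  passage: {var_i:.1f}  residual: {var_res:.1f}")

A D-study then recombines these components to project reliability for different numbers of passages, as illustrated below.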
While traditional research continued on aspects of ORF measurement as a function of passage difficulty [82], other, perhaps more sophisticated, research has employed G-theory to understand the effects of facets on the reliability of performance estimates [82–85]. For example, Hintze et al. [84] used G-theory to examine person, grade, method, occasion, and various interactions. In the first study, participants and developmental changes explained most of the variance, with a generalizability coefficient of .90 with two sets of materials and .82 with one set of materials in making intraindividual decisions. For interindividual decisions, the generalizability coefficients were also high (.98) using only three reading passages. In the second study, participants and grade level again explained most of the variance, with very little influence from CBM progress monitoring procedures. Generalizability coefficients were .80. Another study by Hintze and Christ [83] used generalizability theory to study passages (both uncontrolled (randomly sampled) and controlled (purposively sampled)) from the reading curriculum. They found an interaction of probe by grade, with controlled passages having smaller SE(b).

3.2.2. G-Theory and Standard Error of Measurement on ORF. Poncy et al. [85] documented variability of CBM scores due to student skill, passage difficulty, and unaccounted sources of error, as well as reliability coefficients and the SEM given a specified number of probes when making relative and absolute decisions. Using 20 passages from DIBELS, they found that “the largest amount of variation in scores, 81%, was attributable to the person facet. Item, or probe, accounted for 10% of the variance, and 9% of the variation was located in unaccounted sources of error” (p. 331), with the index of dependability ranging from .81 (SEM of 18 WCPM) with one passage to .97 (SEM of 6 WCPM) with nine passages. In a slight variation of analytical research on reliability of oral reading fluency measures, Christ and Ardoin [86] compared four passage construction procedures: “(1) random selection, (2) selection based on readability results, (3) selection based on mean levels of performance from field testing, and (4) use of ED procedures. . .(an estimate of inconsistencies in performance within and across both students and passages)” (p. 59). They reported generalizability coefficients consistent with previous research (.91–.97 for 1–3 passages), but the D study found considerable differences in the passage compositions in favor of the ED procedures in estimating the level of student performance.
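Dependability indices and SEMs of the kind Poncy et al. [85] report follow from a D-study computation of this general form (the variance components below are invented placeholders, not their estimates):

    import numpy as np

    # Variance components from a G-study (illustrative values only).
    var_p, var_i, var_res = 225.0, 28.0, 70.0

    for n in (1, 3, 9):                       # candidate numbers of passages
        abs_error = (var_i + var_res) / n     # absolute error variance
        sem = np.sqrt(abs_error)              # SEM in WCPM
        phi = var_p / (var_p + abs_error)     # index of dependability
        print(f"{n} passage(s): SEM = {sem:4.1f} WCPM, phi = {phi:.2f}")

Averaging over more passages shrinks the error facets, which is why dependability rises and the SEM falls as probes are added.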
For Ardoin and Christ [87], passage stability was investigated (FAIP-R, AIMSweb, and DIBELS) using four dependent variables: intercept, weekly growth, weekly SE(b), and SEE. Statistically significant differences were found, “with the FAIP-R passage set resulting in the least amount of error and the DIBELS passage set resulting in the greatest amount of error” (p. 274). In addition, the FAIP-R passages had the lowest intercepts. Finally, measurement error increased with fluency rates. Francis et al. [88] also studied form effects (using DIBELS passages), noting the problems with readability formulae and the lack of actual student performance data to ensure comparability of passages over time. They reported that the passages were not substitutable (nor comparable) in the first assessment and that passages were not equally likely to yield students’ median fluency. They also found a significant group by wave interaction: “growth trajectories are significantly influenced by the placement of the more difficult stories” (p. 333). Finally, and most recently, Christ [89] reported that “the number of data points, quality of data, and method used to estimate growth each influenced the reliability, validity, and precision of estimated growth” (p. 15), with 14 weeks of intervention required for reliable and valid estimates of slope.

3.3. Summary and Reflection on Analytical Models for Understanding CBM. The most significant problem with this last study and all other (previous) analyses of reliability is that nothing about instruction is ever presented or considered. Rather, a limited set of variables for CBM is analyzed: quality of the data set, schedule and administration, and trend line estimation. No questions are asked such as the following, all of which would influence the reliability of score estimation. What is the fidelity of instruction implementation? How well does the intervention match the progress monitoring of oral reading fluency? What grade level of measure is used to progress monitor (e.g., it is hardly ideographic if all students are monitored on grade level passages)? How many data points per intervention phase are present? What curriculum is used during intervention, and how well does this curriculum represent the progress monitoring measures? How often is an intervention changed? How much time are students engaged in what kinds of activities during an intervention? How many students are in instructional groups? How are students identified for Tier II (e.g., what is their initial level of performance on benchmark measures)? What seasonal growth is likely present (i.e., in which season are data points collected, given that several studies have been published or presented at conferences indicating that growth on oral reading fluency is nonlinear)? What cohort effect might be present but is ignored (by data collection year or nested within geography)? How many days are students present during the year? How stable is the sample of students across simulation draws (when used)? How many moves across schools are present by individual students? How many students are with various (and specific) disabilities (or not with any disability)? What are the IEP goals for students with disabilities? If students are with a disability, what is the level and type of instruction implemented using response to intervention (RTI), and how do special and general education teachers plan instruction and interact in its delivery?

The post-IRLD research on the technical characteristics of ORF has made great advancements and moved the field much further along, using sophisticated analytical techniques that have uncovered increasingly nuanced issues. Unfortunately, this research has been almost entirely concentrated on ORF and not applied to other CBMs. Furthermore, the research has not yet been contextualized into a nomological net of practice in which important instructional and decision making issues have been presented. For example, all of the previous questions relate equally well to reliability. Finally, in using a Messick frame of validity as argument, with a claim supported by warrants and evidence, it appears that the major claims are instructionally free, with little attention to the generalizability issues that underlie external and construct validity [90].

4. Early Literacy CBM

A number of skills have been considered in the measurement of early reading, including alphabet knowledge of the names and sounds associated with printed letters, phonological awareness (e.g., the ability to distinguish or segment words, syllables, or phonemes), rapid automatic naming (RAN) of letters or digits, rapid automatic naming (RAN) of objects or colors, writing names or letters in isolation, and phonological memory (remembering spoken information). “Six variables representing early literacy skills or precursor literacy skills [have] had medium to large predictive relationships with later measures of literacy development. These six variables not only [have] correlated with later literacy as shown by data drawn from multiple studies with large numbers of children but also maintained their predictive power even when the role of other variables, such as IQ or socioeconomic status (SES), were accounted for” [91, p. vii].

Even though there had been a call to arms on the need to emphasize reading in our public schools with publication of the Report of the Commission on Reading [92], it appears that the research to follow was really spurred by three events. First, the National Reading Panel [93] emphasized alphabetics (phoneme awareness and phonics), fluency, and comprehension (vocabulary and text), the big five ideas behind effective reading teaching and learning. A second spur to the development of such measurement research was the No Child Left Behind Act [94], with its emphasis on reading and mathematics in grades three to eight (and one high school year). With this form of high stakes testing, educators realized that reaching proficiency in grade three required attention to the foundation skills in earlier grades. Finally, Reading First was funded immediately after NCLB, again providing attention to measures of early literacy. Most of the original research behind CBM did not include measurement of these early reading skills; the landmark review provided by Adams [95] was still six years out from the end of IRLD funding. As Fuchs et al. [96] noted, “in reading, most CBM research has focused on the passage reading fluency task which becomes appropriate for most students sometime during the second semester of first grade. Additional research is needed to examine the tenability of reading tasks that address an earlier phase of reading” (p. 7).
4.1. Evidence on Skill Relations. When students are just beginning to read, they need to master both the graphemic and phonemic components of reading. Furthermore, as reading develops, not only are letter names and sounds (as well as their concatenation) important, but they form digraphs, rimes, and syllables that are the building blocks of mono- and polysyllabic words. Not only are letter names, letter sounds, and phoneme segmentation critical skills in developing readers, these sublexical features are interrelated and, “especially letter sound fluency, may act as the mechanism that connects letter names, letter sounds, and phonemic segmentation to word reading and spelling” (p. 321).

However, it is fluency, not accuracy, that serves as the essential metric. “Letter name fluency and letter sound fluency, but not phoneme segmentation fluency, uniquely predicted word reading and were stronger predictors than their accuracy counterparts. . . Fluent recognition of letter-sound associations may provide the mechanism that supports phonological recoding, blending, and accurate word identification” [97, p. 321]. These skills are viewed as sublexical but importantly predictive in laying the groundwork for later word and passage reading fluency [98].

This fluency in the building blocks then becomes important in reading words and passages. Word recognition includes two processes: (a) the ability to decode written words and (b) the ability to decode words instantly and automatically, in addition to psychological and ecological components [99]. Speed in word reading again is a factor in the assessment of reading; it has long been of interest to researchers [100], and over the years its relation with comprehension has been consistently documented [101]. Generally, authors invoke the logic of LaBerge and Samuels [102] in that automaticity is important from an information processing view.

Letter sound knowledge also has been studied using nonsense word fluency (NWF). In the late 1990s, the research on curriculum-based measurement expanded to include measures of early reading literacy. Probably the most ubiquitous measure to appear is the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) [103]. DIBELS has been used to explore the importance of phonological awareness and phonics as predictors of reading difficulty [104, 105].

In a study by Fuchs et al. [96], higher coefficients were found both predictively and concurrently for word identification fluency (WIF) over nonsense word fluency (NWF) with the Woodcock Reading Mastery Test-Revised. More importantly, they found that “word identification fluency accounts for more unique variance than nonsense word fluency. . . and word identification fluency slope dominated nonsense word fluency slope in 3 of 4 comparisons” (pp. 16-17). For Ritchey [98], however, both measures were moderately correlated with each other and with the word identification and word attack subtests from the Woodcock Reading Mastery Test. Importantly, a ROC analysis showed both measures to have relatively high positive predictive power (in identifying students at risk). NWF was not found to be particularly useful in distinguishing students’ emerging skill in learning to blend over time, as the manner in which students respond may change. Likewise, beta weights for DIBELS subtests in kindergarten were moderately correlated with other published measures of reading, and only 32% to 52% of the variance was explained for literacy constructs at the end of first grade [106, p. 349]. Furthermore, letter naming, nonsense word, and phoneme segmentation fluency were as highly correlated with motivation, persistence, and attitude as they were with reading submeasures. Nevertheless, Clemens et al. [107] administered word identification fluency (WIF), letter naming fluency (LNF), phoneme segmentation fluency (PSF), and nonsense word fluency (NWF) as screening measures for 138 first grade students in the fall and another set of reading measures (TOWRE) at the end of first grade. Using a ROC analysis, they reported that the measure with the greatest classification accuracy was the WIF, a significant predictor for each of the outcome variables. Only 3-4 students per classroom were falsely classified. AUC values were shown “ranging from .862 to .909 across outcome measures, followed by LNF (range = .821–.849), NWF (range = .793–.843), and PSF (range = .640–.728)” (p. 238). Only modest improvements were made by combining the measures.
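The ROC logic in these screening studies can be sketched in a few lines (synthetic scores; the cut point, sample size, and noise level are assumptions for illustration): AUC is the probability that a randomly chosen at-risk student scores below a randomly chosen not-at-risk student on the fall screener.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 138
    fall_wif = rng.normal(30, 12, n)                  # fall screening scores
    # Spring outcome: poor readers tend to have had lower fall scores.
    at_risk = (fall_wif + rng.normal(0, 10, n)) < 22

    risk, safe = fall_wif[at_risk], fall_wif[~at_risk]
    wins = (risk[:, None] < safe[None, :]).sum()
    ties = (risk[:, None] == safe[None, :]).sum()
    auc = (wins + 0.5 * ties) / (risk.size * safe.size)
    print(f"AUC = {auc:.3f} ({risk.size} at risk, {safe.size} not at risk)")

Values near .9, like those reported for WIF, mean the screener orders at-risk and not-at-risk students correctly about 90% of the time.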
Even though moderate correlations have been reported between various DIBELS measures and published tests, the use of these measures for monitoring progress is really what is critical. And some sensitivity to progress (given instruction) has been reported. Hagan-Burke et al. [108] documented the sensitivity of growth using the DIBELS measures of letter naming fluency, nonsense word fluency, phoneme segmentation fluency, and word use fluency in relation to the Test of Word Reading Efficiency. They reported significant correlations among the measures (with the DIBELS measures loading on a single factor) and a significant influence of NWF in predicting the TOWRE (accounting for over 50% of the variance). In another study, both the phoneme segmentation fluency (PSF) and nonsense word fluency (NWF) subtests of the DIBELS were documented to be sensitive to intensive instruction for 47 kindergarten students across four classes in an urban K-5 Title I school [109]: “The results on the DIBELS benchmark assessment scores indicated that the treatment-intensive/strategic students scored significantly lower on the PSF and NWF subtests in the winter (pretest), when compared to the other two benchmark groups with or without treatment” [109, p. 23]. And, in a study of the influence of early phonological awareness and alphabet knowledge, Yesil-Dagli [110] studied 2,481 ELL students and found, on average, that the ELL students showed 38 words per minute of growth in English ORF throughout the first grade year and were able to read 53 words per minute at the end of first grade. Finally, and more generally, the effects of phonemic awareness (PA) instruction have been found to be large and statistically significant for direct measures (d = 0.86), as well as for more indirect measures of reading (.53) and spelling (.59). Importantly, PA instruction was effective for students from various backgrounds, including those from low socioeconomic backgrounds, at-risk and disabled readers, preschoolers, kindergartners, and first graders, as well as normally developing readers. Instruction was more effective when accompanied with letters, with only a few PA skills, in small groups, and over 5–18 hours [111].
4.2. Growth Trajectories in Learning to Read. Sensitivity to instruction and reflection of progress is related to the larger issue of the growth trajectories possible (and necessary) in the early years of elementary school. For example, Stage et al. [112] investigated letter sounds and the “relative predictive relationship of kindergarten letter-naming and letter-sound fluency to first-grade growth in ORF” (p. 227). Using four measurement occasions over first grade, they found both letter names and sounds to predict growth in oral reading fluency (even above that of the first ORF occasion in predicting later growth in ORF). They also reported that, with eight or fewer letter names, 81% of the students were correctly classified as at risk.

Fluent readers in first grade have also been found to be fluent in second grade [97]. This finding is consistent with other researchers who have reported that letter naming speed appears to be greatest in second grade (accounting for 11% of the variance, compared to just more than 2% in fifth grade, on a reading comprehension test) [99]. In fact, phonemic awareness in kindergarten has been found to be an important predictor of reading and spelling in fourth and fifth grades [113]. In a study using hierarchical linear models of growth with three levels (time, student, and classroom), Linklater et al. [114] documented growth on initial sound fluency, phoneme segmentation fluency, combined phoneme segmentation, and nonsense word fluency. “ISF in the beginning of kindergarten significantly predicted and accounted for variability on end-of-kindergarten measures of nonsense words, word identification, and reading comprehension” (p. 389). However, considerable variation in growth was also apparent, as a function of time, gender, and initial language status.

In an analysis of growth in reading from kindergarten through third grade, Speece et al. [115] used several measures of oral language, syntax, listening comprehension, phonological awareness, decoding, emergent literacy, and spelling, in addition to several background measures (race, gender, SES as measured by free/reduced lunch, family literacy, and primary language spoken by the child). They found that “the prediction of third grade performance and growth varied by type of reading skill assessed. . .only phonological awareness, emergent reading (TERA-2), and family literacy retained their significance as predictors of the intercept parameter in the conditional model. . .It appears that the unique linguistic roots of word-level reading at third grade are limited to the influence of phonological awareness skill” (p. 328). These findings on the importance of fluency in the basic skills of reading (sublexical features) help explain a finding from over 20 years ago: word decoding skill in first grade accounts for 44% of first grade and 12% of fourth grade reading comprehension variance [116].

4.3. Summary and Reflection on Skill Relations in CBM. Measurement of early literacy skills appears to be consistent with the traditional criteria of curriculum-based measures. These skills are straightforward to operationalize, can generate alternate forms for frequent administration, can be administered by others with a modicum of training, and reflect important general outcomes with sufficient technical adequacy. Unlike oral reading fluency, however, the skills reflect a complex constellation with a relatively brief shelf life. For example, letter naming, one of the earliest skills to develop in kindergarten, is probably not sensitive to progress across years (beyond first grade) for most students. Likewise, in first grade, the association of sounds with letters likely has a similarly short period for which it is sensitive. Other measures like initial sound fluency, as well as phoneme segmentation, nonsense word fluency, or elision tasks, are likely to be short lived even though they represent slightly more advanced skills than letter naming or sounding. And, as Vloedgraven and Verhoeven [117] note, three problems still exist with the measurement of phonological awareness: (a) the theoretical explanation among various measures as related to a larger coherent construct, (b) the inaccuracy in measurement (and its developmental shelf life), and (c) the difficulty in documenting growth. They reported difficulty parameters that showed the easiest task to be rhyming, then phoneme identification, followed by phoneme blending, and finally phoneme segmentation as the most difficult task. They also noted differences in performance on the tasks as a function of grade.

Furthermore, the relation of these skills in development is quite complex. As Mark Twain wrote, “what is needed is that each letter of the alphabet shall have a perfectly definite sound, and that this sound shall never be changed or modified without the addition of an accent, or other visible sound. . .But the English alphabet is pure insanity. It can hardly spell any word in the language with any degree of certainty” [118, pp. 168-169]. Therefore, the degree to which the progression of skills advances, and the thresholds that are needed for eventual transformation into actual reading (decoding) or fluency (reading accurately with speed), remain quite uncertain. Even though proficiency levels have been identified, the data supporting their application are not inviolate. For example, L. Fuchs and D. Fuchs [119] suggested that students need to complete 35 letter sounds per minute by the end of kindergarten, and Good et al. [104] recommended 50 letter sounds per minute as the benchmark goal for nonsense words (with less than 30 indicating the need for intensive instructional support). In the end, oral reading fluency may be more sensitive for monitoring progress than nonsense word fluency [96, 120]. Nevertheless, more research is needed to document the handshake between the relations among the skills and the trajectory supporting normal development.

5. Writing CBM

Written expression curriculum-based measures (administration and scoring) were investigated in the original research at the IRLD for their reliability and criterion relatedness to other published measures [52]. The initial findings indicated that correlations were moderate to high with a three-minute writing sample in response to an unfinished story starter (a prompt providing the student with a scenario about which they were directed to write a narrative), scored for words written and words spelled correctly. Later, correct word sequences and correct letter sequences were added as metrics for scoring student CBM outcomes [56]. In short order, a number of other researchers expanded nearly all aspects of measurement: administration time, prompts, scoring procedures, and criterion measures. This initial research also was conducted only with students in elementary schools and eventually expanded to secondary students [121–123], with researchers further expanding the outcome indicators to both the number and the percentage of correct word sequences and correctly spelled words. These later researchers reported stronger correlations (and more obvious group differences) when percentages were used rather than straight counts.
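To anchor the scoring terminology used throughout this section, here is a hedged sketch of the countable indices (total words written, words spelled correctly, correct word sequences, and correct minus incorrect word sequences); the tiny word list is a stand-in for a real dictionary, and operational CBM scoring also credits syntax, capitalization, and sequence boundaries.

    LEXICON = {"the", "dog", "ran", "fast", "and", "then", "it", "barked"}

    def score_sample(text: str) -> dict:
        words = text.lower().split()
        tww = len(words)                              # total words written
        spelled = [w.strip(".,!?") in LEXICON for w in words]
        wsc = sum(spelled)                            # words spelled correctly
        # Here a word sequence is each adjacent pair, counted correct when
        # both words are spelled correctly (grammar checks omitted).
        cws = sum(a and b for a, b in zip(spelled, spelled[1:]))
        iws = (tww - 1) - cws                         # incorrect word sequences
        return {"TWW": tww, "WSC": wsc, "CWS": cws, "CIWS": cws - iws}

    print(score_sample("The dog ran fsat and then it barked"))
    # {'TWW': 8, 'WSC': 7, 'CWS': 5, 'CIWS': 3}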
5.1. Administration and Scoring Systems for Elementary School Students. The initial research on written expression [51, 52] was conducted with 82 students from five elementary schools in the Minneapolis and St. Paul school districts in grades three to six. Like all the initial validation studies at the IRLD in reading and spelling, the focus was on various parameters of a CBM, including amount of time (one to five minutes) and directions for administration and scoring, as well as collection of criterion-related evidence. A number of story starters and topic sentences were used, and students’ writing samples were scored for average T-unit length (essentially independent clauses), use of mature (infrequent) words, large words, words spelled correctly, and total words written. The latter four measures correlated the highest with the published test (Test of Written Language). In a later study, the emphasis was on growth, with significant linear trends reported for words spelled correctly, number of correct letter sequences, and total words written [35]. Finally, the reliability of various outcome indicators was established by Marston and Deno [54], including alternate form, test-retest, inter-judge, and internal consistency. The only research to expand the measures beyond the original outcome indicators was conducted by Videen et al. [56], who established correct word sequences as a “valid” indicator of writing proficiency in grades three through six. Later research validated these original findings for total words written, words spelled correctly, and correct word sequences with elementary students [124]; ironically, the countable indices correlated more highly with ratings of story and organization than with conventions.
    Nearly a decade later, the focus of research on written expression continued to address various metrics for scoring performance [125]. In this study, total words written, correct word sequences, words spelled correctly, correct punctuation marks, correct capitalizations, complete sentences, and words in complete sentences were investigated by correlating the measures with teacher ratings (ideas, organization, voice, word choice, sentence fluency, and conventions). Although inter-rater reliability of all measures was adequate (above 90% on most measures and 80% on two of them), test-retest reliability was quite modest except for total words written and words spelled correctly. At best, moderate correlations have been reported among these CBM measures and a standardized test, with the relation between teacher ratings and the standardized test much higher.
    In a study similar to the original validation research, a number of scoring systems were investigated: words written, words spelled correctly, correct word sequences, and correct minus incorrect word sequences, along with different stimulus tasks. The focus was correlating these measures (and tasks) with a standardized test of writing and a state test, as well as language arts GPA. “Students completed two passage-copying tasks and responded to two picture, two narrative, and two expository prompts” [126, p. 554]. Moderately high correlations were found among all the measures. In grade 3, none of the measures showed change from fall to spring, while the 5th grade students (and to a lesser extent 7th grade students) showed noticeable growth on several of the measures.
    Very little research has been done on sensitivity to progress with writing CBMs. In one of the few such studies, students were administered traditional story starters and then provided an intervention focusing on the writing process (brainstorming and writing complete sentences) [127]. As in most of the research on written expression, a number of scores were analyzed from the writing samples before and after the intervention: total words written, total punctuation marks, correct punctuation marks, words in complete sentences, correct word sequences, and simple sentences. Ironically, although the number of total words written was the only measure to show a difference from pre- to post-intervention, it did not correlate with a standardized test (in addition to simple sentences).

5.2. Administration and Scoring Systems for Secondary School Students. This research on written expression expanded in both the populations studied and the outcome indicators used to reflect proficiency, with research by Parker and Tindal. For example, percentage of words spelled correctly and percentage of correct word sequences were found to be suitable indicators for students in late elementary and middle school grades [121]. In another study with only middle school students [122], a number of production and production-independent indicators were studied: total number of words written regardless of spelling or handwriting legibility, number of letter groupings recognizable as real English words, number of correctly spelled words, number of adjacent and correctly spelled word pairs that make sense together, average length of all continuous strings of correctly spelled and sequenced words, proportion of the total words written that are legible, and the proportion of words written that are correctly spelled. They reported that the percent of legible words, correct word sequences, and mean length of correct word sequences were the strongest predictors of holistic ratings, although growth over six months was limited to total number of words written, number of legible words written, and number of correctly spelled words.
    In a later refinement of this work, correct minus incorrect word sequences were added as an outcome indicator with appropriate criterion-related evidence [128, 129]. In the former study, a number of different scoring metrics were used (total words, words correct, correct word sequences, number of characters, and number of sentences written). Correlations (with published subtests in both reading and math and English GPA, as well as a rating of quality) were moderate (and higher with reading). “In sum, the results of the correlational analyses revealed that four measures—characters per word, sentences, CWS, and MLCWS—had a fairly consistent and reliable pattern of relations with other measures of writing proficiency. Of these four, only sentences and CWS showed divergent validity with correlations higher for the writing measures than for the reading and mathematics measures” (p. 20). Group differences were documented that reflected a sensible pattern, with students with learning disabilities significantly below basic, regular, and enriched students. In the latter study [129], a slightly younger group of students was studied (in grades six through eight), an expanded administration time was used (three and five minutes), and story versus descriptive writing was considered. Similar dependent variables were used as in the previous study (total words, words correct and incorrect, correct and incorrect word sequences, number of characters per word, number of words per sentence, as well as correct minus incorrect word sequences), along with teacher ratings (purpose, tone, and voice; main idea, details, and organization; structure, mechanics, and legibility) and a district writing test used as criterion measures. They reported the highest reliability and criterion validity for correct word sequences and correct minus incorrect word sequences, with few differences in administration time or type of writing.
    This change in scoring systems across grade levels indicates that “curriculum-based measures need to change as students become older and more skilled” [130, p. 151]. With students in grades four, eight, and ten, alternate form reliability was investigated with different sample durations and scoring procedures (total words, correct word sequences, and correct minus incorrect word sequences), and criterion-related evidence was collected with a state test. They reported that “alternate-form reliability correlation coefficients decreased with grade level” (p. 159). Although all three scoring systems correlated equally well with the state test (with a slight advantage for CWS-ICWS) and the three administration times (three, five, and ten minutes), they also found “a general pattern of decreasing coefficients with shorter sample lengths for older students” (p. 163).
    In another study using a state test as a criterion measure, Espin et al. [131] varied the following variables to investigate the impact on reliability and validity: time (3 to 10 minutes), scoring procedures, and criterion reference (state writing test). In addition, they investigated the influence of students’ primary language. “Although [reliability] correlations were similar across scoring procedure, differences were seen for time-frame” (p. 181), and “the scoring procedure with the strongest reliability and validity coefficients was (CIWS)” (p. 182).
    In a study using holistic judgments and a state test as criterion measures, a number of unique scoring procedures were used with high school students, including incorrect word sequences (ICWSs), correct punctuation marks (CPMs), adjectives (ADJs), and adverbs (ADVs), all scored for each 10-minute sample [132]. They reported that “both CPM and ICWS were moderately to strongly related to the holistic scores. . .[but only] ICWS was moderately and inversely correlated (r = −.51) with the norm-referenced scores” (p. 366), though considerable misidentification of learning disabilities was present using various percentile rank cut offs (15th to 30th PR). The number of adjectives or adverbs was not sufficiently reliable or correlated with either holistic judgments or the state test score.
    This line of research expanded to include expository essays and a much longer period of time to write (35 minutes versus the traditional three to five minutes). In this last study, with a relatively small sample size (only 22 middle school students), “criterion variables in the study were the number of functional essay elements and quality ratings of the essays” [133, p. 210]. They reported moderately strong correlations among correct and correct minus incorrect word sequences and both functional elements as well as quality ratings. Interestingly, the correlations decreased only somewhat when only the first 50 words of the essays were scored (particularly with the quality rating and for the students with learning disabilities). Finally, growth over time was documented from pre- to posttest.
    To use criterion measures that teachers would find more credible than standardized tests or state tests, Fewster and Macmillan [134] studied student performance from late elementary school through high school and correlated it with teacher grades. Students were given three (annual) administrations of both a reading measure (oral reading fluency) and a three-minute story starter from which to write a composition (scored by counting the total number of words written (TWW) and the number of words spelled correctly (WSC)). In correlating these measures with teacher grades (in English and social studies), they reported that “in nearly every case, WRC [words read correctly] correlated more highly with course grades than WSC” (p. 152), with most correlations in the moderate range and accounting for 15% to 21% of the variance.
    Finally, in the most recent study, with 447 eighth grade students, Amato and Watkins [135] studied the predictive validity of traditional CBMs in writing. A number of metrics were used: total words written, words spelled correctly (both count and percent), correct word sequences (both count and percent), correct minus incorrect word sequences, number of sequences, number of correct capitalizations, and number of punctuation marks (both total and correct). Using the Test of Written Language (TOWL) as a criterion, they reported high reliability for the CBMs (among 10 scorers) and moderate correlations. Only percentage of correct word sequences, number of correct capitalizations, and number of correct punctuation marks were significant.

5.3. Summary and Reflection on Administration and Scoring Systems in Writing CBM. McMaster and Espin [136] reviewed this body of work, including the initial IRLD research and the decade in which the research was initially expanded by Tindal and colleagues, as well as Espin and colleagues. Researchers focused on reliability, validity, and sensitivity to growth with the initial research. In their research with elementary students, they addressed students at different skill levels, screening decisions, scoring procedures, and beginning writers. In their coverage of secondary students, the issues were the integration of reading and writing, attention to students requiring remedial and special education, a focus on screening and progress monitoring, consideration of scoring procedures, task type and sample duration, and, finally, predictions of performance on school-based indicators. They concluded by noting the need to address reliability in progress monitoring and attention to generalizability and consequential validity of measures. “Finally, if measures are used by teachers to monitor progress and make instructional decisions, it is necessary to demonstrate that student performance improves as a result” (p. 82).
    For the past 30 years, the research on written expression has addressed a number of test administration variables. Clearly, the writing process needs to address the stimulus prompt for soliciting student writing. In addition, the amount of time to write is a critical variable. Finally, the system used to generate a score needs to be carefully considered. Obviously, all three of these variables need to be studied in terms of reliability and validity, particularly criterion-related evidence, to be both predictive of performance on other measures and, perhaps more importantly, credible for teachers (and therefore predictive of their own judgments obtained through grades or holistic ratings). Unlike all other areas in CBM, these test administration variables have been studied, often with similar results.
    In summary, by the late 2000s, a sufficient number of studies had been conducted in written expression CBM to indicate that (a) scoring written products needed to be different for middle school students (from the previous use of words written and words written correctly with elementary students), (b) percentages of words written correctly (WWC) and correct word sequences (CWS), as well as correct minus incorrect word sequences (CIWS), were more sensitive metrics (in terms of criterion-related evidence), (c) the amount of writing time influenced reliability but not validity, particularly for older students, and (d) genre for writing was not an influential variable (on either reliability or validity). Furthermore, the various measures of written expression appeared to be correlated with published writing tests, state tests, and teacher grades.
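Because correct word sequences and their variants recur throughout this summary, a minimal sketch of the sequence-based scores may be useful. It assumes each written word has already been judged acceptable in context (a human judgment in all of the studies reviewed) and follows the usual convention of counting boundary positions; the function name is illustrative.

    def word_sequence_scores(word_ok):
        # word_ok: one boolean per written word; True means the word is
        # spelled correctly and syntactically acceptable in context.
        anchors = [True] + list(word_ok) + [True]   # start/end boundaries
        pairs = list(zip(anchors, anchors[1:]))
        cws = sum(a and b for a, b in pairs)        # correct word sequences
        icws = len(pairs) - cws                     # incorrect word sequences
        return cws, icws, cws - icws                # CWS, ICWS, CIWS

    # A four-word sample with a misspelled third word yields five
    # scoreable sequences, three of them correct: CWS = 3, CIWS = 1.
    print(word_sequence_scores([True, True, False, True]))  # (3, 2, 1)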
of tutoring with “validated instruction” resulting in a very large ES of 1.34.

6.2. Domains Sampled in Middle Schools. Ann Foegen has been the most dominant researcher in middle school mathematics and has addressed a number of problem types. Foegen’s [155] early work on middle school math focused on both a fact probe and an estimation probe (both computation only and as word problems). Using teacher ratings and a standardized achievement test as criterion measures, she reported sufficient levels of criterion-related correlations over 10 weeks. She also noted that reliability increased when successive probes were aggregated, although slope of improvement remained moderate. She extended this work [137] by addressing low achieving students, administering a basic math operations task, a basic estimation task (both computational and word problems), and a modified estimation task (with changes in the number types), along with ratings from teachers, grades from students, and performance on a standardized test. Multiple types of reliability were computed (and found to be adequate); in addition, criterion-related evidence (correlations) among the CBMs and teacher ratings was moderate.
    Finally, Foegen [156] used the Monitoring Basic Skills Progress (MBSP) measures for computation [157] and for concepts and applications [158] (reflecting skills and concepts from the Tennessee state mathematics curriculum); in addition, she used basic facts, estimation, complex quantity discrimination, and missing number. She reported moderately high levels of alternate form reliability (.80–.91) in grades 6–8 (fall, winter, and spring) and test-retest reliability (.70–.85). Using teacher ratings and a standardized achievement test as criterion measures, she reported sufficient levels of criterion-related correlations and moderate rates of growth.

6.3. Summary and Reflection of Mathematics Domains with CBM. Four reviews have been published on mathematics CBM. In this chronological summary, the highlights of these reviews are presented.
    Foegen et al. [141] comprehensively review two approaches to developing CBMs in math: curriculum sampling versus robust indicators. Rather than a mastery approach, curriculum sampling reflects items from within a year “given the high level of curriculum specificity in mathematics” (p. 122); in contrast, robust indicators reflect problems that “identify aspects of core competence in mathematics [that] enables students’ growth to be modeled over multiple years of learning” (p. 122). This review analyzes results from 32 reports on progress monitoring, summarizing reliability and validity research (stage 1) as well as student growth (stage 2). Results are summarized for early, elementary, and secondary mathematics. The seven studies on using data to improve student achievement have all been conducted by Fuchs and colleagues, with six of them focused on computation.
    Lembke and Stecker [159] summarize procedural issues in implementing math CBMs, including steps in data-based decision making (screening and progress monitoring), metrics for summarizing performance (correct digits, problems, or responses), and establishing goals. The review is then structured on identifying reliable and valid measures for screening and progress monitoring, using progress monitoring for students with special needs, analyzing specific skills, and using the measures as part of classwide peer tutoring and consulting with teachers.
    Christ et al. [140] review computation CBMs and use Messick’s validity framework [1] relating content and criterion-related evidence while also noting the lack of research addressing consequential validity. In particular, two domain samples are considered: sub-skill mastery measures and general outcome measures, as well as stimulus sampling (curriculum and robust indicators); clearly the two features are related. Most reliability studies address internal consistency and alternate forms; no research is presented on test-retest reliability. A key issue is the relative stability of single skill measures versus the greater variance associated with multiple skill measures, particularly when items are randomly placed on the measure. Finally, duration of administration is summarized: “A single brief assessment of 1 to 2 min is probably sufficient to assess most single skills. Longer and more numerous assessments are necessary to assess multiple skills” (p. 203).
    For Fuchs et al. [144], the focus of their review of four studies was on the capability of a response to intervention (RTI) system in mathematics to reduce the need for intensive, long-term service. The studies address first grade tutoring, math facts fluency in first grade, math facts tutoring in third grade, and finally tutoring with word problems. Although they describe significant effects from the first grade tutoring (effect size of .40), they also noted the low relative performance level of the students (compared to students not at risk); therefore, another study was implemented with an emphasis on tutoring for fluency. In the third study, transfer from math facts to word problems was the focus for third grade students, with effect sizes reported for math facts of .27 and word problems of .53, suggesting lack of transfer. Finally, in this third study, the interaction of tutoring with validated classroom instruction was the focus, which was found to be significantly more effective than when implemented with teacher-designed instruction.
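For reference, the effect sizes (ES) reported across these tutoring studies are standardized mean differences of the general form

    ES = (M_treatment − M_comparison) / SD_pooled,

so an ES of .40 indicates that the tutored group outperformed the comparison group by roughly four-tenths of a pooled standard deviation (the subscripts here are generic labels for exposition, not the authors’ notation).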
    Following the initial research on reading and writing at the University of Minnesota, there has been a very significant increase in the research on mathematics. Quite likely, the federal legislation (NCLB) that began in 2001 had an influence, as the two areas in which districts were held accountable were reading and mathematics. And, like the early literacy skills, waiting until grade 3 made little sense, so much of this research spanned the entire grade range of K-8. Also, like the early literacy skills research, mathematics is comprised of multiple domains that eventually need to be stitched together. However, an important difference is that these domains successively build upon each other. Whereas learning to read involves generalizations with many exceptions to the rules (of sounds and syllabication), learning to compute and solve mathematics problems is lawful, with successive algorithms used. For example, an early skill in counting (using objects and finger counting) turns into addition and then multiplication. A number of principles are applied in solving problems so that students can learn more general case strategies (e.g., distributive and associative principles). Therefore, the field of CBM research addressed domains that had to be integrated into long-range goal measurement, a feat more similar to the development of general outcome measurement in secondary settings that is addressed next. That is, to avoid a mastery-monitoring model (with domains of assessment tightly associated with instruction), a more general case sampling was developed, integrating a range of skills.

7. Secondary Content CBM

Tindal and Espin have contributed the most research for CBM in secondary settings, with the essential difference between them being the source for ensuring measurement comparability and applicability to content area instruction. A comprehensive comparison of these two approaches is published by Espin and Tindal [160], addressing critical issues such as the curriculum in secondary classrooms, the skills to be assessed in CBM, research in reading, writing, and mathematics, and finally a comparison of the two approaches. In general, Espin has addressed vocabulary as the key to understanding measurement in secondary settings while Tindal has addressed content concepts.

7.1. Vocabulary Assessment in Secondary Schools. Much of Espin’s research laid the groundwork for this area (correlating reading aloud from content text, completing a maze test, and matching vocabulary words with definitions); consistently, low-to-moderate correlations have been reported among these measures, often highlighting the importance of vocabulary.
    Quirk [161] notes that words in secondary settings are Graeco-Latin in origin and distinguishes them from English words having an Anglo-Saxon origin, which are typically short (monosyllabic), learned early in life, and form the basis of everyday discourse. When students move from elementary schools to middle schools, the nature of reading changes drastically. The words of secondary content are typically multisyllabic, sound foreign, contain morphophonemic structures that are only somewhat transparent, and are learned later in life [162]. They comprise the primary language of the secondary content classroom and pose a more serious problem for students at risk of failure: “the component parts of Graeco-Latin words no longer carry much meaning for most uses of the words in English. The words have lost their semantic transparency” [162, p. 689]. Consequently, they typically are abstract, do not appear frequently in the English language, and are not easy to understand. “When these features combine in words, they interfere with word use and with word learning” [162, p. 696], and as a result, meaning is derived through whole word search, which is slower. Furthermore, in secondary settings, the lens for learning is not spoken language or conversation. “Academic Graeco-Latin words are mainly literary in their use. Most native speakers of English begin to encounter these words in quantity in their upper primary school reading and in the formal secondary school setting. So, the word’s introduction in literature or textbooks, rather than conversation, restricts people’s access to them” [162, p. 677]. This change has significance in the demands made on students and is likely part of measurement systems that are “curriculum-based.”
    Initially, Espin’s focus was on reading aloud from content area textbooks [163]. In this study, 124 students in grade 10 were administered a background knowledge vocabulary test (matching a word with its definition, with 10 words sampled from the content text), reading passages (both pre and post), and a classroom study task (with a multiple choice test that included literal and inferential questions) in English and Science. They reported high reliability (both alternate form and test-retest) as well as a pattern of performance for most students with disabilities (5 of 7) that was consistent with a general deficit in both content areas (below the 40th percentile rank), with only 28 of 102 general education students in this group; no differences were found in special education comparing general deficit versus content-specific deficit performance. However, when comparing these two groups following a period of study in the text, the two groups were different (with higher performance by content-specific deficit students in the number of words read correctly per minute). Finally, they reported differences between these two groups on background knowledge.
    The study was completed by Espin and Deno [164] with 10th grade students (n = 121) who were given English and Science texts to read (for one minute) and study (for 30 minutes in preparation for completing a 25-item multiple choice test). They used grade point average and performance on a standardized achievement test as criterion measures. They reported reliable and positive correlations between reading measures and all performance measures (in the range of .40s to .70s), accounting for 11% to 28% of the variance. The correlations with reading fluency and study questions and published achievement tests, as well as grade point average, were about the same value.
    In a similar study, Espin and Deno [165] measured 120 tenth grade students with four tasks in both English and Science: vocabulary matching (10 items from the study passage), prestudy reading aloud, a content area study task (900-word passages with 25 multiple choice (literal or inferential) questions to answer), and poststudy reading aloud (which was not analyzed). They reported moderate correlations among the measures. Text-based reading added 13% of the variance to performance on the content area study task, with vocabulary subsequently adding an additional 9% of the variance; this finding was true for both English and Science (though with slightly more positive results for English). When vocabulary was entered into the regression analysis first and subsequent analyses included text-based reading, no additional variance was accounted for. Separate analyses for students from both ends of the distribution showed the text-based reading passages to function as well as the vocabulary measures. In a follow-up study, they also reported that the vocabulary test was slightly higher in its correlation with a content study task (and accounted for more of the variance) than oral reading of the content text (counting number of words correct per minute).
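Vocabulary-matching probes of the kind used in this line of research can be assembled by sampling terms and definitions from the content under instruction. The sketch below is a plausible generator, not a reconstruction of any published probe: the glossary, the item counts, and the inclusion of distractor definitions are all illustrative assumptions.

    import random

    def vocab_matching_probe(glossary, n_items=10, n_distractors=2, seed=None):
        # glossary: dict mapping content-area terms to definitions.
        # Students match each sampled term to one definition; extra
        # distractor definitions make guessing harder.
        rng = random.Random(seed)
        terms = rng.sample(sorted(glossary), n_items)
        pool = [d for t, d in glossary.items() if t not in terms]
        definitions = [glossary[t] for t in terms] + rng.sample(pool, n_distractors)
        rng.shuffle(definitions)
        return terms, definitions

    glossary = {
        "nationalism": "loyalty and devotion to a nation",
        "civilization": "a society with writing, cities, and support systems",
        "monarchy": "rule by a single hereditary head of state",
        "tariff": "a tax on imported goods",
        "federalism": "power divided between central and regional governments",
    }
    terms, definitions = vocab_matching_probe(glossary, n_items=3, n_distractors=1, seed=1)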
    Espin and Foegen [166] added a maze test to the reading aloud from text and vocabulary test as secondary measures; in this study, 184 students in grades six to eight participated by taking these three measures and then taking three criterion measures: a 10-item multiple choice comprehension test, daily tests (from five timed reading passages with 10 multiple choice questions), and a posttest comprised of a compilation of multiple choice questions from the daily tests. The results showed moderately high correlations among the three CBM tasks and the three criterion measures, though vocabulary was the best predictor measure. Again, in a series of stepwise regression analyses, vocabulary appeared to contribute the most unique variance (when it was added first, no more variance was accounted for by reading aloud or maze).
    In a later study, Espin et al. [167] directly addressed the vocabulary measure as a generalized indicator of performance in content areas; in this study, text from the classroom of instruction was used. They administered a number of measures to 58 seventh grade students (five of whom were receiving special education services); a number of vocabulary matching measures (with 49 terms and definitions used in each area of sociology, psychology, and geography), both student-read and administrator-read, were administered along with a standardized achievement test and a classroom knowledge test (36 items using factual and applied knowledge). Alternate form reliability was moderate (but higher when multiple probes were aggregated); the correlation was moderately high between the vocabulary probes (particularly the administrator-read measures) and the published test as well as the classroom post-test, though moderate-low with grades.
    The study conducted by Espin et al. [168] continued this research on vocabulary matching in social studies with more focus on progress and administration format (student versus administrator). Using the same population as used earlier by Espin et al. [167], they analyzed the data using hierarchical linear modeling to compare growth for the two administration formats. They found more growth in correct matches with student-read (.65) than administrator-read (.22) tests. In a comparison of three criterion measures as level-two predictors, they found all to contribute significantly to vocabulary knowledge.
    The most recent work [169] presented the results from two studies. In the first study, two measures were administered to 236 students in grade eight (a read aloud task administered with timing noted at one, two, and three minutes and a maze task with timing noted at two, three, and four minutes); both tasks were based on human interest stories in the newspaper. Performance on these tasks was then correlated with a state test. Alternate form reliability was quite high, and both measures correlated with the state test.

7.2. Concept-Based Assessment in Secondary Schools. As a basis for framing a concept-based approach, Nolet and Tindal [170] divide content into three knowledge forms [171]: facts (single dimensional declarative statements), concepts (with a label, attributes, and both examples and nonexamples), and principles (if-then relations that present cause-effect relations). With concepts highlighted as the key knowledge form, attributes and subsequently examples and nonexamples are then articulated for each concept [172]. Attributes are essential for ensuring comparability of alternate forms, a critical feature of CBM. This approach is in contrast to the Direct Instruction approach, in which the focus is on the juxtaposition of examples and nonexamples as the basis for defining concepts [173].
    The reason for taking this route to curriculum-based measurement in secondary content areas is threefold. First, content (information) becomes much more subject specific in middle and high school grades and less generalizable outside of the subject area. Second, this content specificity is likely to be noteworthy even within a subject area, favoring a mastery monitoring approach rather than a general outcome measurement (GOM) methodology. Third, students with disabilities or those at risk are likely to have difficulty reading in addition to learning subject matter content, requiring a measurement system to do basically double duty, serving as a barometer of reading AND content learning. The subject specificity (both across and within content areas), however, is the most troubling issue to address in generating a GOM approach. An artifact of this issue is that content information becomes difficult to traverse; every new content area is new, with little opportunity to preview and review information.

7.2.1. General Description of CBA. In concept-based assessment (CBA), three to four primary attributes per concept are identified in a content area by content experts who use facts in the curriculum to inductively derive them. “A concept-based focus provides the teacher with a template for specifying the domain-specific conceptual knowledge with corresponding explicitly identified attributes and examples and nonexamples” [174, p. 335]. Attributes provide critical rules for organizing the domain of facts with generalizability to novel content in the same subject area. For example, Twyman and Tindal [175] define “nationalism as a concept [with] three attributes as: (a) has an ideology based on devotion and loyalty to a nation, (b) primary emphasis on promotion of its culture and interests, and (c) identifies with the power of historical tradition to oppose those of other nations or supranational groups” (p. 5). As another illustration of a major concept in social studies, Twyman et al. [176] use four critical attributes to characterize civilization: (a) having religious beliefs, (b) forming social groups, (c) generating support systems and activities, and (d) employing writing as a primary form of communication. In both illustrations, the concepts and their corresponding attributes can be applied to novel content with new examples.

7.2.2. Critical Effects from CBA. One of the important side effects of this approach is the immediate reduction in complexity of content information, with clear explication of critical information. Furthermore, the concept-based approach leads to assessments that take advantage of constructed responses with explicit guidelines for a partial credit-scoring algorithm [177]. For example, a student’s response can be scored for framing an argument, as well as providing both a rationale and details that support the rationale, with up to seven points possible [174]; alternatively, a multistep flow chart, with binary decisions, can provide a score of 0 to 4 points [172, 175]. In either partial credit system, reliability is enhanced by using an explicit scoring guide [178].
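The multistep flow chart mentioned above can be read as a short cascade of binary judgments, each worth one point. The sketch below is a hypothetical rendering of such a 0-to-4 decision-logic guide; the four questions are invented for illustration and are not the published scoring guide.

    def score_concept_essay(names_concept, states_attributes,
                            gives_example, gives_nonexample):
        # Each argument is a rater's yes/no judgment about the essay.
        # Scoring proceeds like a flow chart: stop at the first "no".
        score = 0
        for judgment in (names_concept, states_attributes,
                         gives_example, gives_nonexample):
            if not judgment:
                break
            score += 1
        return score

    print(score_concept_essay(True, True, False, True))  # 2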
    A concept-based approach also provides an explicit link to instruction through the use of semantic maps [179] or the verbal interactions between teachers and students during instruction [180, 181]. Furthermore, the approach provides a mechanism for special education teachers to collaborate with general education content teachers in an effective manner [182]. Finally, both teachers and researchers can begin to integrate curriculum content, instructional delivery, student perception, and formative test performance that provides essential feedback [183]. The approach is particularly useful in linking assessment with instruction and is universal across content areas, with application in the social and physical sciences [181].

7.2.3. Empirical Outcomes from CBA. The approach has empirical support from several studies. For example, Twyman et al. [176] showed that by anchoring both instruction and assessment on concepts, students form a deeper understanding of content. Although students using a concept basis showed no difference on tests of facts (declarative knowledge), they were significantly better on tests of vocabulary and problem solving. This approach has also been used in a case study with a seventh grade student for whom English was his second language [176]. In a unit on four Mesoamerican civilizations with a cloze technique using examples and nonexamples of religion, social groups, support systems and activities, and writing, a control group of students taught with a traditional factual approach performed more poorly on a writing task than students in the experimental concept-based instruction (CBI) group. In a similar study on history, Twyman et al. [174] used three types of measures (fact test, vocabulary knowledge, and problem-solving essay) to compare two groups of students taught with CBI and traditionally (as factual information) over 21 days (with each group taught for 46 minutes). Although both groups improved significantly on the facts test, students in CBI improved differentially more than the traditional group on the other two measures.

7.2.4. Research to Practice. Two training modules provide an overview of concept-based instruction and assessment in middle-high schools (Tindal et al. [184] and Nolet et al. [185]). In the former, a complete description is provided for understanding content information in terms of knowledge forms (facts, concepts, and principles) as well as designing instruction for higher order intellectual operations that addresses curriculum analysis, instructional planning, and interactive teaching. An appendix is included with black line masters for various semantic maps to organize concepts. The latter publication provides companion training in development of assessments for understanding students’ ability to reiterate, summarize, illustrate, evaluate, predict, and explain. In addition, a number of important assessment issues are addressed, including sampling plans and indexing reliability, both of which are necessary for development of assessment problem-solving tasks. An appendix provides black line masters for a decision-logic scoring guide using constructed problem-solving essays. These initial training materials were later adapted with empirical references and applied to a consultation [182] and learning assessment system [172] for special education teachers in middle school settings. In addition, a number of training manuals are available in middle school science [186], high school science [187], mathematics [188], language arts [189], and social science [187].

7.3. Summary and Reflection of Access and Target Skills for CBM. Assessment in secondary settings did not develop until well after the initial research at the University of Minnesota IRLD. And the research that did eventually emerge was considerably different. No longer looking at measurement of access skills, the focus directly targeted the curriculum content. For both researchers and programs of research, this target reflected a challenge to a GOM perspective as well as other principles underlying the initial CBM research platform. To maintain a time series (and long-range goal sampling) design, target content had to be approached from a broad perspective that nevertheless maintained relevance for teachers to use in evaluating instructional programs. Espin and colleagues used vocabulary (knowledge) as their content medium while Tindal and colleagues addressed concepts (attributes and examples-nonexamples). Both approaches distilled measurement to a command of academic words and helped remove reading as a necessary access skill, which would otherwise have presented a tautology: students could not reach proficiency on content unless they learned to read, and they would not be able to read (content) unless they knew the academic words. However, neither provided quite the elegance attained with other features of CBM (like being easy to create and implement as well as administer and score).

8. Analysis of Current Practice and a Proposal for an Alternative Conception

The most dominant use of CBM has been for screening and progress monitoring. Even before response-to-intervention (RTI) was common parlance among educators, CBM offered a system for monitoring students’ attainment of goals on their IEPs and evaluating instructional programs. As noted in the early history, the measures were validated and used to make a variety of decisions. Of late, however, their use in a formal system such as RTI has become widespread among school districts in the United States.
    The National Research Center on Learning Disabilities (NRCLD) defines RTI as “student-centered assessment models that use problem-solving and research-based methods to identify and address learning difficulties in children [through] high-quality classroom instruction, universal screening, continuous progress monitoring, research-based interventions, and fidelity of instructional interventions” [190, p. 86]. RTI depends upon the coordinate relation between implementation of interventions and change on measures over time so that the interventions are either vindicated or modified for students with disabilities or who are at risk of failure. For Wanzek and Cavanaugh [191], fully developed RTI models also integrate general and special education to identify and integrate school resources in providing effective instruction and intervention. Furthermore, scientifically based evidence is used to develop effective instruction, including the intensity of instruction (time, frequency, duration, and instructional group size).
8.1. Empirical Results and Needed Research. Systemic research on RTI is more conceptual than actual. For example, Barnett et al. [192] conceptualized technical adequacy of RTI from Messick’s [193] evidential (efficacy and effectiveness) and consequential perspectives. They present a technical adequacy checklist for each of three instructional tiers “emphasizing technical checks and iterative use, via a fictitious example” (p. 26). More recently, VanDerHeyden [194] added the following considerations of technical adequacy of RTI implementation: “assessments must be technically adequate for the purpose for which it is used [and] decision rules must be correctly applied with correct actions occurring in sequence” (p. 336). In the end, her focus was on sensitivity and specificity of decisions (and subsequent errors) using a receiver operating characteristic (ROC) analysis.
    The problem with both previous analyses (and a number of similar publications) is not the researchers’ perspectives but the need to follow up on their perspectives. Or, as Gersten et al. [195] note, “fully determining the validity of an assessment process transcends what any one researcher can accomplish. It is a task for a community of researchers and practitioners to consider meanings and utility of assessment procedures in relation to current thinking about how to improve instructional practice and issues raised by studies of implementation” (p. 512).

8.1.1. What We Know. Two factors appear particularly influential in how well teachers implement CBM: adequacy of planning time and teachers’ degrees of personal and/or teaching efficacy [196]. Even though five indices of implementation were examined (number of measurement points, ambitiousness of the goal set for the student, number of times the student goal was raised, number of times teachers made instructional changes, and timing of changes), she reported that “teachers with high personal efficacy and high teaching efficacy increased the end-of-year goal for students more often than their counterparts with low degrees of efficacy; teachers with high teaching efficacy set goals that were overall more ambitious than those of teachers with low teaching efficacy” (p. 5). In a related and later study, Allinder [197] also reported that when teachers implemented CBM more accurately, their students made significantly greater math gains than the students with teachers who either implemented CBM less accurately or who did not use CBM.
    This research on teacher use of CBM is based, in part, on measurement of, and intervention on, teacher behavior. For example, Allinder and BeckBest [198] used an accuracy of implementation scale to investigate teachers’ CBM structure (measuring baseline, graphing data, writing goals, and drawing goal lines), measurement (test administration, scoring, and frequency of measurement), and evaluation (describing instructional strategies and the changes to them both in terms of content and timing), with each item rated on a 5-point scale. Their intervention included an initial training and then either university-based consultation or self-monitoring. Though significant gains in math achievement were made, no differences were found between the two treatment conditions. Again, in a related and later study, Allinder et al. [199] studied self-monitoring with CBM versus CBM alone. In their focus on the effects of instructional modifications, they found that teachers with self-monitoring added to CBM made more significant modifications and had significantly greater improvement in their students’ performance (pre-post) in math than teachers with CBM alone or no CBM.
    Within 20 years, the research on data-based decision making was sufficient to arrive at some generalizations (without and with achievement results) [200]. In particular, five features appeared critical for the use of CBM for elementary or middle school students with mild to moderate disabilities.
    (1) Progress monitoring alone is insufficient; rather, instruction should be tailored to student needs.
    (2) Both types of changes are needed: (a) raising goals when progress is higher than expected and (b) changing instruction when progress is less than expected.
    (3) Decision making is facilitated (made more efficient and with greater satisfaction) with computerized data collection, storage, management, and analysis.
    (4) Student strengths and weaknesses can be identified with skills analysis in conjunction with consultation.
    (5) Meaningful programmatic changes can be facilitated with consultation.
    This research on teacher skills in the use of CBM also is based on the use of various decision rules for identifying risk. For example, four dual discrepancy (DD) cutoffs were compared on DIBELS (winter and spring) in which student growth (in first grade) was set below the 25th percentile, at the 33rd percentile, at the 50th percentile, and less than one standard deviation below the mean [201]. On an end-of-year reading measure, the percentile rank measures consistently (and increasingly by rank) reflected reading risk; “both the 25th and 33rd percentile criteria moderately differentiated reading skills of responsive and non-responsive students who were at risk, but the 50th percentile and one standard deviation criteria both led to small effect sizes” (p. 402). It appears that teachers need guidance to ensure that students with the greatest need are identified.
    And this aspect is important at the systems level, as Mellard et al. [202] determined in one of the few studies done on systems use of RTI. They addressed four components as “a framework that includes (a) universal screening, (b) tiered levels of high-quality interventions, (c) progress monitoring, and (d) data-based curricular decisions” (p. 186). In articulating the actual practice of RTI, they concluded that schools were screening students in various ways, using norms or percentage of population as cut points for risk assessment, placing students into instructional tiers (in varying proportions), and monitoring progress in tiers two and three. Finally, they noted that “good recordkeeping systems was a recurring theme” (p. 192). In addition, school personnel heeded the need to make screening and progress monitoring results accessible and to share data from year to year.
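The dual discrepancy criterion examined in [201] flags a student only when both performance level and growth fall below a cutoff in the norm group. A minimal sketch follows, assuming level and slope have already been expressed as percentile ranks (the 25th percentile is one of the four cutoffs studied):

    def dual_discrepancy(level_percentile, slope_percentile, cutoff=25):
        # At risk only when BOTH end level and growth (slope) are low;
        # a student who is low but growing quickly is not flagged.
        return level_percentile < cutoff and slope_percentile < cutoff

    print(dual_discrepancy(20, 60))   # False: low level, adequate growth
    print(dual_discrepancy(20, 15))   # True: dually discrepant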
    Ironically, none of this research on data-based decision making included training on single subject methodology. In fact, during the first 20 years of CBM, only one study investigated the effects of decision making using a single subject design [203]. These researchers reported that, when teachers are trained in a single subject design, they can obtain significant results that transfer well beyond the content of instruction. This same argument appears in consideration of the curriculum in CBM. In terms of content-related evidence and the relation with instructional planning, the curriculum has not been found to be a critical feature that identifies CBM as an effective tool in monitoring progress [204]. That is, the use of CBM does not appear to be dictated by the relation of measurement content with instruction. The studies Fuchs and Deno reference were from some of the early research on curriculum differences conducted at the IRLD in the 1980s. They also reference within-curriculum differences in sampling passages being as great as between-curriculum differences and the lack of (or potential for) generalization. Rather than limiting sampling plans to the curriculum, they suggest three other criteria as instrumental for instructional utility: (a) comparable difficulty, (b) valid outcome indicators, and (c) qualitative feedback. Therefore, in developing teachers’ skills for using CBM, these important contexts should be considered so that teachers know how to generalize from measurement results to more broadly considered instructional changes.

8.1.2. What We Do Not Know. Presently, few researchers have directly investigated training systems. For example, in a training manual for reading decision making, topics include benchmark measures, expected growth rates, goal establishment, decision rules, instructional strategies, effective reading instruction, and exemplar case studies [205]. Yet, no information exists on the effectiveness of this training. And, as noted by Barnett et al. [192, p. 20], implementation of RTI requires practitioners to consider prevention, screening, and successive instructional programs that are scientifically based and changed using decision rules. It is this combination of multiple variables that needs to be investigated concurrently and at the systems level.
    In summary, more research and development is needed for training teachers on systematic use of data, consideration of goals, skills analysis, and data management systems; it also should include use of single subject designs in practice. Yet, most CBM training systems have little data on the effectiveness of the training, even though they are premised upon the collection of student performance and progress. Another problem is that data are collected only at the individual student level and not on teachers. Therefore, professional development cannot be tailored to the areas in which teachers need the most assistance, whether it is about how to effectively monitor student progress, how to develop effective instructional programs, or how to make decisions for maintaining or changing these programs.
    Next generation training needs to allow information from teachers and students to be used through relational databases. Teachers need systematic models of assessment practices with classroom vignettes, exemplary practices, and resources that can be immediately used to develop effective progress monitoring. Teachers also need to determine how many students are being monitored on specific measures, grade levels, and time intervals; how students are being organized into tiers, time, and groups, as well as the specific instructional emphases being used (along with curriculum materials and strategies); and how well decisions have been made, with subsequent changes in level, variability, or slope. To equip teachers with these resources and information, training needs to be based on student reports that can be accessed to evaluate not only the effectiveness of interventions but also the systematic use of data-based decision making. Training needs to focus not only on how to implement best practice but also on how to interpret information on student performance and progress. Most importantly, professional development is needed in how to use the information in a systematic manner.

8.2. A Nomological Net Supporting a Theory of Change. To support this “next generation” of training on data use, future research first needs to be based on a nomological net that references three warrants.

Assumption 1 (measurement sufficiency). Students are appropriately placed in long-range goal material to ensure the measures are sensitive to change. What is the type of measure, grade level of measure, and the time interval (density of measures) used during the year?

Assumption 2 (instructional adequacy). Instruction is detailed and explicit, allowing a team of teachers to coordinate various elements such as providing an instructional tier (1–3), allocating sufficient time to teach, grouping students appropriately, deciding on the instructional emphasis using specific curriculum (core and supplemental) materials, and determining what instructional strategies to use.

Assumption 3 (decision making). Interventions need to be introduced for low performing students when data warrant change. Are interventions provided at the right time and in accordance with specific data features (e.g., level, variability, and slope)?

    The theory of change can be viewed as the interlocking union of the three components in a chain: measurement sufficiency, instructional adequacy, and decision making (see Figure 1). It is not each link itself that is critical but the intersection of the link with subsequent links (the time series is important in making causal arguments in which time proceeds from left to right). As teachers collect data (from benchmarks to decisions of risk and monitoring of progress), the data used to inform them need to be sufficient, directed toward instruction, and adjusted as needed. Furthermore, this information needs to be collected into a single database for teachers to monitor their application of RTI as well as for policy makers and administrators to use in making system decisions. The theory is driven by accessibility as a key ingredient of change. If information is not easily accessible and tractable, then it is unlikely to result in use. This theoretical approach also needs to be holistic. Changing individual components as separate events is unlikely to change systems. Rather, the whole needs to be reflected in the parts, which in turn need to connect teachers and administrators.
Figure 1: The theory of change. Teacher professional development and teacher practice connect to three proximal progress monitoring variables: measurement sufficiency (type, grade level, number/density of measures), instructional adequacy (tier, time, group, emphasis, curriculum, strategy), and decision making (level, variability, slope).
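To make the "single database" idea concrete, a minimal sketch follows in Python; the table and column names are hypothetical illustrations of the three proximal components, not the schema of easyCBM or any actual reporting system.

    # Hypothetical schema uniting the three proximal components in one database.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE measurement (         -- measurement sufficiency
        student_id INTEGER, probe_date TEXT,
        measure_type TEXT,             -- e.g., 'PRF' or 'word reading'
        grade_level INTEGER, score REAL
    );
    CREATE TABLE instruction (         -- instructional adequacy
        student_id INTEGER, start_date TEXT, tier INTEGER,
        minutes_per_day INTEGER, days_per_week INTEGER, group_size INTEGER,
        emphasis TEXT, curriculum TEXT, strategy TEXT
    );
    CREATE TABLE decision (            -- decision making
        student_id INTEGER, decision_date TEXT,
        level REAL, variability REAL, slope REAL, action TEXT
    );
    """)
    # A teacher- or administrator-facing report is then a simple join of the
    # three components, one row per student, measure, and decision.
    report = conn.execute("""
        SELECT m.student_id, m.measure_type, i.tier, d.action
        FROM measurement m
        JOIN instruction i ON i.student_id = m.student_id
        JOIN decision d ON d.student_id = m.student_id
    """).fetchall()

With every record in one place, the same tables could also drive teacher-level reports (e.g., how many students each teacher monitors, on which measures, and how often), which is the kind of information that tailored professional development requires.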
The combined effects from all three components (proximal variables) are critical, as is the relation between them and the outcome (distal variables), which is within-year growth (on benchmark measures) to document change relative to peers. It is not enough to have only one of the proximal variables—the right measures, targeted interventions based on best evidence, and decisions tied to their effect on students. All three are needed. However, they need to work synergistically. And even then, changing these three components is not enough either. Rather, the effect needs to close the achievement gap so that students with disabilities catch up to their peers on grade level performance measures (e.g., benchmarks). Finally, for systemic change, data need to be collected on the proximal variables for developing reports on use, allowing professional development to be tailored and specific.

9. Empirical Research on Three Assumptions (Warrants)

With these assumptions to guide training, and given what we know and do not know about teacher practice, the research agenda for the next 30 years is organized around three critical warrants supporting claims and guiding collection of evidence. In this last section, these three warrants are explicated and used to critique the empirical literature that has been collected in the previous 30 years. However, when the appropriate literature is absent, the analysis is based on easyCBM data extracted from the 2010-2011 school year, with approximately 10,000 students having progress monitoring in kindergarten through grade five. A team of researchers from Behavioral Research and Teaching (BRT) has presented the initial results from these analyses at two national conferences: the Pacific Coast Research Conference [206] and the National Council on Measurement in Education (Tindal and Nese [68]). Data continue to be harvested, linking annual benchmarking and progress monitoring data with state test scores in the Oregon Assessment of Knowledge and Skills (OAKS) to document the effects for students with varying characteristics. Most of the research examples in this section are based in reading but are likely to apply equally well in mathematics.

9.1. Measurement Sufficiency. The focus of this component is to provide teachers and administrators training on student measurement information to improve measurement sufficiency (i.e., increase or decrease the frequency of measurement, change the grade level of measurement, or modify the type or domain of measure being used). These three measurement components would allow policy to reflect an empirical basis for practice.

Many suggestions have been published in the literature on how often a student should be progress monitored, but these reflect few actual studies of different types or schedules of measurement. We know little about the full range of reading measures, as most of the research in reading has focused on oral reading fluency. Yet, teachers need to consider the full range of skills for students, including very early reading (phoneme segmentation and alphabetic principles) as well as vocabulary and comprehension. As Gersten and Dimino [207] note, "equally important is research on RTI that addresses students with problems with vocabulary, listening comprehension, and reading comprehension" (p. 105).

9.1.1. What (Domain) to Measure? Very little research has been completed on the sensitivity of various measures for monitoring reading development. For example, should first grade students with low reading skills be monitored in letter names, letter sounds, phoneme segmentation, or passage reading fluency? At what point in skill development should measures be changed (e.g., from monitoring letter sounds to word reading fluency)? Hintze's work may be the most revealing, but it is limited primarily to oral reading fluency
[82, 208, 209]. In our initial analysis of progress monitoring, we have found considerable variation in the type of reading measure administered over time; see Tindal et al. [206]. In grades three to five, teachers sometimes measured passage reading fluency one week and then either word reading fluency or reading comprehension a few weeks later, followed by word reading fluency. We found little pattern in this cycle of different reading measures being successively administered over time.

9.1.2. What Grade Level to Measure? Recommendations on the grade or domain level to measure are much less prominent than how often to measure. Early research documented differential sensitivity as a function of breadth of the reading domain [47, 210]. Otherwise, very few studies have addressed grade level of the measures, considering instead passage variation [88] or difficulty [82]; again, most of this research is on oral reading fluency. The general finding is that passages differ considerably unless explicitly controlled [83]. Again, our initial analyses of progress monitoring indicate considerable variation in the grade level of the measures used to monitor progress (from 1st to 8th grade). In 3rd to 5th grade, teachers most frequently monitored progress with grade-level passages or passages that were one grade level below.

9.1.3. How Often to Measure Progress? Frequency of measurement is a function of likelihood of change over time, and for that, the best literature to reference is research on within-year growth for oral reading fluency. As noted earlier, Fuchs et al. [69] documented the average slope of students in grades one to six using a least-squares regression between scores and calendar days, with slope converted to a weekly time frame. For the 103 special and general education students, respectively, the weekly growth in words correct per minute was 1.5 and 2.0 in first and second grades, 1.0 and 1.5 in third grade, 0.85 and 1.1 in fourth grade, 0.5 and 0.8 in fifth grade, and 0.3 and 0.63 in sixth grade. Eight years later, Deno et al. [70] conducted a far more extensive study in terms of geographic sampling plan and sample size, with nearly 3,000 students tested from four regions of the country. For 2,675 students in general education and 324 in special education classes, they reported 2.0 words correct per minute growth per week until students achieved 30 WRCM; thereafter, students in the general education population improved at least 1.0 word correct per minute per week. In the past decade, seven studies have been published on slope of progress in oral reading fluency (Deno et al. [70]). In summary, a number of variables have been considered in studying growth of oral reading fluency over the past 15 years. Probably the most remarkable finding is the general consistency in outcomes despite very inconsistent sampling plans and research methods.
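As a worked illustration of this slope computation (an ordinary least-squares regression of scores on calendar days, with the slope rescaled to a weekly frame), consider the following Python sketch; the probe days and scores are invented for illustration.

    # OLS slope of words correct per minute on calendar days, scaled to weeks.
    days   = [0, 14, 28, 42, 56, 70]    # calendar days since the first probe
    scores = [22, 25, 24, 29, 31, 34]   # words correct per minute (invented)

    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(scores) / n
    slope_per_day = (sum((x - mean_x) * (y - mean_y) for x, y in zip(days, scores))
                     / sum((x - mean_x) ** 2 for x in days))
    weekly_growth = slope_per_day * 7
    print(round(weekly_growth, 2), "words correct per minute per week")  # about 1.19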
Note that most of these studies have focused on oral reading fluency, with few other reading measures being investigated. The field also needs to address other reading skills (in addition to fluency) that are critical in learning to read for young students in the early elementary grades (e.g., letter naming, letter sounding, and phoneme segmenting) as well as later elementary grades (vocabulary and comprehension).

Probably the most significant limitation of prior research in this area is that benchmark measures, not progress measures, are often used to document growth [68]. The general reason for this is the completeness of the data set for large groups of students and the regularity of data collection (all students in the district take measures in the fall, winter, and spring). Two research exceptions reflecting different schedules report generally similar results to the earlier research on growth: Jenkins and Terjeson [76] and Jenkins et al. [75].

We also know, however, that growth is increasingly being documented as nonlinear, with more growth from fall to winter than from winter to spring [73, 77]. These findings have several implications for policy and practice. If indeed growth is nonlinear, teachers should not use a single goal as part of an Individualized Education Plan (IEP) to create an aim line. This practice would result in teachers underpredicting early progress and overpredicting later progress, as illustrated below. From our own (easyCBM) data sets on progress monitoring, we have found that teachers are quite haphazard in the way they monitor students over time. For example, the average length of time between September 1 and the first passage reading fluency (PRF) progress measure in 3rd through 5th grades is more than 10 weeks, with a standard deviation greater than seven weeks, and students are predominantly measured every two to five weeks thereafter. Finally, much of this research fails to include full demographic information for students, particularly those being progress monitored (e.g., students at risk of failing to learn to read) [68].
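The underprediction and overprediction can be seen in a small worked example (all numbers invented): suppose a student truly gains 1.5 words correct per minute per week for the 18 fall-to-winter weeks and 0.5 per week for the 18 winter-to-spring weeks, while a single aim line assumes the year-long average of 1.0 per week.

    # Linear aim line versus seasonal (nonlinear) growth; values are invented.
    start = 40
    actual_winter = start + 1.5 * 18          # 67.0 wcpm at midyear
    actual_spring = actual_winter + 0.5 * 18  # 76.0 wcpm at year end
    aim_winter = start + 1.0 * 18             # 58.0: the line underpredicts early
                                              # progress (the student is above it)
    aim_spring = start + 1.0 * 36             # 76.0: endpoints agree, but the line
                                              # overpredicts spring progress, when
                                              # true growth is only 0.5/week
    print(actual_winter - aim_winter)         # 9.0 wcpm midyear gap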
9.2. Instructional Adequacy. Two primary sources are useful for training on instructional systems: (a) the What Works Clearinghouse [211-213] and (b) the recently published Handbook for Reading Interventions [214] as well as Wanzek and Cavanaugh [191]. These sources provide a wealth of information that teachers can use to organize their instruction, particularly the former two documents. We also use the results from a recent publication on grouping by Schwartz et al. [215] as well as writing from the National Center to Improve the Tools of Educators (NCITE).

9.2.1. Instructional Tiers (1-3), Time, and Grouping. Most RTI systems use a 3-tier system [211]. The following levels of evidence are associated with each tier: (a) tier one (differentiated instruction for all students using current reading performance) has low levels of evidence, (b) tier two (intensive, systematic instruction on multiple reading skills delivered in small groups three to five times per week for 20-40 minutes for students scoring below benchmarks on universal screening) has strong levels of evidence, and (c) tier three (daily intensive instruction on a number of different reading skills for students exhibiting minimal progress in tier two after a reasonable time) has low levels of evidence. An important caveat for tiers two and three is the recommendation for small group instruction to be homogeneously configured, though the use of a certified teacher or paraprofessional is not deemed critical. Rather, it is the focus on how time is spent and apportioned that is critical, not the total amount of time.
In the easyCBM research, considerable variation exists across districts in their allocations of time in tiers 2 and 3 (from 30 minutes to 60 and 120 minutes extra), though most (51%) of the students receive instruction 4 days per week and another 25% receive instruction five days per week for their first intervention. Fully 22% receive this instruction for fewer than 30 minutes per day, 39% receive it for 30-59 minutes per day, another 14% for an hour each day, and only 4% for more than an hour every day [216].

The clearest research on grouping comes from Schwartz et al. [215], who used an experimental design to document the effects of student-teacher ratios, teasing out the very significant effects found with 1 : 1 (versus small group) instruction along with professional development (primarily driven by the Reading Recovery model). With teachers randomly assigned to small groups of two, three, and five students in grade two, teachers taught for 20 weeks in daily 30-minute lessons in each condition. Using several pre-post measures, 1 : 1 was compared to all other groups combined and found to be significantly different (students scored higher on the post-test after covarying on the pre-test); pairwise comparisons of 1 : 1 with each group were made, and no significant differences were found among the small group comparisons (1 : 2, 1 : 3, and 1 : 5). Finally, post hoc comparisons showed the 1 : 1 condition scoring significantly higher on most of the measures. The problem, then, is simply the resources needed to implement 1 : 1 programs, an expensive venture that may be best reserved for students in tier 3. Otherwise, the meta-analysis by Elbaum et al. [217] can be referenced to justify small group instruction, given that 1 : 1 instruction is very resource intensive.

9.2.2. Instructional Emphasis. In the What Works Clearinghouse [211], the recommendation is made to "use a curriculum that addresses the components of reading instruction (phonemic awareness, phonics, vocabulary, comprehension, and fluency) and relates to students' needs and development level" (p. 20). In the easyCBM 2010-2011 analysis by Saez [216], four areas were reportedly emphasized as the first intervention (in decreasing use): fluency (38%), word identification (32%), comprehension (21%), and vocabulary (8%). In addition, we add phonemic awareness and alphabetic principles as noted by O'Connor [218].

9.2.3. Curriculum Materials and Instructional Strategies. A myriad of materials and strategies are available for reading instruction. Suffice it to say that "reading instruction should also be explicit. Explicit instruction involves a high level of teacher-student interaction that includes frequent opportunities for students to practice the skill with specific corrective feedback" [211, p. 22]. According to the analysis by Saez [216], approximately 66% were intensive interventions: CARS/STARS (Comprehensive Assessment of Reading Strategies/Strategies to Achieve Reading Success), Corrective Reading, Explode the Code, Phonics for Reading, Harcourt Intervention Station, Horizons, Voyager Passport, Open Court Intervention Kit, Triumphs, Read Naturally, Reading Mastery, Rewards, Score4Reading, or SLANT (Structured Language Training). About 31% were strategic interventions: Soar to Success, Study Island, Step Up to Writing, various trade consumable workbooks, or various leveled texts. Less than 3% were from the core curriculum: Houghton Mifflin or Treasures.

In mathematics, similar instructional factors have been highlighted: "These instructional factors are encouraging high student engagement in learning, having a strong academic focus and clarity of content coverage, facilitating moderate-to-high success rates, and performance monitoring and informative feedback" [219, p. 582]. In this study, a specific mathematics program (Accelerated Math, AM) was implemented with CBM. The results of the analyses for both the NALT and STAR Math exams indicated that students who participated in AM demonstrated more growth than students who did not participate (p. 532).

In general, and consistent with the What Works Clearinghouse, Wanzek and Cavanaugh reported that tier two involved small group instruction and "more frequent progress monitoring (weekly or biweekly) to ensure the effectiveness of instruction for the students" (p. 193). They found group sizes of three to four students, with increasing amounts of time over the grades (though in smaller bites in the lower grades), and increasing intensity achieved by offering additional time or smaller groups, with instruction more explicit, practice more repeated, feedback more frequent, and more emphasis on high priority skills (also see Saez [216]).

9.3. Decision Making. CBM has generally been viewed as idiographic, not nomothetic, with students serving as their own control. In this view, the critical currency is how students change over time rather than how they compare to each other. The purpose is to make an individual difference in measurement and evaluation, not to document individual differences [220].

Single subject research designs are used to present data on a line graph that is analyzed to make data-based decisions [221]. Typically, such designs include a baseline phase prior to implementation of an intervention; this phase is then compared to the postintervention trajectory [222]. Visual analysis is used to evaluate data of individuals or small groups [221], and decision rules are then used to signal when instructional changes are needed (not just who should be labeled with a disability). With appropriate labels and particularly timely, well-depicted interventions, "cause-effect" relations can be inferred. Using one of three references, teachers can determine who is and who is not benefitting from the instructional program: (a) a normative reference based on the distribution of performance for all students in the group or a subset of students similar to the targeted student or group, (b) a standard that corresponds with a successful outcome, or (c) an individual reference using a comparison to an individual student's prior data [223]. Once sufficient data are collected (e.g., four to five occasions), four indices are summarized to evaluate instructional phases, the first two within phases and the last two between phases: (a) slope, (b) variability, (c) level, and (d) overlap (which incorporates all three of these indices). A computational sketch follows.
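A minimal sketch of these indices, assuming two lists of scores for a baseline and an intervention phase; the overlap index shown (the proportion of intervention points above the highest baseline point, one common variant) and all data values are hypothetical.

    # Level, variability, and slope within each phase; overlap between phases.
    from statistics import mean, stdev

    def ols_slope(scores):
        # OLS slope against occasion number 0, 1, 2, ...
        n = len(scores)
        mx, my = (n - 1) / 2, mean(scores)
        return (sum((x - mx) * (y - my) for x, y in enumerate(scores))
                / sum((x - mx) ** 2 for x in range(n)))

    baseline     = [20, 22, 21, 23, 22]   # invented scores
    intervention = [24, 26, 27, 29, 30]

    for name, phase in (("baseline", baseline), ("intervention", intervention)):
        print(name, mean(phase), round(stdev(phase), 2), round(ols_slope(phase), 2))

    # Overlap: share of intervention points exceeding the best baseline point
    # (higher values indicate less overlap, i.e., a clearer phase change).
    overlap = sum(s > max(baseline) for s in intervention) / len(intervention)
    print("nonoverlap proportion:", overlap)   # 1.0 here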
Decisions can also be based on goal lines or aim lines. Decision rules for intervention change, including service eligibility, are based upon these goal lines in concert with an empirically sound instructional and decision making sequence [192, 224].
The aim line or goal line refers to the projected amount of weekly growth across time that teachers establish as a minimum for adequate progress [76]. Aim lines are established in research using such techniques as norm references or ordinary least squares (OLS) regression. Deno et al. [70] suggested normative growth rates; others considered slope of improvement relative to aim lines by fitting an OLS regression line to each student's scores, computing the weekly increase based on the obtained parameter, and averaging these weekly increases (across students within grade levels and within general or special education programs). Ardoin [225] developed aim lines by using each student's median baseline score as the aim line start point and calculated the end point of the aim line by multiplying the number of weeks of intervention implementation by the desired gain per week and adding the product to the start point (using an OLS solution). The beginning point of the goal line is placed at the beginning of intervention, the end point is placed at the predicted end of data collection, and a line is drawn to connect the two points, as sketched below.
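A minimal sketch of this aim line construction under the assumptions just described (median baseline start point, desired weekly gain); all values are hypothetical.

    # Aim line: start at the median baseline score; end at start plus
    # (weeks of intervention) x (desired weekly gain). Values are invented.
    from statistics import median

    baseline_scores = [31, 28, 34]   # three baseline probes
    weeks = 20                       # planned weeks of intervention
    gain_per_week = 1.25             # desired weekly gain in wcpm

    start_point = median(baseline_scores)            # 31
    end_point = start_point + weeks * gain_per_week  # 56.0
    expected = lambda week: start_point + gain_per_week * week
    print(start_point, end_point, expected(10))      # 31 56.0 43.5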
VanDerHeyden et al. [226] developed a local norm in a manner similar to the technique used by Ardoin [225], using students in the "instructional range" as the comparison group. They then used level of performance and trend over time to estimate students' growth by subtracting each student's fall score from his/her spring score and then dividing this difference by the number of intervention weeks. The average of the students' weekly growth estimates was then considered to represent the growth rate of students considered not to be at risk (i.e., those students who scored in the instructional range during the spring administration).
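A minimal sketch of this local-norm computation; the student records and the number of intervention weeks are hypothetical.

    # Local norm: average weekly gain, (spring - fall) / weeks, computed over
    # students who scored in the instructional range in spring. Data invented.
    records = [        # (fall score, spring score, in instructional range?)
        (30, 62, True), (45, 80, True), (25, 40, False), (38, 70, True),
    ]
    weeks = 28         # intervention weeks between fall and spring testing

    gains = [(spring - fall) / weeks
             for fall, spring, in_range in records if in_range]
    growth_rate_not_at_risk = sum(gains) / len(gains)
    print(round(growth_rate_not_at_risk, 2), "points per week")  # 1.18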
Presently, most decision rules are conceptually, not empirically, based. Furthermore, a number of methodological problems are inherent in their use: (a) as noted earlier, the assumption of linear growth, which may not be accurate [73, 77], (b) use of gain scores [227], and (c) difficulties in combining statistical and graphic representations of data [192]. Although aim lines may improve decisions, they also have their limitations: disagreement in estimates, distraction from other data qualities, ambiguous results that can lead to misinterpretations [192], and confounding of growth with level of baseline performance [194, 224].

In the analysis by Saez [216], three districts were compared in their response to nonresponders, with the following options being promulgated: (a) four to six points below the aim line or a slope that is flat/decreasing, (b) measured achievement falling below the aim line or a flat progress trend, or (c) after four data points, "make an instructional change or continue to monitor" (one such rule is sketched below). Unfortunately, the vast majority of teachers implemented few changes, with 355 students (65%) receiving only one intervention, 138 students (25%) receiving two changes, 25 (4.6%) receiving three, and only 29 (5.3%) receiving four or more instructional interventions. Although some of the second interventions were introduced in September (n = 15) or October (n = 13), many were implemented in the late fall (48 in November and December) and winter (21 in January and February). And of the changes made, most targeted instructional program curricula (50%) or intensity duration (19%); relatively few targeted changes in tier (6%) or group size (6%).
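Option (a) can be written as an explicit rule; the sketch below assumes recent scores paired with aim line values and a precomputed recent slope, and every detail beyond the district's stated thresholds is hypothetical.

    # Flag an instructional change when the last four points all fall below
    # the aim line or the recent slope is flat/decreasing. Data are invented.
    def needs_change(scores, aim_values, recent_slope, k=4):
        recent = list(zip(scores, aim_values))[-k:]
        below_aim = all(score < aim for score, aim in recent)
        return below_aim or recent_slope <= 0

    scores     = [30, 31, 30, 32, 31]
    aim_values = [32, 33, 34, 35, 36]
    print(needs_change(scores, aim_values, recent_slope=0.1))  # True (below aim)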
The What Works Clearinghouse [211] recommends monitoring progress at least eight times per year: "Although no studies have experimentally tested the impact of progress monitoring on outcomes in reading, we still encourage schools to monitor the progress of these students so that personnel possess information on how a student is doing in general education reading proficiency and improving in specific skills" [211, p. 7]. This information would allow decision making to be based on changes in level (average performance), variability (standard deviation of values), and slope (amount of improvement). Generally, slope is used most frequently, though it is interpreted relative to the amount of variability.

10. Summary and Parting Comments

Curriculum-based measurement research has a solid 30-year track record that supports its technical adequacy. It represents a standardized form of assessment so that comparability of measures (and outcomes) can be ensured, whether comparing a student's performance to other students (norm referenced) or comparing previous performance to later performance (individual referenced). The former provides effective screening for risk and resource allocation, while the latter provides a system for progress monitoring and evaluation of instruction.

For any measurement system, the first order of business is technical adequacy, which has been the essential focus throughout the 30 years. The key areas for this research began in reading (fluency), but it was only after major political events that it expanded to early literacy or the foundation of disparate reading skills. Likewise, the initial research on writing was limited, primarily in the populations being monitored (elementary age students but later secondary students), which gave rise to systems for scoring student work (products). Attention to mathematics also was not part of the original research but quickly gained stride across the age ranges of students; given the unique nature of mathematics, the issue mostly addressed domains for sampling items. Finally, in middle and high schools, the focus was on the target skills of content, trying to "essentialize" the access skills needed to get to the content. Obviously, more research is needed in all of these subject areas.

However, and perhaps more importantly, further research and development needs to occur in the training of teachers on the use of data and on the reporting systems that teachers are using. Although early research targeted decision making, it was conducted in well-controlled studies with university support. Little attention was given to full-scale adoption and systematic incorporation into the decision making of classrooms. Essentially, the research had little verisimilitude. Now, however, it is important to begin this research in typical classrooms that are a part of, not apart from, teachers and students participating in a response-to-intervention system. In order to begin this research, it is best to consider training and reporting. The training can address the use of norm- or individual-referenced data as well as smart reports that are electronic and provide prompts for decision making, much like many software applications being developed around scheduling flights, meetings, and weight loss. Given the ubiquitous nature of technology in the classroom, there is no better time for a new generation of curriculum-based measurement embedded in a digital environment.

Acknowledgments

This paper would not have been possible without others at Behavioral Research and Teaching (BRT). Denise Swanson helped organize the many references cited in this compilation. Steffani Mast provided exceptional editing in moving the paper to a final draft. Finally, Dr. Joseph Nese helped craft the logic of the decision making argument at the end of the paper as part of a grant application to fund this line of work.
References

[1] S. Messick, "Standards of validity and the validity of standards in performance assessment," Educational Measurement, vol. 14, no. 4, pp. 5-8, 1995.
[2] M. Kane, "Validation," in Educational Measurement, R. Brennan, Ed., pp. 17-64, American Council on Education and Praeger, Westport, Conn, USA, 4th edition, 2006.
[3] L. S. Fuchs, "The past, present, and future of curriculum-based measurement research," School Psychology Review, vol. 33, no. 2, pp. 188-192, 2004.
[4] American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, Standards for Educational and Psychological Testing, American Psychological Association, Washington, DC, USA, 1999.
[5] S. Deno and P. Mirkin, Data Based Program Modification: A Manual, Leadership Training Institute for Special Education, Minneapolis, Minn, USA, 1977.
[6] G. Tindal and J. F. T. Nese, "Applications of curriculum-based measures in making multiple decisions with multiple reference points," in Assessment and Intervention: Advances in Learning and Behavioral Disabilities, M. M. T. Scruggs, Ed., vol. 24, pp. 31-58, Emerald, Bingley, UK, 2011.
[7] P. Mirkin, S. Deno, G. Tindal, and K. Kuehnle, "Formative evaluation: continued development of data utilization systems," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1980.
[8] B. Meyers, J. Meyers, and S. Deno, "Formative evaluation and teacher decision-making: a follow-up investigation," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1980.
[9] S. Deno and P. Mirkin, Data-Based IEP Development: An Approach to Substantive Compliance, Monograph, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1979.
[10] G. Tindal, L. Fuchs, S. Christenson, P. Mirkin, and S. Deno, "The relationship between student achievement and teacher assessment of short or long-term goals," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[11] S. Deno, P. Mirkin, and M. Shinn, Behavioral Perspectives on the Assessment of Learning Disabled Children, Monograph, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1979.
[12] J. R. Jenkins, S. Deno, and P. Mirkin, Measuring Pupil Progress Toward the Least Restrictive Environment, Monograph, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1979.
[13] L. Fuchs and S. Deno, "The relationship between curriculum-based mastery measures and standardized achievement tests in reading," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[14] L. Fuchs, G. Tindal, D. Fuchs, M. Shinn, S. Deno, and G. Germann, "The technical adequacy of a basal reading mastery test: the Holt reading series," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[15] L. Fuchs, G. Tindal, M. Shinn, D. Fuchs, and G. Germann, "Technical adequacy of basal readers' mastery tests: the Ginn 720 series," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[16] G. Tindal, L. Fuchs, D. Fuchs, M. Shinn, S. Deno, and G. Germann, "The technical adequacy of a basal series mastery test: the Scott-Foresman reading program," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[17] G. Tindal, M. Shinn, L. Fuchs, D. Fuchs, S. Deno, and G. Germann, "The technical adequacy of a basal reading series mastery test," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[18] D. Fuchs and S. Deno, "Reliability and validity of curriculum-based informal reading inventories," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[19] L. Fuchs, D. Fuchs, and S. Deno, "The nature of inaccuracy among readability formulas," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[20] L. Fuchs and S. Deno, "A comparison of reading placement based on teacher judgment, standardized testing, and curriculum-based assessment," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[21] L. Fuchs, S. Deno, and P. Mirkin, "Direct and frequent measurement and evaluation: effects on instruction and estimates of student progress," Research Report, Minneapolis, Minn, USA, 1982.
[22] L. Fuchs, S. Deno, and P. Mirkin, "Effects of frequent curriculum-based measurement and evaluation on student achievement and knowledge of performance: an experimental study," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[23] L. Fuchs, S. Deno, and A. Roettger, "The effect of alternative data-utilization rules on spelling achievement," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[24] L. Fuchs, C. Wesson, G. Tindal, P. Mirkin, and S. Deno, "Instructional changes, student performances, and teacher preferences: the effects of specific measurement and evaluation procedures," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[25] P. Mirkin and S. Deno, "Formative evaluation in the classroom: an approach to improving instruction," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1979.
[26] P. Mirkin, L. Fuchs, G. Tindal, S. Christenson, and S. Deno, "The effect of IEP monitoring strategies on teacher behavior," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[27] B. Sevcik, R. Skiba, G. Tindal et al., "Curriculum-based measurement: effects on instruction, teacher estimates of student progress and student knowledge of performance," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[28] C. Wesson, S. Deno, P. Mirkin et al., "Teaching structure and student achievement effects of curriculum-based measurement: a causal (structural) analysis," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[29] C. Wesson, P. Mirkin, and S. Deno, "Teachers' use of self-instructional materials for learning procedures for developing and monitoring progress on IEP goals," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[30] C. Wesson, R. Skiba, B. Sevcik et al., "The impact of the structure of instruction and the use of technically adequate instructional data on reading improvement," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[31] M. Shinn, R. Good, and S. Stein, "Summarizing trend in student achievement: a comparison of methods," School Psychology Review, vol. 18, no. 3, pp. 356-370, 1989.
[32] R. Skiba and S. Deno, "A correlational analysis of the statistical properties of time-series data and their relationship to student achievement in resource classrooms," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[33] R. Skiba, D. Marston, C. Wesson, B. Sevcik, and S. Deno, "Characteristics of the time-series data collected through curriculum-based reading measurement," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[34] G. Tindal, S. Deno, and J. Ysseldyke, "Visual analysis of time series data: factors of influence and level reliability," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[35] D. Marston, L. Lowry, S. Deno, and P. Mirkin, "An analysis of learning trends in simple measures of reading, spelling, and written expression: a longitudinal study," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[36] R. King, S. Deno, P. Mirkin, and C. Wesson, "The effects of training in the use of formative evaluation in reading: an experimental control comparison," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[37] R. Skiba, C. Wesson, and S. Deno, "The effects of training teachers in the use of formative evaluation in reading: an experimental control comparison," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[38] L. Fuchs, C. Wesson, G. Tindal, P. Mirkin, and S. Deno, "Teacher efficiency in continuous evaluation of IEP goals," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[39] R. King, C. Wesson, and S. Deno, "Direct and frequent measurement of student performance: does it take too much time?" Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[40] D. Marston and S. Deno, "Implementation of direct and repeated measurement in the school setting," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[41] D. Marston, G. Tindal, and S. Deno, "Predictive efficiency of direct, repeated measurement: an analysis of cost and accuracy in classification," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[42] P. Mirkin, D. Marston, and S. Deno, "Direct and repeated measurement of academic skills: an alternative to traditional screening, referral, and identification of learning disabled students," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[43] G. Tindal, G. Germann, and S. Deno, "Descriptive research on the Pine County norms: a compilation of findings," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[44] G. Tindal, G. Germann, D. Marston, and S. Deno, "The effectiveness of special education," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[45] L. Fuchs, G. Tindal, and S. Deno, "Effects of varying item domain and sample duration on technical characteristics of daily measures in reading," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[46] M. Shinn, M. Gleason, and G. Tindal, "Varying the difficulty of testing materials: implications for curriculum-based measurement," The Journal of Special Education, vol. 23, pp. 223-233, 1989.
[47] G. Tindal and S. Deno, "Daily measurement of reading: effects of varying the size of the item pool," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[48] G. Tindal, D. Marston, S. Deno, and G. Germann, "Curriculum differences in direct repeated measures of reading," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[49] S. Deno, D. Marston, P. Mirkin, L. Lowry, P. Sindelar, and J. R. Jenkins, The Use of Standard Tasks to Measure Achievement in Reading, Spelling, and Written Expression: A Normative and Developmental Study, Institute for Research on Learning Disabilities, Minneapolis, Minn, USA, 1982.
[50] S. Deno, P. Mirkin, B. Chiang, and L. Lowry, "Relationships among simple measures of reading and performance on standardized achievement tests," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1980.
[51] S. Deno, P. Mirkin, L. Lowry, and K. Kuehnle, "Relationships among simple measures of spelling and performance on standardized achievement tests," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1980.
[52] S. Deno, P. Mirkin, and D. Marston, "Relationships among simple measures of written expression and performance on standardized achievement tests," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1980.
[53] L. Fuchs, S. Deno, and D. Marston, "Use of aggregation to improve the reliability of simple direct measures of academic performance," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[54] D. Marston and S. Deno, "The reliability of simple, direct measures of written expression," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1981.
[55] G. Tindal, D. Marston, and S. Deno, "The reliability of direct and repeated measurement," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1983.
[56] J. Videen, S. Deno, and D. Marston, "Correct word sequences: a valid indicator of proficiency in written expression," Research Report, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[57] C. Wesson, S. Deno, and P. Mirkin, Research on Developing and Monitoring Progress on IEP Goals: Current Findings and Implications for Practice, Monograph, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[58] G. Tindal, G. Germann, S. Deno, and P. Mirkin, The Pine County Model for Special Education Delivery: A Data-Based System, Monograph, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[59] P. K. Mirkin, L. S. Fuchs, and S. L. Deno, Considerations for Designing a Continuous Evaluation System: An Interpretive Review, Monograph, Institute for Research on Learning Disabilities (IRLD), University of Minnesota, Minneapolis, Minn, USA, 1982.
[60] M. R. Shinn, Curriculum-Based Measurement: Assessing Special Children, The Guilford Press, New York, NY, USA, 1989.
[61] M. R. Shinn, Advanced Applications of Curriculum-Based Measurement, Guilford Press, New York, NY, USA, 1998.
[62] G. Tindal and D. Marston, Classroom-Based Assessment: Evaluating Instructional Outcomes, Merrill, Columbus, Ohio, USA, 1990.
[63] E. E. Gickling and V. P. Thompson, "A personal view of curriculum-based assessment," Exceptional Children, vol. 52, no. 3, pp. 205-218, 1985.
[64] K. Howell, Curriculum-Based Evaluation for Special and Remedial Education: A Handbook for Deciding What to Teach, Merrill Publishing, Columbus, Ohio, USA, 1987.
[65] C. S. Blankenship, "Using curriculum-based assessment data to make instructional decisions," Exceptional Children, vol. 52, no. 3, pp. 233-238, 1985.
[66] M. R. Shinn, S. Rosenfield, and N. Knutson, "Curriculum-based assessment: a comparison of models," School Psychology Review, vol. 18, no. 3, pp. 299-316, 1989.
[67] C. A. Espin, K. L. McMaster, S. Rose, and M. M. Wayman, A Measure of Success: How Curriculum-Based Measurement has Influenced Education and Learning, University of Minnesota Press, Minneapolis, Minn, USA, 2012.
[68] G. Tindal and J. F. T. Nese, "Within year achievement growth using curriculum based measurement," in Proceedings of the National Council on Measurement in Education, Vancouver, Canada, April 2012.
[69] L. Fuchs, D. Fuchs, C. Hamlett, L. Walz, and G. Germann, "Formative evaluation of academic progress: how much growth can we expect?" School Psychology Review, vol. 22, pp. 27-48, 1993.
[70] S. L. Deno, L. S. Fuchs, D. Marston, and J. Shin, "Using curriculum-based measurement to establish growth standards for students with learning disabilities," School Psychology Review, vol. 30, no. 4, pp. 507-524, 2001.
[71] S. P. Ardoin and T. J. Christ, "Evaluating curriculum-based measurement slope estimates using data from triannual universal screenings," School Psychology Review, vol. 37, no. 1, pp. 109-125, 2008.
[72] T. J. Christ, "Short-term estimates of growth using curriculum-based measurement of oral reading fluency: estimating standard error of the slope to construct confidence intervals," School Psychology Review, vol. 35, no. 1, pp. 128-133, 2006.
[73] T. J. Christ, B. Silberglitt, S. Yeo, and D. Cormier, "Curriculum-based measurement of oral reading: an evaluation of growth rates and seasonal effects among students served in general and special education," School Psychology Review, vol. 39, no. 3, pp. 447-462, 2010.
[74] S. B. Graney, K. N. Missall, R. S. Martínez, and M. Bergstrom, "A preliminary investigation of within-year growth patterns in reading and mathematics curriculum-based measures," Journal of School Psychology, vol. 47, no. 2, pp. 121-142, 2009.
[75] J. R. Jenkins, J. J. Graff, and D. L. Miglioretti, "Estimating reading growth using intermittent CBM progress monitoring," Exceptional Children, vol. 75, no. 2, pp. 151-163, 2009.
[76] J. Jenkins and K. Terjeson, "Monitoring reading growth: goal setting, measurement frequency, and methods of evaluation," Learning Disabilities Research & Practice, vol. 26, no. 1, pp. 28-35, 2011.
[77] J. F. T. Nese, G. Biancarosa, D. Anderson, C. F. Lai, J. Alonzo, and G. Tindal, "Within-year oral reading fluency with CBM: a comparison of models," Reading and Writing, vol. 25, no. 4, pp. 887-915, 2012.
[78] S. Raudenbush, A. Bryk, Y. Cheong, and R. Congdon, HLM 6: Hierarchical Linear and Nonlinear Modeling, Scientific Software International, Lincolnwood, Ill, USA, 2004.
[79] S. Raudenbush and A. Bryk, Hierarchical Linear Models: Applications and Data Analysis Methods, Sage Publications, Thousand Oaks, Calif, USA, 2nd edition, 2002.
[80] R. Brennan, Generalizability Theory, Springer, New York, NY, USA, 2001.
[81] R. Linn and E. Burton, "Performance-based assessment: implications of task specificity," Educational Measurement, vol. 13, pp. 5-8, 1994.
[82] J. M. Hintze, E. J. Daly, and E. S. Shapiro, "An investigation of the effects of passage difficulty level on outcomes of oral reading fluency progress monitoring," School Psychology Review, vol. 27, no. 3, pp. 433-445, 1998.
[83] J. M. Hintze and T. J. Christ, "An examination of variability as a function of passage variance in CBM progress monitoring," School Psychology Review, vol. 33, no. 2, pp. 204-217, 2004.
[84] J. M. Hintze, S. V. Owen, E. S. Shapiro, and E. J. Daly, "Generalizability of oral reading fluency measures: application of G theory to curriculum-based measurement," School Psychology Quarterly, vol. 15, no. 1, pp. 52–68, 2000.
[85] B. C. Poncy, C. H. Skinner, and P. K. Axtell, "An investigation of the reliability and standard error of measurement of words read correctly per minute using curriculum-based measurement," Journal of Psychoeducational Assessment, vol. 23, no. 4, pp. 326–338, 2005.
[86] T. J. Christ and S. P. Ardoin, "Curriculum-based measurement of oral reading: passage equivalence and probe-set development," Journal of School Psychology, vol. 47, no. 1, pp. 55–75, 2009.
[87] S. P. Ardoin and T. J. Christ, "Curriculum-based measurement of oral reading: standard errors associated with progress monitoring outcomes from DIBELS, AIMSweb, and an experimental passage set," School Psychology Review, vol. 38, no. 2, pp. 266–283, 2009.
[88] D. J. Francis, K. L. Santi, C. Barr, J. M. Fletcher, A. Varisco, and B. R. Foorman, "Form effects on the estimation of students' oral reading fluency using DIBELS," Journal of School Psychology, vol. 46, no. 3, pp. 315–342, 2008.
[89] T. Christ, "Curriculum-based measurement of oral reading: multi-study evaluation of schedule, duration and dataset quality on progress monitoring outcomes," Journal of School Psychology, vol. 78, no. 3, 2012.
[90] D. C. Briggs, "Synthesizing causal inferences," Educational Researcher, vol. 37, no. 1, pp. 15–22, 2008.
[91] National Institute for Literacy, Developing Early Literacy: Report of the National Early Literacy Panel (A Scientific Synthesis of Early Literacy Development and Implications for Intervention), Washington, DC, USA, 2008.
[92] R. C. Anderson, E. H. Hiebert, J. A. Scott, and I. A. G. Wilkinson, Becoming a Nation of Readers: The Report of the Commission on Reading, National Institute of Education, Washington, DC, USA, 1985.
[93] National Institute of Child Health and Human Development, "Report of the National Reading Panel: teaching children to read: an evidence-based assessment of the scientific literature on reading and its implications for reading instruction," Report of the Subgroups, Washington, DC, USA, 2000.
[94] No Child Left Behind, Committee on Education and Labor, Government Printing Office, Washington, DC, USA, 1st edition, 2001.
[95] M. J. Adams, Beginning to Read: Thinking and Learning about Print, MIT Press, Cambridge, Mass, USA, 1990.
[96] L. S. Fuchs, D. Fuchs, and D. L. Compton, "Monitoring early reading development in first grade: word identification fluency versus nonsense word fluency," Exceptional Children, vol. 71, no. 1, pp. 7–21, 2004.
[97] K. D. Ritchey and D. L. Speece, "From letter names to word reading: the nascent role of sublexical fluency," Contemporary Educational Psychology, vol. 31, no. 3, pp. 301–327, 2006.
[98] K. D. Ritchey, "Assessing letter sound knowledge: a comparison of letter sound fluency and nonsense word fluency," Exceptional Children, vol. 74, no. 4, pp. 487–506, 2008.
[99] P. G. Aaron, R. Malatesha Joshi, R. Gooden, and K. E. Bentum, "Diagnosis and treatment of reading disabilities based on the component model of reading: an alternative to the discrepancy model of LD," Journal of Learning Disabilities, vol. 41, no. 1, pp. 67–84, 2008.
[100] D. Starch, "The measurement of efficiency in reading," The Journal of Educational Psychology, vol. 6, no. 1, pp. 1–24, 1915.
[101] V. L. Anderson and M. A. Tinker, "The speed factor in reading performance," Journal of Educational Psychology, vol. 27, no. 8, pp. 621–624, 1936.
[102] D. LaBerge and S. J. Samuels, "Toward a theory of automatic information processing in reading," Cognitive Psychology, vol. 6, no. 2, pp. 293–323, 1974.
[103] R. Good, D. Simmons, and E. Kame'enui, "The importance and decision-making utility of a continuum of fluency-based indicators of foundational reading skills for third-grade high-stakes outcomes," Scientific Studies of Reading, vol. 5, no. 3, pp. 257–288, 2001.
[104] R. H. Good, J. Gruba, and R. A. Kaminski, "Best practices in using Dynamic Indicators of Basic Early Literacy Skills (DIBELS) in an outcomes-driven model," in Best Practices in School Psychology IV, A. Thomas and J. Grimes, Eds., pp. 679–700, National Association of School Psychologists, Washington, DC, USA, 2001.
[105] R. H. Good and R. A. Kaminski, "Assessment for instructional decisions: toward a proactive/prevention model of decision-making for early literacy skills," School Psychology Quarterly, vol. 11, no. 4, pp. 326–336, 1996.
[106] H. L. Rouse and J. W. Fantuzzo, "Validity of the dynamic indicators for basic early literacy skills as an indicator of early literacy for urban kindergarten children," School Psychology Review, vol. 35, no. 3, pp. 341–355, 2006.
[107] N. Clemens, E. Shapiro, and F. Thoemmes, "Improving the efficacy of first grade reading screening: an investigation of word identification fluency with other early literacy indicators," School Psychology Quarterly, vol. 26, no. 3, pp. 231–244, 2011.
[108] S. Hagan-Burke, M. Burke, and C. Crowder, "The convergent validity of the dynamic indicators of basic and early literacy skills and the test of word reading efficiency for the beginning of first grade," Assessment for Effective Intervention, vol. 31, no. 4, pp. 1–15, 2006.
[109] Y. Y. Lo, C. Wang, and S. Haskell, "Examining the impacts of early reading intervention on the growth rates in basic literacy skills of at-risk urban kindergarteners," The Journal of Special Education, vol. 43, no. 1, pp. 12–28, 2009.
[110] U. Yesil-Dagli, "Predicting ELL students' beginning first grade English oral reading fluency from initial kindergarten vocabulary, letter naming, and phonological awareness skills," Early Childhood Research Quarterly, vol. 26, no. 1, pp. 15–29, 2011.
[111] L. C. Ehri, S. R. Nunes, D. M. Willows, B. V. Schuster, Z. Yaghoub-Zadeh, and T. Shanahan, "Phonemic awareness instruction helps children learn to read: evidence from the National Reading Panel's meta-analysis," Reading Research Quarterly, vol. 36, no. 3, pp. 250–287, 2001.
[112] S. Stage, J. Sheppard, M. Davidson, and M. Browning, "Prediction of first-graders' growth in oral reading fluency using kindergarten letter fluency," Journal of School Psychology, vol. 39, pp. 225–237, 2001.
[113] H. Yopp, "A test for assessing phonemic awareness in young children," The Reading Teacher, vol. 49, no. 1, pp. 20–29, 1995.
[114] D. L. Linklater, R. E. O'Connor, and G. J. Palardy, "Kindergarten literacy assessment of English Only and English language learner students: an examination of the predictive validity of three phonemic awareness measures," Journal of School Psychology, vol. 47, no. 6, pp. 369–394, 2009.
[115] D. L. Speece, K. D. Ritchey, D. H. Cooper, F. P. Roth, and C. Schatschneider, "Growth in early reading skills from kindergarten to third grade," Contemporary Educational Psychology, vol. 29, no. 3, pp. 312–332, 2004.
[116] C. Juel, "Learning to read and write: a longitudinal study of 54 children from first through fourth grades," Journal of Educational Psychology, vol. 80, no. 4, pp. 437–447, 1988.
[117] J. M. T. Vloedgraven and L. Verhoeven, "Screening of phonological awareness in the early elementary grades: an IRT approach," Annals of Dyslexia, vol. 57, no. 1, pp. 33–50, 2007.
[118] M. Twain, "Simplified spelling," in Letters from the Earth: Uncensored Writings, B. DeVoto, Ed., HarperCollins, New York, NY, USA, 1942.
[119] L. Fuchs and D. Fuchs, "Determining adequate yearly progress from kindergarten through grade 6 with curriculum-based measurement," Assessment for Effective Intervention, vol. 29, no. 4, pp. 25–37, 2004.
[120] L. S. Fuchs, D. Fuchs, and D. L. Compton, "Monitoring early reading development in first grade: word identification fluency versus nonsense word fluency," Exceptional Children, vol. 71, no. 1, pp. 7–21, 2004.
[121] R. Parker, G. Tindal, and J. Hasbrouck, "Countable indices of writing quality: their suitability for screening-eligibility decisions," Exceptionality, vol. 2, pp. 1–17, 1991.
[122] R. I. Parker, G. Tindal, and J. Hasbrouck, "Progress monitoring with objective measures of writing performance for students with mild disabilities," Exceptional Children, vol. 58, no. 1, pp. 61–73, 1991.
[123] G. Tindal and R. Parker, "Assessment of written expression for students in compensatory and special education programs," The Journal of Special Education, vol. 23, no. 2, pp. 169–183, 1989.
[124] G. Tindal and R. Parker, "Identifying measures for evaluating written expression," Learning Disabilities Research and Practice, vol. 6, pp. 211–218, 1991.
[125] K. A. Gansle, A. M. VanDerHeyden, G. H. Noell, J. L. Resetar, and K. L. Williams, "The technical adequacy of curriculum-based and rating-based measures of written expression for elementary school students," School Psychology Review, vol. 35, no. 3, pp. 435–450, 2006.
[126] K. L. McMaster and H. Campbell, "New and existing curriculum-based writing measures: technical features within and across grades," School Psychology Review, vol. 37, no. 4, pp. 550–566, 2008.
[127] K. A. Gansle, G. H. Noell, A. M. VanDerHeyden et al., "An examination of the criterion validity and sensitivity to brief intervention of alternate curriculum-based measures of writing skill," Psychology in the Schools, vol. 41, no. 3, pp. 291–300, 2004.
[128] C. Espin, B. Scierka, and S. Skare, "Criterion-related validity of curriculum-based measures in writing for secondary school students," Reading & Writing Quarterly, vol. 15, no. 1, pp. 5–27, 1999.
[129] C. Espin, J. Shin, S. L. Deno, S. Skare, S. Robinson, and B. Benner, "Identifying indicators of written expression proficiency for middle school students," The Journal of Special Education, vol. 34, no. 3, pp. 140–153, 2000.
[130] J. W. Weissenburger and C. A. Espin, "Curriculum-based measures of writing across grade levels," Journal of School Psychology, vol. 43, no. 2, pp. 153–169, 2005.
[131] C. Espin, T. Wallace, H. Campbell, E. S. Lembke, J. D. Long, and R. Ticha, "Curriculum-based measurement in writing: predicting the success of high-school students on state standards tests," Exceptional Children, vol. 74, no. 2, pp. 174–193, 2008.
[132] B. Diercks-Gransee, J. W. Weissenburger, C. L. Johnson, and P. Christensen, "Curriculum-based measures of writing for high school students," Remedial and Special Education, vol. 30, no. 6, pp. 360–371, 2009.
[133] C. A. Espin, S. De La Paz, B. J. Scierka, and L. Roelofs, "The relationship between curriculum-based measures in written expression and quality and completeness of expository writing for middle school students," The Journal of Special Education, vol. 38, no. 4, pp. 208–217, 2005.
[134] S. Fewster and P. D. Macmillan, "School-based evidence for the validity of curriculum-based measurement of reading and writing," Remedial and Special Education, vol. 23, no. 3, pp. 149–156, 2002.
[135] J. M. Amato and M. W. Watkins, "The predictive validity of CBM writing indices for eighth-grade students," The Journal of Special Education, vol. 44, no. 4, pp. 195–204, 2011.
[136] K. McMaster and C. Espin, "Technical features of curriculum-based measurement in writing: a literature review," The Journal of Special Education, vol. 41, no. 2, pp. 68–84, 2007.
[137] A. Foegen and S. L. Deno, "Identifying growth indicators for low-achieving students in middle school mathematics," The Journal of Special Education, vol. 35, no. 1, pp. 4–16, 2001.
[138] M. Calhoon, "Curriculum-based measurement for mathematics at the high school level: what we do not know and what we need to know," Assessment for Effective Intervention, vol. 33, pp. 234–239, 2008.
[139] L. Fuchs, D. Fuchs, and S. Courey, "Curriculum-based measurement of mathematics competence: from computation to concepts and applications to real-life problem solving," Assessment for Effective Intervention, vol. 30, no. 2, pp. 33–46, 2005.
[140] T. Christ, S. Scullin, A. Tolbize, and C. Jiban, "Implications of recent research: curriculum-based measurement of math computation," Assessment for Effective Intervention, vol. 33, pp. 198–205, 2008.
[141] A. Foegen, C. Jiban, and S. Deno, "Progress monitoring measures in mathematics: a review of the literature," The Journal of Special Education, vol. 41, no. 2, pp. 121–139, 2007.
[142] M. K. Burns, A. M. VanDerHeyden, and C. L. Jiban, "Assessing the instructional level for mathematics: a comparison of methods," School Psychology Review, vol. 35, no. 3, pp. 401–418, 2006.
[143] T. J. Christ and O. Vining, "Curriculum-based measurement procedures to develop multiple-skill mathematics computation probes: evaluation of random and stratified stimulus-set arrangements," School Psychology Review, vol. 35, no. 3, pp. 387–400, 2006.
[144] L. S. Fuchs, D. Fuchs, D. L. Compton, J. D. Bryant, C. L. Hamlett, and P. M. Seethaler, "Mathematics screening and progress monitoring at first grade: implications for responsiveness to intervention," Exceptional Children, vol. 73, no. 3, pp. 311–330, 2007.
[145] C. Jiban and S. Deno, "Using math and reading curriculum-based measurements to predict state mathematics test performance: are simple one-minute measures technically adequate?" Assessment for Effective Intervention, vol. 32, pp. 78–89, 2007.
[146] P. Seethaler and L. Fuchs, "Using curriculum-based measurement to monitor kindergarteners' mathematics development," Assessment for Effective Intervention, vol. 36, no. 4, pp. 219–229, 2011.
[147] E. Shapiro, L. Edwards, and N. Zigmond, "Progress monitoring of mathematics among students with learning disabilities," Assessment for Effective Intervention, vol. 30, no. 2, pp. 15–32, 2005.
[148] B. Clarke, S. Baker, K. Smolkowski, and D. J. Chard, "An analysis of early numeracy curriculum-based measurement: examining the role of growth in student outcomes," Remedial and Special Education, vol. 29, no. 1, pp. 46–57, 2008.
[149] D. Chard, B. Clarke, S. Baker, J. Otterstedt, D. Braun, and R. Katz, "Using measures of number sense to screen for difficulties in mathematics: preliminary findings," Assessment for Effective Intervention, vol. 30, no. 2, pp. 3–14, 2005.
[150] E. Lembke, A. Foegen, T. Whittaker, and D. Hampton, "Establishing technically adequate measures of progress in early numeracy," Assessment for Effective Intervention, vol. 33, no. 4, pp. 206–214, 2008.
[151] R. Martinez, K. Missall, S. Graney, O. Aricak, and B. Clarke, "Technical adequacy of early numeracy curriculum-based measurement in kindergarten," Assessment for Effective Intervention, vol. 34, pp. 116–125, 2009.
[152] A. M. VanDerHeyden, C. Broussard, and A. Cooley, "Further development of measures of early math performance for preschoolers," Journal of School Psychology, vol. 44, no. 6, pp. 533–553, 2006.
[153] J. Leh, A. Jitendra, G. Caskie, and C. Griffin, "An evaluation of curriculum-based measurement of mathematics word problem-solving measures for monitoring third-grade students' mathematics competence," Assessment for Effective Intervention, vol. 32, pp. 90–99, 2007.
[154] L. Fuchs, D. Fuchs, and D. Compton, "The early prevention of mathematics difficulty: its power and limitations," Journal of Learning Disabilities, vol. 45, no. 3, pp. 257–269, 2012.
[155] A. Foegen, "Technical adequacy of general outcome measures for middle school mathematics," Assessment for Effective Intervention, vol. 25, pp. 175–203, 2000.
[156] A. Foegen, "Progress monitoring in middle school mathematics: options and issues," Remedial and Special Education, vol. 29, no. 4, pp. 195–207, 2008.
[157] L. S. Fuchs, C. L. Hamlett, and D. Fuchs, Monitoring Basic Skills Progress: Basic Math Computation, PRO-ED, Austin, Tex, USA, 2nd edition, 1998.
[158] L. S. Fuchs, C. L. Hamlett, and D. Fuchs, Monitoring Basic Skills Progress: Basic Math Concepts and Applications, PRO-ED, Austin, Tex, USA, 1999.
[159] E. Lembke and P. Stecker, Curriculum-Based Measurement in Mathematics: An Evidence-Based Formative Assessment Procedure, RMC Research Corporation, Center on Instruction, Portsmouth, NH, USA, 2007.
[160] C. Espin and G. Tindal, "Curriculum-based measurement for secondary students," in Advanced Applications of Curriculum-Based Measurement, M. R. Shinn, Ed., Guilford Press, New York, NY, USA, 1998.
[161] R. Quirk, The Linguist and the English Language, Arnold, London, UK, 1974.
[162] D. Corson, "The learning and use of academic English words," Language Learning, vol. 47, no. 4, pp. 671–718, 1997.
[163] C. Espin and S. Deno, "Content-specific and general reading disabilities of secondary-level students: identification and educational relevance," The Journal of Special Education, vol. 27, pp. 321–337, 1993.
[164] C. Espin and S. Deno, "Performance in reading from content area text as an indicator of achievement," Remedial and Special Education, vol. 14, no. 6, pp. 47–59, 1993.
[165] C. Espin and S. Deno, "Curriculum-based measures for secondary students: utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks," Diagnostique, vol. 20, no. 1–4, pp. 121–142, 1994.
[166] C. A. Espin and A. Foegen, "Validity of general outcome measures for predicting secondary students' performance on content-area tasks," Exceptional Children, vol. 62, no. 6, pp. 497–514, 1996.
[167] C. Espin, T. Busch, J. Shin, and R. Kruschwitz, "Curriculum-based measurement in the content areas: validity of vocabulary matching as an indicator of performance in social studies," Learning Disabilities Research & Practice, vol. 16, pp. 142–151, 2001.
[168] C. A. Espin, J. Shin, and T. W. Busch, "Curriculum-based measurement in the content areas: vocabulary matching as an indicator of progress in social studies learning," Journal of Learning Disabilities, vol. 38, no. 4, pp. 353–363, 2005.
[169] C. Espin, T. Wallace, E. Lembke, H. Campbell, and J. Long, "Creating a progress-monitoring system in reading for middle school students: tracking progress toward meeting high-stakes standards," Learning Disabilities Research and Practice, vol. 25, pp. 60–75, 2010.
[170] V. Nolet and G. Tindal, "Special education in content area classes: development of a model and practical procedures," Remedial and Special Education, vol. 14, no. 1, pp. 36–48, 1993.
[171] G. Roid and T. M. Haladyna, A Technology for Test-Item Writing, Academic Press, Orlando, Fla, USA, 1982.
[172] G. Tindal and V. Nolet, "Curriculum-based measurement in middle and high schools: critical thinking skills in content areas," Focus on Exceptional Children, vol. 27, no. 7, pp. 1–22, 1995.
[173] S. Engelmann and D. Carnine, Theory of Instruction: Principles and Applications, Irvington Publishers, New York, NY, USA, 1982.
[174] T. Twyman, J. McCleery, and G. Tindal, "Using concepts to frame history content," Journal of Experimental Education, vol. 74, no. 4, pp. 331–349, 2006.
[175] T. Twyman and G. Tindal, "Reaching all of your students in social studies," Teaching Exceptional Children, vol. 1, no. 5, article 1, 2005.
[176] T. Twyman, L. R. Ketterlin-Geller, J. D. McCoy, and G. Tindal, "Effects of concept-based instruction on an English language learner in a rural school: a descriptive case study," Bilingual Research Journal, vol. 27, no. 2, pp. 259–274, 2003.
[177] S. Embretson and S. Reise, Item Response Theory for Psychologists, Lawrence Erlbaum Associates, Mahwah, NJ, USA, 2000.
[178] T. Twyman and G. Tindal, "Extending curriculum-based measurement into middle/secondary schools: the technical adequacy of the concept maze," Journal of Applied School Psychology, vol. 24, no. 1, pp. 49–67, 2007.
[179] J. E. Heimlich and S. D. Pittelman, Semantic Mapping: Classroom Applications, International Reading Association, Newark, Del, USA, 1986.
[180] F. P. Hunkins, Teaching Thinking Through Effective Questioning, Christopher-Gordon Publishers, Boston, Mass, USA, 1989.
[181] G. Tindal and V. Nolet, "Serving students in middle school content classes: a heuristic study of critical variables linking instruction and assessment," The Journal of Special Education, vol. 29, no. 4, pp. 414–432, 1996.
[182] V. Nolet and G. Tindal, "Curriculum-based collaboration," Focus on Exceptional Children, vol. 27, no. 3, pp. 1–12, 1994.
[183] V. Nolet and G. Tindal, "Instruction and learning in middle school science classes: implications for students with disabilities," The Journal of Special Education, vol. 28, no. 2, pp. 166–187, 1994.
[184] G. Tindal, V. Nolet, and G. Blake, Focus on Teaching and Learning in Content Classes, University of Oregon Behavioral Research and Teaching, Eugene, Ore, USA, 1992.
[185] V. Nolet, G. Tindal, and G. Blake, Focus on Assessment and Learning in Content Classes, University of Oregon Behavioral Research and Teaching, Eugene, Ore, USA, 1993.
[186] L. Ketterlin-Geller and G. Tindal, Concept-Based Instruction: Science, University of Oregon Behavioral Research and Teaching, Eugene, Ore, USA, 2002.
[187] T. Twyman, L. Ketterlin-Geller, and G. Tindal, Concept-Based Instruction: Social Science, University of Oregon Behavioral Research and Teaching, Eugene, Ore, USA, 2002.
[188] M. McDonald, L. Ketterlin-Geller, and G. Tindal, Concept-Based Instruction: Mathematics, University of Oregon Behavioral Research and Teaching, Eugene, Ore, USA, 2002.
[189] G. Tindal, J. Alonzo, and L. Ketterlin-Geller, Concept-Based Instruction: Language Arts, University of Oregon Behavioral Research and Teaching, Eugene, Ore, USA, 2002.
[190] S. Berkeley, W. N. Bender, L. Gregg Peaster, and L. Saunders, "Implementation of response to intervention: a snapshot of progress," Journal of Learning Disabilities, vol. 42, no. 1, pp. 85–95, 2009.
[191] J. Wanzek and C. Cavanaugh, "Characteristics of general education reading interventions implemented in elementary schools for students with reading difficulties," Remedial and Special Education, vol. 33, no. 3, pp. 192–202, 2012.
[192] D. Barnett, N. Elliott, J. Graden et al., "Technical adequacy for response to intervention practices," Assessment for Effective Intervention, vol. 32, no. 1, pp. 20–31, 2006.
[193] S. Messick, "Validity," in Educational Measurement, R. Linn, Ed., pp. 13–103, Macmillan Publishing Company, New York, NY, USA, 3rd edition, 1989.
[194] A. VanDerHeyden, "Technical adequacy of response to intervention decisions," Exceptional Children, vol. 77, no. 3, pp. 335–350, 2011.
[195] R. Gersten, T. Keating, and L. K. Irvin, "The burden of proof: validity as improvement of instructional practice," Exceptional Children, vol. 61, no. 5, pp. 510–519, 1995.
[196] R. Allinder, "An examination of the relationship between teacher efficacy and curriculum-based measurement and student achievement," Remedial and Special Education, vol. 16, pp. 247–254, 1995.
[197] R. M. Allinder, "When some is not better than none: effects of differential implementation of curriculum-based measurement," Exceptional Children, vol. 62, no. 6, pp. 525–535, 1996.
[198] R. Allinder and M. BeckBest, "Differential effects of two approaches to supporting teachers' use of curriculum-based measurement," School Psychology Review, vol. 24, pp. 287–298, 1995.
[199] R. M. Allinder, R. M. Bolling, R. G. Oats, and W. A. Gagnon, "Effects of teacher self-monitoring on implementation of curriculum-based measurement and mathematics computation achievement of students with disabilities," Remedial and Special Education, vol. 21, no. 4, pp. 219–226, 2000.
[200] P. M. Stecker, L. S. Fuchs, and D. Fuchs, "Using curriculum-based measurement to improve student achievement: review of research," Psychology in the Schools, vol. 42, no. 8, pp. 795–819, 2005.
[201] M. K. Burns and B. V. Senesac, "Comparison of dual discrepancy criteria to assess response to intervention," Journal of School Psychology, vol. 43, no. 5, pp. 393–406, 2005.
[202] D. Mellard, M. McKnight, and K. Woods, "Response to intervention screening and progress-monitoring practices in 41 local schools," Learning Disabilities Research & Practice, vol. 24, no. 4, pp. 186–195, 2009.
[203] C. H. Hofstetter, "Contextual and mathematics accommodation test effects for English-language learners," Applied Measurement in Education, vol. 16, no. 2, pp. 159–188, 2003.
[204] L. Fuchs and S. Deno, "Must instructionally useful performance assessment be based in the curriculum?" Exceptional Children, vol. 61, no. 1, pp. 15–24, 1994.
[205] P. Stecker and E. Lembke, Advanced Applications of CBM in Reading (K-6): Instructional Decision-Making Strategies Manual, National Center on Student Progress Monitoring, Washington, DC, USA, 2011.
[206] G. Tindal, J. Alonzo, J. F. T. Nese, and L. Saez, "Validating progress monitoring in the context of RTI," in Proceedings of the Pacific Coast Research Conference (PCRC), Coronado, Calif, USA, February 2012.
[207] R. Gersten and J. A. Dimino, "RTI (Response to Intervention): rethinking special education for students with reading difficulties (yet again)," Reading Research Quarterly, vol. 41, no. 1, pp. 99–108, 2006.
[208] J. M. Hintze and E. S. Shapiro, "Curriculum-based measurement and literature-based reading: is curriculum-based measurement meeting the needs of changing reading curricula?" Journal of School Psychology, vol. 35, no. 4, pp. 351–375, 1997.
[209] J. Hintze, E. Shapiro, and J. Lutz, "The effects of curriculum on the sensitivity of curriculum-based measurement in reading," The Journal of Special Education, vol. 28, pp. 188–202, 1994.
[210] L. Fuchs, G. Tindal, and S. Deno, Effects of Varying Item Domains and Sample Duration on Technical Characteristics of Daily Measures in Reading, University of Minnesota Institute for Research on Learning Disabilities, Minneapolis, Minn, USA, 1982.
[211] Department of Education, Assisting Students Struggling with Reading: Response to Intervention and Multi-Tier Intervention in the Primary Grades, Institute of Education Sciences, Washington, DC, USA, 2009.
[212] Department of Education, Improving Reading Comprehension in Kindergarten Through 3rd Grade, Institute of Education Sciences, Washington, DC, USA, 2010.
[213] Department of Education, WWC Evidence Review Protocol for K-12 Students with Learning Disabilities, Institute of Education Sciences, Washington, DC, USA.
[214] R. E. O'Connor and P. Vadasy, The Handbook of Reading Interventions, Guilford Press, New York, NY, USA, 2011.
[215] R. M. Schwartz, M. C. Schmitt, and M. K. Lose, "Effects of teacher-student ratio in response to intervention approaches," The Elementary School Journal, vol. 112, no. 4, pp. 547–567, 2012.
[216] L. Saez, "Instructional responsiveness: what are teachers doing?" in Proceedings of the Pacific Coast Research Conference, Coronado, Calif, USA, February 2012.
[217] B. Elbaum, S. Vaughn, M. T. Hughes, and S. W. Moody, "How effective are one-to-one tutoring programs in reading for elementary students at risk for reading failure? A meta-analysis of the intervention research," Journal of Educational Psychology, vol. 92, no. 4, pp. 605–619, 2000.
[218] R. E. O'Connor, "Phoneme awareness and the alphabetic principle," in The Handbook of Reading Interventions, R. E. O'Connor and P. Vadasy, Eds., Guilford Press, New York, NY, USA, 2011.