
3

Designing Research to Address Causal Questions

One of the first actions that Grover "Russ" Whitehurst, the first director
of the Institute of Education Sciences, took after assuming office in 2002
was to commission a survey of educational practitioners and policymak-
ers in order to learn what they wanted from educational research.¹ Not
surprisingly, the survey results showed that the priorities of educators
depended on their responsibilities. Superintendents and other local edu-
cation officials were most interested in evidence about particular curricula
and instructional techniques that were effective in increasing student
achievement. State-level policymakers wanted to learn about the conse-
quences of standards-based educational reforms and the impact of
particular school intervention strategies. Congressional staff wanted to
know about the effectiveness of different strategies for enhancing teacher
quality. Educators at all levels wanted to know about the effect of differ-
ences in resource levels, such as class sizes, in determining students'
achievement.
Whereas the priorities of educators depended on their responsibilities,
the striking commonality in their responses was that practitioners and
policymakers-at all levels-wanted to know the answers to questions about
cause and effect. They wanted to know if A caused B, and wanted IES to com-
mission research that would provide them with answers. In this chapter,
we discuss the conditions that must be satisfied for such causal questions
to be addressed effectively in education, and we introduce some of the
major concepts and terms that we use throughout the rest of the book.

1. See Huang et al. (2003).


Conditions to Strive for in All Research

Before we begin our discussion of how best to address the causal questions
that are so central to educators, we begin with a brief description of the
classical elements of good research design in the social sciences and edu-
cation. We do this because designing causal research requires us to pay
attention to the central tenets of all good research. Then, within this larger
domain, causal research must satisfy an additional set of constraints, and
it is these that form the central topic for the rest of our book. We used the
expression "strive for" in the title of this section because it is typically dif-
ficult to satisfy all of the conditions we describe. We use examples throughout
the book to clarify the consequences of not satisfying particular elements
of the classical description of effective research design. As you will learn,
violation of some of the tenets of appropriate design makes it impossible
to make a defensible causal inference about the consequences of an edu-
cational policy or intervention. Violation of other tenets does not threaten
the ability to make a causal inference, but does limit the ability to deter-
mine to whom the results of the study apply. We will return to these issues.
However, we begin by stating these elements of good research design.
First, in any high-quality research, whether it be purely descriptive or
able to support causal inference, it is critically important that it begin with
a clear statement of the research question that will drive the project and
the theory that will frame the effort. These two key elements ultimately
drive every aspect of the research design, as they provide the motivation
and the rationale for every design decision that you ultimately make. They
have also been the topics of our first two chapters and, as we have argued,
they are completely intertwined. As theories are refined, it becomes pos-
sible to pose more complex questions, and these, in their turn, inform
refinements of the theory. Light, Singer, and Willett (1990) referred to
this as the "wheel of science."
An explicit statement of the research question makes it possible to
define the population of interest clearly and unambiguously. This is critical
in any research. If we do not do it, we cannot build a suitable sampling
frame, nor can we know to whom we can generalize the findings of our
research. In addition, it pays to be explicit, rather than vague, about the
nature of the population of interest. For example, in studying the impact
of class size on children's reading skills, it might make sense to define the
population of interest to be "all children without special needs in first-
grade classrooms in urban public schools in the United States," rather
than just "children." Defining the population clearly enables readers who
have a particular concern, such as the impact of class size on the learning
of autistic children, to judge the relevance of our results to their concern.

Once we have defined the population of interest clearly, we must work
hard to sample representatively from that population. Thus, in an investi-
gation of the impact of class size on student achievement in the population
defined earlier, we need to decide whether it is feasible to obtain a simple
random sample of students from the population of first graders without
special needs attending urban public schools in the United States.
Alternatively, we might decide that we want to use a more complex sam-
pling plan, such as a multistage cluster sample of school districts, schools,
and grades. However we go about sampling, it is critical that the analytic
sample that we use in our research be fully representative of the popula-
tion. This ensures what methodologists call the external validity of the
research. This refers to the ability to generalize our findings credibly to a
known population of interest.
The next important step in any research project is to choose appropriate
measures of the key variables that are central to the research, and to
ensure their construct validity and reliability for the population under
investigation. We should use our knowledge of the research question and
its supporting theory to distinguish three important classes of variables:
(a) the outcome variable; (b) the principal question predictor, defined as
the variable that provides our research question; and (c) the covariates or
control predictors. These distinctions will recur consistently throughout
our account of causal research, as they do through the account of any
high-quality descriptive research project. In our hypothetical investiga-
tion of class size and academic achievement, for instance, we might decide
to focus on two specific academic outcomes, such as children's reading
and mathematics achievement. Our principal question predictor would
be a measure of class size. Covariates or control variables might include
student demographic characteristics and measures of teacher experience.
We would need to exercise care in determining just how we would mea-
sure each of these variables. For example, we would need to decide
whether we want to measure class size by the number of students enrolled
in a class on a particular day, or perhaps by the average of the number of
students enrolled on several prespecified dates. We would also want to be
sure to measure each student's reading and mathematics achievement
using age-appropriate normed and suitably scaled tests. Our decisions
should be guided by our research question, our theoretical framework,
and the background literature in which they are embedded.
At this point, we want to point out explicitly the one and only distinc-
tion between descriptive and causal research. It concerns the principal
question predictor that forms the centerpiece of the research design.
The critical question for causal research is how the values of the question
predictor are determined for each of the participants in the sample.
In our class-size example, if the actions of children, teachers, parents, or
school administrators determine the size of the class into which each child
is placed, all manner of unobserved forces and choices would undermine
our ability to make inferences about the causal impact of class size on
children's achievement. On the other hand, if we were to randomly assign
children and teachers to classes of different sizes, thereby determining
their values on the principal question predictor, we may be able to credi-
bly estimate the causal impact of class size on the achievement of children
in the population from which the analytic sample was drawn. The differ-
ence is simply in the way that the values of the question predictor, class
size, have been determined for each child in the analytic sample and for
their teachers. This single issue and its consequences for design, data
analysis, and interpretation distinguish credible causal research from all
other research. It is the central concern of the rest of our book.
One final step is to ensure that the research is replicated in other samples
drawn from the same population. This is important because of the uncer-
tainty that exists in measurement and is built into the probabilistic nature
of statistical inference. We will devote considerable attention in this book
to describing how different kinds of statistical errors can influence the
findings from statistical analysis.

Making Causal Inferences

In their excellent book on the design of social science research, Shadish,
Cook, and Campbell (2002, p. 6) cite 19th-century philosopher John Stuart
Mill's description of three critical conditions that must be met in order to
claim that one thing causes another. The first condition is that the hypoth-
esized cause must precede its anticipated effect in time. For example, in
investigating whether student achievement depends upon the number of
students in the class, it is important to ensure that students had been taught
in class settings of a particular size before their achievement was measured.
The second of Mill's conditions is that if the levels of the cause differ in
some systematic way, then there must be corresponding variation in the
effect. For example, if our theory suggests that children taught in classes
with fewer students achieved at higher levels, we would anticipate that as
the number of students in classes got smaller, the students' achievement
would be higher, on average.
The third of Mill's conditions is by far the most important and the most
difficult to satisfy in practice. It stipulates that the researcher must be able
to discount all other plausible explanations-other than the anticipated
causal one-for the link observed between the hypothetical cause and effect.
In the case of an investigation of the impact of class size on student
achievement, we must be able to argue compellingly that any observed
association between class sizes and subsequent student achievement is
not a consequence, for example, of choices that parents may have made
about where to send their children to school or decisions by school admin-
istrators to assign students with particular characteristics to classes of
particular sizes.
The most persuasive way to conduct research that satisfies Mill's three
conditions-and thereby successfully address causal questions-is for the
researcher to conduct an experiment. Following Shadish, Cook, and
Campbell (2002, p. 511), we define an experiment as an empirical investiga-
tion in which the levels of a potential cause are manipulated by an outside
agent functioning independently of the participants in the research, and
after which the consequences for an important outcome are measured.
Furthermore, as illustrated in Figure 3.1, we distinguish between two
kinds of experiments: randomized experiments and quasi-experiments.

Figure 3.1 Two kinds of experiments. (The diagram shows the set of all
experiments divided into randomized experiments and quasi-experiments.)
The most compelling evidence for making causal attributions typically
comes from randomized experiments, defined as experiments in which
units are assigned to experimental conditions by a random process, such
as the toss of a fair coin (Shadish, Cook, & Campbell, 2002, p. 12). Notice
that well-executed randomized experiments satisfy Mill's three conditions
for making causal inferences: (a) cause precedes effect, (b) different levels
of cause can lead to different levels of effect, and (c) random assignment
obviates all other plausible explanations for any differences in effect
detected. In fact, the random assignment of students and teachers to
classes of different sizes by an independent investigator ensures that the
children and teachers who were in the different class-size "treatments" are
equal on all characteristics-on average-before the experiment begins.
Because of randomization, any small and idiosyncratic differences that
exist among the groups prior to treatment will fall within the noise that is
accounted for naturally by statistical methods used to analyze the resulting
outcome data. As we describe more fully in Chapter 4, when individuals
are assigned by randomization to different experimental conditions, we
say that the groups so-formed are equal in expectation.
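
To make "equal in expectation" concrete, consider the following minimal
simulation, written as a sketch in Python. The sample size, outcome scale,
and variable names are our own illustrative choices, not drawn from any
actual experiment:

    import numpy as np

    rng = np.random.default_rng(seed=42)

    # A pre-treatment characteristic (e.g., prior achievement) that the
    # assignment process never consults; the distribution is illustrative.
    n_students = 10_000
    prior_achievement = rng.normal(loc=500, scale=100, size=n_students)

    # Assign each student to a "small" or "regular" class by a fair coin toss.
    assignment = rng.choice(["small", "regular"], size=n_students)

    # Because assignment ignores every student characteristic, the groups
    # are equal in expectation; any observed difference is chance noise.
    for group in ("small", "regular"):
        group_mean = prior_achievement[assignment == group].mean()
        print(f"{group:>8}: mean prior achievement = {group_mean:.1f}")

Running such a simulation repeatedly shows the two group means hovering
around the same population value, with the gap shrinking as the sample
grows. That is exactly the sense in which randomization equates groups
"on average" rather than exactly.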
Quasi-experiments are experiments in which units are not assigned to
conditions randomly (Shadish, Cook, & Campbell, 2002, p. 12). It is some-
times possible to make legitimate causal inferences using data from
quasi-experiments. Indeed, we devote several chapters of this book to
methods for doing so. However, as we illustrate with many examples,
researchers need to be prepared to deal with a variety of threats to the inter-
nal validity of research based on data from quasi-experiments. As we
discuss in more detail in Chapter 4, this term refers to threats to the validity
of a statement that the relationship between two variables is causal
(Shadish, Cook, & Campbell, 2002, pp. 53-61).
Although the interpretation of the evidence from any experiment will
depend on the details of the particular case, we want to emphasize one
general point. Randomized experiments and quasi-experiments typically
provide estimates of the total effect of a policy intervention on one or
more outcomes, not the effects of the intervention holding constant the
levels of other inputs (Todd & Wolpin, 2003). This matters, because fami-
lies often respond to a policy intervention in a variety of ways, and the
experiment provides evidence about the net impact of all of the responses
on measured outcomes. For example, we will describe several experiments
in which parents were offered scholarships to help pay for the education
of a particular child at a private school. A common outcome in such
experiments is a measure of the cognitive skills of children at a later point
in time. One response to the policy is to increase the probability that par-
ents send to a private school the child offered a scholarship. However,
another response may be that the parents reduce the amount of money
that they spend on providing tutoring and enrichment activities for that
child in order to free up resources to devote to other children. The exper-
iment provides evidence about the net impact of these two responses (as
well as any others). It does not provide an estimate of the impact of the
scholarship offer on children's subsequent achievement, holding constant
the level of parental resources devoted to tutoring and enrichment.

Past Approaches to Answering Causal Questions in Education

Unfortunately, until fairly recently, most educational researchers did not
address their causal questions by conducting randomized experiments or
by adopting creative approaches to analyzing data from quasi-experiments.
Instead, they typically conducted observational studies, defined as analyz-
ing data from settings in which the values of all variables-including those
describing participation in different potential "treatments"-are observed
rather than assigned by an external agent (Shadish, Cook, & Campbell,
2002, p. 510). For instance, hundreds of observational studies have been
conducted on the association between class size and academic achieve-
ment using achievement data collected from students during the normal
operation of a school district. In these settings, the number of students in
various classes differs as a result of demographic patterns, the decisions
of parents about where to live, and the decisions of school administrators
about placement of students into classes.
In observational studies, the skills and motivations of students in small
classes may differ from those in larger classes, irrespective of any impact
that class size itself may have had ultimately on their achievement. This
could be the result of a variety of mechanisms. For example, families with
the resources to invest in their children's education may purchase or rent
homes in the attendance zones of schools with reputations for having
small classes. As a result, the average achievement of students in the
schools with relatively small classes may be higher than that in schools
with larger classes, even if class size did not have a causal effect on student
achievement. The reason could be that those parents who chose to live
near schools with small classes used their resources to provide their
children with an educationally enriched environment at home. This is an
example of what methodologists would call an endogenous assignment of
participants to treatments. By this we mean that assignment to levels of
the treatment is a result of actions by participants within the system being
investigated-in this case, the decisions of parents with resources to take
advantage of the relatively small classes offered in particular schools.
Of course, well-trained quantitative researchers recognized that, as a
result of the decisions of parents and school administrators, students
placed endogenously in classes of different sizes may differ from each
other in respects that are difficult to observe and measure. For many
years, researchers responded to this dilemma in one of two ways. One
common response was to include increasingly larger and richer sets of
covariates describing the students and their families in the statistical
models that were used to estimate the effect of treatment on outcome.
The hope was that the presence of these control predictors would account
for differences in the outcome that were due to all of the unobserved-
and endogenously generated-differences among students in classes of
different size. Sociologists Stephen Morgan and Christopher Winship (2007,
p. 10) refer to the period in which researchers relied on this strategy as

"the age of regression." Seminal studies published in the 1980s threw cold
water on this "control for everything" strategy by demonstrating that
regression analyses that contained a very rich set of covariates did not
reproduce consistently the results of experiments in which individuals
were assigned randomly to different experimental conditions.²

2. See Angrist & Pischke (2009, pp. 86-91) for a discussion of this evidence.
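
To see what the "age of regression" strategy looked like in practice, here
is a hedged sketch in Python. The file name and column names are
hypothetical placeholders, and the model is representative of the genre
rather than of any particular published study:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical observational data, one row per student. 'class_size'
    # is the question predictor; the remaining covariates are the controls
    # researchers hoped would absorb endogenous differences among students.
    df = pd.read_csv("observational_students.csv")  # hypothetical file

    model = smf.ols(
        "reading_score ~ class_size + female + age + family_income"
        " + teacher_experience",
        data=df,
    ).fit()
    print(model.summary())

The coefficient on class_size in such a model carries a causal
interpretation only if no unobserved determinant of the outcome is
correlated with class size, and it was precisely this assumption that the
experimental comparisons of the 1980s called into question.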
A second response, especially common among developmental psychol-
ogists, was to accept that analysis of observational data could not support
causal inference and to simply avoid using causal language in both the
framing of research questions and in the interpretation of research results.
For example, researchers would investigate whether children placed in
center-based child care had better subsequent performance on cognitive
tests than did observationally similar children in family-based child care,
and would simply caution that causal attribution was not justified on the
basis of their findings. In our view, there are at least two problems with
this approach. First, the cautions presented in the "Methods" and "Results"
sections of research papers were often forgotten in the "Discussion" sec-
tion, where researchers would suggest policy implications that depended
on an unsupported causal interpretation of their findings. Second, their
use of noncausal language meant that these researchers were not accus-
tomed to considering explicitly alternative explanations for the statistical
relationships they observed.
Fortunately, in more recent years, social scientists have developed a
variety of new research designs and analytic strategies that offer greater
promise for addressing causal questions about the impact of educational
policies. Many of these new approaches also make use of standard tech-
niques of multiple regression analysis, but apply them in new ways.
Explaining these strategies, and illustrating their use, is a central goal of
this book.

The Key Challenge of Causal Research

In conducting causal research in education and the social sciences, our
central objective is to determine how the outcomes for individuals who
receive a treatment differ from what the outcomes would have been in the
absence of the treatment. The condition to which the research subjects
would have been exposed in the absence of the experimental treatment is
called the counterfactual. From a theoretical standpoint, the way to obtain
an ideal counterfactual would be to use the same participants under both

a treatment (e.g., "small" class size) and a "control" (e.g., "normal" class
size) condition, resetting all internal and external conditions to their
identical initial values before participants experienced either condition.
So, you might draw a representative sample of participants from the pop-
ulation, administer the treatment to them, and measure their outcome
values afterward. Then, to learn what the outcomes would be under the
counterfactual condition, you would need to transport these same par-
ticipants back to a time before your research was conducted, erase
all their experiences of the treatment and the outcome measurement
from their memories, and measure their values of the outcome again,
after their lives had transpired under the control condition. If this were
possible, you could argue convincingly that any difference in each partici-
pant's outcome values between the two conditions must be due only to
their experiences of the treatment.
Then, because you possessed values of the outcome for each individual
obtained under both "factual" and "counterfactual" conditions, you would
be able to estimate the effect of the treatment for each participant. We
call this the individual treatment effect (ITE). You would do this simply by
subtracting the value of the outcome obtained under the counterfactual
condition from the value obtained under the treated condition. In this
imaginary world, you could then average these estimated ITEs across all
members of the sample to obtain the estimated average treatment effect
(ATE) for the entire group. Finally, with a statistical technique like a simple
paired t-test, you could seek to reject the null hypothesis that the popula-
tion mean difference in participants' outcomes between the treated and
counterfactual conditions was zero. On its rejection, you could use your
estimate of the ATE as an unbiased estimate of the causal effect of the
treatment in the population from which you had sampled the participants.
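
The logic of the ITE and the ATE is easy to express in code, even though
the data it requires could never be collected. Below is a minimal Python
sketch of this imaginary world in which both potential outcomes are
observed for every participant; the sample size, effect size, and outcome
scale are invented for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=7)
    n = 200

    # Imaginary world: we observe each participant's outcome under BOTH
    # the control (counterfactual) and the treated condition.
    y_control = rng.normal(loc=50, scale=10, size=n)
    y_treated = y_control + rng.normal(loc=3, scale=2, size=n)

    # Individual treatment effect (ITE): treated outcome minus the
    # outcome under the counterfactual condition.
    ite = y_treated - y_control

    # Average treatment effect (ATE): the mean of the ITEs.
    ate = ite.mean()

    # Paired t-test of the null hypothesis that the population mean
    # difference between the two conditions is zero.
    t_stat, p_value = stats.ttest_rel(y_treated, y_control)
    print(f"ATE = {ate:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")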
Since time travel and selective memory erasure lie in the realm of imag-
ination rather than research, in practice you always have a "missing data"
problem. As we illustrate in Figure 3.2, you never actually know the value
of the outcome for any individual under both the treatment and control
conditions. Instead, for members of the treatment group, you are missing
the value of the outcome under the control condition, and for members
of the control group, you are missing the value of the outcome under the
treatment condition. Consequently, you can no longer estimate the indi-
vidual treatment effects and average them up to obtain an estimate of the
average treatment effect.

                               ... the value of      ... the value of
                               the outcome in        the outcome in
                               the Treatment         the Control
                               Group is ...          Group is ...

For members of
the Treatment                     Known                  Missing
Group ...

For members of
the Control                       Missing                Known
Group ...

Figure 3.2 The challenge of the counterfactual.
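
The missing-data structure of Figure 3.2 can be made vivid in a few lines
of code. In the hypothetical five-person dataset below, NaN marks the
potential outcome that the design never lets us observe, so no individual
treatment effect can be computed:

    import numpy as np
    import pandas as pd

    # Each participant reveals only ONE potential outcome (Figure 3.2).
    data = pd.DataFrame({
        "group":     ["treatment", "treatment", "treatment",
                      "control", "control"],
        "y_treated": [62.0, 55.0, 58.0, np.nan, np.nan],
        "y_control": [np.nan, np.nan, np.nan, 51.0, 48.0],
    })

    # The ITE is undefined for every participant, so the ATE cannot be
    # obtained by averaging individual effects.
    data["ite"] = data["y_treated"] - data["y_control"]
    print(data)  # the 'ite' column is NaN in every row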
So, you must devise an alternative, practical strategy for estimating the
average treatment effect. The reason that this is so difficult to do in prac-
tice is that actors in the educational system typically care a lot about which
experimental units (whether they be students or teachers or schools) are
assigned to particular educational treatments, and they take actions to
try to influence these assignments. In other words, the assignment of
participants to treatments is typically endogenous in educational research.
A consequence of this is that, in an investigation of the impact of class size
on academic achievement, students assigned endogenously to differently
sized classes are likely to differ from each other, and not only on dimen-
sions that can be observed (such as gender, age, and socioeconomic status),
but also on dimensions that remain unobserved (such as intrinsic motiva-
tion and parental commitment, both of which are likely to be associated
with achievement outcomes).
One positive way to restate this point-and to satisfy Mill's third condi-
tion for making compelling causal inferences-is to insist that the
assignment of participants to treatments be exogenous rather than endoge-
nous. According to the Oxford English Dictionary, exogenous means "relating
to external causes," and is the natural opposite of endogenous, which means
"relating to an internal cause or origin." In the context of our book, these
words have similar, though more refined and specific meanings. When we
say that there is "exogenous variation" in the educational treatments that
students receive, we mean that the assignment of students to treatments
has not been determined by participants within the educational system-
that is, by the students, parents, teachers, or administrators-themselves.
Instead, their placement in a particular treatment condition has been
determined "externally" by the investigator or some other independent
agency.
Of course, you might argue that it is not good enough for assignment
to treatment condition to be simply exogenous. It is possible, for instance,
that even external agents may be biased or corrupt in their assignment of
participants to treatment conditions. Typically, though, when we say that
assignment to experimental conditions is exogenous, we are assuming
that the external agent has exercised his or her opportunity to assign par-
ticipants in a way that supports causal inference directly. One very simple
and useful way that such exogenous variation in experimental conditions
can be created is for the investigator to assign participants randomly to
treatments. Such an approach was taken in the Tennessee Student/
Teacher Achievement Ratio (STAR) experiment (Krueger, 1999).
In the mid-1980s, the Tennessee state legislature appropriated funding
for a randomized experiment to evaluate the causal impact of class-size
reduction on the reading and mathematics achievement of children in
the primary grades. More than 11,000 students and 1,300 teachers in
79 public schools throughout the state participated in the experiment,
which became known as Project STAR. In each participating school,
children entering kindergarten in the fall of 1985 were assigned randomly
by investigators to one of three types of classes: (a) a small class with 13 to
17 children, (b) a class of regular size with 22 to 25 students, or (c) a class
of regular size staffed by both a teacher and a full-time teacher's aide.
Teachers in each school were also assigned randomly to classrooms.
Finally, the research design called for students to remain in their origi-
nally designated class type through third grade.
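
The within-school randomization at the heart of STAR can be sketched in
a few lines of Python. The school and student counts below are
illustrative stand-ins, not the actual STAR enrollment figures, and the
assignment code is ours rather than the investigators':

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(seed=1985)

    class_types = ["small (13-17)", "regular (22-25)", "regular + aide"]

    # Three illustrative schools with 60 entering kindergartners each.
    students = pd.DataFrame({"school_id": np.repeat([1, 2, 3], 60)})

    # Randomize within each school, so every school contains all three
    # class types and comparisons are protected from school differences.
    assignments = []
    for school, grp in students.groupby("school_id"):
        labels = rng.permutation(np.repeat(class_types, len(grp) // 3))
        assignments.append(pd.Series(labels, index=grp.index))
    students["class_type"] = pd.concat(assignments)

    print(students.groupby(["school_id", "class_type"]).size())

Randomizing within schools, rather than across the whole sample at once,
mirrors the STAR design's requirement that each participating school
field all three class types at every grade.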
A major theme of our book is that some element of exogeneity in the
assignment of units to a treatment is necessary in order to make causal
inferences about the effects of that treatment. Expressed in the formal
terms used by statisticians and quantitative social scientists, a source of
exogenous assignment of units to treatments is necessary to identify the
causal impact of the treatment. So, when a social scientist asks what
identification strategy was used in a particular study, the question is about
the source of the exogeneity in the assignment of units to treatments.
In subsequent chapters, we show that randomization is not the only way
of obtaining useful exogenous variation in treatment status and conse-
quently of identifying the causal impact of a treatment. Sometimes, it is
possible to do so with data from a quasi-experiment. Sometimes, it is even
possible to do so with data from an observational study, using a statistical
method known as instrumental-variables estimation that we introduce in
Chapter 10.
The Tennessee STAR experiment, which the eminent Harvard statisti-
cian Frederick Mosteller called "one of the most important educational
investigations ever carried out" (Mosteller, 1995, p. 113), illustrates the
difficulties in satisfying all of the conditions for good research that we
described earlier in this chapter. After the Tennessee legislature autho-
rized the experiment in 1985, the State Commissioner of Education
invited all public school systems and elementary schools in the state to
apply to participate. Approximately 180 schools did so, 100 of which were
sufficiently large to satisfy the design criterion of having three classes at
each grade level from kindergarten through grade 3. The research team
then chose 79 schools to participate.
The process of selecting schools to participate in the STAR experiment
illustrates some of the compromises with best research practice that are
sometimes necessary in even extremely well-planned experiments. First,
the research sample of schools was chosen from the set of schools that
volunteered to participate. It is possible that the schools that volunteered
differed from those that did not in dimensions such as the quality of lead-
ership. Second, only quite large schools met the design requirements and
consequently the STAR experiment provided no evidence about the
impact of class size on student achievement in small schools. Third,
although the research team was careful to include in the research sample
urban, suburban, and rural schools, as the enabling legislation mandated,
it did not randomly select 79 schools from the population of 100 schools
that volunteered and met the size criteria (Folger, 1989). A consequence
of the sample selection process is that the definition of the population of
schools to which the results of the experiment could be generalized is not
completely clear. The most that can be said is that the results pertain to
large elementary schools in Tennessee that volunteered to participate in
the class-size experiment. It is important to understand that the lack of
clarity about the population from which the sample is taken is a matter of
external validity. The sampling strategy did not threaten the internal valid-
ity of the experiment because students and teachers within participating
schools were randomized to treatment conditions.
The STAR experiment also encountered challenges to internal validity.
Even though children in participating schools had originally been ran-
domly and exogenously assigned to classes of different sizes, some parents
were successful in switching their children from a regular-size class to
a small class at the start of the second school year. This endogenous
manipulation had the potential to violate the principal assumption that
underpinned the randomized experiment, namely, that the average
achievement of the students in regular-size classes provided a compelling
estimate of what the average achievement of the students placed in the
small classes would have been in the absence of the intervention. The
actions of these parents therefore posed a threat to the internal validity of
the causal inferences made from data collected in the STAR experiment
about the impact of a second year of placement in a small class.
This term, threat to internal validity, is important in the annals of causal
research and was one of four types of validity threats that Donald Campbell
(1957), a pioneer in developing methods for making causal inferences,
described more than a half century ago. As mentioned earlier, it refers to
rival explanations for the statistical relationships observed between
educational treatments and outcomes. If we can remove all threats to
internal validity, we have eliminated all alternative explanations for the
link between cause and effect, and satisfied Mill's third condition.
Devising strategies to respond to threats to internal validity is a critical
part of good social science. Of course, in quasi-experimental and observa-
tional research, ruling out all potential rival explanations for the
hypothesized link between "cause" and "effect" is extraordinarily difficult
to do. How do you know when you have enumerated and dismissed all
potential rival explanations? The short answer is that you never do know
with certainty (although, of course, with each rival explanation that you
do succeed in ruling out explicitly, the stronger is your case for claiming
a causal link between treatment and outcome, even in quasi-experimental
and observational research). As we explain in the next chapter, one of the
great advantages of the classic randomized experimental design, in which
a sample of participants is assigned randomly to different treatments, is
that this process eliminates all alternative explanations for any relation-
ship between class size and student achievement. But, even in randomized
experiments, things can go wrong, and you may have to provide evidence
for the internal validity of your work. In Chapter 5, we describe some of
the problems that can crop up in randomized experiments and how
skilled researchers have dealt with them.
Perhaps the most important lesson to take away from this chapter is
that the active behaviors of the participants in the educational system-
teachers, administrators, parents, and students-have enormous impacts
on the quality of the education provided in particular schools and class-
rooms. These active behaviors often make it very difficult to conduct
internally valid evaluations of the impacts of educational initiatives,
whether they involve the placement of students in smaller classes, the use
of new curricula and instructional methods, the installation of new ways
to prepare teachers, or the creation of new governance structures. In the
chapters that follow, we show how new sources of data, new approaches
to research design, and new data-analytic methods have improved our
ability to conduct internally valid studies of the causal impact of educa-
tional initiatives on student outcomes. We will make use of the terms
introduced in this chapter, including randomized experiment, quasi-experiment,
observational study, exogenous, endogenous, and threats to internal and external
validity. By the time you have finished reading our book, these terms will
be old friends.

What to Read Next

For readers who wish to follow up on the ideas we have raised in this chapter,
we recommend Shadish, Cook, and Campbell's comprehensive book (2002)
on the design of research, Experimental and Quasi-Experimental Designs
for Generalized Causal Inference,
and Morgan and Winship's insightful book (2007), Counterfactuals and
Causal Inference.
