12.3 The logic of experimental design
Learning Objectives
- Apply the criteria of causality to experimental design
- Define internal validity and external validity
- Identify threats to validity
As we discussed at the beginning of this chapter, experimental design is commonly understood and implemented informally in everyday life. Trying out a new restaurant, dating a new person—we often term these experiments. As you’ve learned over the past two sections, in order for something to be a true experiment, or even a quasi- or pre-experiment, you must rigorously apply the various components of experimental design. A true experiment for trying a new restaurant would include recruitment of a large enough sample, random assignment to control and experimental groups, pretesting and posttesting, as well as using clearly and objectively defined measures of satisfaction with the restaurant.
Social scientists use this level of rigor and control because they are trying to maximize the internal validity of their experiment. Internal validity is the confidence researchers have about whether their intervention produced variation in their dependent variable. In other words, experiments attempt to establish causality between two variables: your treatment and its intended outcome. As we discussed in Chapter 7, nomothetic causal relationships must meet four criteria: covariation, plausibility, temporality, and nonspuriousness.
The logic and rigor of experimental design allow researchers to establish causal relationships. Experimenters can assess covariation on the dependent variable through pre- and posttests. The use of experimental and control conditions ensures that some people receive the intervention and others do not, providing variation in the independent variable. Moreover, since the researcher controls when the intervention is administered, she can be assured that changes in the independent variable (the treatment) happened before changes in the dependent variable (the outcome). In this way, experiments establish temporality. In our restaurant experiment, we would know through assignment to experimental and control groups that people varied in the restaurant they attended. We would also know whether their level of satisfaction changed, as measured by the pre- and posttest. Finally, because satisfaction was measured both before and after the meal, we would know that changes in our diners’ satisfaction occurred after they visited the restaurant, not before they walked in.
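To make the covariation and temporality logic concrete, here is a minimal Python sketch that compares pretest-to-posttest change in an experimental group and a control group. The satisfaction scores and the 1-10 scale are hypothetical values invented for illustration, not data from any study, and a simple difference in mean change is only a descriptive summary, not a significance test.

```python
from statistics import mean

# Hypothetical (pretest, posttest) satisfaction scores on a 1-10 scale,
# invented for illustration only. One tuple per diner.
experimental = [(5, 8), (6, 8), (4, 7), (5, 7)]  # ate at the new restaurant
control = [(5, 5), (6, 7), (4, 4), (5, 6)]       # did not

def mean_change(group):
    """Average posttest-minus-pretest change for one group."""
    return mean(post - pre for pre, post in group)

exp_change = mean_change(experimental)
ctrl_change = mean_change(control)

# Temporality: each pretest was taken before the meal and each posttest after.
# Covariation: the treatment differs between groups, so we check whether the
# change in the outcome differs as well.
print(f"Experimental mean change: {exp_change:.2f}")
print(f"Control mean change:      {ctrl_change:.2f}")
print(f"Difference:               {exp_change - ctrl_change:.2f}")
```

With these made-up numbers, the experimental group improves by 2.5 points on average and the control group by 0.5, so the 2-point difference is the change associated with the treatment beyond whatever happened to diners anyway.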
Experimenters will also have a plausible reason why their intervention would cause changes in the dependent variable. Usually, a theory or previous empirical evidence should indicate the potential for a causal relationship. Perhaps a national poll found that the type of food our experimental restaurant served, let’s say pizza, is the most popular food in America. Perhaps this restaurant has good reviews on Yelp or Google. This evidence would give us a plausible reason to expect our restaurant to cause satisfaction.
One of the most important features of experiments is that they allow researchers to eliminate spurious variables. True experiments are usually conducted under strictly controlled laboratory conditions. The intervention must be given in the same way to each person, with as few other variables as possible that might cause their posttest scores to change. In our restaurant example, this level of control might prove difficult. We cannot control how many people are waiting for a table, whether participants see someone famous there, or whether there is bad weather. Any of these factors might cause a diner to be less satisfied with their meal. These spurious variables may cause changes in satisfaction that have nothing to do with the restaurant itself, an important problem in real-world research. For this reason, experimenters use the laboratory environment to control as many aspects of the research process as possible. Researchers in large experiments often employ clinicians or other research staff to help them. Researchers train their staff members exhaustively, provide pre-scripted responses to common questions, and control the physical environment of the lab so each person who participates receives the exact same treatment.
Experimental researchers also document their procedures so that others can review how well they controlled for spurious variables. My favorite example of this concept is Bruce Alexander’s Rat Park (1981) experiments, because they spoke directly to my practice as a substance abuse and mental health social worker. [1] Much of the early research on addictive drugs, like heroin and cocaine, was conducted on animals other than humans, usually mice or rats. While this may seem strange, the biological systems of our mammalian relatives are similar enough to ours that researchers can draw causal inferences from animal studies to humans. It is certainly unethical to deliberately cause humans to become addicted to cocaine and measure them for weeks in a laboratory, but it is currently more ethically acceptable to do so with animals. Animal research has its own ethical review processes, similar to an IRB review.
The scientific consensus up until Alexander’s experiments was that cocaine and heroin were so addictive that rats, if offered the drugs, would consume them repeatedly until they perished. Researchers claimed this behavior explained how addiction worked in humans, but Alexander was not so sure. He knew rats were social animals, and the procedures used in previous experiments did not allow them to socialize. Instead, rats were kept isolated in small cages with only food, water, and metal walls. To Alexander, social isolation was a spurious variable, causing changes in addictive behavior that were not due to the drug itself. Alexander created an experiment of his own, in which rats were allowed to run freely in an interesting environment, socialize and mate with other rats, and, of course, drink from a solution that contained an addictive drug. In this environment, rats did not become hopelessly addicted to drugs. In fact, they had little interest in the substance.
To Alexander, the results of his experiment demonstrated that social isolation was more of a causal factor for addiction than the drug itself. This makes intuitive sense to me. If I were in a solitary confinement cell for most of my life, the escape of an addictive drug would seem more tempting than if I were in my natural environment with friends, family, and activities. One challenge with Alexander’s findings is that subsequent researchers have had mixed success replicating them (e.g., Petrie, 1996; Solinas, Thiriet, El Rawas, Lardeux, & Jaber, 2009). [2] Replication involves conducting another researcher’s experiment in the same manner and seeing if it produces the same results. If the causal relationship is real, it should occur in all (or at least most) replications of the experiment.
One of the defining features of experiments is that researchers report their procedures diligently, which makes replication easier. Recently, researchers at the Reproducibility Project have caused significant controversy in social science fields like psychology (Open Science Collaboration, 2015). [3] In one study, researchers attempted to reproduce the results of 100 experiments published in 2008 in three major psychology journals. What they found was shocking: the results of only 36% of the studies were reproducible. Despite coordinating closely with the original researchers, the Reproducibility Project found that nearly two-thirds of psychology experiments published in respected journals were not reproducible. The implications of the Reproducibility Project are staggering, and social scientists are coming up with new ways to ensure researchers do not cherry-pick data or change their hypotheses simply to get published.
Returning to Alexander’s Rat Park study, consider the implications of his experiment for a substance abuse professional such as myself. The conclusions he drew from his experiments on rats were meant to generalize to the population of people with substance use disorders with whom I worked. Experiments seek to establish external validity, which is the degree to which their conclusions generalize to larger populations and different situations. Alexander argues that his conclusions about addiction and social isolation help us understand why people living in deprived, isolated environments become addicted to drugs more often than those in more enriching environments. Similarly, earlier rat researchers argued that their results showed these drugs were instantly addictive, often to the point of death.
Neither study will match up perfectly with real life. In my practice, I met many individuals who may have fit into Alexander’s social isolation model, but social isolation in humans is complex. My clients lived in environments with other sociable humans, worked jobs, and had romantic relationships, so how isolated were they? On the other hand, many faced structural racism, poverty, trauma, and other challenges that may contribute to social isolation. Alexander’s work helped me understand part of my clients’ experiences, but the explanation was incomplete. The real world was much more complicated than the experimental conditions in Rat Park, just as humans are more complex than rats.
Social workers are especially attentive to how social context shapes social life. So, we are likely to point out a specific disadvantage of experiments: they are rather artificial. How often do real-world social interactions occur in the same way that they do in a lab? Experiments conducted in community settings may be less artificial, though their conditions are less easily controlled. This relationship demonstrates the tension between internal and external validity. The more tightly researchers control the environment to ensure internal validity, the less they can claim external validity, that is, that their results apply to different populations and circumstances. Correspondingly, researchers whose settings are just like the real world will be less able to ensure internal validity, as many factors could contaminate the research process. This is not to suggest that experimental research cannot have external validity, but experimental researchers must always be aware that external validity problems can occur and be forthcoming about this potential weakness when they report their findings.
Threats to validity
Internal validity and external validity are conceptually linked. Internal validity refers to the degree to which the intervention causes its intended outcomes, and external validity refers to how well that relationship applies to different groups and circumstances. There are a number of factors that may influence a study’s validity. You might consider these threats to all be spurious variables, as we discussed at the beginning of this section. Each threat proposes another factor that is changing the relationship between intervention and outcome. The threats introduce error and bias into the experiment.
Throughout this chapter, we reviewed the importance of experimental and control groups. These groups must be comparable in order for experimental design to work. Comparable groups are groups that are similar across factors important for the study. Researchers can help establish comparable groups by using probability sampling, random assignment, or matching techniques. Control or comparison groups provide a counterfactual—what would have happened to my experimental group had I not given them my intervention? Two very different groups would not allow you to answer that question. Intuitively, we all know that no two people are exactly the same. So, no groups are ever perfectly comparable. What’s important is ensuring groups are comparable along the variables relevant to the research project.
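Matching can be sketched in a few lines. The Python example below is a minimal illustration under our own assumptions: it sorts participants on one variable assumed to matter for the study (here a hypothetical baseline satisfaction score), pairs neighbors, and flips a coin within each pair. The field names `id` and `baseline` and the function name `matched_pair_assignment` are hypothetical; real studies often match on several variables at once.

```python
import random

def matched_pair_assignment(participants, key, seed=None):
    """Sort by a matching variable, pair neighbors, then randomly send one
    member of each pair to the experimental group and the other to control."""
    rng = random.Random(seed)
    ordered = sorted(participants, key=key)
    experimental, control = [], []
    # Step through the sorted list two at a time.
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # random order within the pair acts as a coin flip
        experimental.append(pair[0])
        control.append(pair[1])
    # With an odd number of participants, the last one has no match and is
    # left unassigned in this simple version.
    return experimental, control

# Hypothetical participants; "baseline" is an invented pretest score.
people = [
    {"id": "A", "baseline": 3}, {"id": "B", "baseline": 7},
    {"id": "C", "baseline": 4}, {"id": "D", "baseline": 8},
    {"id": "E", "baseline": 5}, {"id": "F", "baseline": 5},
]
exp_group, ctrl_group = matched_pair_assignment(
    people, key=lambda p: p["baseline"], seed=42
)
for e, c in zip(exp_group, ctrl_group):
    print(f"pair: {e['id']} (experimental) vs {c['id']} (control)")
```

Each pair contains two people with similar baseline scores, so the groups end up comparable on that variable, while the coin flip within pairs still keeps the researcher from choosing who gets the treatment.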
In our restaurant example, if one of my groups had far more vegetarians or people with gluten issues, it might influence how satisfied they were with my restaurant. My groups, in that case, would not be comparable. Researchers can also account for this by measuring other variables, like dietary preference, and controlling for their effects statistically after the data are collected. We discussed control variables like these in Chapter 7. Similarly, if I were to pick out people I thought would “really like” my restaurant and assign them to the experimental group, I would be introducing selection bias into my sample. This is another reason experimenters use random assignment: so that conscious and unconscious biases do not influence which group a participant is assigned to.
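Random assignment itself is a simple procedure. The minimal Python sketch below shuffles a list of hypothetical diners, splits it in half, and then checks balance on one relevant variable (the share of vegetarians in each group). The diner records, the one-in-four vegetarian rate, and the function names are invented for illustration.

```python
import random

def random_assignment(participants, seed=None):
    """Shuffle a copy of the participant list and split it in half, so each
    person has the same chance of ending up in either group."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def share(group, attribute):
    """Proportion of a group for whom a True/False attribute is True."""
    return sum(1 for p in group if p[attribute]) / len(group)

# Hypothetical diners: every fourth one is vegetarian (invented for illustration).
diners = [{"id": i, "vegetarian": i % 4 == 0} for i in range(40)]
experimental, control = random_assignment(diners, seed=1)

# A balance check: compare the groups on a variable relevant to the study.
print(f"Vegetarian share, experimental: {share(experimental, 'vegetarian'):.2f}")
print(f"Vegetarian share, control:      {share(control, 'vegetarian'):.2f}")
```

Random assignment does not guarantee identical groups; it makes large imbalances increasingly unlikely as samples grow. That is why researchers may still check balance and control for variables like dietary preference statistically.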
Experimenters themselves are often the source of threats to validity. They may choose measures that do not accurately capture what they intend to measure, or administer a measure in a way that biases participant responses in one direction or another. Researchers may, just by the very act of conducting an experiment, influence participants to perform differently. Experiments are different from participants’ normal routines. The novelty of a research environment or experimental treatment may lead participants to expect to feel different, independently of the actual intervention. You have likely heard of the placebo effect, in which a participant feels better despite having received no intervention at all.
Researchers may also introduce error by expecting participants in each group to behave differently. Researchers may expect the experimental group to feel better and may give off conscious or unconscious cues that influence participants’ outcomes. The control group may be expected to fare worse, and research staff could cue participants that they should feel worse than they otherwise would. For this reason, researchers often use double-blind designs, in which neither participants nor the research staff interacting with them know who is in the control or experimental group. Proper training and supervision are also necessary to account for these and other threats to validity. Without proper supervision, research staff administering the control group may try to equalize treatment or engage in a rivalry with research staff administering the experimental group (Engel & Schutt, 2016). [4]
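One common way to keep front-line staff blind is to have a coordinator who never meets participants hold the assignment key, while staff see only opaque codes. The Python sketch below illustrates that idea under assumptions of our own: the function name `blind_assignments`, the 8-character hex codes, and the two-role setup are hypothetical. In a drug or placebo trial, the treatments themselves would also need to look identical for participants to stay blind.

```python
import random
import secrets

def blind_assignments(participant_ids, seed=None):
    """Randomly split participants into two groups, then hide each group
    label behind an opaque code.

    Returns (key, staff_view):
      key        maps code -> group; held only by a coordinator who has no
                 contact with participants.
      staff_view maps participant id -> code; this is all front-line staff see.
    """
    rng = random.Random(seed)  # the seed controls assignment only, not the codes
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    key, staff_view = {}, {}
    for position, pid in enumerate(ids):
        group = "experimental" if position < half else "control"
        code = secrets.token_hex(4)  # 8 random hex characters
        while code in key:           # regenerate on the rare collision
            code = secrets.token_hex(4)
        key[code] = group
        staff_view[pid] = code
    return key, staff_view

key, staff_view = blind_assignments(range(1, 11), seed=7)
print(staff_view)  # staff see participant ids and codes, never group labels

# After data collection, the coordinator "unblinds" the data for analysis.
groups = {pid: key[code] for pid, code in staff_view.items()}
```

Because every participant gets a distinct code, staff cannot infer group membership by noticing which participants share a label.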
No matter how tightly the researcher controls the experiment, participants are humans and are therefore curious, problem-solving creatures. Participants who learn they are in the control group may react by trying to outperform the experimental group or by becoming demoralized. In either case, their outcomes would differ from what they would have been had they been unaware of their group assignment. Participants in the experimental group may begin to behave differently or share insights from the intervention with individuals in the control group. Whether through social learning or conversation, participants in the control group may receive parts of the intervention they were not supposed to receive. Experimenters, as a result, try to keep experimental and control groups as separate as possible. In a laboratory study, this is significantly easier, as the researchers control access and timing at the facility. In agency-based research, this problem is more complicated, especially if the intervention works well: participants in the experimental group may influence the control group by behaving differently and sharing what they have learned with their peers. Agency-based researchers may locate experimental and control conditions at separate offices with separate treatment staff to minimize interaction between their participants.
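Locating conditions at separate offices amounts to randomizing whole offices instead of individual clients, a design often called cluster randomization. The minimal Python sketch below assumes a hypothetical agency with four offices; the office names, client records, and the function name `assign_offices` are invented for illustration.

```python
import random

def assign_offices(participants, seed=None):
    """Randomize whole offices rather than individuals, so everyone who shares
    an office (and its treatment staff) receives the same condition."""
    rng = random.Random(seed)
    # Sort first so that a given seed always gives the same result,
    # regardless of the order in which offices appear.
    offices = sorted({p["office"] for p in participants})
    rng.shuffle(offices)
    half = len(offices) // 2
    condition_by_office = {
        office: ("experimental" if i < half else "control")
        for i, office in enumerate(offices)
    }
    return {p["id"]: condition_by_office[p["office"]] for p in participants}

# Hypothetical agency with four offices and twelve clients.
clients = [{"id": n, "office": f"office_{n % 4}"} for n in range(12)]
conditions = assign_offices(clients, seed=3)
for client in clients:
    print(client["id"], client["office"], conditions[client["id"]])
```

The trade-off is that fewer units are randomized, so cluster designs generally need more participants to detect the same effect, and the analysis should account for clients in the same office being more alike than clients in different offices.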
Key Takeaways
- Experimental design provides researchers with the ability to best establish causality between their variables.
- Experiments provide strong internal validity but may have trouble achieving external validity.
- Experimental designs should be reproducible by future researchers.
- Threats to validity come from both experimenter and participant reactivity.
Glossary
- Comparable groups- groups that are similar across factors important for the study
- Double-blind- when neither participants nor the researchers who interact with them know who is in the control or experimental group
- External validity- the degree to which experimental conclusions generalize to larger populations and different situations
- Internal validity- the confidence researchers have about whether their intervention produced variation in their dependent variable
- Placebo effect- when a participant feels better, despite having received no intervention at all
- Replication- conducting another researcher’s experiment in the same manner and seeing if it produces the same results
- Selection bias- when a researcher consciously or unconsciously influences assignment into experimental and control groups
Image attributions
One of Juno’s solar panels before illumination test by NASA/Jack Pfaller public domain
- Alexander, B. (2010). Addiction: The view from Rat Park. Retrieved from http://www.brucekalexander.com/articles-speeches/rat-park/148-addiction-the-view-from-rat-park
- Petrie, B. F. (1996). Environment is not the most important variable in determining oral morphine consumption in Wistar rats. Psychological Reports, 78(2), 391-400; Solinas, M., Thiriet, N., El Rawas, R., Lardeux, V., & Jaber, M. (2009). Environmental enrichment during early stages of life reduces the behavioral, neurochemical, and molecular effects of cocaine. Neuropsychopharmacology, 34(5), 1102.
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
- Engel, R. J., & Schutt, R. K. (2016). The practice of research in social work (4th ed.). Washington, DC: SAGE Publishing.