
RP 7

The study investigates the phenomenon of retrieval-induced forgetting, where the act of recalling certain items can impair the recall of related, unpracticed items. Three experiments demonstrate that this forgetting persists even after controlling for output interference and primarily affects high-frequency items. The findings suggest that the retrieval process itself plays a significant role in long-term memory accessibility and forgetting, challenging traditional views on memory retrieval dynamics.

Journal of Experimental Psychology: Learning, Memory, and Cognition
1994, Vol. 20, No. 5, 1063-1087
Copyright 1994 by the American Psychological Association, Inc. 0278-7393/94/$3.00

Remembering Can Cause Forgetting:
Retrieval Dynamics in Long-Term Memory
Michael C. Anderson, Robert A. Bjork, and Elizabeth L. Bjork

Three studies show that the retrieval process itself causes long-lasting forgetting. Ss studied 8
categories (e.g., Fruit). Half the members of half the categories were then repeatedly practiced
through retrieval tests (e.g., Fruit Or___). Category-cued recall of unpracticed members of
practiced categories was impaired on a delayed test. Experiments 2 and 3 identified 2 significant
features of this retrieval-induced forgetting: The impairment remains when output interference is
controlled, suggesting a retrieval-based suppression that endures for 20 min or more, and the
impairment appears restricted to high-frequency members. Low-frequency members show little
impairment, even in the presence of strong, practiced competitors that might be expected to block
access to those items. These findings suggest a critical role for suppression in models of retrieval
inhibition and implicate the retrieval process itself in everyday forgetting.

Michael C. Anderson, Robert A. Bjork, and Elizabeth L. Bjork, Department of Psychology, University of California, Los Angeles.

The research reported herein was supported in part by Grant 4-564040-RB-19900 to Robert A. Bjork and Grant 4-564040-EB-19900 to Elizabeth L. Bjork from the Committee on Research, University of California, Los Angeles, and by Grant MDA 903-89-K-0179 to Keith Holyoak from the Army Research Institute. The article appears on University Microfilms as part of a dissertation submitted to the University of California, Los Angeles, in fulfillment of the degree of PhD for Michael C. Anderson.

We gratefully acknowledge the assistance of Myra Jimenez, Steven Machado, and Shirley Yu in the collection of data and of Catherine Fritz, Dina Ghodsian, Keith Holyoak, Keith Horton, John Shaw, Bobbie Spellman, and Tom Wickens for comments on drafts of this article. We also thank Todd Gross, Steven Machado, Anthony Wagner, and especially Bobbie Spellman for many thoughtful conversations on the topic of retrieval inhibition.

Correspondence concerning this article should be addressed to Michael C. Anderson, Department of Psychology, University of California, 405 Hilgard Avenue, Los Angeles, California 90024-1563.

A striking implication of current memory theory is that the very act of remembering may cause forgetting. It is not that the remembered item itself becomes more susceptible to forgetting; in fact, recalling an item increases the likelihood that it will be recallable again at a later time. Rather, it is other items (items that are associated to the same cue or cues guiding retrieval) that may be put in greater jeopardy of being forgotten. Impaired recall of such related items may arise if access to them is blocked by the newly acquired strength of their successfully retrieved competitors (Blaxton & Neely, 1983; Brown, 1981; Brown, Whiteman, Cattoi, & Bradley, 1985; Roediger, 1974, 1978; Roediger & Schmidt, 1980; Rundus, 1973).

This implication follows from three assumptions underlying what we herein refer to as strength-dependent competition models of interference: (a) the competition assumption, that memories associated to a common cue compete for access to conscious recall when that cue is presented; (b) the strength-dependence assumption, that the cued recall of an item will decrease as a function of increases in the strengths of its competitors' associations to the cue; and (c) the retrieval-based learning assumption, that the act of retrieval is a learning event in the sense that it enhances subsequent recall of the retrieved item. Taken together, these assumptions imply that repeated retrieval of a given item will strengthen that item, causing loss of retrieval access to other related items. We refer to this possibility as retrieval-induced forgetting. In this article, we explore two questions regarding retrieval-induced forgetting, one empirical and the other theoretical: (a) Is retrieval-induced forgetting a significant factor producing fluctuations in the long-term accessibility of knowledge? and (b) To what extent do such effects support the strength-dependence assumption? We believe that exploring these questions may help solve the puzzle of why so little of the knowledge available in long-term memory remains consistently accessible.

Many studies illustrate that prior retrievals can make subsequent retrieval of related information more difficult, at least within the context of a single testing session. For example, in the domain of episodic memory, the study of output interference has shown that an item's recall probability declines linearly as a function of its serial position in a testing sequence. This decline has been demonstrated with recall of paired associates (Arbuckle, 1966; Roediger & Schmidt, 1980; Tulving & Arbuckle, 1963, 1966) and categorized word lists (Dong, 1972; Roediger, 1973; Roediger & Schmidt, 1980; Smith, 1971, 1973; Smith, D'Agostino, & Reid, 1970); it occurs regardless of a category's serial position in the learning list (Smith, 1973), and it does not result from the loss of items from primary memory over time (Smith, 1971). In semantic memory, speeded generation of several category exemplars on the basis of letter cues (e.g., Fruit A___) slows generation of later exemplars and increases the number of generation failures (Blaxton & Neely, 1983; Brown, 1981; Brown et al., 1985). These effects of output interference in both episodic and semantic memory violate expectations derived on the basis of semantic priming and spreading activation, according to which retrieval should facilitate recall of related knowledge, not impair it (Loftus, 1973; Loftus & Loftus, 1974; Neely, 1976; Warren, 1977). These effects show that retrieval-induced forgetting does occur, at least within a single testing session, which some
authors have taken as evidence that retrieval is a basic process underlying forgetting from long-term memory (Roediger, 1974).

Although these initial forays into retrieval-induced forgetting are suggestive, little work has been done to justify the assertion that retrieval plays a significant role in producing long-term fluctuations in accessibility. All studies of retrieval-induced forgetting have emphasized the decline in recall arising from retrievals occurring within a single test session. The extrapolation from these findings to long-lasting impairment hinges crucially on a theoretical interpretation of output interference in terms of strength-dependent competition, which is an interpretation that may not be warranted. For example, no evidence suggests that these effects reflect anything other than temporary suppression occurring within the brief span of an episodic or semantic recall task. However, if the strength-dependence interpretation is correct, such effects should not be restricted to a single output session: A single, effortful recall buried within the context of other thoughts and processes should cause forgetting of related memories on even remote occasions provided that retrieval-based learning endures. When we consider the ubiquity of retrieval in our daily cognitive experiences, retrieval-induced forgetting might be a pervasive source of long-lasting retrieval failures in long-term memory, an implication that starkly contrasts with the cursory weight given to retrieval processes in recent theoretical treatments of interference (e.g., Mensink & Raaijmakers, 1988). Thus, a major goal of the present work is to seek evidence for retrieval-induced forgetting that endures beyond the retrieval event during which it is induced.

The strength-dependence interpretation of retrieval-induced forgetting depends, of course, on the assumptions underlying strength-dependent competition. Although strength-dependent competition has a long history in interference theory (Anderson, 1976; McGeoch, 1936; Melton & Irwin, 1940; Mensink & Raaijmakers, 1988) and remains popular as a means of explaining a variety of phenomena (e.g., the increase in part-set cuing inhibition with the number of cues: Roediger, 1974; Rundus, 1973; the increase in retroactive interference with the degree of interpolated learning: Mensink & Raaijmakers, 1988; list-strength effects in free recall: Ratcliff, Clark, & Shiffrin, 1990; the exacerbation of the tip-of-the-tongue experience with recent presentation of similar words: Baddeley, 1982; Jones, 1989; Reason & Lucas, 1984; Woodworth, 1938), the empirical case for the strength-dependence assumption is not as clearly established as those for the retrieval-based learning assumption (e.g., Allen, Mahler, & Estes, 1969; Bjork, 1975; Gardiner, Craik, & Bleasdale, 1973; Hogan & Kintsch, 1971) and the competition assumption (see Watkins, 1978, for a review). When studies show that strengthening some information in memory impairs recall of other information, there is substantial disagreement on the theoretical interpretation of the impairment (regarding part-set cuing, see Basden, Basden, & Galloway, 1977; Sloman, Bower, & Roher, 1991; regarding retroactive interference, see Greeno, James, DaPolito, & Polson, 1978; Martin, 1971; Postman, Stark, & Fraser, 1968; Riefer & Batchelder, 1988; regarding the tip-of-the-tongue state, see Brown, 1991; Burke, MacKay, Worthley, & Wade, 1991).

More troubling, however, than any such theoretical disagreements are the various findings that strengthening can fail to produce impairment. These failures are illustrated vividly in studies by DaPolito (1966) and Blaxton and Neely (1983). DaPolito explored the amount of proactive interference suffered by a later studied associate to a cue (an A-C item) as a function of the number of presentations of an earlier studied associate to that cue (an A-B item). Although increasing the presentations of the A-B items from one to three increased recall for those items from 49% to 82%, recall of once-presented A-C items went from 30% to 32% (see Riefer & Batchelder, 1988, for detailed analysis of this study). In a different but related theoretical context, Blaxton and Neely (1983) demonstrated that prior presentation of several category exemplars for speeded naming actually facilitated generation of target exemplars from semantic memory. In both studies, strengthening of prior responses should have significantly impaired subsequent retrieval of related items but did not. If strengthening is not sufficient to cause impairment, retrieval-based learning may not cause long-lasting retrieval-induced forgetting.

Given the uncertain empirical status of the strength-dependence assumption, we thought it useful to treat the present work not only as an exploration of retrieval-induced forgetting but also as a test of the strength-dependence assumption itself. In the next section, we introduce a new paradigm for examining the impact of retrieval on the long-term accessibility of related information, and we contrast this method with previous procedures used to investigate strength-dependent competition. The new procedure improves on previous paradigms by unconfounding the strengthening operation from other logical phases of the experiment, a problem that has arguably generated many of the interpretational difficulties surrounding strength-dependent competition. Next, we develop predictions concerning the relative impairment expected for different stimulus materials on the basis of a general class of strength-dependent competition models: ratio-rule models. If impaired recall is observed with the new procedure, then retrieval-induced forgetting will be implicated as a significant factor in producing long-term retrieval failures. Furthermore, if the impairment follows the pattern expected on the basis of the ratio rule, then we will have obtained evidence for strength-dependent competition.

A Paradigm for Examining Retrieval-Induced Forgetting

In constructing a paradigm to explore retrieval-induced forgetting, we thought it important to consider both the logic of strength-dependent competition and the conditions under which retrieval-induced forgetting might be expected to occur naturally. Because strength-dependent competition among items is thought to occur with respect to a shared retrieval cue, we placed special emphasis on cue-target relationships in all phases of the paradigm. We also sought to minimize opportunities for the formation of item-to-item (as opposed to cue-to-item) associations, the presence of which could provide subjects with retrieval routes for circumventing strength-dependent competition. Because retrieval-induced forgetting may arise from retrieval-based learning that occurs long after initial
learning, we separated initial study and retrieval-based learning into distinct phases; we also included a substantial retention interval between retrieval-based learning and the final test to examine the long-term effects of retrieval.

These considerations led to our designing a retrieval-practice paradigm that consists of three phases: a study phase, a retrieval-practice phase, and a final test phase. In the study phase, subjects study a series of category-exemplar pairs, such as Fruit Orange, with a typical series consisting of six members of each of eight different categories. Because the exemplars of a given category share the category label as a retrieval cue, they should compete for access to conscious recall on later presentation of the category cue. After the study phase, subjects engage in directed retrieval practice on half of the items from half of the categories (e.g., three items from each of four categories). The retrieval practice of a given item is induced by presenting a category name together with an exemplar stem (e.g., Fruit Or___). Each exemplar test appears several times throughout the practice phase, interleaved with practice trials on other items to maximize the facilitatory effects of retrieval practice. After a substantial retention interval (e.g., 20 min), a final, surprise category cued-recall test is administered: Subjects are cued with each category name and asked to free recall any exemplars of that category that they remember having seen at any point in the experiment. If strengthening due to retrieval practice endures throughout the retention interval, the practiced exemplars in a given category should still create substantial competition for the unpracticed exemplars in that category on the delayed category cued-recall test. The impact of this competition can be assessed by contrasting the final recall of the unpracticed items from the practiced categories with the final recall of items from the unpracticed categories (i.e., those categories for which none of their exemplars had been given retrieval practice). If impairment is observed, we have evidence that retrieval-induced forgetting may contribute to long-lasting retrieval failures and that these failures may result from strength-dependent competition.

The separation of the retrieval-practice paradigm into three phases appears to have several advantages over other well-known procedures thought to provide evidence for strength-dependent competition. These features are highlighted in Figure 1, which contrasts the retrieval-practice paradigm with the retroactive-interference and part-set cuing procedures. These paradigms are represented according to their temporal organization into learning (L), strengthening (S), and final test (T) phases. (Distinct phases are depicted by boxes; contiguous boxes indicate logically distinct, but co-occurring, phases.) In the retroactive-interference paradigm, subjects learn a second list of associates to the same stimuli (L2), and these associates are strengthened by repeated study-test trials (S); this strengthening of second-list associates is thought to impair recall of earlier responses from the first list (L1) on a subsequent test (T) relative to a baseline condition in which subjects never learned the second list (L2). In the part-set cuing paradigm, several exemplars from an earlier studied categorized word list (containing exemplars L1 ... LN) are presented as cues at test (T), presumably strengthening (S) those cues; this strengthening of the cue exemplars is thought to impair recall of the remaining noncue exemplars relative to a baseline condition in which subjects receive no cues. The retrieval-practice paradigm, as described above, is depicted in the right column of Figure 1.

[Figure 1 columns, left to right: Retroactive Interference, Part-set Cuing, Retrieval Practice]

Figure 1. The temporal organization of retroactive interference, part-set cuing, and retrieval-practice paradigms into discrete phases. Boxes denote distinct experimental phases; contiguous boxes denote logically distinct but simultaneous phases; arrows indicate the flow of time. The letters L, S, and T designate learning, strengthening, and testing of items, respectively. Note that the strengthening operation is confounded with different phases for all paradigms except the retrieval-practice paradigm. Note also that the retroactive interference paradigm divides the learning of the two competitors (L1, L2) per stimulus into distinct contexts, whereas all items are learned in the same context for other paradigms.

That strengthening does not occur in a distinct phase in the retroactive-interference and part-set cuing paradigms complicates interpreting the effects of that strengthening. The retroactive-interference procedure confounds strengthening of L2 competitors with the acquisition of the new temporal context (List 2) in which those competitors are learned, confusing the relative contributions of strength-dependent competition and response-set suppression to the impaired recall of L1 associates (Postman et al., 1968); in the retrieval-practice procedure, on the other hand, any response-set suppression on the learning list caused by the retrieval-practice phase should be equated across practiced categories and the within-subjects baseline (i.e., those categories that remain unpracticed; see Delprato, 1972, for a similar approach). The part-set cuing paradigm confounds strengthening of competitors with presentation of those items as retrieval cues on the final test,
obscuring the relative effects of strength-dependent competition and those deriving from the role of strengthened items as retrieval cues (Basden et al., 1977; see also Raaijmakers & Shiffrin, 1981; Sloman et al., 1991); in the retrieval-practice procedure, a long interval separates retrieval-based strengthening from the final test, and no items are presented as cues, eliminating the psychological context of cuing. To the extent that confounding the various factors described above with strengthening compromises the measure of strength-dependent competition in the retroactive-interference and part-set cuing paradigms, the retrieval-practice paradigm may provide a better means of testing strength-dependent competition.

Testing Strength-Dependent Competition Models of Retrieval

Because our paradigm seemed to have certain advantages as a means of testing strength-dependent competition, we took our exploration of retrieval-induced forgetting as an opportunity to evaluate strength-dependent competition more systematically. Because ratio-rule formulations of retrieval are the most widely applied and best articulated strength-dependent models (e.g., Anderson, 1976; Gillund & Shiffrin, 1984; Mensink & Raaijmakers, 1988; Raaijmakers & Shiffrin, 1981; Rundus, 1973), we used a simple ratio-rule model to develop predictions of the relative amount of impairment to be expected across materials differing in their strength of association to a cue.

In the present studies, we manipulated the taxonomic frequency of exemplars in a category. In Experiments 1 and 2, to test an implication of the basic ratio-rule equation, we contrasted categories consisting entirely of strong exemplars with categories consisting entirely of weak exemplars. For a broad range of learning-rate assumptions, ratio-rule models predict that retrieval-based strengthening should impair weak exemplar categories to a proportionally greater extent than strong exemplar categories (see Appendix A for a numerical example). Qualitatively, the reason for this prediction is straightforward. The ratio-rule model asserts that the probability of retrieving an item is a function of the strength of the association of that item to the retrieval cue, relative to the strength of association of all other memory items to that cue. This relation can be expressed as a simple recall probability ratio, as in the following example: P(recall Orange given the cue Fruit) = Strength of the Fruit-Orange association/sum of strengths for all Fruit associates. When other items, such as Banana, are strengthened through retrieval practice, the denominator in the equation for Orange increases, decreasing its recall probability ratio. Because retrieval practice will increase the associative strength of a weaker item to a proportionally greater extent (see Appendix A), proportional impairment of its competitors will also be greater. If retrieval-induced forgetting manifests this pattern of impairment across strong- and weak-exemplar categories, specific evidence in favor of ratio-rule formulations of strength-dependent competition will have been obtained; if it does not, the ratio rule, and perhaps strength-dependent competition in general, may be inadequate as an account of retrieval-induced forgetting.

Experiment 1

In Experiment 1, we used the retrieval-practice paradigm to determine whether retrieval-based learning causes long-lasting memory failures. In the initial study phase, subjects studied 8 six-item categories. Four of these categories were composed of strong exemplars (e.g., Fruit Orange), and four were composed of weak exemplars (e.g., Tree Hickory). After the study phase, three exemplars from two strong and two weak categories received retrieval practice (e.g., Fruit Or___) three times each. The three retrievals for each item, interleaved with tests of other items, were ordered to produce an expanding sequence of intertest intervals for each item to maximize the consequences of retrieval practice (see Landauer & Bjork, 1978). After a 20-min retention interval, a final unexpected category cued-recall test was administered: Subjects were cued with each category name and asked to free recall any members of that category they could remember having been presented at any point in the experiment.

To describe our predictions (for each of the experiments we report) more concisely and to simplify discussions throughout this article, we have labeled the different types of categories and items that occur in the retrieval-practice paradigm as follows: Categories for which some of their members receive retrieval practice are labeled Rp categories (i.e., retrieval practice categories); categories for which no members receive any retrieval practice are labeled Nrp categories (i.e., no retrieval practice categories). The items within an Rp category that actually receive retrieval practice are labeled Rp+ items (i.e., Rp category, practiced items); items within an Rp category that do not receive retrieval practice are labeled Rp- items (i.e., Rp category, unpracticed items); and, finally, items within an Nrp category, none of which, of course, receive any retrieval practice, are simply labeled Nrp items. If retrieval-induced forgetting produces long-lasting retrieval failures, retrieval practice of Rp+ items should impair later recall of Rp- items (relative to recall observed for the Nrp baseline), even though retrieval-based learning occurred in a context separated from the final test by 20 min. If impaired recall of Rp- items is caused by strength-dependent competition from Rp+ items, the impairment of weak Rp- items should be proportionally greater than the impairment of strong Rp- items.

Method

Subjects

The subjects were 36 introductory psychology students from the University of California, Los Angeles, whose participation partially fulfilled a course requirement.

Design

Two factors, retrieval-practice status and category composition, were manipulated within subjects. Retrieval-practice status had three levels: (a) Rp+ items, which were practiced three times each by means of an expanding schedule of category-plus-stem cued-recall tests (e.g., Fruit Or___) during the retrieval practice phase; (b) Rp- items, which were not practiced, but were members of the same category as the Rp+ items, and (c) Nrp items, which received no additional
retrieval practice and were not members of a practiced category. Nrp items, which were divided into two subgroups of three (called Nrpa and Nrpb) for counterbalancing purposes, served as a baseline against which to measure the positive effects of practice in the case of Rp+ items, and the hypothesized negative effects of practice on Rp- items. Category composition had two levels: Strong categories, which contained exemplars whose taxonomic frequency had an average rank order of 8 (Battig & Montague, 1969); and weak categories, which contained exemplars with an average rank order of 33. The dependent measure was the proportion of each type of item recalled on a final category cued-recall test.

Procedure

The experiment was conducted in four phases: a learning, a practice, a distractor, and a surprise category cued-recall phase. In the learning phase, subjects were randomly assigned to one of two random orders of the learning materials. Each subject was given a learning booklet, face down, as well as an instruction page, which they followed as the experimenter read the instructions aloud. Subjects were told that (a) they were participating in an experiment on memory and reasoning, (b) they would be given 5 s to study category-exemplar pairs and should spend all of this time relating the exemplar to its category, (c) after each 5 s passed, a voice on a tape recording would signal them to turn the page, and (d) the sequence was to be repeated until all pairs in the learning booklet had been presented. On completion of the instructions, subjects were told to turn their booklets over and begin studying.

Booklets and instructions were collected as soon as the learning phase was completed. Subjects were then randomly assigned to one of four practice counterbalancing conditions and to one of three retrieval-practice orders for that counterbalancing condition. Subjects received a booklet face down and a new instruction page, which they followed as the experimenter read it aloud. Subjects were told that (a) each page would contain one of the category labels that they had received in the previous phase along with a hint about what exemplar they were to retrieve; (b) the hint consisted of the first two letters of the appropriate exemplar; and (c) they were to retrieve an item that they had seen, rather than responding with any exemplar that fit the letter cues. Subjects then turned their booklets over and began the test: They were given 10 s to recall each cued exemplar, and a tape-recorded voice instructed them when to turn pages. After the practice phase, subjects participated in an unrelated causal reasoning experiment for 20 min.

In the testing phase, subjects were randomly assigned to one of three random testing orders of the categories. Booklets were distributed face down and the experimenter read instructions aloud. Subjects were told that, at the top of each page, there would be a name of one of the categories studied previously and that they should recall all exemplars of that category that they had been shown at any time in the experiment. Subjects were given 30 s for each category, and were then instructed to turn the page.

Materials

Category selection. Ten categories, two of which were used as fillers, were drawn from several published norms (Battig & Montague, 1969; Marshall & Cofer, 1970; Shapiro & Palermo, 1970). The 8 experimental categories were selected in the following manner. Relatively unrelated categories (i.e., dissimilar and nonassociated categories) were chosen to ensure that measures of category-recall perfor-

reinforced, using the Marshall and Cofer (1970) norms, by minimizing (a) the pairwise associations between category labels and (b) the interexemplar associations (after particular exemplars had been chosen). The phonemic similarities among the category labels were also minimized.

To reduce variations in stimulus complexity and associability, category labels were constrained to be semantically unambiguous and only one word in length (e.g., no categories such as Earth Formations were included). Finally, the word frequencies (Kucera & Francis, 1967) of category labels were kept in the low to moderate range, with all labels falling between 25 and 100 occurrences per million.

Exemplar selection. Once eight categories were found that met these constraints, particular exemplars were chosen for each one (see Appendix B). Four of the categories were randomly chosen to contain all strong exemplars and four to contain all weak exemplars. Exemplars in three of the strong categories had an average rank order of 8 (median = 7, i.e., average position in a list rank ordered by frequency of report), according to Battig and Montague (1969) category norms. Exemplars in the remaining strong category (Leather) were drawn from the Shapiro and Palermo (1970) norms and had an average rank order of 3.8. Exemplars in the four weak categories had an average rank order, according to Battig and Montague, of 33 (median = 23). Thus, there was a clear difference in the taxonomic frequency of exemplars in the strong versus the weak categories.

Exemplars were also constrained to be low-frequency, unambiguous, noncompound words. The average word frequency (Kucera & Francis, 1967) for all eight categories was 13 occurrences per million, SD = 3.8. No two exemplars began with the same first two letters, ensuring that each two-letter cue in the retrieval-practice task would be unique. In addition, to avoid interference of extraexperimental items, no chosen category exemplar had the same first two letters as an unchosen category exemplar that was listed in the Battig and Montague (1969) norms. For example, the word trumpet could not be chosen as a musical instrument because the word trombone might produce extraexperimental interference. Items with strong a priori item-to-item associations (e.g., cat and mouse as members of the set animals) were avoided.

Finally, two constraints were used to match the effectiveness of the first two letters of an exemplar as a retrieval cue for the retrieval practice task: versatility matching and syllable matching. The versatility (Solso & Juel, 1980) of a set of letters corresponds to the number of words containing those letters in the specified positions. For example, an estimate of the versatility of the letter combination BA in the first two positions of a word is 413 because there are approximately 413 words that begin with that combination of letters in the Kucera and Francis (1967) norms. Versatilities of the two-letter stems of exemplars were constrained to be at a moderate level of difficulty (M = 281, SD = 12) as measured by Solso and Juel. Finally, stems were constrained to provide less than one syllable of information. In ambiguous cases, we used Webster's New Collegiate Dictionary (1980) to determine where syllabic breaks occurred.

Learning booklets. Learning booklets were constructed from the 48 experimental and 12 filler items. The placement of these items in the learning booklet was designed to minimize interexemplar associations because such associations could provide secondary retrieval routes to unpracticed items in the practiced categories, offsetting the impairment caused by the competition for the primary retrieval cue. Two measures were taken to minimize interitem association among category members and to maximize attention to category-exemplar relationships. First, category-exemplar pairs were presented to subjects centered on individual pages in paired-associate format (e.g.,
mance were as independent as possible. Intercategory similarity and Fruit Orange). Second, rather than presenting all exemplars from a
association were first determined by the experimenters carefully given category at once, the order of exemplars within a booklet was
assessing the relatedness of the knowledge domains (e.g., If Fruit were determined by blocked randomization in which each block contained
to be used, Vegetable would not be selected); these judgments were one exemplar from each category, resulting in six blocks of 10 items
(each block containing 8 items from experimental categories and 2 items from filler categories). The ordering of exemplars within each block was determined randomly except that (a) in the first block, filler items appeared in the beginning to control for primacy effects; (b) in the last block, filler items appeared at the end to control for recency effects; and (c) throughout the booklet, no two categories appeared in sequence more than once. Two different learning booklets were constructed, in which both the ordering of categories within blocks and the list position of particular category items varied.

Retrieval-practice booklets. Each page of a retrieval-practice booklet contained one test of a single category exemplar. The category label appeared centered on the page with the first two letters of the exemplar printed two spaces to the right of it, followed by a solid line to indicate that the item was incomplete (e.g., Fruit Or____). The stem of the exemplar was provided to direct subjects to retrieve a particular item. The solid line was the same length for all items so that no cues for word length would be given.

To construct retrieval-practice booklets, we first defined an abstract ordering of exemplar tests using the following constraints. The first and last few items in all practice booklets were tests of filler items to acquaint subjects with the practice task and to control for primacy and recency effects on final recall. All experimental items were tested three times on an expanding schedule, with an average spacing of 3.5 trials between the first and second test and 6.5 trials between the second and third test. In general, no two category members were tested on adjacent pages, and the average test position of each category in the test booklet was kept constant. To the extent possible, we prevented particular sequences of category-exemplar tests from appearing consecutively more than once (as is prone to occur with systematic spacing manipulations) by inserting tests of filler items.

To control for specific-category effects, we counterbalanced which categories were practiced and which were not. The eight experimental categories were divided into two random sets of four (referred to as Set A and Set B), with the constraint that two strong and two weak categories appeared in each set. Half of the subjects performed retrieval practice on Set A and the other half of the subjects on Set B. To control for specific-exemplar effects, we further divided Set A and Set B into two random subsets (referred to as Subsets A1, A2, B1, and B2). For Subset A1, three exemplars were randomly selected from each of the four categories in A, with the remaining three exemplars constituting A2. Half of the subjects who practiced the Set A categories practiced A1 exemplars, and the remaining subjects practiced A2 exemplars. Subsets B1 and B2 were constructed and distributed in the same manner (see Appendix B for the materials and their divisions into these sets). These procedures ensured that every item participated in every condition equally often, and resulted in four sets of 12 items (A1, A2, B1, and B2) from which we constructed retrieval-practice booklets.

Each of the four 12-item counterbalancing sets was assigned to the abstract ordering of exemplar tests three times, resulting in 12 booklets of 51 pages (three practice orders for each of the four counterbalancing sets). Distractor materials were booklets containing causal-reasoning tasks.

Test booklets. Each page of the nine-page test booklets contained one category cue centered at the top. The first page for all testing booklets was one of the filler categories (mountains), which was inserted to minimize variance due to output interference. The order of the remaining experimental categories was random, except that across the three testing orders, the average test position for each category and each condition was approximately the same. Each of the three testing orders was combined with each of the 12 practice booklets, yielding 36 distinct combinations.

Finally, we used a portable tape recorder to play the tape instructing subjects when to turn booklet pages and a stopwatch to time subjects in the final test phase.

Results and Discussion

Retrieval Practice

The retrieval practice success rates for Rp+ items varied as a function of category composition, with 74% and 90% success rates being obtained across weak and strong Rp+ items, respectively. (Note that potential difficulties of interpretation created by the differing rates of retrieval-practice success are addressed in Experiment 3).

Final Test Performance

All analyses were first conducted treating the counterbalancing subgroups of Nrp items as distinct levels of the retrieval practice factor. Because no significant difference was obtained between the recall means of these subgroups (M = 48.8% and 48.1% for Nrpa and Nrpb items, respectively) nor was there a simple interaction between the Nrpa-Nrpb and the strong-weak manipulation, the data from these subgroups were combined in the results reported below.

Table 1 shows the percentages of each type of item that were correctly recalled for the strong and weak categories, respectively. As expected, repeatedly retrieving several members of a studied category improved the recall of those items (Rp+ = 73.6%) relative to the baseline (Nrp = 48.4%) on the final delayed recall test, F(1, 32) = 136.9, p < .0001, MSe = .022. More important, however, is the finding of impaired recall for the remaining unpracticed category exemplars (Rp- = 37.5%) relative to the same baseline, F(1, 32) = 30.3, p < .0001, MSe = .019. This pattern of improved recall for Rp+ items and impaired recall for Rp- items is consistent with the item-specific interference predicted by strength-dependent competition models of forgetting: That is, retrieval practice appears to have produced enduring retrieval-based learning of the Rp+ items, as evidenced by their improved recall performance, thereby reducing the competitiveness of the Rp- items during the final recall test, as evidenced by their impaired recall performance. Furthermore, this pattern of results indicates that retrieval-induced forgetting is not restricted to a single output session and may, in fact, contribute to long-lasting retrieval failures.

Table 1
Mean Percentage of Items Recalled on a Category Cued-Recall Test as a Function of Category Composition in Experiment 1

                        Retrieval practice status of item
Category composition    Rp+     Rp-     Nrp
Strong exemplars        81.0    40.3    56.0
Weak exemplars          66.2    34.7    41.0

Note. Rp+ = practiced exemplars from practiced categories; Rp- = unpracticed exemplars from practiced categories; Nrp = unpracticed exemplars from unpracticed categories.

As expected, the main effect of our category composition manipulation was significant, with strong exemplars being recalled at a higher level than weak exemplars (M = 58.3% and 45.7%, respectively), F(1, 32) = 53.2, p < .0001, MSe =
.022. An analysis of the magnitudes of retrieval-practice facilitation for strong and weak exemplars, however, revealed that the absolute improvement for weak items was not reliably different from that for strong items, (Rp+ - Nrp = 66.2 - 41.2 = 25.0% for weak items vs. 81.0 - 56.0 = 25.0% for strong items), F(1, 32) < 1. Furthermore, although the proportional facilitation of weak items (measured as a percent of their Nrp baseline) appeared to be greater than the facilitation of strong items (61.5% vs. 44.6%, respectively), this difference was not statistically reliable, F(1, 32) < 1. This failure for weak exemplars to show greater facilitation is probably because final recall performance underestimates the facilitation of those items; final recall reflects both the facilitation of successfully practiced items and the lack of facilitation for the larger number of weak items missed entirely during practice.

Examining next the pattern of impairment for strong and weak exemplars, we first determined that reliable impairment had been obtained for both strong and weak categories, F(1, 32) = 27.4, p < .0001, MSe = .022; F(1, 32) = 4.5, p < .05, MSe = .021, respectively. Additional analyses, however, revealed that the recall of strong Rp- items exhibited both more absolute impairment and more proportional impairment than did the recall of weak Rp- items: absolute impairments being 15.7% (56.0 - 40.3) for strong Rp- items versus 6.3% (41.0 - 34.7) for weak Rp- items, F(1, 32) = 4.6, p < .05, MSe = .023; and proportional impairments being 28.0% for strong items versus 15.4% for weak items, F(1, 32) = 7.5, p < .01, MSe = .194.

Thus, whereas the overall tradeoff between facilitation and impairment observed in the present recall results is consistent with an interpretation in terms of strength-dependent competition, the results obtained from our manipulation of category composition are not what would be expected from ratio-rule models. If, for example, one assumes that weak items would be strengthened at a proportionally greater rate than strong items by retrieval practice (as we had originally expected to find), then the ratio-rule model predicts proportionally greater impairment for weak categories than for strong. If, rather, one assumes that strong and weak items would be facilitated to a proportionally equivalent degree by retrieval practice, the assumption consistent with the present results, the ratio-rule model predicts (as shown in Appendix A) greater absolute impairment for strong-exemplar categories than for weak-exemplar categories but equivalent proportional impairments, an outcome not observed in the present results. (One exception to the previous predictions, arising under certain unrealistic assumptions, is addressed in Experiment 3.)

The observed pattern of impairment as a function of exemplar strength is, thus, both surprising and potentially important, appearing as it does to be inconsistent with the predictions of ratio-rule models. One approach to explaining this discrepancy would be to propose an additional mechanism that either selectively impairs recall of strong Rp- items, or that selectively facilitates recall of weak Rp- exemplars. For instance, the retrieval-practice phase may set in motion some process other than strengthening that affects the pattern of impairment, the effects of which persist throughout the retention interval. Unfortunately, the present experiment provides no way to disentangle dynamics arising at test from those arising during the retrieval-practice phase. It is possible, for example, that impaired recall of Rp- items was produced entirely at final test, arising as a consequence of the prior retrieval of strengthened Rp+ items. Indeed, an inspection of the output order of items on the final recall test of the present study supports such an interpretation: Rp+ items were reported far earlier, on average, than Rp- items, similar to the early recall of cue items in studies of part-set cuing (Roediger, Stellon, & Tulving, 1977).

In summary, then, the temporal locus (or loci) of the mechanism (or mechanisms) contributing to the impaired recall of Rp- items cannot be determined with precision on the basis of the results of Experiment 1 alone. We thus designed Experiment 2 to test whether impaired recall of Rp- exemplars would still be observed when the output order of the exemplars in a given category was controlled at the time of the final test.

Experiment 2

In Experiment 2, we used the same procedure and materials as in Experiment 1 except that we replaced the category-cued free-recall test with a category-plus-stem cued-recall test, which allowed us to control for the order in which Rp+ and Rp- items were output at the time of the final test. More specifically, each item on the final test, as in the retrieval-practice phase, was tested on a single page by presenting a category name and the first two letters of that exemplar. Using the first two letters of an exemplar to direct the subjects' search enabled us to manipulate whether Rp- items were tested first or second in their categories (hereinafter referred to as Rp-1st and Rp-2nd items, respectively) and whether Nrp items were tested first or second (hereinafter referred to as Nrp1st and Nrp2nd items, respectively).

By comparing the recall of Rp-1st items to that of Nrp1st items, we would be able to obtain a measure of Rp- recall that was free of any potential output interference effects from the recall of Rp+ items. Thus, any recall impairment observed for these Rp-1st items would have to reflect the long-term consequence of events that had occurred during the retrieval-practice phase, rather than the consequence of output interference dynamics occurring during the final test phase. Similarly, by comparing the recall of Rp-2nd items to that of Nrp2nd items, we would obtain a measure of Rp- impairment from which potential interference effects owing to the earlier recall of Rp+ items had been eliminated: The recall tests for both sets of these items would follow the tests for items recalled first in their respective categories (i.e., Rp+1st and Nrp1st items), thus, their recall should be equally affected by output interference. If output interference actually does contribute to recall in this task, a comparison of the recall levels for Nrp1st and Nrp2nd items should reveal that the former are recalled better than the latter. Given this result, we would expect the difference in recall performance for Rp-1st versus Nrp1st items or for Rp-2nd versus Nrp2nd items, either of which would be a measure of Rp- recall impairment uncontaminated by output interference, to be less than the difference between the recall for Rp-2nd and Nrp1st items because this
latter difference should reflect the recall of Rp- items impaired by both output interference and any potential long-term effects from the retrieval-practice phase. That is, a comparison between the recall of Rp-2nd items and Nrp1st items would produce a measure of Rp- recall that would be subject to the same effects as had influenced the Rp- recall observed in Experiment 1.

Method

Subjects

The subjects were 48 introductory psychology students from the University of California, Los Angeles, whose participation partially fulfilled a course requirement.

Design

The design of Experiment 2 differed from that of Experiment 1 in how final recall was measured: Accessibility of category exemplars was assessed with a category-plus-stem completion task rather than a category-cued free-recall task, so that the order for testing category exemplars could be manipulated. Thus, the design involved three factors, all manipulated within-subjects: retrieval practice, category composition, and testing position, with retrieval practice and category composition being manipulated exactly as they had been in Experiment 1.

The final test booklet was blocked by categories. The testing order of exemplars within category blocks was manipulated on two levels: The first half of the block constituted the tested-first exemplars (e.g., Rp-1st and Nrp1st items) and the last half constituted the tested-second exemplars (e.g., Rp-2nd and Nrp2nd items). The dependent measure was the percentage of words recalled in a category-plus-stem cued-recall test.

Procedure

To the point of the final test, the procedure we used in Experiment 2 exactly matched the procedure used in Experiment 1. In the final test phase, subjects were instructed that they would be tested in a way similar to that in which they had been tested in the practice phase. More specifically, subjects were told that on each page of the test booklet they would see the name of a category with the first two letters of an exemplar next to it and that their task was to retrieve the exemplar, from any portion of the experiment, that corresponded to those cues. Subjects were given 10 s to recall each item, after which time a tape-recorded voice instructed subjects to turn the page. This sequence was repeated until all trials in the test booklet were completed.

Materials

The apparatus, as well as the learning, practice, and distractor materials, were identical to those used in Experiment 1.

Each page of the final test booklets had one category-plus-stem cued-recall test. Tests of exemplars were blocked by category to match the recall conditions of Experiment 1 as closely as possible. Finally, items of a particular type (e.g., Rp+, Rp-, Nrpa, and Nrpb) were always tested in sequence, being either the first three or the last three items tested within their respective categories.

The average test booklet position of category types (i.e., Strong and Weak) was controlled by creating the following order of category types: S, W, W, S, S, W, W, S. This general order of category types was used to construct two specific counterbalanced orderings of categories: The first ordering was constructed by selecting categories from the strong and weak sets and randomly assigning them to appropriate positions; the second ordering was constructed by switching categories from the first half of the first test sequence with those of the second. The average testing position of practiced and unpracticed categories was controlled by implementing one pattern (Rp, Nrp, Nrp, Rp, Nrp, Rp, Rp, Nrp), which was then inverted when we counterbalanced the categories that were practiced.

The testing order of particular exemplars within a category was counterbalanced by switching the first three exemplars with the second three. The exemplar-position counterbalancing crossed with the category-position counterbalancing (resulting in four test booklet types) ensured that all items contributed to all testing-order and practice-condition combinations (e.g., Rp+1st, Rp+2nd, Rp-1st, Rp-2nd, etc.) and that all categories and exemplars had the same average testing position.

Each of the four retrieval-practice counterbalancing conditions (A1, A2, B1, and B2, as in Experiment 1), each having three random orders, was paired with each of the four final test booklet types, resulting in 48 practice-book-test-book combinations (one for each subject).

Results and Discussion

Retrieval-Practice Performance

As in Experiment 1, the retrieval practice success rates for Rp+ items varied as a function of category composition, with a 76.1% and 85.0% success rate being obtained across weak and strong Rp+ items, respectively.

Final Test Performance

As for Experiment 1, all statistical analyses were initially conducted treating the counterbalancing subgroups of Nrpa and Nrpb as distinct levels of the retrieval-practice factor. However, because the mean correct recall percentages for these subgroups (71.2% and 74.1%, respectively) did not differ significantly, F(1, 44) = 1.6, p = .21, their data were combined into a single Nrp measure for ease of exposition. Similarly, data were collapsed across our other two counterbalancing factors because they did not interact with the variables of interest.

Table 2 shows the percentages of each type of item that were correctly recalled on the final category-plus-stem cued-recall test for strong and weak exemplars, respectively, as a function of their within-category testing position. As might have been expected, the addition of a two-letter cue during the final test substantially increased the overall level of recall in Experiment 2 as compared with that of Experiment 1 (M = 75.7% vs. 52.0%, respectively). The overall correct recall percentages increased from 59% to 82.8% for strong exemplars and from 47% to 68.5% for weak exemplars. As can be seen from observing the means reported in Table 2, retrieval practice appeared to facilitate weak exemplars more than strong exemplars (Rp+ - Nrp = 79.9 - 62.7 = 17.2% for weak exemplars and 91.0 - 82.7 = 8.5% for strong exemplars), F(1, 40) = 3.9, p = .054, a result that is likely to be an artifact of the very high recall performance of the strong exemplars and, as such, not likely to be meaningful.
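
The counterbalancing scheme described in the Materials section for Experiment 2 can be checked by enumeration. The following is a minimal sketch in Python: the condition labels (A1, A2, B1, B2), the three practice orders, the two crossed test-booklet factors, and the S/W and Rp/Nrp position patterns are taken from the text, whereas the variable names, the "original"/"halves_switched" labels, and the mean_position helper are illustrative assumptions rather than the authors' actual materials.

```python
from itertools import product

# Retrieval-practice counterbalancing conditions and their three random orders.
practice_sets = ["A1", "A2", "B1", "B2"]
practice_orders = [1, 2, 3]

# Four test booklet types: category-position counterbalancing crossed with
# exemplar-position counterbalancing (labels are hypothetical placeholders).
test_booklet_types = [
    (category_order, exemplar_order)
    for category_order in ("original", "halves_switched")
    for exemplar_order in ("original", "halves_switched")
]

# Every practice booklet paired with every test booklet type.
combinations = list(product(practice_sets, practice_orders, test_booklet_types))

# Position patterns quoted in the text.
category_type_order = ["S", "W", "W", "S", "S", "W", "W", "S"]
practice_pattern = ["Rp", "Nrp", "Nrp", "Rp", "Nrp", "Rp", "Rp", "Nrp"]
# Inverted pattern used when the practiced categories are counterbalanced.
inverted_pattern = ["Nrp" if s == "Rp" else "Rp" for s in practice_pattern]


def mean_position(pattern, label):
    """Average 1-based test position of the entries carrying `label`."""
    positions = [i + 1 for i, s in enumerate(pattern) if s == label]
    return sum(positions) / len(positions)


if __name__ == "__main__":
    print(len(combinations))                        # 48 combinations
    print(mean_position(category_type_order, "S"))  # 4.5
    print(mean_position(category_type_order, "W"))  # 4.5
    print(mean_position(practice_pattern, "Rp"))    # 4.5
    print(mean_position(practice_pattern, "Nrp"))   # 4.5
```

Under these assumptions the enumeration yields 48 distinct practice-book-test-book pairings, matching the number of subjects, and both position patterns give each category type the same average test position (4.5 of 8).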

Final Test Performance Averaged Across Output Position

In general, the findings of Experiment 2 replicated those of Experiment 1, despite our use of a substantially different testing method. We obtained a significant main effect for category composition, with strong exemplars being recalled more frequently than weak exemplars (M = 82.7% and 67.0%, respectively), F(1, 40) = 73.6, p < .0001, MSe = .064. Planned comparisons revealed that retrieval practice improved the recall of Rp+ items over that of Nrp items (M = 85% and 73%, respectively), F(3, 120) = 37.2, p < .0001, MSe = .056, but, on the whole, did not reliably damage the recall of Rp- items relative to that of Nrp items (M = 68.8% and 73%, respectively), F(1, 40) = 2.3, p = .13. This main-effect comparison for Rp- impairment, however, is obscured by a marginal interaction with category composition, F(1, 40) = 2.8, p = .10, MSe = .076. Because Experiment 1 had led us to expect an interaction between our category-composition and our retrieval-practice factors and because strong items, but not weak items, may have been subject to ceiling effects, we reasoned that any inhibiting effects on the recall of strong categories may have been artificially reduced, lessening the chance for obtaining a significant interaction. We, therefore, regarded this marginal interaction as sufficient grounds to examine the potential inhibitory effects of retrieval practice on strong items and weak items in isolation. Comparisons revealed that Nrp items were recalled at a significantly higher rate than Rp- items (82.7% vs. 74.7%) for strong categories, F(1, 40) = 7.2, p < .01, MSe = .060, whereas there was no evidence for a difference in the recall of Nrp and Rp- items (62.7% vs. 62.9%) for weak categories. As in Experiment 1, there was a proportionally greater degree of impairment for strong Rp- items than for weak Rp- items (9.7% vs. 0%), F(1, 44) = 5.8, p < .05. Interestingly, this finding, like those of Blaxton and Neely (1983) and DaPolito (1966) discussed in the introduction of this article, appears to be an instance in which strengthening fails to cause impairment.

Finding impairment with the category-plus-stem cued-recall testing procedure used in Experiment 2 is surprising for at least two reasons. First, it is surprising to the degree that stem completion, which was essentially what this testing procedure required, resembles recognition testing. It is well known that retroactive interference effects are greatly attenuated (and often eliminated) when a recognition testing procedure is used instead of modified-modified free recall (see, e.g., Postman & Stark, 1969), suggesting that such interference effects reflect difficulties in retrieval. Second, other effects of retrieval inhibition (e.g., part-set cuing inhibition and the list-strength effect) are either rather small (Todres & Watkins, 1981) or are nonexistent (Ratcliff et al., 1990; Slamecka, 1975) with recognition testing, unless more sensitive tests (e.g., recognition time, see Neely, Schmidt, & Roediger, 1983) are used. Because we did observe retrieval-induced forgetting for a stem-completion testing procedure, however, it follows that either (a) the retrieval demands of stem completion are more similar to those imposed by recall than to those imposed by recognition, or (b) the current impairment is qualitatively different from part-set cuing and retroactive interference effects.

Table 2
Mean Percentage of Items Recalled on a Category-Plus-Stem Cued-Recall Test as a Function of Category Composition and Within-Category Testing Position in Experiment 2

                        Retrieval practice status of item
Category composition    Rp+     Rp-     Nrp
Strong exemplars
  Tested first          91.0    77.8    85.4
  Tested second         91.0    71.5    79.9
  M                     91.0    74.7    82.7
Weak exemplars
  Tested first          79.9    63.2    59.7
  Tested second         79.9    62.5    65.7
  M                     79.9    62.9    62.7

Note. Rp+ = practiced exemplars from practiced categories; Rp- = unpracticed exemplars from practiced categories; Nrp = unpracticed exemplars from unpracticed categories. Tested first or second = items tested in the first three or second three positions of a category block. Comparisons of Rp- and Nrp items within a given row reflect practice-induced inhibitory effects alone. Comparison of Rp- tested second and Nrp tested first reflects the combined effects of practice- and test-induced inhibition.

Impact of Testing Order on Final Test Performance

As the output order of items in Experiment 1 had led us to suspect, the prior recall of other category members at the time of the final test did impair the recall of later items in Experiment 2. Although the main effect of testing position did not reveal an advantage for earlier items (M = 75.3%) over later items (M = 74.5%), this factor showed a marginal interaction with category composition, F(1, 40) = 3.9, p = .056, MSe = .063. Consistent with the tendency observed in Experiment 1 for strong exemplars to be more impaired than weak exemplars, the effect of output interference at the time of the final test was greater for strong exemplars than it was for weak exemplars in Experiment 2. That is, whereas the overall correct recall percentage for strong exemplars tested first (84.7%) was significantly better than that for strong exemplars tested last (80.6%), F(1, 40) = 4.0, p < .05, MSe = .045, the overall correct recall percentages for weak exemplars tested first showed no advantage over that for weak exemplars tested last (65.6% vs. 68.4%, respectively), F(1, 40) = 1.1, p > .05. Interestingly, for strong items, the two sources of impairment (the impairment due to testing position and the impairment due to practice of other category members) appear to be independent effects: Collapsing across testing order, the impairment due to the retrieval-practice factor (Nrp - Rp- = 82.7 - 74.7) was significant, F(1, 44) = 7.2, p < .01, and this factor did not interact with testing position, F(1, 40) < 1.

Perhaps the most important findings of Experiment 2 concern the variations in Rp- impairment as a function of our testing order manipulations. First is the demonstration of impairment even when Rp- items were tested prior to Rp+ items. As noted, the reliable impairment observed for strong Rp- items did not vary with the position in which Rp- items were tested, Nrp1st - Rp-1st = 7.6% and Nrp2nd - Rp-2nd = 8.4%. Because Rp- items that were tested first were
not contaminated by the potentially interfering effects of Rp+ output, we can attribute the impairment of strong Rp-1st items to effects enduring from the retrieval practice phase. Second is the demonstration that the output of Rp+ items before Rp- items did result in some additional impairment for the strong Rp- exemplars. Looking at Table 2, if one compares Rp-2nd performance, which is subject to both retrieval-practice and output sources of inhibition, with Nrp1st performance, which is free from both sources of inhibition, the difference (13.9%) is larger than that between Rp-1st and Nrp1st performance (7.6%), which is a measure of Rp- impairment free of any potential output interference effects, and that between Rp-2nd and Nrp2nd performance (8.4%), which is a measure of Rp- impairment from which potential output interference effects have been eliminated. It appears, then, that under circumstances in which output order is not constrained, practiced items will tend to be recalled first, adding to the long-term debilitating effects of retrieval practice, at least for strong items.

Possible Explanations

The finding of impairment when Rp- items were tested first rules out the possibility that the retrieval-induced forgetting observed in the present paradigm can be entirely due to output interference dynamics operating at the time of the final recall test. We turn now to a consideration of explanations for Rp- impairment in terms of enduring consequences of processes set in motion by the retrieval practice given to Rp+ items and to a consideration of our failures in both Experiments 1 and 2 to obtain a pattern of Rp- impairment consistent with predictions of ratio-rule models. Four accounts of this apparent violation of the strength-dependence assumption are outlined and then tested in Experiment 3: (a) covert retrieval and strengthening bias, (b) extraexperimental interference, (c) lateral inhibition, and (d) suppression.

Covert retrieval and strengthening bias. Although the present findings clearly violate the most straightforward predictions of the ratio-rule model, perhaps aspects of our procedure conspired to make our results appear as though the ratio-rule model had been violated. For instance, covert retrievals during

clear difference in difficulty between strong and weak Rp+ items that highlighted the necessity of giving extra rehearsal to weak items. If the difficulty of weak Rp+ items triggers strategic rehearsal of weak Rp+ and Rp- items, impairment should not arise whenever Rp+ items are weak and should arise whenever Rp+ items are strong, provided that significant strengthening of the practiced items occurs.

A second aspect of the present data that complicates the interpretation of the greater impairment for strong items is that ceiling effects prevented us from accurately assessing the relative facilitation of strong and weak Rp+ items. Although ceiling effects were clearly not a problem in Experiment 1, a potentially greater strengthening of strong Rp+ items in Experiment 2 might have caused the greater impairment of strong Rp- items. Such concerns are fueled by the differences in retrieval-practice success rates observed in both Experiments 1 and 2. If either strengthening bias or strategic covert rehearsal occurred, competition might still be strength dependent in the sense predicted by the ratio rule.

Extraexperimental interference. A second explanation of the greater impairment for strong items emerges if extraexperimental exemplars contributed to the patterns of impairment observed in Experiments 1 and 2, as might occur if subjects failed to use a representation of the experimental context as a retrieval cue. When the potential contribution of extraexperimental interference is considered, the ratio-rule model can predict greater proportional impairment for strong categories and minimal impairment for weak categories. These predictions derive from differences in the composition of the set of extraexperimental exemplars across strong and weak categories. To illustrate, because strong studied categories included many of their strongest exemplars as part of the study list, their extraexperimental sets should contain mainly weak exemplars; in contrast, extraexperimental sets for weak categories should contain the strong exemplars. Because the negative impact of retrieval-based learning on Rp- items can be shown to be far greater when the net strength of the extraexperimental set is low than when it is high (assuming that the experimental context is not used as a cue, see Appendix A), the impairment to strong categories can be great, whereas the impairment to weak categories can be minimal, owing to the differential
the retrieval-practice phase of our paradigm might have makeup of their extraexperimental sets of exemplars.
influenced the relative impairment across strong and weak Lateral inhibition. A third possibility consistent with the
categories. Perhaps the present pattern of impairment could results thus far is that competition may be strength dependent
be made consistent with ratio-rule models if additional strength- but in a way that we did not expect: Practice of strong Rp+
ening deriving from such retrievals selectively reduced the items might produce more absolute and proportional impair-
impairment expected for weak Rp— exemplars. ment than practice of weak Rp+ items. Although this would
Analysis of the expected pattern of covert retrievals illus- not be consistent with the ratio rule, greater impairment
trates, however, that such intrusions, were they to occur deriving from the practice of strong exemplars might result if
spontaneously (as opposed to strategically), should, in fact, strong Rp+ items were more effective inhibitors than were
decrease impairment more for strong Rp— items than for weak weak Rp+ items, as might be the case if impairment were
R p - items. Strong Rp— items should be more likely to intrude caused by automatic lateral inhibition among category exem-
and be strengthened than should weak Rp— items; covert plars. Such models have been suggested to account for the negative
retrieval, therefore, should favor the recall of strong Rp— effects of part-list cues on retrieval of related material (Blaxton
items. The question remains, however, whether subjects used & Neely, 1983; Martindale, 1981; Roediger & Neely, 1982).1
some strategy during practice of weak categories that enabled
selective rehearsal of weak R p - items, thereby reducing the 1
It is not a necessary property of lateral-inhibition models that they
final recall impairment to weak categories. Subjects might have predict greater impairment for strong categories than for weak
adopted such an intentional rehearsal strategy if there was a categories. For example, one might assume that exemplar nodes in a
lateral-inhibitory network had nonlinear activation functions that reduced or enhanced inhibitory inputs, dependent on the current activational state of the node. For present purposes, the important point is that the amount of impairment inflicted by an inhibiting item does depend on the strength of the association between the cue and the inhibiting item and that this strength-dependent process can, under certain assumptions, cause greater impairment for strong categories.

Suppression. A final possibility is that the greater impairment of strong Rp− items results from a process of active suppression (as suggested by Keele & Neill, 1978, in their model of attention; see also Blaxton & Neely, 1983; Carr & Dagenbach, 1990; Dagenbach, Carr, & Barnhardt, 1990; Neill & Westberry, 1987), which is an inhibitory process that acts on those Rp− items during the retrieval-practice phase. Suppose that we assume that spontaneous covert retrievals did occur during retrieval practice but not in a way that led to covert strengthening of competitors. Instead, suppose that the provision of the category cues during retrieval practice primed all category members but that the stem cues directed access sufficiently so that competitors were not consciously intruded. Activation of Rp− items in this manner, however, may have created retrieval discrimination problems, slowing access to Rp+ items. If inhibition were used to overcome such discrimination problems, and if strongly associated exemplars interfered more frequently than weak exemplars (and were, thus, suppressed or inhibited more frequently than weak exemplars), the greater impairment of strong Rp− items could be explained.

Like the lateral-inhibition approach, the suppression account explains the impaired recall of Rp− items by an inhibitory process; unlike lateral inhibition, however, the amount of impairment suffered by Rp− items is thought to be modulated by the amount of interference caused by Rp− items rather than the strength of the Rp+ items. Thus, the suppression hypothesis need not make the strength-dependence assumption inherent to both the ratio rule and lateral inhibitory models because the extent to which Rp− items are impaired depends only on their own strength. Experiments 1 and 2 cannot distinguish between lateral inhibition and suppression because we used homogeneous categories; thus, the greater impairment for strong items could have resulted from either the greater strength of Rp+ or of Rp− items. Experiment 3 was designed to discriminate among these possible accounts of the greater impairment for strong categories.

Experiment 3

Experiment 3 explores mechanisms that might underlie the greater retrieval-induced forgetting for strong categories observed in Experiments 1 and 2. In particular, we attempt to distinguish among the four accounts proposed in the discussion of Experiment 2: (a) the strengthening bias and covert retrieval hypothesis, which asserts that the greater impairment for strong categories is an artifact of biases in the strengthening of Rp+ items and in the covert rehearsal of Rp− items during retrieval practice; (b) the extraexperimental interference hypothesis, which asserts that greater impairment for strong categories derives from the differential composition of the set of extraexperimental exemplars across strong and weak categories; (c) the lateral inhibition hypothesis, which asserts that strong Rp+ items are better inhibitors than are weak Rp+ items; and (d) the suppression hypothesis, which asserts that the greater impairment for strong categories arises because strong Rp− items are more interfering than weak Rp− items, and thus, are more vulnerable to suppression during retrieval practice.

We implemented several modifications of the design and procedure in Experiment 3. First, to eliminate the ceiling effects on the recall of Rp+ and Nrp items observed in Experiment 2, we made the final test more difficult by using single-letter rather than double-letter word-stem cues. Second, category composition was manipulated between subjects in the present experiment to reduce subject strategies arising from contrasts in the difficulty of strong versus weak Rp+ items during retrieval practice. Finally, we expanded our manipulation of category composition to include mixed categories (i.e., categories composed of three strong and three weak exemplars), resulting in four levels of category composition instead of two: the pure strong condition, with strong items practiced (hereinafter designated the SS condition, where the underlined letter, here always the first, denotes the subset that is practiced), the mixed condition with strong items practiced (SW), the mixed condition with weak items practiced (WS), and the pure weak condition with weak items practiced (WW).

Table 3
Hypothesized Influence of Retrieval Practice on Rp− Recall as a Function of Rp+ and Rp− Exemplar Strength

                                            Category composition (example items)
                                            SS          SW          WS          WW
                                            (Orange,    (Orange,    (Guava,     (Guava,
Hypotheses                                  Banana)     Kiwi)       Banana)     Kiwi)
Covert retrieval plus strengthening bias    −           −           +           0
Extraexperimental interference              −           −           −           0
Automatic lateral inhibition                −           −           0           0
Suppression                                 −           0           −           0

Note. SS, SW, WS, and WW designate categories composed of either all strong exemplars (SS), all weak exemplars (WW), or half strong and half weak exemplars (SW and WS). The strength of the practiced and unpracticed items (Rp+ and Rp− items) is indicated by the underlined (first) and nonunderlined (second) letters, respectively. − = inhibitory effects; + = facilitatory effects; 0 = neutral effects.

The inclusion of mixed categories in the present experiment should allow us to discriminate among the four accounts of the greater impairment for strong categories obtained in Experiments 1 and 2. The predictions of these four hypotheses are summarized in Table 3 in terms of the hypothesized influence of retrieval practice on Rp− items. Note that the four hypotheses make identical predictions for the pure category conditions (i.e., SS and WW), but vary in what they predict for the mixed categories (i.e., SW and WS). Consider first the covert retrieval and extraexperimental interference hypotheses, depicted in Rows 1 and 2, either of which, if confirmed,
would support a ratio-rule interpretation of our results. According to the covert-retrieval hypothesis, subjects give extra rehearsal to weak Rp+ and Rp− items because weak Rp+ items seem difficult. If subjects rehearse in this manner, there should be no impairment whenever Rp+ items are weak (WS and WW) with the potential for facilitation when Rp− items are more accessible for rehearsal (WS). Furthermore, there should be significant impairment in the SW condition because subjects should not consider it necessary to perform extra rehearsal on strong Rp+ items. The inclusion of mixed categories also controls for variations in extraexperimental interference because the contents of the extraexperimental exemplar sets for SW and WS conditions are identical; thus, there should be impairment in both mixed conditions, provided that significant strengthening occurs for Rp+ items.

Next, consider the two inhibitory hypotheses, lateral inhibition and suppression, depicted in Rows 3 and 4. If the greater impairment for strong categories resulted because strong Rp+ items are better inhibitors, there should be more impairment for conditions containing strong Rp+ items than for conditions containing weak Rp+ items (i.e., average of SS and SW impairment > average of WS and WW impairment). Finally, if the greater impairment for strong categories arises because strong items are more vulnerable to suppression, more impairment should occur for conditions containing strong Rp− items than for conditions containing weak Rp− items, irrespective of the strength of the practiced set (i.e., the average of SS and WS impairment > average of SW and WW impairment).2

2 A further prediction might be made that strong Rp− items should be more impaired in the WS than in the SS condition because those items might cause more interference during the practice of weak Rp+ items. This prediction requires that either (a) the probability that a strong Rp− item will intrude is a function of its strength relative to Rp+ items in that category rather than a function of its own absolute strength, or (b) the intrusion probability for strong Rp− items is equivalent in the WS and SS conditions but that the longer search time necessary for weak Rp+ items provides more occasions for intrusion, and thus, inhibition. Although the former approach can be questioned on the basis of the failures of strength-dependent competition in Experiment 2, the latter assumption seems plausible.

An additional benefit arising from the inclusion of mixed categories in Experiment 3 is that it affords further tests of the ratio-rule model. Ratio-rule models make two predictions with respect to performance on tests of our Nrp baseline items. First, the probability of recalling a strong exemplar should be greater for strong items in an SW baseline category than for strong items in an SS baseline category. This prediction arises because the presence of additional strong items in the SS category reduces the relative strength of those strong items. Second, for similar reasons, weak items in SW baseline categories should be recalled less well than weak items in WW baseline categories because the presence of strong items should reduce their relative strengths. Thus, our mixed baseline categories enable us to test predictions of the ratio-rule model on the basis of results that are not likely to have been affected by any special dynamics that may have arisen in our retrieval-practice phase.

Method

Subjects

The subjects were 64 students (16 in each of the four between-subjects conditions) from the University of California, Los Angeles. Of these, 48 students participated in partial fulfillment of a course requirement and 16 students (8 in condition SW and 8 in condition WS) were paid for their participation.

Design

The design of Experiment 3 differed from that of Experiment 2 in that category composition was manipulated between subjects and had four levels instead of two: The strong-strong (SS) and the weak-weak (WW) conditions contained only strong and weak categories, respectively; and the remaining two conditions, SW and WS, contained categories composed of three strong and three weak exemplars. In the SW condition, subjects practiced the strong items, whereas in the WS condition, subjects practiced the weak items. As in Experiment 2, both the practice status of an item and testing order were manipulated within subjects.

The dependent measure was the percentage of words recalled in a category-plus-stem cued-recall test, in which single-letter stems were used instead of two-letter stems as had been used in Experiment 2.

Materials and Procedure

The materials used in Experiments 1 and 2 were revised to meet the constraints imposed by our expanded manipulation of category composition. As illustrated in Appendix C, eight large categories were constructed, each with 12 exemplars (6 strong and 6 weak) so that each category could participate in the SS, SW, WS, and WW conditions. The newly constructed categories and exemplars had characteristics similar to those used in previous experiments. According to Battig and Montague (1969) category norms, strong exemplars had an average rank order of 8, and weak exemplars had an average rank order of 50, which was substantially lower than that of weak items in Experiments 1 and 2 (M = 33). Thus, there was a clear difference in the taxonomic frequency of exemplars across the strong and weak item sets.

As before, exemplars were constrained to be low-frequency, noncompound words. The average word frequency (Kucera & Francis, 1967) for all eight categories was 12 occurrences per million, not differing substantially between strong (M = 15) and weak exemplars (M = 8). Because the new final test used only the first letters of exemplars to cue subjects, no two exemplars within a category were allowed to begin with the same first letter. Exemplars from different categories could begin with the same first letter (for the obvious reason that we have more than 26 words), but efforts were taken to distribute this overlap among letters, categories, and conditions. Because our materials pool was large, we relaxed the constraints that no exemplar could begin with the same first two letters as any extraexperimental exemplar from its own category or as any exemplar from other presented categories, although these constraints were honored to the degree possible. As before, versatilities of the two-letter stems were constrained to be at a moderate level of difficulty (M = 246), and did not differ substantially across strong (M = 244) and weak (M = 248) exemplars. The construction of such large categories in accordance with these constraints required us to replace two of our previous categories, Leather and Hobbies, with new categories, Insects and Fish.

Learning booklets. The strong and weak exemplars of each category were randomly divided into two subsets, S1 and S2 in the case of strong exemplars and W1 and W2 in the case of weak exemplars, as illustrated in Appendix C. We used these materials to construct six different types of learning booklets: SS booklets, containing only
categories having six strong exemplars each; WW booklets, containing only categories having six weak exemplars each; and four SW booklets, containing only categories having three strong and three weak exemplars each. (Note that no underlining is needed to denote the contents of the learning booklets and that the order of S and W is irrelevant.) The latter four booklets were designed by making all four possible combinations of strong and weak subsets of our categories: S1W1, S1W2, S2W1, and S2W2. Thus, we completely counterbalanced for exemplar-specific effects within each exemplar type (S or W), and, in the case of SW categories, ensured that all combinations of strong and weak exemplars were presented for study.

Retrieval-practice booklets. As in Experiments 1 and 2, the eight categories were randomly divided into two subsets of four each: sets A and B. For each of our four category-composition types, SS, SW, WS, and WW, one half of the subjects were given retrieval practice on Set A, the other half on Set B. In the cases of SS and WW, the exemplar-specific counterbalancing was identical to that used in the previous experiments: Half of the subjects practiced condition S1 (or W1) and half practiced S2 (or W2), resulting in four retrieval-practice counterbalancing conditions: AS1, AS2, BS1, and BS2 (or AW1, etc. in the case of weak exemplars). In the SW and WS conditions, only the category-level counterbalancing was used because the distinction between these two conditions reflects the item counterbalancing (i.e., the only difference between WS and SW subjects was which items they practiced). Thus, for both SW and WS conditions, there were only two retrieval-practice counterbalancing conditions. Eight retrieval-practice booklets were constructed to implement these counterbalancing measures: four booklets (S1, S2, W1, and W2) for each of our two category subsets, A and B. Unlike our previous studies, however, only one random order for each booklet type was constructed instead of three.

Final test booklets. The format of the testing pages of the final test booklets was identical to that of Experiment 2: one category-plus-stem cued-recall test per page. The test-phase-counterbalancing and average-position-matching measures were also carried over from Experiment 2, with the following exceptions: (a) Because, for any given subject, all categories were of one type only (e.g., SS), matching of the average testing position of category types was unnecessary, and (b) the counterbalancing of the half of the testing sequence in which a category appeared was eliminated. These measures resulted in 2 test counterbalancing conditions (corresponding to the exemplar-order counterbalancing) for each of our six different learning booklet types. Because testing orders for SW and WS conditions were identical, however, only eight booklet types were actually required to implement these 12 conditions.

The two practice counterbalancing booklets for each of the four combinations of SW learning booklets (S1W1, S1W2, S2W1, and S2W2), when crossed with the 2 different test booklet types, resulted in 16 practice-test booklet combinations, one for each subject. The 4 practice counterbalancing booklets for SS and WW learning booklets, when combined with testing order counterbalancing, resulted in 8 different practice-test booklet combinations, one for every 2 subjects. Filler materials were identical to those used previously. The procedure used in Experiment 3 was identical to that of Experiment 2.

Results and Discussion

Retrieval-Practice Performance

The retrieval-practice success rates varied across the SS (M = 82%), SW (M = 82%), WS (M = 67%), and WW (M = 68%) conditions, as one might have expected on the basis of the differing taxonomic frequencies of practiced items across these sets. Note that the retrieval-practice success rates were equivalent for conditions in which the taxonomic frequencies of items were the same (e.g., for SS and SW and for WS and WW).

Final Test Performance

As in Experiments 1 and 2, we collapsed across most of our counterbalancing factors because they did not interact with the variables of interest. The statistical treatment of Nrpa and Nrpb subdivisions, however, differed somewhat from that of the previous two experiments. Whereas it was feasible to collapse across these two measures in the SS and WW groups, in which Nrpa and Nrpb subsets represented the same item pools, it was not feasible in the SW and WS conditions, in which Nrpa and Nrpb subsets reflected different item pools (strong and weak items). To avoid differences in the number of observations entering into Nrp measurements between homogeneous categories (SS and WW) and heterogeneous categories (SW and WS), we restricted our comparisons of Rp− items to the Nrpb subset (which always matched the taxonomic frequency of Rp− exemplars) and our comparisons of Rp+ items to Nrpa subsets (which always matched the taxonomic frequency of Rp+ exemplars).

Table 4
Mean Percentage of Items Recalled on a Category-Plus-Stem Cued-Recall Test as a Function of Category Composition and Within-Category Testing Position in Experiment 3

                          Retrieval practice status of item
Category composition      Rp+         Rp−         Nrpa        Nrpb
Strong-strong (SS)        79.6 (S)    56.8 (S)    64.1 (S)    66.2 (S)
  Tested first            83.2        54.2        62.6        60.4
  Tested second           75.9        59.3        65.6        71.9
Strong-weak (SW)          78.1 (S)    47.9 (W)    55.2 (S)    44.2 (W)
  Tested first            78.1        52.1        56.2        46.8
  Tested second           78.1        43.7        54.2        41.6
Weak-strong (WS)          66.2 (W)    51.0 (S)    48.9 (W)    60.5 (S)
  Tested first            63.7        52.1        49.9        64.7
  Tested second           68.7        49.9        47.9        56.3
Weak-weak (WW)            62.0 (W)    42.2 (W)    42.2 (W)    33.4 (W)
  Tested first            58.4        43.7        40.6        32.3
  Tested second           65.6        40.7        43.8        34.5

Note. Rp+ = practiced exemplars from practiced categories; Rp− = unpracticed exemplars from practiced categories; Nrpa and Nrpb = unpracticed exemplars from unpracticed categories. An S or a W in parentheses denotes the strength of the exemplars in that cell. Tested first or second = items tested in the first or second three positions of a category block. Comparisons of Rp− and Nrpb baseline items reflect impairment. Comparisons of Rp+ and Nrpa baseline items reflect facilitation.

Table 4 shows the percentages of each type of item that were correctly recalled on the final category-plus-stem cued-recall test as a function of category composition and within-category testing position. As expected, overall performance in Experiment 3 (M = 56.2%) decreased relative to that observed in Experiment 2 (M = 74.8%), most likely owing to the use of single-letter rather than two-letter stems to cue the recall of exemplars during the final test. This decrease in performance eliminated the possibility of a ceiling-effect problem as had occurred in Experiment 2, allowing us to assess reliably the
absolute and proportional differences in facilitation and inhibition. The absolute facilitation owing to retrieval practice obtained for weak items was not different from that obtained for strong items, (Rp+) − (Nrp) = 64.1 − 45.6 = 18.5% and 78.9 − 59.7 = 19.2%, respectively, F(1, 60) < 1, reinforcing the conclusion that the difference in facilitation observed in Experiment 2 arose from the influence of ceiling effects on the recall of strong items. Contrary to expectation, weak exemplars also failed to show proportionally greater facilitation than strong exemplars (28.9% and 24.3%, respectively), F(1, 60) < 1, as in Experiment 1. Again, the failure for weak exemplars to exhibit greater facilitation than strong exemplars may reflect the fact that final recall performance underestimates facilitation due to retrieval practice (see Experiment 1). However, the strengthening-bias explanation proposed to account for the greater impairment for strong categories obtained in Experiment 2 is clearly not supported by the present results.

Final Recall Performance Averaged Across Output Position

Except for the lower level of overall performance, the results of Experiment 3 were similar to those of Experiment 2. A significant main effect for category composition was obtained, F(3, 60) = 8.2, p < .0001, with the average recall of subjects in the SS condition (66.6%) being superior to the average recall of subjects in the SW (56.3%) and the WS (56.7%) conditions, F(1, 60) = 7.1, p < .01, and the recall of subjects in the latter two sets being superior to that of subjects in the WW condition (44.9%), F(1, 60) = 9.2, p < .01. Thus, our manipulations of taxonomic frequency clearly had the desired impact on recall performance. Furthermore, as expected, planned comparisons revealed that retrieval practice improved overall recall of Rp+ items (M = 71.5%) over Nrpa items (M = 52.6%), F(1, 60) = 53.0, p < .0001, MSe = .043, but, on the whole, did not reliably damage recall of the Rp− items (M = 49.5%) relative to Nrpb items (M = 51.1%), F(1, 60) < 1. Facilitation of practiced items did not interact with category composition whether the taxonomic strengths of the practiced items were contrasted (SS and SW vs. WS and WW = 19.2% vs. 18.5%) or whether the taxonomic strengths of the Rp− competitor items were contrasted (SS and WS vs. SW and WW = 16.4% vs. 21.4%), with F(1, 60) < 1 in all cases.

The crucial comparisons, however, regard interactions of inhibition with the levels of our category composition factor. In particular, the suppression hypothesis predicts greater impairment for conditions in which Rp− items were strong (SS and WS) than for those in which Rp− items were weak (SW and WW). This interaction was found to be significant, appearing when absolute impairment was considered, F(1, 60) = 10.5, p < .01, as well as when proportional impairment was considered, although the latter interaction was only marginally significant, F(1, 60) = 3.2, p = .08. Interestingly, the interaction resulted both from significant absolute inhibition in strong Rp− conditions, (Rp−) − (Nrpb) = 53.9 − 63.4 = −9.5%, F(1, 60) = 7.6, p < .01, and from marginally significant facilitation in weak Rp− conditions, 45.1 − 38.8 = +6.3%, F(1, 60) = 3.3, p = .07.

Table 5
Impairment of Rp− Items and Facilitation of Rp+ Items on a Category-Plus-Stem Cued-Recall Test as a Function of the Taxonomic Strength of Rp+ and Rp− Items in Experiment 3

                          Strength of Rp− items
Strength of Rp+ items     Strong            Weak
Strong                    −9.4 (+15.5)      +3.7 (+22.9)
Weak                      −9.5 (+17.3)      +8.8 (+19.8)
M                         −9.5              +6.3

Note. Impairment = (Rp−) − (Nrp); facilitation = (Rp+) − (Nrp). Rp+ = practiced exemplars from practiced categories; Rp− = unpracticed exemplars from practiced categories; Nrp = unpracticed exemplars from unpracticed categories.

As can be seen in Table 5, which summarizes facilitation and impairment effects for Rp− and Rp+ items as a function of Rp+ and Rp− strength, there is little evidence that variations in the strength of Rp+ items modulated impairment of Rp− recall: The impairment to Rp− items when the Rp+ items were strong (−2.9%) was not significantly different from the impairment to Rp− items when the Rp+ items were weak (−0.3%), F(1, 60) < 1, failing to support the lateral inhibition hypothesis. Furthermore, the impairment to Rp− items was nonsignificant in both cases, presumably because the facilitatory and inhibitory effects on the recall of Rp− items as a function of Rp− strength cancelled each other out. The pattern of results presented in Table 5 implies that the variable modulating the degree of retrieval-induced forgetting is not the strength of the Rp+ item but the strength of the Rp− item, as predicted by the suppression hypothesis. Specifically, if nontarget competitors are strong, they are more likely to be inhibited than if they are weak, regardless of whether practiced items are strong or weak.

It is important to emphasize that the present findings replicate the complete absence of impairment that was observed for weak Rp− items in Experiment 2, despite variations in materials and testing procedure. Indeed, there is even some indication that weak Rp− items may profit from the practice of their competitors. There are several reasons why these surprising results cannot be explained by either the strengthening-bias and covert-retrieval hypothesis or the extraexperimental interference hypothesis. First, if strong Rp+ items received more strengthening, they should have displayed greater absolute and proportional facilitation with respect to their Nrp baseline than did the weak Rp+ items. As noted earlier, however, both the absolute and the proportional facilitation for strong and weak exemplars were statistically equivalent, and, if anything, evidenced proportionally greater facilitation for the weak Rp+ items. Furthermore, the impairment observed for Rp− items in the WS condition, in which the hypothetically less facilitated weak items were practiced, makes an explanation of the greater impairment for strong Rp− items in terms of less facilitation for weak Rp+ items unlikely. Second, if weak categories were less impaired because the difficulty of weak Rp+ items led subjects selectively to rehearse Rp− items, we should have observed (a) no impairment, and perhaps facilitation in the WS condition, and (b) substantial impairment in the SW condition. Because
neither the impairment of strong nor the facilitation of weak 2, however, testing order did not interact with our category
Rp− items showed a significant effect of the strength of the composition factor, F(3, 60) = 1.4, p > .2, even when attention
practiced exemplar, F(1, 60) < 1 in both cases, biases in covert was restricted to only those conditions used in Experiment 2
rehearsal cannot explain the present data. Finally, because the (SS and WW), F(1, 60) < 1. Because the number of subjects in
SW and WS conditions had the same extraexperimental each condition (n = 16) was smaller than in the previous
exemplar set and because the Rp+ items in those conditions experiment (n = 48), and because there is considerable variabil-
were strengthened to a proportionally equivalent degree, the ity in the effects of testing order for both strong items (overall,
lack of impairment in the SW condition (and probably in the four cells show impairment, three show facilitation, and one is
JKW condition as well) cannot be explained by the extraexperi- a tie) and weak items (overall, four cells show impairment and
mental interference hypothesis. Thus, it appears that the four show facilitation), comparisons of individual cells are not
failure of retrieval-based strengthening in the SW and 3KW likely to be meaningful. However, when all cells with strong
conditions to impair Rp— items constitutes a genuine violation and weak exemplars are considered (i.e., Rp+, R p - , Nrpa,
of the strength-dependence assumption. The implications of and Nrpb for all conditions), strong items tested first
these findings for ratio-rule models are elaborated further in (M = 63.9%) are no different than strong items tested last
the General Discussion section. (Af = 63.9%), nor are weak items tested first (M = 48.4%)
We also examined the performance of strong and weak different than weak items tested last (M = 48.3%). The rea-
exemplars in our Nrp baseline conditions to determine whether sons for this failure to replicate the output interference of
they conformed to the patterns predicted by relative strength Experiment 2 are unclear.
models. Ratio-rule models predict that strong exemplars in the In summary, the results of Experiment 3 replicated those of
SW and W.S conditions should be recalled better than those in Experiment 2 in most major respects, including (a) the greater
the SS condition because a strong item's relative strength is impairment for strong than for weak Rp— items; (b) the
reduced in the latter case. Not only did we fail to observe this complete absence of impairment for weak Rp— items; and (c)
pattern, we observed what may be a trend in the opposite the presence of R p - impairment when Rp— items were tested
direction: As can be seen in Table 4, recall of strong exemplars before their Rp+ competitors. In addition, Experiment 3
in SS categories (65.2%) appeared to be better than the demonstrated that the greater impairment for strong catego-
average recall of strong exemplars in the SW and W.S catego- ries observed in Experiments 1 and 2 is attributable to a
ries (57.9%), although this was not significant, F(l, 30) = 2.3, greater susceptibility of strong Rp— items to impairment,
p = .14. Similarly, weak exemplars in the WJW condition should rather than to either a greater potency of strong Rp+ items as
be recalled better than weak items in the W_S or SW condi- inhibitors or to the covert strengthening of weak Rp— items.
tions. This trend also failed to occur, and the opposite pattern
was suggested: The recall of weak exemplars in WJW categories
(37.8%) appeared to be worse than the average recall of weak General Discussion
exemplars in the SW and W.S categories (46.6%), although this Three generalfindingsemerge from the current work. First,
difference was only marginally significant, F(l, 30) = 3.3,p = retrieving information repeatedly can impair recall perfor-
.08. This pattern of results constitutes yet another violation of mance on related information. In Experiment 1, retrieval
the strength-dependence assumption, contradicting the predic- practice on three members of a studied category, such as Fruit,
tions of a ratio-rule model. improved recall performance for those items on a subsequent
test but often at the cost of decreasing recall performance for
the remaining three members. Experiments 2 and 3 replicated
Impact of Testing Order on Final Recall Performance
this impairment and generalized it to a category-plus-stem
The most important testing-order finding of Experiment 3 cued-recall test. Thus, the act of remembering can cause
was the replication of significant Rp— inhibition at different forgetting of semantically related material on a later recall test.
positions in the testing sequence. As illustrated in the rows Second, the present experiments demonstrate that the
labeled Testedfirstin Table 4, the recall of strong Rp— items negative effects of retrieval can endure well beyond the
was impaired when they were tested before Rp+ items. As in immediate context in which a competitor is retrieved. In all
Experiment 2, the reliable impairment observed for strong three experiments, the impairment of nonpracticed exemplars
Rp— items (SS and W_S) did not vary with the position in which was still in evidence after the 20 min retention interval
R p - items were tested: (Nrpblst) - (Rp-lst) = 9.3%; between retrieval practice and the final test. This finding
(Nrpb2nd) - (Rp-2nd) = 9.5%, with the interaction, F(l, contrasts with those from previous studies that focused exclu-
60) < 1. Nor did the greater impairment for strong Rp— items sively on retrieval-based impairment within a single testing
than for weak R p - items interact with testing order, F(l, session (e.g., Blaxton & Neely, 1983; Brown, 1981; Dong, 1972;
60) < 1. Again, because Rp— items that are tested first are not Roediger, 1973; Roediger & Schmidt, 1980; Smith, 1971,
contaminated by the potentially interfering effects of Rp-t- 1973). These previous studies did not address the durability of
output, we can attribute the impairment of strong Rp-lst output interference, leaving it unclear whether output interfer-
items to effects enduring from the retrieval-practice phase. ence contributed to long-term forgetting or reflected a tran-
Thus, thefindingof enduring inhibition was replicated. sient interference. The present finding demonstrates that the
As in Experiment 2, items recalled later in a category negative effects of retrieval are not restricted to a single output
(M = 56.1%) were not, in general, recalled worse than items session and suggests that the reasons for this enduring quality
recalled earlier in a category (M = 56.2%). Unlike Experiment are more complex than we anticipated.
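
The blocking account considered throughout this discussion rests on the strength-dependence assumption, which is commonly formalized as a ratio rule: an item's accessibility is its own cue-to-item strength divided by the summed strength of all items sharing the cue. The following is a minimal sketch of that rule under stated assumptions. The strength values, the size of the practice boost, and every exemplar name other than Orange are hypothetical, and the sketch omits the learning assumptions that a full model would add.

```python
import math


def relative_strength(strengths, target):
    """Ratio rule: target strength over the summed strength of all items
    associated with the same cue."""
    return strengths[target] / sum(strengths.values())


# Hypothetical cue-to-item strengths for a six-item category (assumed values).
category = {
    "orange": 1.0, "lemon": 1.0, "banana": 1.0,  # to be practiced (Rp+)
    "grape": 1.0, "cherry": 1.0, "peach": 1.0,   # left unpracticed (Rp-)
}
practiced = ("orange", "lemon", "banana")
BOOST = 1.0  # assumed strength increment from retrieval practice

# Retrieval practice strengthens only the practiced items.
after_practice = {
    item: strength + BOOST if item in practiced else strength
    for item, strength in category.items()
}

before = relative_strength(category, "grape")
after = relative_strength(after_practice, "grape")

print(f"Rp- relative strength before practice: {before:.3f}")
print(f"Rp- relative strength after practice:  {after:.3f}")
```

In this sketch the unpracticed item's relative strength falls from 1/6 to 1/9 even though its own strength is unchanged. The rule therefore predicts some impairment for every unpracticed item whenever its competitors are strengthened, which is the prediction the experiments reported here put to the test.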

Initially, we expected that impairment would occur after 20 min because the practice-based
facilitation would persist, allowing practiced items to block unpracticed competitors. In
Experiments 2 and 3, we studied these assumptions more closely by manipulating the output
order of Rp+ and Rp- items at test. Interestingly, Rp- impairment still occurred when
category-plus-stem cues (e.g., Fruit Or____) were used to force subjects' output of Rp-
items before Rp+ items. This result suggests that output interference at test cannot be the
sole explanation of the Rp- impairment and that an additional inhibitory component persists
throughout the 20-min retention interval. This impairment may be the first demonstration of
inhibition at a long retention interval that cannot be explained by prior output of dominant
items. Whatever the contributions of practice- and test-based sources of impairment may be,
the present experiments show that retrieval is a significant factor contributing to
long-lasting memory failure.

Finally, and unexpectedly, retrieval appears to have its greatest negative effects on items
strongly associated to the current retrieval cue. In Experiment 1, recall of unpracticed
members from strong-exemplar categories (e.g., Fruit Orange) suffered significantly more
retrieval-induced forgetting than did recall of unpracticed members from weak-exemplar
categories (e.g., Tree Hickory). This general pattern was replicated with the
category-plus-stem cued-recall task of Experiments 2 and 3, except that unpracticed members
of weak-exemplar categories were not simply less impaired than members of strong-exemplar
categories, they were either unimpaired altogether or they were even facilitated by the
retrieval of their competitors. Experiment 3 demonstrated that the strength of the
unpracticed item, not the strength of the practiced item, had determined the impairment
observed in Experiments 1 and 2: Strong competitors were impaired independently of the type
of item that was practiced (strong or weak), whereas weak competitors were unimpaired by
practice of those same items. These findings suggest the surprising conclusion that highly
accessible items will be the most vulnerable to retrieval-induced forgetting.

When trying to explain why retrieval of some items has negative effects on other items, one
is inevitably drawn to the significant facilitatory effects of retrieval practice as a
potential cause. The intuition that strong items block the retrieval of weaker ones is
compelling, even though the empirical justification for this intuition is not as strong as
one might like. If the impairment observed at present related sensibly to the degree of
strengthening, it would clearly support the strength-dependence assumption. In the next two
sections, we argue that strength-dependent competition has difficulty accounting for the
pattern of impairment across our experiments and that a retrieval-based suppression
mechanism provides a better account. We then discuss relations of the present findings to
research on retroactive interference, part-set cuing and the list-strength effect.

Strength-Dependent Competition

The impairment of unpracticed category members might seem to result from the retrieval-based
strengthening of their practiced companions. Indeed, the retrieval-practice procedure was
designed to maximize this strengthening because the prediction of retrieval-induced
forgetting was based on the strength-dependence assumption. Several of the present findings,
however, lead one to question whether an item's recall probability is affected by the
strength of its competitors.

Table 6
Impairment (Rp-) - (Nrp) and Facilitation (Rp+) - (Nrp) Due to Retrieval Practice Across
Experiments 1, 2, and 3 as a Function of the Taxonomic Strength of the Rp- Set and the
Strength of the Rp+ Set

                                     Effect of retrieval practice
Strength of Rp- and                  Impairment        Facilitation
strength of Rp+        Exp.   N      (Rp-) - (Nrp)     (Rp+) - (Nrp)
Strong items                         -9.9              +14.7
  Strong               1      36     -15.7***          +25.0***
  Strong               2      48     -8.0**            +8.4**
  Strong               3      16     -9.4**            +15.5**
  Weak                 3      16     -9.4**            +17.3**
Weak items                           +2.7              +20.7
  Weak                 1      36     -6.3*             +21.9**
  Weak                 2      48     +0.2              +17.2**
  Weak                 3      16     +8.8*             +19.8**
  Strong               3      16     +3.7              +22.9**

Note. Rp+ = practiced exemplars from practiced categories; Rp- = unpracticed exemplars from
practiced categories; Nrp = unpracticed exemplars from unpracticed categories.
*p < .05. **p < .01. ***p < .001.

The most compelling findings are summarized in Table 6, which displays the facilitatory and
inhibitory effects of retrieval practice as a function of the strength of unpracticed and
practiced competitors for all three experiments. The mean facilitation of Rp+ items,
illustrated in the right column of Table 6, makes it clear that retrieval practice
strengthened practiced items (average facilitation across all three experiments,
M = 17.7%). If this facilitation caused impairment by blocking access to Rp- items, we
should have observed Rp- impairment whenever facilitation of Rp+ items was in evidence. Yet,
the inhibitory effect of retrieval practice (left column) depended greatly on whether
unpracticed items were weak exemplars (bottom left) or strong exemplars (top left). When
Rp- items were weak, no impairment occurred (bottom left, averaged across experiments,
M = +2.7%; the impairment in Experiment 1 will be addressed in the Suppression section),
even though their practiced companions were strongly facilitated (bottom right, M = 20.7%).
Furthermore, as shown in Row 8 of Table 6, recall of weak Rp- items remained unaffected
(M = 3.7%), even when their practiced competitors were already more accessible because they
were strong exemplars of the category. In contrast, when Rp- items were strong, significant
impairment occurred (top left, M = -9.9%), even though their practiced competitors were no
more, and possibly less, facilitated than the aforementioned practiced items (see top right,
M = 14.7%). This pattern of Rp- impairment across strong and weak exemplars was consistent
across three experiments that varied in materials and testing procedures, and it was not
influenced by the taxonomic strength of the practiced competitors (as can be seen by
comparing Row 3 vs. Row 4 and Row 7 vs. Row 8 of Table 6). It appears from these results
that the strengthening of a competitor (whether defined in terms of taxonomic frequency or
in terms of retrieval-based facilitation), though correlated with the events that lead to
impairment, is not the cause of the effect; the critical variable is the strength of the
unpracticed item.

The failure of strong competitors to impair recall is not restricted to the
retrieval-practice manipulations summarized in Table 6. In Experiment 3, recall of baseline
items (i.e., Nrp items) varying in taxonomic frequency showed a similar pattern. Neither the
recall of strong nor the recall of weak Nrp exemplars decreased when strong competitors were
substituted for weak ones: As can be seen in Table 4, recall of strong Nrp items in the SW
and WS conditions (55.2 and 60.5, respectively; M = 57.9) was not different than recall of
those same Nrp items in the SS condition (64.1 and 66.2, M = 65.2); similarly, recall of
weak Nrp items in the WW condition (42.2 and 33.4, M = 37.8) was not different than recall
of those same Nrp items in the SW or WS conditions (44.2 and 48.9, respectively, M = 46.6).
Indeed, if there was any effect of adding strong competitors, it was positive, not negative.
This pattern of results clearly violates the strength-dependence assumption. Even when
differences in the relative strength of competitors were operationalized according to
variations in taxonomic frequency (which did, in fact, result in highly significant
differences in recall rates) rather than according to retrieval-based learning, the
predicted strength-dependent competition effects failed to occur.

One might object that these failures of the strength-dependent competition predictions arise
from the category-plus-stem testing procedure we used in Experiments 2 and 3. In this
procedure, subjects may have treated the category and the exemplar stem as a joint retrieval
cue, focusing memory search to category exemplars beginning with that stem. Because all
exemplar stems were constructed to be unique in the category (and, in most cases, in the
experiment), such a search would exclude Rp+ items from the search set. If the
stem-completion testing procedure eliminated Rp+ items from the search set, it should not be
surprising (from the standpoint of relative strength models) to find that Rp- items were
unimpaired by the greater strengths of Rp+ items. The difficulty with this reasoning is that
although it may account for the lack of impairment for weak Rp- items in Experiments 2 and
3, it leaves the impairment of strong Rp- items in those same experiments unexplained. Thus,
the results of Experiments 2 and 3 imply either that (a) the stem-completion testing
procedure eliminates the blocking predicted by strength-dependent competition and that a
mechanism other than blocking is contributing to the retrieval-induced forgetting observed
for strong items or that (b) impairment is not a necessary consequence of the strengthening
of competitors.

But even if we focused exclusively on the category-cued free-recall testing procedure of
Experiment 1, the relationship between the degree of impairment and the degree of
facilitation does not fit the strength-dependent competition model. In Experiment 1, as in
Experiments 2 and 3, both absolute and proportional impairment were greater for
strong-exemplar categories than for weak-exemplar categories. Yet, the opposite pattern
should be true according to strength-dependent competition models (augmented with fairly
common learning assumptions). Greater proportional impairment for weak categories is
predicted because retrieval practice should increase the associative strength of weaker
items to a proportionally greater extent. Although this assumption appears justified, the
difference in facilitation for strong and weak items was not statistically reliable;
nonetheless, even with proportionally equivalent facilitation, impairment should not be
greater for strong-exemplar categories (as shown in Appendix A), as it was found to be in
all three experiments. As argued in the discussions of Experiments 1 and 3, these findings
cannot be explained by such factors as covert rehearsal or biases in the strengthening of
practiced items in strong categories. Even when we focus on the category-cued free-recall
procedure of Experiment 1, the pattern of impairment does not relate sensibly to the
strengthening of competitors.

Thus, although it is compelling to attribute the impairment of unpracticed exemplars to the
strengthening of their practiced competitors, this approach appears to be inadequate, if not
mistaken. The facilitation of practiced items does not relate in any orderly way to the
degree of impairment; rather, the strength of unpracticed exemplars is the best predictor of
their own impairment. When trying to explain these failures of strength-dependent
competition, one must keep in mind that retrieval is functionally distinct from other
strengthening procedures such as multiple presentations of an item (see, e.g., Blaxton &
Neely, 1983, for an informative contrast of these procedures). In particular, retrieval
involves the search for an item in memory and the discrimination of that target item from
among a set of partial matches. Thus, when strengthening occurs through retrieval, as
opposed to other strengthening methods in which the full item is presented to subjects, the
activation of these partial matches may have significant implications for success on later
retrieval tasks. These special qualities of retrieval led us to consider the contribution of
suppression in the production of retrieval-induced forgetting.

Suppression

The failure of strength-dependent competition to account for the pattern of results obtained
in the present research argues for some other mechanism associated with retrieval that
causes forgetting. One possibility is that the observed impairment reflects the inhibition
of the affected items, as suggested in some modified spreading-activation theories of memory
retrieval. In these theories, presenting a cue should activate all associated responses in
parallel; this initial spread of activation may then need to be focused to isolate the
target response from interfering competitors. Although focusing can be achieved in various
ways, inhibition is often thought to subserve this function (Blaxton & Neely, 1983; Carr &
Dagenbach, 1990; Gernsbacher, Varner, & Faust, 1990; Keele & Neill, 1978; Martindale, 1981;
Neely & Durgunoglu, 1985; Neill & Westberry, 1987; Walley & Weiden, 1973). If nontarget
items are inhibited during retrieval of target exemplars, subsequent recall of those
inhibited items should be impaired. This inhibition may be sufficient to produce
retrieval-induced forgetting.

An inhibitory theory of retrieval-induced forgetting can
account for several important features of the present findings. First, it offers an
explanation for the greater impairment of strong items observed in all three experiments
(Table 6, top left). Strong Rp- items should be more impaired because their greater
associative strength should lead them to interfere more with the retrieval practice of their
competitors, and this greater interference should, in turn, render those strong items more
vulnerable to inhibition. In contrast, weak Rp- items may remain totally unimpaired (Table
6, lower left) or may even be facilitated by their initial activation (Table 6, Row 7),
provided that their level of activation does not interfere with the retrieval practice of
their competitors. Second, the impairment of Rp- items that were tested before Rp+ items
(i.e., Rp-1st items) in Experiments 2 and 3 would be explained: Impaired recall of Rp-1st
items would reflect inhibition that endured from the prior retrieval-practice phase, as
suggested previously. Finally, the many failures of the strength of a competitor to affect
recall probability can be explained if we assume that a competitor's strength decreases
retrieval speed without affecting retrieval probability. The mere presence of Rp+ items (or
strong Nrp exemplars) in memory would then slow retrieval of Rp- items (or Nrp competitors)
on the final test, but should not prevent their recall. The recall of those Rp- items,
however, should be impaired on the final test if their strength had impeded the retrieval
practice of their practiced companions.

Although inhibitory processes can account for the present findings better than can
strength-dependent competition, some aspects of the results are inconsistent with both
hypotheses. First, the same strong items exhibited output interference
(Strong 1st - Strong 2nd = 4.1%) in Experiment 2, but did not in Experiment 3 (0.0%).³
Second, Rp+ items never showed output interference in Experiments 2 or 3
(Rp+1st - Rp+2nd = 0.6%, averaged across strong and weak items for both experiments).
According to the inhibition hypothesis, prior retrieval of category members at final test
should inhibit the remaining strong items (whether those items are Rp+ items or strong
exemplars); according to strength-dependent competition, these prior retrievals should
strengthen the retrieved exemplars, blocking access to subsequent items. It is possible that
a single retrieval of each item on the final test may not be sufficient to produce the
expectation of reliable differences in recall for either theory. Whatever the proper
explanation may be, these inconsistencies afflict both theories. Given this observation, the
results are most consistent with a model in which inhibition is used to overcome
interference from competing items.

³ Although the present experiment did not obtain output interference, subsequent experiments
with the same materials and procedure have obtained sizable output interference effects (8
to 10%). The reason for the failure to find such effects in the present Experiment 3 is
unclear.

The present results support some inhibitory theories of retrieval-induced forgetting more
than others. Many theories assume that the degree to which a target inhibits competitors
depends on the strength of that target item. For instance, in their recent center-surround
theory of semantic memory retrieval, Carr and Dagenbach (1990) proposed that inhibition
enhances the discriminability of weakly activated targets that may be overcome by the
activation of competing codes. In this theory, the weaker the target item, the more
inhibited competitors should be (with the strength of competitors held constant), even when
the target is not successfully retrieved. Other formulations of lateral inhibition might
assert that strong targets produce more, not less, inhibition than weak targets. If highly
associated targets become more active when presentation of the cue occurs and if increases
in target activation lead to increases in the inhibition that is spread laterally to
competitors, strong exemplars should cause more inhibition than weak exemplars. Both
approaches assume that the severity of inhibition relates to the strength of the target
item, yet the findings of Experiment 3 suggest that this assumption may not be correct: The
degree of impairment suffered by Rp- items did not depend on whether strong or weak category
exemplars were practiced (see Rows 3, 4, 7, and 8 in Table 6). The failure for impairment to
be related to target (Rp+) strength suggests that inhibition may not be an automatic process
mediated by the representations of competing target items. The results are consistent,
however, with a process of active suppression, applied directly to competing items to the
extent that those items interfere with task demands (see, e.g., Blaxton & Neely, 1983; Keele
& Neill, 1978; Neely & Durgunoglu, 1985; Neill & Westberry, 1987).

Although suppression provides the best single account of our data, it must be emphasized
that this hypothesis is not incompatible with strength-dependent competition. Indeed, there
is some indirect evidence for a two-process interpretation of retrieval-induced forgetting.
Weak Rp- items exhibited small, but reliable recall impairment in Experiment 1 but did not
in Experiments 2 and 3, whereas strong Rp- items exhibited reliable impairment in all three
experiments. An interesting two-process interpretation of this pattern of impairment is as
follows: If the stem-completion testing procedure used in Experiments 2 and 3 eliminated
strength-dependent competition (as suggested previously), the lack of impairment for weak
Rp- items can be explained, but the impairment for strong Rp- items in those same
experiments cannot. If this testing procedure remained sensitive to suppression, however,
then the results of Experiments 2 and 3 show that strong items suffer suppression but weak
items do not. This interpretation suggests that the impairment of weak Rp- items in the
category-cued free-recall test of Experiment 1 may have arisen entirely from
strength-dependent competition. Whatever the contributions of strength-dependent
competition, however, the present results argue that an active suppression mechanism causes
much of the long-lasting retrieval-induced forgetting in the retrieval-practice paradigm.

Relation to Other Empirical Findings

Retrieval-induced forgetting resembles several other phenomena in which enhancing recall of
some items impairs memory for related information. For example, our findings resemble both
retroactive interference effects and part-set cuing inhibition to the extent that retrieval
practice is similar to repeated learning trials and cuing, respectively. Despite these
similarities, the pattern of impairment in the present experiments argues that
retrieval-based learning is not the primary cause of retrieval-induced forgetting; rather,
impairment appears to result from an active suppression of unpracticed exemplars. This
interpretation raises the possibility that the commonly assumed link between strengthening
and impairment in the aforementioned phenomena has been overstated or perhaps even
misinterpreted. In this section, we show that these and other findings that support a causal
link between strengthening and impairment stem from paradigms that confound strengthening
and retrieval-induced forgetting. Thus, what appears to be strength-dependent competition
may often be retrieval-based suppression. Although this general argument applies to many
phenomena, we focus on three for the purpose of illustration: retroactive interference,
part-set cuing inhibition, and the list-strength effect.

Retroactive Interference

Perhaps nowhere has the apparent connection between strengthening and impairment been more
vividly demonstrated than in a classic study of retroactive interference by Barnes and
Underwood (1959). In their study, Barnes and Underwood showed that recall for items from a
first list of paired associates systematically decreased with increases in the number of
learning trials administered on a second list of associates. Decreases in the recall of
first-list responses correlated well with increases in the recall for second-list responses,
suggesting that strengthening second-list items caused the decrease in recall of their
first-list competitors. This negative correlation between second- and first-list recall has
been successfully modeled with strength-dependent competition mechanisms (Mensink &
Raaijmakers, 1988), without proposing the additional unlearning process included in both the
classical two-factor theory of interference (Melton & Irwin, 1940) and in modern
connectionist learning approaches (see, e.g., Lewandowsky, 1991; Sloman & Rumelhart, 1992).
Despite the success of the strength approach in modeling these data, the present findings
question whether the conditions of strength-dependent competition are sufficient or even
necessary to produce retroactive interference.

Although it is compelling to focus on the orderly relationship between the degree of
strengthening on second-list responses and the amount of retroactive interference, an
alternative view arises when we consider that second-list responses in Barnes and
Underwood's (1959) study were strengthened by the method of anticipation. In this method,
each cycle through a learning list entails two events for each paired associate: (a)
presentation of that associate's stimulus as a cue, to which subjects must recall or
"anticipate" the associated response and then (b) presentation of the response as feedback.
By cuing recall in this manner, Barnes and Underwood effectively gave subjects retrieval
practice on the second list. If the present analysis of retrieval practice is correct,
repeated suppression of first-list responses during these trials may have caused the
observed increases in retroactive interference rather than (or perhaps, in addition to)
strengthening of second-list competitors. This account of retroactive interference effects
parallels the classical notion of unlearning (Melton & Irwin, 1940) in its emphasis on
intrusions of first-list responses during tests of the second list; the suppression account,
however, attributes impairment to inhibition of the first-list target items rather than to
weakening of their cue-target associations (see the response-set suppression hypothesis of
Postman et al., 1968, for a similar emphasis on response inhibition). The important point,
for present purposes, is that theoretical treatments of interference data that focus
exclusively on the strengthening of second-list responses greatly understate the role of
retrieval-induced forgetting. Indeed, if suppression contributes to retroactive interference
as suggested by the present data, it becomes difficult to assess whether strengthening by
itself is sufficient to produce impaired recall.

Part-Set Cuing Inhibition

A second illustration of the connection between strengthening and impairment was provided in
a study of part-set cuing inhibition by Rundus (1973). In this experiment, subjects studied
categorized word lists and then recalled items from each category with varying numbers of
exemplars provided as cues. Rundus found that as the number of cues increased from zero to
four, recall of the remaining noncue items decreased. Based on the assumption that cue
exemplars were strengthened by their presentation at test, Rundus concluded that the decline
in recall of noncue items was caused by the strengthening of their cued competitors. Several
replications of this basic finding (see, e.g., Roediger, 1973, and Watkins, 1975) have
supported Rundus's interpretation, although manipulations of cue type that should induce
variations in strengthening (e.g., taxonomic frequency of exemplars; intralist vs. extralist
exemplars) have failed to cause the predicted variations in impairment (Basden et al., 1977;
Karchmer & Winograd, 1971; Watkins, 1975). Nonetheless, Rundus's strength approach retains
its popularity because it accounts for a range of part-set cuing findings (see Nickerson,
1984, and Roediger & Neely, 1982, for reviews).

Although the robust relationship between the number of cues and impairment supports
strength-dependent competition, an alternative interpretation arises when we consider that
strengthening cues often causes subjects to retrieve those items before noncues. Cue items
may be retrieved before noncues either overtly, if both cues and noncues are to be recalled
(see, e.g., Karchmer & Winograd, 1971; Roediger et al., 1977 for data on this point), or
covertly during attempts to recall noncues, as is often presumed to occur in "blocking"
models of part-set cuing inhibition (see, e.g., Rundus, 1973). When cue items are retrieved
early, noncues should suffer more retrieval-induced forgetting than the corresponding items
for control subjects for whom recall order has not been biased. As more cues are provided,
more items should be retrieved prior to noncues, further impairing noncue recall. Although
decreases in noncue performance may be caused by strengthening of cue items during their
covert retrieval (a possibility noted by both Roediger, 1974, and Rundus, 1973), the present
analysis suggests that noncue impairment reflects retrieval-based suppression. This
interpretation receives support from a study by Blaxton and Neely (1983) in which speeded
recall of several prime exemplars from a semantic category slowed subsequent
recall of a target exemplar, whereas speeded naming of those same primes facilitated target recall. If strengthening were sufficient to impair competing items, then both the recall and presentation of prime items should have impaired retrieval of target exemplars. Thus, cuing by itself may not impair recall; rather, the strengthening of cues may indirectly impair recall to the extent that early retrieval of cue items suppresses noncues at the time of test.

List-Strength Effect

A final illustration of the apparent relationship between strengthening and impairment comes from a recent series of studies on what has been termed the list-strength effect by Ratcliff et al. (1990). The list-strength effect can be thought of as an analog to the well-known list-length effect, except that performance on a target item (or set of items) is predicted to decrease from the strengthening of other list members rather than from the addition of new list members. To test this prediction, Ratcliff et al. developed the mixed-pure paradigm, the goal of which was to show that strengthening one half of a list of words would both (a) impair performance on the remaining nonstrengthened list-half to a greater extent than would be the case were the words to be on a list in which no items were strengthened (i.e., a pure-weak list) and (b) facilitate performance on the strengthened list-half to a greater extent than would be the case were the words to be on a list in which all items were strengthened (i.e., a pure-strong list). Strengthening may be accomplished either by increasing the exposure time or the number of repetitions of the to-be-strengthened items, and either free recall, cued recall, or recognition memory can be tested. In a series of experiments using this paradigm, Ratcliff et al. found reliable list-strength effects in free recall, small and inconsistent effects in cued recall, and either no effect or reverse effects in recognition memory. Although the authors' interpretation of their entire pattern of results involved more than strength-dependent competition, this factor was thought to be crucial in producing the observed free- and cued-recall effects.

Two points should be made concerning Ratcliff et al.'s (1990) findings as evidence for the relationship between strengthening and impairment. First, although the authors successfully demonstrated an overall list-strength effect in free recall, the component of their data that produced this effect was not impairment of the weak-list half: The weak half of the study list was impaired by 2.7%, even though the remainder of the list was strengthened by 25% (i.e., relative to a pure-weak baseline, see Ratcliff et al., 1990, p. 172). Rather, the significant list-strength effect in free recall was produced by the 8% advantage of strong items in a mixed list over strong items in a pure-strong list (i.e., part "(b)" of the above list-strength prediction). Second, even the small amount of impairment that did occur in free recall cannot be confidently attributed to strength-dependent competition because Ratcliff et al.'s free-recall measure suffers from the same output-order bias present in studies of part-set cuing inhibition. If strengthened items were retrieved before nonstrengthened items, retrieval-based suppression may have occurred. When such output-order biases were eliminated, as was the case in their cued-recall experiments, impairment of weak items disappeared entirely (in Experiment 3, there was 0.0% impairment, despite 21.2% facilitation of strong items; in Experiment 6, there was 0.7% impairment, despite 27.2% facilitation). Thus the existing data on the list-strength effect provide no support for the relation between strengthening and impairment.

Concluding Remarks

Although previous work has demonstrated the negative side effects of retrieval, these effects have received surprisingly little attention in modern theories of interference. The relative neglect of these phenomena may stem from two factors. First, retrieval-induced forgetting resembles other varieties of forgetting in which facilitating recall of some items impairs memory performance on related competitors. Because retrieval clearly facilitates those items that are retrieved, it is tempting to reduce the associated impairment of related items to strength-dependent competition. Second, the characterization of retrieval-induced forgetting as output interference may have hampered generalization of the phenomenon from the empirical context in which it was initially investigated. Indeed, the term output interference connotes a fleeting source of interference, muddying measures of recall in list-learning experiments. Together, these factors may have discouraged the separate study of retrieval-induced forgetting.

The present research has stressed the key role that retrieval may play in producing long-lasting forgetting. Our findings show that forgetting due to retrieval can last for at least 20 min, afflicting what we know the best, the most severely. Furthermore, the pattern of impairment in the present experiments suggests that the reduction of retrieval-induced forgetting to strength-dependent competition, though parsimonious, has been misleading. Though strengthening correlates with impairment, it may not, by itself, be the cause of forgetting; rather, impairment may instead reflect the negative side effects of a suppression process that assists in the resolution of retrieval competition. If this hypothesis is correct, it suggests that the recall impairments observed in other paradigms in which the effects of strengthening have not been adequately separated from the effects of retrieval-induced forgetting (e.g., retroactive interference, part-set cuing paradigms) may actually reflect retrieval-based suppression rather than strength-dependent competition. Thus, the contrary reduction may be possible: Strength-dependent competition may reflect the mechanisms of retrieval-induced forgetting. Regardless of how the theoretical interpretation of these effects evolves, the present research illustrates that retrieval can be a cause of long-lasting forgetting. The ubiquity of retrieval processes in our daily cognitive experience may render the mere use of "what we know" the most common source of fluctuation in the accessibility of our knowledge.

References

Allen, G. A., Mahler, W. A., & Estes, W. K. (1969). Effects of recall tests on long-term retention of paired associates. Journal of Verbal Learning and Verbal Behavior, 8, 463-470.
Anderson, J. R. (1976). Language, memory and thought. Hillsdale, NJ: Erlbaum.
Arbuckle, T. Y. (1967). Differential retention of individual paired
associates within an RTT "learning" trial. Journal of Experimental Psychology, 74, 443-451.
Baddeley, A. D. (1982). Domains of recollection. Psychological Review, 89, 708-729.
Barnes, J. M., & Underwood, B. J. (1959). "Fate" of first-list associations in transfer theory. Journal of Experimental Psychology, 58, 95-105.
Basden, D. R., Basden, B. H., & Galloway, B. C. (1977). Inhibition with part-list cuing: Some tests of the item strength hypothesis. Journal of Experimental Psychology: Human Learning and Memory, 3, 100-108.
Battig, W. F., & Montague, W. E. (1969). Category norms for verbal items in 56 categories: A replication and extension of the Connecticut norms [Monograph]. Journal of Experimental Psychology, 80, 1-46.
Bjork, R. A. (1975). Retrieval as a memory modifier. In R. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123-144). Hillsdale, NJ: Erlbaum.
Blaxton, T. A., & Neely, J. H. (1983). Inhibition from semantically related primes: Evidence of a category-specific inhibition. Memory & Cognition, 11, 500-510.
Brown, A. S. (1981). Inhibition in cued retrieval. Journal of Experimental Psychology: Human Learning and Memory, 7, 204-215.
Brown, A. S. (1991). A review of the tip-of-the-tongue experience. Psychological Bulletin, 109, 204-223.
Brown, A. S., Whiteman, S. L., Cattoi, R. J., & Bradley, C. K. (1985). Associative strength level and retrieval inhibition in semantic memory. American Journal of Psychology, 98, 433-447.
Burke, D. M., MacKay, D. G., Worthley, J. S., & Wade, E. (1991). On the tip of the tongue: What causes word finding failures in young and older adults? Journal of Memory and Language, 30, 542-579.
Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. New York: Wiley.
Carr, T. H., & Dagenbach, D. (1990). Semantic priming and repetition priming from masked words: Evidence for a center-surround attentional mechanism in perceptual recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 341-350.
Dagenbach, D., Carr, T. H., & Barnhardt, T. M. (1990). Inhibitory semantic priming of lexical decisions due to failure to retrieve weakly activated codes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 328-340.
DaPolito, F. J. (1966). Proactive effects with independent retrieval of competing responses. Unpublished doctoral dissertation, Indiana University.
Delprato, D. J. (1972). Pair-specific effects in retroactive inhibition. Journal of Verbal Learning and Verbal Behavior, 11, 566-572.
Dong, T. (1972). Cued partial recall of categorized words. Journal of Experimental Psychology, 93, 123-129.
Gardiner, J. M., Craik, F. I. M., & Bleasdale, F. A. (1973). Retrieval difficulty and subsequent recall. Memory & Cognition, 1, 213-216.
Gernsbacher, M. A., Barner, K. R., & Faust, M. E. (1990). Investigating differences in general comprehension skill. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 430-445.
Gillund, G., & Shiffrin, R. M. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1-67.
Greeno, J. G., James, C. T., DaPolito, F., & Polson, P. G. (1978). Associative learning: A cognitive analysis. Englewood Cliffs, NJ: Prentice-Hall.
Hogan, R. M., & Kintsch, W. (1971). Differential effects of study and test trials on long-term recognition and recall. Journal of Verbal Learning and Verbal Behavior, 10, 562-567.
Jones, G. V. (1989). Back to Woodworth: Role of interlopers in the tip-of-the-tongue phenomenon. Memory & Cognition, 17, 69-76.
Karchmer, N. A., & Winograd, E. (1971). The effects of studying a subset of familiar items on recall of the remaining items: The John Brown effect. Psychonomic Science, 25, 224-225.
Keele, S. W., & Neill, W. T. (1978). Mechanisms of attention. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of perception (Vol. 9, pp. 3-47). New York: Academic Press.
Kucera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Landauer, T. K., & Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory (pp. 625-632). London: Academic Press.
Lewandowsky, S. (1991). Gradual unlearning and catastrophic interference: A comparison of distributed architectures. In W. E. Hockley & S. Lewandowsky (Eds.), Relating theory and data: Essays in honor of Bennet B. Murdock (pp. 445-476). Hillsdale, NJ: Erlbaum.
Loftus, E. F. (1973). Activation of semantic memory. American Journal of Psychology, 86, 331-337.
Loftus, G. R., & Loftus, E. F. (1974). The influence of one memory retrieval on a subsequent memory retrieval. Memory & Cognition, 2, 467-471.
Marshall, G. R., & Cofer, C. N. (1970). Single-word free association norms for 328 responses from the Connecticut cultural norms for verbal items in categories. In L. Postman & G. Keppel (Eds.), Norms of word association (pp. 321-360). New York: Academic Press.
Martin, E. (1971). Verbal learning theory and independent retrieval phenomena. Psychological Review, 78, 314-332.
Martindale, C. (1981). Cognition and consciousness. Homewood, IL: Dorsey Press.
McGeoch, J. A. (1936). Studies in retroactive inhibition: VII. Retroactive inhibition as a function of the length and frequency of presentation of the interpolated lists. Journal of Experimental Psychology, 19, 674-693.
Melton, A. W., & Irwin, J. M. (1940). The influence of degree of interpolated learning on retroactive inhibition and the overt transfer of specific responses. American Journal of Psychology, 53, 173-203.
Mensink, G. J. M., & Raaijmakers, J. G. W. (1988). A model of interference and forgetting. Psychological Review, 95, 434-455.
Neely, J. H. (1976). Semantic priming and retrieval from lexical memory: Evidence for facilitory and inhibitory processes. Memory & Cognition, 4, 648-654.
Neely, J. H., & Durgunoglu, A. Y. (1985). Dissociative episodic and semantic priming effects in episodic recognition and lexical decision tasks. Journal of Memory and Language, 24, 466-489.
Neely, J. H., Schmidt, S. R., & Roediger, H. L., III. (1983). Inhibition from related primes in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 196-211.
Neill, W. T., & Westberry, R. L. (1987). Selective attention and the suppression of cognitive noise. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 327-334.
Nickerson, R. S. (1984). Retrieval inhibition from part-set cuing: A persisting enigma in memory research. Memory & Cognition, 12, 531-552.
Postman, L., & Stark, K. (1969). The role of response availability in transfer and interference. Journal of Experimental Psychology, 79, 168-177.
Postman, L., Stark, K., & Fraser, J. (1968). Temporal changes in interference. Journal of Verbal Learning and Verbal Behavior, 7, 672-694.
Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88, 93-134.
Ratcliff, R., Clark, S. E., & Shiffrin, R. M. (1990). The list-strength effect: I. Data and discussion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 163-178.
Reason, J. T., & Lucas, D. (1984). Using cognitive diaries to investigate naturally occurring memory blocks. In J. E. Harris & P. E.
Morris (Eds.), Everyday memory actions and absent-mindedness (pp. 53-70). London: Academic Press.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current theory and research (pp. 64-99). New York: Appleton-Century-Crofts.
Riefer, D. M., & Batchelder, W. H. (1988). Multinomial modeling and the measurement of cognitive processes. Psychological Review, 95, 318-339.
Roediger, H. L., III. (1973). Inhibition in recall from cueing with recall targets. Journal of Verbal Learning and Verbal Behavior, 12, 644-657.
Roediger, H. L., III. (1974). Inhibiting effects of recall. Memory & Cognition, 2, 261-269.
Roediger, H. L., III. (1978). Recall as a self-limiting process. Memory & Cognition, 6, 54-63.
Roediger, H. L., III, & Neely, J. H. (1982). Retrieval blocks in episodic and semantic memory. Canadian Journal of Psychology, 36(2), 213-242.
Roediger, H. L., III, & Schmidt, S. R. (1980). Output interference in the recall of categorized and paired associate lists. Journal of Experimental Psychology: Human Learning and Memory, 6, 91-105.
Roediger, H. L., III, Stellon, C. C., & Tulving, E. (1977). Inhibition from part-list cues and rate of recall. Journal of Experimental Psychology: Human Learning and Memory, 3, 174-188.
Rundus, D. (1973). Negative effects of using list items as retrieval cues. Journal of Verbal Learning and Verbal Behavior, 12, 43-50.
Shapiro, S. I., & Palermo, D. S. (1970). Conceptual organization and class membership: Normative data for representatives of 100 categories. Psychonomic Monograph Supplements, 5(11, Whole No. 43).
Slamecka, N. J. (1975). Intralist cueing of recognition. Journal of Verbal Learning and Verbal Behavior, 14, 630-637.
Sloman, S. A., Bower, G. H., & Rohrer, D. (1991). Congruency effects in part-list cuing inhibition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 974-982.
Sloman, S. A., & Rumelhart, D. E. (1992). Reducing interference in distributed memories through episodic gating. In A. Healy, S. Kosslyn, & R. M. Shiffrin (Eds.), From learning theory to connectionist theory: Essays in honor of William K. Estes (pp. 227-248). Hillsdale, NJ: Erlbaum.
Smith, A. D. (1971). Output interference and organized recall from long-term memory. Journal of Verbal Learning and Verbal Behavior, 10, 400-408.
Smith, A. D. (1973). Input order and output interference in organized recall. Journal of Experimental Psychology, 100, 147-150.
Smith, A. D., D'Agostino, P. R., & Reid, L. S. (1970). Output interference in long-term memory. Canadian Journal of Psychology, 24, 85-87.
Solso, R. L., & Juel, C. L. (1980). Positional frequency and versatility of bigrams for two- through nine-letter English words. Behavior Research Methods and Instrumentation, 12, 297-343.
Todres, A. K., & Watkins, M. J. (1981). A part-set cuing effect in recognition memory. Journal of Experimental Psychology: Human Learning and Memory, 7, 91-99.
Tulving, E., & Arbuckle, T. Y. (1963). Sources of intratrial interference in paired-associate learning. Journal of Verbal Learning and Verbal Behavior, 1, 321-334.
Tulving, E., & Arbuckle, T. Y. (1966). Input and output interference in short-term associative memory. Journal of Experimental Psychology, 72, 89-104.
Walley, R. E., & Weiden, T. D. (1973). Lateral inhibition and cognitive masking: A neuropsychological theory of attention. Psychological Review, 80, 284-302.
Warren, R. E. (1977). Time and the spread of activation in memory. Journal of Experimental Psychology: Human Learning and Memory, 3, 458-466.
Watkins, M. J. (1975). Inhibition in recall with extralist "cues." Journal of Verbal Learning and Verbal Behavior, 14, 294-303.
Watkins, M. (1978). Engrams as cuegrams and forgetting as cue-overload: A cueing approach to the structure of memory. In C. R. Puff (Ed.), The structure of memory (pp. 347-372). New York: Academic Press.
Webster's new collegiate dictionary. (1980). Springfield, MA: G. & C. Merriam.
Woodworth, R. S. (1938). Experimental psychology. New York: Henry Holt.

Appendix A

Numerical Examples of a Ratio-Rule Model

We provide several numerical examples of ratio-rule predictions for the retrieval-practice paradigm. First, we show how the simplest formulation of the ratio rule predicts facilitation and impairment. We then extend the basic model to derive predictions for our taxonomic frequency manipulation.

Basic Model and an Example

Assume that our categories are represented as a set of exemplars, each with a univalent association to the category cue. The simplest ratio-rule equation for this representation would then express the probability of recalling an exemplar, given a category cue, in the following form:

P(E1 | C1) = S(C1, E1) / Σ S(C1, Ex)

In this equation, E1 is a particular exemplar; C1 is a particular category; and S(C1, E1) is the associative strength between category C1 and E1. Thus, the probability of recalling a particular exemplar, E1, is governed by the ratio of that exemplar's associative strength to the category cue, to the summed strengths of association of all exemplars (Ex) to that cue.

To see why this equation predicts facilitation for practiced exemplars and impairment for unpracticed exemplars, consider a simple four-member category, each exemplar having a cue-item associative strength of .2. The probability of recalling an item from this set would then be proportional to the ratio of its own strength of association to the cue to those of all competitors' strengths [.2/(.2 + .2 + .2 + .2) = .25]. If retrieval practice on two items from this set increased their associative strengths, say, to .3, then for those two practiced items we should observe facilitation [.3/(.2 + .2 + .3 + .3) = .3]; however, that same increase should result in impairment for the two items of that set that were not practiced [.2/(.2 + .2 + .3 + .3) = .2].

Extended Model With Examples

Because the basic model, as currently specified, incorrectly predicts equal recall for items from strong and weak sets [e.g., strong: .4/(.4 + .4 + .4 + .4) = .25, weak: .2/(.2 + .2 + .2 + .2) = .25], it must be modified so that recall probability is dependent on an item's absolute strength as well as its relative strength. One way in which this goal can be accomplished is to distinguish between trace-access probability and response-recovery probability, the former governed by the target item's relative strength and the latter by its absolute cue-target strength (see, e.g., Raaijmakers & Shiffrin, 1981). Thus, recall probability for a strong item would be its trace-access probability multiplied by its response-recovery probability, which would result in greater recall for items from strong sets than for items from weak sets (e.g., from the previous example, .4 and .2 might be recovery probabilities, yielding .25 × .4 = .10 vs. .25 × .2 = .05, for strong and weak sets, respectively).

To make predictions about the relative impairment for strong and weak sets, we must specify both how retrieval practice increases cue-target associative strengths across strong and weak sets and how recovery probabilities differ across these sets. To simplify the analysis, first suppose that retrieval practice increases cue-target associative strengths to a proportionally equivalent degree across strong and weak sets. For example, an item in a four-item strong set having an initial strength of .4 might be incremented by 50% to .6, in which case an item from a weak set having an initial strength of .2 would be incremented to .3. Given proportionally equivalent strengthening for items in strong and weak sets, the reduction in target accessibility would be the same for unpracticed items in either set (e.g., for the strong set, Nrp - Rp- is: [.4/(.4 + .4 + .4 + .4)] - [.4/(.4 + .4 + .4 + .6)] = .03; for the weak set: [.2/(.2 + .2 + .2 + .2)] - [.2/(.2 + .2 + .2 + .3)] = .03). Superior recovery probabilities for items in strong sets, when multiplied by a strong item's target-access probability, would increase the absolute recall impairment expected for strong sets above that expected for weak sets (deficit in strong-item recall = [.25 × .4] - [.22 × .4] = .012; deficit in weak-item recall = [.25 × .2] - [.22 × .2] = .006). However, regardless of the magnitude of the difference in recovery probabilities across these sets, impairment for each set relative to its baseline should be proportionally equivalent (for strong items, proportional impairment = .012/[.25 × .4] = .12; for weak items, .006/[.25 × .2] = .12).

If we revise the somewhat unrealistic assumption that learning rates are proportionally equivalent across strong and weak items by assuming that items increase by the same constant amount (e.g., retrieval practice results in an increment of .1, regardless of an item's existing strength), or that growth in strength is a negatively accelerated function of current strength (as would be the case with linear operator models of learning, e.g., Bush & Mosteller, 1955; Rescorla & Wagner, 1972), the proportional impairment should be less for strong items than for weak items. This outcome obtains because weak items will increase in strength to a proportionally greater degree than strong items. Because we know that proportionally equivalent strengthening leads to proportionally equivalent impairment, proportionally greater facilitation for weak categories should lead to proportionally greater impairment for weak items.

Extended Model With Extraexperimental Exemplars

Suppose that each category has four strong and four weak exemplars and that four are presented in the experiment and four remain as extraexperimental exemplars. Suppose, also, that strong and weak exemplars begin with extraexperimental strengths of .2 and .1, respectively, which are then incremented to .4 and .2 respectively upon their presentation in the study list.[A1] With these assumptions, the four category types in Experiment 3 can be represented with sets of eight strengths, four experimental and four extraexperimental: SS = (.4, .4, .4, .4 | .1, .1, .1, .1); SW = (.4, .4, .2, .2 | .2, .2, .1, .1); WS = (.2, .2, .4, .4 | .2, .2, .1, .1); and WW = (.2, .2, .2, .2 | .2, .2, .2, .2). Note that the SS and WW category types vary in the strengths of their respective extraexperimental items, whereas the SW and WS category types do not.

Under these assumptions, the ratio rule predicts that impairment for strong categories should be proportionally greater than impairment for weak categories. To see this, suppose that two items in each SS and WW category are strengthened by 50% of their original strengths; that is, to .6 and .3, respectively. The probability of recalling an Rp- item from a strong category would then become .4/2.4 × .4, or .0627, whereas the probability of recalling a weak Rp- item would then become .2/1.8 × .2, or .0222. Relative to the baseline for strong (.08) and weak (.025) categories, strong and weak Rp- items would be impaired by .0173 and .0028 respectively. Thus, absolute impairment for strong categories would clearly be greater than that for weak categories. However, proportional impairment for strong categories (.0173/.08 = .216) would also be greater than proportional impairment for weak categories (.0028/.025 = .112). Thus, the relative impairment for strong and weak categories would depend on the composition of the extraexperimental set, given that we assume that subjects do not use experimental context as a retrieval cue to restrict memory search.

[A1] Note that this example assumes that the learning rates for strong and weak exemplars are proportionally equivalent, as discussed in the previous section of Appendix A. Although this assumption is not reasonable given the wealth of data showing that learning rate is a negatively accelerating function of prior strength, this learning assumption is the one that is most consistent with the present pattern of facilitation for Rp+ items across strong and weak categories. Without this particular learning rate assumption, it is unclear whether the ratio-rule model could account for the greater impairment of strong-exemplar categories in the manner suggested in this section.
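
The ratio-rule arithmetic in Appendix A can be checked with a short script. The sketch below is illustrative only: the function names (recall_probability, rp_minus_change) are hypothetical and do not come from the article, and the script assumes, as the appendix examples do, that an item's recovery probability equals its own absolute strength and that practice multiplies the strength of two studied items by 1.5. Evaluating .4/2.4 × .4 directly gives about .0667 rather than .0627, so the strong-category impairment computed here (about .0133, or .167 proportionally) comes out somewhat smaller than the figures given in the text; the ordering of strong over weak categories is unchanged.

```python
def recall_probability(target, strengths, recovery=1.0):
    """Ratio rule: the target's share of total cue strength,
    scaled by a response-recovery probability."""
    return target / sum(strengths) * recovery


def rp_minus_change(studied, extra, boost=1.5, n_practiced=2):
    """Recall of one unpracticed (Rp-) item before and after practice.

    Hypothetical helper: the first n_practiced studied strengths are
    multiplied by boost, and recovery probability is assumed to equal
    the item's own absolute strength.
    """
    item = studied[-1]  # an unpracticed studied exemplar
    before = recall_probability(item, studied + extra, recovery=item)
    practiced = [s * boost for s in studied[:n_practiced]]
    after_set = practiced + studied[n_practiced:] + extra
    after = recall_probability(item, after_set, recovery=item)
    return before, after, (before - after) / before


# Basic model: four exemplars at .2, two of them practiced up to .3.
baseline = recall_probability(0.2, [0.2, 0.2, 0.2, 0.2])
facilitated = recall_probability(0.3, [0.2, 0.2, 0.3, 0.3])
impaired = recall_probability(0.2, [0.2, 0.2, 0.3, 0.3])

# Extraexperimental model: SS and WW category types from the appendix.
ss = rp_minus_change([0.4] * 4, [0.1] * 4)
ww = rp_minus_change([0.2] * 4, [0.2] * 4)

for name, value in [("baseline", baseline),
                    ("facilitated", facilitated),
                    ("impaired", impaired)]:
    print(f"{name}: {value:.3f}")
for name, (before, after, prop) in [("SS", ss), ("WW", ww)]:
    print(f"{name}: before={before:.4f} after={after:.4f} "
          f"proportional impairment={prop:.3f}")
```

Changing boost, or replacing the multiplication with a constant increment, gives a quick way to explore the alternative learning-rate assumptions discussed in the appendix.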

Appendix B

Categories and Exemplars Used in Experiments 1 and 2, Divided Into the Four
Practice Counterbalancing Sets (A1, A2, B1, B2) and Sorted by Category
Composition (Strong or Weak)

Category      Exemplar Set 1                  Exemplar Set 2

Set A: Strong
Fruits        Orange, nectarine, pineapple    Banana, cantaloupe, lemon
Leather       Saddle, gloves, wallet          Shoes, belt, purse
Set A: Weak
Trees         Palm, hickory, willow           Poplar, sequoia, ash
Professions   Tailor, florist, farmer         Critic, grocer, clerk
Set B: Strong
Drinks        Bourbon, scotch, tequila        Brandy, gin, rum
Hobbies       Gardening, coins, stamps        Ceramics, biking, drawing
Set B: Weak
Metals        Chrome, platinum, magnesium     Mercury, pewter, tungsten
Weapons       Hammer, fist, lance             Rock, arrow, dagger

Appendix C

Categories From Experiment 3, With the 12 Exemplars From Each Category
Divided Into Four Subsets (S1, S2, W1, and W2) and With the Categories Divided
Into Practice Counterbalancing Sets A and B

Category      S1          S2           W1            W2

Set A
Drinks        Vodka       Bourbon      Sake          Moonshine
              Rum         Ale          Tequila       Cognac
              Gin         Whiskey      Drambuie      Kahlua
Weapons       Sword       Bomb         Arrow         Nail
              Rifle       Pistol       Dagger        Foot
              Tank        Club         Hatchet       Lance
Fish          Catfish     Bluegill     Walleye       Yellowtail
              Trout       Flounder     Snapper       Muskie
              Herring     Guppy        Angler        Puffer
Fruits        Tomato      Orange       Fig           Coconut
              Strawberry  Lemon        Mango         Raisin
              Banana      Pineapple    Nectarine     Guava
Set B
Professions   Engineer    Nurse        Veterinarian  Critic
              Accountant  Plumber      Janitor       Investor
              Dentist     Farmer       Gardener      Soldier
Metals        Iron        Silver       Francium      Lithium
              Aluminum    Brass        Tungsten      Pewter
              Nickel      Gold         Chrome        Mercury
Trees         Birch       Elm          Mimosa        Palm
              Hickory     Spruce       Cedar         Willow
              Dogwood     Redwood      Juniper       Ash
Insects       Beetle      Fly          Locust        Tick
              Roach       Mosquito     Weevil        Cicada
              Hornet      Grasshopper  Aphid         Scorpion

Note. Assignments of subsets to A1, A2, B1, and B2 are not shown. S = strong; W = weak.

Received March 12, 1992
Revision received August 25, 1993
Accepted October 12, 1993
