13.
LANGUAGE PRODUCTION
LEXICALIZATION
Lexicalization is the process in speech production whereby we turn the thoughts underlying words
into sounds: We translate a semantic representation (the meaning) of a content word into its
phonological representation of form (its sound).

There is widespread agreement that lexicalization is a two-stage process, with the first stage
being meaning-based, and the second phonologically based. When we produce a word, we first go from
the semantic level to an intermediate level of individual words. Choosing the word is called
lexical selection. We then retrieve the phonological forms of these words in a stage of
phonological encoding.
    According to the best known lemma theory (e.g., Levelt, 1989), each word is represented by a
lemma. Lemmas are specified syntactically and semantically but not phonologically. The stage of
specifying in a pre-phonological, abstract way the word that we are just about to say is called
lemma selection; the second stage of specifying the actual concrete phonological form of the word
is called lexeme or phonological form selection (see Figure 13.3).

FIGURE 13.3 Two-stage model of lexicalization: conceptual representation → lemma → phonological
word form.

Evidence from speech errors
Fay and Cutler (1977) presented one of the earliest models of how we produce words and why we make
word substitutions. They observed that there were two distinct types of whole word substitution
speech error: semantic substitutions, such as examples (29) and (30), and form-based substitutions,
such as examples (31) and (32). Form-based word substitutions are sometimes called phonologically
related word substitution errors or malapropisms. (The word “malapropism” originally came from a
character called Mrs. Malaprop in Sheridan’s play The Rivals, who was always using words
incorrectly, such as saying “reprehend” for “apprehend” and “epitaphs” for “epithets.” Note that
while Mrs. Malaprop produced these substitutions out of ignorance, the term is used slightly
confusingly in psycholinguistics to refer to errors where the speaker knows perfectly well what the
target should be.)

    (29) fingers → toes (semantic)
    (30) husband → wife (semantic)
    (31) equivalent → equivocal (form-based or phonologically related)
    (32) historical → hysterical (form-based or phonologically related)

    The important idea of Fay and Cutler’s model is that phonological and semantic word
substitutions happen as a result of mistakes in the word retrieval process.
    In general, in the two-stage model semantic and phonological substitutions occur at different
levels. The Fay and Cutler (1977) model predicts that semantic and phonological processes should be
independent.

Experimental evidence
The earliest experimental evidence for the division of lexical access into two
stages came from studies of the description of simple scenes (Kempen & Huijbers, 1983). They
analyzed the time people take before they start speaking when describing these scenes, and argued
that people do not start speaking until the content to be expressed has been fully identified. The
selection of several lemmas for a multiword sentence can take place simultaneously. We cannot
produce the first word of an utterance until we have accessed all the lemmas (at least for these
short utterances) and at least the first phonological word form. Individual word difficulty affects
only word form retrieval times.

Early semantic activation
Evidence for a phase of early semantic activation in lexical selection and a later phase of
phonological activation in phonological encoding comes from picture–word interference studies
(Levelt et al., 1991a; Schriefers, Meyer, & Levelt, 1990). These experiments, discussed in more
detail later in the section on the time course of lexicalization, used a picture–word interference
paradigm in which participants see pictures that they have to name as quickly as possible. At about
the same time they are given an auditorily presented word for which they have to make a lexical
decision. Words prime semantic neighbors early on, whereas late on they prime phonological
neighbors. This suggests that there is an early stage when semantic candidates are active (this is
the lemma stage), and a late stage when phonological forms are active.
    The semantic-interference paradigm provides evidence for two stages and, furthermore, shows that
the lexical items activated by the first stage compete against each other (Starreveld & La Heij,
1995, 1996). In semantic-interference studies, participants have to name pictures which have
superimposed distractor words that they have to ignore; naming times are longer when the picture
and the word are related. The distractors lead to the activation of semantic competitors that slow
down the selection of the lexical target. In the related word translation task, semantically
related words induce semantic interference; however, related pictures produce facilitation (Bloem &
La Heij, 2003). The stimulus onset asynchrony (SOA) is, however, critical; if the interfering words
are presented 200 ms after the target, we observe semantic interference, but if they are presented
400 ms before the target, we observe semantic facilitation (Bloem, van den Boogaard, & La Heij,
2004). Bloem and La Heij proposed a model of lexical access in which semantic facilitation is
localized at the conceptual level, semantic interference is localized at the lexical level, and
only one concept is selected for lexicalization. They called this the Conceptual Selection Model
(CSM). They account for the effects of SOA with the assumption that lexical representations decay
faster than conceptual representations.
    Whether we observe facilitation or inhibition in the picture–word interference paradigm depends
on the details of the experimental set-up. In the most famous example of picture–word interference,
the Stroop task (naming the color in which a word is printed when the word spells out a color
name), there is striking inhibition. Usually we find interference with semantically related pairs
from the same category, and facilitation with phonologically related pairs. Schriefers et al.
(1990) found that inhibition disappears if participants have to press buttons instead of naming
pictures, suggesting that the interference reflects competition among lexical items at the
stage of lemma selection. The details of
the task and the timings involved are also
critical (Bloem & La Heij, 2003; Bloem et
al., 2004).
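The decay assumption of the Conceptual Selection Model can be made concrete with a toy calculation. The sketch below is only an illustration, not Bloem and La Heij's implementation: the exponential decay form, the gain, the two time constants, and the function names are all hypothetical choices made so that lexical activation fades faster than conceptual activation. Here `elapsed_ms` stands for the assumed time between the distractor's activation and the moment the target word is selected.

```python
import math

def residual(initial, time_constant_ms, elapsed_ms):
    """Activation left after elapsed_ms of exponential decay (assumed form)."""
    return initial * math.exp(-elapsed_ms / time_constant_ms)

def net_effect(elapsed_ms, lexical_gain=2.0, lexical_tau=100.0, conceptual_tau=400.0):
    """Positive = net interference, negative = net facilitation (arbitrary units).

    All parameter values are hypothetical; the only constraint taken from the
    text is that lexical representations decay faster than conceptual ones.
    """
    interference = residual(lexical_gain, lexical_tau, elapsed_ms)  # lexical level
    facilitation = residual(1.0, conceptual_tau, elapsed_ms)        # conceptual level
    return interference - facilitation

for elapsed in (0, 400):
    effect = net_effect(elapsed)
    label = "interference" if effect > 0 else "facilitation"
    print(f"{elapsed} ms since distractor: {label}")
```

With these made-up values, a distractor that is still fresh yields net interference, whereas one presented well in advance leaves mainly conceptual activation behind and yields net facilitation, which is the qualitative pattern the model is meant to explain.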
Evidence from neuroscience
(important; students task)
Different regions of the brain become
activated in sequence as we produce
words (Indefrey & Levelt, 2000, 2004).
Conceptual selection of a word in picture
naming is associated with activation of the
mid-part of the left middle temporal gyrus;
accessing a word’s phonological code is
associated with activation of Wernicke’s
area; and phonological encoding, in terms
of the preparation of syllables, sounds,
and the prosody of the word, is associated
with activation around Broca’s area. As we
shall see, lesions to these areas lead to
different types of impairment to word
naming, with damage to more posterior
regions of the brain resulting in difficulty in
accessing the meanings of words, and
damage to more frontal regions resulting
in difficulty in accessing the sounds of
words. A survey of the imaging literature
also reveals the timings of word retrieval in
naming an object (Indefrey & Levelt,
2004): Visual and conceptual processing
take on average 175 ms; the best-fitting
lexical item, or lemma, is retrieved
between 150 and 225 ms; the
phonological representations are retrieved
between 250 and 330 ms; and the details
of the sounds of the word at around 450
ms (see Figure 13.5).
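The milestone times labeled in Figure 13.5 can be turned into approximate durations for the intervals between them by simple subtraction. The short sketch below does this; the milestone values are the ones printed in the figure, while the variable and function names are introduced here purely for illustration.

```python
# Milestones (ms after picture onset) as labeled in Figure 13.5.
MILESTONES = [
    ("picture onset", 0),
    ("lexical concept", 175),
    ("target lemma", 250),
    ("segments", 350),
    ("phonological word", 455),
    ("articulatory scores", 600),
]

def stage_durations(milestones):
    """Return (from, to, duration_ms) for each pair of consecutive milestones."""
    return [
        (a_name, b_name, b_time - a_time)
        for (a_name, a_time), (b_name, b_time) in zip(milestones, milestones[1:])
    ]

for start, end, ms in stage_durations(MILESTONES):
    print(f"{start} -> {end}: {ms} ms")
```

On these figures, conceptual preparation is the longest single interval, and the steps from the target lemma to the articulatory scores together take about 350 ms.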
    Electrophysiological evidence also supports the two-stage model (van Turennout, Hagoort, &
Brown, 1998). Dutch-speaking participants were shown colored pictures and had to name them with a
simple noun phrase (e.g., “red table”). At the same time the participants had to push buttons
depending on the grammatical gender of the noun, and on whether or not it began with a particular
sound. The electrophysiological data for the preparation of the motor movements suggested that the
syntactic properties were accessed before the phonological information. However, the time delay
between the two was very short, on the order of 40 ms.

[Figure 13.5, right panel: picture (0 ms) → conceptual preparation → lexical concept (175 ms) →
lemma retrieval → multiple lemmas → lemma selection → target lemma (250 ms) → phonological code
retrieval → lexical phonological output code → segmental spell-out → segments (350 ms) →
syllabification → phonological word (455 ms) → phonetic encoding → articulatory scores (600 ms) →
articulation. Self-monitoring is also labeled. Left panel: brain regions labeled with the time
windows 150–225, 200–400, 275–400, and 400–600 ms.]
FIGURE 13.5 Time taken (in ms) for different processes to occur in picture naming. The specific
processes are shown on the right and the relevant brain regions are shown on the left. Reprinted
from Indefrey and Levelt (2004).

[Image: PET scans of human brain areas which are active while speaking and listening. Top left:
monitoring imagined speech lights up the auditory cortex. Top right: working out the meaning of
heard words activates other areas of the temporal lobe. Bottom left: repeating words activates
Wernicke’s area for language comprehension (right), Broca’s area for speech generation (left), and
a motor region producing speech. Bottom right: monitoring speech activates the auditory cortex.]

Evidence from the tip-of-the-tongue phenomenon
The tip-of-the-tongue (TOT) state is a noticeable temporary difficulty in lexical access. It is an
extreme form of a pause, where the word takes a noticeable time to come out (sometimes several
weeks!). You are almost certainly familiar with this phenomenon: You know that you know what the
word is, yet you are unable to get the sounds out. TOTs are accompanied by strong “feelings of
knowing” what the word is. They appear to be universal; they have even been observed in children as
young as 2 (Elbers, 1985). The incidence of TOTs increases with old age (Burke, MacKay, Worthley, &
Wade, 1991), and TOTs are more common in bilingual speakers (Gollan & Acenas, 2004; Gollan & Brown,
2006). Deaf signers experience analogous “tip-of-the-finger” states (Thompson, Emmorey, & Gollan,
2005).
    Brown and McNeill (1966) were the first to examine the TOT state experimentally. They induced
TOTs in participants by reading them definitions of low-frequency words, such as (33):

    (33) “A navigational instrument used in measuring angular distances, especially the altitude of
         the sun, moon, and stars at sea.”

Stop and try to name the item defined by (33). You may experience a TOT.
    Example (33) defines the word “sextant.” Brown and McNeill found that a proportion of the
participants will be placed in a TOT state by this task. Furthermore, they found that lexical
retrieval is not an all-or-none affair. Partial information, such as the number of syllables, the
initial letter or sound, and the stress pattern, can be retrieved. Participants also often output
near phonological neighbors like “secant,” “sextet,” and “sexton.” These other words that come to
mind are called interlopers. TOTs show us that we can be aware of the meaning of a word without
being aware of its component sounds; and furthermore, that phonological representations are not
unitary entities.
    There are two theories of the origin of TOTs. These are called the partial activation and
blocking (or interference) hypotheses. Brown (1970) first proposed the partial activation
hypothesis. This says that the target items are inaccessible because they are only weakly
represented in the system. Burke et al. (1991) provided evidence in favor of this model from both
an experimental and a diary study involving a group of young and old participants. They
argued that the retrieval deficit involves weak links between the semantic and the phonological
systems: there is a transmission deficit in getting between the two. A broadly similar approach by
Harley and MacAndrew (1992) localized the deficit within a two-stage model of lexical access,
between the abstract lexical units and the phonological forms. At first sight Kohn et al. (1987)
provided evidence contrary to the partial activation hypothesis in the form of a free association task.
They showed that the partial information provided by participants does not in time narrow or
converge on the target. However, A. S. Brown (1991) pointed out that participants might not say out
loud the interlopers in the order in which they came to mind. Furthermore, in a noisy system there is
no reason why each attempt at retrieval should give the same incorrect answer.
    Further evidence that TOTs are associated with a difficulty in retrieving the phonological forms
of words comes from brain imaging. Shafto, Burke, Stamatakis, Tam, and Tyler (2007) had people
aged 19–88 name pictures of famous people. The number of TOTs increased with age and with
atrophy of the left insula, a region of the brain known to be involved (among other things) in
phonological production.
Problems with the lemma model
Although most researchers favor the two-stage model of lexicalization, there is less agreement on
the need for lemmas as a level of amodal, syntactically specified representations mediating between
concepts and phonological forms (Caramazza, 1997; Caramazza & Miozzo, 1997, 1998; Miozzo &
Caramazza, 1997).
    One point is that it is not clear that the need for lemmas is strongly motivated by the data. Most
of the evidence really only demands a distinction between the semantic and the phonological levels.
The strongest evidence for lemmas comes from the finding that gender can be retrieved when in the
tip-of-the-tongue state, although this interpretation has been disputed. On the lemma model, it
should not be possible to retrieve phonological information for a word without retrieving the
syntactic information for that word such as gender, as the phonological stage can only be reached
through the lemma stage. Tip-of-the-tongue data suggest, however, that syntactic
and phonological information are independent (Caramazza & Miozzo, 1997, 1998; Miozzo &
Caramazza, 1997): Italian speakers can sometimes retrieve partial phonological information when
they cannot retrieve the gender of the word, and vice versa. Importantly, there was no correlation
between the retrieval of gender and phonological information; people are no better at recalling
gender when they correctly recall the initial phoneme of the target in a TOT state than when they
fail to do so. Hence, phonological retrieval does not necessarily depend on syntactic retrieval, and
therefore these results do not support the idea of syntactic mediation. Arguing that lemmas are
unnecessary complications, Caramazza (1997) dispenses with them. He proposes that lexical
access in production involves the interaction of a semantic network, a syntactic network, and
phonological forms (see Figure 13.6). Semantic representations activate both appropriate nodes in
the syntactic network and the phonological network.
Is lexicalization interactive?
Given that there are two stages involved in lexicalization, how do they relate to each other?
Interaction involves the influence of one level of processing on the operation of another. It
comprises two ideas. First, there is the notion of temporal discreteness.
Dell’s (1986) interactive model of speech production
Dell (1986) proposed an interactive model of lexicalization based on the mechanism of spreading
activation. Items are slotted into frames at each level of processing. Processing units specify the
syntactic, morphological, and phonological properties of words. Activation spreads down from the
sentence level, where items are coded for syntactic properties, through a morphological level, to a
phonological level. At each level, the most highly activated item is inserted into the currently active
slot in the frame. For example, the sentence frame might be quantifier–noun–verb. The
morphological frame might be stem plus affix. The phonological frame might be onset–nucleus–coda.
The final output is a series of phonemes coded for position (e.g., /s/ in word-onset position). The
flow of activation throughout the network is time-dependent, so that the first noun in a sentence is
activated before the second noun.
    The model (see Figure 13.9) gives a good account of speech errors. Several units may be active
at each level of representation at any one time. If there is sufficient random noise an item might be
substituted for another one. As items are coded for syntactic category and position in a word, the
other units that are active at any one time tend to be similar to the target in these respects. There is
feedback between levels. The feedback between the phonological and lexical levels gives rise to
lexical bias and similarity constraints.
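The way feedback produces similarity effects can be illustrated with a very small network. The sketch below is a toy under stated assumptions, not Dell's (1986) implementation: the three-word lexicon, the spreading rate, the number of steps, and the function names are all hypothetical, and noise and decay are left out so that the result is deterministic. It shows activation spreading from a target word down to position-coded phonemes and back up, and then fills an onset–nucleus–coda frame with the most highly activated phoneme for each slot.

```python
# Hypothetical mini-lexicon: each word maps frame slots to phonemes.
LEXICON = {
    "cat": {"onset": "k", "nucleus": "ae", "coda": "t"},
    "mat": {"onset": "m", "nucleus": "ae", "coda": "t"},
    "dog": {"onset": "d", "nucleus": "o", "coda": "g"},
}
SLOTS = ("onset", "nucleus", "coda")

def spread(target, steps=3, rate=0.5):
    """Spread activation from the target word down to phonemes and back up."""
    words = {word: 0.0 for word in LEXICON}
    words[target] = 1.0
    phonemes = {}
    for _ in range(steps):
        # Top-down: each word activates its position-coded phonemes.
        phonemes = {}
        for word, activation in words.items():
            for slot in SLOTS:
                key = (slot, LEXICON[word][slot])
                phonemes[key] = phonemes.get(key, 0.0) + rate * activation
        # Bottom-up feedback: each phoneme activates the words that contain it.
        updated = dict(words)
        for word in LEXICON:
            for slot in SLOTS:
                updated[word] += rate * phonemes[(slot, LEXICON[word][slot])]
        words = updated
    return words, phonemes

def fill_frame(phonemes):
    """Insert the most highly activated phoneme into each slot of the frame."""
    return [
        max((key for key in phonemes if key[0] == slot), key=phonemes.get)[1]
        for slot in SLOTS
    ]

words, phonemes = spread("cat")
print(sorted(words, key=words.get, reverse=True))
print(fill_frame(phonemes))
```

Because "mat" shares two position-coded phonemes with "cat", feedback leaves it more active than the unrelated "dog". If random noise were added to the activations, such a phonologically similar word would be the likelier candidate to intrude.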
    A related issue that has recently arisen is the degree to which there is competition within a level
between similar units. Recall that in the IAC model of letter recognition there are within-level
inhibitory links leading to competition between similar units. The key issue therefore is whether the
time to produce a word is affected by the activation of similar words. This issue is currently
unresolved, with some researchers arguing for competition, others against it, while yet others claim
that the data can be accounted for by an internal monitor checking planned productions against
internal goals (Dhooge & Hartsuiker, 2012; Melinger & Rahman, 2013).