Handbook of Prosody
Editors Carlos Gussenhoven and Aoju Chen
A project for Oxford University Press
The Handbook will be published as a single volume containing 60 chapters divided over
the following 14 sections:
I General
II Prosody and language structure
III Phonetic realization
IV Prosodic systems: Areal and genetic groupings
V Communicative effects of sentence prosody
VI The role of prosody in language processing
VII Prosody and first language acquisition
VIII Prosody and second language acquisition
IX Prosody in speech and language technology
X Prosody in sign language
XI Prosody and gesture
XII Language prosody and art forms
XIII Prosody in speech pathology
XIV Prosody and biology
Draft
The [Oxford] Handbook of Prosody
Edited by Carlos Gussenhoven and Aoju Chen
Surveys of prosody are rare. In part, this may be due to the relative difficulty of the
subject. Shifting conceptions and terminologies may suggest to newcomers in the field a
lack of consensus about basic issues, and a confrontation with the variety of approaches
to the topic may well have the same effect. We believe, however, that the way today’s
researchers conceptualize the word and sentence prosodic structures in the languages of
the world and their place in discourse, processing, acquisition and language change
shows more coherence than differences in terminology and definition may suggest. In the
past decades, a number of theoretical breakthroughs can be identified that support us in
our attempt to present a series of concise and reliable accounts of the current
understanding in this field written by leading experts.
The first and arguably the most easily identifiable of these breakthroughs is the model of
phonology and phonetics presented by Janet Pierrehumbert in her 1980 dissertation and
its application to the intonation of English. It was influential in the area generally,
in particular in the way it shaped the conceptualization of phonetic implementation, and
by laying out a descriptive framework it also increased the comparability as well as the
sheer number of descriptions of intonation systems. In addition,
this mainstream phonetics-phonology model effectively merged the study of intonation
and the study of tone, preserving one of the main achievements of Gösta Bruce’s 1977
dissertation (Ladd 2008). As a result of the increased research effort that followed, our
understanding of the phonological structure of tone and intonation and its typological
context has improved dramatically. This structure includes a series of incrementally
embedding constituents which together form the prosodic hierarchy (Selkirk 1984,
Nespor & Vogel 2007). The vowels and consonants can be seen as forming a tier which
is parallel to a tier of tones, i.e. phonological elements notated as H, L, M. Lexical tones
participate in the phonological specification of morphemes which may also contain
vowels or consonants, while intonational tones have discoursal meanings and demarcate
prosodic phrases or appear in paradigms of pitch accents. The integration of lexical and
intonational tones, which interact with stress in various ways, has led to a large number
of typologically important descriptions.
Quite unlike what happened after the introduction of phonology as a separate discipline
in the early 20th century, the separation of phonetic implementation from phonological
representations by Pierrehumbert (1980) has had a fruitful effect on the integration of
phonetics and phonology. A great deal of attention has been given to the detailed
synchronization of tonal events with the segmental structure, and variation in pitch range
has been systematically investigated. Phonological elements are famously promiscuous
when it comes to their phonetic realization, and contrasts that are analysed as tonal may
have pitch features that are less salient than differences in voice quality or duration (e.g.
Brunelle 2012). Systematic variation in the realization of intonation has been found for
degrees of emphasis, size of the focus constituent, and communicative intention, such as
interrogativity and focus meaning.
In the wake of the autosegmental-metrical model, there were two further developments.
The first of these is the expansion of the database. The wider typological perspective has
led to a more realistic view of the high level of linguistic diversity in prosodic systems
and a realization that the intonational complexity of English is among the highest in the
world. Typological studies have gained in detail and there has been a move away from
labelling languages on the basis of a single linguistic property, as in Hyman’s work on
‘property-driven typology’ (2006, 2014). The other development concerned the increased
understanding of the communicative effects of prosody. In this area, a great deal remains
uncertain and controversial, but progress has clearly been made. Hypotheses about a
presumed universality in a relation between prosodic prominence and focus (e.g. Samek-
Lodovici 2005) are now competing with hypotheses that see Post-Focus Compression,
the reduction of the pitch span after the focused element which is commonly attested in
European and many Asian languages, as an areal feature (Xu, Chen & Wang 2012), while
doubts have been raised about the presumed universality of focus meanings and
categories (Matić & Wedgwood 2013). Research into the way focus is integrated in the
linguistic structure of English and other languages has accelerated (e.g. Katz & Selkirk
2011). The size of the focus constituent is seen as distinct from the kind of meaning
expressed by the focus (e.g. Elordieta & Irurtzun 2010). Near-universal paralinguistic
communication through vocal pitch is being treated as a separate system of
communication that interacts with linguistic intonational meanings (e.g. Chen 2005).
Affective meanings referring to the speaker have been investigated in the context of
prosodic convergence as an expression of solidarity or power difference, as well as in the
expression of value judgements, emotions and agonistic signals.
The expansion of the database is also evident in research on the acquisition of
prosody in a first or second language and on the prosody of sign languages. Analyses of
children’s and L2 learners’ prosody have emerged for an increasing number of languages
within the autosegmental-metrical framework (e.g. Astruc et al. 2012). Together they
provide exciting insights into how L1 and L2 learners acquire the internal structure of
prosody and the phonetic realisation of phonological tonal categories, in addition to the
relevant communicative competence. Children’s production of word prosody, both with
stress and with lexical tone, has traditionally received a lot of attention in L1 acquisition
research. Across languages, children have generally acquired the word prosodic system by
age 3, but continue to consolidate their competence in various segmental and
suprasegmental contexts in the following years. A related line of research addresses how
word and sentence prosody facilitates the acquisition of other elements in the language,
such as words and word meanings as well as syntactic phrasing. Research on the
acquisition of L2 word prosody has centred on the acquisition of English word stress by
learners with various L1s and the acquisition of lexical tones by learners of both tonal
and non-tonal L1s. Research on the perception of L2 word stress has been conducted from the
perspective
of the stress “deafness” paradigm introduced by Emmanuel Dupoux and colleagues.
Studies on the production and perception of tone have shed light on how speakers of
tonal and non-tonal languages differ in the production and perception of lexical tones and
how musical experience and auditory training can influence the acquisition of lexical
tones. Finally, earlier research on cry- and non-cry vocalisations by infants showed how
they modulate shape and pitch height to suit different contexts (e.g. presence vs. absence
of the mother) and express different needs at the age of 4 to 9 months. There would seem
to be a revival of this line of research which uses more sophisticated technologies,
showing among other things the language dependence of baby cries.
Considerable research efforts have been devoted to the unravelling of the phonological
component in sign languages. To a large extent, phonological units in signs have been
equated with phonological units in spoken languages, like features, segments and
syllables. Increasingly, attention has moved to prosodic phrasing, which has been shown
to be hierarchically organized, as in spoken languages. Manual and facial gestures as well
as body leans indicate meanings that are equatable with phonological phrasing and
melodic forms of spoken languages. The research is faced with challenges in the
separation of phonological and intonational phrases from syntactic units, as well as with
challenges in the separation of the expression of phonological units from paralinguistic
gesturing.
The third breakthrough is the emergence of new lines of research, in part as a result of a
rapid evolution of research methodologies and registration techniques, like eye tracking
and the registration of brain activity (e.g. EEG, MEG, fMRI). Earlier, psycholinguistic
research paradigms were used in the investigation of how prosodic cues (both at the
lexical and post-lexical levels) are used in spoken word recognition, whether and how
listeners capitalise on prosodic cues to resolve temporary syntactic and semantic
ambiguity, as in cases like ‘When he leaves the house is dark’, and how pragmatic
appropriateness of prosody influences speech comprehension. More recent research on
prosodic processing approaches further issues, from a phonological as well as phonetic
perspective. Intonational categories, like accent location and type of accent, have been
investigated by means of eye tracking as cues to how information is packaged. Other topics
include anticipatory prosodic processing (as reflected in online reference resolution), the
effect of prosody on the depth of semantic processing as evidenced by event-related
potentials and the lateralization of prosodic processing, as well as various issues that have
traditionally been addressed by means of offline perception studies and which have been
re-examined using EEG. These studies not only confirm findings from behavioural
research, but also provide valuable insights into the role of prosody in the integration of
incoming information in the evolving discourse. While prosodic processing has mainly
been studied in the cerebrum, research measuring brain stem responses to pitch shows that
subcortical processing of pitch is modulated by both language and music experience,
short-term as well as long-term. A final area in which progress is recent is that of visual prosody.
Manual and facial gestures are used to communicate several kinds of information, from
joy to disappointment and sadness, while at the same time they may reinforce meanings
that are also communicated in spoken language. Besides, visual cues may reveal
unintended information, like sincerity or the lack thereof, or pathologies. The synthesis of
speech and visual prosody has become an important issue in robotics.
Importantly, the Handbook aims to highlight the multidisciplinarity of the research on
prosody by including a series of overviews of prosodic research in other fields. First,
there is a long history of research on the way poets match language to the metrical
schemes of their poetry, one landmark publication being Halle & Keyser (1971), on
Chaucer’s iambic pentameter in particular. The topic continues to attract the attention of
phonologists (e.g. Dresher & Friedberg 2006; Hayes, Wilson & Shisko 2012). Currently,
research on the way music constrains the use of lexical tone is undergoing a revival, as
evident from work on several Asian and African languages. Progress has also been made
in understanding the cultural relation between language and music, for instance in the
correlation between linguistic and musical pitch jumps, as well as in their cognitive
relatedness (Patel 2008). Next, language and speech technology is an area in which the
position of prosody, like that of linguistic descriptions generally, has had its ups and
downs. More recently, the trend has been ‘up’. One consideration is the level at which
structural prosodic information is called upon in automatic speech recognition (ASR) and
speech synthesis. Some
models of pitch contours focus on the extraction of parameters that will reproduce an
original contour, while others focus on modelling the generative system of linguistic
elements that is held responsible for pitch contours. Automatic labelling systems may
show a similar range of more or less surface elements. A third area is speech and
language pathology. Aphasic conditions have shown that prosody is a separate at-risk
component in language pathology caused by brain damage. Pathological prosody in
patients with intact language may in fact be hard to define in speakers of European
languages. Conversely, prosody can appear to be fully intact in patients with severe
language impairments, to the extent that it may seem the only element left in speech
production. There are also indications that autism may be cued early on by prosodic
features, suggesting a promising line of research into longitudinal monitoring of prosodic
development. Finally, an interest has arisen in the place of prosody in language evolution
and genetics.
Our aim is to solicit brief chapters in the area of language structure that deal with topics
in an expository fashion. Equally, disciplinary chapters ideally aim to present summaries
of the understanding that has been reached in the field, as opposed to presenting lists of
research findings. Without implying that the autosegmental-metrical theory should be the
starting point for all prosodic research or that its assumptions are the final word on the
subject, the model may serve as an overarching theoretical frame of reference throughout
the Handbook. In addition, we will attempt to solicit chapters from authors who, while
being theoretically responsible, are committed to a methodologically sound, empirical
foundation in their research. In doing so, we intend to foster the transparency of research
achievements by different disciplinary groups and stimulate communication across these
groups.
Readership
The Handbook is intended as a source of reference and course readings in all research
fields in which language prosody plays a role.
References
Astruc, Lluïsa; Payne, Elinor; Post, Brechtje; Vanrell, Maria del Mar; Prieto, Pilar. 2012.
Tonal targets in early child English, Spanish, and Catalan. Language and Speech
56: 229-253.
Bruce, Gösta. 1977. Swedish word accents in sentence perspective. Lund: Liber
Läromedel.
Brunelle, Marc. 2012. Dialect experience and perceptual integrality in phonological
registers: Fundamental frequency, voice quality and the first formant in Cham.
Journal of the Acoustical Society of America 131 (4): 3088-3102.
Chen, Aoju. 2005. Universal and language-specific perception of paralinguistic
intonational meaning. PhD dissertation. Utrecht: LOT. ISBN 90-76864-69-1.
Dresher, B. Elan; Friedberg, Nila (eds.). 2006. Formal approaches to poetry: Recent
developments in metrics. Berlin/New York: Mouton de Gruyter.
Elordieta, Gorka; Irurtzun, Aritz. 2010. The relationship between meaning and
intonation in non-exhaustive answers: Evidence from Basque. The Linguistic
Review 27: 261-291.
Halle, Morris; Keyser, Samuel J. 1971. English Stress: Its Forms, Its Growth, and Its
Role in Verse. New York: Harper & Row.
Hayes, Bruce; Wilson, Colin; Shisko, Anne. 2012. Maxent Grammars for the Metrics of
Shakespeare and Milton. Language 88: 691-731.
Hyman, Larry M. 2006. Word-prosodic typology. Phonology 23: 225-257.
Hyman, Larry M. 2014. What is phonological typology? UC Berkeley Phonology Lab
Annual Report (2014): 101-118.
Katz, Jonah; Selkirk, Elisabeth O. 2011. Contrastive focus vs. discourse-new: Evidence
from phonetic prominence in English. Language 87: 771-816.
Ladd, D. Robert. 2008. Intonational Phonology. Cambridge: Cambridge University Press.
Matić, Dejan; Wedgwood, Daniel. 2013. The meanings of focus: The significance of an
interpretation-based category in cross-linguistic analysis. Journal of Linguistics 49:
127-163.
Nespor, Marina; Vogel, Irene. 1986, 2007. Prosodic Phonology. Berlin/New York:
Mouton de Gruyter (2nd ed.; 1st ed. Dordrecht: Foris).
Patel, Aniruddh D. 2008. Music, Language and the Brain. New York: Oxford University
Press.
Pierrehumbert, Janet B. 1980. The Phonetics and Phonology of English Intonation. PhD
dissertation, MIT. Distributed by Indiana University Linguistics Club.
Prieto, Pilar; Estrella, A.; Thorson, J.; Vanrell, Maria M. 2012. Is prosodic development
correlated with grammatical development? Evidence from emerging intonation in
Catalan and Spanish. Journal of Child Language 39: 221-257.
Samek-Lodovici, Vieri. 2005. Prosody-syntax interaction in the expression of focus.
Natural Language and Linguistic Theory 23: 687-755.
Selkirk, Elisabeth O. 1984. Phonology and Syntax: The Relation between Sound and
Structure. Cambridge, MA: MIT Press.
Xu, Yi; Chen, Szu-wei; Wang, Bei. 2012. Prosodic focus with and without post-focus
compression: A typological divide within the same language family? The Linguistic
Review 29: 131-147.