This conference paper investigates the functional load of prosodic boundaries in Chinese speech, particularly under conditions of reduced syllable information. The study employs a novel method to quantify the importance of prosodic boundaries and reveals that their significance increases when syllable information is merged, aiding in semantic disambiguation. Results indicate that different levels of prosodic boundaries exhibit varying functional loads, highlighting their crucial role in effective communication.



A study on functional load of Chinese prosodic boundaries under reduction of syllable information

Conference Paper · October 2016


DOI: 10.1109/ISCSLP.2016.7918470



A Study on Functional Load of Chinese Prosodic Boundaries under Reduction of
Syllable Information
Yue Chen, Yanlu Xie, Bin Wu, Jinsong Zhang*

College of Information Sciences, Beijing Language and Culture University, Beijing 100083, China
chenyue_blcu93@126.com, xieyanlu@blcu.edu.cn,
wubin7416850@outlook.com, jinsong.zhang@blcu.edu.cn

Abstract
Prosodic boundaries play an important role in the intelligibility and naturalness of speech, and finding a quantitative measure of their importance is an interesting problem. Previous studies have quantified the importance of prosodic boundaries based on functional loads (FLs). However, the earlier estimates of the information contribution of prosodic boundaries assumed that all other linguistic events were perfectly correct. In real speech communication, speakers rarely articulate every phonetic event clearly, yet listeners can still correctly recognize the intended meaning with the help of context. To explore the information contributions of prosodic boundaries in real communication, and to investigate how phonetic segments affect those contributions, this paper merges tones and Initials to simulate, to some extent, the reduction of syllable information in real speech communication. It then observes the distributions of the FLs of prosodic boundaries and how these FLs change under the reduction of syllable information. The results show that prosodic boundaries at different levels have different FLs. When syllable information (tones or Initials) is merged, the FLs of prosodic boundaries increase, and the information contribution of prosodic boundaries to disambiguating sentence meanings becomes more obvious. Merging tones and merging Initials have different effects on the role of prosodic boundaries in transmitting information.

Index Terms: prosodic boundaries, functional load, tones, Initials

1. Introduction
In speech communication, different linguistic events, including segmental phonemes and supra-segmental events such as tones and prosodic phrasing structure, are all used to convey information. Both segmental phonemes and supra-segmental events carry phonetic information and help to reduce uncertainty in language. These linguistic events are coding methods within a language system and are decoded by listeners. In a language, the information in a sentence comes not only from each individual linguistic event but also from information over and above those events, which is essential.

It is an interesting problem to find a quantitative measure of the importance of linguistic events. In early studies, the information contributions of phonetic events were measured quantitatively using functional loads [1-4]. Measuring FLs provides a quantitative way to rank any phonetic contrast in a language, which can be applied in many research domains such as speech recognition and language acquisition. Linguistic events with higher FLs make greater information contributions in communication and deserve more attention in linguistic research.

Among these linguistic events, prosodic boundaries segment the speech chain into small chunks, making it convenient for people to organize, produce and perceive language. This chunking plays an important role in the intelligibility and naturalness of speech [5]. Selkirk et al. divided prosodic boundaries into different levels based on perception experiments [6, 7].

Modeling the boundaries can be useful for the machine processing of human languages. There are many studies on the analysis and prediction of prosodic boundaries [8, 9], but few quantitatively measure the importance of prosodic boundaries in communication. Zhang et al. [10] proposed a method to estimate the functional loads (FLs) of prosodic boundaries based on the mutual information (MI) between text transcriptions and their phoneme representations. They found that prosodic boundaries carry information, and much more of it than phonetic segments.

However, the earlier estimates of the information contribution of prosodic boundaries assumed that all other linguistic events were perfectly correct. In real communication, speakers rarely articulate every phonetic event (such as tones and Initials) clearly and cannot transmit information to listeners accurately, yet listeners can still recognize the meaning from context. This may be due to language redundancy [11]: in a language, two or more features may serve the same function. When some syllable information is lost, prosodic boundaries may play a greater role in semantic disambiguation. We therefore want to explore the information contributions of prosodic boundaries in everyday communication.

This paper studies further how prosodic boundaries function in communication, and how the merger of phonetic segments affects the information contributions of prosodic boundaries. We merged tones and some Initials to simulate mispronunciation in real communication, and observed how the distribution of the FLs of prosodic boundaries changes before and after the mergers. Here we measured the FL of every single boundary, instead of estimating the FLs of boundaries working together as in [10], to see how a single boundary works and how the FLs of boundaries at different levels are distributed.

The rest of the paper is organized as follows: Section 2 introduces the method for estimating the FL of a boundary. Section 3 describes the experimental setups and results. Section 4 concludes the study and suggests future directions.
2. FLs of Prosodic Boundaries

2.1. Mutual information (MI) of text transcriptions and phoneme representations
Mutual information (MI) quantifies the "amount of information" obtained about one random variable through another random variable. The MI between text (W) and phonemes (F), MI(W, F), is defined as:

$$\mathrm{MI}(W,F) = H(F) - H(F \mid W) \tag{1}$$

The candidate word sequences $W_1, W_2, \ldots, W_m$ for a phoneme transcription are obtained from the WHG, and MI(W, F) can be calculated as:

$$\mathrm{MI}(W,F) = -\frac{1}{n}\sum_{i=1}^{n} \log P(W_i) \tag{2}$$

Here $W_i$ stands for all the word sequences that share the same phoneme transcription. The larger MI is, the more closely W and F are related, and the more certainly F identifies the text.

2.2. Word Hypothesis Graph (WHG)
Figure 1: An example of a partial WHG for the phonetic transcription “bu ru dong wu”.

In Figure 1, all the W' share the same phoneme transcription F. The more W' there are for a given F, the more uncertain the utterance is, and the smaller MI is.

2.3. FLs of prosodic boundaries
In a previous study, FL based on MI was defined as:

$$\mathrm{FL}(\alpha) = \frac{\mathrm{MI}(W,F) - \mathrm{MI}(W,F_\alpha)}{\mathrm{MI}(W,F)} \tag{3}$$

where α represents the merging of a phonemic contrast. FL is thus the relative loss of mutual information after merging α. In this paper, FL is defined as follows:

$$\mathrm{FL}(\alpha) = \frac{\mathrm{MI}(W,F_\alpha) - \mathrm{MI}(W,F)}{\mathrm{MI}(W,F)} \tag{4}$$

Here α represents a prosodic boundary. When the boundary is present, the WHG shrinks to fewer W', MI increases, and the uncertainty of F decreases. The FL therefore represents the reduction of uncertainty contributed by prosodic boundaries.

Unlike the method in the earlier study, this paper estimates the FL of every single syllable boundary instead of the overall FL of all boundaries at the same level.

The quantitative values of boundaries' FLs depend to a large extent on the other linguistic information in the sentence and on the LM. When all other phonetic information is exactly correct, different linguistic events may overlap in their information contributions because of language redundancy. In that case the importance of boundaries cannot be shown clearly, and the quantitative value of FL may be zero. This does not mean that the boundaries carry no information.

3. Experiments and Results

3.1. Experimental data
The training corpus is from the People's Daily newspaper. After word segmentation, we used the SRILM toolkit to train word-based bigram and trigram language models (LMs). The word segmentation and the WHG of our test corpus are both based on these LMs. The test corpus is ASCCD, a Chinese natural speech corpus. It contains 8768 Chinese characters in total, which were converted to pinyin transcriptions. After word segmentation, the prosodic boundaries in the corpus were labeled 1, 2 and 3 based on C-ToBI [12], and we call these prosodic boundaries. Syllable boundaries within prosodic words were labeled 0, and we call these non-boundaries. The labels are shown in Table 1. In total, we obtained 422 sentences, with 5162 NB, 1893 PWB, 692 PPB and 599 IPB.

Table 1. The labels of boundaries.
Label | Level
0 | Non-boundaries (NB)
1 | Prosodic word boundaries (PWB)
2 | Prosodic phrase boundaries (PPB)
3 | Intonation phrase boundaries (IPB)

3.2. Experimental design
In this paper, we first used the previous method to estimate the FLs of some lexical tones and of prosodic boundaries. Then we used our new method to estimate the FL of every single syllable boundary. We merged tones and Initials separately to simulate the mispronunciation of tones and Initials in real speech communication, in order to see how the FLs of boundaries at different levels are distributed and how the merger of syllable information affects them. The control variables are detailed in Table 2. The Initial pairs for merger are classified by phonological features, such as nasality and aspiration [9]. When merging tones, we removed all the tones.

Table 2. Merger groups of phonetic contrasts.
Contrast | Merger groups
Initial | (b m) (d n)
Initial | (b p) (d t) (c z) (j q) (ch zh) (g k)
Initial | (f b) (d z s) (q x) (zh sh) (g h)
Initial | (b d g) (z j zh) (p t k) (m n) (c q ch) (f s x h sh)
Tone | (Tone 1, 2, 3, 4, 5)

Table 3. Setups for estimating FLs of prosodic boundaries.
No. | Tone | Initial | LM
1 | N | N | bigram
2 | N | M | bigram
3 | M | N | bigram
4 | M | M | bigram

Table 3 shows the experimental groups, where M means merger and N means no merger. The first group has no merger of syllable information, and the fourth group merges both tones and Initials. We estimated FLs with both the bigram and the trigram LM and obtained similar results, so only the results using the bigram LM are shown here.

3.3. Results and analysis

3.3.1. Comparison between means of FLs of different kinds of boundaries
In this experiment, we first used the previous method to estimate the FLs of all boundaries at the same level working together. Figure 2 shows the FLs of prosodic boundaries, together with those of the lexical tone pairs Tone 1/Tone 4 and Tone 2/Tone 4. Figure 2 shows that, in this corpus, both prosodic boundaries and tones carry information, and that boundaries carry more.
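To make Eqs. (2) to (4) concrete, the toy sketch below computes them for a single four-syllable utterance. It is a minimal illustration under stated assumptions, not the authors' implementation. A tiny hand-made unigram lexicon, with invented words and probabilities, stands in for the bigram LM. An exhaustive dynamic-programming segmentation stands in for the WHG. The corpus has one utterance (n = 1). A prosodic boundary is modeled only as a juncture that no word may span. Tone merger strips the tone digits. The hypothetical helper `initial_merger` shows one way to apply a single row of Initial groups from Table 2.

```python
import math
from functools import lru_cache

# Hypothetical toy lexicon: word -> (tone-numbered pinyin, unigram probability).
# Words and probabilities are invented; they stand in for the paper's LM.
LEXICON = {
    "哺乳": (("bu3", "ru3"), 0.02),
    "不如": (("bu4", "ru2"), 0.04),
    "不": (("bu4",), 0.10),
    "入": (("ru4",), 0.02),
    "入洞": (("ru4", "dong4"), 0.01),
    "动物": (("dong4", "wu4"), 0.05),
    "洞": (("dong4",), 0.01),
    "物": (("wu4",), 0.01),
}

# Two-letter Initials are listed first so that "zh" is not parsed as "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def no_merger(syl):
    return syl


def tone_merger(syl):
    """Merge all tones by removing the tone digit, e.g. 'bu4' -> 'bu'."""
    return syl.rstrip("012345")


def initial_merger(groups):
    """Map every Initial in a group to the group's first member.
    Assumes one row of Table 2 is applied at a time (no Initial in two groups)."""
    rep = {ini: grp[0] for grp in groups for ini in grp}

    def merge(syl):
        for ini in INITIALS:
            if syl.startswith(ini):
                return rep.get(ini, ini) + syl[len(ini):]
        return syl  # zero-Initial syllable
    return merge


def transcription_prob(syllables, merge=no_merger, boundaries=frozenset()):
    """P(W_i): total probability of all word sequences whose (merged)
    pronunciation equals the (merged) syllable string. Juncture j lies
    between syllables j-1 and j; no word may span a juncture in `boundaries`."""
    target = tuple(merge(s) for s in syllables)
    by_pron = {}  # merged pronunciation -> summed probability of homophones
    for pron, prob in LEXICON.values():
        key = tuple(merge(s) for s in pron)
        by_pron[key] = by_pron.get(key, 0.0) + prob
    max_len = max(len(k) for k in by_pron)

    @lru_cache(maxsize=None)
    def suffix(start):
        # Probability mass of all segmentations of target[start:].
        if start == len(target):
            return 1.0
        total = 0.0
        for end in range(start + 1, min(start + max_len, len(target)) + 1):
            if any(j in boundaries for j in range(start + 1, end)):
                continue  # this word would cross a marked boundary
            p = by_pron.get(target[start:end], 0.0)
            if p:
                total += p * suffix(end)
        return total

    return suffix(0)


def mutual_information(probs):
    """Eq. (2): MI(W, F) = -(1/n) * sum_i log P(W_i)."""
    return -sum(math.log(p) for p in probs) / len(probs)


def fl_merger(mi, mi_merged):
    """Eq. (3): relative loss of MI caused by merging a contrast."""
    return (mi - mi_merged) / mi


def fl_boundary(mi, mi_with_boundary):
    """Eq. (4): relative gain of MI contributed by a boundary."""
    return (mi_with_boundary - mi) / mi


sentence = ("bu4", "ru2", "dong4", "wu4")  # 不如 | 动物; juncture 2 is the boundary
results = {}
for name, merge in (("toned", no_merger), ("tones merged", tone_merger)):
    mi = mutual_information([transcription_prob(sentence, merge)])
    mi_b = mutual_information([transcription_prob(sentence, merge, {2})])
    results[name] = (mi, fl_boundary(mi, mi_b))
    print(f"{name}: MI = {mi:.4f}, FL(boundary) = {results[name][1]:.6f}")
print("FL(tone merger) =",
      round(fl_merger(results["toned"][0], results["tones merged"][0]), 4))
```

With tones intact, no toy word crosses juncture 2, so the boundary's FL is zero. This echoes the remark that FL may be zero when all other information is exactly correct. Once tones are merged, the invented word 入洞 (ru dong) becomes a competing hypothesis across that juncture, and the boundary's FL turns positive. This behaviour comes entirely from the made-up lexicon and is not evidence for the paper's results.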
Figure 2: FLs of prosodic boundaries and partial tone pairs.

Then we computed the FL of every single boundary. The means of the FLs of boundaries at different levels under different conditions are shown in Table 4. Under all conditions, the FLs of non-boundaries are larger than those of prosodic boundaries. Statistical analysis showed that the difference between the FLs of non-boundaries and those of boundaries is statistically significant, whereas the difference between boundaries of different levels is not. Although the FLs of boundaries at different levels do not differ significantly, their means are very different: the higher the label, the smaller the mean FL, and the smaller the contribution of the boundary. This agrees with the results in Figure 2.

Table 4. Means of FLs of boundaries at different levels under different conditions.
Label | No merger | Tone merger | Initial merger | Both merged
0 | 0.06107 | 0.04507 | 0.05602 | 0.038345
1 | 0.00278 | 0.00214 | 0.00261 | 0.00206
2 | 0.00110 | 0.00132 | 0.00110 | 0.00131
3 | 0.00013 | 0.00031 | 0.00013 | 0.00037

3.3.2. FLs of boundaries under the reduction of syllable information
To find out how the reduction of syllable information affects the FLs of prosodic boundaries, we set two control variables, tones and Initials, and observed the changes before and after merging them.

Figure 3: Distribution of boundaries' FLs without merger of syllable information (frequency on the y-axis against FL on the x-axis, one series per boundary label 1, 2 and 3).

Figure 3 shows the distribution of boundaries' FLs without merger of syllable information. The numbers of boundaries at different levels differed markedly, and many boundaries had an FL of zero, so we chose 599 samples of each kind of boundary for analysis. Boundaries whose FL equals zero are not shown in this figure or in the figures below.

When tones were merged, the FLs of most boundaries rose to varying degrees, and at every level the proportion of boundaries with zero FL fell by about 30 percentage points (from roughly 0.87 to 0.90 down to roughly 0.56 to 0.60; Table 5).

Table 5. Proportion of boundaries whose FL is zero at different levels, before and after merger of tones.
Label | Before merger of tones | After merger of tones
1 | 0.875858 | 0.597993
2 | 0.867052 | 0.559249
3 | 0.899833 | 0.597663

Figure 4: Distribution of boundaries' FLs after merger of tones (panel "Merger of Tones"; frequency on the y-axis against FL on the x-axis, one series per boundary label 1, 2 and 3).

Figure 4 shows the distribution of boundaries' FLs after merger of tones. Compared with Figure 3, the number of boundaries with an FL greater than 0 increased considerably, and overall the distribution of boundaries' FLs at each level became clearer and more stable.

Table 6. Proportion of boundaries whose FL is zero at different levels, before and after merger of Initials.
Label | Before merger of Initials | After merger of Initials
1 | 0.875858 | 0.874274
2 | 0.867052 | 0.867052
3 | 0.899833 | 0.898164

Figure 5: Distribution of boundaries' FLs after merger of Initials (panel "Merger of Initials"; frequency on the y-axis against FL on the x-axis, one series per boundary label 1, 2 and 3).

However, the results changed little after merging only the Initials. We therefore infer that, in our corpus, Initials have little influence on the FLs of prosodic boundaries.

When tones and Initials were merged at the same time, the results were very similar to those for the merger of tones alone.

Table 7. Proportion of boundaries whose FL is zero at different levels, before and after merger of tones and Initials.
Label | Before merger of tones & Initials | After merger of tones & Initials
1 | 0.875858 | 0.597464
2 | 0.867052 | 0.560694
3 | 0.899833 | 0.595993

Figure 6: Distribution of boundaries' FLs after merger of tones and Initials (panel "Merger of Tones and Initials"; frequency on the y-axis against FL on the x-axis, one series per boundary label 1, 2 and 3).

3.4. Discussion
Based on the experimental results, we observe that:
• The FLs of syllable boundaries within prosodic words differ significantly from those of prosodic boundaries. This could be taken into account in the prediction of prosodic boundaries.
• Table 4 shows that the FLs of prosodic boundaries at different levels are, to a small extent, correlated with their levels in the prosodic hierarchy: PWB > PPB > IPB. Boundaries at different levels may make different information contributions in communication.
• Figures 3 to 6 show that when syllable information is reduced, the FLs of prosodic boundaries increase. From the viewpoint of language redundancy, prosodic boundaries may serve some of the same functions as syllable information: when merging syllable information degrades the speech, one would expect the information (and redundancy) to decrease. When syllable information is reduced, the information contribution of prosodic boundaries in communication can be seen more clearly.
• In this corpus, tones have a greater influence on prosodic boundaries than Initials. This may be due to the limits of the corpus, or because the Initial mergers do not affect enough syllables.

4. Conclusions
In this paper, we explored how the reduction of syllable information affects the FLs of prosodic boundaries by merging tones and Initials. The experimental results showed that phonetic segments affect the information contributions of prosodic boundaries in communication.

Here we estimated the FL of every single boundary and drew some conclusions, which we hope can serve as a reference for the automatic detection of prosodic boundaries. We also hope to find more linguistic events that are redundant with one another, so that they can be used in ASR or other studies to improve efficiency.

In future work, we will improve our models and experimental conditions and use more linguistic events to study the FLs of prosodic boundaries further.

5. Acknowledgements
This research project is supported by Beijing Wu Tong Innovation Platform of Beijing Language and Culture University (the Fundamental Research Funds for the Central Universities) (16PT05). The asterisked author is the corresponding author.

6. References
[1] S. Wang, “The measurement of functional load,” Phonetica, vol. 16, no. 1, pp. 36-54, 1967.
[2] D. Surendran and G. A. Levow, “The functional load of tone in Mandarin is as high as that of vowels,” in Speech Prosody 2004, International Conference, 2004.
[3] J. Zhang, L. Wei, Y. Hou, W. Cao, and Z. Xiong, “A study on functional loads of phonetic contrasts under context based on mutual information of Chinese text and phonemes,” in 2010 7th International Symposium on Chinese Spoken Language Processing (ISCSLP), IEEE, 2010.
[4] B. Wu, Y. Xie, and J. Zhang, “A comparison study on contextual modeling for estimating functional loads of phonological contrasts,” in 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), IEEE, 2015.
[5] A. Cutler, D. Dahan, and W. van Donselaar, “Prosody in the comprehension of spoken language: A literature review,” Language and Speech, vol. 40, no. 2, pp. 141-201, 1997.
[6] E. Selkirk, “The prosodic structure of function words,” in Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, pp. 187-214, 1996.
[7] M. E. Beckman and G. Ayers, “Guidelines for ToBI labelling,” The OSU Research Foundation 3, 1997.
[8] A. K. Syrdal, J. Hirschberg, J. McGory, and E. Beckman, “Automatic ToBI prediction and alignment to speed manual labeling of prosody,” Speech Communication, vol. 33, no. 1, pp. 135-151, 2001.
[9] Y. Yang and B. Wang, “Acoustic correlates of hierarchical prosodic boundary in Mandarin,” in Speech Prosody 2002, International Conference, 2002.
[10] J. Zhang, W. Li, Y. Xie, and W. Cao, “A quantitative study on information contribution of prosody phrase boundaries in Chinese speech,” in Speech Prosody 2012, 2012.
[11] E. C. Wit and M. Gillette, “What is linguistic redundancy,” University of Chicago, 1998.
[12] J. Zhang, X. Hu, and S. Nakamura, “Using mutual information criterion to design an efficient phoneme set for Chinese speech recognition,” IEICE Transactions on Information and Systems, vol. 91, no. 3, pp. 508-513, 2008.
[13] A. Li, “Chinese prosody and prosodic labeling of spontaneous speech,” in Speech Prosody 2002, International Conference, 2002.
