A study on functional load of Chinese prosodic boundaries under reduction of
syllable information
Conference Paper · October 2016
DOI: 10.1109/ISCSLP.2016.7918470
                                     Yue Chen, Yanlu Xie, Bin Wu, Jinsong Zhang*
 College of Information Sciences, Beijing Language and Culture University, Beijing 100083, China
                                chenyue_blcu93@126.com, xieyanlu@blcu.edu.cn,
                             wubin7416850@outlook.com, jinsong.zhang@blcu.edu.cn
Abstract

Prosodic boundaries play an important role in the intelligibility and naturalness of speech, and finding a quantitative measure of their importance is an interesting problem. Previous studies have quantified the importance of prosodic boundaries using functional loads (FLs). However, those estimates of the information contribution of prosodic boundaries assumed that all other linguistic events were perfectly correct. In real speech communication, speakers rarely articulate every phonetic event clearly, yet listeners can still correctly recognize the meaning with the help of context. To explore the information contribution of prosodic boundaries in real communication and to investigate how phonetic segments affect that contribution, this paper merges tones and Initials to simulate, to some extent, the reduction of syllable information in real speech, and observes the distributions of the FLs of prosodic boundaries and how the FLs change under this reduction. The results show that prosodic boundaries at different levels have different FLs. When syllable information (tones or Initials) is merged, the FLs of prosodic boundaries increase, and the information contribution of prosodic boundaries to disambiguating sentence meaning becomes more apparent. Merging tones and merging Initials have different effects on the information transmitted by prosodic boundaries.

Index Terms: prosodic boundaries, functional load, tones, Initials

1. Introduction

In speech communication, different linguistic events, including segmental ones such as phonemes and supra-segmental ones such as tones and prosodic phrasing structure, are all used to convey information. Both segmental and supra-segmental events offer phonetic information and help to reduce the uncertainty in language. These linguistic events are coding methods within a language system and are decoded by listeners. In a language, the information in a sentence comes not only from each single linguistic event; a sentence also carries information over and above what is essential.

It is an interesting topic to find a quantitative measure of the importance of linguistic events. In early studies, the information contributions of phonetic events were quantitatively measured using functional loads [1-4]. The measurement of FLs provides a quantified way to rank any phonetic contrasts in a language, which can be applied to many research domains such as speech recognition and language acquisition. Linguistic events with higher FLs make a greater information contribution in communication and deserve more attention in linguistic research.

Among these linguistic events, prosodic boundaries segment the speech chain into small chunks and make it convenient for people to organize, produce and perceive language. This segmentation into chunks plays an important role in the intelligibility and naturalness of speech [5]. Selkirk et al. divided prosodic boundaries into different levels based on perception experiments [6, 7].

Modeling the boundaries can be useful for processing human languages by machine. So far, there have been many studies on the analysis and prediction of prosodic boundaries [8, 9], but few quantitatively measure the importance of prosodic boundaries in communication. Zhang et al. [10] proposed a novel method to estimate the functional loads (FLs) of prosodic boundaries based on the mutual information (MI) of text transcriptions and their phoneme representations. They found that prosodic boundaries carry information, and much more than phonetic segments do.

However, that early estimate of the information contribution of prosodic boundaries assumed that all other linguistic events were perfectly correct. In real communication, speakers rarely articulate every phonetic event, such as tones and Initials, clearly, and so cannot transmit information to listeners accurately, yet listeners can still recognize the meaning from the context. This may be because of language redundancy [11]: in a language, two or more features may serve the same function. When some syllable information is lost, prosodic boundaries may play a greater role in semantic disambiguation. We therefore want to explore the information contribution of prosodic boundaries in ordinary communication.

This paper aims to study further how prosodic boundaries function in communication and how the merger of phonetic segments affects the information contribution of prosodic boundaries. We merged tones and some Initials to simulate mispronunciation in real communication, and observed how the distribution of the prosodic boundaries' FLs changes before and after the mergers. Here we measured the FL of every single boundary, instead of estimating the FLs of the boundaries when they all work together [10], in order to see how a single boundary works and how the FLs of boundaries at different levels are distributed.

The rest of the paper is organized as follows: Section 2 introduces the method for estimating the FL of a boundary. Section 3 describes the experimental setups and results. Section 4 concludes the study and suggests future directions.
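As a rough illustration of the merger idea described in the Introduction (it is not the estimation procedure of this paper), the following sketch counts how many word candidates share one transcription before and after tones or Initials are merged. The toy lexicon, the English glosses, the function names and the single merger group (b p) are all assumptions made for illustration; the simple prefix test would also need care for two-letter Initials such as "zh".

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: (English gloss, pinyin syllable with tone digit).
LEXICON = [
    ("buy", "mai3"), ("sell", "mai4"),
    ("mother", "ma1"), ("horse", "ma3"), ("scold", "ma4"),
    ("dad", "ba4"), ("fear", "pa4"), ("eight", "ba1"),
]


def merge_tones(syllable):
    """Remove the trailing tone digit so that all tones collapse into one."""
    return re.sub(r"[1-5]$", "", syllable)


def merge_initials(syllable, group=("b", "p")):
    """Map every Initial in `group` to the first member of the group."""
    for initial in group:
        if syllable.startswith(initial):
            return group[0] + syllable[len(initial):]
    return syllable


def candidates(lexicon, transform=lambda s: s):
    """Group words by their (possibly merged) transcription."""
    table = defaultdict(list)
    for word, syllable in lexicon:
        table[transform(syllable)].append(word)
    return dict(table)


if __name__ == "__main__":
    conditions = {
        "no merger": lambda s: s,
        "tone merger": merge_tones,
        "Initial merger": merge_initials,
        "both mergers": lambda s: merge_tones(merge_initials(s)),
    }
    for name, transform in conditions.items():
        groups = candidates(LEXICON, transform)
        # Fewer distinct transcriptions means more words per transcription,
        # i.e. more ambiguity left for context or boundaries to resolve.
        print(name, len(groups), groups)
```

In this toy lexicon the eight distinct transcriptions collapse to four when tones are removed, to seven when only b and p are merged, and to three when both mergers apply, which mirrors the intuition that reduced syllable information leaves more ambiguity for other cues to resolve.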
2. FLs of Prosodic Boundaries

2.1. Mutual information (MI) of text transcriptions and phoneme representations

Mutual information (MI) quantifies the amount of information obtained about one random variable through another random variable. The mutual information of text (W) and phonemes (F) is defined as MI(W, F):

    MI(W, F) = H(F) - H(F \mid W)    (1)

The word sequences W_1, W_2, ..., W_m of a phoneme transcription are taken from the WHG, and MI(W, F) can be calculated as:

    MI(W, F) = -\frac{1}{n} \sum_{i=1}^{n} \log P(W_i)    (2)

Here W_i stands for all the word sequences that have the same phoneme transcription. The larger the MI, the more closely W and F are related and the more certain F is.

2.2. Word Hypothesis Graph (WHG)

Figure 1: An example of a partial WHG for the phonetic transcription “bu ru dong wu”.

In Figure 1, all the word sequences W’ have the same phoneme transcription F. The more W’ that F has, the more uncertain the utterance is and the smaller the MI is.

2.3. FLs of prosodic boundaries

In the previous study, FL based on MI was defined as:

    FL(\alpha) = \frac{MI(W, F) - MI(W, F_\alpha)}{MI(W, F)}    (3)

where \alpha represents the merger of a phonemic contrast. FL is thus the relative loss of mutual information before and after the merger of \alpha. In this paper, FL is defined as follows:

    FL(\alpha) = \frac{MI(W, F_\alpha) - MI(W, F)}{MI(W, F)}    (4)

Here \alpha represents a prosodic boundary. When a boundary appears, the WHG shrinks to fewer word sequences W’, so the MI increases and the uncertainty of F decreases. The FL therefore represents the reduction of uncertainty brought about by the prosodic boundary.

Unlike the method of the early study, in this paper we estimated the FL of every single syllable boundary instead of estimating the overall FL of all boundaries at the same level.

3. Experiment and results

3.1. Experimental data

The training corpus is from the People Daily newspaper. After word segmentation, we used the SRILM toolkit to train a bigram and a trigram word-based language model (LM). The segmentation of our test corpus and its WHGs are all based on that LM.

The test corpus is ASCCD, a Chinese natural speech corpus. It has a total of 8768 Chinese characters, which were converted to pinyin transcriptions. After word segmentation, the prosodic boundaries in the corpus were labeled 1, 2 or 3 based on C-TOBI [12]; the syllable boundaries within prosodic words were labeled 0, and we call them non-boundaries. The labels are shown in Table 1. In total, we obtained 422 sentences, 5162 NB, 1893 PWB, 692 PPB and 599 IPB.

Table 1. The labels of boundaries.

    Label    Level
    0        Non-boundaries (NB)
    1        Prosodic word boundaries (PWB)
    2        Prosodic phrase boundaries (PPB)
    3        Intonation phrase boundaries (IPB)

3.2. Experimental design

In this paper, we first used the previous method to estimate the FLs of some lexical tones and of prosodic boundaries. Then we used our new method to estimate the FL of every single syllable boundary. We merged tones and Initials separately to simulate the mispronunciation of tones and Initials in real speech communication. We hoped to see how the FLs of boundaries at different levels are distributed and how the merger of syllable information affects the FLs of boundaries. The control variables are shown in detail in Table 2. The Initial pairs for merger are classified by phonological features such as nasality and aspiration [9]. When we merged tones, we removed all the tones.

Table 2. Merger groups of phonetic contrasts.

    Contrast    Merger groups
    Initial     (b m) (d n)
                (b p) (d t) (c z) (j q) (ch zh) (g k)
                (f b) (d z s) (q x) (zh sh) (g h)
                (b d g) (z j zh) (p t k) (m n) (c q ch) (f s x h sh)
    Tone        (Tone 1,2,3,4,5)

Table 3. Setups for estimating FLs of prosodic boundaries.

    No.    Tone    Initial    LM
    1      N       N          bigram
    2      N       M          bigram
    3      M       N          bigram
    4      M       M          bigram

Table 3 shows the control groups in the experiments. M means merger and N means no merger. The first group has no merger of syllable information and the fourth group merged both tones
and Initials. We estimated FLs based on both the bigram and the trigram LM; the results were similar, so only the results using the bigram LM are shown here.

3.3. Results and analysis

3.3.1. Comparison between means of FLs of different kinds of boundaries

In this experiment, we first used the previous method to estimate the FLs when the boundaries at the same level worked together. Figure 2 shows the FLs of prosodic boundaries, together with those of the lexical tone pairs Tone 1 and 4, and Tone 2 and 4.

We can see from Figure 2 that, in the corpus, both prosodic boundaries and tones carry information, and boundaries carry more.

Figure 2: FLs of prosodic boundaries and partial tone pairs.

Then we computed the FL of every single boundary. The means of the FLs of boundaries at different levels and under different conditions are shown in Table 4. We can see that, under all conditions, the FLs of non-boundaries are larger than those of prosodic boundaries. We conducted a statistical analysis and found that the difference between the FLs of non-boundaries and boundaries is statistically significant, whereas the differences among boundaries of different levels are not.

Although the FLs of boundaries at different levels do not differ significantly, their means are very different. The larger the label, the smaller the mean FL and the smaller the contribution of the boundary. This agrees with the results in Figure 2.

Table 4. Means of FLs of boundaries in different levels and different conditions.

    Label    No merger    Tone merger    Initial merger    Both merger
    0        0.06107      0.04507        0.05602           0.038345
    1        0.00278      0.00214        0.00261           0.00206
    2        0.00110      0.00132        0.00110           0.00131
    3        0.00013      0.00031        0.00013           0.00037

3.3.2. FLs of boundaries under the reduction of syllable information

In the experiments on how the reduction of syllable information affects the FLs of prosodic boundaries, we set two control variables, tones and Initials, and observed the changes before and after the merger of tones and Initials.

Figure 4 shows the distribution of the boundaries’ FLs after the merger of tones. Compared with Figure 3, the number of boundaries whose FL is greater than 0 increased significantly and, from an overall perspective, the distribution of the boundaries’ FLs at each level became clearer and more stable.

The quantitative values of the boundaries’ FLs are restricted to a large extent by the other linguistic information in the sentence and by the LM. When the other phonetic information is exactly right, the information contributions of different linguistic events may overlap because of the redundancy of language; the importance of the boundaries then cannot be shown clearly and the quantitative value of the FL may be zero. This does not mean that the boundaries carry no information.

Figure 3: Distribution of boundaries’ FLs without merger of syllable information. [Plot titled "No Merger"; x-axis: FLs; y-axis: Frequency; series: boundary labels 1, 2, 3.]

Figure 3 shows the distribution of the boundaries’ FLs without merger of syllable information. The numbers of boundaries at the different levels differed markedly, and many boundaries had an FL of zero, so we chose 599 samples of each kind of boundary for analysis and did not show the boundaries whose FL is zero in the figure. The same applies to the figures below.

When tones were merged, most of the boundaries’ FLs rose to varying degrees, and the proportion of boundaries whose FL is zero dropped by about 30 percentage points at each level.

Table 5. Proportion of boundaries whose FL is zero in different levels, before and after merger of tones.

    Label    Before merger of tones    After merger of tones
    1        0.875858                  0.597993
    2        0.867052                  0.559249
    3        0.899833                  0.597663

Figure 4: Distribution of boundaries’ FLs after merger of tones. [Plot titled "Merger of Tones"; x-axis: FLs; y-axis: Frequency; series: boundary labels 1, 2, 3.]

Table 6. Proportion of boundaries whose FL is zero in different levels, before and after merger of Initials.

    Label    Before merger of Initials    After merger of Initials
    1        0.875858                     0.874274
    2        0.867052                     0.867052
    3        0.899833                     0.898164

Figure 5: Distribution of boundaries’ FLs after merger of Initials. [Plot titled "Merger of Initials"; x-axis: FLs; y-axis: Frequency; series: boundary labels 1, 2, 3.]

However, the results changed little after the merger of Initials alone. We therefore inferred that, in our corpus, Initials have little influence on the FLs of prosodic boundaries.

Table 7. Proportion of boundaries whose FL is zero in different levels, before and after merger of tones and Initials.

    Label    Before merger of tones & Initials    After merger of tones & Initials
    1        0.875858                             0.597464
    2        0.867052                             0.560694
    3        0.899833                             0.595993

Figure 6: Distribution of boundaries’ FLs after merger of tones and Initials. [Plot titled "Merger of Tones and Initials"; x-axis: FLs; y-axis: Frequency; series: boundary labels 1, 2, 3.]

When tones and Initials were merged at the same time, the results were very similar to those for the merger of tones.

3.4. Discussion

Based on the experimental results, we can observe that:

• The FLs of syllable boundaries within prosodic words and the FLs of prosodic boundaries differ significantly. This could be taken into account in the prediction of prosodic boundaries.

• Table 4 shows that the FLs of prosodic boundaries at different levels are, to a small extent, correlated with their levels in the prosodic hierarchy: PWB > PPB > IPB. They may make different information contributions in communication.

• Figures 3-6 show that, when syllable information is reduced, the FLs of prosodic boundaries increase. From the point of view of language redundancy, prosodic boundaries may serve some of the same functions as syllable information. When the merger of syllable information degrades the speech, one would expect information (and redundancy) to decrease, so the information contribution of prosodic boundaries in communication can be seen more clearly.

• In this corpus, tones have a greater influence on prosodic boundaries than Initials. This may be owing to the limits of the corpus, or to the merger of Initials not affecting enough syllables.

4. Conclusions

In this paper, we explored how the reduction of syllable information affects the FLs of prosodic boundaries by merging tones and Initials. The experimental results showed that phonetic segments affect the information contributions of prosodic boundaries in communication.

Here we estimated the FL of every single boundary and drew some conclusions, which we hope can serve as a reference for the automatic detection of prosodic boundaries. We also hope to find more linguistic events that are redundant with each other, so that they can be used in ASR or in further studies to increase efficiency.

In future work, we will improve our models and experimental conditions and use more linguistic events to study the FLs of prosodic boundaries further.

5. Acknowledgements

This research project is supported by Beijing Wu Tong Innovation Platform of Beijing Language and Culture University (the Fundamental Research Funds for the Central Universities) (16PT05). The asterisked author is the corresponding author.

6. References

[1] S. Wang, “The measurement of functional load,” Phonetica, vol. 16, no. 1, pp. 36-54, 1967.
[2] D. Surendran and G.A. Levow, “The functional load of tone in Mandarin is as high as that of vowels,” in Speech Prosody 2004, International Conference, 2004.
[3] J. Zhang, L. Wei, Y. Hou, W. Cao, and Z. Xiong, “A study on Functional Loads of phonetic contrasts under context based on Mutual Information of Chinese text and phonemes,” in Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on, IEEE, 2010.
[4] B. Wu, Y. Xie, and J. Zhang, “A comparison study on contextual modeling for estimating functional loads of phonological contrasts,” in Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015 International Conference, IEEE, 2015.
[5] A. Cutler, D. Dahan, and W. Van Donselaar, “Prosody in the comprehension of spoken language: A literature review,” Language and Speech, vol. 40, no. 2, pp. 141-201, 1997.
[6] E. Selkirk, “The prosodic structure of function words,” in Signal to syntax: Bootstrapping from speech to grammar in early acquisition, pp. 187-214, 1996.
[7] M.E. Beckman and G. Ayers, “Guidelines for ToBI labelling,” The OSU Research Foundation 3, 1997.
[8] A.K. Syrdal, J. Hirschberg, J. McGory, and E. Beckman, “Automatic ToBI prediction and alignment to speed manual labeling of prosody,” Speech Communication, vol. 33, no. 1, pp. 135-151, 2001.
[9] Y. Yang and B. Wang, “Acoustic correlates of hierarchical prosodic boundary in Mandarin,” in Speech Prosody 2002, International Conference, 2002.
[10] J. Zhang, W. Li, Y. Xie, and W. Cao, “A Quantitative Study on Information Contribution of Prosody Phrase Boundaries in Chinese Speech,” in Speech Prosody 2012, 2012.
[11] E.C. Wit and M. Gillette, “What is linguistic redundancy,” University of Chicago, 1998.
[12] J. Zhang, X. Hu, and S. Nakamura, “Using mutual information criterion to design an efficient phoneme set for Chinese speech recognition,” IEICE Transactions on Information and Systems, vol. 91, no. 3, pp. 508-513, 2008.
[13] A. Li, “Chinese prosody and prosodic labeling of spontaneous speech,” in Speech Prosody 2002, International Conference, 2002.