INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008. ISCA 2008
Keynote Sessions
- Hiroya Fujisaki:
  In search of models in speech communication research. 1-10
- Abeer Alwan:
  Dealing with limited and noisy data in ASR: a hybrid knowledge-based and statistical approach. 11-15
- Joaquin Gonzalez-Rodriguez:
  Forensic automatic speaker recognition: fiction or science? 16-17
- Justine Cassell:
  Modelling rapport in embodied conversational agents. 18-19
Segmentation and Classification
- Kyu Jeong Han, Shrikanth S. Narayanan:
  Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling. 20-23
- Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman:
  Weighted segmental k-means initialization for SOM-based speaker clustering. 24-27
- Shajith Ikbal, Karthik Visweswariah:
  Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering. 28-31
- Kofi Boakye, Oriol Vinyals, Gerald Friedland:
  Two's a crowd: improving speaker diarization by automatically identifying and excluding overlapped speech. 32-35
- Trung Hieu Nguyen, Engsiong Chng, Haizhou Li:
  T-test distance and clustering criterion for speaker diarization. 36-39
- Deepu Vijayasenan, Fabio Valente, Hervé Bourlard:
  Integration of TDOA features in information bottleneck framework for fast speaker diarization. 40-43
Speech Coding
- V. Ramasubramanian, D. Harish:
  Low complexity near-optimal unit-selection algorithm for ultra low bit-rate speech coding based on n-best lattice and Viterbi search. 44
- Vaclav Eksler, Redwan Salami, Milan Jelinek:
  A new fast algebraic fixed codebook search algorithm in CELP speech coding. 45-48
- Hao Xu, Changchun Bao:
  A novel transcoding algorithm between 3GPP AMR-NB (7.95 kbit/s) and ITU-T G.729A (8 kbit/s). 49-52
- Amr H. Nour-Eldin, Peter Kabal:
  Mel-frequency cepstral coefficient-based bandwidth extension of narrowband speech. 53-56
- Jean-Luc Garcia, Claude Marro, Balázs Kövesi:
  A PCM coding noise reduction for ITU-T G.711.1. 57-60
- Marcel Wältermann, Kirstin Scholz, Sebastian Möller, Lu Huo, Alexander Raake, Ulrich Heute:
  An instrumental measure for end-to-end speech transmission quality based on perceptual dimensions: framework and realization. 61-64
Human Conversation and Communication
- Benno Peters, Hartmut R. Pfitzinger:
  Duration and F0 interval of utterance-final intonation contours in the perception of German sentence modality. 65-68
- Bettina Braun, Lara Tagliapietra, Anne Cutler:
  Contrastive utterances make alternatives salient - cross-modal priming evidence. 69
- Masato Ishizaki, Yasuharu Den, Senshi Fukashiro:
  Exploring a mechanism of speech synchronization using auditory delayed experiments. 70-73
- Heather Pon-Barry:
  Prosodic manifestations of confidence and uncertainty in spoken language. 74-77
- Raquel Fernández, Matthew Frampton, John Dowding, Anish Adukuzhiyil, Patrick Ehlen, Stanley Peters:
  Identifying relevant phrases to summarize decisions in spoken meetings. 78-81
- Kornel Laskowski, Tanja Schultz:
  Recovering participant identities in meetings from a probabilistic description of vocal interaction. 82-85
OzPhon08 - Phonetics and Phonology of Australian Aboriginal Languages (Special Session)
- Janet Fletcher, Deborah Loakes, Andrew Butcher:
  Coarticulation in nasal and lateral clusters in Warlpiri. 86-89
- Deborah Loakes, Andrew Butcher, Janet Fletcher, Hywel Stoakes:
  Phonetically prestopped laterals in Australian languages: a preliminary investigation of Warlpiri. 90-93
- John Ingram, Mary Laughren, Jeff Chapman:
  Connected speech processes in Warlpiri. 94
- Christina Pentland:
  Consonant enhancement in Lamalama, an initial-dropping language of Cape York Peninsula, North Queensland. 95
- Myfany Turpin:
  Text, rhythm and metrical form in an Aboriginal song series. 96-98
Acoustic Activity Detection, Pitch Tracking and Analysis
- Kentaro Ishizuka, Shoko Araki, Tatsuya Kawahara:
  Statistical speech activity detection based on spatial power distribution for analyses of poster presentations. 99-102
- Sang-Ick Kang, Ji-Hyun Song, Kye-Hwan Lee, Yun-Sik Park, Joon-Hyuk Chang:
  A statistical model-based voice activity detection employing minimum classification error technique. 103-106
- Hongfei Ding, Koichi Yamamoto, Masami Akamine:
  Comparative evaluation of different methods for voice activity detection. 107-110
- Soheil Shafiee, Farshad Almasganj, Ayyoob Jafari:
  Speech/non-speech segments detection based on chaotic and prosodic features. 111-114
- Christian Zieger, Maurizio Omologo:
  Acoustic event classification using a distributed microphone network with a GMM/SVM combined algorithm. 115-118
- Yasunari Obuchi, Masahito Togami, Takashi Sumiyoshi:
  Intentional voice command detection for completely hands-free speech interface in home environments. 119-122
- Taras Butko, Andrey Temko, Climent Nadeu, Cristian Canton-Ferrer:
  Fusion of audio and video modalities for detection of acoustic events. 123-126
- Ron J. Weiss, Trausti T. Kristjansson:
  DySANA: dynamic speech and noise adaptation for voice activity detection. 127-130
- Rico Petrick, Masashi Unoki, Anish Mittal, Carlos Segura, Rüdiger Hoffmann:
  A comprehensive study on the effects of room reverberation on fundamental frequency estimation. 131-134
- Hussein Hussein, Matthias Wolff, Oliver Jokisch, Frank Duckhorn, Guntram Strecha, Rüdiger Hoffmann:
  A hybrid speech signal based algorithm for pitch marking using finite state machines. 135-138
- Yasunori Ohishi, Hirokazu Kameoka, Kunio Kashino, Kazuya Takeda:
  Parameter estimation method of F0 control model for singing voices. 139-142
- Srikanth Vishnubhotla, Carol Y. Espy-Wilson:
  An algorithm for multi-pitch tracking in co-channel speech. 143-146
- Michael Wohlmayr, Franz Pernkopf:
  Multipitch tracking using a factorial hidden Markov model. 147-150
- Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, Yonghong Yan:
  Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping. 151-154
- Philippe Martin:
  Crosscorrelation of adjacent spectra enhances fundamental frequency tracking. 155-158
Single- and Multichannel Speech Enhancement I, II
- Jirí Málek, Zbynek Koldovský, Jindrich Zdánský, Jan Nouza:
  Enhancement of noisy speech recordings via blind source separation. 159-162
- Takaaki Ishibashi, Hidetoshi Nakashima, Hiromu Gotanda:
  Studies on estimation of the number of sources in blind source separation. 163-166
- V. Ramasubramanian, Deepak Vijaywargi:
  Speech enhancement based on hypothesized Wiener filtering. 167-170
- Junfeng Li, Hui Jiang, Masato Akagi:
  Psychoacoustically-motivated adaptive β-order generalized spectral subtraction based on data-driven optimization. 171-174
- Krishna Nand K., T. V. Sreenivas:
  Two stage iterative Wiener filtering for speech enhancement. 175-178
- Pei Ding, Jie Hao:
  Assessment of correlation between objective measures and speech recognition performance in the evaluation of speech enhancement. 179-182
Spoken Language Systems I, II
- Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno:
  Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems. 183-186
- Masaki Katsumaru, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
  Expanding vocabulary for recognizing user's abbreviations of proper nouns without increasing ASR error rates in spoken dialogue systems. 187-190
- Jason D. Williams:
  Exploiting the ASR n-best by tracking multiple dialog state hypotheses. 191-194
- Enes Makalic, Ingrid Zukerman, Michael Niemann:
  A spoken language interpretation component for a robot dialogue system. 195-198
- Federico Cesari, Horacio Franco, Gregory K. Myers, Harry Bratt:
  MUESLI: multiple utterance error correction for a spoken language interface. 199-202
- Sarah Conrod, Sara H. Basson, Dimitri Kanevsky:
  Methods to optimize transcription of on-line media. 203-206
- Akinori Ito, Toyomi Meguro, Shozo Makino, Motoyuki Suzuki:
  Discrimination of task-related words for vocabulary design of spoken dialog systems. 207-210
- Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura:
  Dialog management using weighted finite-state transducers. 211-214
- Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
  Probabilistic answer selection based on conditional random fields for spoken dialog system. 215-218
- Maxine Eskénazi, Alan W. Black, Antoine Raux, Brian Langner:
  Let's go lab: a platform for evaluation of spoken dialog systems with real world users. 219
- Fernando Batista, Nuno J. Mamede, Isabel Trancoso:
  The impact of language dynamics on the capitalization of broadcast news. 220-223
- Matthias Paulik, Alex Waibel:
  Lightly supervised acoustic model training on EPPS recordings. 224-227
- Christophe Servan, Frédéric Béchet:
  Fast call-classification system development without in-domain training data. 228-231
- Björn Hoffmeister, Ralf Schlüter, Hermann Ney:
  iCNC and iROVER: the limits of improving system combination with classification? 232-235
- Stefan Hahn, Patrick Lehnen, Hermann Ney:
  System combination for spoken language understanding. 236-239
Emotion and Expression I, II
- Tomoko Suzuki, Machiko Ikemoto, Tomoko Sano, Toshihiko Kinoshita:
  Multidimensional features of emotional speech. 240
- Narjès Boufaden, Pierre Dumouchel:
  Leveraging emotion detection using emotions from yes-no answers. 241-244
- Thomas John Millhouse, Dianna T. Kenny:
  Vowel placement during operatic singing: 'come si parla' or 'aggiustamento'? 245-248
- Yumiko O. Kato, Yoshifumi Hirose, Takahiro Kamai:
  Study on strained rough voice as a conveyer of rage. 249-252
- Mumtaz Begum, Raja Noor Ainon, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles:
  Integrating rule and template-based approaches for emotional Malay speech synthesis. 253-256
- Carlos Busso, Shrikanth S. Narayanan:
  The expression and perception of emotions: comparing assessments of self versus others. 257-260
- Emiel Krahmer, Marc Swerts:
  On the role of acting skills for the collection of simulated emotional speech. 261-264
- Björn W. Schuller, Matthias Wimmer, Dejan Arsic, Tobias Moosmayr, Gerhard Rigoll:
  Detection of security related affect and behaviour in passenger transport. 265-268
Automatic Speech Recognition: Acoustic Models I-III
- Jinyu Li, Zhi-Jie Yan, Chin-Hui Lee, Ren-Hua Wang:
  Soft margin estimation with various separation levels for LVCSR. 269-272
- Georg Heigold, Patrick Lehnen, Ralf Schlüter, Hermann Ney:
  On the equivalence of Gaussian and log-linear HMMs. 273-276
- Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo:
  Generalization of extended Baum-Welch parameter estimation for discriminative training and decoding. 277-280
- Peng Liu, Frank K. Soong:
  An ellipsoid constrained quadratic programming perspective to discriminative training of HMMs. 281-284
- Dong Yu, Li Deng, Yifan Gong, Alex Acero:
  Discriminative training of variable-parameter HMMs for noise robust speech recognition. 285-288
- Jasha Droppo, Michael L. Seltzer, Alex Acero, Yu-Hsiang Bosco Chiu:
  Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation. 289-292
Accent and Language Identification
- Shona D'Arcy, Martin J. Russell:
  Experiments with the ABI (Accents of the British Isles) speech corpus. 293-296
- Fabio Castaldo, Emanuele Dalmasso, Pietro Laface, Daniele Colibro, Claudio Vair:
  Politecnico di Torino system for the 2007 NIST language recognition evaluation. 297-300
- Valiantsina Hubeika, Lukás Burget, Pavel Matejka, Petr Schwarz:
  Discriminative training and channel compensation for acoustic language recognition. 301-304
- Tingyao Wu, Peter Karsmakers, Hugo Van hamme, Dirk Van Compernolle:
  Comparison of variable selection methods and classifiers for native accent identification. 305-308
- William M. Campbell, Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds:
  A comparison of subspace feature-domain methods for language recognition. 309-312
- Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel:
  Context-dependent phone models and models adaptation for phonotactic language recognition. 313-316
Emotion and Expression I, II
- Martijn Goudbeek, Jean-Philippe Goldman, Klaus R. Scherer:
  Emotions and articulatory precision. 317
- Khiet P. Truong, Mark A. Neerincx, David A. van Leeuwen:
  Assessing agreement of observer- and self-annotations in spontaneous multimodal emotion data. 318-321
- Yoshiko Arimoto, Hiromi Kawatsu, Sumio Ohno, Hitoshi Iida:
  Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems. 322-325
- Shaikh Mostafa Al Masum, M. Khademul Islam Molla, Keikichi Hirose:
  Assigning suitable phrasal tones and pitch accents by sensing affective information from text to synthesize human-like speech. 326-329
- Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl:
  Cross-language study of vocal correlates of affective states. 330-333
- Marc Swerts, Emiel Krahmer:
  Gender-related differences in the production and perception of emotion. 334-337
Special Session: PANZE 2008 - Phonetics and Phonology of Australian and New Zealand English
- Catherine Inez Watson, Margaret Maclagan, Jeanette King, Ray Harlow:
  The English pronunciation of successive groups of Maori speakers. 338-341
- Felicity Cox, Sallyanne Palethorpe:
  Reversal of short front vowel raising in Australian English. 342-345
- Jennifer Price:
  GOOSE on the move: a study of /u/-fronting in Australian news speech. 346
- Andrew Butcher, Victoria Anderson:
  The vowels of Australian Aboriginal English. 347-350
- Robert H. Mannell:
  Perception and production of /i:/, /i@/ and /e:/ in Australian English. 351-354
Speaker Recognition and Diarisation
- Zbynek Zajíc, Lukás Machlica, Ales Padrta, Jan Vanek, Vlasta Radová:
  An expert system in speaker verification task. 355-358
- David Dean, Sridha Sridharan, Patrick Lucey:
  Cascading appearance-based features for visual speaker verification. 359-362
- Konstantin Markov, Satoshi Nakamura:
  Improved novelty detection for online GMM based speaker diarization. 363-366
- Salah Eddine Mezaache, Jean-François Bonastre, Driss Matrouf:
  Analysis of impostor tests with high scores in NIST-SRE context. 367-370
- Anthony Larcher, Jean-François Bonastre, John S. D. Mason:
  Reinforced temporal structure information for embedded utterance-based speaker recognition. 371-374
- Michael Gerber, Beat Pfister:
  Fast search for common segments in speech signals for speaker verification. 375-378
- Girija Chetty, Michael Wagner:
  Audio-visual multilevel fusion for speech and speaker recognition. 379-382
- Jordi Luque, Carlos Segura, Javier Hernando:
  Clustering initialization based on spatial information for speaker diarization of meetings. 383-386
Single- and Multichannel Speech Enhancement I, II
- James G. Lyons, Kuldip K. Paliwal:
  Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement. 387-390
- Stephen So, Kuldip K. Paliwal:
  A long state vector Kalman filter for speech enhancement. 391-394
- Achintya Kundu, Saikat Chatterjee, T. V. Sreenivas:
  Subspace based speech enhancement using Gaussian mixture model. 395-398
- Amit Das, John H. L. Hansen:
  Generalized parametric spectral subtraction using weighted Euclidean distortion. 399-402
- Nobuyuki Miyake, Tetsuya Takiguchi, Yasuo Ariki:
  Sudden noise reduction based on GMM with noise power estimation. 403-406
- Md. Jahangir Alam, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy, Sofia Ben Jebara:
  Speech enhancement using a Wiener denoising technique and musical noise reduction. 407-410
- Kevin W. Wilson, Bhiksha Raj, Paris Smaragdis:
  Regularized non-negative matrix factorization with temporal dependencies for speech denoising. 411-414
- Xin Zou, Peter Jancovic, Münevver Köküer, Martin J. Russell:
  ICA-based MAP speech enhancement with multiple variable speech distribution models. 415-418
- Ron J. Weiss, Michael I. Mandel, Daniel P. W. Ellis:
  Source separation based on binaural cues and source model constraints. 419-422
- Ken'ichi Kumatani, John W. McDonough, Barbara Rauch, Philip N. Garner, Weifeng Li, John Dines:
  Maximum kurtosis beamforming with the generalized sidelobe canceller. 423-426
- Ken'ichi Furuya, Akitoshi Kataoka, Youichi Haneda:
  Noise robust speech dereverberation using constrained inverse filter. 427-430
- Mohsen Rahmani, Ahmad Akbari, Beghdad Ayad:
  A dual microphone coherence based method for speech enhancement in headsets. 431-434
- Ivan Tashev, Slavy Mihov, Tyler Gleghorn, Alex Acero:
  Sound capture system and spatial filter for small devices. 435-438
- Ning Cheng, Wenju Liu, Peng Li, Bo Xu:
  An effective microphone array post-filter in arbitrary environments. 439-442
- Kook Cho, Hajime Okumura, Takanobu Nishiura, Yoichi Yamashita:
  Localization of multiple sound sources based on inter-channel correlation using a distributed microphone system. 443-446
- Heng Zhang, Qiang Fu, Yonghong Yan:
  A frequency domain approach for speech enhancement with directionality using compact microphone array. 447-450
Spoken Language Systems I, II
- Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
  Question and answer database optimization using speech recognition results. 451-454
- Hiroshi Saruwatari, Yu Takahashi, Hiroyuki Sakai, Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Kiyohiro Shikano:
  Development and evaluation of hands-free spoken dialogue system for railway station guidance. 455-458
- Amanda J. Stent, Srinivas Bangalore:
  Statistical shared plan-based dialog management. 459-462
- Ota Herm, Alexander Schmitt, Jackson Liscombe:
  When calls go wrong: how to detect problematic calls based on log-files and emotions? 463-466
- Daniel Gillick, Dilek Hakkani-Tür, Michael Levit:
  Unsupervised learning of edit parameters for matching name variants. 467-470
- Mert Cevik, Fuliang Weng, Chin-Hui Lee:
  Detection of repetitions in spontaneous speech in dialogue sessions. 471-474
- Nathalie Camelin, Géraldine Damnati, Frédéric Béchet, Renato de Mori:
  Automatic customer feedback processing: alarm detection in open question spoken messages. 475-478
- Mithun Balakrishna, Marta Tatu, Dan I. Moldovan:
  Minimal training based semantic categorization in a voice activated question answering (VAQA) system. 479-482
- Blaise Thomson, Milica Gasic, Simon Keizer, François Mairesse, Jost Schatzmann, Kai Yu, Steve J. Young:
  User study of the Bayesian update of dialogue state approach to dialogue management. 483-486
- Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
  Extensibility verification of robust domain selection against out-of-grammar utterances in multi-domain spoken dialogue system. 487-490
- Ea-Ee Jan, Osamuyimen Stewart, Raymond Co, David M. Lubensky:
  Improving large scale alphanumeric string recognition using redundant information. 491-494
- Kris Demuynck, Jan Roelens, Dirk Van Compernolle, Patrick Wambacq:
  SPRAAK: an open source "SPeech Recognition and Automatic Annotation Kit". 495
- Michel Vacher, Anthony Fleury, Jean-François Serignat, Norbert Noury, Hubert Glasson:
  Preliminary evaluation of speech/sound recognition for telemedicine application in a real environment. 496-499
- Markku Turunen, Aleksi Melto, Anssi Kainulainen, Jaakko Hakulinen:
  Mobidic - a mobile dictation and notetaking application. 500-503
- Thomas Hain, Asmaa El Hannani, Stuart N. Wrigley, Vincent Wan:
  Automatic speech recognition for scientific purposes - webASR. 504-507
- Hugo Meinedo, Márcio Viveiros, João Paulo Neto:
  Evaluation of a live broadcast news subtitling system for Portuguese. 508-511
Perception, Production, Discourse and Dialog
- Jonas Beskow, Gösta Bruce, Laura Enflo, Björn Granström, Susanne Schötz:
  Recognizing and modelling regional varieties of Swedish. 512-515
- John Hajek, Mary Stevens:
  Vowel duration, compression and lengthening in stressed syllables in central and southern varieties of standard Italian. 516-519
- Joan K.-Y. Ma, Valter Ciocca, Tara L. Whitehill:
  Acoustic cues for the perception of intonation in Cantonese. 520-523
- Adrian Leemann, Beat Siebenhaar:
  Perception of dialectal prosody. 524-527
- Christian Kroos, Ashlie Dreves:
  Does the McGurk effect rely on processing time constraints? 528
- Takaaki Kuratate, Kathryn Ayers, Jeesun Kim, Denis Burnham:
  Exploring the Uncanny Valley Effect with talking heads. 529
- Knut Kvale, Ragnhild Halvorsrud:
  How do the elderly talk to a natural language call routing system? 530-533
- Ryota Nishimura, Norihide Kitaoka, Seiichi Nakagawa:
  Analysis of relationship between impression of human-to-human conversations and prosodic change and its modeling. 534-537
- Tuomo Saarni, Jussi Hakokari, Jouni Isoaho, Tapio Salakoski:
  Utterance-level normalization for relative articulation rate analysis. 538-541
- Martin I. Tietze, Vera Demberg, Johanna D. Moore:
  Syntactic complexity induces explicit grounding in the Maptask corpus. 542
- Andi Winterboer, Johanna D. Moore, Fernanda Ferreira:
  Do discourse cues facilitate recall in information presentation messages? 543
- Noriko Hattori:
  Structured heterogeneity of English stress variants. 544
- Shota Sato, Taro Kimura, Yasuo Horiuchi, Masafumi Nishida, Shingo Kuroiwa, Akira Ichikawa:
  A method for automatically estimating F0 model parameters and a speech re-synthesis tool using F0 model and STRAIGHT. 545-548
Single-Channel Speech Enhancement
- Anthony P. Stark, Kamil K. Wójcicki, James G. Lyons, Kuldip K. Paliwal:
  Noise driven short-time phase spectrum compensation procedure for speech enhancement. 549-552
- Friedrich Faubel, John W. McDonough, Dietrich Klakow:
  A phase-averaged model for the relationship between noisy speech, clean speech and noise in the log-mel domain. 553-556
- Henk Brouckxon, Werner Verhelst, Bart De Schuymer:
  Time and frequency dependent amplification for speech intelligibility enhancement in noisy environments. 557-560
- Mehdi Mohammadi, Behzad Zamani, Babak Nasersharif, Mohsen Rahmani, Ahmad Akbari:
  A wavelet based speech enhancement method using noise classification and shaping. 561-564
- Md. Jahangir Alam, Douglas D. O'Shaughnessy, Sid-Ahmed Selouani:
  Speech enhancement based on novel two-step a priori SNR estimators. 565-568
- Jun Du, Qiang Huo:
  A speech enhancement approach using piecewise linear approximation of an explicit model of environmental distortions. 569-572
Speech Synthesis Methods I, II
- Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi, Ren-Hua Wang:
  Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge. 573-576
- Yi-Jian Wu, Keiichi Tokuda:
  Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis. 577-580
- Junichi Yamagishi, Zhen-Hua Ling, Simon King:
  Robustness of HMM-based speech synthesis. 581-584
- Alistair Conkie, Ann K. Syrdal, Yeon-Jun Kim, Marc C. Beutnagel:
  Improving preselection in unit selection synthesis. 585-588
- Feng Ding, Jani Nurminen, Jilei Tian:
  Efficient join cost computation for unit selection based TTS systems. 589-592
- Kayoko Yanagisawa, Mark A. Huckvale:
  A phonetic assessment of cross-language voice conversion. 593-596
Speaking Style and Emotion Recognition
- Martin Wöllmer, Florian Eyben, Stephan Reiter, Björn W. Schuller, Cate Cox, Ellen Douglas-Cowie, Roddy Cowie:
  Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies. 597-600
- Dino Seppi, Anton Batliner, Björn W. Schuller, Stefan Steidl, Thurid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Vered Aharonson:
  Patterns, prototypes, performance: classifying emotional user states. 601-604
- Ling He, Margaret Lech, Sheeraz Memon, Nicholas B. Allen:
  Recognition of stress in speech using wavelet analysis and Teager energy operator. 605-608
- Elizabeth Shriberg, Martin Graciarena, Harry Bratt, Andreas Kathol, Sachin S. Kajarekar, Huda Jameel, Colleen Richey, Fred Goodman:
  Effects of vocal effort and speaking style on text-independent speaker verification. 609-612
- Mireia Farrús, Michael Wagner, Jan Anguita, Javier Hernando:
  Robustness of prosodic features to voice imitation. 613-616
- Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps:
  Phonetic and speaker variations in automatic emotion classification. 617-620
Special Session: Cross-Linguistic and Developmental Issues in the Perception and Production of Lexical Tone
- Karen Mattock:
  Infants' native and nonnative tone perception. 621
- Ananthanarayan Krishnan, Jackson T. Gandour, Jayaganesh Swaminathan:
  Language experience dependent plasticity for pitch representation in the human brainstem. 622
- Valter Ciocca, Vivian W.-K. Ip:
  Development of tone perception and tone production in Cantonese-learning children aged 2 to 5 years. 623
- Nan Xu, Denis Burnham:
  Tone hyperarticulation in Cantonese infant-directed speech. 624
- Sabine Zerbian, Etienne Barnard:
  Influences on tone in Sepedi, a southern Bantu language. 625
- Shunichi Ishihara:
  An acoustic-phonetic comparative analysis of Osaka and Kagoshima Japanese tonal phenomena. 626-629
Special Session: Auditory-Inspired Spectro-Temporal Features I, II
- Oriol Vinyals, Gerald Friedland:
  Modulation spectrogram features for improved speaker diarization. 630-633
- Tiago H. Falk, Wai-Yip Chan:
  Spectro-temporal features for robust far-field speaker identification. 634-637
- Siqing Wu, Tiago H. Falk, Wai-Yip Chan:
  Long-term spectro-temporal information for improved automatic speech emotion classification. 638-641
- Yotaro Kubo, Shigeki Okawa, Akira Kurematsu, Katsuhiko Shirai:
  A comparative study on AM and FM features. 642-645
- Maria E. Markaki, Yannis Stylianou:
  Dimensionality reduction of modulation frequency features for speech discrimination. 646-649
- Hideki Kawahara, Masanori Morise, Hideki Banno, Toru Takahashi, Ryuichi Nisimura, Toshio Irino:
  Spectral envelope recovery beyond the Nyquist limit for high-quality manipulation of speech sounds. 650-653
- Hui Yin, Xiang Xie, Jingming Kuang:
  Adaptive-order fractional Fourier transform features for speech recognition. 654-657
- Rico Petrick, Xugang Lu, Masashi Unoki, Masato Akagi, Rüdiger Hoffmann:
  Robust front end processing for speech recognition in reverberant environments: utilization of speech characteristics. 658-661
Speech Coding, Quality Measurement and Auditory Modelling
- Binh Phu Nguyen, Takeshi Shibata, Masato Akagi:
  High-quality analysis/synthesis method based on temporal decomposition for speech modification. 662-665
- Philippe Gournay:
  Improved frame loss recovery using closed-loop estimation of very low bit rate side information. 666-669
- Max F. K. Happel, Simon Müller, Jörn Anemüller, Frank W. Ohl:
  Predictability of STRFs in auditory cortex neurons depends on stimulus class. 670
- Udar Mittal, James P. Ashley, Jonathan Gibbs:
  Higher layer coding of non-speech like signals using factorial pulse codebook. 671-674
- Sriram Ganapathy, Petr Motlícek, Hynek Hermansky, Harinath Garudadri:
  Spectral noise shaping: improvements in speech/audio codec based on linear prediction in spectral domain. 675-678
- Matthew R. Flax, W. Harvey Holmes:
  Introducing the compression wave cochlear amplifier. 679-682
- Matthew R. Flax, W. Harvey Holmes:
  Goldman-Hodgkin-Katz cochlear hair cell models - a foundation for nonlinear cochlear mechanics. 683-686
- Changchun Bao, Hai-ting Li, Ze-xin Liu, Rui Fan, Heng Zhu, Mao-shen Jia, Rui Li:
  An 8.32 kb/s embedded wideband speech coding candidate for ITU-T EV-VBR standardization. 687-690
- Jong Kyu Kim, Seung Seop Park, Chang Woo Han, Nam Soo Kim:
  Decision tree based frame mode selection for AMR-WB+. 695-698
- Wei Ming Liu, Keith A. Jellyman, Nicholas W. D. Evans, John S. D. Mason:
  Assessment of objective quality measures for speech intelligibility. 699-702
- Kirstin Scholz, Christine Kühnel, Marcel Wältermann, Sebastian Möller, Ulrich Heute:
  Assessment of the speech-quality dimension "noisiness" for the instrumental estimation and analysis of telephone-band speech quality. 703-706
- Angel M. Gomez, José L. Carmona, Antonio M. Peinado, Victoria E. Sánchez, José A. González:
  Intelligibility evaluation of Ramsey-derived interleavers for internet voice streaming with the iLBC codec. 707-710
Accent and Language Recognition
- Dau-Cheng Lyu, Ren-Yuan Lyu:
  Language identification on code-switching utterances using multiple cues. 711-714
- Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng:
  Target-oriented phone selection from universal phone set for spoken language recognition. 715-718
- Pedro A. Torres-Carrasquillo, Elliot Singer, William M. Campbell, Terry P. Gleason, Alan McCree, Douglas A. Reynolds, Fred Richardson, Wade Shen, Douglas E. Sturim:
  The MITLL NIST LRE 2007 language recognition system. 719-722
- Pedro A. Torres-Carrasquillo, Douglas E. Sturim, Douglas A. Reynolds, Alan McCree:
  Eigen-channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition. 723-726
- Ignacio López-Moreno, Daniel Ramos, Joaquin Gonzalez-Rodriguez, Doroteo T. Toledano:
  Anchor-model fusion for language recognition. 727-730
- Bo Yin, Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Fang Chen:
  Introducing a FM based feature to hierarchical language identification. 731-734
- Yun Lei, John H. L. Hansen:
  Dialect classification via discriminative training. 735-738
- Pavel Matejka, Lukás Burget, Ondrej Glembek, Petr Schwarz, Valiantsina Hubeika, Michal Fapso, Tomás Mikolov, Oldrich Plchot, Jan Cernocký:
  BUT language recognition system for NIST 2007 evaluations. 739-742
- Ondrej Glembek, Pavel Matejka, Lukás Burget, Tomás Mikolov:
  Advances in phonotactic language recognition. 743-746
- Mahnoosh Mehrabani, John H. L. Hansen:
  Dialect separation assessment using log-likelihood score distributions. 747-750
- Yousef Ajami Alotaibi, Khondaker Abdullah Al Mamun, Muhammad Ghulam:
  Study on unique pharyngeal and uvular consonants in foreign accented Arabic. 751-754
- Fukun Bi, Jian Yang, Dan Xu:
  Automatic accent classification using ensemble methods. 755-758
- Marina Piat, Dominique Fohr, Irina Illina:
  Foreign accent identification based on prosodic parameters. 759-762
- Wade Shen, Nancy F. Chen, Douglas A. Reynolds:
  Dialect recognition using adapted phonetic models. 763-766
- Alan McCree, Fred Richardson, Elliot Singer, Douglas A. Reynolds:
  Beyond frame independence: parametric modelling of time duration in speaker and language recognition. 767-770
Prosody: Prosodic Structure, Paralinguistic, Non-linguistic and Other Cues
- Liz Dockendorf, Dalal Almubayei, Matthew Benton:
  Testing a large corpus of natural standard Arabic for rhythm class. 771
- Matthew Benton, Liz Dockendorf:
  A comparison of two acoustic measurement approaches to the rhythm continuum of natural Chinese and English speech. 772-775
- Tomoko Nariai, Kazuyo Tanaka:
  A study of pitch patterns of Japanese English analyzed via comparative linguistic features of English and Japanese. 776-779
- Cécile Woehrling, Philippe Boula de Mareüil, Martine Adda-Decker, Lori Lamel:
  A corpus-based prosodic study of Alsatian, Belgian and Swiss French. 780-783
- Mitsuhiro Nakamura:
  Prosodic position effects and function words in English: a pilot study. 784
- Laura E. de Ruiter:
  How useful are polynomials for analyzing intonation? 785-788
- Qingcai Chen, Shusen Zhou, Dandan Wang, Xiaohong Yang:
  Adaptive filter based prosody modification approach. 789-792
- Swe Zin Kalayar Khine, Tin Lay Nwe, Haizhou Li:
  Speech/laughter classification in meeting audio. 793-796
- Mary Tai Knox, Nelson Morgan, Nikki Mirghafori:
  Getting the last laugh: automatic laughter segmentation in meetings. 797-800
- Stuart N. Wrigley, Simon Tucker, Guy J. Brown, Steve Whittaker:
  The influence of audio presentation style on multitasking during teleconferences. 801-804
- Bogdan Vlasenko, Björn W. Schuller, Kinfe Tadesse Mengistu, Gerhard Rigoll, Andreas Wendemuth:
  Balancing spoken content adaptation and unit length in the recognition of emotion and interest. 805-808
- Emiel Krahmer, Juliette Schaafsma, Marc Swerts, Ad Vingerhoets:
  Nonverbal responses to social inclusion and exclusion. 809-812
- Tatsuya Kitamura:
  Acoustic analysis of imitated voice produced by a professional impersonator. 813-816
- Sanjay A. Patil, John H. L. Hansen:
  Detection of speech under physical stress: model development, sensor selection, and feature fusion. 817-820
Automatic Speech Recognition: Language Models I, II
- Langzhou Chen, Hisayoshi Nagae, Matthew N. Stuttle:
Improving Japanese language models using POS information. 821-824 - Ebru Arisoy, Brian Roark, Izhak Shafran, Murat Saraclar:
Discriminative n-gram language modeling for Turkish. 825-828 - Ahmad Emami, Imed Zitouni, Lidia Mangu:
Rich morphology based n-gram language models for Arabic. 829-832 - Songfang Huang, Steve Renals:
Unsupervised language model adaptation based on topic and role information in multiparty meetings. 833-836 - Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Context dependent language model adaptation. 837-840 - Bo-June Paul Hsu, James R. Glass:
Iterative language model estimation: efficient data structure & algorithms. 841-844
Speaker Identification and Verification
- Sachin S. Kajarekar:
Phone-based cepstral polynomial SVM system for speaker recognition. 845-848 - Donglai Zhu, Bin Ma, Haizhou Li:
Using MAP estimation of feature transformation for speaker recognition. 849-852 - Robbie Vogt, Brendan Baker, Sridha Sridharan:
Factor analysis subspace estimation for speaker verification with short utterances. 853-856 - Mitchell McLaren, Driss Matrouf, Robbie Vogt, Jean-François Bonastre:
Combining continuous progressive model adaptation and factor analysis for speaker verification. 857-860 - Chia-Hsin Hsieh, Chung-Hsien Wu, Han-Ping Shen:
Adaptive decision tree-based phone cluster models for speaker clustering. 861-864 - Hagai Aronowitz, Yosef A. Solewicz:
Speaker recognition in two-wire test sessions. 865-868
Prosodic Structure and Processing
- Jason B. Bishop:
The effect of position on the realization of second occurrence focus. 869-872 - Yen-Liang Shue, Stefanie Shattuck-Hufnagel, Markus Iseli, Sun-Ah Jun, Nanette Veilleux, Abeer Alwan:
Effects of intonational phrase boundaries on pitch-accented syllables in American English. 873-876 - Michael Walsh, Katrin Schweitzer, Bernd Möbius, Hinrich Schütze:
Examining pitch-accent variability from an exemplar-theoretic perspective. 877-880 - Jussi Hakokari, Tuomo Saarni, Jouni Isoaho, Tapio Salakoski:
Correlation of utterance length and segmental duration in Finnish is questionable. 881-884 - Jiahong Yuan, Stephen Isard, Mark Liberman:
Different roles of pitch and duration in distinguishing word stress in English. 885 - Maria O'Reilly, Ailbhe Ní Chasaide, Christer Gobl:
Cross-dialect Irish prosody: linguistic constraints on Fujisaki modelling. 886-889
Special Session: Auditory-Inspired Spectro-Temporal Features I, II
- Garimella S. V. S. Sivaram, Hynek Hermansky:
Introducing temporal asymmetries in feature extraction for automatic speech recognition. 890-893 - Martin Heckmann, Xavier Domont, Frank Joublin, Christian Goerick:
A closer look on hierarchical spectro-temporal features (HIST). 894-897 - Sherry Y. Zhao, Nelson Morgan:
Multi-stream spectro-temporal features for robust speech recognition. 898-901 - Huan Wang, David Gelbart, Hans-Günter Hirsch, Werner Hemmert:
The value of auditory offset adaptation and appropriate acoustic modeling. 902-905 - Bernd T. Meyer, Birger Kollmeier:
Optimization and evaluation of Gabor feature sets for ASR. 906-909
Automatic Speech Recognition: Acoustic Models I-III
- Peter Bell, Simon King:
A shrinkage estimator for speech recognition with full covariance HMMs. 910-913 - Peter Bell, Simon King:
Covariance updates for discriminative training by constrained line search. 914 - Brian Mak, Tom Ko:
Min-max discriminative training of decoding parameters using iterative linear programming. 915-918 - Daniel Willett, Chuang He:
Discriminative training for complementariness in system combination. 919 - George Saon, Daniel Povey:
Penalty function maximization for large margin HMM training. 920-923 - Daniel Bolaños, Wayne H. Ward:
Implicit state-tying for support vector machines based speech recognition. 924-927 - Guillermo Aradilla, Hervé Bourlard, Mathew Magimai-Doss:
Using KL-based acoustic models in a large vocabulary recognition task. 928-931 - Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Acoustic modeling based on model structure annealing for speech recognition. 932-935 - Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition. 936-939 - Jitendra Ajmera, Masami Akamine:
Speech recognition using soft decision trees. 940-943 - Yu Shi, Frank Seide, Frank K. Soong:
GPU-accelerated Gaussian clustering for fMPE discriminative training. 944-947 - Yasser Hifny, Yuqing Gao:
Discriminative training using the trusted expectation maximization. 948-951 - Jui-Ting Huang, Mark Hasegawa-Johnson:
Maximum mutual information estimation with unlabeled data for phonetic classification. 952-955 - Vivek Tyagi:
Maximum accept and reject (MARS) training of HMM-GMM speech recognition systems. 956-959 - Sundararajan Srinivasan, Tao Ma, Daniel May, Georgios Y. Lazarou, Joseph Picone:
Nonlinear mixture autoregressive hidden Markov models for speech recognition. 960-963 - Patrick Cardinal, Pierre Dumouchel, Gilles Boulianne, Michel Comeau:
GPU accelerated acoustic likelihood computations. 964-967
Robust Automatic Speech Recognition I-III
- Masato Nakayama, Takanobu Nishiura, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura:
CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments. 968-971 - Masanori Tsujikawa, Takayuki Arakawa, Ryosuke Isotani:
In-car speech recognition using model-based Wiener filter and multi-condition training. 972-975 - Marco Kühne, Roberto Togneri, Sven Nordholm:
Adaptive beamforming and soft missing data decoding for robust speech recognition in reverberant environments. 976-979 - Bagher BabaAli, Hossein Sameti, Mehran Safayani:
Spectral subtraction in likelihood-maximizing framework for robust speech recognition. 980-983 - Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Front-end for far-field speech recognition based on frequency domain linear prediction. 984-987 - Ji Hun Park, Jae Sam Yoon, Hong Kook Kim:
Mask estimation incorporating time-frequency trajectories for a CASA-based ASR front-end. 988-991 - Toru Takahashi, Shun'ichi Yamamoto, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Soft missing-feature mask generation for simultaneous speech recognition system in robots. 992-995 - Dong Wang, Ivan Himawan, Joe Frankel, Simon King:
A posterior approach for microphone array based speech recognition. 996-999 - Yu-Hsiang Bosco Chiu, Richard M. Stern:
Analysis of physiologically-motivated signal processing for robust speech recognition. 1000-1003 - Liang-Che Sun, Chang-Wen Hsu, Lin-Shan Lee:
Evaluation of modulation spectrum equalization techniques for large vocabulary robust speech recognition. 1004-1007 - Yi Chen, Chia-Yu Wan, Lin-Shan Lee:
Confusion-based entropy-weighted decoding for robust speech recognition. 1008-1011 - Svein Gunnar Pettersen, Magne Hallstein Johnsen:
Cepstral domain voice activity detection for improved noise modeling in MMSE feature enhancement for ASR. 1012-1015 - Carlos Molina, Néstor Becerra Yoma, Fernando Huenupán, Claudio Garretón:
Unsupervised re-scoring of observation probability based on maximum entropy criterion by using confidence measure with telephone speech. 1016-1019 - Yuan-Fu Liao, Chi-Hui Hsu, Chi-Min Yang, Jeng-Shien Lin, Sen-Chia Chang:
Within-class feature normalization for robust speech recognition. 1020-1023 - Zheng-Hua Tan, Børge Lindberg:
A posteriori SNR weighted energy based variable frame rate analysis for speech recognition. 1024-1027 - Chih-Cheng Wang, Chi-an Pan, Jeih-Weih Hung:
Silence feature normalization for robust speech recognition in additive noise environments. 1028-1031 - Longbiao Wang, Seiichi Nakagawa, Norihide Kitaoka:
Blind dereverberation based on CMN and spectral subtraction by multi-channel LMS algorithm. 1032-1035
Speech Analysis and Processing, Voice Conversion and Modification
- Hartmut R. Pfitzinger, Christian Kaernbach:
Amplitude and amplitude variation of emotional speech. 1036-1039 - Nitish Krishnamurthy, Ayako Ikeno, John H. L. Hansen:
Babble speech: acoustic and perceptual variability. 1040-1043 - Yannis Pantazis, Olivier Rosec, Yannis Stylianou:
On the properties of a time-varying quasi-harmonic model of speech. 1044-1047 - Wenliang Lu, Deep Sen:
Extraction and tracking of formant response jitter in the cochlea for objective prediction of SB/SF DAM attributes. 1048-1051 - David P. Messing, Lorraine Delhorne, Ed Bruckert, Louis D. Braida, Oded Ghitza:
Consonant discrimination of degraded speech using an efferent-inspired closed-loop cochlear model. 1052-1055 - Vikrant Tomar, Hemant A. Patil:
On the development of variable length Teager energy operator (VTEO). 1056-1059 - Yu Qiao, Nobuaki Minematsu:
Metric learning for unsupervised phoneme segmentation. 1060-1063 - Ozlem Kalinli, Shrikanth S. Narayanan:
Combining task-dependent information with auditory attention cues for prominence detection in speech. 1064-1067 - Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda:
Probabilistic feature mapping based on trajectory HMMs. 1068-1071 - Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Keiichi Tokuda:
Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching. 1072-1075 - Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory. 1076-1079 - Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
An improved one-to-many eigenvoice conversion system. 1080-1083 - Yoshinori Uchimura, Hideki Banno, Fumitada Itakura, Hideki Kawahara:
Study on manipulation method of voice quality based on the vocal tract area function. 1084-1087 - Arthur R. Toth, Alan W. Black:
Incorporating durational modification in voice transformation. 1088-1091 - Amy Dashiell, Brian Hutchinson, Anna Margolis, Mari Ostendorf:
Non-segmental duration feature extraction for prosodic classification. 1092-1095
Special Session: Tonality in Production and Perception, Language in Australia and New Zealand
- Hongying Zheng, William S.-Y. Wang:
An ERP study on categorical perception of lexical tones and nonspeech pitches. 1096 - Takashi Otake, Marii Higuchi:
The role of Japanese pitch accent in spoken-word recognition: evidence from middle-aged accentless dialect listeners. 1097-1100 - Siwei Wang, Gina-Anne Levow:
Mandarin Chinese tone nucleus detection with landmarks. 1101-1104 - Weixiang Hu, Jin Jian, Aijun Li, Xia Wang:
A comparative study on dissyllabic stress patterns of Mandarin and Cantonese. 1105-1108 - Rerrario Shui-Ching Ho, Yoshinori Sagisaka:
Three-sectional-staff characterization of Cantonese level tones. 1109-1112 - Xiaonong Zhu, Caicai Zhang:
A seven-tone dialect in southern China with falling-rising-falling contour: a linguistic acoustic analysis. 1113-1115 - Santitham Prom-on:
Pitch target analysis of Thai tones using quantitative target approximation model and unsupervised clustering. 1116-1119 - Connie K. So, Catherine T. Best:
Do English speakers assimilate Mandarin tones to English prosodic categories? 1120 - Rikke L. Bundgaard-Nielsen, Catherine T. Best, Michael D. Tyler, Christian Kroos:
Evidence of a near-merger in western Sydney Australian English vowels. 1121 - Marija Tabain, Kristine Rickard, Gavan Breen, Veronica Dobson:
Central vowels in Arrernte: metrical prominence and pitch accent. 1122 - Bella Ross:
Pausing and phrase length in two Australian languages. 1123 - Mary Stevens, John Hajek:
Positional effects on the characterization of ejectives in Waima'a. 1124-1127 - Donna Starks, Laura Thompson, Catherine Inez Watson:
A Niuean variant of New Zealand English? 1128
Automatic Speech Recognition: Tone Languages
- Guo-Hong Ding:
Phonetic confusion analysis and robust phone set generation for Shanghai-accented Mandarin speech recognition. 1129-1132 - Yu Ting Yeung, Yao Qian, Tan Lee, Frank K. Soong:
Prosody for Mandarin speech recognition: a comparative study of read and spontaneous speech. 1133-1136 - Li-Wei Cheng, Lin-Shan Lee:
Improved large vocabulary Mandarin speech recognition by selectively using tone information with a two-stage prosodic model. 1137-1140 - Tingting Ru, Xiang Xie, Hui Yin, Jingming Kuang:
Mandarin connected digits recognition for whispered speech. 1141-1144 - Changchun Bao, Weiqun Xu, Yonghong Yan:
Recognizing named entities in spoken Chinese dialogues with a character-level maximum entropy tagger. 1145-1148 - Hong Quang Nguyen, Pascal Nocera, Eric Castelli, Van Loan Trinh:
A novel approach in continuous speech recognition for Vietnamese, an isolating tonal language. 1149-1152
Spoken Dialogue Systems
- Blaise Thomson, Kai Yu, Milica Gasic, Simon Keizer, François Mairesse, Jost Schatzmann, Steve J. Young:
Evaluating semantic-level confidence scores with multiple hypotheses. 1153-1156 - Geoffrey Zweig, Dan Bohus, Xiao Li, Patrick Nguyen:
Structured models for joint decoding of repeated utterances. 1157-1160 - Marie-Jean Meurs, Fabrice Lefèvre, Renato de Mori:
A Bayesian approach to semantic composition for spoken language interpretation. 1161-1164 - Tim Paek, Yun-Cheng Ju:
Accommodating explicit user expressions of uncertainty in voice search or something like that. 1165-1168 - Dongho Kim, Hyeong Seop Sim, Kee-Eung Kim, Jin Hyung Kim, Hyunjeong Kim, Joo Won Sung:
Effects of user modeling on POMDP-based dialogue systems. 1169-1172 - Jason D. Williams:
The best of both worlds: unifying conventional dialog systems and POMDPs. 1173-1176
Cross-Language and Language-Specific Phonetics
- Rikke L. Bundgaard-Nielsen, Catherine T. Best, Michael D. Tyler:
The assimilation of L2 Australian English vowels to L1 Japanese vowel categories: vocabulary size matters. 1177 - Azra Nahid Ali, Mohamed Lahrouchi, Michael Ingleby:
Vowel epenthesis, acoustics and phonology patterns in Moroccan Arabic. 1178-1181 - Gaowu Wang, Jianwu Dang, Jiangping Kong:
Estimation of vocal tract area function for Mandarin vowel sequences using MRI. 1182-1185 - Kimiko Tsukada, Thu T. A. Nguyen:
The effect of first language (L1) dialects on the identification of Vietnamese word-final stops. 1186-1189 - Mark Antoniou, Catherine T. Best, Michael D. Tyler:
Perceptual evidence of modern Greek voiced stops as phonological categories. 1190 - Valérie Hazan, Enid Li:
The effect of auditory and visual degradation on audiovisual perception of native and non-native speakers. 1191-1194
Special Session: Prosody of Spontaneous Speech I, II
- Hansjörg Mixdorff:
Quantitative prosodic analysis of spontaneous speech. 1195 - Anders Lindström, Jessica Villing, Staffan Larsson, Alexander Seward, Nina Åberg, Cecilia Holtelius:
The effect of cognitive load on disfluencies during in-vehicle spoken dialogue. 1196-1199 - Chiu-yu Tseng, Zhao-yu Su:
Discourse prosody context - global F0 and tempo modulations. 1200-1203 - Nicolas Obin, Anne Lacheret-Dujour, Christophe Veaux, Xavier Rodet, Anne-Catherine Simon:
A method for automatic and dynamic estimation of discourse genre typology with prosodic features. 1204-1207 - Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita:
The meanings carried by interjections in spontaneous speech. 1208-1211 - Christian Martyn Jones, Andrew Deeming:
Speech interaction with an emotional robotic dog. 1212-1215 - Keiko Ochi, Keikichi Hirose, Nobuaki Minematsu:
Control of prosodic focus in corpus-based generation of fundamental frequency based on the generation process model. 1216
Automatic Speech Recognition: Adaptation I, II
- Martin Karafiát, Lukás Burget, Thomas Hain, Jan Cernocký:
Discriminative training of narrow band - wide band adapted systems for meeting recognition. 1217-1220 - Seongjun Hahm, Akinori Ito, Shozo Makino, Motoyuki Suzuki:
A fast speaker adaptation method using aspect model. 1221-1224 - Dan Su, Xihong Wu, Huisheng Chi:
Probabilistic latent speaker training for large vocabulary speech recognition. 1225-1228 - Shutaro Tanji, Koichi Shinoda, Sadaoki Furui, Antonio Ortega:
Improvement of eigenvoice-based speaker adaptation by parameter space clustering. 1229-1232 - D. Rama Sanand, Srinivasan Umesh:
Study of Jacobian compensation using linear transformation of conventional MFCC for VTLN. 1233-1236 - Chuan-Wei Ting, Kuo-Yuan Lee, Jen-Tzung Chien:
Adaptive HMM topology for speech recognition. 1237-1240 - Liang-Yu Chen, Chun-Jen Lee, Jyh-Shing Roger Jang:
Minimum phone error discriminative training for Mandarin Chinese speaker adaptation. 1241-1244 - Daniel Povey, Hong-Kwang Jeff Kuo, Hagen Soltau:
Fast speaker adaptive training for speech recognition. 1245-1248
Robust Automatic Speech Recognition I-III
- Yuan-Fu Liao, Hung-Hsiang Fang, Chi-Hui Hsu:
Eigen-MLLR environment/speaker compensation for robust speech recognition. 1249-1252 - Dong Yu, Li Deng, Yifan Gong, Alex Acero:
Parameter clustering and sharing in variable-parameter HMMs for noise robust speech recognition. 1253-1256 - Jun Du, Qiang Huo:
A feature compensation approach using high-order vector Taylor series approximation of an explicit distortion model for noisy speech recognition. 1257-1260 - Xiaodong Cui, Mohamed Afify, Yuqing Gao:
N-best based stochastic mapping on stereo HMM for noise robust speech recognition. 1261-1264 - Yu Tsao, Chin-Hui Lee:
Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process. 1265-1268 - Jianhua Lu, Ji Ming, Roger F. Woods:
Combining noise compensation and missing-feature decoding for large vocabulary speech recognition in noise. 1269-1272 - Svein Gunnar Pettersen:
Joint Bayesian predictive classification and parallel model combination with prior scaling for robust ASR. 1273-1276 - Abhishek Kumar, John H. L. Hansen:
Environment mismatch compensation using average eigenspace for speech recognition. 1277-1280 - Daniel Povey, Brian Kingsbury:
Monte Carlo model-space noise adaptation for speech recognition. 1281-1284 - Ning Ma, Phil D. Green:
A 'speechiness' measure to improve speech decoding in the presence of other sound sources. 1285-1288 - Luis Buera, Antonio Miguel, Oscar Saz, Alfonso Ortega, Eduardo Lleida:
Feature vector normalization with combined standard and throat microphones for robust ASR. 1289-1292 - Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura:
Phone-duration-dependent long-term dynamic features for a stochastic model-based voice activity detection. 1293-1296 - Yusuke Ijima, Makoto Tachibana, Takashi Nose, Takao Kobayashi:
An on-line adaptation technique for emotional speech recognition using style estimation with multiple-regression HMM. 1297-1300 - Michael Berkovitch, Ilan D. Shallom:
HMM adaptation using statistical linear approximation for robust automatic speech recognition. 1301-1304 - Steven J. Rennie, Pierre L. Dognin:
Beyond linear transforms: efficient non-linear dynamic adaptation for noise robust speech recognition. 1305-1308 - Randy Gomez, Jani Even, Kiyohiro Shikano:
Rapid unsupervised speaker adaptation robust in reverberant environment conditions. 1309-1312
Features for Speech and Speaker Recognition
- Xing Fan, John H. L. Hansen:
Speaker identification for whispered speech based on frequency warping and score competition. 1313-1316 - Tania Habib, Lukas Ottowitz, Marián Képesi:
Experimental evaluation of multi-band position-pitch estimation (m-popi) algorithm for multi-speaker localization. 1317-1320 - N. Dhananjaya, S. Rajendran, B. Yegnanarayana:
Features for automatic detection of voice bars in continuous speech. 1321-1324 - Carlos Segura, Alberto Abad, Javier Hernando, Climent Nadeu:
Speaker orientation estimation based on hybridization of GCC-PHAT and HLBR. 1325-1328 - Jun Hou, Lawrence R. Rabiner, Sorin Dusan:
Parallel and hierarchical speech feature classification using frame and segment-based methods. 1329-1332 - Balakrishnan Varadarajan, Sanjeev Khudanpur:
Automatically learning speaker-independent acoustic subword units. 1333-1336 - Waleed H. Abdulla, Yushi Zhang:
Human-like ears versus two-microphone array, which works better for speaker identification? 1337-1340 - Kenji Kobayashi, Mitsuhiro Somiya, Hiromitsu Nishizaki, Yoshihiro Sekiguchi:
Is a speech recognizer useful for characteristic analysis of classroom lecture speech? 1341-1344 - Ladan Golipour, Douglas D. O'Shaughnessy:
An intuitive class discriminability measure for feature selection in a speech recognition system. 1345-1348 - Yu Qiao, Nobuaki Minematsu:
f-divergence is a generalized invariant measure between distributions. 1349-1352 - Daniele Giacobello, Mads Græsbøll Christensen, Joachim Dahl, Søren Holdt Jensen, Marc Moonen:
Sparse linear predictors for speech processing. 1353-1356 - Johan Xi Zhang, Mads Græsbøll Christensen, Joachim Dahl, Søren Holdt Jensen, Marc Moonen:
Frequency-domain parameter estimations for binary masked signals. 1357-1360 - Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose:
Decomposition of rotational distortion caused by VTL difference using eigenvalues of its transformation matrix. 1361-1364 - Michail G. Maragakis, Alexandros Potamianos:
Region-based vocal tract length normalization for ASR. 1365-1368
Speaker Recognition: Kernel-Based and Session Mismatch
- Hideki Okamoto, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker verification with non-audible murmur segments by combining global alignment kernel and penalized logistic regression machine. 1369-1372 - Liang Lu, Yuan Dong, Xianyu Zhao, Jian Zhao, Chengyu Dong, Haila Wang:
Analysis of subspace within-class covariance normalization for SVM-based speaker verification. 1373-1376 - Xianyu Zhao, Yuan Dong, Jian Zhao, Liang Lu, Jiqing Liu, Haila Wang:
Comparison of input and feature space nonlinear kernel nuisance attribute projections for speaker verification. 1377-1380 - Chris Longworth, Mark J. F. Gales:
A generalised derivative kernel for speaker verification. 1381-1384 - Luciana Ferrer:
Modeling prior belief for speaker verification SVM systems. 1385-1388 - Delphine Charlet, Xianyu Zhao, Yuan Dong:
Convergence between SVM-based and distance-based paradigms for speaker recognition. 1389-1392 - Shi-Xiong Zhang, Man-Wai Mak:
High-level speaker verification via articulatory-feature based sequence kernels and SVM. 1393-1396 - Kong-Aik Lee, Changhuai You, Haizhou Li, Tomi Kinnunen, Donglai Zhu:
Characterizing speech utterances for speaker verification with sequence kernel SVM. 1397-1400 - Patrick Kenny, Najim Dehak, Pierre Ouellet, Vishwa Gupta, Pierre Dumouchel:
Development of the primary CRIM system for the NIST 2008 speaker recognition evaluation. 1401-1404 - Robbie Vogt, Sridha Sridharan, Michael Mason:
Making confident speaker verification decisions with minimal speech. 1405-1408 - Jun Luo, Cheung-Chi Leung, Marc Ferras, Claude Barras:
Parallelized factor analysis and feature normalization for automatic speaker verification. 1409-1412 - Daniel Garcia-Romero, Carol Y. Espy-Wilson:
Intersession variability in speaker recognition: a behind the scene analysis. 1413-1416 - Tatsuya Ito, Kei Hashimoto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Speaker recognition based on variational Bayesian method. 1417-1420 - Driss Matrouf, Jean-François Bonastre, Salah Eddine Mezaache:
Factor analysis multi-session training constraint in session compensation for speaker verification. 1421-1424 - Ying Liu, Martin J. Russell, Michael J. Carey:
The role of 'delta' features in speaker verification. 1425-1428
Broadcast Transcription Systems
- Lori Lamel, Abdelkhalek Messaoudi, Jean-Luc Gauvain:
Investigating morphological decomposition for transcription of Arabic broadcast news and broadcast conversation data. 1429-1432 - Petr Fousek, Lori Lamel, Jean-Luc Gauvain:
Transcribing broadcast data using MLP features. 1433-1436 - Dimitra Vergyri, Arindam Mandal, Wen Wang, Andreas Stolcke, Jing Zheng, Martin Graciarena, David Rybach, Christian Gollan, Ralf Schlüter, Katrin Kirchhoff, Arlo Faria, Nelson Morgan:
Development of the SRI/Nightingale Arabic ASR system. 1437-1440 - Christian Gollan, Hermann Ney:
Towards automatic learning in LVCSR: rapid development of a Persian broadcast transcription system. 1441-1444 - Roger Hsiao, Mark C. Fuhs, Yik-Cheung Tam, Qin Jin, Tanja Schultz:
The CMU-interACT 2008 Mandarin transcription system. 1445-1448 - Anoop Deoras, Jürgen Fritsch:
Decoding-time prediction of non-verbalized punctuation. 1449-1452
Voice Conversion and Modification
- Elina Helander, Jan Schwarz, Jani Nurminen, Hanna Silén, Moncef Gabbouj:
On the impact of alignment on voice conversion performance. 1453-1456 - Arantza del Pozo, Steve J. Young:
The linear transformation of LF glottal waveforms for voice conversion. 1457-1460 - Daisuke Tani, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Maximum a posteriori adaptation for many-to-one eigenvoice conversion. 1461-1463 - Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Christian Jutten:
Improvement to a NAM captured whisper-to-speech system. 1465-1468 - Tran Huy Dat, Haizhou Li:
Speaker identification in noise mismatch conditions based on jump function Kolmogorov analysis in wavelet domain. 1469-1472
Phonetics: General
- Odette Scharenborg:
Modelling fine-phonetic detail in a computational model of word recognition. 1473-1476 - Helmer Strik, Joost van Doremalen, Catia Cucchiarini:
Pronunciation reduction: how it relates to speech style, gender, and age. 1477-1480 - B. Yegnanarayana, S. Rajendran, Hussien Seid Worku, N. Dhananjaya:
Analysis of glottal stops in speech signals. 1481-1484 - Daniel Neiberg, Gopal Ananthakrishnan, Olov Engwall:
The acoustic to articulation mapping: non-linear or non-unique? 1485-1488 - Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis M. Goldstein, Elliot Saltzman:
The entropy of the articulatory phonological code: recognizing gestures from tract variables. 1489-1492
Special Session: Forensic Speaker Recognition - Traditional and Automatic Approaches
- Daniel Ramos, Joaquin Gonzalez-Rodriguez, Javier Gonzalez-Dominguez, Jose Juan Lucena-Molina:
Addressing database mismatch in forensic speaker recognition with Ahumada III: a public real-casework database in Spanish. 1493-1496 - Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Julien Epps:
FM features for automatic forensic speaker recognition. 1497-1500 - Geoffrey Stewart Morrison, Yuko Kinoshita:
Automatic-type calibration of traditionally derived likelihood ratios: forensic analysis of Australian English /o/ formant trajectories. 1501-1504 - Timo Becker, Michael Jessen, Catalin Grigoras:
Forensic speaker verification using formant features and Gaussian mixture models. 1505-1508 - Elizabeth Shriberg, Andreas Stolcke:
The case for automatic higher-level features in forensic speaker recognition. 1509-1512
Automatic Speech Recognition: Features I, II
- Kye-Hwan Lee, Sang-Ick Kang, Ji-Hyun Song, Joon-Hyuk Chang:
Group delay function for improved gender identification. 1513-1516 - Joseph Razik, Odile Mella, Dominique Fohr, Jean Paul Haton:
Frame-synchronous and local confidence measures for on-the-fly automatic speech recognition. 1517-1520 - Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech. 1521-1524 - Abhijeet Sangwan, Ayako Ikeno, John H. L. Hansen:
Evidence of coarticulation in a phonological feature detection system. 1525-1528 - Mohammad Nurul Huda, Kouichi Katsurada, Tsuneo Nitta:
Phoneme recognition based on hybrid neural networks with inhibition/enhancement of distinctive phonetic feature (DPF) trajectories. 1529-1532 - Hongbing Hu, Stephen A. Zahorian:
A neural network based nonlinear feature transformation for speech recognition. 1533-1536 - R. Ramya, Rajesh M. Hegde, Hema A. Murthy:
Significance of group delay based acoustic features in the linguistic search space for robust speech recognition. 1537-1540 - Houman Abbasian, Babak Nasersharif, Ahmad Akbari:
Genetic programming based optimization of class-dependent PCA for extracting robust MFCC. 1541-1544 - K. V. S. Narayana, T. V. Sreenivas:
Comparison of AM-FM based features for robust speech recognition. 1545-1548 - Joe Frankel, Dong Wang, Simon King:
Growing bottleneck features for tandem ASR. 1549 - Veena Karjigi, Preeti Rao:
Landmark based recognition of stops: acoustic attributes versus smoothed spectra. 1550-1553 - Satoru Kogure, Hiromitsu Nishizaki, Masatoshi Tsuchiya, Kazumasa Yamamoto, Shingo Togashi, Seiichi Nakagawa:
Speech recognition performance of CJLC: corpus of Japanese lecture contents. 1554-1557
Automatic Speech Recognition: Language Models I, II
- Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa:
Evaluating spoken language model based on filler prediction model in speech recognition. 1558-1561 - Nicolae Duta:
Transcription-less call routing using unsupervised language model adaptation. 1562-1565 - Zhen-Yu Pan, Hui Jiang:
Large margin multinomial mixture model for text categorization. 1566-1569 - Yu Ting Yeung, Houwei Cao, Nengheng Zheng, Tan Lee, P. C. Ching:
Language modeling for speech recognition of spoken Cantonese. 1570-1573 - Akio Kobayashi, Takahiro Oku, Shinichi Homma, Shoei Sato, Toru Imai, Tohru Takagi:
Discriminative rescoring based on minimization of word errors for transcribing broadcast news. 1574-1577 - Qin Shi, Stephen M. Chu, Wen Liu, Hong-Kwang Jeff Kuo, Yi Liu, Yong Qin:
Search and classification based language model adaptation. 1578-1581 - Marijn Huijbregts, Roeland Ordelman, Franciska de Jong:
Fast n-gram language model look-ahead for decoders with static pronunciation prefix trees. 1582-1585 - Kwanchiva Saykhum, Vataya Boonpiam, Nattanun Thatphithakkul, Chai Wutiwiwatchai, Cholwich Nattee:
Thai named-entity recognition using class-based language modeling on multiple-sized subword units. 1586-1589 - Stefan Schwärzler, Jürgen T. Geiger, Joachim Schenk, Marc A. Al-Hames, Benedikt Hörnler, Günther Ruske, Gerhard Rigoll:
Combining statistical and syntactical systems for spoken language understanding with graphical models. 1590-1593 - Abhinav Sethy, Bhuvana Ramabhadran:
Bag-of-word normalized n-gram models. 1594-1597 - Sangyun Hahn, Abhinav Sethy, Hong-Kwang Jeff Kuo, Bhuvana Ramabhadran:
A study of unsupervised clustering techniques for language modeling. 1598-1601 - Ciro Martins, António J. S. Teixeira, João Paulo Neto:
Automatic estimation of language model parameters for unseen words using morpho-syntactic contextual information. 1602-1605 - Nigel G. Ward, Alejandro Vega:
Modeling the effects of time-into-utterance on word probabilities. 1606-1609 - Ye-Yi Wang, Xiao Li, Alex Acero:
Inductive and example-based learning for text classification. 1610-1613 - Theresa Wilson, Stephan Raaijmakers:
Comparing word, character, and phoneme n-grams for subjective utterance recognition. 1614-1617 - Marcello Federico, Nicola Bertoldi, Mauro Cettolo:
IRSTLM: an open source toolkit for handling large scale language models. 1618-1621
Speech Resources and Technology Evaluation
- Tatsuya Kawahara, Hisao Setoguchi, Katsuya Takanashi, Kentaro Ishizuka, Shoko Araki:
Multi-modal recording, analysis and indexing of poster sessions. 1622-1625 - Jindrich Matousek, Jan Romportl:
Automatic pitch-synchronous phonetic segmentation. 1626-1629 - Wade Shen, Joseph P. Olive, Douglas A. Jones:
Two protocols comparing human and machine phonetic recognition performance in conversational speech. 1630-1633 - Tomoyuki Kato, Jun Okamoto, Makoto Shozakai:
Analysis of drivers' speech in a car environment. 1634-1637 - Barbara Schuppler, Mirjam Ernestus, Odette Scharenborg, Lou Boves:
Preparing a corpus of Dutch spontaneous dialogues for automatic phonetic analysis. 1638-1641 - Bojan Kotnik, Pierre Sendorek, Sergey Astrov, Turgay Koç, Tolga Çiloglu, Laura Docío Fernández, Eduardo Rodríguez Banga, Harald Höge, Zdravko Kacic:
Evaluation of voice activity and voicing detection. 1642-1645 - Christoph Draxler, Klaus Jänsch:
Wikispeech - a content management system for speech databases. 1646-1649 - Grazyna Demenko, Jolanta Bachan, Bernd Möbius, Katarzyna Klessa, Marcin Szymanski, Stefan Grocholewski:
Development and evaluation of Polish speech corpus for unit selection speech synthesis systems. 1650-1653 - John F. Pitrelli, Burn L. Lewis, Edward A. Epstein, Jerome L. Quinn, Ganesh N. Ramaswamy:
A data format enabling interoperation of speech recognition, translation and information extraction engines: the GALE type system. 1654-1657 - Wei Li, Qiang Huo:
A rank-predicted pseudo-greedy approach to efficient text selection from large-scale corpus for maximum coverage of target units. 1658-1661 - Klaus-Peter Engelbrecht, Michael Kruppa, Sebastian Möller, Michael Quade:
Memo workbench for semi-automated usability testing. 1662-1665 - Kimiko Yamakawa, Tomoko Matsui, Shuichi Itahashi:
MDS-based visualization method for multiple speech corpora. 1666-1669 - Carlos Busso, Shrikanth S. Narayanan:
Scripted dialogs versus improvisation: lessons learned about emotional elicitation techniques from the IEMOCAP database. 1670-1673
Special Session: Prosody of Spontaneous Speech I, II
- Keith W. Godin, John H. L. Hansen:
Analysis and perception of speech under physical task stress. 1674-1677 - Chi-Chun Lee, Sungbok Lee, Shrikanth S. Narayanan:
An analysis of multimodal cues of interruption in dyadic spoken interactions. 1678-1681 - Hiroki Mori, Hideki Kasuya:
Paralinguistic effects on turn-taking behavior in expressive conversation. 1682 - Zhigang Yin, Aijun Li, Ziyu Xiong:
Study on "ng, a" type of discourse markers in standard Chinese. 1683-1686 - Helena Moniz, Ana Isabel Mata, Isabel Trancoso, Céu Viana:
How can you use disfluencies and still sound as a good speaker? 1687 - Eva Strangert, Joakim Gustafson:
What makes a good speaker? subject ratings, acoustic measurements and perceptual evaluations. 1688-1691 - Spyros Kousidis, David Dorran, Yi Wang, Brian Vaughan, Charlie Cullen, Dermot Campbell, Ciaran McDonnell, Eugene Coyle:
Towards measuring continuous acoustic feature convergence in unconstrained spoken dialogues. 1692-1695 - Tatsuya Kawahara, Masayoshi Toyokura, Teruhisa Misu, Chiori Hori:
Detection of feeling through back-channels in spoken dialogue. 1696
Automatic Speech Recognition: Adaptation I, II
- Chandra Kant Raut, Kai Yu, Mark J. F. Gales:
Adaptive training using discriminative mapping transforms. 1697-1700 - Jonas Lööf, Christian Gollan, Hermann Ney:
Speaker adaptive training using shift-MLLR. 1701-1704 - Daniel Povey, Hong-Kwang Jeff Kuo:
XMLLR for improved speaker adaptation in speech recognition. 1705-1708 - Jing Huang, Mark Epstein, Marco Matassoni:
Effective acoustic adaptation for a distant-talking interactive TV system. 1709-1712 - P. T. Akhil, Shakti Prasad Rath, Srinivasan Umesh, D. Rama Sanand:
A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics. 1713-1716 - Shizhen Wang, Steven M. Lulich, Abeer Alwan:
A reliable technique for detecting the second subglottal resonance and its use in cross-language speaker adaptation. 1717-1720
Applications in Education and Learning I, II
- Om Deshmukh, Sachindra Joshi, Ashish Verma:
Automatic pronunciation evaluation and classification. 1721-1724 - Daniel Bolaños, Wayne H. Ward, Barbara Wise, Sarel van Vuuren:
Pronunciation error detection techniques for children's speech. 1725-1728 - Lan Wang, Xin Feng, Helen M. Meng:
Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training. 1729-1732 - Xiaolong Li, Li Deng, Yun-Cheng Ju, Alex Acero:
Automatic children's reading tutor on hand-held devices. 1733-1736 - Hongcui Wang, Tatsuya Kawahara:
A Japanese CALL system based on dynamic question generation and error prediction for ASR. 1737-1740
Speech Pathologies
- Heejin Kim, Mark Hasegawa-Johnson, Adrienne Perlman, Jon R. Gunderson, Thomas S. Huang, Kenneth L. Watkin, Simone Frame:
Dysarthric speech database for universal access research. 1741-1744 - Catherine Middag, Gwen Van Nuffelen, Jean-Pierre Martens, Marc De Bodt:
Objective intelligibility assessment of pathological speakers. 1745-1748 - Joan K.-Y. Ma, Tara L. Whitehill:
Quantitative analysis of intonation patterns produced by Cantonese speakers with Parkinson's disease: a preliminary study. 1749-1752 - Marieke de Bruijn, Irma Verdonck-de Leeuw, Louis ten Bosch, Joop Kuik, Hugo Quené, Lou Boves, Hans Langendijk, C. René Leemans:
Phonetic-acoustic and feature analyses by a neural network to assess speech quality in patients treated for head and neck cancer. 1753-1756 - Andreas K. Maier, Florian Hönig, Christian Hacker, Maria Schuster, Elmar Nöth:
Automatic evaluation of characteristic speech disorders in children with cleft lip and palate. 1757-1760 - Santiago Omar Caballero Morales, Stephen J. Cox:
Application of weighted finite-state transducers to improve recognition accuracy for dysarthric speech. 1761-1764
Special Session: Consonant Challenge - Human-Machine Comparisons of Consonant Recognition in Noise
- Martin Cooke, Odette Scharenborg:
The INTERSPEECH 2008 Consonant Challenge. 1765-1768 - Bengt J. Borgström, Abeer Alwan:
HMM-based estimation of unreliable spectral components for noise robust speech recognition. 1769-1772 - Jae Sam Yoon, Ji Hun Park, Hong Kook Kim:
Gammatone-domain model combination for consonant recognition in noisy environments. 1773-1776 - Peter Jancovic, Münevver Köküer:
On the mask modeling and feature representation in the missing-feature ASR: evaluation on the Consonant Challenge. 1777-1780 - María Luisa García Lecumberri, Martin Cooke, Francesco Cutugno, Mircea Giurgiu, Bernd T. Meyer, Odette Scharenborg, Wim A. van Dommelen, Jan Volín:
The non-native consonant challenge for European languages. 1781-1784 - Jort F. Gemmeke, Bert Cranen:
Noise reduction through compressed sensing. 1785-1788 - Björn W. Schuller, Martin Wöllmer, Tobias Moosmayr, Gerhard Rigoll:
Speech recognition in noisy environments using a switching linear dynamic model for feature enhancement. 1789-1792 - Nao Hodoshima, Wataru Yoshida, Takayuki Arai:
Improving consonant identification in noise and reverberation by steady-state suppression as a preprocessing approach. 1793-1796 - Bryce E. Lobdell, Mark Hasegawa-Johnson, Jont B. Allen:
Human speech perception and feature extraction. 1797-1800
Automatic Speech Recognition: Lexical and Prosodic Models
- Tien Ping Tan, Laurent Besacier:
Improving pronunciation modeling for non-native speech recognition. 1801-1804 - Hagai Aronowitz:
Online vocabulary adaptation using contextual information and information retrieval. 1805-1808 - Yoshifumi Onishi:
Lexicon expansion using pronunciation variations extracted on the basis of speaker-related deviation in recognition error statistics. 1809-1812 - Joseph Tepperman, Shrikanth S. Narayanan:
Better nonnative intonation scores through prosodic theory. 1813-1816 - Philip N. Garner:
Silence models in weighted finite-state transducers. 1817-1820 - Tetsuro Sasada, Shinsuke Mori, Tatsuya Kawahara:
Extracting word-pronunciation pairs from comparable set of text and speech. 1821-1824
Speech Synthesis Methods I, II
- Vincent Pollet, Andrew P. Breen:
Synthesis by generation and concatenation of multiform segments. 1825-1828 - João P. Cabral, Steve Renals, Korin Richmond, Junichi Yamagishi:
Glottal spectral separation for parametric speech synthesis. 1829-1832 - John Kominek, Sameer Badaskar, Tanja Schultz, Alan W. Black:
Improving speech systems built from very little data. 1833-1836 - Daisuke Saito, Satoshi Asakawa, Nobuaki Minematsu, Keikichi Hirose:
Structure to speech conversion - speech generation based on infant-like vocal imitation. 1837-1840 - Stas Tiomkin, David Malah:
Statistical text-to-speech synthesis with improved dynamics. 1841-1844 - Gabriel Webster, Norbert Braunschweiler:
An evaluation of non-standard features for grapheme-to-phoneme conversion. 1845-1848 - Yannis Agiomyrgiannakis, Olivier Rosec:
Towards flexible speech coding for speech synthesis: an LF + modulated noise vocoder. 1849-1852 - Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj:
Evaluation of Finnish unit selection and HMM-based speech synthesis. 1853-1856 - Barry-John Theobald, Nicholas Wilkinson:
A probabilistic trajectory synthesis system for synthesising visual speech. 1857-1860 - Didier Cadic, Lionel Segalen:
Paralinguistic elements in speech synthesis. 1861-1864 - E. Veera Raghavendra, B. Yegnanarayana, Alan W. Black, Kishore Prahallad:
Building sleek synthesizers for multi-lingual screen reader. 1865-1868 - Simon King, Keiichi Tokuda, Heiga Zen, Junichi Yamagishi:
Unsupervised adaptation for HMM-based speech synthesis. 1869-1872 - Volker Strom, Simon King:
Investigating festival's target cost function using perceptual experiments. 1873-1876 - Friedrich Neubarth, Michael Pucher, Christian Kranzler:
Modeling Austrian dialect varieties for TTS. 1877-1880 - Tuomo Raitio, Antti Suni, Hannu Pulakka, Martti Vainio, Paavo Alku:
HMM-based Finnish text-to-speech system utilizing glottal inverse filtering. 1881-1884 - Tanuja Sarkar, Sachin Joshi, Sathish Chandra Pammi, Kishore Prahallad:
LTS using decision forest of regression trees and neural networks. 1885-1888 - Silvia Rustullet, Daniela Braga, João Nogueira, Miguel Sales Dias:
Automatic word stress marking and syllabification for Catalan TTS. 1889-1892
Speaker Recognition: Adverse Conditions and Forensics
- Qin Jin, Tanja Schultz:
Robust far-field speaker identification under mismatched conditions. 1893-1896 - Chien-Lin Huang, Bin Ma, Chung-Hsien Wu, Brian Mak, Haizhou Li:
Robust speaker verification using short-time frequency with long-time window and fusion of multi-resolutions. 1897-1900 - C. H. Kwon, J. K. Choi, Eliathamby Ambikairajah:
Performance improvement of text-independent speaker verification systems based on histogram enhancement in noisy environments. 1901-1904 - Jun-Won Suh, Pongtep Angkititrakul, John H. L. Hansen:
Filling acoustic holes through leveraged uncorrelated GMMs for in-set/out-of-set speaker recognition. 1905-1908 - Wooil Kim, John H. L. Hansen:
Missing-feature method for speaker recognition in band-restricted conditions. 1909-1912 - Yushi Zhang, Waleed H. Abdulla:
Robust speaker identification using cross-correlation GTF-ICA feature. 1913-1916 - Kanae Amino, Takayuki Arai:
Perceptual speaker identification using monosyllabic stimuli - effects of the nucleus vowels and speaker characteristics contained in nasals. 1917-1920 - Amitava Das, Gokul Chittaranjan:
Text-dependent speaker recognition by efficient capture of speaker dynamics in compressed time-frequency representations of speech. 1921-1924 - Amitava Das, Gokul Chittaranjan, Gopala Krishna Anumanchipalli:
Usefulness of text-conditioning and a new database for text-dependent speaker recognition research. 1925-1928 - Satoru Tsuge, Takashi Osanai, Hisanori Makinae, Toshiaki Kamada, Minoru Fukumi, Shingo Kuroiwa:
Combination method of bone-conduction speech and air-conduction speech for speaker recognition. 1929-1932 - Doroteo T. Toledano, Daniel Hernández López, Cristina Esteve-Elizalde, Joaquin Gonzalez-Rodriguez, Rubén Fernández Pozo, Luis A. Hernández Gómez:
MAP and sub-word level t-norm for text-dependent speaker recognition. 1933-1936 - Cuiling Zhang, Geoffrey Stewart Morrison, Philip Rose:
Forensic speaker recognition in Chinese: a multivariate likelihood ratio discrimination on /i/ and /y/. 1937-1940 - Shunichi Ishihara, Yuko Kinoshita:
How many do we need? exploration of the population size effect on the performance of forensic speaker classification. 1941-1944 - Cheung-Chi Leung, Marc Ferras, Claude Barras, Jean-Luc Gauvain:
Comparing prosodic models for speaker recognition. 1945-1948 - Christian Zieger, Maurizio Omologo:
Combination of clean and contaminated GMM/SVM for far-field text-independent speaker verification. 1949-1952
Phonetics: Development, Learning, Cross-Language and Language-Specific
- Bettina Braun, Kristin Lemhöfer, Anne Cutler:
English word stress as produced by English and Dutch speakers: the role of segmental and suprasegmental differences. 1953 - Eva Reinisch, Alexandra Jesse, James M. McQueen:
The strength of stress-related lexical competition depends on the presence of first-syllable stress. 1954 - Keiichi Ishikawa, Jun Nomura:
Word stress placement by native speakers and Japanese learners of English. 1955-1958 - H. Timothy Bunnell, Jason Lilley:
Schwa variants in american English. 1959-1962 - Jiahong Yuan:
Covariations of English segmental durations across speakers. 1963 - Akiyo Joto:
The intelligibility of the English vowel /ʌ/ produced by native speakers of Japanese and its relations to the acoustic characteristics. 1964-1967 - Benjamin Weiss:
Rate dependent spectral reduction for voiceless fricatives. 1968 - Stina Ojala, Olli Aaltonen, Tapio Salakoski:
Investigating perception of places of articulation in sign and speech. 1969 - Michael D. Tyler, Catherine T. Best, Louis M. Goldstein, Mark Antoniou, Lidija Krebs-Lazendic:
Six- and twelve-month-olds' discrimination of native versus non-native between- and within-organ fricative place contrasts. 1970 - Christa Lam, Christine Kitamura:
"your baby can't hear you": how mothers talk to infants with simulated hearing loss. 1971 - Eeva Klintfors, Ulla Sundberg, Francisco Lacerda, Ellen Marklund, Lisa Gustavsson, Ulla Bjursäter, Iris-Corinna Schwarz, Göran Söderlund:
Development of communicative skills in 8- to 16-month-old children: a longitudinal study. 1972-1975 - Lisa Gustavsson, Francisco Lacerda:
Vocal imitation in early language acquisition. 1976-1979 - Okko Johannes Räsänen, Unto K. Laine, Toomas Altosaar:
Computational language acquisition by statistical bottom-up processing. 1980-1983 - Noriaki Katagiri, Goh Kawai:
Lexical analyses of native and non-native English language instructor speech based on a six-month co-taught classroom video corpus. 1984-1987 - Hinako Masuda, Takayuki Arai:
Perception and production of consonant clusters in Japanese-English bilingual and Japanese monolingual speakers. 1988-1991
Robust Automatic Speech Recognition I-III
- Jinyu Li, Chin-Hui Lee:
On a generalization of margin-based discriminative training to robust speech recognition. 1992-1995 - Mark J. F. Gales, Chris Longworth:
Discriminative classifiers with generative kernels for noise robust ASR. 1996-1999 - Rogier C. van Dalen, Mark J. F. Gales:
Covariance modelling for noise-robust speech recognition. 2000-2003 - Wei-Hau Chen, Shih-Hsiang Lin, Berlin Chen:
Exploiting spatial-temporal feature distribution characteristics for robust speech recognition. 2004-2007 - Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani:
Study of integration of statistical model-based voice activity detection and noise suppression. 2008-2011 - Weifeng Li, John Dines, Mathew Magimai-Doss, Hervé Bourlard:
Neural network based regression for robust overlapping speech recognition using microphone arrays. 2012-2015
Multimodal Signal Processing
- Samer Al Moubayed, Michaël De Smet, Hugo Van hamme:
Lip synchronization: from phone lattice to PCA eigen-projections using neural networks. 2016-2019 - Ryoei Takahashi, Yasunori Ohishi, Norihide Kitaoka, Kazuya Takeda:
Building and combining document and music spaces for music query-by-webpage system. 2020-2023 - Lei Wang, Shen Huang, Sheng Hu, Jiaen Liang, Bo Xu:
Improving searching speed and accuracy of query by humming system based on three methods: feature fusion, candidates set reduction and multiple similarity measurement rescoring. 2024-2027 - Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone:
Towards a segmental vocoder driven by ultrasound and optical images of the tongue and lips. 2028-2031 - Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone:
Phone recognition from ultrasound and optical video sequences for a silent speech interface. 2032-2035 - Jan Trmal, Marek Hrúz, Jan Zelinka, Pavel Campr, Ludek Müller:
Feature space transforms for Czech sign-language recognition. 2036-2039
Speech Perception
- Chris Davis, Jeesun Kim, Angelo Barbaro:
Masked speech priming: no priming in dense neighbourhoods. 2040-2043 - Azra Nahid Ali:
Integration of audiovisual speech and priming effects. 2044-2047 - Jason D. Zevin, Thomas A. Farmer:
Similarity between vowels influences response execution in word identification. 2048-2051 - Tom Lentz:
Phonotactically well-formed onset clusters as processing units in word recognition. 2052-2055 - Anne Cutler, James M. McQueen, Sally Butterfield, Dennis Norris:
Prelexically-driven perceptual retuning of phoneme boundaries. 2056 - Erin Cvejic, Jeesun Kim, Chris Davis:
Visual speech modifies the phoneme restoration effect. 2057
Evaluation and Standardisation of Spoken-Language Technology
- Chuan Cao, Ming Li, Jian Liu, Yonghong Yan:
An objective singing evaluation approach by relating acoustic measurements to perceptual ratings. 2058-2061 - Valérie Gautier-Turbin, Laetitia Gros:
On the perceived quality of noise reduced signals. 2062-2065 - Uma Murthy, John F. Pitrelli, Ganesh N. Ramaswamy, Martin Franz, Burn L. Lewis:
A methodology and tool suite for evaluation of accuracy of interoperating statistical natural language processing engines. 2066-2069 - Youngja Park, Siddharth Patwardhan, Karthik Visweswariah, Stephen C. Gates:
An empirical analysis of word error rate and keyword error rate. 2070-2073 - Virginie Durin, Laetitia Gros:
Measuring speech quality impact on tasks performance. 2074-2077 - Hannu Soronen, Markku Turunen, Jaakko Hakulinen:
Voice commands in home environment - a consumer survey. 2078-2081
Automatic Speech Recognition: Search Methods
- Ghazi Bouselmi, Jun Cai:
Extended partial distance elimination and dynamic Gaussian selection for fast likelihood computation. 2082-2085 - Joris Driesen, Hugo Van hamme:
Improving the multigram algorithm by using lattices as input. 2086-2089 - Min Tang, Philippe Di Cristo:
Backward Viterbi beam search for utilizing dynamic task complexity information. 2090-2093 - Nicola Bertoldi, Marcello Federico, Daniele Falavigna, Matteo Gerosa:
Fast speech decoding through phone confusion networks. 2094-2097 - Liang Gu, Jian Xue, Xiaodong Cui, Yuqing Gao:
High-performance low-latency speech recognition via multi-layered feature streaming and fast Gaussian computation. 2098-2101 - Patrick J. Bourke, Rob A. Rutenbar:
A low-power hardware search architecture for speech recognition. 2102-2105 - Jonathan Mamou, Bhuvana Ramabhadran:
Phonetic query expansion for spoken document retrieval. 2106-2109 - Tasuku Oonishi, Paul R. Dixon, Koji Iwano, Sadaoki Furui:
Implementation and evaluation of fast on-the-fly WFST composition algorithms. 2110-2113
Speech Synthesis: Prosody and Emotion I, II
- Shoichi Takeda, Yuuri Yasuda, Risako Isobe, Shogo Kiryu, Makiko Tsuru:
Analysis of voice-quality features of speech that expresses "anger", "joy", and "sadness" uttered by radio actors and actresses. 2114-2117 - Leonardo Badino, Robert A. J. Clark, Volker Strom:
Including pitch accent optionality in unit selection text-to-speech synthesis. 2118-2121 - Zeynep Inanoglu, Steve J. Young:
Emotion conversion using F0 segment selection. 2122-2125 - Yao Qian, Hui Liang, Frank K. Soong:
Generating natural F0 trajectory with additive trees. 2126-2129 - Cédric Boidin, Olivier Boëffard:
Generating intonation from a mixed CART-HMM model for speech synthesis. 2130-2133 - Pablo Daniel Agüero, Antonio Bonafonte, Lu Yu, Juan Carlos Tulli:
Intonation modeling of Mandarin Chinese using a superpositional approach. 2134-2137 - Hao Tang, Xi Zhou, Matthias Odisio, Mark Hasegawa-Johnson, Thomas S. Huang:
Two-stage prosody prediction for emotional text-to-speech synthesis. 2138-2141 - Yue-Ning Hu, Min Chu, Chao Huang, Yan-Ning Zhang:
Prosody boundary detection through context-dependent position models. 2142-2145
Language Information Retrieval Systems
- Sha Meng, Jian Shao, Roger Peng Yu, Jia Liu, Frank Seide:
Addressing the out-of-vocabulary problem for large-scale Chinese spoken term detection. 2146-2149 - Jian Shao, Roger Peng Yu, Qingwei Zhao, Yonghong Yan, Frank Seide:
Towards vocabulary-independent speech indexing for large-scale repositories. 2150-2153 - Antonio Moreno-Daniel, Jay G. Wilpon, Biing-Hwang Juang, Sarangarajan Parthasarathy:
Towards the integration of automatic speech recognition and information retrieval for spoken query processing. 2154-2157 - Ville T. Turunen:
Reducing the effect of OOV query words by using morph-based spoken document retrieval. 2158-2161 - Meng-Sung Wu, Jen-Tzung Chien:
Bayesian latent topic clustering model. 2162-2165 - Tomoyosi Akiba, Yusuke Yokota:
Spoken document retrieval by translating recognition candidates into correct transcriptions. 2166-2169 - Carlo Drioli, Piero Cosi:
Audio indexing for an interactive Italian literature management system. 2170 - Makoto Terao, Takafumi Koshinaka, Shinichi Ando, Ryosuke Isotani, Akitoshi Okumura:
Open-vocabulary spoken-document retrieval based on query expansion using related web documents. 2171-2174 - Upendra V. Chaudhari, Hong-Kwang Jeff Kuo, Brian Kingsbury:
Discriminative graph training for ultra-fast low-footprint speech indexing. 2175-2178 - Yun-Cheng Ju, Julian Odell:
A language-modeling approach to inverse text normalization and data cleanup for multimodal voice search applications. 2179-2182 - Rui Amaral, Isabel Trancoso:
Topic segmentation and indexation in a media watch system. 2183-2186 - J. Scott Olsson:
Vocabulary independent discriminative term frequency estimation. 2187-2190 - Hui Lin, Alex Stupakov, Jeff A. Bilmes:
Spoken keyword spotting via multi-lattice alignment. 2191-2194 - Kenji Iwata, Koichi Shinoda, Sadaoki Furui:
Robust spoken term detection using combination of phone-based and word-based recognition. 2195-2198
Applications for the Aged and Handicapped
- Luis Fernando D'Haro, Rubén San Segundo, Ricardo de Córdoba, Jan Bungeroth, Daniel Stein, Hermann Ney:
Language model adaptation for a speech to sign language translation system using web frequencies and a MAP framework. 2199-2202 - Jonas Beskow, Björn Granström, Peter Nordqvist, Samer Al Moubayed, Giampiero Salvi, Tobias Herzke, Arne Schulz:
Hearing at home - communication support in home environments for hearing impaired persons. 2203-2206 - Daniel A. Taft, David B. Grayden, Anthony N. Burkitt:
Traveling wave based group delays for cochlear implant speech processing. 2207 - Damien J. Smith, Denis Burnham:
Multimodal perception of Mandarin tone for cochlear implant users. 2208 - Keigo Nakamura, Tomoki Toda, Yoshitaka Nakajima, Hiroshi Saruwatari, Kiyohiro Shikano:
Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments. 2209-2212 - Jacqueline McKechnie, Kirrie J. Ballard, Donald A. Robin, Adam Jacks, Sallyanne Palethorpe, Kristin M. Rosen:
An acoustic typology of apraxic speech - toward reliable diagnosis. 2213 - Gilles Pouchoulin, Corinne Fredouille, Jean-François Bonastre, Alain Ghio, Antoine Giovanni:
Dysphonic voices and the 0-3000 hz frequency band. 2214-2217 - Shou-Chun Yin, Richard C. Rose, Oscar Saz, Eduardo Lleida:
Verifying pronunciation accuracy from speakers with neuromuscular disorders. 2218-2221 - Ali Alpan, Youri Maryn, Francis Grenez, Abdellah Kacha, Jean Schoentgen:
Multi-band and multi-cue analyses of disordered connected speech. 2222-2225 - James Carmichael, Vincent Wan, Phil D. Green:
Combining neural network and rule-based systems for dysarthria diagnosis. 2226-2229 - Shona D'Arcy, Viliam Rapcan, Nils Penard, Margaret E. Morris, Ian H. Robertson, Richard B. Reilly:
Speech as a means of monitoring cognitive function of elderly speakers. 2230-2233 - Hironori Matsumasa, Tetsuya Takiguchi, Yasuo Ariki, Ichao Li, Toshitaka Nakabayashi:
Integration of metamodel and acoustic model for speech recognition. 2234-2237 - Francisco J. Fraga, Leticia P. Costa S. Prates, Maria Cecilia M. Iorio:
Frequency compression/transposition of fricative consonants for the hearing impaired with high-frequency dead regions. 2238-2241
Automatic Speech Recognition: Features I, II
- Fabio Valente, Hynek Hermansky:
On the combination of auditory and modulation frequency channels for ASR applications. 2242-2245 - Vivek Tyagi:
Tandem processing of fepstrum features. 2246-2249 - Shuo-Yiin Chang, Lin-Shan Lee:
Data-driven clustered hierarchical tandem system for LVCSR. 2250-2253 - Hung-Shin Lee, Berlin Chen:
Linear discriminant feature extraction using weighted classification confusion information. 2254-2257 - D. Rama Sanand, V. Balaji, Rani R. Sandhya, Srinivasan Umesh:
Use of spectral centre of gravity for generating speaker invariant features for automatic speech recognition. 2258-2261 - Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura:
Short- and long-term dynamic features for robust speech recognition. 2262-2265
Speech Synthesis: Prosody and Emotion I, II
- Boyang Gao, Yao Qian, Zhizheng Wu, Frank K. Soong:
Duration refinement by jointly optimizing state and longer unit likelihood. 2266-2269 - Ausdang Thangthai, Nattanun Thatphithakkul, Chai Wutiwiwatchai, Anocha Rugchatjaroen, Sittipong Saychum:
T-tilt: a modified tilt model for F0 analysis and synthesis in tonal languages. 2270-2273 - Javier Latorre, Masami Akamine:
Multilevel parametric-base F0 model for speech synthesis. 2274-2277 - Jordi Adell, Antonio Bonafonte, David Escudero Mancebo:
On the generation of synthetic disfluent speech: local prosodic modifications caused by the insertion of editing terms. 2278-2281 - Oytun Türk, Marc Schröder:
A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis. 2282-2285 - Joseph Tepperman, Shrikanth S. Narayanan:
Tree grammars as models of prosodic structure. 2286-2289
Human Speech Production
- Sungbok Lee, Tsuneo Kato, Shrikanth S. Narayanan:
Relation between geometry and kinematics of articulatory trajectory associated with emotional speech production. 2290-2293 - Michael J. Carne:
Intrinsic consonantal F0 perturbation in 3-way VOT contrast and its implications for aspiration-conditioned tonal split: evidence from Vietnamese. 2294-2297 - Qiang Fang, Satoru Fujita, Xugang Lu, Jianwu Dang:
A model based investigation of activation patterns of the tongue muscles for vowel production. 2298-2301 - Maeva Garnier, Joe Wolfe, Nathalie Henrich, John Smith:
Interrelationship between vocal effort and vocal tract acoustics: a pilot study. 2302-2305 - Chao Qin, Miguel Á. Carreira-Perpiñán, Korin Richmond, Alan Wrench, Steve Renals:
Predicting tongue shapes from a few landmark locations. 2306-2309
Special Session: LIPS 2008 - Visual Speech Synthesis Challenge
- Barry-John Theobald, Sascha Fagel, Gérard Bailly, Frédéric Elisei:
LIPS2008: visual speech synthesis challenge. 2310-2313 - Gregor Hofer, Junichi Yamagishi, Hiroshi Shimodaira:
Speech-driven lip motion generation with a trajectory HMM. 2314-2317 - Gérard Bailly, Oxana Govokhina, Gaspard Breton, Frédéric Elisei, Christophe Savariaux:
A trainable trajectory formation model TD-HMM parameterized for the LIPS 2008 challenge. 2318-2321 - Barry-John Theobald, Gavin C. Cawley, J. Andrew Bangham, Iain A. Matthews, Nicholas Wilkinson:
Comparing text-driven and speech-driven visual speech synthesisers. 2322 - Goranka Zoric, Aleksandra Cerekovic, Igor S. Pandzic:
Automatic lip synchronization by speech signal analysis. 2323 - Sascha Fagel:
MASSY speaks English: adaptation and evaluation of a talking head. 2324 - Sascha Fagel, Frédéric Elisei, Gérard Bailly:
From 3-d speaker cloning to text-to-audiovisual-speech. 2325 - Zdenek Krnoul, Milos Zelezný:
A development of Czech talking head. 2326-2329 - Kang Liu, Jörn Ostermann:
Realistic facial animation system for interactive services. 2330-2333 - Juan Yan, Xiang Xie, Hao Hu:
Speech-driven 3d facial animation for mobile entertainment. 2334-2337 - Lijuan Wang, Xiaojun Qian, Lei Ma, Yao Qian, Yining Chen, Frank K. Soong:
A real-time text to audio-visual speech synthesis system. 2338-2341
Spoken Language Translation Systems
- Evgeny Matusov, Björn Hoffmeister, Hermann Ney:
ASR word lattice translation with exhaustive reordering is possible. 2342-2345 - Jing Zheng, Wen Wang, Necip Fazil Ayan:
Development of SRI's translation systems for broadcast news and broadcast conversations. 2346-2349 - Ruhi Sarikaya, Yonggang Deng, Mohamed Afify, Brian Kingsbury, Yuqing Gao:
Machine translation in continuous space. 2350-2353 - Caroline Lavecchia, David Langlois, Kamel Smaïli:
Discovering phrases in machine translation by simulated annealing. 2354-2357 - Aarthi M. Reddy, Richard C. Rose:
Towards domain independence in machine aided human translation. 2358-2361 - Ian R. Lane, Alex Waibel:
Class-based statistical machine translation for field maintainable speech-to-speech translation. 2362-2365
Automatic Speech Recognition: Acoustic Models I-III
- Qingqing Zhang, Ta Li, Jielin Pan, Yonghong Yan:
Nonnative speech recognition based on state-candidate bilingual model modification. 2366-2369 - Björn W. Schuller, Xiaohua Zhang, Gerhard Rigoll:
Prosodic and spectral features within segment-based acoustic modeling. 2370-2373 - Jeff Z. Ma, Richard M. Schwartz:
Unsupervised versus supervised training of acoustic models. 2374-2377 - Tara N. Sainath, Victor Zue:
A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition. 2378-2381 - Takahiro Shinozaki, Sadaoki Furui, Tatsuya Kawahara:
Aggregated cross-validation and its efficient application to Gaussian mixture optimization. 2382-2385 - Mike Matton, Dirk Van Compernolle, Ronald Cools:
A minimum classification error based distance measure for template based speech recognition. 2386-2389 - Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee:
A penalized logistic regression approach to detection based phone classification. 2390-2393 - Alberto Abad, João Paulo Neto:
Incorporating acoustical modelling of phone transitions in an hybrid ANN/HMM speech recognizer. 2394-2397 - Erik McDermott, Atsushi Nakamura:
Flexible discriminative training based on equal error group scores obtained from an error-indexed forward-backward algorithm. 2398-2401 - Giulia Garau, Steve Renals:
Pitch adaptive features for LVCSR. 2402-2405 - Chris D. Bartels, Jeff A. Bilmes:
Using syllable nuclei locations to improve automatic speech recognition in the presence of burst noise. 2406-2409 - Hyejin Hong, Sunhee Kim, Minhwa Chung:
Effects of allophones on the performance of Korean speech recognition. 2410-2413 - Joel Pinto, Hynek Hermansky:
Combining evidence from a generative and a discriminative model in phoneme recognition. 2414-2417 - Kishan Thambiratnam, Frank Seide:
Fragmented context-dependent syllable acoustic models. 2418-2421 - Hongwei Hu, Martin J. Russell:
Speech recognition using non-linear trajectories in a formant-based articulatory layer of a multiple-level segmental HMM. 2422-2425 - Christian Plahl, Björn Hoffmeister, Mei-Yuh Hwang, Danju Lu, Georg Heigold, Jonas Lööf, Ralf Schlüter, Hermann Ney:
Recent improvements of the RWTH GALE Mandarin LVCSR system. 2426-2429
Spoken Language: Parsing and Summarisation
- Sameer Maskey, Andrew Rosenberg, Julia Hirschberg:
Intonational phrases for speech summarization. 2430-2433 - Korbinian Riedhammer, Daniel Gillick, Benoît Favre, Dilek Hakkani-Tür:
Packing the meeting summarization knapsack. 2434-2437 - Yasuhisa Fujii, Kazumasa Yamamoto, Norihide Kitaoka, Seiichi Nakagawa:
Class lecture summarization taking into account consecutiveness of important sentences. 2438-2441 - Xiaodan Zhu, Xuming He, Cosmin Munteanu, Gerald Penn:
Using latent Dirichlet allocation to incorporate domain knowledge for topic transition detection. 2443-2445 - Wen Wang:
Weakly supervised training for parsing Mandarin broadcast transcripts. 2446-2449 - Tomohiro Ohno, Shigeki Matsubara, Hideki Kashioka, Yasuyoshi Inagaki:
Dependency parsing of Japanese spoken monologue based on clause-starts detection. 2454-2457 - Mrugesh R. Gajjar, R. Govindarajan, T. V. Sreenivas:
Online unsupervised pattern discovery in speech using parallelization. 2458-2461
Multimodal Interfaces
- Aleksi Melto, Markku Turunen, Jaakko Hakulinen, Anssi Kainulainen, Tomi Heimonen:
A comparison of input entry rates in a multimodal mobile application. 2462-2465 - Markku Turunen, Jaakko Hakulinen, Cameron G. Smith, Daniel Charlton, Li Zhang, Marc Cavazza:
Physically embodied conversational agents as health and fitness companions. 2466-2469 - Florian Metze, Roman Englert, Udo Bub, Ingmar Kliche, Thomas Scheerbarth:
User perception of multi-modal interfaces for mobile applications. 2470-2473 - Teppei Nakano, Tomoyuki Kumai, Tetsunori Kobayashi, Yasushi Ishikawa:
Design and formulation for speech interface based on flexible shortcuts. 2474-2477 - Bo Yin, Natalie Ruiz, Fang Chen, Eliathamby Ambikairajah:
Exploring classification techniques in speech based cognitive load monitoring. 2478-2481 - Masayuki Okamoto, Naoki Iketani, Keisuke Nishimura, Masaaki Kikuchi, Kenta Cho, Masanori Hattori, Sougo Tsuboi:
Finding two-level interpersonal context: proximity and conversation detection from personal audio feature data. 2482-2485 - Sudeep Gandhe, David DeVault, Antonio Roque, Bilyana Martinovski, Ron Artstein, Anton Leuski, Jillian Gerten, David R. Traum:
From domain specification to virtual humans: an integrated approach to authoring tactical questioning characters. 2486-2489 - Mike Rozak:
Designing a massively multiplayer online role-playing game around text-to-speech. 2490-2493
Speech, Music, Audio Segmentation and Classification
- Jie Gao, Xiang Zhang, Qingwei Zhao, Yonghong Yan:
Robust speaker change detection using Kernel-Gaussian model. 2494-2497 - Stavros Ntalampiras, Nikos Fakotakis:
A comparative study in automatic recognition of broadcast audio. 2498-2501 - Charturong Tantibundhit, Gernot Kubin:
Joint time-frequency segmentation for transient decomposition. 2502-2505 - Vikramjit Mitra, Daniel Garcia-Romero, Carol Y. Espy-Wilson:
Language and genre detection in audio content analysis. 2506-2509 - Chi Zhang, John H. L. Hansen:
An entropy based feature for whisper-island detection within audio streams. 2510-2513 - Matej Grasic, Marko Kos, Andrej Zgank, Zdravko Kacic:
Two step speaker segmentation method using Bayesian information criterion and adapted Gaussian mixture models. 2514-2517 - Sebastian Germesin, Tilman Becker, Peter Poller:
Domain-specific classification methods for disfluency detection. 2518-2521 - Tin Lay Nwe, Minghui Dong, Swe Zin Kalayar Khine, Haizhou Li:
Multi-speaker meeting audio segmentation. 2522-2525 - Namunu Chinthaka Maddage, Haizhou Li:
Rhythm based music segmentation and octave scale cepstral features for sung language recognition. 2526-2529 - Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu:
Robust voiced/unvoiced speech classification using empirical mode decomposition and periodic correlation model. 2530-2533 - Qiong Wu, Qin Yan, Jun Wang, Jun Hong:
A combination of data mining method with decision trees building for speech/music discrimination. 2534-2537 - Vishwa Gupta, Gilles Boulianne, Patrick Kenny, Pierre Dumouchel:
Advertisement detection in French broadcast news using acoustic repetition and Gaussian mixture models. 2538-2541
Spoken Language: Parsing and Summarisation
- Barbara Plank, Khalil Sima'an:
Parsing with subdomain instance weighting from raw corpora. 2540
Speech, Music, Audio Segmentation and Classification
- Timothy J. Hazen, Fred Richardson:
A hybrid SVM/MCE training approach for vector space topic identification of spoken audio recordings. 2542-2545 - Isabel Trancoso, José Portelo, Miguel M. F. Bugalho, João Paulo Neto, António Joaquim Serralheiro:
Training audio events detectors with a sound effects corpus. 2546-2549
Automatic Speech Recognition: New Paradigms
- Ravichander Vipperla, Steve Renals, Joe Frankel:
Longitudinal study of ASR performance on ageing voices. 2550-2553 - Hugo Van hamme:
HAC-models: a novel approach to continuous speech recognition. 2554-2557 - Prateeti Mohapatra, Eric Fosler-Lussier:
Investigations into phonological attribute classifier representations for CRF phone recognition. 2558-2561 - Amarnag Subramanya, Jeff A. Bilmes:
Applications of virtual-evidence based speech recognizer training. 2562-2565 - Joost van Doremalen, Lou Boves:
Spoken digit recognition using a hierarchical temporal memory. 2566-2569 - Louis ten Bosch, Hugo Van hamme, Lou Boves:
A computational model of language acquisition: focus on word discovery. 2570-2573
Speech and Acoustic Activity Detection
- Lakshmish Kaushik, Douglas D. O'Shaughnessy:
Voice activity detection using modified Wigner-Ville distribution. 2574-2577 - Krishna Chaitanya, Rohit Sinha:
Energy and entropy based switching algorithm for speech endpoint detection in varying SNR conditions. 2578-2581 - Jörn Anemüller, Denny Schmidt, Jörg-Hendrik Bach:
Detection of speech embedded in real acoustic background based on amplitude modulation spectrogram features. 2582-2585 - Tuan Van Pham, Michael Stadtschnitzer, Franz Pernkopf, Gernot Kubin:
Voice activity detection algorithms using subband power distance feature for noisy environments. 2586-2589 - Christian A. Müller, Joan-Isaac Biel, Edward Kim, Daniel Rosario:
Speech-overlapped acoustic event detection for automotive applications. 2590-2593 - Andrey Temko, Climent Nadeu:
Detection of acoustic events in interactive seminar data with temporal overlaps. 2594-2597
Speech Analysis and Processing
- Chanwoo Kim, Richard M. Stern:
Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis. 2598-2601 - Anthony P. Stark, Kuldip K. Paliwal:
Speech analysis using instantaneous frequency deviation. 2602-2605 - Claudius Gläser, Martin Heckmann, Frank Joublin, Christian Goerick:
Auditory-based formant estimation in noise using a probabilistic framework. 2606-2609 - K. Sri Rama Murty, Saurav Khurana, Yogendra Umesh Itankar, M. R. Kesheorey, B. Yegnanarayana:
Efficient representation of throat microphone speech. 2610-2613 - Om Deshmukh, Ashish Verma:
Acoustic-phonetic approach for automatic evaluation of spoken grammar. 2614-2617 - Stephen Cox:
On estimation of a speaker's confusion matrix from sparse data. 2618-2621
Special Session: Talking Heads and Pronunciation Training
- Valérie Hazan:
Talking heads and pronunciation training: a review. 2622 - Dominic W. Massaro, Stephanie Bigler, Trevor H. Chen, Marcus Perlman, Slim Ouni:
Pronunciation training: the role of eye and ear. 2623-2626 - Preben Wik, Olov Engwall:
Can visualization of internal articulators support speech perception? 2627-2630 - Olov Engwall:
Can audio-visual instructions help learners improve their articulation? - an ultrasound study of short term changes. 2631-2634 - Pierre Badin, Yuliya Tarabalka, Frédéric Elisei, Gérard Bailly:
Can you "read tongue movements"? 2635-2638 - Bernd J. Kröger, Verena Graf-Borttscheller, Anja Lowit:
Two- and three-dimensional visual articulatory models for pronunciation training and for treatment of speech disorders. 2639-2642 - Sascha Fagel, Katja Madany:
A 3-d virtual head as a tool for speech therapy for children. 2643-2646 - Robin Hofe, Roger K. Moore:
Anton: an animatronic model of a human tongue and vocal tract. 2647-2650 - Takayuki Arai:
Physical models of the human vocal tract with gel-type material. 2651-2654 - Chao Huang, Feng Zhang, Frank K. Soong, Min Chu:
Mispronunciation detection for Mandarin Chinese. 2655-2658
Multimodal Speech Processing
- Lijuan Wang, Tao Hu, Peng Liu, Frank K. Soong:
Efficient handwriting correction of speech recognition errors with template constrained posterior (TCP). 2659-2662 - Pascual Ejarque, Javier Hernando:
Bi-Gaussian score equalization in an audio-visual SVM-based person verification system. 2663-2666 - Geoffrey S. Meltzner, Jason J. Sroka, James T. Heaton, L. Donald Gilmore, Glen Colby, Serge H. Roy, Nancy Chen, Carlo J. De Luca:
Speech recognition for vocalized and subvocal modes of production using surface EMG signals from the neck and face. 2667-2670 - Trent W. Lewis, David M. W. Powers:
Distinctive feature fusion for recognition of Australian English consonants. 2671-2674 - Yasushi Watanabe, Koichi Shinoda, Sadaoki Furui:
Time-lag adaptation for semi-synchronous speech and pen input. 2675-2678 - Patrick Lucey, Sridha Sridharan, David Dean:
Continuous pose-invariant lipreading. 2679-2682
Cross-Lingual and Multilingual Automatic Speech Recognition, Speech Translation
- Jan Nouza, Jan Silovský, Jindrich Zdánský, Petr Cerva, Martin Kroul, Josef Chaloupka:
Czech-to-slovak adapted broadcast news transcription system. 2683-2686 - Dau-Cheng Lyu, Sabato Marco Siniscalchi, Tae-Yoon Kim, Chin-Hui Lee:
Continuous phone recognition without target language training data. 2687-2690 - Christopher M. White, Sanjeev Khudanpur, James K. Baker:
An investigation of acoustic models for multilingual code-switching. 2691-2694 - László Tóth, Joe Frankel, Gábor Gosztolya, Simon King:
Cross-lingual portability of MLP-based tandem features - a case study for English and Hungarian. 2695-2698 - Xufang Zhao, Douglas D. O'Shaughnessy:
Seed models combination and state level mappings of cross-lingual transfer for rapid HMM development: from English to Mandarin. 2699-2702 - Ghazi Bouselmi, Dominique Fohr, Irina Illina:
Multi-accent and accent-independent non-native speech recognition. 2703-2706 - Adish Kumar Singla, Dilek Hakkani-Tür:
Cross-lingual sentence extraction for information distillation. 2707-2710 - Stefano Scanzio, Pietro Laface, Luciano Fissore, Roberto Gemello, Franco Mana:
On the use of a multilingual neural network front-end. 2711-2714 - Khe Chai Sim, Haizhou Li:
Context-sensitive probabilistic phone mapping model for cross-lingual speech recognition. 2715-2718 - Chen Liu, Lynette Melnar:
A non-acoustic approach to crosslingual speech recognition performance prediction. 2719-2722 - Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore, Shrikanth S. Narayanan:
Factored translation models for enriching spoken language translation with prosody. 2723-2726 - Holger Schwenk, Yannick Estève:
Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation. 2727-2730 - Andreas Kathol, Jing Zheng:
Strategies for building a Farsi-English SMT system from limited resources. 2731-2734 - Muntsin Kolss, Stephan Vogel, Alex Waibel:
Stream decoding for simultaneous spoken language translation. 2735-2738 - Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Towards unsupervised training of the classifier-based speech translator. 2739-2742 - John F. Pitrelli, Burn L. Lewis, Edward A. Epstein, Martin Franz, Daniel Kiecza, Jerome L. Quinn, Ganesh N. Ramaswamy, Amit Srivastava, Paola Virga:
Aggregating distributed STT, MT, and information extraction engines: the GALE interoperability-demo system. 2743-2746
Expression, Emotion and Personality Recognition
- Abe Kazemzadeh, Sungbok Lee, Shrikanth S. Narayanan:
An interval type-2 fuzzy logic system to translate between emotion-related vocabularies. 2747-2750 - Ting Huang, Yingchun Yang:
Applying pitch-dependent difference detection and modification to emotional speaker recognition. 2751-2754 - Daniel Neiberg, Kjell Elenius:
Automatic recognition of anger in spontaneous speech. 2755-2758 - Takashi Nose, Yoichi Kato, Makoto Tachibana, Takao Kobayashi:
An estimation technique of style expressiveness for emotional speech using model adaptation based on multiple-regression HSMM. 2759-2762 - Fabien Ringeval, Mohamed Chetouani:
A vowel based approach for acted emotion recognition. 2763-2766 - Gordon McIntyre, Roland Göcke:
A composite framework for affective sensing. 2767-2770 - Arslan Shaukat, Ke Chen:
Towards automatic emotional state categorization from speech signals. 2771-2774 - Jeong-Sik Park, Ji-Hwan Kim, Sang-Min Yoon, Yung-Hwan Oh:
Speaker-independent emotion recognition based on feature vector classification. 2775-2778
Applications in Education and Learning I, II
- Matthew Black, Joseph Tepperman, Sungbok Lee, Shrikanth S. Narayanan:
Estimation of children's reading ability by fusion of automatic pronunciation verification and fluency detection. 2779-2782 - Matthew Black, Joseph Tepperman, Abe Kazemzadeh, Sungbok Lee, Shrikanth S. Narayanan:
Pronunciation verification of English letter-sounds in preliterate children. 2783-2786 - Alissa M. Harrison, Wing Yiu Lau, Helen M. Meng, Lan Wang:
Improving mispronunciation detection and diagnosis of learners' speech with context-sensitive phonological rules based on language transfer. 2787-2790 - Catia Cucchiarini, Joost van Doremalen, Helmer Strik:
DISCO: development and integration of speech technology into courseware for language learning. 2791-2794 - Abdurrahman Samir, Jacques Duchateau, Hugo Van hamme:
Discriminative model combination and language model selection in a reading tutor for children. 2795-2798 - Jakob Schou Pedersen, Lars Bo Larsen, Børge Lindberg:
Usability of ASR-based reading training for dyslexics. 2799-2802 - Shingo Togashi, Seiichi Nakagawa:
A browsing system for classroom lecture speech. 2803-2806 - Dean Luo, Naoya Shimomura, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose:
Automatic pronunciation evaluation of language learners' utterances generated through shadowing. 2807-2810 - Sylvain Chevalier, Zhenhai Cao:
Application and evaluation of speech technologies in language learning: experiments with the Saybot player. 2811-2814 - Fengpei Ge, Fuping Pan, Changliang Liu, Bin Dong, Yonghong Yan:
Forward optimal modeling of acoustic confusions in Mandarin CALL system. 2815-2818 - Akinori Ito, Ryohei Tsutsui, Shozo Makino, Motoyuki Suzuki:
Recognition of English utterances with grammatical and lexical mistakes for dialogue-based CALL system. 2819-2822
Human Speech Production and Speech Perception
- Erik Bresch, Daylen Riggs, Louis M. Goldstein, Dani Byrd, Sungbok Lee, Shrikanth S. Narayanan:
An analysis of vocal tract shaping in English sibilant fricatives using real-time magnetic resonance imaging. 2823-2826 - Takayuki Arai:
Science workshop with sliding vocal-tract model. 2827-2830 - Odile Bagou, Ulrich H. Frauenfelder:
Segmentation cues in lexical identification and in lexical acquisition: same or different? 2831-2834 - Cecile T. L. Kuijpers, Louis ten Bosch:
Phonological representations in poor readers. 2835-2838 - Stéphanie Buchaillard, Pascal Perrier, Yohan Payan:
To what extent does tagged-MRI technique allow to infer tongue muscles' activation pattern? a modelling study. 2839-2842 - Noureddine Aboutabit, Denis Beautemps, Olivier Mathieu, Laurent Besacier:
Feature adaptation of hearing-impaired lip shapes: the vowel case in the cued speech context. 2843-2846 - Nanette Veilleux, Stefanie Shattuck-Hufnagel:
Automatic detection of the context of acoustic landmark deletion. 2847-2850 - Slim Ouni:
Aspects of pharyngealized phonemes in Arabic using articulography. 2851 - Elizabeth Beach, Christine Kitamura, Harvey Dillon, Teresa Ching, Denis Burnham:
The effect of spectral tilt on infants' discrimination of fricatives. 2852 - Monja A. Knoll, Lisa Scharrer:
"look at the shark": evaluation of student produced standardized sentences of infant- and foreigner-directed speech. 2853-2856 - Sankaran Panchapagesan, Abeer Alwan:
Vocal tract inversion by cepstral analysis-by-synthesis using chain matrices. 2857-2860 - Paavo Alku, Carlo Magi, Tom Bäckström:
DC-constrained linear prediction for glottal inverse filtering. 2861-2864 - Magnus Alm, Dawn M. Behne:
Voicing influences the saliency of place of articulation in audio-visual speech perception in babble. 2865-2868 - Shigeaki Amano, Yukari Hirata:
Correspondence of perception and production boundaries between single and geminate stops in Japanese. 2869-2872 - Michael C. W. Yip:
Inhibitory processes of Chinese spoken word recognition. 2873-2876
Automatic Speech Recognition: Acoustic Models I-III
- Klára Vicsi, György Szaszák:
Using prosody for the improvement of ASR - sentence modality recognition. 2877-2880