INTERSPEECH 2009: Brighton, UK
- 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, United Kingdom, September 6-10, 2009. ISCA 2009
Keynotes
- Sadaoki Furui:
Selected topics from 40 years of research on speech and speaker recognition. 1-8 - Thomas L. Griffiths:
Connecting human and machine learning via probabilistic models of cognition. 9-12 - Deb Roy:
New horizons in the study of child language acquisition. 13-20 - Mari Ostendorf:
Transcribing human-directed speech for spoken language processing. 21-27
ASR: Features for Noise Robustness
- Chanwoo Kim, Richard M. Stern:
Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction. 28-31 - Yu-Hsiang Bosco Chiu, Bhiksha Raj, Richard M. Stern:
Towards fusion of feature extraction and acoustic model training: a top down process for robust speech recognition. 32-35 - Hong You, Abeer Alwan:
Temporal modulation processing of speech signals for noise robust ASR. 36-39 - Luz García, Roberto Gemello, Franco Mana, José C. Segura:
Progressive memory-based parametric non-linear feature equalization. 40-43 - Osamu Ichikawa, Takashi Fukuda, Ryuki Tachibana, Masafumi Nishimura:
Dynamic features in the linear domain for robust automatic speech recognition in a reverberant environment. 44-47 - Antonio Miguel, Alfonso Ortega, Luis Buera, Eduardo Lleida:
Local projections and support vector based feature selection in speech recognition. 48-51
Production: Articulatory Modelling
- Qiang Fang, Akikazu Nishikido, Jianwu Dang, Aijun Li:
Feedforward control of a 3d physiological articulatory model for vowel production. 52-55 - Jun Cai, Yves Laprie, Julie Busset, Fabrice Hirsch:
Articulatory modeling based on semi-polar coordinates and guided PCA technique. 56-59 - Juraj Simko, Fred Cummins:
Sequencing of articulatory gestures using cost optimization. 60-63 - Xiao Bo Lu, William Thorpe, Kylie Foster, Peter Hunter:
From experiments to articulatory motion - a three dimensional talking head model. 64-67 - Javier Pérez, Antonio Bonafonte:
Towards robust glottal source modeling. 68-71 - Takayuki Arai:
Sliding vocal-tract model and its application for vowel production. 72-75
Systems for LVCSR and Rich Transcription
- Haihua Xu, Daniel Povey, Jie Zhu, Guanyong Wu:
Minimum hypothesis phone error as a decoding method for speech recognition. 76-79 - Stefan Kombrink, Lukás Burget, Pavel Matejka, Martin Karafiát, Hynek Hermansky:
Posterior-based out of vocabulary word detection in telephone speech. 80-83 - Yuya Akita, Masato Mimura, Tatsuya Kawahara:
Automatic transcription system for meetings of the Japanese national congress. 84-87 - Jonas Lööf, Christian Gollan, Hermann Ney:
Cross-language bootstrapping for unsupervised acoustic model training: rapid development of a Polish speech recognition system. 88-91 - Alberto Abad, Isabel Trancoso, Nelson Neto, Céu Viana:
Porting a European Portuguese broadcast news recognition system to Brazilian Portuguese. 92-95 - Julien Despres, Petr Fousek, Jean-Luc Gauvain, Sandrine Gay, Yvan Josse, Lori Lamel, Abdelkhalek Messaoudi:
Modeling northern and southern varieties of Dutch for STT. 96-99
Speech Analysis and Processing I-III
- Thomas Ewender, Sarah Hoffmann, Beat Pfister:
Nearly perfect detection of continuous F0 contour and frame classification for TTS synthesis. 100-103 - Yannis Pantazis, Olivier Rosec, Yannis Stylianou:
AM-FM estimation for speech based on a time-varying sinusoidal model. 104-107 - Jón Guðnason, Mark R. P. Thomas, Patrick A. Naylor, Daniel P. W. Ellis:
Voice source waveform analysis and synthesis using principal component analysis and Gaussian mixture modelling. 108-111 - Jung Ook Hong, Patrick J. Wolfe:
Model-based estimation of instantaneous pitch in noisy speech. 112-115 - Thomas Drugman, Baris Bozkurt, Thierry Dutoit:
Complex cepstrum-based decomposition of speech for glottal source estimation. 116-119 - Frank Tompkins, Patrick J. Wolfe:
Approximate intrinsic Fourier analysis of speech. 120-123 - Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu:
Spectral and temporal modulation features for phonetic recognition. 1071-1074 - Ibon Saratxaga, Daniel Erro, Inmaculada Hernáez, Iñaki Sainz, Eva Navas:
Use of harmonic phase information for polarity detection in speech signals. 1075-1078 - Michael Wohlmayr, Franz Pernkopf:
Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model. 1079-1082 - Anthony P. Stark, Kuldip K. Paliwal:
Group-delay-deviation based spectral analysis of speech. 1083-1086 - Joseph M. Anand, B. Yegnanarayana, Sanjeev Gupta, M. R. Kesheorey:
Speaker dependent mapping for low bit rate coding of throat microphone speech. 1087-1090 - G. Bapineedu, B. Avinash, Suryakanth V. Gangashetty, B. Yegnanarayana:
Analysis of Lombard speech using excitation source information. 1091-1094 - Andrew Errity, John McKenna:
A comparison of linear and nonlinear dimensionality reduction methods applied to synthetic speech. 1095-1098 - Christian Fischer Pedersen, Ove Andersen, Paul Dalsgaard:
ZZT-domain immiscibility of the opening and closing phases of the LF GFM under frame length variations. 1099-1102 - Hongjun Sun, Jianhua Tao, Huibin Jia:
Dimension reducing of LSF parameters based on radial basis function neural network. 1103-1106 - A. N. Harish, D. Rama Sanand, Srinivasan Umesh:
Characterizing speaker variability using spectral envelopes of vowel sounds. 1107-1110 - Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Julien Epps:
Analysis of band structures for speaker-specific information in FM feature extraction. 1111-1114 - Karl Schnell, Arild Lacroix:
Artificial nasalization of speech sounds based on pole-zero models of spectral relations between mouth and nose signals. 1115-1118 - Andrew Hines, Naomi Harte:
Error metrics for impaired auditory nerve responses of different phoneme groups. 1119-1122 - Chatchawarn Hansakunbuntheung, Hiroaki Kato, Yoshinori Sagisaka:
Model-based automatic evaluation of L2 learner's English timing. 2871-2874 - Petko Nikolov Petkov, Iman S. Mossavat, W. Bastiaan Kleijn:
A Bayesian approach to non-intrusive quality assessment of speech. 2875-2878 - Ladan Baghai-Ravary, Greg Kochanski, John S. Coleman:
Precision of phoneme boundaries derived using hidden Markov models. 2879-2882 - Lakshmish Kaushik, Douglas D. O'Shaughnessy:
A novel method for epoch extraction from speech signals. 2883-2886 - Jia Min Karen Kua, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi:
LS regularization of group delay features for speaker recognition. 2887-2890 - Thomas Drugman, Thierry Dutoit:
Glottal closure and opening instant detection from speech signals. 2891-2894
Speech Perception I, II
- Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano:
Relative importance of formant and whole-spectral cues for vowel perception. 124-127 - Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino:
Influences of vowel duration on speaker-size estimation and discrimination. 128-131 - Václav Jonás Podlipský, Radek Skarnitzl, Jan Volín:
High front vowels in Czech: a contrast in quantity or quality? 132-135 - Marjorie Dole, Michel Hoen, Fanny Meunier:
Effect of contralateral noise on energetic and informational masking on speech-in-speech intelligibility. 136-139 - Heidi Christensen, Jon Barker:
Using location cues to track speaker changes from mobile, binaural microphones. 140-143 - Ioana Vasilescu, Martine Adda-Decker, Lori Lamel, Pierre A. Hallé:
A perceptual investigation of speech transcription errors involving frequent near-homophones in French and American English. 144-147 - Etienne Gaudrain, Su Li, Vin Shen Ban, Roy D. Patterson:
The role of glottal pulse rate and vocal tract length in the perception of speaker identity. 148-151 - Victoria Medina, Willy Serniclaes:
Development of voicing categorization in deaf children with cochlear implant. 152-155 - Annie Tremblay:
Processing liaison-initial words in native and non-native French: evidence from eye movements. 156-159 - Nigel G. Ward, Benjamin H. Walker:
Estimating the potential of signal and interlocutor-track information for language modeling. 160-163 - Antje Heinrich, Sarah Hawkins:
Effect of r-resonance information on intelligibility. 804-807 - Hsin-Yi Lin, Janice Fon:
Perception of temporal cues at discourse boundaries. 808-811 - Zhanyu Ma, Arne Leijon:
Human audio-visual consonant recognition analyzed with three bimodal integration models. 812-815 - Hanny den Ouden, Hugo Quené:
Effects of tempo in radio commercials on young and elderly listeners. 816-819 - Sofia Strömbergsson:
Self-voice recognition in 4 to 5-year-old children. 820-823 - Olov Engwall, Preben Wik:
Are real tongue movements easier to speech read than synthesized? 824-827 - Carmen Peláez-Moreno, Ana I. García-Moral, Francisco J. Valverde-Albacete:
Eliciting a hierarchical structure of human consonant perception task errors using formal concept analysis. 828-831 - Takeshi Saitou, Masataka Goto:
Acoustic and perceptual effects of vocal training in amateur male singing. 832-835
Accent and Language Recognition
- Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert:
Factor analysis and SVM for language recognition. 164-167 - Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee:
Exploring universal attribute characterization of spoken languages for spoken language recognition. 168-171 - Abhijeet Sangwan, John H. L. Hansen:
On the use of phonological features for automatic accent analysis. 172-175 - Fabio Castaldo, Sandro Cumani, Pietro Laface, Daniele Colibro:
Language recognition using language factors. 176-179 - Je Hun Jeon, Yang Liu:
Automatic accent detection: effect of base units and boundary information. 180-183 - Ron M. Hecht, Omer Hezroni, Amit Manna, Ruth Aloni-Lavi, Gil Dobry, Amir Alfandary, Yaniv Zigel:
Age verification using a hybrid speech processing approach. 184-187 - Ron M. Hecht, Omer Hezroni, Amit Manna, Gil Dobry, Yaniv Zigel, Naftali Tishby:
Information bottleneck based age verification. 188-191 - Fred S. Richardson, William M. Campbell, Pedro A. Torres-Carrasquillo:
Discriminative n-gram selection for dialect recognition. 192-195 - Linsen Loots, Thomas Niesler:
Data-driven phonetic comparison and conversion between South African, British and American English pronunciations. 196-199 - Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng, Kong-Aik Lee:
Target-aware language models for spoken language recognition. 200-203 - Daniel Chung Yong Lim, Ian R. Lane:
Language identification for speech-to-speech translation. 204-207 - Fadi Biadsy, Julia Hirschberg:
Using prosody and phonotactics in Arabic dialect identification. 208-211
ASR: Acoustic Model Training and Combination
- Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen:
Refactoring acoustic models using variational expectation-maximization. 212-215 - Georg Heigold, David Rybach, Ralf Schlüter, Hermann Ney:
Investigations on convex optimization using log-linear HMMs for digit string recognition. 216-219 - Janne Pylkkönen:
Investigations on discriminative training in large scale acoustic model estimation. 220-223 - Erik McDermott, Shinji Watanabe, Atsushi Nakamura:
Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training. 224-227 - Etienne Marcheret, Jia-Yu Chen, Petr Fousek, Peder A. Olsen, Vaibhava Goel:
Compacting discriminative feature space transforms for embedded devices. 228-231 - Hung-An Chang, James R. Glass:
A back-off discriminative acoustic model for automatic speech recognition. 232-235 - Junho Park, Frank Diehl, Mark J. F. Gales, Marcus Tomalin, Philip C. Woodland:
Efficient generation and use of MLP features for Arabic speech recognition. 236-239 - Xiaodong Cui, Jian Xue, Bing Xiang, Bowen Zhou:
A study of bootstrapping with multiple acoustic features for improved automatic speech recognition. 240-243 - Scott Novotney, Richard M. Schwartz:
Analysis of low-resource acoustic model self-training. 244-247 - Björn Hoffmeister, Ruoying Liang, Ralf Schlüter, Hermann Ney:
Log-linear model combination with word-dependent scaling factors. 248-251
Spoken Dialogue Systems
- Kyoko Matsuyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Enabling a user to specify an item at any time during system enumeration - item identification for barge-in-able conversational dialogue systems. 252-255 - Tomoyuki Yamagata, Tetsuya Takiguchi, Yasuo Ariki:
System request detection in human conversation based on multi-resolution Gabor wavelet features. 256-259 - Stefan Schwärzler, Stefan Maier, Joachim Schenk, Frank Wallhoff, Gerhard Rigoll:
Using graphical models for mixed-initiative dialog management systems with realtime policies. 260-263 - Shinya Fujie, Yoichi Matsuyama, Hikaru Taniyama, Tetsunori Kobayashi:
Conversation robot participating in and activating a group communication. 264-267 - Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura:
Recent advances in WFST-based dialog system. 268-271 - David Griol, Giuseppe Riccardi, Emilio Sanchis:
A statistical dialog manager for the LUNA project. 272-275 - Heriberto Cuayáhuitl, Juventino Montiel-Hernández:
A policy-switching learning approach for adaptive spoken dialogue agents. 276-279 - Luis Fernando D'Haro, Ricardo de Córdoba, Rubén San Segundo, Javier Macías Guarasa, José Manuel Pardo:
Strategies for accelerating the design of dialogue applications using heuristic information from the backend database. 280-283 - Florian Pinault, Fabrice Lefèvre, Renato de Mori:
Feature-based summary space for stochastic dialogue modeling with hierarchical semantic frames. 284-287 - Rajesh Balchandran, Leonid Rachevsky, Larry Sansone:
Language modeling and dialog management for address recognition. 288-291 - Ea-Ee Jan, Hong-Kwang Kuo, Osamuyimen Stewart, David M. Lubensky:
A framework for rapid development of conversational natural language call routing systems for call centers. 292-295 - Jonas Beskow, Jens Edlund, Björn Granström, Joakim Gustafson, Gabriel Skantze, Helena Tobiasson:
The MonAMI reminder: a spoken dialogue system for face-to-face interaction. 296-299 - Julia Seebode, Stefan Schaffer, Ina Wechsung, Florian Metze:
Influence of training on direct and indirect measures for the evaluation of multimodal systems. 300-303 - Christine Kühnel, Benjamin Weiss, Sebastian Möller:
Talking heads for interacting with spoken dialog smart-home systems. 304-307 - Aki Kunikoshi, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose:
Speech generation from hand gestures based on space mapping. 308-311
Special Session: INTERSPEECH 2009 Emotion Challenge
- Björn W. Schuller, Stefan Steidl, Anton Batliner:
The INTERSPEECH 2009 emotion challenge. 312-315 - Santiago Planet, Ignasi Iriondo Sanz, Joan Claudi Socoró, Carlos Monzo, Jordi Adell:
GTM-URL contribution to the INTERSPEECH 2009 emotion challenge. 316-319 - Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan:
Emotion recognition using a hierarchical binary decision tree approach. 320-323 - Elif Bozkurt, Engin Erzin, Çigdem Eroglu Erdem, A. Tanju Erdem:
Improving automatic emotion recognition from speech signals. 324-327 - Thurid Vogt, Elisabeth André:
Exploring the benefits of discretization of acoustic features for speech emotion recognition. 328-331 - Iker Luengo, Eva Navas, Inmaculada Hernáez:
Combining spectral and prosodic information for emotion recognition in the INTERSPEECH 2009 emotion challenge. 332-335 - Roberto Barra-Chicote, Fernando Fernández Martínez, Syaheerah L. Lutfi, Juan Manuel Lucas-Cuesta, Javier Macías Guarasa, Juan Manuel Montero, Rubén San Segundo, José Manuel Pardo:
Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions. 336-339 - Tim Polzehl, Shiva Sundaram, Hamed Ketabdar, Michael Wagner, Florian Metze:
Emotion classification in children's speech using fusion of acoustic and linguistic features. 340-343 - Pierre Dumouchel, Najim Dehak, Yazid Attabi, Réda Dehak, Narjès Boufaden:
Cepstral and long-term features for emotion recognition. 344-347 - Marcel Kockmann, Lukás Burget, Jan Cernocký:
Brno University of Technology system for INTERSPEECH 2009 emotion challenge. 348-351
Automatic Speech Recognition: Language Models I, II
- Boulos Harb, Ciprian Chelba, Jeffrey Dean, Sanjay Ghemawat:
Back-off language model compression. 352-355 - Tobias Kaufmann, Thomas Ewender, Beat Pfister:
Improving broadcast news transcription with a precision grammar and discriminative reranking. 356-359 - Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Use of contexts in language model interpolation and adaptation. 360-363 - Jim L. Hieronymus, Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Exploiting Chinese character models to improve speech recognition performance. 364-367 - Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot:
Constraint selection for topic-based MDI adaptation of language models. 368-371 - Chuang-Hua Chueh, Jen-Tzung Chien:
Nonstationary latent Dirichlet allocation for speech recognition. 372-375 - Sopheap Seng, Laurent Besacier, Brigitte Bigi, Eric Castelli:
Multiple text segmentation for statistical language modeling. 2663-2666 - Denis Filimonov, Mary P. Harper:
Measuring tagging performance of a joint language model. 2667-2670 - Langzhou Chen, K. K. Chin, Kate M. Knill:
Improved language modelling using bag of word pairs. 2671-2674 - Frank Diehl, Mark J. F. Gales, Marcus Tomalin, Philip C. Woodland:
Morphological analysis and decomposition for Arabic speech-to-text systems. 2675-2678 - Amr El-Desoky, Christian Gollan, David Rybach, Ralf Schlüter, Hermann Ney:
Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR. 2679-2682 - Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa:
Topic dependent language model based on topic voting on noun history. 2683-2686 - Péter Mihajlik, Balázs Tarján, Zoltán Tüske, Tibor Fegyó:
Investigation of morph-based speech recognition improvements across speech genres. 2687-2690 - Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa:
Effective use of pause information in language modelling for speech recognition. 2691-2694 - Songfang Huang, Steve Renals:
A parallel training algorithm for hierarchical Pitman-Yor process language models. 2695-2698 - Stanislas Oger, Vladimir Popescu, Georges Linarès:
Probabilistic and possibilistic language models based on the world wide web. 2699-2702
Phoneme-Level Perception
- Jack C. Rogers, Matthew H. Davis:
Categorical perception of speech without stimulus repetition. 376-379 - Anne Cutler, Chris Davis, Jeesun Kim:
Non-automaticity of use of orthographic knowledge in phoneme evaluation. 380-383 - Meghan Sumner:
Learning and generalization of novel contrastive cues. 384-387 - Einar Meister, Stefan Werner:
Vowel category perception affected by microdurational variations. 388-391 - Nandini Iyer, Douglas Brungart, Brian D. Simpson:
Perceptual grouping of alternating word pairs: effect of pitch difference and presentation rate. 392-395 - Titia Benders, Paul Boersma:
Comparing methods to find a best exemplar in a multidimensional space. 396-399
Statistical Parametric Synthesis I, II
- Matt Shannon, William Byrne:
Autoregressive HMMs for speech synthesis. 400-403 - Cheng-Cheng Wang, Zhen-Hua Ling, Li-Rong Dai:
Asynchronous F0 and spectrum modeling for HMM-based speech synthesis. 404-407 - Yao Qian, Frank K. Soong, Miaomiao Wang, Zhizheng Wu:
A minimum v/u error approach to F0 generation in HMM-based TTS. 408-411 - Shiyin Kang, Zhiwei Shuang, Quansheng Duan, Yong Qin, Lianhong Cai:
Voiced/unvoiced decision algorithm for HMM-based speech synthesis. 412-415 - Xavi Gonzalvo, Alexander Gutkin, Joan Claudi Socoró, Ignasi Iriondo Sanz, Paul Taylor:
Local minimum generation error criterion for hybrid HMM speech synthesis. 416-419 - Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Rile Hu, Yong Guan, Keiichiro Oura, Keiichi Tokuda, Reima Karhila, Mikko Kurimo:
Thousands of voices for HMM-based speech synthesis. 420-423
Systems for Spoken Language Translation
- Sylvain Raybaud, David Langlois, Kamel Smaïli:
Efficient combination of confidence measures for machine translation. 424-427 - David Stallard, Stavros Tsakalidis, Shirin Saleem:
Incremental dialog clustering for speech-to-speech translation. 428-431 - Ruhi Sarikaya, Sameer Maskey, R. Zhang, Ea-Ee Jan, D. Wang, Bhuvana Ramabhadran, Salim Roukos:
Iterative sentence-pair extraction from quasi-parallel corpora for machine translation. 432-435 - Juan M. Huerta, Cheng Wu, Andrej Sakrajda, Sasha Caskey, Ea-Ee Jan, Alexander Faisman, Shai Ben-David, Wen Liu, Antonio Lee, Osamuyimen Stewart, Michael Frissora, David M. Lubensky:
RTTS: towards enterprise-level real-time speech transcription and translation services. 436-439 - Jing Zheng, Necip Fazil Ayan, Wen Wang, David Burkett:
Using syntax in large-scale audio document translation. 440-443 - Andreas Tsiartas, Prasanta Kumar Ghosh, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Context-driven automatic bilingual movie subtitle alignment. 444-447
Human Speech Production I, II
- Francisco Torreira, Mirjam Ernestus:
Probabilistic effects on French [t] duration. 448-451 - Odile Bagou, Violaine Michel, Marina Laganaro:
On the production of sandhi phenomena in French: psycholinguistic and acoustic data. 452-455 - Chierh Cheng, Yi Xu:
Extreme reductions: contraction of disyllables into monosyllables in Taiwan Mandarin. 456-459 - Mitchell Peabody, Stephanie Seneff:
Annotation and features of non-native Mandarin tone quality. 460-463 - Katerina Chládková, Paul Boersma, Václav Jonás Podlipský:
On-line formant shifting as a function of F0. 464-467 - Kimiko Yamakawa, Shigeaki Amano, Shuichi Itahashi:
Production boundary between fricative and affricate in Japanese and Korean speakers. 468-471 - Cátia M. R. Pinho, Luis M. T. Jesus, Anna Barney:
Aerodynamics of fricative production in European Portuguese. 472-475 - Anne Bonneau, Julie Buquet, Brigitte Wrobel-Dautcourt:
Contextual effects on protrusion and lip opening for /i, y/. 476-479 - Catarina Oliveira, Paula Martins, António J. S. Teixeira:
Speech rate effects on European Portuguese nasal vowels. 480-483 - Tamás Gábor Csapó, Zsuzsanna Bárkányi, Tekla Etelka Gráczi, Tamás Bohm, Steven M. Lulich:
Relation of formants and subglottal resonances in Hungarian vowels. 484-487 - Takayuki Arai:
Simple physical models of the vocal tract for education in speech science. 756-759 - Kyohei Hayashi, Nobuhiro Miki:
Auto-meshing algorithm for acoustic analysis of vocal tract. 760-763 - Tokihiko Kaburagi, Katsunori Daimo, Shogo Nakamura:
Voice production model employing an interactive boundary-layer analysis of glottal flow. 764-767 - Matt Speed, Damian T. Murphy, David M. Howard:
Characteristics of two-dimensional finite difference techniques for vocal tract analysis and voice synthesis. 768-771 - Chao Qin, Miguel Á. Carreira-Perpiñán:
Adaptation of a predictive model of tongue shapes. 772-775 - Christian Kroos:
Using sensor orientation information for computational head stabilisation in 3d electromagnetic articulography (EMA). 776-779 - Laura Enflo, Johan Sundberg, Friedemann Pabst:
Collision threshold pressure before and after vocal loading. 780-783 - Elke Philburn:
Gender differences in the realization of vowel-initial glottalization. 784-787 - Hayo Terband, Frits van Brenk, Pascal van Lieshout, Lian Nijland, Ben Maassen:
Stability and composition of functional synergies for speech movements in children and adults. 788-791 - Frits van Brenk, Hayo Terband, Pascal van Lieshout, Anja Lowit, Ben Maassen:
An analysis of speech rate strategies in aging. 792-795 - Stefan Benus:
Variability and stability in collaborative dialogues: turn-taking and filled pauses. 796-799 - Youyi Lu, Martin Cooke:
Speaking in the presence of a competing talker. 800-803
Prosody, Text Analysis, and Multilingual Models
- Harald Romsdorfer:
Polyglot speech prosody control. 488-491 - Harald Romsdorfer:
Weighted neural network ensemble models for speech prosody control. 492-495 - Vataya Boonpiam, Anocha Rugchatjaroen, Chai Wutiwiwatchai:
Cross-language F0 modeling for under-resourced tonal languages: a case study on Thai-Mandarin. 496-499 - Dafydd Gibbon, Pramod Pandey, D. Mary Kim Haokip, Jolanta Bachan:
Prosodic issues in synthesising Thadou, a Tibeto-Burman tone language. 500-503 - Chen-Yu Chiang, Sin-Horng Chen, Yih-Ru Wang:
Advanced unsupervised joint prosody labeling and modeling for Mandarin speech and its application to prosody generation for TTS. 504-507 - Ausdang Thangthai, Anocha Rugchatjaroen, Nattanun Thatphithakkul, Ananlada Chotimongkol, Chai Wutiwiwatchai:
Optimization of t-tilt F0 modeling. 508-511 - Nicolas Obin, Xavier Rodet, Anne Lacheret-Dujour:
A multi-level context-dependent prosodic model applied to durational modeling. 512-515 - Alexandre Trilla, Francesc Alías:
Sentiment classification in English from sentence-level annotations of emotions regarding models of affect. 516-519 - Leonardo Badino, J. Sebastian Andersson, Junichi Yamagishi, Robert A. J. Clark:
Identification of contrast and its emphatic realization in HMM based speech synthesis. 520-523 - Antonio Rui Ferreira Rebordão, Shaikh Mostafa Al Masum, Keikichi Hirose, Nobuaki Minematsu:
How to improve TTS systems for emotional expressivity. 524-527 - Yi-Jian Wu, Yoshihiko Nankaku, Keiichi Tokuda:
State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis. 528-531 - Frederick Weber, Kalika Bali:
Real voice and TTS accent effects on intelligibility and comprehension for Indian speakers of English as a second language. 532-535 - Pablo Daniel Agüero, Antonio Bonafonte, Juan Carlos Tulli:
Improving consistence of phonetic transcription for text-to-speech. 536-539
Automatic Speech Recognition: Adaptation I, II
- Piero Cosi:
On the development of matched and mismatched Italian children's speech recognition systems. 540-543 - Oscar Saz, Eduardo Lleida, Antonio Miguel:
Combination of acoustic and lexical speaker adaptation for disordered speech recognition. 544-547 - Hwa Jeon Song, Yongwon Jeong, Hyung Soon Kim:
Bilinear transformation space-based maximum likelihood linear regression frameworks. 548-551 - Yusuke Ijima, Takeshi Matsubara, Takashi Nose, Takao Kobayashi:
Speaking style adaptation for spontaneous speech recognition using multiple-regression HMM. 552-555 - Shakti Prasad Rath, Srinivasan Umesh:
Acoustic class specific VTLN-warping using regression class trees. 556-559 - Sébastien Demange, Dirk Van Compernolle:
Speaker normalization for template based speech recognition. 560-563 - Hans-Günter Hirsch, Andreas Kitzig:
Improving the robustness with multiple sets of HMMs. 564-567 - Rohit Sinha, Shweta Ghai:
On the use of pitch normalization for improving children's speech recognition. 568-571 - Shakti Prasad Rath, Srinivasan Umesh, Achintya Kumar Sarkar:
Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors. 572-575 - Koichi Shinoda, Hiroko Murakami, Sadaoki Furui:
Speaker adaptation based on two-step active learning. 576-579 - Mats Blomberg, Daniel Elenius:
Tree-based estimation of speaker characteristics for speech recognition. 580-583 - D. Rama Sanand, Shakti Prasad Rath, Srinivasan Umesh:
A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization. 584-587 - Santiago Omar Caballero Morales, Stephen J. Cox:
On the estimation and the use of confusion-matrices for improving ASR accuracy. 1599-1602 - Shigeki Matsuda, Yu Tsao, Jinyu Li, Satoshi Nakamura, Chin-Hui Lee:
A study on soft margin estimation of linear regression parameters for speaker adaptation. 1603-1606 - Shweta Ghai, Rohit Sinha:
Exploring the role of spectral smoothing in context of children's speech recognition. 1607-1610 - Kishan Thambiratnam, Frank Seide:
Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription. 1611-1614 - Satoshi Kobashikawa, Atsunori Ogawa, Yoshikazu Yamaguchi, Satoshi Takahashi:
Rapid unsupervised adaptation using frame independent output probabilities of gender and context independent phoneme models. 1615-1618 - Shizhen Wang, Yi-Hui Lee, Abeer Alwan:
Bark-shift based nonlinear speaker normalization using the second subglottal resonance. 1619-1622
Applications in Learning and Other Areas
- Gregory Aist, Jack Mostow:
Designing spoken tutorial dialogue with children to elicit predictable but educationally valuable responses. 588-591 - Joost van Doremalen, Helmer Strik, Catia Cucchiarini:
Optimizing non-native speech recognition for CALL applications. 592-595 - Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino:
Evaluation of English intonation based on combination of multiple evaluation scores. 596-599 - Andreas K. Maier, Florian Hönig, Viktor Zeißler, Anton Batliner, Erik Körner, Nobuyuki Yamanaka, Peter Ackermann, Elmar Nöth:
A language-independent feature set for the automatic evaluation of prosody. 600-603 - Klaus Zechner, Derrick Higgins, René Lawless, Yoko Futagi, Sarah Ohls, George Ivanov:
Adapting the acoustic model of a speech recognizer for varied proficiency non-native spontaneous speech using read speech with language-specific pronunciation difficulty. 604-607 - Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose:
Analysis and utilization of MLLR speaker adaptation technique for learners' pronunciation evaluation. 608-611 - Miki Iimura, Taichi Sato, Kihachiro Tanaka:
Control of human generating force by use of acoustic information - study on onomatopoeic utterances for controlling small lifting-force. 612-615 - Ching-Hsien Lee, Hsu-Chih Wu:
Mi-DJ: a multi-source intelligent DJ service. 616-619 - Géza Németh, Csaba Zainkó, Mátyás Bartalis, Gábor Olaszy, Géza Kiss:
Human voice or prompt generation? Can they co-exist in an application? 620-623 - Quoc Anh Le, Andrei Popescu-Belis:
Automatic vs. human question answering over multimedia meeting recordings. 624-627
Special Session: Silent Speech Interfaces
- John F. Holzrichter:
Characterizing silent and pseudo-silent speech using radar-like sensors. 628-631 - Tomoki Toda, Keigo Nakamura, Takayuki Nagai, Tomomi Kaino, Yoshitaka Nakajima, Kiyohiro Shikano:
Technologies for processing body-conducted speech detected with non-audible murmur microphone. 632-635 - Jonathan S. Brumberg, Philip R. Kennedy, Frank H. Guenther:
Artificial speech synthesizer control by brain-computer interface. 636-639 - Thomas Hueber, Elie-Laurent Benaroya, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone:
Visuo-phonetic decoding using multi-stream and context-dependent models for an ultrasound-based silent speech interface. 640-643 - Yunbin Deng, Rupal Patel, James T. Heaton, Glen Colby, L. Donald Gilmore, Joao Cabrera, Serge H. Roy, Carlo J. De Luca, Geoffrey S. Meltzner:
Disordered speech recognition using acoustic and sEMG signals. 644-647 - Michael Wand, Szu-Chen Stan Jou, Arthur R. Toth, Tanja Schultz:
Impact of different speaking modes on EMG-based speech recognition. 648-651 - Arthur R. Toth, Michael Wand, Tanja Schultz:
Synthesizing speech from electromyography using voice transformation techniques. 652-655 - Viet-Anh Tran, Gérard Bailly, Hélène Loevenbruck, Tomoki Toda:
Multimodal HMM-based NAM-to-speech conversion. 656-659
ASR: Discriminative Training
- Jonathan Malkin, Amarnag Subramanya, Jeff A. Bilmes:
On the semi-supervised learning of multi-layered perceptrons. 660-663 - Roger Hsiao, Tanja Schultz:
Generalized discriminative feature transformation for speech recognition. 664-667 - Chih-Chieh Cheng, Fei Sha, Lawrence K. Saul:
A fast online algorithm for large margin training of continuous density hidden Markov models. 668-671 - Dalei Wu, Baojie Li, Hui Jiang:
Maximum mutual information estimation via second order cone programming for large vocabulary continuous speech recognition. 672-675 - Dong Yu, Li Deng, Alex Acero:
Hidden conditional random field with distribution constraints for phone classification. 676-679 - Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
Deterministic annealing based training algorithm for Bayesian speech recognition. 680-683
Language Acquisition
- Ilana Heintz, Mary E. Beckman, Eric Fosler-Lussier, Lucie Ménard:
Evaluating parameters for mapping adult vowels to imitative babbling. 688-691 - Chiharu Tsurutani:
Intonation of Japanese sentences spoken by English speakers. 692-695 - Mark A. Huckvale, Ian S. Howard, Sascha Fagel:
KLAIR: a virtual infant for spoken language acquisition research. 696-699 - Joseph Tepperman, Erik Bresch, Yoon-Chul Kim, Sungbok Lee, Louis Goldstein, Shrikanth S. Narayanan:
An articulatory analysis of phonological transfer using real-time MRI. 700-703 - Louis ten Bosch, Okko Johannes Räsänen, Joris Driesen, Guillaume Aimetti, Toomas Altosaar, Lou Boves, A. Corns:
Do multiple caregivers speed up language acquisition? 704-707
ASR: Lexical and Prosodic Models
- Antoine Laurent, Paul Deléglise, Sylvain Meignier:
Grapheme to phoneme conversion using an SMT system. 708-711 - Long Nguyen, Tim Ng, Kham Nguyen, Rabih Zbib, John Makhoul:
Lexical and phonetic modeling for Arabic automatic speech recognition. 712-715 - Gina-Anne Levow:
Assessing context and learning for isiZulu tone recognition. 716-719 - Simon Dobrisek, Bostjan Vesnicer, France Mihelic:
A sequential minimization algorithm for finite-state pronunciation lexicon models. 720-723 - Kornel Laskowski, Mattias Heldner, Jens Edlund:
A general-purpose 32 ms prosodic vector for hidden Markov modeling. 724-727 - Dong Yang, Yi-Cheng Pan, Sadaoki Furui:
Vocabulary expansion through automatic abbreviation generation for Chinese voice search. 728-731
Unit-Selection Synthesis
- Qi Miao, Alexander Kain, Jan P. H. van Santen:
Perceptual cost function for cross-fading based concatenation. 732-735 - Daniel Tihelka, Jan Romportl:
Exploring automatic similarity measures for unit selection tuning. 736-739 - Cédric Boidin, Olivier Boëffard, Thierry Moudenc, Géraldine Damnati:
Towards intonation control in unit selection speech synthesis. 740-743 - Jerome R. Bellegarda:
A novel approach to cost weighting in unit selection TTS. 744-747 - Abubeker Gamboa Rosales, Hamurabi Gamboa Rosales, Rüdiger Hoffmann:
Maximum likelihood unit selection for corpus-based speech synthesis. 748-751 - Shinsuke Sakai, Ranniery Maia, Hisashi Kawai, Satoshi Nakamura:
A close look into the probabilistic concatenation model for corpus-based speech synthesis. 752-755
Speech and Audio Segmentation and Classification
- Michael Wiesenegger, Franz Pernkopf:
Wavelet-based speaker change detection in single channel speech data. 836-839 - Laura Docío Fernández, Paula Lopez-Otero, Carmen García-Mateo:
An adaptive threshold computation for unsupervised speaker segmentation. 840-843 - Gibak Kim, Philipos C. Loizou:
A data-driven approach for estimating the time-frequency binary mask. 844-847 - Haolang Zhou, Damianos G. Karakos, Andreas G. Andreou:
A semi-supervised version of heteroscedastic linear discriminant analysis. 848-851 - Okko Johannes Räsänen, Unto K. Laine, Toomas Altosaar:
Self-learning vector quantization for pattern discovery from speech. 852-855 - Rohit Prabhavalkar, Zhaozhang Jin, Eric Fosler-Lussier:
Monaural segregation of voiced speech using discriminative random fields. 856-859 - Chi Zhang, John H. L. Hansen:
Advancements in whisper-island detection within normally phonated audio streams. 860-863 - Matthias Zimmermann:
Joint segmentation and classification of dialog acts using conditional random fields. 864-867 - Claire Brierley, Eric Atwell:
Exploring complex vowels as phrase break correlates in a corpus of English speech with proPOSEL, a prosody and POS English lexicon. 868-871 - Caroline Clemens, Stefan Feldes, Karlheinz Schuhmacher, Joachim Stegmann:
Automatic topic detection of recorded voice messages. 872-875 - Jindrich Matousek, Radek Skarnitzl, Pavel Machac, Jan Trmal:
Identification and automatic detection of parasitic speech sounds. 876-879 - Daniel R. van Niekerk, Etienne Barnard:
Phonetic alignment for speech synthesis in under-resourced languages. 880-883 - Kalu U. Ogbureke, Julie Carson-Berndsen:
Improving initial boundary estimation for HMM-based automatic phonetic segmentation. 884-887
Speaker Recognition and Diarisation
- Howard Lei, Eduardo López Gonzalo:
Importance of nasality measures for speaker recognition data selection and performance prediction. 888-891 - Ning Wang, P. C. Ching, Tan Lee:
Exploration of vocal excitation modulation features for speaker recognition. 892-895 - Xing Fan, John H. L. Hansen:
Speaker identification for whispered speech using modified temporal patterns and MFCCs. 896-899 - Hanwu Sun, Tin Lay Nwe, Bin Ma, Haizhou Li:
Speaker diarization for meeting room audio. 900-903 - Runxin Li, Tanja Schultz, Qin Jin:
Improving speaker segmentation via speaker identification and text segmentation. 904-907 - David A. van Leeuwen:
Overall performance metrics for multi-condition speaker recognition evaluations. 908-911 - Matthias Wölfel, Qian Yang, Qin Jin, Tanja Schultz:
Speaker identification using warped MVDR cepstral features. 912-915 - Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman:
Entropy based overlapped speech detection as a pre-processing stage for speaker diarization. 916-919 - Marco Grimaldi, Fred Cummins:
Speech style and speaker recognition: a case study. 920-923 - Marijn Huijbregts, David A. van Leeuwen, Franciska M. G. de Jong:
The majority wins: a method for combining speaker diarization systems. 924-927 - Yosef A. Solewicz, Hagai Aronowitz:
Two-wire nuisance attribute projection. 928-931
Special Session: Advanced Voice Function Assessment
- Krzysztof Izdebski, Yuling Yan, Melda Kunduk:
Acoustic and high-speed digital imaging based analysis of pathological voice contributes to better understanding and differential diagnosis of neurological dysphonias and of mimicking phonatory disorders. 932-934 - Maria E. Markaki, Yannis Stylianou:
Normalized modulation spectral features for cross-database voice pathology detection. 935-938 - Christophe Mertens, Francis Grenez, Jean Schoentgen:
Speech sample salience analysis for speech cycle detection. 939-942 - Viliam Rapcan, Shona D'Arcy, Nils Penard, Ian H. Robertson, Richard B. Reilly:
The use of telephone speech recordings for assessment and monitoring of cognitive function in elderly people. 943-946 - Sunil Nagaraja, Eduardo Castillo Guerra:
Optimized feature set to assess acoustic perturbations in dysarthric speech. 947-950 - Andreas K. Maier, Stefan Wenhardt, Tino Haderlein, Maria Schuster, Elmar Nöth:
A microphone-independent visualization technique for speech disorders. 951-954 - Rubén Fraile, Carmelo Sánchez, Juan Ignacio Godino-Llorente, Nicolás Sáenz-Lechón, Víctor Osma-Ruiz, Juana M. Gutiérrez:
Evaluation of the effect of the GSM full rate codec on the automatic detection of laryngeal pathologies based on cepstral analysis. 955-958 - Ali Alpan, Jean Schoentgen, Youri Maryn, Francis Grenez, P. Murphy:
Cepstral analysis of vocal dysperiodicities in disordered connected speech. 959-962 - Lise Crevier-Buchman, Stephanie Borel, Stéphane Hans, Madeleine Menard, Jacqueline Vaissière:
Standard information from patients: the usefulness of self-evaluation (measured with the French version of the VHI). 963-966 - Marcello Scipioni, Matteo Gerosa, Diego Giuliani, Elmar Nöth, Andreas K. Maier:
Intelligibility assessment in children with cleft lip and palate in Italian and German. 967-970 - Luis M. T. Jesus, Anna Barney, Ricardo Santos, Janine Caetano, Juliana Jorge, Pedro Sá-Couto:
Universidade de Aveiro's voice evaluation protocol. 971-974
Automotive and Mobile Applications
- Hoon Chung, JeonGue Park, HyeonBae Jeon, Yunkeun Lee:
Fast speech recognition for voice destination entry in a car navigation system. 975-978 - Yun-Cheng Ju, Michael L. Seltzer, Ivan Tashev:
Improving perceived accuracy for in-car media search. 979-982 - Florian Schiel, Christian Heinrich:
Laying the foundation for in-car alcohol detection by speech. 983-986 - Yun-Cheng Ju, Tim Paek:
A voice search approach to replying to SMS messages in automobiles. 987-990 - Charl Johannes van Heerden, Johan Schalkwyk, Brian Strope:
Language modeling for what-with-where on GOOG-411. 991-994 - Jan Nouza, Petr Cerva, Jindrich Zdánský:
Very large vocabulary voice dictation for mobile devices. 995-998
Prosody: Production I, II
- Diana V. Dimitrova, Gisela Redeker, John C. J. Hoeks:
Did you say a BLUE banana? The prosody of contrast and abnormality in Bulgarian and Dutch. 999-1002 - Hansjörg Mixdorff, Hartmut R. Pfitzinger:
A quantitative study of F0 peak alignment and sentence modality. 1003-1006 - Szu-wei Chen, Bei Wang, Yi Xu:
Closely related languages, different ways of realizing focus. 1007-1010 - Plínio Almeida Barbosa, Céu Viana, Isabel Trancoso:
Cross-variety rhythm typology in Portuguese. 1011-1014 - Marie Nilsenová, Marc Swerts, Véronique Houtepen, Heleen Dittrich:
Pitch adaptation in different age groups: boundary tones versus global pitch. 1015-1018 - Agustín Gravano, Julia Hirschberg:
Backchannel-inviting cues in task-oriented dialogue. 1019-1022 - Willemijn Heeren, Vincent J. van Heuven:
Perception and production of boundary tones in whispered Dutch. 2411-2414 - Katrin Schweitzer, Arndt Riester, Michael Walsh, Grzegorz Dogil:
Pitch accents and information status in a German radio news corpus. 2415-2418 - Adrian Leemann, Keikichi Hirose, Hiroya Fujisaki:
Analysis of voice fundamental frequency contours of continuing and terminating prosodic phrases in four Swiss German dialects. 2419-2422 - Michelina Savino:
Intonational features for identifying regional accents of Italian. 2423-2426 - Agnieszka Wagner:
Analysis and recognition of accentual patterns. 2427-2430 - Nigel G. Ward, Rafael Escalante-Ruiz:
Using responsive prosodic variation to acknowledge the user's current state. 2431-2434 - Oliver Niebuhr:
Intonation segments and segmental intonation. 2435-2438 - David House, Anastasia Karlsson, Jan-Olof Svantesson, Damrong Tayanin:
The phrase-final accent in Kammu: effects of tone, focus and engagement. 2439-2442 - Raya Kalaldeh, Amelie Dorn, Ailbhe Ní Chasaide:
Tonal alignment in three varieties of Hiberno-English. 2443-2446 - Lourdes Aguilar, Antonio Bonafonte, Francisco Campillo, David Escudero Mancebo:
Determining intonational boundaries from the acoustic signal. 2447-2450 - Claudia K. Ohl, Hartmut R. Pfitzinger:
Compression and truncation revisited. 2451-2454 - Hartmut R. Pfitzinger, Hansjörg Mixdorff, Jan Schwarz:
Comparison of Fujisaki-model extractors and F0 stylizers. 2455-2458 - Caterina Petrone, Mariapaola D'Imperio:
Is tonal alignment interpretation independent of methodology? 2459-2462 - Margaret Zellers, Brechtje Post, Mariapaola D'Imperio:
Modeling the intonation of topic structure: two approaches. 2463-2466
ASR: Spoken Language Understanding
- Silvia Quarteroni, Giuseppe Riccardi, Marco Dinarelli:
What's in an ontology for spoken language understanding. 1023-1026 - Hiroaki Nanjo, Hiroki Mikami, Hiroshi Kawano, Takanobu Nishiura:
A fundamental study of shouted speech for acoustic-based security system. 1027-1030 - Timo Baumann, Okko Buß, Michaela Atterer, David Schlangen:
Evaluating the potential utility of ASR n-best lists for incremental spoken dialogue systems. 1031-1034 - Bin Zhang, Wei Wu, Jeremy G. Kahn, Mari Ostendorf:
Improving the recognition of names by document-level clustering. 1035-1038 - Frédéric Béchet, Alexis Nasr:
Robust dependency parsing for spoken language understanding of spontaneous speech. 1039-1042 - Chao-Hong Liu, Chung-Hsien Wu:
Semantic role labeling with discriminative feature selection for spoken language understanding. 1043-1046
Speaker Diarisation
- Douglas A. Reynolds, Patrick Kenny, Fabio Castaldo:
A study of new approaches to speaker diarization. 1047-1050 - Themos Stafylakis, Vassilis Katsouros, George Carayannis:
Redefining the Bayesian information criterion for speaker diarisation. 1051-1054 - Shih-Sian Cheng, Chun-Han Tseng, Chia-Ping Chen, Hsin-Min Wang:
Speaker diarization using divide-and-conquer. 1055-1058 - Deepu Vijayasenan, Fabio Valente, Hervé Bourlard:
KL realignment for speaker diarization with multiple feature streams. 1059-1062 - Marijn Huijbregts, David A. van Leeuwen, Franciska M. G. de Jong:
Speech overlap detection in a two-pass speaker diarization system. 1063-1066 - Kyu Jeong Han, Shrikanth S. Narayanan:
Improved speaker diarization of meeting speech with recurrent selection of representative speech segments and participant interaction pattern modeling. 1067-1070
Speech Processing with Audio or Audiovisual Input
- Henry Widjaja, Suryoadhi Wibowo:
Application of differential microphone array for IS-127 EVRC rate determination algorithm. 1123-1126 - Alberto Yoshihiro Nakano, Seiichi Nakagawa, Kazumasa Yamamoto:
Estimating the position and orientation of an acoustic source with a microphone array network. 1127-1130 - Vishweshwara Rao, S. Ramakrishnan, Preeti Rao:
Singing voice detection in polyphonic music using predominant pitch. 1131-1134 - Juan Pablo Arias, Néstor Becerra Yoma, Hiram Vivanco:
Word stress assessment for computer aided language learning. 1135-1138 - Adrien Leman, Julien Faure, Etienne Parizet:
A non-intrusive signal-based model for speech quality evaluation using automatic classification of background noises. 1139-1142 - Kouhei Sumi, Tatsuya Kawahara, Jun Ogata, Masataka Goto:
Acoustic event detection for spotting "hot spots" in podcasts. 1143-1146 - Taras Butko, Cristian Canton-Ferrer, Carlos Segura, Xavier Giró, Climent Nadeu, Javier Hernando, Josep R. Casas:
Improving detection of acoustic events using audiovisual data and feature level fusion. 1147-1150 - Miguel M. F. Bugalho, José Portelo, Isabel Trancoso, Thomas Pellegrini, Alberto Abad:
Detecting audio events for semantic video search. 1151-1154 - Mickael Rouvier, Driss Matrouf, Georges Linarès:
Factor analysis for audio-based video genre classification. 1155-1158 - Mickael Rouvier, Georges Linarès, Driss Matrouf:
Robust audio-based classification of video genre. 1159-1162 - Joerg Schmalenstroeer, Martin Kelling, Volker Leutnant, Reinhold Haeb-Umbach:
Fusing audio and video information for online speaker diarization. 1163-1166 - Girija Chetty, Michael Wagner:
Multimodal speaker verification using ancillary known speaker characteristics such as gender or age. 1167-1170 - Guillaume Aimetti, Roger K. Moore, Louis ten Bosch, Okko Johannes Räsänen, Unto Kalervo Laine:
Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions. 1171-1174
ASR: Decoding and Confidence Measures
- Miroslav Novak:
Incremental composition of static decoding graphs. 1175-1178 - Jacques Duchateau, Kris Demuynck, Hugo Van hamme:
Evaluation of phone lattice based speech decoding. 1179-1182 - Jike Chong, Ekaterina Gonina, Youngmin Yi, Kurt Keutzer:
A fully data parallel WFST-based large vocabulary continuous speech recognition on a graphics processing unit. 1183-1186 - Benjamin Lecouteux, Georges Linarès, Benoît Favre:
Combined low level and high level features for out-of-vocabulary word detection. 1187-1190 - Björn Hoffmeister, Ralf Schlüter, Hermann Ney:
Bayes risk approximations using time overlap with an application to system combination. 1191-1194 - Christopher M. White, Ariya Rastrow, Sanjeev Khudanpur, Frederick Jelinek:
Unsupervised estimation of the language model scaling factor. 1195-1198 - Atsunori Ogawa, Atsushi Nakamura:
Simultaneous estimation of confidence and error cause in speech recognition using discriminative model. 1199-1202 - Cyril Allauzen, Michael Riley, Johan Schalkwyk:
A generalized composition algorithm for weighted finite-state transducers. 1203-1206 - Stefano Scanzio, Pietro Laface, Daniele Colibro, Roberto Gemello:
Word confidence using duration models. 1207-1210 - Preethi Jyothi, Eric Fosler-Lussier:
A comparison of audio-free speech recognition error prediction methods. 1211-1214 - Petr Motlícek:
Automatic out-of-language detection based on confidence measures derived from LVCSR word and phone lattices. 1215-1218 - Brian Mak, Tom Ko:
Automatic estimation of decoding parameters using large-margin iterative linear programming. 1219-1222
Robust Automatic Speech Recognition I-III
- Randy Gomez, Tatsuya Kawahara:
Optimization of dereverberation parameters based on likelihood of speech recognizer. 1223-1226 - Jort F. Gemmeke, Yujun Wang, Maarten Van Segbroeck, Bert Cranen, Hugo Van hamme:
Application of noise robust MDT speech recognition on the SPEECON and speechdat-car databases. 1227-1230 - Alexander Krueger, Reinhold Haeb-Umbach:
Model based feature enhancement for automatic speech recognition in reverberant environments. 1231-1234 - Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani:
A study of mutual front-end processing method based on statistical model for noise robust speech recognition. 1235-1238 - Guan-min He, Jeih-Weih Hung:
Integrating codebook and utterance information in cepstral statistics normalization techniques for robust speech recognition. 1239-1242 - Hynek Boril, John H. L. Hansen:
Reduced complexity equalization of Lombard effect for speech recognition in noisy adverse environments. 1243-1246 - Luis Buera, Antonio Miguel, Alfonso Ortega, Eduardo Lleida, Richard M. Stern:
Unsupervised training scheme with non-stereo data for empirical feature vector compensation. 1247-1250 - Federico Flego, Mark J. F. Gales:
Incremental adaptation with VTS and joint adaptively trained systems. 1251-1254 - Takahiro Shinozaki, Sadaoki Furui:
Target speech GMM-based spectral compensation for noise robust speech recognition. 1255-1258 - Sheng-Chiuan Chiou, Chia-Ping Chen:
Noise-robust feature extraction based on forward masking. 1259-1262 - Tetsuo Kosaka, You Saito, Masaharu Kato:
Noisy speech recognition by using output combination of discrete-mixture HMMs and continuous-mixture HMMs. 2379-2382 - D. K. Kim, Mark J. F. Gales:
Adaptive training with noisy constrained maximum likelihood linear regression for noise robust speech recognition. 2383-2386 - Guanghu Shen, Soo-Young Suk, Hyun-Yeol Chung:
Performance comparisons of the integrated parallel model combination approaches with front-end noise reduction. 2387-2390 - Jibran Yousafzai, Zoran Cvetkovic, Peter Sollich:
Tuning support vector machines for robust phoneme classification with acoustic waveforms. 2391-2394 - Volker Leutnant, Reinhold Haeb-Umbach:
An analytic derivation of a phase-sensitive observation model for noise robust speech recognition. 2395-2398 - Wooil Kim, John H. L. Hansen:
Variational model composition for robust speech recognition with time-varying background noise. 2399-2402 - Haitian Xu, K. K. Chin:
Comparison of estimation techniques in joint uncertainty decoding for noise robust speech recognition. 2403-2406 - Jianhua Lu, Ji Ming, Roger F. Woods:
Replacing uncertainty decoding with subband re-estimation for large vocabulary speech recognition in noise. 2407-2410 - Ramón Fernandez Astudillo, Dorothea Kolossa, Reinhold Orglmeister:
Accounting for the uncertainty of speech estimates in the complex domain for minimum mean square error speech enhancement. 2491-2494 - Chanwoo Kim, Kshitiz Kumar, Bhiksha Raj, Richard M. Stern:
Signal separation for robust speech recognition based on phase difference information obtained in the frequency domain. 2495-2498 - Rogier C. van Dalen, Federico Flego, Mark J. F. Gales:
Transforming features to compensate speech recogniser models for noise. 2499-2502 - Xugang Lu, Masashi Unoki, Satoshi Nakamura:
Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments. 2503-2506 - Martin Wöllmer, Florian Eyben, Björn W. Schuller, Yang Sun, Tobias Moosmayr, Nhu Nguyen-Thien:
Robust in-car spelling recognition - a tandem BLSTM-HMM approach. 2507-2510 - Maarten Van Segbroeck, Hugo Van hamme:
Applying non-negative matrix factorization on time-frequency reassignment spectra for missing data mask estimation. 2511-2514
Speaker Verification and Identification I-III
- Lukás Burget, Pavel Matejka, Valiantsina Hubeika, Jan Cernocký:
Investigation into variants of joint factor analysis for speaker recognition. 1263-1266 - Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan:
Improved GMM-based speaker verification using SVM-driven impostor dataset selection. 1267-1270 - Yossi Bar-Yosef, Yuval Bistritz:
Adaptive individual background model for speaker verification. 1271-1274 - Shi-Xiong Zhang, Man-Wai Mak:
Optimization of discriminative kernels in SVM speaker verification. 1275-1278 - Zhenchun Lei:
UBM-based sequence kernel for speaker recognition. 1279-1282 - Minqiang Xu, Xi Zhou, Beiqian Dai, Thomas S. Huang:
GMM kernel by Taylor series for speaker verification. 1283-1286 - Elizabeth Shriberg, Sachin S. Kajarekar, Nicolas Scheffer:
Does session variability compensation in speaker recognition model intrinsic variation under mismatched conditions? 1551-1554 - Zahi N. Karam, William M. Campbell:
Variability compensated support vector machines applied to speaker verification. 1555-1558 - Najim Dehak, Réda Dehak, Patrick Kenny, Niko Brümmer, Pierre Ouellet, Pierre Dumouchel:
Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification. 1559-1562 - Robbie Vogt, Jason W. Pelecanos, Nicolas Scheffer, Sachin S. Kajarekar, Sridha Sridharan:
Within-session variability modelling for factor analysis speaker verification. 1563-1566 - Ron M. Hecht, Elad Noor, Naftali Tishby:
Speaker recognition by Gaussian information bottleneck. 1567-1570 - Chris Longworth, Rogier C. van Dalen, Mark J. F. Gales:
Variational dynamic kernels for speaker verification. 1571-1574 - Howard Lei, Eduardo López Gonzalo:
Mel, linear, and antimel frequency cepstral coefficients in broad phonetic regions for telephone speaker recognition. 2323-2326 - Guoli Ye, Brian Mak, Man-Wai Mak:
Fast GMM computation for speaker verification using scalar quantization and discrete densities. 2327-2330 - Achintya Kumar Sarkar, Srinivasan Umesh, Shakti Prasad Rath:
Text-independent speaker identification using vocal tract length normalization for building universal background model. 2331-2334 - Lukás Burget, Michal Fapso, Valiantsina Hubeika, Ondrej Glembek, Martin Karafiát, Marcel Kockmann, Pavel Matejka, Petr Schwarz, Jan Cernocký:
BUT system for NIST 2008 speaker recognition evaluation. 2335-2338 - José R. Calvo, Rafael Fernández, Gabriel Hernández:
Selection of the best set of shifted delta cepstral features in speaker verification using mutual information. 2339-2342 - Alberto de Castro, Daniel Ramos, Joaquin Gonzalez-Rodriguez:
Forensic speaker recognition using traditional features comparing automatic and human-in-the-loop formant tracking. 2343-2346 - Surosh G. Pillay, Aladdin M. Ariyaeeinia, P. Sivakumaran, M. Pawlewski:
Open-set speaker identification under mismatch conditions. 2347-2350 - Xavier Anguera:
Minivectors: an improved GMM-SVM approach for speaker verification. 2351-2354 - R. Padmanabhan, Sree Hari Krishnan Parthasarathi, Hema A. Murthy:
Robustness of phase based features for speaker recognition. 2355-2358 - Douglas E. Sturim, William M. Campbell, Zahi N. Karam, Douglas A. Reynolds, Fred S. Richardson:
The MIT Lincoln Laboratory 2008 speaker recognition system. 2359-2362 - A. R. Stauffer, Aaron D. Lawson:
Speaker recognition on lossy compressed speech using the Speex codec. 2363-2366 - Haruka Okamoto, Satoru Tsuge, Amira Abdelwahab, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa:
Text-independent speaker verification using rank threshold in large number of speaker models. 2367-2370 - Yun Lei, John H. L. Hansen:
The role of age in factor analysis for speaker identification. 2371-2374 - Juliette Kahn, Solange Rossato:
Do humans and speaker verification systems use the same information to differentiate voices? 2375-2378
Text Processing for Spoken Language Generation
- Jeppe Beck, Daniela Braga, João Nogueira, Miguel Sales Dias, Luís Pinto Coelho:
Automatic syllabification for Danish text-to-speech systems. 1287-1290 - Jinsik Lee, Byeongchang Kim, Gary Geunbae Lee:
Hybrid approach to grapheme to phoneme conversion for Korean. 1291-1294 - Korin Richmond, Robert A. J. Clark, Susan Fitt:
Robust LTS rules with the Combilex speech technology lexicon. 1295-1298 - Vincent Claveau:
Letter-to-phoneme conversion by inference of rewriting rules. 1299-1302 - Sittichai Jiampojamarn, Grzegorz Kondrak:
Online discriminative training for grapheme-to-phoneme conversion. 1303-1306 - Peter Cahill, Jinhua Du, Andy Way, Julie Carson-Berndsen:
Using same-language machine translation to create alternative target sequences for text-to-speech synthesis. 1307-1310
Single- and Multichannel Speech Enhancement
- Robert Morris, Ralph Johnson, Vladimir Goncharoff, Joseph DiVita:
Watermark recovery from speech using inverse filtering and sign correlation. 1311-1314 - Jouni Pohjalainen, Heikki Kallasjoki, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku:
Weighted linear prediction for speech analysis in noisy conditions. 1315-1318 - Richard C. Hendriks, Richard Heusdens, Jesper Jensen:
Log-spectral magnitude MMSE estimators under super-Gaussian densities. 1319-1322 - Yusuke Hioka, Ken'ichi Furuya, Youichi Haneda, Akitoshi Kataoka:
Speech enhancement in a 2-dimensional area based on power spectrum estimation of multiple areas with investigation of existence of active sources. 1323-1326 - Kuldip K. Paliwal, Belinda Schwerin, Kamil K. Wójcicki:
Modulation domain spectral subtraction for speech enhancement. 1327-1330 - Steven J. Rennie, John R. Hershey, Peder A. Olsen:
Variational loopy belief propagation for multi-talker speech recognition. 1331-1334 - Nadir Cazi, T. V. Sreenivas:
Enhancement of binaural speech using codebook constrained iterative binaural Wiener filter. 1335-1338 - Kazunobu Kondo, Makoto Yamada, Hideki Kenmochi:
A semi-blind source separation method with a less amount of computation suitable for tiny DSP modules. 1339-1342 - Siu Wa Lee, Frank K. Soong, Tan Lee:
Model-based speech separation: identifying transcription using orthogonality. 1343-1346 - Yun-Sik Park, Ji-Hyun Song, Jae-Hun Choi, Joon-Hyuk Chang:
Enhanced minimum statistics technique incorporating soft decision for noise suppression. 1347-1350 - Mark A. Huckvale, Jayne Leak:
Effect of noise reduction on reaction time to speech in noise. 1351-1354 - Behdad Dashtbozorg, Hamid Reza Abutalebi:
Joint noise reduction and dereverberation of speech using hybrid TF-GSC and adaptive MMSE estimator. 1355-1358 - Kook Cho, Takanobu Nishiura, Yoichi Yamashita:
A study on multiple sound source localization with a distributed microphone system. 1359-1362 - Tao Yu, John H. L. Hansen:
Robust minimal variance distortionless speech power spectra enhancement using order statistic filter for microphone array. 1363-1366 - Amit Das, John H. L. Hansen:
Speech enhancement minimizing generalized euclidean distortion using supergaussian priors. 1367-1370 - Iman Haji Abolhassani, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
STFT-based speech enhancement by reconstructing the harmonics. 1371-1374 - Ciira Wa Maina, John MacLaren Walsh:
Joint speech enhancement and speaker identification using Monte Carlo methods. 1375-1378
ASR: Acoustic Modelling
- Jing Huang, Karthik Visweswariah:
Combined discriminative training for multi-stream HMM-based audio-visual speech recognition. 1379-1382 - Panikos Heracleous, Denis Beautemps, Noureddine Aboutabit:
Cued speech recognition for augmentative communication in normal-hearing and hearing-impaired subjects. 1383-1386 - Daniel Neiberg, Gopal Ananthakrishnan, Mats Blomberg:
On acquiring speech production knowledge from articulatory measurements for phoneme recognition. 1387-1390 - John Dines, Junichi Yamagishi, Simon King:
Measuring the gap between HMM-based ASR and TTS. 1391-1394 - John Dines, Lakshmi Babu Saheer, Hui Liang:
Speech recognition with speech synthesis models by marginalising over decision tree leaves. 1395-1398 - Motoyuki Suzuki, Daisuke Honma, Akinori Ito, Shozo Makino:
Detailed description of triphone model using SSS-free algorithm. 1399-1402 - Jitendra Ajmera, Masami Akamine:
Decision tree acoustic models for ASR. 1403-1406 - Catherine Breslin, Matthew N. Stuttle, Kate M. Knill:
Compression techniques applied to multiple speech recognition systems. 1407-1410 - Antonio Miguel, Alfonso Ortega, Luis Buera, Eduardo Lleida:
Graphical models for discrete hidden Markov models in speech recognition. 1411-1414 - Chuan-Wei Ting, Jen-Tzung Chien:
Factor analyzed HMM topology for speech recognition. 1415-1418 - Soo-Young Suk, Hiroaki Kojima:
Tied-state multi-path HMnet model using three-domain successive state splitting. 1419-1422 - Vaibhava Goel, Peder A. Olsen:
Acoustic modeling using exponential families. 1423-1426
Assistive Speech Technology
- Sarah M. Creer, Stuart P. Cunningham, Phil D. Green, K. Fatema:
Personalizing synthetic voices for people with progressive speech disorders: judging voice similarity. 1427-1430 - Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Electrolaryngeal speech enhancement based on statistical voice conversion. 1431-1434 - Maria Klara Wolters, Ravichander Vipperla, Steve Renals:
Age recognition for spoken dialogue systems: do we need it? 1435-1438 - Markku Turunen, Jaakko Hakulinen, Aleksi Melto, Juho Hella, Juha-Pekka Rajaniemi, Erno Mäkinen, Jussi Rantala, Tomi Heimonen, Tuuli Laivo, Hannu Soronen, Mervi Hansen, Pellervo Valkama, Toni Miettinen, Roope Raisamo:
Speech-based and multimodal media center for different user groups. 1439-1442 - Samer Al Moubayed, Jonas Beskow, Anne-Marie Öster, Giampiero Salvi, Björn Granström, Nic van Son, Ellen Ormel:
Virtual speech reading support for hard of hearing in a domestic multi-media setting. 1443-1446 - Patrick Cardinal, Gilles Boulianne:
Real-time correction of closed-captions. 1447-1450 - Harsh Vardhan Sharma, Mark Hasegawa-Johnson:
Universal access: speech recognition for talkers with spastic dysarthria. 1451-1454 - Mohammed E. Hoque, Joseph K. Lane, Rana El Kaliouby, Matthew S. Goodwin, Rosalind W. Picard:
Exploring speech therapy games with children on the autism spectrum. 1455-1458 - José Luis Blanco Murillo, Rubén Fernández Pozo, David Díaz Pardo de Vera, Álvaro Sigüenza, Luis A. Hernández Gómez, José Alcázar Ramírez:
Analyzing GMMs to characterize resonance anomalies in speakers suffering from apnoea. 1459-1462 - Thomas Drugman, Thomas Dubuisson, Thierry Dutoit:
On the mutual information between source and filter contributions for voice pathology detection. 1463-1466 - Morten Højfeldt Rasmussen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen:
A system for detecting miscues in dyslexic read speech. 1467-1470
Topics in Spoken Language Processing
- Jonathan Wintrode, Scott Kulp:
Techniques for rapid and robust topic identification of conversational telephone speech. 1471-1474 - David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini:
Localization of speech recognition in spoken dialog systems: how machine translation can make our lives easier. 1475-1478 - Kunal Mukerjee, Shankar L. Regunathan, Jeffrey Cole:
Algorithms for speech indexing in Microsoft Recite. 1479-1482 - Tsuyoshi Fujinaga, Kazuo Miura, Hiroki Noguchi, Hiroshi Kawaguchi, Masahiko Yoshimoto:
Parallelized Viterbi processor for 5,000-word large-vocabulary real-time continuous speech recognition FPGA system. 1483-1486 - Sara Romano, Elvio Cecere, Francesco Cutugno:
SplaSH (spoken language search hawk): integrating time-aligned with text-aligned annotations. 1487-1490 - Jun Ogata, Masataka Goto:
PodCastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription. 1491-1494 - Graham Neubig, Shinsuke Mori, Tatsuya Kawahara:
A WFST-based log-linear framework for speaking-style transformation. 1495-1498 - Nikhil Garg, Benoît Favre, Korbinian Riedhammer, Dilek Hakkani-Tür:
ClusterRank: a graph based method for meeting summarization. 1499-1502 - Shasha Xie, Benoît Favre, Dilek Hakkani-Tür, Yang Liu:
Leveraging sentence weights in a concept-based optimization framework for extractive meeting summarization. 1503-1506 - Shih-Hsiang Lin, Yueng-Tien Lo, Yao-Ming Yeh, Berlin Chen:
Hybrids of supervised and unsupervised models for extractive speech summarization. 1507-1510 - I. Dan Melamed, Yeon-Jun Kim:
Automatic detection of audio advertisements. 1511-1514 - Sameer Maskey, Wisam Dakka:
Named entity network based on Wikipedia. 1515-1518
Special Session: Measuring the Rhythm of Speech
- Daniel Hirst:
The rhythm of text and the rhythm of utterances: from metrics to models. 1519-1522 - Petra Wagner, Andreas Windmann:
No time to lose? Time shrinking effects enhance the impression of rhythmic "isochrony" and fast speech rate. 1523-1526 - Plínio A. Barbosa:
Measuring speech rhythm variation in a model-based framework. 1527-1530 - Anastassia Loukina, Greg Kochanski, Chilin Shih, Elinor Keane, Ian Watson:
Rhythm measures with language-independent segmentation. 1531-1534 - Margaret Maclagan, Catherine Inez Watson, Jeanette King, Ray Harlow, Laura Thompson, Peter Keegan:
Investigating changes in the rhythm of Maori over time. 1535-1538 - Shizuka Nakamura, Hiroaki Kato, Yoshinori Sagisaka:
Effects of mora-timing in English rhythm control by Japanese learners. 1539-1542 - Jan Volín, Petr Pollák:
The dynamic dimension of the global speech-rhythm attributes. 1543-1546 - Zofia Malisz:
Vowel duration in pre-geminate contexts in Polish. 1547-1550
Emotion and Expression I, II
- Martijn Goudbeek, Jean-Philippe Goldman, Klaus R. Scherer:
Emotion dimensions and formant position. 1575-1578 - Heather Pon-Barry, Stuart M. Shieber:
Identifying uncertain words within an utterance via prosodic features. 1579-1582 - Emily Mower, Maja J. Mataric, Shrikanth S. Narayanan:
Evaluating evaluators: a case study in understanding the benefits and pitfalls of multi-evaluator modeling. 1583-1586 - Jaime C. Acosta, Nigel G. Ward:
Responding to user emotional state by adding emotional coloring to utterances. 1587-1590 - K. Sudheer Kumar, Sri Harish Reddy Mallidi, K. Sri Rama Murty, B. Yegnanarayana:
Analysis of laugh signals for detecting in continuous speech. 1591-1594 - Martin Wöllmer, Florian Eyben, Björn W. Schuller, Ellen Douglas-Cowie, Roddy Cowie:
Data-driven clustering in emotional space for affect recognition using discriminatively trained LSTM networks. 1595-1598 - Catherine Lai:
Perceiving surprise on cue words: prosody and semantics interact on right and really. 1963-1966 - Rok Gajsek, Vitomir Struc, Simon Dobrisek, France Mihelic:
Emotion recognition using linear transformations in combination with video. 1967-1970 - Ignacio López-Moreno, Carlos Ortego-Resa, Joaquin Gonzalez-Rodriguez, Daniel Ramos:
Speaker dependent emotion recognition using prosodic supervectors. 1971-1974 - Yu Zhou, Yanqing Sun, Junfeng Li, Jianping Zhang, Yonghong Yan:
Physiologically-inspired feature extraction for emotion recognition. 1975-1978 - Irena Yanushevskaya, Christer Gobl, Ailbhe Ní Chasaide:
Perceived loudness and voice quality in affect cueing. 1979-1982 - Chi-Chun Lee, Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan:
Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions. 1983-1986 - Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:
A detailed study of word-position effects on emotion expression in speech. 1987-1990 - Norhaslinda Kamaruddin, Abdul Wahab:
CMAC for speech emotion profiling. 1991-1994 - Marko Lugger, Bin Yang:
On the relevance of high-level features for speaker independent emotion recognition of spontaneous speech. 1995-1998 - Björn W. Schuller, Gerhard Rigoll:
Recognising interest in conversational speech - comparing bag of frames and supra-segmental features. 1999-2002
Voice Transformation I, II
- Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Many-to-many eigenvoice conversion with reference voice. 1623-1626 - Elizabeth Godoy, Olivier Rosec, Thierry Chonavel:
Alleviating the one-to-many mapping problem in voice conversion with context-dependent modeling. 1627-1630 - Binh Phu Nguyen, Masato Akagi:
Efficient modeling of temporal structure of speech for applications in voice transformation. 1631-1634 - Malorie Charlier, Yamato Ohtani, Tomoki Toda, Alexis Moinet, Thierry Dutoit:
Cross-language voice conversion based on eigenvoices. 1635-1638 - Alejandro José Uriz, Pablo Daniel Agüero, Antonio Bonafonte, Juan Carlos Tulli:
Voice conversion using k-histograms and frame selection. 1639-1642 - Dalei Wu, Baojie Li, Hui Jiang, Qian-Jie Fu:
Online model adaptation for voice conversion using model-based speech synthesis techniques. 1643-1646 - Oliver Watts, Junichi Yamagishi, Simon King, Kay Berkling:
HMM adaptation and voice conversion for the synthesis of child speech: a comparison. 2627-2630 - Takashi Nose, Junichi Adada, Takao Kobayashi:
HMM-based speaker characteristics emphasis using average voice model. 2631-2634 - Damien Lolive, Nelly Barbot, Olivier Boëffard:
An evaluation methodology for prosody transformation systems based on chirp signals. 2635-2638 - Yoshiki Nambu, Masahiko Mikawa, Kazuyo Tanaka:
Voice morphing based on interpolation of vocal tract area functions using AR-HMM analysis of speech. 2639-2642 - Hsin-Te Hwang, Chen-Yu Chiang, Po-Yi Sung, Sin-Horng Chen:
A novel model-based pitch conversion method for Mandarin speech. 2643-2646 - Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino:
Observation of empirical cumulative distribution of vowel spectral distances and its application to vowel based voice conversion. 2647-2650 - Ryuki Tachibana, Zhiwei Shuang, Masafumi Nishimura:
Japanese pitch conversion for voice morphing based on differential modeling. 2651-2654 - Victor Popa, Jani Nurminen, Moncef Gabbouj:
A novel technique for voice conversion based on style and content decomposition with bilinear models. 2655-2658 - Felix Burkhardt:
Rule-based voice quality variation with formant synthesis. 2659-2662
Phonetics, Phonology, Cross-Language Comparisons, Pathology
- Brandon Roy, Deb Roy:
Fast transcription of unstructured audio recordings. 1647-1650 - Timothy Kempton, Roger K. Moore:
Finding allophones: an evaluation on consonants in the TIMIT corpus. 1651-1654 - Keelan Evanini, Stephen Isard, Mark Liberman:
Automatic formant extraction for sociolinguistic analysis of large corpora. 1655-1658 - William Hartmann, Eric Fosler-Lussier:
Investigating phonetic information reduction and lexical confusability. 1659-1662 - Hyejin Hong, Minhwa Chung:
Improving phone recognition performance via phonetically-motivated units. 1663-1666 - Imen Jemaa, Oussama Rekhis, Kaïs Ouni, Yves Laprie:
An evaluation of formant tracking methods on an Arabic database. 1667-1670 - Wolfgang Wokurek, Andreas Madsack:
Comparison of manual and automated estimates of subglottal resonances. 1671-1674 - Odette Scharenborg:
Using durational cues in a computational model of spoken-word recognition. 1675-1678 - Bianca Sisinni, Mirko Grimaldi:
Second language discrimination of vowel contrasts by adult speakers with a five-vowel system. 1679-1682 - Tomohiko Ooigawa, Shigeko Shinohara:
Three-way laryngeal categorization of Japanese, French, English and Chinese plosives by Korean speakers. 1683-1686 - Shinichi Tokuma, Yi Xu:
The effect of F0 peak-delay on the L1 / L2 perception of English lexical stress. 1687-1690 - Joan Ka-Yin Ma:
Lexical tone production by Cantonese speakers with Parkinson's disease. 1691-1694 - Daniela Müller, Sidney Martin Mota:
Acoustic cues of palatalisation in plosive + lateral onset clusters. 1695-1698
Prosody Perception and Language Acquisition
- Irene Vogel, Arild Hestvik, H. Timothy Bunnell, Laura Spinu:
Perception of English compound vs. phrasal stress: natural vs. synthetic speech. 1699-1702 - Martti Vainio, Antti Suni, Tuomo Raitio, Jani Nurminen, Juhani Järvikivi, Paavo Alku:
New method for delexicalization and its application to prosodic tagging for text-to-speech synthesis. 1703-1706 - Minnaleena Toivola, Mietta Lennes, Eija Aho:
Speech rate and pauses in non-native Finnish. 1707-1710 - Uwe D. Reichel, Felicitas Kleber, Raphael Winkelmann:
Modelling similarity perception of intonation. 1711-1714 - Helen Meng, Chiu-yu Tseng, Mariko Kondo, Alissa M. Harrison, Tanya Visceglia:
Studying L2 suprasegmental features in Asian Englishes: a position paper. 1715-1718 - Helena Moniz, Isabel Trancoso, Ana Isabel Mata:
Classification of disfluent phenomena as fluent communicative devices in specific prosodic contexts. 1719-1722 - Rolf Carlson, Julia Hirschberg:
Cross-cultural perception of discourse phenomena. 1723-1726 - Roger K. Moore, Louis ten Bosch:
Modelling vocabulary growth from birth to young adulthood. 1727-1730 - Joris Driesen, Louis ten Bosch, Hugo Van hamme:
Adaptive non-negative matrix factorization in a computational model of language acquisition. 1731-1734 - Akiko Amano-Kusumoto, John-Paul Hosom, Izhak Shafran:
Classifying clear and conversational speech based on acoustic features. 1735-1738 - Elena E. Lyakso, Olga V. Frolova, Aleks S. Grigoriev:
The acoustic characteristics of Russian vowels in children of 6 and 7 years of age. 1739-1742 - Takaaki Shochi, Donna Erickson, Kaoru Sekiyama, Albert Rilliard, Véronique Aubergé:
Japanese children's acquisition of prosodic politeness expressions. 1743-1746 - Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka:
Perceptual training of singleton and geminate stops in Japanese language by Korean learners. 1747-1750
Statistical Parametric Synthesis II
- Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
A Bayesian approach to Hidden Semi-Markov Model based speech synthesis. 1751-1754 - Zhi-Jie Yan, Yao Qian, Frank K. Soong:
Rich context modeling for high quality HMM-based TTS. 1755-1758 - Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems. 1759-1762 - Guntram Strecha, Matthias Wolff, Frank Duckhorn, Sören Wittenberg, Constanze Tschöpe:
The HMM synthesis algorithm of an embedded unified speech recognizer and synthesizer. 1763-1766 - Zhiwei Shuang, Shiyin Kang, Qin Shi, Yong Qin, Lianhong Cai:
Syllable HMM based Mandarin TTS and comparison with concatenative TTS. 1767-1770 - Yoshinori Shiga:
Pulse density representation of spectrum for statistical speech processing. 1771-1774 - Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj:
Parameterization of vocal fry in HMM-based speech synthesis. 1775-1778 - Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit:
A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis. 1779-1782 - Ranniery Maia, Tomoki Toda, Keiichi Tokuda, Shinsuke Sakai, Satoshi Nakamura:
A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis. 1783-1786 - Yi-Jian Wu, Long Qin, Keiichi Tokuda:
An improved minimum generation error based model adaptation for HMM-based speech synthesis. 1787-1790 - Matthew Gibson:
Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models. 1791-1794 - Anocha Rugchatjaroen, Nattanun Thatphithakkul, Ananlada Chotimongkol, Ausdang Thangthai, Chai Wutiwiwatchai:
Speaker adaptation using a parallel phone set pronunciation dictionary for Thai-English bilingual TTS. 1795-1798 - Michal Dziemianko, Gregor Hofer, Hiroshi Shimodaira:
HMM-based automatic eye-blink synthesis from speech. 1799-1802
Resources, Annotation and Evaluation
- Lou Boves, Rolf Carlson, Erhard W. Hinrichs, David House, Steven Krauwer, Lothar Lemnitzer, Martti Vainio, Peter Wittenburg:
Resources for speech research: present and future infrastructure needs. 1803-1806 - Catherine Dickie, Felix Schaeffler, Christoph Draxler, Klaus Jänsch:
Speech recordings via the internet: an overview of the VOYS project in Scotland. 1807-1810 - Aaron D. Lawson, A. R. Stauffer, Edward J. Cupples, Stanley J. Wenndt, W. P. Bray, John J. Grieco:
The multi-session audio research project (MARP) corpus: goals, design and initial findings. 1811-1814 - Katarzyna Klessa, Grazyna Demenko:
Structure and annotation of Polish LVCSR speech database. 1815-1818 - Martina Waclawicová, Michal Kren, Lucie Válková:
Balanced corpus of informal spoken Czech: compilation, design and findings. 1819-1822 - Christophe Cerisara, Odile Mella, Dominique Fohr:
JTrans: an open-source software for semi-automatic text-to-speech alignment. 1823-1826 - Ina Wechsung, Klaus-Peter Engelbrecht, Anja B. Naumann, Stefan Schaffer, Julia Seebode, Florian Metze, Sebastian Möller:
Predicting the quality of multimodal systems based on judgments of single modalities. 1827-1830 - Lijuan Wang, Shenghao Qin, Frank K. Soong:
Auto-checking speech transcriptions by multiple template constrained posterior. 1831-1834 - Toshihiko Itoh, Norihide Kitaoka, Ryota Nishimura:
Subjective experiments on influence of response timing in spoken dialogues. 1835-1838 - Jun Okamoto, Tomoyuki Kato, Makoto Shozakai:
Usability study of VUI consistent with GUI focusing on age-groups. 1839-1842 - Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hideki Kashioka, Satoshi Nakamura:
Annotating communicative function and semantic content in dialogue act for construction of consulting dialogue systems. 1843-1846 - Shih-Hsiang Lin, Berlin Chen:
Improved speech summarization with multiple-hypothesis representations and Kullback-Leibler divergence measures. 1847-1850 - Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar:
An improved speech segmentation quality measure: the r-value. 1851-1854 - Michaela Atterer, Timo Baumann, David Schlangen:
No sooner said than done? testing incrementality of semantic interpretations of spontaneous speech. 1855-1858
Special Session: Lessons and Challenges Deploying Voice Search
- Junlan Feng, Srinivas Bangalore, Mazin Gilbert:
Role of natural language understanding in voice local search. 1859-1862 - Keith Vertanen, Per Ola Kristensson:
Recognition and correction of voice web search queries. 1863-1866
Word-Level Perception
- Marco van de Ven, Benjamin V. Tucker, Mirjam Ernestus:
Semantic context effects in the recognition of acoustically unreduced and reduced words. 1867-1870 - Michael C. W. Yip:
Context effects and the processing of ambiguous words: further evidence from semantic incongruence. 1871-1874 - Mirjam Ernestus:
The roles of reconstruction and lexical storage in the comprehension of regular pronunciation variants. 1875-1878 - Odette Scharenborg, Stefanie Okolowski:
Lexical embedding in spoken Dutch. 1879-1882 - Véronique Boulenger, Michel Hoen, François Pellegrino, Fanny Meunier:
Real-time lexical competitions during speech-in-speech comprehension. 1883-1886 - Martin Cooke:
Discovering consistent word confusions in noise. 1887-1890
Applications in Education and Learning
- Dimitrios P. Lyras, George K. Kokkinakis, Alexandros Lazaridis, Kyriakos N. Sgarbas, Nikos Fakotakis:
A large Greek-English dictionary with incorporated speech and language processing tools. 1891-1894 - Matthew Black, Joseph Tepperman, Sungbok Lee, Shrikanth S. Narayanan:
Predicting children's reading ability using evaluator-informed features. 1895-1898 - György Szaszák, David Sztahó, Klára Vicsi:
Automatic intonation classification for speech training systems. 1899-1902 - Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat:
Automated pronunciation scoring using confidence scoring and landmark-based SVM. 1903-1906 - Carlos Molina, Néstor Becerra Yoma, Jorge Wuth, Hiram Vivanco:
ASR based pronunciation evaluation with automatically generated competing vocabulary. 1907-1910 - Hongyan Li, Shijin Wang, Jiaen Liang, Shen Huang, Bo Xu:
High performance automatic mispronunciation detection method based on neural network and TRAP features. 1911-1914
ASR: New Paradigms I, II
- Amarnag Subramanya, Jeff A. Bilmes:
The semi-supervised switchboard transcription project. 1915-1918 - Geoffrey Zweig, Patrick Nguyen:
Maximum mutual information multi-phone units in direct modeling. 1919-1922 - Kai Yu, Rob A. Rutenbar:
Profiling large-vocabulary continuous speech recognition on embedded devices: a hardware resource sensitivity analysis. 1923-1926 - Ozlem Kalinli, Shrikanth S. Narayanan:
Continuous speech recognition using attention shift decoding with soft decision. 1927-1930 - Ariya Rastrow, Abhinav Sethy, Bhuvana Ramabhadran, Frederick Jelinek:
Towards using hybrid word and fragment units for vocabulary independent LVCSR systems. 1931-1934 - Herbert Gish, Man-Hung Siu, Arthur Chan, William Belfield:
Unsupervised training of an HMM-based speech recognizer for topic classification. 1935-1938 - Viktoria Maier, Roger K. Moore:
The case for case-based automatic speech recognition. 3027-3030 - Ian McGraw, Alexander Gruenstein, Andrew M. Sutherland:
A self-labeling speech corpus: collecting spoken words with an online educational game. 3031-3034 - Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar:
A noise robust method for pattern discovery in quantized time series: the concept matrix approach. 3035-3038 - Patrick Cardinal, Pierre Dumouchel, Gilles Boulianne:
Using parallel architectures in speech recognition. 3039-3042 - Christopher James Watkins, Stephen J. Cox:
Example-based speech recognition using formulaic phrases. 3043-3046 - Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen:
Parallel fast likelihood computation for LVCSR using mixture decomposition. 3047-3050 - Chen Liu:
An indexing weight for voice-to-text search. 3051-3054 - Yu Qiao, Nobuaki Minematsu, Keikichi Hirose:
On invariant structural representation for speech recognition: theoretical validation and experimental improvement. 3055-3058 - I-Fan Chen, Hsin-Min Wang:
Articulatory feature asynchrony analysis and compensation in detection-based ASR. 3059-3062 - Jeremy Morris, Eric Fosler-Lussier:
CRANDEM: conditional random fields for word recognition. 3063-3066 - Sébastien Demange, Dirk Van Compernolle:
HEAR: a hybrid episodic-abstract speech recognizer. 3067-3070
Single-Channel Speech Enhancement
- Kaustubh Kalgaonkar, Mark A. Clements:
Constrained probabilistic subspace maps applied to speech enhancement. 1939-1942 - Ben Milner, Jonathan Darch, Ibrahim Almajai:
Reconstructing clean speech from noisy MFCC vectors. 1943-1946 - Cees H. Taal, Richard C. Hendriks, Richard Heusdens, Jesper Jensen, Ulrik Kjems:
An evaluation of objective quality measures for speech intelligibility prediction. 1947-1950 - Mohammad H. Radfar, Wai-Yip Chan, Richard M. Dansereau, Willy Wong:
Performance comparison of HMM and VQ based single channel speech separation. 1951-1954 - Yosuke Izumi, Kenta Nishiki, Shinji Watanabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama:
Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment. 1955-1958 - Ibrahim Almajai, Ben Milner:
Enhancing audio speech using visual speech features. 1959-1962
Expression, Emotion and Personality Recognition
- Diane J. Litman, Mihai Rotaru, Greg Nicholas:
Classifying turn-level uncertainty using word-level prosody. 2003-2006 - Gabriel Murray, Giuseppe Carenini:
Detecting subjectivity in multiparty speech. 2007-2010 - Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps:
Pitch contour parameterisation based on linear stylisation for emotion recognition. 2011-2014 - Martin Graciarena, Tobias Bocklet, Elizabeth Shriberg, Andreas Stolcke, Sachin S. Kajarekar:
Feature-based and channel-based analyses of intrinsic variability in speaker verification. 2015-2018 - Wooil Kim, John H. L. Hansen:
Robust angry speech detection employing a TEO-based discriminative classifier combination. 2019-2022 - Dmitri Bitouk, Ani Nenkova, Ragini Verma:
Improving emotion recognition using class-level spectral features. 2023-2026 - Khiet P. Truong, David A. van Leeuwen, Mark A. Neerincx, Franciska M. G. de Jong:
Arousal and valence prediction in spontaneous emotional speech: felt versus perceived emotion. 2027-2030 - Gil Dobry, Ron M. Hecht, Mireille Avigal, Yaniv Zigel:
Dimension reduction approaches for SVM based speaker age estimation. 2031-2034 - Lu Xu, Mingxing Xu, Dali Yang:
ANN based decision fusion for speech emotion recognition. 2035-2038 - Bogdan Vlasenko, Andreas Wendemuth:
Processing affected speech within human machine interaction. 2039-2042 - Ali Hassan, Robert I. Damper:
Emotion recognition from speech using extended feature selection and a simple classifier. 2043-2046
Speech Synthesis Methods
- Daisuke Saito, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose:
Optimal event search using a structural cost function - improvement of structure to speech conversion. 2047-2050 - Ziad Al Bawab, Lorenzo Turicchia, Richard M. Stern, Bhiksha Raj:
Deriving vocal tract shapes from electromagnetic articulograph data via geometric adaptation and matching. 2051-2054 - Ingmar Steiner, Korin Richmond:
Towards unsupervised articulatory resynthesis of German utterances using EMA data. 2055-2058 - David Weenink:
The KlattGrid speech synthesizer. 2059-2062 - Mucemi Gakuru:
Development of a Kenyan English text to speech system: a method of developing a TTS for a previously undefined English dialect. 2063-2066 - Javier Latorre, Sergio Gracia, Masami Akamine:
Feedback loop for prosody prediction in concatenative speech synthesis. 2067-2070 - Donata Moers, Petra Wagner:
Assessing a speaker for fast speech in unit selection speech synthesis. 2071-2074 - Ling Cen, Minghui Dong, Paul Y. Chan, Haizhou Li:
Unit selection based speech synthesis for poor channel condition. 2075-2078 - Didier Cadic, Cédric Boidin, Christophe d'Alessandro:
Vocalic sandwich, a unit designed for unit selection TTS. 2079-2082 - Ryo Morinaka, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima:
Speech synthesis based on the plural unit selection and fusion method using FWF model. 2083-2086 - Matthew P. Aylett, Simon King, Junichi Yamagishi:
Speech synthesis without a phone inventory. 2087-2090 - Heiga Zen, Norbert Braunschweiler:
Context-dependent additive log f_0 model for HMM-based speech synthesis. 2091-2094
LVCSR Systems and Spoken Term Detection
- Alfonso Ortega, José Enrique García Laínez, Antonio Miguel, Eduardo Lleida:
Real-time live broadcast news subtitling system for Spanish. 2095-2098 - Xin Lei, Wei Wu, Wen Wang, Arindam Mandal, Andreas Stolcke:
Development of the 2008 SRI Mandarin speech-to-text system for broadcast news and conversation. 2099-2102 - Wen Wang, Arindam Mandal, Xin Lei, Andreas Stolcke, Jing Zheng:
Multifactor adaptation for Mandarin broadcast news and conversation speech recognition. 2103-2106 - Christian Plahl, Björn Hoffmeister, Georg Heigold, Jonas Lööf, Ralf Schlüter, Hermann Ney:
Development of the GALE 2008 Mandarin LVCSR system. 2107-2110 - David Rybach, Christian Gollan, Georg Heigold, Björn Hoffmeister, Jonas Lööf, Ralf Schlüter, Hermann Ney:
The RWTH Aachen University open source speech recognition system. 2111-2114 - Jie Gao, Qingwei Zhao, Yonghong Yan:
Online detecting end times of spoken utterances for synchronization of live speech and its transcripts. 2115-2118 - Philip N. Garner, John Dines, Thomas Hain, Asmaa El Hannani, Martin Karafiát, Danil Korchagin, Mike Lincoln, Vincent Wan, Le Zhang:
Real-time ASR from meetings. 2119-2122 - Paul Deléglise, Yannick Estève, Sylvain Meignier, Téva Merlin:
Improvements to the LIUM French ASR system based on CMU sphinx: what helps to significantly reduce the word error rate? 2123-2126 - Timo Mertens, Daniel Schneider, Joachim Köhler:
Merging search spaces for subword spoken term detection. 2127-2130 - Javier Tejedor, Dong Wang, Simon King, Joe Frankel, José Colás:
A posterior probability-based system hybridisation and combination for spoken term detection. 2131-2134 - Dong Wang, Simon King, Joe Frankel:
Stochastic pronunciation modelling for spoken term detection. 2135-2138 - Dong Wang, Simon King, Joe Frankel, Peter Bell:
Term-dependent confidence for out-of-vocabulary term detection. 2139-2142 - Wade Shen, Christopher M. White, Timothy J. Hazen:
A comparison of query-by-example methods for spoken term detection. 2143-2146 - Kouichi Katsurada, Shigeki Teshima, Tsuneo Nitta:
Fast keyword detection using suffix array. 2147-2150
Special Session: Active Listening & Synchrony
- Dirk Heylen:
Understanding speaker-listener interactions. 2151-2154 - Plínio A. Barbosa:
Detecting changes in speech expressiveness in participants of a radio program. 2155-2158 - Nick Campbell:
An audio-visual approach to measuring discourse synchrony in multimodal conversation data. 2159-2162 - Spyros Kousidis, David Dorran, Ciaran McDonnell, Eugene Coyle:
Towards flexible representations for analysis of accommodation of temporal features in spontaneous dialogue speech. 2163-2166 - Stefan Benus:
Are we 'in sync': turn-taking in collaborative dialogues. 2167-2170 - Martin Heckmann, Holger Brandl, Xavier Domont, Bram Bolder, Frank Joublin, Christian Goerick:
An audio-visual attention system for online association learning. 2171-2174
Language Recognition
- Rosemary Orr, David A. van Leeuwen:
A human benchmark for language recognition. 2175-2178 - Donglai Zhu, Bin Ma, Haizhou Li:
Large margin estimation of Gaussian mixture model parameters with extended Baum-Welch for spoken language recognition. 2179-2182 - Cécile Woehrling, Philippe Boula de Mareüil, Martine Adda-Decker:
Linguistically-motivated automatic classification of regional French varieties. 2183-2186 - Niko Brümmer, Albert Strasheim, Valiantsina Hubeika, Pavel Matejka, Lukás Burget, Ondrej Glembek:
Discriminative acoustic language recognition via channel-compensated GMM statistics. 2187-2190 - Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel:
Language score calibration using adapted Gaussian back-end. 2191-2194 - William M. Campbell, Zahi N. Karam:
A framework for discriminative SVM/GMM systems for language recognition. 2195-2198
Phonetics & Phonology
- Michele Gubian, Francisco Torreira, Helmer Strik, Lou Boves:
Functional data analysis as a tool for analyzing speech dynamics - a case study on the French word c'était. 2199-2202 - Nancy F. Chen, Wade Shen, Joseph P. Campbell, Reva Schwartz:
Large-scale analysis of formant frequency estimation variability in conversational telephone speech. 2203-2206 - Saandia Ali, Daniel Hirst:
Developing an automatic functional annotation system for British English intonation. 2207-2210 - Joshua Tauberer, Keelan Evanini:
Intrinsic vowel duration and the post-vocalic voicing effect: some evidence from dialects of North American English. 2211-2214 - Jiahong Yuan, Mark Liberman:
Investigating /l/ variation in English through forced alignment. 2215-2218 - Xuebin Ma, Akira Nemoto, Nobuaki Minematsu, Yu Qiao, Keikichi Hirose:
Structural analysis of dialects, sub-dialects and sub-sub-dialects of Chinese. 2219-2222
Speech Activity Detection
- Hwa Jeon Song, Sung Min Ban, Hyung Soon Kim:
Voice activity detection using singular value decomposition-based filter. 2223-2226 - Chiyoun Park, Namhoon Kim, Jeongmi Cho:
Voice activity detection using partially observable Markov decision process. 2227-2230 - Zheng-Hua Tan, Børge Lindberg:
High-accuracy, low-complexity voice activity detection based on a posteriori SNR weighted energy. 2231-2234 - Stéphane Pigeon, Patrick Verlinde:
Fusing fast algorithms to achieve efficient speech detection in FM broadcasts. 2235-2238 - Tasuku Oonishi, Paul R. Dixon, Koji Iwano, Sadaoki Furui:
Robust speech recognition using VAD-measure-embedded decoder. 2239-2242 - Sree Hari Krishnan Parthasarathi, Mathew Magimai-Doss, Hervé Bourlard, Daniel Gatica-Perez:
Investigating privacy-sensitive features for speech detection in multiparty conversations. 2243-2246
Multimodal Speech (e.g. Audiovisual Speech, Gesture)
- Lan Wang, Hui Chen, JianJun Ouyang:
Evaluation of external and internal articulator dynamics for pronunciation learning. 2247-2250 - Kshitiz Kumar, Jirí Navrátil, Etienne Marcheret, Vit Libal, Gerasimos Potamianos:
Robust audio-visual speech synchrony detection by generalized bimodal linear prediction. 2251-2254 - Atef Ben Youssef, Pierre Badin, Gérard Bailly, Panikos Heracleous:
Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models. 2255-2258 - Jeesun Kim, Chris Davis, Christian Kroos, Harold Hill:
Speaker discriminability for visual speech modes. 2259-2262 - Dang-Khoa Mac, Véronique Aubergé, Albert Rilliard, Eric Castelli:
Audio-visual prosody of social attitudes in Vietnamese: building and evaluating a tone-balanced corpus. 2263-2266 - György Takács:
Direct, modular and hybrid audio to visual speech conversion methods - a comparative study. 2267-2270
Phonetics
- Audrey Bürki, Cécile Fougeron, Christophe Veaux, Ulrich H. Frauenfelder:
How similar are clusters resulting from schwa deletion in French to identical underlying clusters? 2271-2274 - Barbara Schuppler, Wim A. van Dommelen, Jacques C. Koreman, Mirjam Ernestus:
Word-final [t]-deletion: an analysis on the segmental and sub-segmental level. 2275-2278 - Amanda Miller, Abigail Scott, Bonny E. Sands, Sheena Shah:
Rarefaction gestures and coarticulation in Mangetti Dune !Xung clicks. 2279-2282 - Amanda Miller, Sheena Shah:
The acoustics of Mangetti Dune !Xung clicks. 2283-2286 - Hussien Seid Worku, S. Rajendran, B. Yegnanarayana:
Acoustic characteristics of ejectives in Amharic. 2287-2290 - Wing Li Wu:
Sentence-final particles in Hong Kong Cantonese: are they tonal or intonational? 2291-2294 - William Steed, Phil Rose:
Same tone, different category: linguistic-tonetic variation in the areal tone acoustics of Chuqu Wu. 2295-2298 - Caicai Zhang:
Why would aspiration lower the pitch of the following vowel? Observations from Leng-shui-jiang Chinese. 2299-2302 - Kanae Amino, Takayuki Arai:
Dialectal characteristics of Osaka and Tokyo Japanese: analyses of phonologically identical words. 2303-2306 - Brechtje Post, Francis Nolan, Emmanuel A. Stamatakis, Toby Hudson:
Categories and gradience in intonation: evidence from linguistics and neurobiology. 2307-2310 - Mitsuhiro Nakamura:
Exploring vocalization of /l/ in English: an EPG and EMA study. 2311-2314 - Robert Mayr, Hannah Davies:
The monophthongs and diphthongs of north-eastern Welsh: an acoustic study. 2315-2318 - Jagoda Sieczkowska, Bernd Möbius, Antje Schweitzer, Michael Walsh, Grzegorz Dogil:
Voicing profile of Polish sonorants: [r] in obstruent clusters. 2319-2322
Special Session: Machine Learning for Adaptivity in Spoken Dialogue Systems
- Katherine Forbes-Riley, Diane J. Litman:
A user modeling-based performance analysis of a wizarded uncertainty-adaptive dialogue system corpus. 2467-2470 - Juan Manuel Lucas-Cuesta, Fernando Fernández Martínez, Javier Ferreiros:
Using dialogue-based dynamic language models for improving speech recognition. 2471-2474 - Lihong Li, Jason D. Williams, Suhrid Balakrishnan:
Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection. 2475-2478 - Romain Laroche, Ghislain Putois, Philippe Bretier, Bernadette Bouchon-Meunier:
Hybridisation of expertise and reinforcement learning in dialogue systems. 2479-2482 - Komei Sugiura, Naoto Iwahashi, Hideki Kashioka, Satoshi Nakamura:
Bayesian learning of confidence measure function for generation of utterances and motions in object manipulation dialogue task. 2483-2486 - Cédric Boidin, Verena Rieser, Lonneke van der Plas, Oliver Lemon, Jonathan Chevelu:
Predicting how it sounds: re-ranking dialogue prompts based on TTS quality for adaptive spoken dialogue systems. 2487-2490
Prosody: Perception
- Antje Schweitzer, Bernd Möbius:
Experiments on automatic prosodic labeling. 2515-2518 - Katrin Schneider, Grzegorz Dogil, Bernd Möbius:
German boundary tones show categorical perception and a perceptual magnet effect when presented in different contexts. 2519-2522 - Michael White, Rajakrishnan Rajkumar, Kiwako Ito, Shari R. Speer:
Eye tracking for the online evaluation of prosody in speech synthesis: not so fast! 2523-2526 - Hansjörg Mixdorff, John Ingram:
Prosodic analysis of foreign-accented English. 2527-2530 - Philippe Boula de Mareüil, Albert Rilliard, Alexandre Allauzen:
Perception of the evolution of prosody in the French broadcast news style. 2531-2534 - Yoonsook Mo, Jennifer Cole, Mark Hasegawa-Johnson:
Prosodic effects on vowel production: evidence from formant structure. 2535-2538
Segmentation and Classification
- Janez Zibert, Andrej Brodnik, France Mihelic:
An adaptive BIC approach for robust audio stream segmentation. 2539-2542 - Vaishali Patil, Shrikant Joshi, Preeti Rao:
Improving the robustness of phonetic segmentation to accent and style variation with a two-staged approach. 2543-2546 - Kyu Jeong Han, Shrikanth S. Narayanan:
Signature cluster model selection for incremental Gaussian mixture cluster modeling in agglomerative hierarchical speaker clustering. 2547-2550 - Lingyun Gu, Richard M. Stern:
Speaker segmentation and clustering for simultaneously presented speech. 2551-2554 - Nash M. Borges, Gerard G. L. Meyer:
Trimmed KL divergence between Gaussian mixtures for robust unsupervised acoustic anomaly detection. 2555-2558 - Hui Lin, Jeff A. Bilmes, Koby Crammer:
How to loose confidence: probabilistic linear machines for multiclass classification. 2559-2562
Evaluation & Standardisation of SL Technology and Systems
- Sebastian Möller, Nicolas Côté, Atsuko Kurashima, Noritsugu Egi, Akira Takahashi:
Quantifying wideband speech codec degradations via impairment factors: the new ITU-T P.834.1 methodology and its application to the G.711.1 codec. 2563-2566 - Markku Turunen, Jaakko Hakulinen, Aleksi Melto, Tomi Heimonen, Tuuli Laivo, Juho Hella:
SUXES - user experience evaluation method for spoken and multimodal interaction. 2567-2570 - David A. van Leeuwen, Judith M. Kessens, Eric Sanders, Henk van den Heuvel:
Results of the N-Best 2008 Dutch speech recognition evaluation. 2571-2574 - Marijn Huijbregts, Roeland Ordelman, Laurens van der Werff, Franciska M. G. de Jong:
SHoUT, the University of Twente submission to the N-Best 2008 speech recognition evaluation for Dutch. 2575-2578 - Alvin F. Martin, Craig S. Greenberg:
NIST 2008 speaker recognition evaluation: performance across telephone and room microphone channels. 2579-2582 - Sylvain Galliano, Guillaume Gravier, Laura Chaubard:
The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts. 2583-2586
Speech Coding
- José Enrique García Laínez, Alfonso Ortega, Antonio Miguel, Eduardo Lleida:
Differential vector quantization of feature vectors for distributed speech recognition. 2587-2590 - Petr Motlícek, Sriram Ganapathy, Hynek Hermansky:
Arithmetic coding of sub-band residuals in FDLP speech/audio codec. 2591-2594 - Tom Bäckström, Stefan Bayer, Sascha Disch:
Pitch variation estimation. 2595-2598 - Yun-Sik Park, Ji-Hyun Song, Jae-Hun Choi, Joon-Hyuk Chang:
Soft decision-based acoustic echo suppression in a frequency domain. 2599-2602 - Mouloud Djamah, Douglas D. O'Shaughnessy:
Fine-granular scalable MELP coder based on embedded vector quantization. 2603-2606 - Emre Unver, Stephane Villette, Ahmet M. Kondoz:
Joint quantization strategies for low bit-rate sinusoidal coding. 2607-2610 - Akira Nishimura:
Steganographic bandwidth extension for the AMR codec of low-bit-rate modes. 2611-2614 - V. Ramasubramanian, D. Harish:
Ultra low bit-rate speech coding based on unit-selection with joint spectral-residual quantization: no transmission of any residual information. 2615-2618 - Konstantin Schmidt, Markus Schnell, Nikolaus Rettelbach, Manfred Lutzky, Jochen Issing:
On the cost of backward compatibility for communication codecs. 2619-2622 - Young Han Lee, Hong Kook Kim:
A media-specific FEC based on Huffman coding for distributed speech recognition. 2623-2626
Systems for Spoken Language Understanding
- Sibel Yaman, Dilek Hakkani-Tür, Gökhan Tür, Ralph Grishman, Mary P. Harper, Kathleen R. McKeown, Adam Meyers, Kartavya Sharma:
Classification-based strategies for combining multiple 5-w question answering systems. 2703-2706 - Sibel Yaman, Dilek Hakkani-Tür, Gökhan Tür:
Combining semantic and syntactic information sources for 5-w question answering. 2707-2710 - Benoît Favre, Dilek Hakkani-Tür:
Phrase and word level strategies for detecting appositions in speech. 2711-2714 - Nathalie Camelin, Renato de Mori, Frédéric Béchet, Géraldine Damnati:
Error correction of proportions in spoken opinion surveys. 2715-2718 - Filip Jurcícek, Milica Gasic, Simon Keizer, François Mairesse, Blaise Thomson, Kai Yu, Steve J. Young:
Transformation-based learning for semantic parsing. 2719-2722 - Patrick Lehnen, Stefan Hahn, Hermann Ney, Agnieszka Mykowiecka:
Large-scale Polish SLU. 2723-2726 - Stefan Hahn, Patrick Lehnen, Georg Heigold, Hermann Ney:
Optimizing CRFs for SLU tasks in various languages using modified training criteria. 2727-2730 - Ryo Taguchi, Naoto Iwahashi, Takashi Nose, Kotaro Funakoshi, Mikio Nakano:
Learning lexicons from spoken utterances based on statistical model selection. 2731-2734 - Masaki Katsumaru, Mikio Nakano, Kazunori Komatani, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno:
Improving speech understanding accuracy with limited training data using multiple language models and multiple understanding models. 2735-2738 - Youngja Park, Wilfried Teiken, Stephen C. Gates:
Low-cost call type classification for contact center calls using partial transcripts. 2739-2742 - Mehryar Mohri, Pedro J. Moreno, Eugene Weinstein:
A new quality measure for topic segmentation of text and speech. 2743-2746 - Marco Dinarelli, Alessandro Moschitti, Giuseppe Riccardi:
Concept segmentation and labeling for conversational speech. 2747-2750
Special Session: New Approaches to Modeling Variability for Automatic Speech Recognition
- Vikramjit Mitra, Bengt J. Borgstrom, Carol Y. Espy-Wilson, Abeer Alwan:
A noise-type and level-dependent MPO-based speech enhancement architecture with variable frame analysis for noise-robust speech recognition. 2751-2754 - Bernd T. Meyer, Birger Kollmeier:
Complementarity of MFCC, PLP and Gabor features in the presence of speech-intrinsic variabilities. 2755-2758 - Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein:
Noise robustness of tract variables and their application to speech recognition. 2759-2762 - Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis Goldstein, Elliot Saltzman:
Articulatory phonological code for word classification. 2763-2766 - Aren Jansen, Partha Niyogi:
Robust keyword spotting with rapidly adapting point process models. 2767-2770 - Joseph Tepperman, Louis Goldstein, Sungbok Lee, Shrikanth S. Narayanan:
Automatically rating pronunciation through articulatory phonology. 2771-2774
User Interactions in Spoken Dialog Systems
- David Griol, Giuseppe Riccardi, Emilio Sanchis:
Learning the structure of human-computer and human-human dialogs. 2775-2778 - Jens Edlund, Mattias Heldner, Julia Hirschberg:
Pause and gap length in face-to-face interaction. 2779-2782 - Kornel Laskowski, Elizabeth Shriberg:
Modeling other talkers for improved dialog act recognition in meetings. 2783-2786 - Klaus-Peter Engelbrecht, Felix Hartard, Florian Gödde, Sebastian Möller:
A closer look at quality judgments of spoken dialog systems. 2787-2790 - Geoffrey Zweig:
New methods for the analysis of repeated utterances. 2791-2794 - Ing-Marie Jonsson, Nils Dahlbäck:
The effects of different voices for speech-based in-vehicle interfaces: impact of young and old voices on driving performance and attitude. 2795-2798
Production: Articulation and Acoustics
- Gopal Ananthakrishnan, Daniel Neiberg, Olov Engwall:
In search of non-uniqueness in the acoustic-to-articulatory mapping. 2799-2802 - Prasanta Kumar Ghosh, Shrikanth S. Narayanan, Pierre L. Divenyi, Louis Goldstein, Elliot Saltzman:
Estimation of articulatory gesture patterns from speech acoustics. 2803-2806 - I. Yücel Özbek, Mark Hasegawa-Johnson, Mübeccel Demirekler:
Formant trajectories for acoustic-to-articulatory inversion. 2807-2810 - Blaise Potard, Yves Laprie:
A robust variational method for the acoustic-to-articulatory problem. 2811-2814 - Jianwu Dang, Mark Tiede, Jiahong Yuan:
Comparison of vowel structures of Japanese and English in articulatory and auditory spaces. 2815-2818 - Janine Lilienthal:
The articulatory and acoustic impact of Scottish English /r/ on the preceding vowel-onset. 2819-2822
Features for Speech and Speaker Recognition
- Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Static and dynamic modulation spectrum for speech recognition. 2823-2826 - Tianyu T. Wang, Thomas F. Quatieri:
2-d processing of speech for multi-pitch analysis. 2827-2830 - Wei Chu, Abeer Alwan:
A correlation-maximization denoising filter used as an enhancement frontend for noise robust bird call classification. 2831-2834 - Korin Richmond:
Preliminary inversion mapping results with a new EMA corpus. 2835-2838 - Daniel Rudoy, Thomas F. Quatieri, Patrick J. Wolfe:
Time-varying autoregressive tests for multiscale speech analysis. 2839-2842 - Armando Muscariello, Guillaume Gravier, Frédéric Bimbot:
Audio keyword extraction by unsupervised word discovery. 2843-2846
Speech and Multimodal Resources & Annotation
- Etienne Barnard, Marelie H. Davel, Charl Johannes van Heerden:
ASR corpus design for resource-scarce languages. 2847-2850 - Marelie H. Davel, Olga Martirosian:
Pronunciation dictionary development in resource-scarce environments. 2851-2854 - Meghan Lammie Glenn, Stephanie M. Strassel, Haejoong Lee:
XTrans: a speech annotation and transcription tool. 2855-2858 - Hui Lin, Jeff A. Bilmes:
How to select a good training-data subset for transcription: submodular active selection for sequences. 2859-2862 - Zoraida Callejas, Ramón López-Cózar:
Improving acceptability assessment for the labelling of affective speech corpora. 2863-2866 - Christopher Cieri, Linda Brandschain, Abby Neely, David Graff, Kevin Walker, Chris Caruso, Alvin F. Martin, Craig S. Greenberg:
The broadcast narrow band speech corpus: a new resource type for large scale language recognition. 2867-2870
Speaker and Speech Variability, Paralinguistic and Nonlinguistic Cues
- Yen-Liang Shue, Jody Kreiman, Abeer Alwan:
A novel codebook search technique for estimating the open quotient. 2895-2898 - Aaron D. Lawson, Allen R. Stauffer, Brett Y. Smolenski, Benjamin B. Pokines, Matthew Leonard, Edward J. Cupples:
Long term examination of intra-session and inter-session speaker variability. 2899-2902 - Ragnhild Eg, Dawn M. Behne:
Distorted visual information influences audiovisual perception of voicing. 2903-2906 - Samia Fraj, Francis Grenez, Jean Schoentgen:
Perceived naturalness of a synthesizer of disordered voices. 2907-2910 - Alexey Karpov, Liliya Tsirulnik, Zdenek Krnoul, Andrey Ronzhin, Boris Lobanov, Milos Zelezný:
Audio-visual speech asynchrony modeling in a talking head. 2911-2914 - Takayuki Kagomiya, Seiji Nakagawa:
The effects of fundamental frequency and formant space on speaker discrimination through bone-conducted ultrasonic hearing. 2915-2918 - Céline De Looze, Stéphane Rauzy:
Automatic detection and prediction of topic changes through automatic detection of register variations and pause duration. 2919-2922 - Werner Spiegl, Georg Stemmer, Eva Lasarcyk, Varada Kolhatkar, Andrew Cassidy, Blaise Potard, Stephen Shum, Young Chol Song, Puyang Xu, Peter Beyerlein, James D. Harnsberger, Elmar Nöth:
Analyzing features for automatic age estimation on cross-sectional data. 2923-2926 - Emi Juliana Yamauchi, Satoshi Imaizumi, Hagino Maruyama, Tomoyuki Haji:
Intercultural differences in evaluation of pathological voice quality: perceptual and acoustical comparisons between RASATI and GRBASI scales. 2927-2930 - Kalika Bali:
F0 cues for the discourse functions of "hã" in Hindi. 2931-2934 - Stuart N. Wrigley, Simon Tucker, Guy J. Brown, Steve Whittaker:
Audio spatialisation strategies for multitasking during teleconferences. 2935-2938 - Alexsandro R. Meireles, Plínio A. Barbosa:
Speech rate effects on linguistic change. 2939-2942 - Chiu-yu Tseng, Zhao-yu Su, Lin-Shan Lee:
Mandarin spontaneous narrative planning - prosodic evidence from National Taiwan University lecture corpus. 2943-2946
ASR: Acoustic Model Features
- Frantisek Grézl, Martin Karafiát, Lukás Burget:
Investigation into bottle-neck features for meeting speech recognition. 2947-2950 - Sherry Y. Zhao, Suman V. Ravuri, Nelson Morgan:
Multi-stream to many-stream: using spectro-temporal features for ASR. 2951-2954 - Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Tandem representations of spectral envelope and modulation frequency features for ASR. 2955-2958 - Panji Setiawan, Harald Höge, Tim Fingscheidt:
Entropy-based feature analysis for speech recognition. 2959-2962 - Fabio Valente, Mathew Magimai-Doss, Christian Plahl, Suman V. Ravuri:
Hierarchical processing of the modulation spectrum for GALE Mandarin LVCSR system. 2963-2966 - David Gelbart, Nelson Morgan, Alexey Tsymbal:
Hill-climbing feature selection for multi-stream ASR. 2967-2970 - Yusuke Kida, Masaru Sakai, Takashi Masuko, Akinori Kawamura:
Robust F0 estimation based on log-time scale autocorrelation and its application to Mandarin tone recognition. 2971-2974 - Florian Müller, Alfred Mertins:
Invariant-integration method for robust feature extraction in speaker-independent speech recognition. 2975-2978 - Omid Dehzangi, Bin Ma, Engsiong Chng, Haizhou Li:
Discriminative feature transformation using output coding for speech recognition. 2979-2982 - Nima Mesgarani, Garimella S. V. S. Sivaram, Sridhar Krishna Nemala, Mounya Elhilali, Hynek Hermansky:
Discriminant spectrotemporal features for phoneme recognition. 2983-2986 - Saikat Chatterjee, Christos Koniaris, W. Bastiaan Kleijn:
Auditory model based optimization of MFCCs improves automatic speech recognition performance. 2987-2990
ASR: Tonal Language, Cross-Lingual and Multilingual ASR
- Henk van den Heuvel, Bert Réveil, Jean-Pierre Martens:
Pronunciation-based ASR for names. 2991-2994 - Bert Réveil, Jean-Pierre Martens, Bart D'hoore:
How speaker tongue and name source language affect the automatic recognition of spoken names. 2995-2998 - Martin Raab, Guillermo Aradilla, Rainer Gruhn, Elmar Nöth:
Online generation of acoustic models for multilingual speech recognition. 2999-3002 - Charl Johannes van Heerden, Etienne Barnard, Marelie H. Davel:
Basic speech recognition for spoken dialogues. 3003-3006 - Qingqing Zhang, Jielin Pan, Yonghong Yan:
Tonal articulatory feature for Mandarin and its application to conversational LVCSR. 3007-3010 - Houwei Cao, P. C. Ching, Tan Lee:
Effects of language mixing for automatic recognition of Cantonese-English code-mixing utterances. 3011-3014 - Changliang Liu, Fengpei Ge, Fuping Pan, Bin Dong, Yonghong Yan:
A one-step tone recognition approach using MSD-HMM for continuous speech. 3015-3018 - Khe Chai Sim, Haizhou Li:
Stream-based context-sensitive phone mapping for cross-lingual speech recognition. 3019-3022 - Sebastian Stüker, Laurent Besacier, Alex Waibel:
Human translations guided language discovery for ASR systems. 3023-3026