INTERSPEECH/EUROSPEECH 2005: Lisbon, Portugal
- 9th European Conference on Speech Communication and Technology, INTERSPEECH-Eurospeech 2005, Lisbon, Portugal, September 4-8, 2005. ISCA 2005
Keynote Papers
- Graeme M. Clark:
The multiple-channel cochlear implant: interfacing electronic technology to human consciousness. 1-4
Speech Recognition - Language Modelling I-III
- Yik-Cheung Tam, Tanja Schultz:
Dynamic language model adaptation using variational Bayes inference. 5-8
- Vidura Seneviratne, Steve J. Young:
The hidden vector state language model. 9-12
- Shinsuke Mori, Gakuto Kurata:
Class-based variable memory length Markov model. 13-16
- Alexander Gruenstein, Chao Wang, Stephanie Seneff:
Context-sensitive statistical language modeling. 17-20
- Chao Wang, Stephanie Seneff, Grace Chung:
Language model data filtering via user simulation and dialogue resynthesis. 21-24
- Jen-Tzung Chien, Meng-Sung Wu, Chia-Sheng Wu:
Bayesian learning for latent semantic analysis. 25-28
Prosody in Language Performance I, II
- Daniel Hirst, Caroline Bouzon:
The effect of stress and boundaries on segmental duration in a corpus of authentic speech (british English). 29-32
- Tomoko Ohsuga, Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Investigation of the relationship between turn-taking and prosodic features in spontaneous dialogue. 33-36
- Michiko Watanabe, Keikichi Hirose, Yasuharu Den, Nobuaki Minematsu:
Filled pauses as cues to the complexity of following phrases. 37-40
- Katrin Schneider, Bernd Möbius:
Perceptual magnet effect in German boundary tones. 41-44
- Angela Grimm, Jochen Trommer:
Constraints on the acquisition of simplex and complex words in German. 45-48
- Julien Meyer:
Whistled speech: a natural phonetic description of languages adapted to human perception and to the acoustical environment. 49-52
Spoken Language Extraction / Retrieval I, II
- Olivier Siohan, Michiel Bacchiani:
Fast vocabulary-independent audio search using path-based graph indexing. 53-56
- John Makhoul, Alex Baron, Ivan Bulyko, Long Nguyen, Lance A. Ramshaw, David Stallard, Richard M. Schwartz, Bing Xiang:
The effects of speech recognition and punctuation on information extraction performance. 57-60
- Ciprian Chelba, Alex Acero:
Indexing uncertainty for spoken document search. 61-64
- Tomoyosi Akiba, Hiroyuki Abe:
Exploiting passage retrieval for n-best rescoring of spoken questions. 65-68
- BalaKrishna Kolluru, Heidi Christensen, Yoshihiko Gotoh:
Multi-stage compaction approach to broadcast news summarisation. 69-72
- Chien-Lin Huang, Chia-Hsin Hsieh, Chung-Hsien Wu:
Audio-video summarization of TV news using speech recognition and shot change detection. 73-76
The Blizzard Challenge 2005
- Alan W. Black, Keiichi Tokuda:
The blizzard challenge - 2005: evaluating corpus-based speech synthesis on common datasets. 77-80
- Shinsuke Sakai, Han Shu:
A probabilistic approach to unit selection for corpus-based speech synthesis. 81-84
- John Kominek, Christina L. Bennett, Brian Langner, Arthur R. Toth:
The blizzard challenge 2005 CMU entry - a method for improving speech synthesis systems. 85-88
- H. Timothy Bunnell, Christopher A. Pennington, Debra Yarrington, John Gray:
Automatic personal synthetic voice construction. 89-92
- Heiga Zen, Tomoki Toda:
An overview of nitech HMM-based speech synthesis system for blizzard challenge 2005. 93-96
- Wael Hamza, Raimo Bakis, Zhiwei Shuang, Heiga Zen:
On building a concatenative speech synthesis system from the blizzard challenge speech databases. 97-100
- Robert A. J. Clark, Korin Richmond, Simon King:
Multisyn voices from ARCTIC data for the blizzard challenge. 101-104
- Christina L. Bennett:
Large scale evaluation of corpus-based synthesizers: results and lessons from the blizzard challenge 2005. 105-108
New Applications
- Berlin Chen, Yi-Ting Chen, Chih-Hao Chang, Hung-Bin Chen:
Speech retrieval of Mandarin broadcast news via mobile devices. 109-112
- Michiaki Katoh, Kiyoshi Yamamoto, Jun Ogata, Takashi Yoshimura, Futoshi Asano, Hideki Asoh, Nobuhiko Kitawaki:
State estimation of meetings by information fusion using Bayesian network. 113-116
- Roger K. Moore:
Results from a survey of attendees at ASRU 1997 and 2003. 117-120
- Reinhold Haeb-Umbach, Basilis Kladis, Joerg Schmalenstroeer:
Speech processing in the networked home environment - a view on the amigo project. 121-124
- Masahide Sugiyama:
Fixed distortion segmentation in efficient sound segment searching. 125-128
- Tin Lay Nwe, Haizhou Li:
Identifying singers of popular songs. 129-132
- Jun Ogata, Masataka Goto:
Speech repair: quick error correction just by using selection operation for speech input interfaces. 133-136
- Dirk Olszewski, Fransiskus Prasetyo, Klaus Linhard:
Steerable highly directional audio beam loudspeaker. 137-140
- Hassan Ezzaidi, Jean Rouat:
Automatic music genre classification using second-order statistical measures for the prescriptive approach. 141-144
- Alberto Abad, Dusan Macho, Carlos Segura, Javier Hernando, Climent Nadeu:
Effect of head orientation on the speaker localization performance in smart-room environment. 145-148
- Corinne Fredouille, Gilles Pouchoulin, Jean-François Bonastre, M. Azzarello, Antoine Giovanni, Alain Ghio:
Application of automatic speaker recognition techniques to pathological voice assessment (dysphonia). 149-152
- Upendra V. Chaudhari, Ganesh N. Ramaswamy, Edward A. Epstein, Sasha Caskey, Mohamed Kamal Omar:
Adaptive speech analytics: system, infrastructure, and behavior. 153-156
E-learning and Spoken Language Processing
- Katherine Forbes-Riley, Diane J. Litman:
Correlating student acoustic-prosodic profiles with student learning in spoken tutoring dialogues. 157-160
- Diane J. Litman, Katherine Forbes-Riley:
Speech recognition performance and learning in spoken dialogue tutoring. 161-164
- Satoshi Asakawa, Nobuaki Minematsu, Toshiko Isei-Jaakkola, Keikichi Hirose:
Structural representation of the non-native pronunciations. 165-168
- Fu-Chiang Chou:
Ya-ya language box - a portable device for English pronunciation training with speech recognition technologies. 169-172
- Akinori Ito, Yen-Ling Lim, Motoyuki Suzuki, Shozo Makino:
Pronunciation error detection method based on error rule clustering using a decision tree. 173-176
- Abhinav Sethy, Shrikanth S. Narayanan, Nicolaus Mote, W. Lewis Johnson:
Modeling and automating detection of errors in Arabic language learner speech. 177-180
- Felicia Zhang, Michael Wagner:
Effects of F0 feedback on the learning of Chinese tones by native speakers of English. 181-184
E-inclusion and Spoken Language Processing I, II
- Tom Brøndsted, Erik Aaskoven:
Voice-controlled internet browsing for motor-handicapped users. design and implementation issues. 185-188
- Briony Williams, Delyth Prys, Ailbhe Ní Chasaide:
Creating an ongoing research capability in speech technology for two minority languages: experiences from the WISPR project. 189-192
- Anestis Vovos, Basilis Kladis, Nikolaos D. Fakotakis:
Speech operated smart-home control system for users with special needs. 193-196
- Takatoshi Jitsuhiro, Shigeki Matsuda, Yutaka Ashikari, Satoshi Nakamura, Ikuko Eguchi Yairi, Seiji Igi:
Spoken dialog system and its evaluation of geographic information system for elderly persons' mobility support. 197-200
- Daniele Falavigna, Toni Giorgino, Roberto Gretter:
A frame based spoken dialog system for home care. 201-204
Acoustic Processing for ASR I-III
- Matthias Wölfel:
Frame based model order selection of spectral envelopes. 205-208
- Vivek Tyagi, Christian Wellekens, Hervé Bourlard:
On variable-scale piecewise stationary spectral analysis of speech signals for ASR. 209-212
- Arlo Faria, David Gelbart:
Efficient pitch-based estimation of VTLN warp factors. 213-216
- Yanli Zheng, Richard Sproat, Liang Gu, Izhak Shafran, Haolang Zhou, Yi Su, Daniel Jurafsky, Rebecca Starr, Su-Youn Yoon:
Accent detection and speech recognition for Shanghai-accented Mandarin. 217-220
- Loïc Barrault, Renato de Mori, Roberto Gemello, Franco Mana, Driss Matrouf:
Variability of automatic speech recognition systems using different features. 221-224
- Slavomír Lihan, Jozef Juhár, Anton Cizmar:
Crosslingual and bilingual speech recognition with Slovak and Czech speechdat-e databases. 225-228
- Carmen Peláez-Moreno, Qifeng Zhu, Barry Y. Chen, Nelson Morgan:
Automatic data selection for MLP-based feature extraction for ASR. 229-232
- Thilo Köhler, Christian Fügen, Sebastian Stüker, Alex Waibel:
Rapid porting of ASR-systems to mobile devices. 233-236
- Hugo Meinedo, João Paulo Neto:
A stream-based audio segmentation, classification and clustering pre-processing system for broadcast news using ANN models. 237-240
- Etienne Marcheret, Karthik Visweswariah, Gerasimos Potamianos:
Speech activity detection fusing acoustic phonetic and energy features. 241-244
- Zoltán Tüske, Péter Mihajlik, Zoltán Tobler, Tibor Fegyó:
Robust voice activity detection based on the entropy of noise-suppressed spectrum. 245-248
- Masamitsu Murase, Shun'ichi Yamamoto, Jean-Marc Valin, Kazuhiro Nakadai, Kentaro Yamada, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Multiple moving speaker tracking by microphone array on mobile robot. 249-252
Speech Recognition - Adaptation I, II
- Yaxin Zhang, Bian Wu, Xiaolin Ren, Xin He:
A speaker biased SI recognizer for embedded mobile applications. 253-256
- Bart Bakker, Carsten Meyer, Xavier L. Aubert:
Fast unsupervised speaker adaptation through a discriminative eigen-MLLR algorithm. 257-260
- Rusheng Hu, Jian Xue, Yunxin Zhao:
Incremental largest margin linear regression and MAP adaptation for speech separation in telemedicine applications. 261-264
- Giulia Garau, Steve Renals, Thomas Hain:
Applying vocal tract length normalization to meeting recordings. 265-268
- Srinivasan Umesh, András Zolnay, Hermann Ney:
Implementing frequency-warping and VTLN through linear transformation of conventional MFCC. 269-272
- Xiaodong Cui, Abeer Alwan:
MLLR-like speaker adaptation based on linearization of VTLN with MFCC features. 273-276
- Chandra Kant Raut, Takuya Nishimoto, Shigeki Sagayama:
Model adaptation by state splitting of HMM for long reverberation. 277-280
- Daben Liu, Daniel Kiecza, Amit Srivastava, Francis Kubala:
Online speaker adaptation and tracking for real-time speech recognition. 281-284
- Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Automatic speech recognition based on adaptation and clustering using temporal-difference learning. 285-288
- Hui Ye, Steve J. Young:
Improving the speech recognition performance of beginners in spoken conversational interaction for language learning. 289-292
- Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Rapid unsupervised speaker adaptation based on multi-template HMM sufficient statistics in noisy environments. 293-296
- Dong-jin Choi, Yung-Hwan Oh:
Rapid speaker adaptation for continuous speech recognition using merging eigenvoices. 297-300
Signal Analysis, Processing and Feature Estimation I-III
- Jian Liu, Thomas Fang Zheng, Jing Deng, Wenhu Wu:
Real-time pitch tracking based on combined SMDSF. 301-304
- András Bánhalmi, Kornél Kovács, András Kocsor, László Tóth:
Fundamental frequency estimation by least-squares harmonic model fitting. 305-308
- Siu Wa Lee, Frank K. Soong, Pak-Chung Ching:
Harmonic filtering for joint estimation of pitch and voiced source with single-microphone input. 309-312
- Marián Képesi, Luis Weruaga:
High-resolution noise-robust spectral-based pitch estimation. 313-316
- John-Paul Hosom:
F0 estimation for adult and children's speech. 317-320
- Ben Milner, Xu Shao, Jonathan Darch:
Fundamental frequency and voicing prediction from MFCCs for speech reconstruction from unconstrained speech. 321-324
- Nelly Barbot, Olivier Boëffard, Damien Lolive:
F0 stylisation with a free-knot b-spline model and simulated-annealing optimization. 325-328
- Friedhelm R. Drepper:
Voiced excitation as entrained primary response of a reconstructed glottal master oscillator. 329-332
- Damien Vincent, Olivier Rosec, Thierry Chonavel:
Estimation of LF glottal source parameters based on an ARX model. 333-336
- Leigh D. Alsteris, Kuldip K. Paliwal:
Some experiments on iterative reconstruction of speech from STFT phase and magnitude spectra. 337-340
- R. Muralishankar, Abhijeet Sangwan, Douglas D. O'Shaughnessy:
Statistical properties of the warped discrete cosine transform cepstrum compared with MFCC. 341-344
- Aníbal J. S. Ferreira:
New signal features for robust identification of isolated vowels. 345-348
- Jonathan Pincas, Philip J. B. Jackson:
Amplitude modulation of frication noise by voicing saturates. 349-352
- Ron M. Hecht, Naftali Tishby:
Extraction of relevant speech features using the information bottleneck method. 353-356
- Mohammad Firouzmand, Laurent Girin, Sylvain Marchand:
Comparing several models for perceptual long-term modeling of amplitude and phase trajectories of sinusoidal speech. 357-360
- Hynek Hermansky, Petr Fousek:
Multi-resolution RASTA filtering for TANDEM-based ASR. 361-364
- Woojay Jeon, Biing-Hwang Juang:
A category-dependent feature selection method for speech signals. 365-368
- Trausti T. Kristjansson, Sabine Deligne, Peder A. Olsen:
Voicing features for robust speech detection. 369-372
Robust Speech Recognition I-IV
- Svein Gunnar Pettersen, Magne Hallstein Johnsen, Tor André Myrvoll:
Joint Bayesian predictive classification and parallel model combination for robust speech recognition. 373-376
- Glauco F. G. Yared, Fábio Violaro, Lívio C. Sousa:
Gaussian elimination algorithm for HMM complexity reduction in continuous speech recognition systems. 377-380
- Luis Buera, Eduardo Lleida, Antonio Miguel, Alfonso Ortega:
Robust speech recognition in cars using phoneme dependent multi-environment linear normalization. 381-384
- Yi Chen, Lin-Shan Lee:
Energy-based frame selection for reliable feature normalization and transformation in robust speech recognition. 385-388
- Yoshitaka Nakajima, Hideki Kashioka, Kiyohiro Shikano, Nick Campbell:
Remodeling of the sensor for non-audible murmur (NAM). 389-392
- Amarnag Subramanya, Jeff A. Bilmes, Chia-Ping Chen:
Focused word segmentation for ASR. 393-396
Speech Perception I, II
- Jennifer A. Alexander, Patrick C. M. Wong, Ann R. Bradlow:
Lexical tone perception in musicians and non-musicians. 397-400
- Joan K.-Y. Ma, Valter Ciocca, Tara L. Whitehill:
Contextual effect on perception of lexical tones in Cantonese. 401-404
- Hansjörg Mixdorff, Yu Hu, Denis Burnham:
Visual cues in Mandarin tone perception. 405-408
- Hansjörg Mixdorff, Yu Hu:
Cross-language perception of word stress. 409-412
- Anne Cutler:
The lexical statistics of word recognition problems caused by L2 phonetic confusion. 413-416
- Chun-Fang Huang, Masato Akagi:
A multi-layer fuzzy logical model for emotional speech perception. 417-420
Spoken Language Understanding I, II
- Ian R. Lane, Tatsuya Kawahara:
Utterance verification incorporating in-domain confidence and discourse coherence measures. 421-424
- Constantinos Boulis, Mari Ostendorf:
Using symbolic prominence to help design feature subsets for topic classification and clustering of natural human-human conversations. 425-428
- Katsuhito Sudoh, Hajime Tsukada:
Tightly integrated spoken language understanding using word-to-concept translation. 429-432
- Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Vaibhava Goel, Yuqing Gao:
Exploiting unlabeled data using multiple classifiers for improved natural language call-routing. 433-436
- Hong-Kwang Jeff Kuo, Vaibhava Goel:
Active learning with minimum expected error for spoken language understanding. 437-440
- Matthias Thomae, Tibor Fábián, Robert Lieb, Günther Ruske:
Lexical out-of-vocabulary models for one-stage speech interpretation. 441-444
E-inclusion and Spoken Language Processing I, II
- Mark S. Hawley, Phil D. Green, Pam Enderby, Stuart P. Cunningham, Roger K. Moore:
Speech technology for e-inclusion of people with physical disabilities and disordered speech. 445-448
- Björn Granström:
Speech technology for language training and e-inclusion. 449-452
- Roger C. F. Tucker, Ksenia Shalonova:
Supporting the creation of TTS for local language voice information systems. 453-456
- Ove Andersen, Christian Hjulmand:
Access for all - a talking internet service. 457-460
- Knut Kvale, Narada D. Warakagoda:
A speech centric mobile multimodal service useful for dyslectics and aphasics. 461-464
Paralinguistic and Nonlinguistic Information in Speech
- Nick Campbell, Hideki Kashioka, Ryo Ohara:
No laughing matter. 465-468
- Christophe Blouin, Valérie Maffiolo:
A study on the automatic detection and characterization of emotion in a voice service context. 469-472
- Raul Fernandez, Rosalind W. Picard:
Classical and novel discriminant features for affect recognition from speech. 473-476
- Jaroslaw Cichosz, Krzysztof Slot:
Low-dimensional feature space derivation for emotion recognition. 477-480
- Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita:
Proposal of acoustic measures for automatic detection of vocal fry. 481-484
- Khiet P. Truong, David A. van Leeuwen:
Automatic detection of laughter. 485-488
- Anton Batliner, Stefan Steidl, Christian Hacker, Elmar Nöth, Heinrich Niemann:
Tales of tuning - prototyping for automatic classification of emotional user states. 489-492
- Iker Luengo, Eva Navas, Inmaculada Hernáez, Jon Sánchez:
Automatic emotion recognition using prosodic parameters. 493-496
- Sungbok Lee, Serdar Yildirim, Abe Kazemzadeh, Shrikanth S. Narayanan:
An articulatory study of emotional speech production. 497-500
- Gregor Hofer, Korin Richmond, Robert A. J. Clark:
Informed blending of databases for emotional speech synthesis. 501-504
- Fabio Tesser, Piero Cosi, Carlo Drioli, Graziano Tisato:
Emotional FESTIVAL-MBROLA TTS synthesis. 505-508
- Felix Burkhardt:
Emofilt: the simulation of emotional speech by prosody-transformation. 509-512
- Andrew Rosenberg, Julia Hirschberg:
Acoustic/prosodic and lexical correlates of charismatic speech. 513-516
- Yoko Greenberg, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka:
Communicative speech synthesis using constituent word attributes. 517-520
- Angelika Braun, Matthias Katerbow:
Emotions in dubbed speech: an intercultural approach with respect to F0. 521-524
- Nicolas Audibert, Véronique Aubergé, Albert Rilliard:
The prosodic dimensions of emotion in speech: the relative weights of parameters. 525-528
- Susanne Schötz:
Stimulus duration and type in perception of female and male speaker age. 529-532
- Cecilia Ovesdotter Alm, Richard Sproat:
Perceptions of emotions in expressive storytelling. 533-536
- Hideki Kawahara, Alain de Cheveigné, Hideki Banno, Toru Takahashi, Toshio Irino:
Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT. 537-540
- Tomoko Yonezawa, Noriko Suzuki, Kenji Mase, Kiyoshi Kogure:
Gradually changing expression of singing voice based on morphing. 541-544
Issues in Large Vocabulary Decoding
- I. Lee Hetherington:
A multi-pass, dynamic-vocabulary approach to real-time, large-vocabulary speech recognition. 545-548
- George Saon, Daniel Povey, Geoffrey Zweig:
Anatomy of an extremely fast LVCSR decoder. 549-552
- Dong Yu, Li Deng, Alex Acero:
Evaluation of a long-contextual-Span hidden trajectory model and phonetic recognizer using a* lattice search. 553-556
- Takaaki Hori, Atsushi Nakamura:
Generalized fast on-the-fly composition algorithm for WFST-based speech recognition. 557-560
- Hiroaki Nanjo, Teruhisa Misu, Tatsuya Kawahara:
Minimum Bayes-risk decoding considering word significance for information retrieval system. 561-564
- Arthur Chan, Mosur Ravishankar, Alexander I. Rudnicky:
On improvements to CI-based GMM selection. 565-568
- Dominique Massonié, Pascal Nocera, Georges Linarès:
Scalable language model look-ahead for LVCSR. 569-572
- Miroslav Novak:
Memory efficient approximative lattice generation for grammar based decoding. 573-576
- Dong-Hoon Ahn, Su-Byeong Oh, Minhwa Chung:
Improved semi-dynamic network decoding using WFSTs. 577-580
- Janne Pylkkönen:
New pruning criteria for efficient decoding. 581-584
- Tibor Fábián, Robert Lieb, Günther Ruske, Matthias Thomae:
A confidence-guided dynamic pruning approach - utilization of confidence measurement in speech recognition. 585-588
Spoken Language Extraction / Retrieval I, II
- Toru Taniguchi, Akishige Adachi, Shigeki Okawa, Masaaki Honda, Katsuhiko Shirai:
Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals. 589-592
- Gabriel Murray, Steve Renals, Jean Carletta:
Extractive summarization of meeting recordings. 593-596
- Arjan van Hessen, Jaap Hinke:
IR-based classification of customer-agent phone calls. 597-600
- Benoît Favre, Frédéric Béchet, Pascal Nocera:
Mining broadcast news data: robust information extraction from word lattices. 601-604
- Mikko Kurimo, Ville T. Turunen:
To recover from speech recognition errors in spoken document retrieval. 605-608
- Edgar González, Jordi Turmo:
Unsupervised clustering of spontaneous speech documents. 609-612
- Masahide Yamaguchi, Masaru Yamashita, Shoichi Matsunaga:
Spectral cross-correlation features for audio indexing of broadcast news and meetings. 613-616
- Chiori Hori, Alex Waibel:
Spontaneous speech consolidation for spoken language applications. 617-620
- Sameer Maskey, Julia Hirschberg:
Comparing lexical, acoustic/prosodic, structural and discourse features for speech summarization. 621-624
- Te-Hsuan Li, Ming-Han Lee, Berlin Chen, Lin-Shan Lee:
Hierarchical topic organization and visual presentation of spoken documents using probabilistic latent semantic analysis (PLSA) for efficient retrieval/browsing applications. 625-628
- Janez Zibert, France Mihelic, Jean-Pierre Martens, Hugo Meinedo, João Paulo Neto, Laura Docío Fernández, Carmen García-Mateo, Petr David, Jindrich Zdánský, Matús Pleva, Anton Cizmar, Andrej Zgank, Zdravko Kacic, Csaba Teleki, Klára Vicsi:
The COST278 broadcast news segmentation and speaker clustering evaluation - overview, methodology, systems, results. 629-632
- Igor Szöke, Petr Schwarz, Pavel Matejka, Lukás Burget, Martin Karafiát, Michal Fapso, Jan Cernocký:
Comparison of keyword spotting approaches for informal continuous speech. 633-636
- Teruhisa Misu, Tatsuya Kawahara:
Dialogue strategy to clarify user's queries for document retrieval system with speech interface. 637-640
- Nicolas Moreau, Shan Jin, Thomas Sikora:
Comparison of different phone-based spoken document retrieval methods with text and spoken queries. 641-644
Signal Analysis, Processing and Feature Estimation I-III
- Pedro Gómez, Francisco Díaz Pérez, Agustín Álvarez Marquina, Rafael Martínez, Victoria Rodellar, Roberto Fernández-Baíllo, Alberto Nieto, Francisco J. Fernandez:
PCA of perturbation parameters in voice pathology detection. 645-648
- Anindya Sarkar, T. V. Sreenivas:
Dynamic programming based segmentation approach to LSF matrix reconstruction. 649-652
- T. Nagarajan, Douglas D. O'Shaughnessy:
Explicit segmentation of speech based on frequency-domain AR modeling. 653-656
- Petr Motlícek, Lukás Burget, Jan Cernocký:
Non-parametric speaker turn segmentation of meeting data. 657-660
- Petri Korhonen, Unto K. Laine:
Unsupervised segmentation of continuous speech using vector autoregressive time-frequency modeling errors. 661-664
- P. Vijayalakshmi, M. Ramasubba Reddy:
The analysis on band-limited hypernasal speech using group delay based formant extraction technique. 665-668
- Jindrich Zdánský, Jan Nouza:
Detection of acoustic change-points in audio records via global BIC maximization and dynamic programming. 669-672
- Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu:
Multi-band approach of audio source discrimination with empirical mode decomposition. 673-676
- Minoru Tsuzaki, Satomi Tanaka, Hiroaki Kato, Yoshinori Sagisaka:
Application of auditory image model for speech event detection. 677-680
- José Anibal Arias:
Unsupervised identification of speech segments using kernel methods for clustering. 681-684
- Georgios Evangelopoulos, Petros Maragos:
Speech event detection using multiband modulation energy. 685-688
- John Kominek, Alan W. Black:
Measuring unsupervised acoustic clustering through phoneme pair merge-and-split tests. 689-692
- Fabio Valente, Christian Wellekens:
Variational Bayesian speaker change detection. 693-696
- Sarah Borys, Mark Hasegawa-Johnson:
Distinctive feature based SVM discriminant features for improvements to phone recognition on telephone band speech. 697-700
- P. Vijayalakshmi, M. Ramasubba Reddy:
Detection of hypernasality using statistical pattern classifiers. 701-704
- Luis Weruaga, Marián Képesi:
Self-organizing chirp-sensitive artificial auditory cortical model. 705-708
- Sotiris Karabetsos, Pirros Tsiakoulis, Stavroula-Evita Fotinea, Ioannis Dologlou:
On the use of a decimative spectral estimation method based on eigenanalysis and SVD for formant and bandwidth tracking of speech signals. 709-712
- Alexei V. Ivanov, Marek Parfieniuk, Alexander A. Petrovsky:
Frequency-domain auditory suppression modelling (FASM) - a WDFT-based anthropomorphic noise-robust feature extraction algorithm for speech recognition. 713-716
Keynote Papers
- Fernando C. N. Pereira:
Linear models for structure prediction. 717-720
Speech Recognition - Language Modelling I-III
- Chuang-Hua Chueh, To-Chang Chien, Jen-Tzung Chien:
Discriminative maximum entropy language model for speech recognition. 721-724
- Maximilian Bisani, Hermann Ney:
Open vocabulary speech recognition with flat hybrid models. 725-728
- Minwoo Jeong, Jihyun Eun, Sangkeun Jung, Gary Geunbae Lee:
An error-corrective language-model adaptation for automatic speech recognition. 729-732
- Shiuan-Sung Lin, François Yvon:
Discriminative training of finite state decoding graphs. 733-736
- Holger Schwenk, Jean-Luc Gauvain:
Building continuous space language models for transcribing european languages. 737-740
- Peng Xu, Lidia Mangu:
Using random forest language models in the IBM RT-04 CTS system. 741-744
Spoken Language Acquisition, Development and Learning I, II
- Willemijn Heeren:
Perceptual development of the duration cue in dutch /a-a: /. 745-748
- Hong You, Abeer Alwan, Abe Kazemzadeh, Shrikanth S. Narayanan:
Pronunciation variations of Spanish-accented English spoken by young children. 749-752
- Willemijn Heeren:
L2 development of quantity perception: dutch listeners learning Finnish /t-t: /. 753-756
- Claudio Zmarich, Serena Bonifacio:
Phonetic inventories in Italian children aged 18-27 months: a longitudinal study. 757-760
- Hiroko Hirano, Goh Kawai:
Pitch patterns of intonational phrases and intonational phrase groups in native and non-native speech. 761-764
- Rebecca Hincks:
Measuring liveliness in presentation speech. 765-768
Multi-modal / Multi-media Processing I, II
- Nick Campbell:
Non-verbal speech processing for a communicative agent. 769-772
- Stuart N. Wrigley, Guy J. Brown:
Physiologically motivated audio-visual localisation and tracking. 773-776
- Jing Huang, Daniel Povey:
Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition. 777-780
- Graziano Tisato, Piero Cosi, Carlo Drioli, Fabio Tesser:
INTERFACE: a new tool for building emotive/expressive talking heads. 781-784
- Pascual Ejarque, Javier Hernando:
Variance reduction by using separate genuine-impostor statistics in multimodal biometrics. 785-788
- Volker Schubert, Stefan W. Hamerich:
The dialog application metalanguage GDialogXML. 789-792
- Jonas Beskow, Mikael Nordenberg:
Data-driven synthesis of expressive visual speech using an MPEG-4 talking head. 793-796
- Oytun Türk, Marc Schröder, Baris Bozkurt, Levent M. Arslan:
Voice quality interpolation for emotional text-to-speech synthesis. 797-800
- Murtaza Bulut, Carlos Busso, Serdar Yildirim, Abe Kazemzadeh, Chul Min Lee, Sungbok Lee, Shrikanth S. Narayanan:
Investigating the role of phoneme-level modifications in emotional speech resynthesis. 801-804
- Björn W. Schuller, Ronald Müller, Manfred K. Lang, Gerhard Rigoll:
Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles. 805-808
- Jonghwa Kim, Elisabeth André, Matthias Rehm, Thurid Vogt, Johannes Wagner:
Integrating information from speech and physiological signals to achieve emotional sensitivity. 809-812
- Ellen Douglas-Cowie, Laurence Devillers, Jean-Claude Martin, Roddy Cowie, Suzie Savvidou, Sarkis Abrilian, Cate Cox:
Multimodal databases of everyday emotion: facing up to complexity. 813-816
Spoken / Multi-modal Dialogue Systems I, II
- Francisco Torres, Emilio Sanchis, Encarna Segarra:
Learning of stochastic dialog models through a dialog simulation technique. 817-820
- Lesley-Ann Black, Michael F. McTear, Norman D. Black, Roy Harper, Michelle Lemon:
Evaluating the DI@l-log system on a cohort of elderly, diabetic patients: results from a preliminary study. 821-824
- Pavel Král, Christophe Cerisara, Jana Klecková:
Combination of classifiers for automatic recognition of dialog acts. 825-828
- Xiaojun Wu, Thomas Fang Zheng, Michael Brasser, Zhanjiang Song:
Rapidly developing spoken Chinese dialogue systems with the d-ear SDS SDK. 829-832
- Daniela Oria, Akos Vetek:
Robust algorithms and interaction strategies for voice spelling. 833-836
- Ioannis Toptsis, Axel Haasch, Sonja Hwel, Jannik Fritsch, Gernot A. Fink:
Modality integration and dialog management for a robotic assistant. 837-840
- Norbert Reithinger, Daniel Sonntag:
An integration framework for a mobile multimodal dialogue system accessing the semantic web. 841-844
- Ryuichi Nisimura, Akinobu Lee, Masashi Yamada, Kiyohiro Shikano:
Operating a public spoken guidance system in real environment. 845-848
- Esa-Pekka Salonen, Markku Turunen, Jaakko Hakulinen, Leena Helin, Perttu Prusi, Anssi Kainulainen:
Distributed dialogue management for smart terminal devices. 849-852
- Jaakko Hakulinen, Markku Turunen, Esa-Pekka Salonen:
Visualization of spoken dialogue systems for demonstration, debugging and tutoring. 853-856
- César González Ferreras, Valentín Cardeñoso-Payo:
Development and evaluation of a spoken dialog system to access a newspaper web site. 857-860
- Olivier Pietquin, Richard Beaufort:
Comparing ASR modeling methods for spoken dialogue simulation and optimal strategy learning. 861-864
- Shiu-Wah Chu, Ian M. O'Neill, Philip Hanna, Michael F. McTear:
An approach to multi-strategy dialogue management. 865-868
- Anna Hjalmarsson:
Towards user modelling in conversational dialogue systems: a qualitative study of the dynamics of dialogue parameters. 869-872
- Kouichi Katsurada, Kazumine Aoki, Hirobumi Yamada, Tsuneo Nitta:
Reducing the description amount in authoring MMI applications. 873-876
- Kazunori Komatani, Naoyuki Kanda, Tetsuya Ogata, Hiroshi G. Okuno:
Contextual constraints based on dialogue models in database search task for spoken dialogue systems. 877-880
- Mihai Rotaru, Diane J. Litman:
Using word-level pitch features to better predict student emotions during spoken tutoring dialogues. 881-884
- Antoine Raux, Brian Langner, Dan Bohus, Alan W. Black, Maxine Eskénazi:
Let's go public! taking a spoken dialog system to the real world. 885-888
- Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi:
Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system. 889-892
- Kallirroi Georgila, James Henderson, Oliver Lemon:
Learning user simulations for information state update dialogue systems. 893-896
- Darío Martín-Iglesias, Yago Pereiro-Estevan, Ana I. García-Moral, Ascensión Gallardo-Antolín, Fernando Díaz-de-María:
Design of a voice-enabled interface for real-time access to stock exchange from a PDA through GPRS. 897-900
- William Schuler, Timothy A. Miller:
Integrating denotational meaning into a DBN language model. 901-904
- Louis ten Bosch:
Improving out-of-coverage language modelling in a multimodal dialogue system using small training sets. 905-908
- Olivier Galibert, Gabriel Illouz, Sophie Rosset:
Ritel: an open-domain, human-computer dialog system. 909-912
Ritel: an open-domain, human-computer dialog system. 909-912
Robust Speech Recognition I-IV
- Reinhold Haeb-Umbach, Joerg Schmalenstroeer:
A comparison of particle filtering variants for speech feature enhancement. 913-916 - Ilyas Potamitis, Nikolaos D. Fakotakis:
Enhancement of mel log-power spectrum of speech using particle filtering. 917-920 - Makoto Shozakai, Goshu Nagino:
Improving robustness of speech recognition performance to aggregate of noises by two-dimensional visualization. 921-924 - Woohyung Lim, Bong Kyoung Kim, Nam Soo Kim:
Feature compensation based on switching linear dynamic model and soft decision. 925-928 - Shilei Huang, Xiang Xie, Jingming Kuang:
Using output probability distribution for improving speech recognition in adverse environment. 929-932 - Eric H. C. Choi:
A generalized framework for compensation of mel-filterbank outputs in feature extraction for robust ASR. 933-936 - Hesham Tolba, Zili Li, Douglas D. O'Shaughnessy:
Robust automatic speech recognition using a perceptually-based optimal spectral amplitude estimator speech enhancement algorithm in various low-SNR environments. 937-940 - Stephen So, Kuldip K. Paliwal:
Improved noise-robustness in distributed speech recognition via perceptually-weighted vector quantisation of filterbank energies. 941-944 - Babak Nasersharif, Ahmad Akbari:
Sub-band weighted projection measure for robust sub-band speech recognition. 945-948 - Jianping Deng, Martin Bouchard, Tet Hin Yeap:
Noise compensation using interacting multiple Kalman filters. 949-952 - Veronique Stouten, Hugo Van hamme, Patrick Wambacq:
Kalman and unscented Kalman filter feature enhancement for noise robust ASR. 953-956 - Chia-Yu Wan, Lin-Shan Lee:
Histogram-based quantization (HQ) for robust and scalable distributed speech recognition. 957-960 - Yong-Joo Chung:
A data-driven approach for the model parameter compensation in noisy speech recognition. 961-964 - Satoshi Kobashikawa, Satoshi Takahashi, Yoshikazu Yamaguchi, Atsunori Ogawa:
Rapid response and robust speech recognition by preliminary model adaptation for additive and convolutional noise. 965-968 - Saurabh Prasad, Stephen A. Zahorian:
Nonlinear and linear transformations of speech features to compensate for channel and noise effects. 969-972 - Motoyuki Suzuki, Yusuke Kato, Akinori Ito, Shozo Makino:
Construction method of acoustic models dealing with various background noises based on combination of HMMs. 973-976 - Haitian Xu, Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:
Robust speech recognition based on noise and SNR classification - a multiple-model framework. 977-980 - Hwa Jeon Song, Hyung Soon Kim:
Eigen-environment based noise compensation method for robust speech recognition. 981-984 - Martin Graciarena, Horacio Franco, Gregory K. Myers, Victor Abrash:
Robust feature compensation in nonstationary and multiple noise environments. 985-988 - Jasha Droppo, Alex Acero:
Maximum mutual information SPLICE transform for seen and unseen conditions. 989-992 - Sven E. Krüger, Martin Schafföner, Marcel Katz, Edin Andelic, Andreas Wendemuth:
Speech recognition with support vector machines in a hybrid system. 993-996 - Vincent Barreaud, Douglas D. O'Shaughnessy, Jean-Guy Dahan:
Experiments on speaker profile portability. 997-1000 - Daniele Colibro, Luciano Fissore, Claudio Vair, Emanuele Dalmasso, Pietro Laface:
A confidence measure invariant to language and grammar. 1001-1004 - Ken Schutte, James R. Glass:
Robust detection of sonorant landmarks. 1005-1008
Speech Production I
- Amélie Rochet-Capellan, Jean-Luc Schwartz:
The labial-coronal effect and CVCV stability during reiterant speech production: an acoustic analysis. 1009-1012 - Amélie Rochet-Capellan, Jean-Luc Schwartz:
The labial-coronal effect and CVCV stability during reiterant speech production: an articulatory analysis. 1013-1016 - Mitsuhiro Nakamura:
Articulatory constraints and coronal stops: an EPG study. 1017-1020 - Vincent Robert, Brigitte Wrobel-Dautcourt, Yves Laprie, Anne Bonneau:
Strategies of labial coarticulation. 1021-1024 - Jianwu Dang, Jianguo Wei, Takeharu Suzuki, Pascal Perrier:
Investigation and modeling of coarticulation during speech. 1025-1028 - Fang Hu:
Tongue kinematics in diphthong production in Ningbo Chinese. 1029-1032 - Takayuki Arai:
Comparing tongue positions of vowels in oral and nasal contexts. 1033-1036 - Slim Ouni:
Can we retrieve vocal tract dynamics that produced speech? toward a speaker articulatory strategy model. 1037-1040 - Pascal Perrier, Liang Ma, Yohan Payan:
Modeling the production of VCV sequences via the inversion of a biomechanical model of the tongue. 1041-1044 - Xiaochuan Niu, Alexander Kain, Jan P. H. van Santen:
Estimation of the acoustic properties of the nasal tract during the production of nasalized vowels. 1045-1048 - Kohichi Ogata:
A web-based articulatory speech synthesis system for distance education. 1049-1052 - Paavo Alku, Matti Airas, Tom Bäckström, Hannu Pulakka:
Group delay function as a means to assess quality of glottal inverse filtering. 1053-1056 - Eva Björkner, Johan Sundberg, Paavo Alku:
Subglottal pressure and NAQ variation in voice production of classically trained baritone singers. 1057-1060 - Gunnar Fant, Anita Kruckenberg:
Covariation of subglottal pressure, F0 and intensity. 1061-1064 - Javier Pérez, Antonio Bonafonte:
Automatic voice-source parameterization of natural speech. 1065-1068 - Chakir Zeroual, John H. Esling, Lise Crevier-Buchman:
Physiological study of whispered speech in Moroccan Arabic. 1069-1072 - Carla P. Moura, D. Andrade, Luis M. Cunha, Maria J. Cunha, Helena Vilarinho, Henrique Barros, Diamantino Freitas, M. Pais-Clemente:
Voice quality in down syndrome children treated with rapid maxillary expansion. 1073-1076 - Julien Hanquinet, Francis Grenez, Jean Schoentgen:
Synthesis of disordered speech. 1077-1080 - Julie Fontecave, Frédéric Berthommier:
Quasi-automatic extraction of tongue movement from a large existing speech cineradiographic database. 1081-1084 - Shimon Sapir, Ravit Cohen Mimran:
The working memory token test (WMTT): preliminary findings in young adults with and without dyslexia. 1085-1088 - Sérgio Paulo, Luís C. Oliveira:
Reducing the corpus-based TTS signal degradation due to speaker's word pronunciations. 1089-1092 - Wai-Sum Lee:
A phonetic study of the "er-hua" rimes in Beijing Mandarin. 1093-1096
Acoustic Processing for ASR I-III
- Li Deng, Dong Yu, Alex Acero:
Learning statistically characterized resonance targets in a hidden trajectory model of speech coarticulation and reduction. 1097-1100 - Daniil Kocharov, András Zolnay, Ralf Schlüter, Hermann Ney:
Articulatory motivated acoustic features for speech recognition. 1101-1104 - Shinji Watanabe, Atsushi Nakamura:
Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition. 1105-1108 - Yu Tsao, Jinyu Li, Chin-Hui Lee:
A study on separation between acoustic models and its applications. 1109-1112 - Mohamed Afify:
Extended Baum-Welch reestimation of Gaussian mixture models based on reverse Jensen inequality. 1113-1116 - Asela Gunawardana, Milind Mahajan, Alex Acero, John C. Platt:
Hidden conditional random fields for phone classification. 1117-1120
Signal Analysis, Processing and Feature Estimation I-III
- Francesco Gianfelici, Giorgio Biagetti, Paolo Crippa, Claudio Turchetti:
Asymptotically exact AM-FM decomposition based on iterated Hilbert transform. 1121-1124 - Athanassios Katsamanis, Petros Maragos:
Advances in statistical estimation and tracking of AM-FM speech components. 1125-1128 - Jonathan Darch, Ben P. Milner, Saeed Vaseghi:
Formant frequency prediction from MFCC vectors in noisy environments. 1129-1132 - S. R. Mahadeva Prasanna, B. Yegnanarayana:
Detection of vowel onset point events using excitation information. 1133-1136 - João P. Cabral, Luís C. Oliveira:
Pitch-synchronous time-scaling for prosodic and voice quality transformations. 1137-1140 - Yasunori Ohishi, Masataka Goto, Katunobu Itou, Kazuya Takeda:
Discrimination between singing and speaking voices. 1141-1144
Spoken Language Resources and Technology Evaluation I, II
- Douglas A. Jones, Wade Shen, Elizabeth Shriberg, Andreas Stolcke, Teresa M. Kamm, Douglas A. Reynolds:
Two experiments comparing reading with listening for human processing of conversational telephone speech. 1145-1148 - Sylvain Galliano, Edouard Geoffrois, Djamel Mostefa, Khalid Choukri, Jean-François Bonastre, Guillaume Gravier:
The ESTER phase II evaluation campaign for the rich transcription of French broadcast news. 1149-1152 - Takashi Saito:
A method of multi-layered speech segmentation tailored for speech synthesis. 1153-1156 - Sérgio Paulo, Luís C. Oliveira:
Generation of word alternative pronunciations using weighted finite state transducers. 1157-1160 - Helmer Strik, Diana Binnenpoorte, Catia Cucchiarini:
Multiword expressions in spontaneous speech: do we really speak like that? 1161-1164 - Jáchym Kolár, Jan Svec, Stephanie M. Strassel, Christopher Walker, Dagmar Kozlíková, Josef Psutka:
Czech spontaneous speech corpus with structural metadata. 1165-1168
Early Language Acquisition
- Kentaro Ishizuka, Ryoko Mugitani, Hiroko Kato Solvang, Shigeaki Amano:
A longitudinal analysis of the spectral peaks of vowels for a Japanese infant. 1169-1172 - Krisztina Zajdó, Jeannette M. van der Stelt, Ton G. Wempe, Louis C. W. Pols:
Cross-linguistic comparison of two-year-old children's acoustic vowel spaces: contrasting Hungarian with Dutch. 1173-1176 - Britta Lintfert, Katrin Schneider:
Acoustic correlates of contrastive stress in German children. 1177-1180 - Giampiero Salvi:
Ecological language acquisition via incremental model-based clustering. 1181-1184 - Tamami Sudo, Ken Mogi:
Perceptual and linguistic category formation in infants. 1185-1188
Multi-modal / Multi-media Processing I, II
- Raghunandan S. Kumaran, Karthik Narayanan, John N. Gowdy:
Myoelectric signals for multimodal speech recognition. 1189-1192 - Philippe Daubias:
Is color information really useful for lip-reading? (or what is lost when color is not used). 1193-1196 - Islam Shdaifat, Rolf-Rainer Grigat:
A system for audio-visual speech recognition. 1197-1200 - Norihide Kitaoka, Hironori Oshikawa, Seiichi Nakagawa:
Multimodal interface for organization name input based on combination of isolated word recognition and continuous base-word recognition. 1201-1204 - Yosuke Matsusaka:
Recognition of (3) party conversation using prosody and gaze. 1205-1208 - Dongdong Li, Yingchun Yang, Zhaohui Wu:
Combining voiceprint and face biometrics for speaker identification using SDWS. 1209-1212 - Neil Cooke, Martin J. Russell:
Using the focus of visual attention to improve spontaneous speech recognition. 1213-1216 - Sabri Gurbuz:
Real-time outer lip contour tracking for HCI applications. 1217-1220 - Jing Huang, Karthik Visweswariah:
Improving lip-reading with feature space transforms for multi-stream audio-visual speech recognition. 1221-1224 - Hansjörg Mixdorff, Denis Burnham, Guillaume Vignali, Patavee Charnvivit:
Are there facial correlates of Thai syllabic tones? 1225-1228 - Rowan Seymour, Ji Ming, Darryl Stewart:
A new posterior based audio-visual integration method for robust speech recognition. 1229-1232
Bridging the Gap ASR-HSR
- Sorin Dusan, Lawrence R. Rabiner:
On integrating insights from human speech perception into automatic speech recognition. 1233-1236 - Odette Scharenborg:
Parallels between HSR and ASR: how ASR can contribute to HSR. 1237-1240 - Louis ten Bosch, Odette Scharenborg:
ASR decoding in a computational model of human word recognition. 1241-1244 - Viktoria Maier, Roger K. Moore:
An investigation into a simulation of episodic memory for automatic speech recognition. 1245-1248 - Eric Fosler-Lussier, C. Anton Rytting, Soundararajan Srinivasan:
Phonetic ignorance is bliss: investigating the effects of phonetic information reduction on ASR performance. 1249-1252 - Marcus Holmberg, David Gelbart, Ulrich Ramacher, Werner Hemmert:
Automatic speech recognition with neural spike trains. 1253-1256 - Michael J. Carey, Tuan P. Quang:
A speech similarity distance weighting for robust recognition. 1257-1260 - Takao Murakami, Kazutaka Maruyama, Nobuaki Minematsu, Keikichi Hirose:
Japanese vowel recognition based on structural representation of speech. 1261-1264 - Soundararajan Srinivasan, DeLiang Wang:
Modeling the perception of multitalker speech. 1265-1268 - Sue Harding, Jon P. Barker, Guy J. Brown:
Binaural feature selection for missing data speech recognition. 1269-1272 - Thorsten Wesker, Bernd T. Meyer, Kirsten Wagener, Jörn Anemüller, Alfred Mertins, Birger Kollmeier:
Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines. 1273-1276
Speech Recognition - Language Modelling I-III
- Jen-Wei Kuo, Berlin Chen:
Minimum word error based discriminative training of language models. 1277-1280 - A. Ghaoui, François Yvon, Chafic Mokbel, Gérard Chollet:
On the use of morphological constraints in n-gram statistical language model. 1281-1284 - Elvira I. Sicilia-Garcia, Ji Ming, Francis Jack Smith:
A posteriori multiple word-domain language model. 1285-1288 - Javier Dieguez-Tirado, Carmen García-Mateo, Antonio Cardenal López:
Effective topic-tree based language model adaptation. 1289-1292 - Abhinav Sethy, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Building topic specific language models from webdata using competitive models. 1293-1296 - Carlos Troncoso, Tatsuya Kawahara:
Trigger-based language model adaptation for automatic meeting transcription. 1297-1300 - Jacques Duchateau, Dong Hoon Van Uytsel, Hugo Van hamme, Patrick Wambacq:
Statistical language models for large vocabulary spontaneous speech recognition in Dutch. 1301-1304 - Alexandre Allauzen, Jean-Luc Gauvain:
Diachronic vocabulary adaptation for broadcast news transcription. 1305-1308 - Vesa Siivola, Bryan L. Pellom:
Growing an n-gram language model. 1309-1312 - Harald Hüning, Manuel Kirschner, Fritz Class, André Berton, Udo Haiber:
Embedding grammars into statistical language models. 1313-1316 - Simo Broman, Mikko Kurimo:
Methods for combining language models in speech recognition. 1317-1320 - Airenas Vaiciunas, Gailius Raskinis:
Review of statistical modeling of highly inflected Lithuanian using very large vocabulary. 1321-1324 - Genevieve Gorrell, Brandyn Webb:
Generalized Hebbian algorithm for incremental latent semantic analysis. 1325-1328 - Arnar Thor Jensson, Edward W. D. Whittaker, Koji Iwano, Sadaoki Furui:
Language model adaptation for resource deficient languages using translated data. 1329-1332 - Petra Witschel, Sergey Astrov, Gabriele Bakenecker, Josef G. Bauer, Harald Höge:
POS-based language models for large vocabulary speech recognition on embedded systems. 1333-1336
Speech Recognition - Pronunciation Modelling
- Je Hun Jeon, Minhwa Chung:
Automatic generation of domain-dependent pronunciation lexicon with data-driven rules and rule adaptation. 1337-1340 - Michael Tjalve, Mark A. Huckvale:
Pronunciation variation modelling using accent features. 1341-1344 - Khiet P. Truong, Ambra Neri, Febe de Wet, Catia Cucchiarini, Helmer Strik:
Automatic detection of frequent pronunciation errors made by L2-learners. 1345-1348 - Josef Psutka, Pavel Ircing, Josef V. Psutka, Jan Hajic, William J. Byrne, Jirí Mírovský:
Automatic transcription of Czech, Russian, and Slovak spontaneous speech in the MALACH project. 1349-1352 - Stéphane Dupont, Christophe Ris, Laurent Couvreur, Jean-Marc Boite:
A study of implicit and explicit modeling of coarticulation and pronunciation variation. 1353-1356 - Shinya Takahashi, Tsuyoshi Morimoto, Sakashi Maeda, Naoyuki Tsuruta:
Detection of coughs from user utterances using imitated phoneme model. 1357-1360 - V. Ramasubramanian, P. Srinivas, T. V. Sreenivas:
Stochastic pronunciation modeling by ergodic-HMM of acoustic sub-word units. 1361-1364 - Chen Liu, Lynette Melnar:
An automated linguistic knowledge-based cross-language transfer method for building acoustic models for a language without native training data. 1365-1368 - Ghazi Bouselmi, Dominique Fohr, Irina Illina, Jean Paul Haton:
Fully automated non-native speech recognition using confusion-based acoustic model integration. 1369-1372
Prosodic Structure
- Véronique Aubergé, Albert Rilliard:
The focus prosody: more than a simple binary function. 1373-1376 - Martha Dalton, Ailbhe Ní Chasaide:
Peak timing in two dialects of Connaught Irish. 1377-1380 - Janet Fletcher:
Compound rises and "uptalk" in spoken English. 1381-1384 - Li-chiung Yang:
Duration and the temporal structure of Mandarin discourse. 1385-1388 - Bei Wang:
Prosodic realization of split noun phrases in Mandarin Chinese compared in topic and focus contexts. 1389-1392 - Ziyu Xiong:
Downstep effect on disyllabic words of citation forms in standard Chinese. 1393-1396 - Jinfu Ni, Hisashi Kawai, Keikichi Hirose:
Estimation of intonation variation with constrained tone transformations. 1397-1400 - Ho-hsien Pan:
Voice quality of falling tones in Taiwan Min. 1401-1404 - Chiu-yu Tseng, Bau-Ling Fu:
Duration, intensity and pause predictions in relation to prosody organization. 1405-1408 - Jiahong Yuan, Jason M. Brenier, Daniel Jurafsky:
Pitch accent prediction: effects of genre and speaker. 1409-1412 - Hiroya Fujisaki, Sumio Ohno:
Analysis and modeling of fundamental frequency contours of Hindi utterances. 1413-1416 - Natasha Govender, Etienne Barnard, Marelie H. Davel:
Fundamental frequency and tone in isiZulu: initial experiments. 1417-1420 - Judith Bishop, Marc Peake, Dmitry Sityaev:
Intonational sequences in Tuscan Italian. 1421-1424 - Caterina Petrone:
Effects of raddoppiamento sintattico on tonal alignment in Italian. 1425-1428 - Tomás Dubeda, Jan Votrubec:
Acoustic analysis of Czech stress: intonation, duration and intensity revisited. 1429-1432 - Mohamed Yeou:
Variability of F0 peak alignment in Moroccan Arabic accentual focus. 1433-1436 - Anne Lacheret, Ch. Lyche, Michel Morel:
Phonological analysis of schwa and liaison within the PFC project (phonologie du français contemporain): how determinant are the prosodic factors? 1437-1440 - Plínio A. Barbosa, Pablo Arantes, Alexsandro R. Meireles, Jussara M. Vieira:
Abstractness in speech-metronome synchronisation: P-centres as cyclic attractors. 1441-1444
Applications of Confidence Related Measures to ASR
- Makoto Yamada, Tsuneo Kato, Masaki Naito, Hisashi Kawai:
Improvement of rejection performance of keyword spotting using anti-keywords derived from large vocabulary considering acoustical similarity to keywords. 1445-1448 - Ralf Schlüter, T. Scharrenbach, Volker Steinbiss, Hermann Ney:
Bayes risk minimization using metric loss functions. 1449-1452 - Akio Kobayashi, Kazuo Onoe, Shoei Sato, Toru Imai:
Word error rate minimization using an integrated confidence measure. 1453-1456 - Bin Dong, Qingwei Zhao, Yonghong Yan:
Fast confidence measure algorithm for continuous speech recognition. 1457-1460 - Hamed Ketabdar, Jithendra Vepa, Samy Bengio, Hervé Bourlard:
Developing and enhancing posterior based speech recognition systems. 1461-1464 - Peng Liu, Ye Tian, Jian-Lai Zhou, Frank K. Soong:
Background model based posterior probability for measuring confidence. 1465-1468
Multilingual TTS
- Laura Mayfield Tomokiyo, Alan W. Black, Kevin A. Lenzo:
Foreign accents in synthetic speech: development and evaluation. 1469-1472 - Raul Fernandez, Wei Zhang, Ellen Eide, Raimo Bakis, Wael Hamza, Yi Liu, Michael Picheny, John F. Pitrelli, Yong Qing, Zhiwei Shuang, Li Qin Shen:
Toward multiple-language TTS: experiments in English and Mandarin. 1473-1476 - Javier Latorre, Koji Iwano, Sadaoki Furui:
Cross-language synthesis with a polyglot synthesizer. 1477-1480 - Mucemi Gakuru, Frederick K. Iraki, Roger C. F. Tucker, Ksenia Shalonova, Kamanda Ngugi:
Development of a Kiswahili text to speech system. 1481-1484 - Jaime Botella Ordinas, Volker Fischer, Claire Waast-Richard:
Multilingual models in the IBM bilingual text-to-speech systems. 1485-1488 - Artur Janicki, Piotr Herman:
Reconstruction of Polish diacritics in a text-to-speech system. 1489-1492
Speech Bandwidth Extension
- Hiroyuki Ehara, Toshiyuki Morii, Masahiro Oshikiri, Koji Yoshida, Kouichi Honma:
Design of bandwidth scalable LSF quantization using interframe and intraframe prediction. 1493-1496 - Bernd Geiser, Peter Jax, Peter Vary:
Artificial bandwidth extension of speech supported by watermark-transmitted side information. 1497-1500 - Rongqiang Hu, Venkatesh Krishnan, David V. Anderson:
Speech bandwidth extension by improved codebook mapping towards increased phonetic classification. 1501-1504 - Dhananjay Bansal, Bhiksha Raj, Paris Smaragdis:
Bandwidth expansion of narrowband speech using non-negative matrix factorization. 1505-1508 - Michael L. Seltzer, Alex Acero, Jasha Droppo:
Robust bandwidth extension of noise-corrupted narrowband speech. 1509-1512 - João P. Cabral, Luís C. Oliveira:
Pitch-synchronous time-scaling for high-frequency excitation regeneration. 1513-1516
Spoken Language Resources and Technology Evaluation I, II
- Felix Burkhardt, Astrid Paeschke, M. Rolfes, Walter F. Sendlmeier, Benjamin Weiss:
A database of German emotional speech. 1517-1520 - Philippe Boula de Mareüil, Christophe d'Alessandro, Gérard Bailly, Frédéric Béchet, Marie-Neige Garcia, Michel Morel, Romain Prudon, Jean Véronis:
Evaluating the pronunciation of proper names by four French grapheme-to-phoneme converters. 1521-1524 - Filip Jurcícek, Jirí Zahradil, Libor Jelínek:
A human-human train timetable dialogue corpus. 1525-1528 - Gloria Branco, Luís Almeida, Rui Gomes, Nuno Beires:
A Portuguese spoken and multi-modal dialog corpora. 1529-1532 - Joyce Y. C. Chan, P. C. Ching, Tan Lee:
Development of a Cantonese-English code-mixing speech corpus. 1533-1536 - Andrej Zgank, Darinka Verdonik, Aleksandra Zögling Markus, Zdravko Kacic:
BNSI Slovenian broadcast news database - speech and text corpus. 1537-1540 - Jan Volín, Radek Skarnitzl, Petr Pollák:
Confronting HMM-based phone labelling with human evaluation of speech production. 1541-1544 - Stephanie M. Strassel, Jáchym Kolár, Zhiyi Song, Leila Barclay, Meghan Lammie Glenn:
Structural metadata annotation: moving beyond English. 1545-1548 - Delphine Charlet, Sacha Krstulovic, Frédéric Bimbot, Olivier Boëffard, Dominique Fohr, Odile Mella, Filip Korkmazsky, Djamel Mostefa, Khalid Choukri, Arnaud Vallée:
Neologos: an optimized database for the development of new speech processing algorithms. 1549-1552 - Cheng-Yuan Lin, Kuan-Ting Chen, Jyh-Shing Roger Jang:
A hybrid approach to automatic segmentation and labeling for Mandarin Chinese speech corpus. 1553-1556 - Yuang-Chin Chiang, Min-Siong Liang, Hong-Yi Lin, Ren-Yuan Lyu:
The multiple pronunciations in Taiwanese and the automatic transcription of Buddhist sutra with augmented read speech. 1557-1560 - Marelie H. Davel, Etienne Barnard:
Bootstrapping pronunciation dictionaries: practical issues. 1561-1564 - Nigel G. Ward, Anais G. Rivera, Karen Ward, David G. Novick:
Root causes of lost time and user stress in a simple dialog system. 1565-1568 - Julie A. Parisi, Douglas Brungart:
Evaluating communication effectiveness in team collaboration. 1569-1572 - David Conejero, Alan Lounds, Carmen García-Mateo, Leandro Rodríguez Liñares, Raquel Mochales, Asunción Moreno:
Bilingual aligned corpora for speech to speech translation for Spanish, English and Catalan. 1573-1576 - Hynek Boril, Petr Pollák:
Design and collection of Czech Lombard speech database. 1577-1580 - Abe Kazemzadeh, Hong You, Markus Iseli, Barbara Jones, Xiaodong Cui, Margaret Heritage, Patti Price, Elaine Andersen, Shrikanth S. Narayanan, Abeer Alwan:
TBALL data collection: the making of a young children's speech corpus. 1581-1584 - Hitomi Tohyama, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki:
Construction and utilization of bilingual speech corpus for simultaneous machine interpretation research. 1585-1588 - Rebecca A. Bates, Patrick Menning, Elizabeth Willingham, Chad Kuyper:
Meeting acts: a labeling system for group interaction in meetings. 1589-1592 - Marius-Calin Silaghi, Rachna Vargiya:
A new evaluation criteria for keyword spotting techniques and a new algorithm. 1593-1596 - Christoph Draxler, Alexander Steffen:
Phattsessionz: recording 1000 adolescent speakers in schools in Germany. 1597-1600 - Solomon Teferra Abate, Wolfgang Menzel, Bairu Tafila:
An Amharic speech corpus for large vocabulary continuous speech recognition. 1601-1604 - Hans Dolfing, David Reitter, Luís Almeida, Nuno Beires, Michael Cody, Rui Gomes, Kerry Robinson, Roman Zielinski:
The FASil speech and multimodal corpora. 1605-1608 - Karin Müller:
Revealing phonological similarities between German and Dutch. 1609-1612
Large Vocabulary Speech Recognition Systems
- Dimitra Vergyri, Katrin Kirchhoff, Venkata Ramana Rao Gadde, Andreas Stolcke, Jing Zheng:
Development of a conversational telephone speech recognizer for Levantine Arabic. 1613-1616 - Bhuvana Ramabhadran:
Exploiting large quantities of spontaneous speech for unsupervised training of acoustic models. 1617-1620 - Che-Kuang Lin, Lin-Shan Lee:
Improved spontaneous Mandarin speech recognition by disfluency interruption point (IP) detection using prosodic features. 1621-1624 - Jeff Z. Ma, Spyros Matsoukas:
Improvements to the BBN RT04 Mandarin conversational telephone speech recognition system. 1625-1628 - Sakriani Sakti, Satoshi Nakamura, Konstantin Markov:
Incorporating a Bayesian wide phonetic context model for acoustic rescoring. 1629-1632 - Abdelkhalek Messaoudi, Lori Lamel, Jean-Luc Gauvain:
Modeling vowels for Arabic BN transcription. 1633-1636 - Mohamed Afify, Long Nguyen, Bing Xiang, Sherif M. Abdou, John Makhoul:
Recent progress in Arabic broadcast news transcription at BBN. 1637-1640 - Spyros Matsoukas, Rohit Prasad, Srinivas Laxminarayan, Bing Xiang, Long Nguyen, Richard M. Schwartz:
The 2004 BBN 1xRT recognition systems for English broadcast news and conversational telephone speech. 1641-1644 - Rohit Prasad, Spyros Matsoukas, Chia-Lin Kao, Jeff Z. Ma, Dongxin Xu, Thomas Colthurst, Owen Kimball, Richard M. Schwartz, Jean-Luc Gauvain, Lori Lamel, Holger Schwenk, Gilles Adda, Fabrice Lefèvre:
The 2004 BBN/LIMSI 20xRT English conversational telephone speech recognition system. 1645-1648 - Bing Xiang, Long Nguyen, Xuefeng Guo, Dongxin Xu:
The BBN Mandarin broadcast news transcription system. 1649-1652 - Paul Deléglise, Yannick Estève, Sylvain Meignier, Téva Merlin:
The LIUM speech transcription system: a CMU Sphinx III-based system for French broadcast news. 1653-1656 - Lori Lamel, Gilles Adda, Éric Bilinski, Jean-Luc Gauvain:
Transcribing lectures and seminars. 1657-1660 - Thomas Hain, John Dines, Giulia Garau, Martin Karafiát, Darren Moore, Vincent Wan, Roeland Ordelman, Steve Renals:
Transcription of conference room meetings: an investigation. 1661-1664 - Jean-Luc Gauvain, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Véronique Gendner, Lori Lamel, Holger Schwenk:
Where are we in transcribing French broadcast news? 1665-1668 - Odette Scharenborg, Stephanie Seneff:
Two-pass strategy for handling OOVs in a large vocabulary recognition task. 1669-1672 - Long Nguyen, Bing Xiang, Mohamed Afify, Sherif M. Abdou, Spyros Matsoukas, Richard M. Schwartz, John Makhoul:
The BBN RT04 English broadcast news transcription system. 1673-1676 - Rong Zhang, Ziad Al Bawab, Arthur Chan, Ananlada Chotimongkol, David Huggins-Daines, Alexander I. Rudnicky:
Investigations on ensemble based semi-supervised acoustic model training. 1677-1680 - Jan Nouza, Jindrich Zdánský, Petr David, Petr Cerva, Jan Kolorenc, Dana Nejedlová:
Fully automated system for Czech spoken broadcast transcription with very large (300k+) lexicon. 1681-1684 - Mike Schuster, Takaaki Hori, Atsushi Nakamura:
Experiments with probabilistic principal component analysis in LVCSR. 1685-1688 - Thang Tat Vu, Dung Tien Nguyen, Chi Mai Luong, John-Paul Hosom:
Vietnamese large vocabulary continuous speech recognition. 1689-1692 - Takahiro Shinozaki, Mari Ostendorf, Les E. Atlas:
Data sampling for improved speech recognizer training. 1693-1696
Speech Perception I, II
- Do Dat Tran, Eric Castelli, Jean-François Serignat, Van Loan Trinh, Le Xuan Hung:
Influence of F0 on Vietnamese syllable perception. 1697-1700 - Barbara Schwanhäußer, Denis Burnham:
Lexical tone and pitch perception in tone and non-tone language speakers. 1701-1704 - Isabel Falé, Isabel Hub Faria:
Intonational contrasts in EP: a categorical perception approach. 1705-1708 - Bettina Braun, Andrea Weber, Matthew W. Crocker:
Does narrow focus activate alternative referents? 1709-1712 - Kiyoaki Aikawa, Hayato Hashimoto:
Audiovisual interaction on the perception of frequency glide of linear sweep tones. 1713-1716 - Kei Omata, Ken Mogi:
Audiovisual integration in dichotic listening. 1717-1720 - Gunilla Svanfeldt, Dirk Olszewski:
Perception experiment combining a parametric loudspeaker and a synthetic talking head. 1721-1724 - Catherine Mayo, Robert A. J. Clark, Simon King:
Multidimensional scaling of listener responses to synthetic speech. 1725-1728 - Hiroko Terasawa, Malcolm Slaney, Jonathan Berger:
A timbre space for speech. 1729-1732 - Abdellah Kacha, Francis Grenez, Jean Schoentgen:
Voice quality assessment by means of comparative judgments of speech tokens. 1733-1736 - Toshio Irino, Satoru Satou, Shunsuke Nomura, Hideki Banno, Hideki Kawahara:
Speech intelligibility derived from time-frequency and source smearing. 1737-1740 - Nahoko Hayashi, Takayuki Arai, Nao Hodoshima, Yusuke Miyauchi, Kiyohiro Kurisu:
Steady-state pre-processing for improving speech intelligibility in reverberant environments: evaluation in a hall with an electrical reverberator. 1741-1744 - Patrick C. M. Wong, Kiara M. Lee, Todd B. Parrish:
Neural bases of listening to speech in noise. 1745-1748 - P. Jongmans, Frans J. M. Hilgers, Louis C. W. Pols, Corina J. van As-Brooks:
The intelligibility of tracheoesophageal speech: first results. 1749-1752 - Guy J. Brown, Kalle J. Palomäki:
A computational model of the speech reception threshold for laterally separated speech and noise. 1753-1756 - Esther Janse:
Lexical inhibition effects in time-compressed speech. 1757-1760 - Caroline Jacquier, Fanny Meunier:
Perception of time-compressed rapid acoustic cues in French CV syllables. 1761-1764 - Claire-Léonie Grataloup, Michel Hoen, François Pellegrino, E. Veuillet, Lionel Collet, Fanny Meunier:
Reversed speech comprehension depends on the auditory efferent system functionality. 1765-1768 - Won Tokuma, Shinichi Tokuma:
Perceptual space of English fricatives for Japanese learners. 1769-1772 - Ioana Vasilescu, Maria Candea, Martine Adda-Decker:
Perceptual salience of language-specific acoustic differences in autonomous fillers across eight languages. 1773-1776 - Marc D. Pell:
Effects of cortical and subcortical brain damage on the processing of emotional prosody. 1777-1780
Keynote Papers
- Elizabeth Shriberg:
Spontaneous speech: how people really talk and why engineers should care. 1781-1784
Speech Recognition - Adaptation I, II
- Karthik Visweswariah, Peder A. Olsen:
Feature adaptation using projection of Gaussian posteriors. 1785-1788 - Xiao Li, Jeff A. Bilmes, Jonathan Malkin:
Maximum margin learning and adaptation of MLP classifiers. 1789-1792 - Arindam Mandal, Mari Ostendorf, Andreas Stolcke:
Leveraging speaker-dependent variation of adaptation. 1793-1796 - Roger Wend-Huu Hsiao, Brian Kan-Wing Mak:
A comparative study of two kernel eigenspace-based speaker adaptation methods on large vocabulary continuous speech recognition. 1797-1800 - Xuechuan Wang, Douglas D. O'Shaughnessy:
Environmental compensation using ASR model adaptation by a Bayesian parametric representation method. 1801-1804 - Jun Luo, Zhijian Ou, Zuoying Wang:
Discriminative speaker adaptation with eigenvoices. 1805-1808
Prosody Modelling and Speech Technology I, II
- Gina-Anne Levow:
Context in multi-lingual tone and pitch accent recognition. 1809-1812 - Fabio Tamburini:
Automatic prominence identification and prosodic typology. 1813-1816 - Tommy Ingulfsen, Tina Burrows, Sabine Buchholz:
Influence of syntax on prosodic boundary prediction. 1817-1820 - Roberto Gretter, Dino Seppi:
Using prosodic information for disambiguation purposes. 1821-1824 - Wentao Gu, Keikichi Hirose, Hiroya Fujisaki:
Analysis of the effects of word emphasis and echo question on F0 contours of Cantonese utterances. 1825-1828 - Tina Burrows, Peter Jackson, Katherine M. Knill, Dmitry Sityaev:
Combining models of prosodic phrasing and pausing. 1829-1832
Detecting and Synthesizing Speaker State
- Julia Hirschberg, Stefan Benus, Jason M. Brenier, Frank Enos, Sarah Friedman, Sarah Gilman, Cynthia Girand, Martin Graciarena, Andreas Kathol, Laura A. Michaelis, Bryan L. Pellom, Elizabeth Shriberg, Andreas Stolcke:
Distinguishing deceptive from non-deceptive speech. 1833-1836 - Jackson Liscombe, Julia Hirschberg, Jennifer J. Venditti:
Detecting certainness in spoken tutorial dialogues. 1837-1840 - Laurence Vidrascu, Laurence Devillers:
Detection of real-life emotions in call centers. 1841-1844 - Jackson Liscombe, Giuseppe Riccardi, Dilek Hakkani-Tür:
Using context to improve emotion detection in spoken dialog systems. 1845-1848 - Irena Yanushevskaya, Christer Gobl, Ailbhe Ní Chasaide:
Voice quality and f0 cues for affect expression: implications for synthesis. 1849-1852 - Toru Takahashi, Takeshi Fujii, Masashi Nishi, Hideki Banno, Toshio Irino, Hideki Kawahara:
Voice and emotional expression transformation based on statistics of vowel parameters in an emotional speech database. 1853-1856
Rapid Development of Spoken Dialogue Systems
- Giuseppe Di Fabbrizio, Gökhan Tür, Dilek Hakkani-Tür:
Automated wizard-of-oz for spoken dialogue systems. 1857-1860 - Kouichi Katsurada, Kunitoshi Sato, Hiroaki Adachi, Hirobumi Yamada, Tsuneo Nitta:
A rapid prototyping tool for constructing web-based MMI applications. 1861-1864 - Philip Hanna, Ian M. O'Neill, Xingkun Liu, Michael F. McTear:
Developing extensible and reusable spoken dialogue components: an examination of the Queen's communicator. 1865-1868 - Ye-Yi Wang, Alex Acero:
SGStudio: rapid semantic grammar development for spoken language understanding. 1869-1872 - Murat Akbacak, Yuqing Gao, Liang Gu, Hong-Kwang Jeff Kuo:
Rapid transition to new spoken dialogue domains: language model training using knowledge from previous domain applications and web text resources. 1873-1876 - Manny Rayner, Pierrette Bouillon, Nikos Chatzichrisafis, Beth Ann Hockey, Marianne Santaholma, Marianne Starlander, Hitoshi Isahara, Kyoko Kanzaki, Yukie Nakao:
A methodology for comparing grammar-based and robust approaches to speech understanding. 1877-1880
Text-to-Speech I, II
- François Mairesse, Marilyn A. Walker:
Learning to personalize spoken generation for dialogue systems. 1881-1884 - S. Revelin, Didier Cadic, Claire Waast-Richard:
Optimization of text-to-speech phonetic transcriptions using a-posteriori signal comparison. 1885-1888 - Özgül Salor, Mübeccel Demirekler:
Voice transformation using principal component analysis based LSF quantization and dynamic programming approach. 1889-1892 - Hai Ping Li, Wei Zhang:
Adapt Mandarin TTS system to Chinese dialect TTS systems. 1893-1896 - Min Zheng, Qin Shi, Wei Zhang, Lianhong Cai:
Grapheme-to-phoneme conversion based on TBL algorithm in Mandarin TTS system. 1897-1900 - Paolo Massimino, Alberto Pacchiotti:
An automaton-based machine learning technique for automatic phonetic transcription. 1901-1904 - Tasanawan Soonklang, Robert I. Damper, Yannick Marchand:
Comparative objective and subjective evaluation of three data-driven techniques for proper name pronunciation. 1905-1908 - Olov Engwall:
Articulatory synthesis using corpus-based estimation of line spectrum pairs. 1909-1912 - Aoju Chen, Els den Os:
Effects of pitch accent type on interpreting information status in synthetic speech. 1913-1916 - Perttu Prusi, Anssi Kainulainen, Jaakko Hakulinen, Markku Turunen, Esa-Pekka Salonen, Leena Helin:
Towards generic spatial object model and route guidance grammar for speech-based systems. 1917-1920 - Chi-Chun Hsia, Chung-Hsien Wu, Te-Hsien Liu:
Duration-embedded bi-HMM for expressive voice conversion. 1921-1924 - Toshio Hirai, Hisashi Kawai, Minoru Tsuzaki, Nobuyuki Nishizawa:
Analysis of major factors of naturalness degradation in concatenative synthesis. 1925-1928 - Jilei Tian, Jani Nurminen, Imre Kiss:
Duration modeling and memory optimization in a Mandarin TTS system. 1929-1932 - Min-Siong Liang, Ke-Chun Chuang, Rhuei-Cheng Yang, Yuang-Chin Chiang, Ren-Yuan Lyu:
A bi-lingual Mandarin-to-Taiwanese text-to-speech system. 1933-1936 - Uwe D. Reichel, Florian Schiel:
Using morphology and phoneme history to improve grapheme-to-phoneme conversion. 1937-1940 - Olga Goubanova, Simon King:
Predicting consonant duration with Bayesian belief networks. 1941-1944 - Per-Anders Jande:
Inducing decision tree pronunciation variation models from annotated speech data. 1945-1948 - Lijuan Wang, Yong Zhao, Min Chu, Frank K. Soong, Zhigang Cao:
Phonetic transcription verification with generalized posterior probability. 1949-1952 - Hua Cheng, Fuliang Weng, Niti Hantaweepant, Lawrence Cavedon, Stanley Peters:
Training a maximum entropy model for surface realization. 1953-1956 - Tomoki Toda, Kiyohiro Shikano:
NAM-to-speech conversion with Gaussian mixture models. 1957-1960 - Michelina Savino, Mario Refice, Massimo Mitaritonna:
Which Italian do current systems speak? a first step towards pronunciation modelling of Italian varieties. 1961-1964 - Dominika Oliver, Robert A. J. Clark:
Modelling pitch accent types for Polish speech synthesis. 1965-1968 - Chatchawarn Hansakunbuntheung, Ausdang Thangthai, Chai Wutiwiwatchai, Rungkarn Siricharoenchai:
Learning methods and features for corpus-based phrase break prediction on Thai. 1969-1972 - Paul Taylor:
Hidden Markov models for grapheme to phoneme conversion. 1973-1976
Speaker Characterization and Recognition I-IV
- Longbiao Wang, Norihide Kitaoka, Seiichi Nakagawa:
Robust distant speaker recognition based on position dependent cepstral mean normalization. 1977-1980 - David A. van Leeuwen:
Speaker adaptation in the NIST speaker recognition evaluation 2004. 1981-1984 - Jacob Goldberger, Hagai Aronowitz:
A distance measure between GMMs based on the unscented transform and its application to speaker recognition. 1985-1988 - Sorin Dusan:
Estimation of speaker's height and vocal tract length from speech signal. 1989-1992 - Doroteo Torre Toledano, Carlos Fombella, Joaquin Gonzalez-Rodriguez, Luis A. Hernández Gómez:
On the relationship between phonetic modeling precision and phonetic speaker recognition accuracy. 1993-1996 - J. Fortuna, P. Sivakumaran, Aladdin M. Ariyaeeinia, Amit S. Malegaonkar:
Open-set speaker identification using adapted Gaussian mixture models. 1997-2000 - James McAuley, Ji Ming, Pat Corr:
Speaker verification in noisy conditions using correlated subband features. 2001-2004 - Mikaël Collet, Yassine Mami, Delphine Charlet, Frédéric Bimbot:
Probabilistic anchor models approach for speaker verification. 2005-2008 - Mijail Arcienega, Anil Alexander, Philipp Zimmermann, Andrzej Drygajlo:
A Bayesian network approach combining pitch and spectral envelope features to reduce channel mismatch in speaker verification and forensic speaker recognition. 2009-2012 - Kwok-Kwong Yiu, Man-Wai Mak, Sun-Yuan Kung:
Channel robust speaker verification via Bayesian blind stochastic feature transformation. 2013-2016 - Tomoko Matsui, Kunio Tanabe:
dPLRM-based speaker identification with log power spectrum. 2017-2020 - Xianxian Zhang, John H. L. Hansen, Pongtep Angkititrakul, Kazuya Takeda:
Speaker verification using Gaussian mixture models within changing real car environments. 2021-2024 - Kanae Amino, Tsutomu Sugawara, Takayuki Arai:
The correspondences between the perception of the speaker individualities contained in speech sounds and their acoustic properties. 2025-2028 - Samuel Kim, Sung-Wan Yoon, Thomas Eriksson, Hong-Goo Kang, Dae Hee Youn:
A noise-robust pitch synchronous feature extraction algorithm for speaker recognition systems. 2029-2032 - Jing Deng, Thomas Fang Zheng, Zhanjiang Song, Jian Liu:
Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition. 2033-2036 - Xianxian Zhang, John H. L. Hansen:
In-set/out-of-set speaker identification based on discriminative speech frame selection. 2037-2040 - Zhenchun Lei, Yingchun Yang, Zhaohui Wu:
Mixture of support vector machines for text-independent speaker recognition. 2041-2044 - Shilei Zhang, Junmei Bai, Shuwu Zhang, Bo Xu:
Optimal model order selection based on regression tree in speaker identification. 2045-2048 - Marcos Faúndez-Zanuy, Jordi Solé-Casals:
Speaker verification improvement using blind inversion of distortions. 2049-2052
Single-channel Speech Enhancement
- Israel Cohen:
Supergaussian GARCH models for speech signals. 2053-2056 - Athanasios Mouchtaris, Jan Van der Spiegel, Paul Mueller, Panagiotis Tsakalides:
A spectral conversion approach to feature denoising and speech enhancement. 2057-2060 - Alfonso Ortega, Eduardo Lleida, Enrique Masgrau, Luis Buera, Antonio Miguel:
Acoustic feedback cancellation in speech reinforcement systems for vehicles. 2061-2064 - Julien Bourgeois, Jürgen Freudenberger, Guillaume Lathoud:
Implicit control of noise canceller for speech enhancement. 2065-2068 - T. M. Sunil Kumar, T. V. Sreenivas:
Speech enhancement using Markov model of speech segments. 2069-2072 - Vladimir Braquet, Takao Kobayashi:
A wavelet based noise reduction algorithm for speech signal corrupted by coloured noise. 2073-2076 - Esfandiar Zavarehei, Saeed Vaseghi:
Speech enhancement in temporal DFT trajectories using Kalman filters. 2077-2080 - Qin Yan, Saeed Vaseghi, Esfandiar Zavarehei, Ben P. Milner:
Formant-tracking linear prediction models for speech processing in noisy environments. 2081-2084 - Hui Jiang, Qian-Jie Fu:
Statistical noise compensation for cochlear implant processing. 2085-2088 - Tuan Van Pham, Gernot Kubin:
WPD-based noise suppression using nonlinearly weighted threshold quantile estimation and optimal wavelet shrinking. 2089-2092 - Weifeng Li, Katunobu Itou, Kazuya Takeda, Fumitada Itakura:
Subjective and objective quality assessment of regression-enhanced speech in real car environments. 2093-2096 - Masashi Unoki, Masaaki Kubo, Atsushi Haniu, Masato Akagi:
A model for selective segregation of a target instrument sound from the mixed sound of various instruments. 2097-2100 - Richard C. Hendriks, Richard Heusdens, Jesper Jensen:
Improved decision directed approach for speech enhancement using an adaptive time segmentation. 2101-2104 - Heinrich W. Löllmann, Peter Vary:
Generalized filter-bank equalizer for noise reduction with reduced signal delay. 2105-2108 - Nicoleta Roman, DeLiang Wang:
A pitch-based model for separation of reverberant speech. 2109-2112 - David Yuheng Zhao, W. Bastiaan Kleijn:
On noise gain estimation for HMM-based speech enhancement. 2113-2116 - Om Deshmukh, Carol Y. Espy-Wilson:
Speech enhancement using auditory phase opponency model. 2117-2120
Acoustic Modelling for LVCSR
- Brian Mak, Jeff Siu-Kei Au-Yeung, Yiu-Pong Lai, Man-Hung Siu:
High-density discrete HMM with the use of scalar quantization indexing. 2121-2124 - Jing Zheng, Andreas Stolcke:
Improved discriminative training using phone lattices. 2125-2128 - Qifeng Zhu, Barry Y. Chen, Frantisek Grézl, Nelson Morgan:
Improved MLP structures for data-driven feature extraction for ASR. 2129-2132 - Wolfgang Macherey, Lars Haferkamp, Ralf Schlüter, Hermann Ney:
Investigations on error minimizing training criteria for discriminative training in automatic speech recognition. 2133-2136 - Khe Chai Sim, Mark J. F. Gales:
Temporally varying model parameters for large vocabulary continuous speech recognition. 2137-2140 - Qifeng Zhu, Andreas Stolcke, Barry Y. Chen, Nelson Morgan:
Using MLP features in SRI's conversational speech recognition system. 2141-2144
Speech Production I
- Matti Airas, Hannu Pulakka, Tom Bäckström, Paavo Alku:
A toolkit for voice inverse filtering and parametrisation. 2145-2148 - Denisse Sciamarella, Christophe d'Alessandro:
Stylization of glottal-flow spectra produced by a mechanical vocal-fold model. 2149-2152 - Hideyuki Nomura, Tetsuo Funada:
Numerical glottal sound source model as coupled problem between vocal cord vibration and glottal flow. 2153-2156 - Marianne Pouplier, Maureen Stone:
A tagged-cine MRI investigation of German vowels. 2157-2160 - Antoine Serrurier, Pierre Badin:
A three-dimensional linear articulatory model of velum based on MRI data. 2161-2164 - Anne Cros, Didier Demolin, Ana Georgina Flesia, Antonio Galves:
On the relationship between intra-oral pressure and speech sonority. 2165-2168
Speaker Characterization and Recognition I-IV
- Mohamed Kamal Omar, Jirí Navrátil, Ganesh N. Ramaswamy:
Maximum conditional mutual information modeling for speaker verification. 2169-2172 - Luciana Ferrer, M. Kemal Sönmez, Sachin S. Kajarekar:
Class-dependent score combination for speaker recognition. 2173-2176 - Hagai Aronowitz, Dror Irony, David Burshtein:
Modeling intra-speaker variability for speaker recognition. 2177-2180 - Girija Chetty, Michael Wagner:
Liveness detection using cross-modal correlations in face-voice person authentication. 2181-2184 - Taichi Asami, Koji Iwano, Sadaoki Furui:
Stream-weight optimization by LDA and adaboost for multi-stream speaker verification. 2185-2188 - Yosef A. Solewicz, Moshe Koppel:
Considering speech quality in speaker verification fusion. 2189-2192
Gender and Age Issues in Speech and Language Research I, II
- Matteo Gerosa, Diego Giuliani, Fabio Brugnara:
Speaker adaptive acoustic modeling with mixture of adult and children's speech. 2193-2196 - Shona D'Arcy, Martin J. Russell:
A comparison of human and computer recognition accuracy for children's speech. 2197-2200 - Piero Cosi, Bryan L. Pellom:
Italian children's speech recognition for advanced interactive literacy tutors. 2201-2204 - Martine Adda-Decker, Lori Lamel:
Do speech recognizers prefer female speakers? 2205-2208 - Serdar Yildirim, Chul Min Lee, Sungbok Lee, Alexandros Potamianos, Shrikanth S. Narayanan:
Detecting politeness and frustration state of a child in a conversational computer game. 2209-2212 - Diana Binnenpoorte, Christophe Van Bael, Els den Os, Lou Boves:
Gender in everyday speech and language: a corpus-based study. 2213-2216
Spoken Language Acquisition, Development and Learning I, II
- Shigeaki Amano:
Developmental change of phoneme duration in a Japanese infant and mother. 2217-2220 - Haiping Jia, Hiroki Mori, Hideki Kasuya:
Mora timing organization in producing contrastive geminate/single consonants and long/short vowels by native and non-native speakers of Japanese: effects of speaking rate. 2221-2224 - Hongyan Wang, Vincent J. van Heuven:
Mutual intelligibility of American, Chinese and Dutch-accented speakers of English. 2225-2228 - Peter Juel Henrichsen:
Deriving a bi-lingual dictionary from raw transcription data. 2229-2232 - Kei Ohta, Seiichi Nakagawa:
A statistical method of evaluating pronunciation proficiency for Japanese words. 2233-2236
Language and Dialect Identification I, II
- Pavel Matejka, Petr Schwarz, Jan Cernocký, Pavel Chytil:
Phonotactic language identification using high quality phoneme recognition. 2237-2240 - Rongqing Huang, John H. L. Hansen:
Advances in word based dialect/accent classification. 2241-2244 - Rym Hamdi, Salem Ghazali, Melissa Barkat-Defradas:
Syllable structure in spoken Arabic: a comparative investigation. 2245-2248 - J. C. Marcadet, Volker Fischer, Claire Waast-Richard:
A transformation-based learning approach to language identification for mixed-lingual text-to-speech synthesis. 2249-2252 - Shuichi Itahashi, Shiwei Zhu, Mikio Yamamoto:
Constructing family trees of multilingual speech using Gaussian mixture models. 2253-2256 - Jean-Luc Rouas:
Modeling long and short-term prosody for language identification. 2257-2260
Spoken Language Translation I, II
- Matthias Paulik, Christian Fügen, Sebastian Stüker, Tanja Schultz, Thomas Schaaf, Alex Waibel:
Document driven machine translation enhanced ASR. 2261-2264 - Shahram Khadivi, András Zolnay, Hermann Ney:
Automatic text dictation in computer-assisted translation. 2265-2268 - Luis Rodríguez, Jorge Civera, Enrique Vidal, Francisco Casacuberta, César Ernesto Martínez:
On the use of speech recognition in computer assisted translation. 2269-2272 - Andreas Kathol, Kristin Precoda, Dimitra Vergyri, Wen Wang, Susanne Z. Riehemann:
Speech translation for low-resource languages: the case of Pashto. 2273-2276 - David Picó, Jorge González, Francisco Casacuberta, Diamantino Caseiro, Isabel Trancoso:
Finite-state transducer inference for a speech-input Portuguese-to-English machine translation system. 2277-2280 - Kenko Ohta, Keiji Yasuda, Gen-ichiro Kikui, Masuzo Yanagida:
Quantitative evaluation of effects of speech recognition errors on speech translation quality. 2281-2284
Multi-channel Speech Enhancement
- Thomas Lotter, Bastian Sauert, Peter Vary:
A stereo input-output superdirective beamformer for dual channel noise reduction. 2285-2288 - Ulrich Klee, Tobias Gehrig, John W. McDonough:
Kalman filters for time delay of arrival-based source localization. 2289-2292 - Osamu Ichikawa, Masafumi Nishimura:
Simultaneous adaptation of echo cancellation and spectral subtraction for in-car speech recognition. 2293-2296 - Rong Hu, Yunxin Zhao:
Variable step size adaptive decorrelation filtering for competing speech separation. 2297-2300 - Daisuke Saitoh, Atsunobu Kaminuma, Hiroshi Saruwatari, Tsuyoki Nishikawa, Akinobu Lee:
Speech extraction in a car interior using frequency-domain ICA with rapid filter adaptations. 2301-2304 - Rongqiang Hu, Sunil D. Kamath, David V. Anderson:
Speech enhancement using non-acoustic sensors. 2305-2308 - Marc Delcroix, Takafumi Hikichi, Masato Miyoshi:
Improved blind dereverberation performance by using spatial information. 2309-2312 - Junfeng Li, Masato Akagi:
A hybrid microphone array post-filter in a diffuse noise field. 2313-2316 - Venkatesh Krishnan, Phil Spencer Whitehead, David V. Anderson, Mark A. Clements:
A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems. 2317-2320 - Yuki Denda, Takanobu Nishiura, Yoichi Yamashita:
A study of weighted CSP analysis with average speech spectrum for noise robust talker localization. 2321-2324 - Young-Ik Kim, Sung Jun An, Rhee Man Kil, Hyung-Min Park:
Sound segregation based on binaural zero-crossings. 2325-2328 - Jürgen Freudenberger, Klaus Linhard:
A two-microphone diversity system and its application for hands-free car kits. 2329-2332 - Takahiro Murakami, Kiyoshi Kurihara, Yoshihisa Ishida:
Directionally constrained minimization of power algorithm for speech signals. 2333-2336 - Alessio Brutti, Maurizio Omologo, Piergiorgio Svaizer:
Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays. 2337-2340 - Nilesh Madhu, Rainer Martin:
Robust speaker localization through adaptive weighted pair TDOA (AWEPAT) estimation. 2341-2344 - Guillaume Lathoud, Mathew Magimai-Doss, Bertrand Mesot:
A spectrogram model for enhanced source localization and noise-robust ASR. 2345-2348 - Sriram Srinivasan, Mattias Nilsson, W. Bastiaan Kleijn:
Denoising through source separation and minimum tracking. 2349-2352 - Louisa Busca Grisoni, John H. L. Hansen:
Collaborative voice activity detection for hearing aids. 2353-2356 - Enrique Robledo-Arnuncio, Biing-Hwang Juang:
Using inter-frequency decorrelation to reduce the permutation inconsistency problem in blind source separation. 2357-2360 - Amarnag Subramanya, Zhengyou Zhang, Zicheng Liu, Jasha Droppo, Alex Acero:
A graphical model for multi-sensory speech processing in air-and-bone conductive microphones. 2361-2364
Prosody in Language Performance I, II
- Heejin Kim, Jennifer Cole:
The stress foot as a unit of planned timing: evidence from shortening in the prosodic phrase. 2365-2368 - Pauline Welby, Hélène Loevenbruck:
Segmental "anchorage" and the French late rise. 2369-2372 - Ivan Chow:
Prosodic cues for syntactically-motivated junctures. 2373-2376 - Isabel Falé, Isabel Hub Faria:
A glimpse of the time-course of intonation processing in European Portuguese. 2377-2380 - Petra Wagner:
Great expectations - introspective vs. perceptual prominence ratings and their acoustic correlates. 2381-2384 - Christian Jensen, John Tøndering:
Choosing a scale for measuring perceived prominence. 2385-2388 - Jens Edlund, David House, Gabriel Skantze:
The effects of prosodic features on the interpretation of clarification ellipses. 2389-2392 - Matthias Jilka:
Exploration of different types of intonational deviations in foreign-accented and synthesized speech. 2393-2396 - Jörg Bröggelwirth:
A rhythmic-prosodic model of poetic speech. 2397-2400 - Sonja Biersack, Vera Kempe, Lorna Knapton:
Fine-tuning speech registers: a comparison of the prosodic features of child-directed and foreigner-directed speech. 2401-2404 - Timothy Arbisi-Kelm:
An analysis of the intonational structure of stuttered speech. 2405-2408 - Britta Lintfert, Wolfgang Wokurek:
Voice quality dimensions of pitch accents. 2409-2412 - Marion Dohen, Hélène Loevenbruck:
Audiovisual production and perception of contrastive focus in French: a multispeaker study. 2413-2416 - Pashiera Barkhuysen, Emiel Krahmer, Marc Swerts:
Predicting end of utterance in multimodal and unimodal conditions. 2417-2420 - Saori Tanaka, Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Production of prominence in Japanese sign language. 2421-2424
Speaker Characterization and Recognition I-IV
- Andreas Stolcke, Luciana Ferrer, Sachin S. Kajarekar, Elizabeth Shriberg, Anand Venkataraman:
MLLR transforms as features in speaker recognition. 2425-2428 - Brendan Baker, Robbie Vogt, Sridha Sridharan:
Gaussian mixture modelling of broad phonetic and syllabic events for text-independent speaker verification. 2429-2432 - Hagai Aronowitz, David Burshtein:
Efficient speaker identification and retrieval. 2433-2436 - Rohit Sinha, S. E. Tranter, Mark J. F. Gales, Philip C. Woodland:
The Cambridge University March 2005 speaker diarisation system. 2437-2440 - Xuan Zhu, Claude Barras, Sylvain Meignier, Jean-Luc Gauvain:
Combining speaker identification and BIC for speaker diarization. 2441-2444 - Dan Istrate, Nicolas Scheffer, Corinne Fredouille, Jean-François Bonastre:
Broadcast news speaker tracking for ESTER 2005 campaign. 2445-2448
Phonetics and Phonology I, II
- Sorin Dusan:
On the nature of acoustic information in identification of coarticulated vowels. 2449-2452 - Cédric Gendrot, Martine Adda-Decker:
Impact of duration on F1/F2 formant values of oral vowels: an automatic analysis of large broadcast news corpora in French and German. 2453-2456 - Hugo Quené:
Modeling of between-speaker and within-speaker variation in spontaneous speech tempo. 2457-2460 - Masahiko Komatsu, Makiko Aoyagi:
Vowel devoicing vs. mora-timed rhythm in spontaneous Japanese - inspection of phonetic labels of OGI_TS. 2461-2464 - Jalal-Eddin Al-Tamimi, Emmanuel Ferragne:
Does vowel space size depend on language vowel inventories? evidence from two Arabic dialects and French. 2465-2468 - Chilin Shih:
Understanding phonology by phonetic implementation. 2469-2472
Spoken / Multi-modal Dialogue Systems I, II
- Niels Ole Bernsen, Laila Dybkjær:
User evaluation of conversational agent H. C. Andersen. 2473-2476 - Silke Goronzy, Nicole Beringer:
Integrated development and on-the-fly simulation of multimodal dialogs. 2477-2480 - Mihai Rotaru, Diane J. Litman, Katherine Forbes-Riley:
Interactions between speech recognition problems and user emotions. 2481-2484 - Junlan Feng, Srihari Reddy, Murat Saraclar:
Webtalk: mining websites for interactively answering questions. 2485-2488 - Sebastian Möller:
Towards generic quality prediction models for spoken dialogue systems - a case study. 2489-2492 - S. Parthasarathy, Cyril Allauzen, R. Munkong:
Robust access to large structured data using voice form-filling. 2493-2496
Human factors, User Experience and Natural Language Application Design
- Esther Levin, Alex Levin:
Spoken dialog system for real-time data capture. 2497-2500 - Michael Pucher, Peter Fröhlich:
A user study on the influence of mobile device class, synthesis method, data rate and lexicon on speech synthesis quality. 2501-2504 - Fang Chen, Yael Katzenellenbogen:
User's experience of a commercial speech dialogue system. 2505-2508 - Esther Levin, Amir M. Mané:
Voice user interface design for automated directory assistance. 2509-2512 - Maria Gabriela Alvarez-Ryan, Narendra K. Gupta, Barbara Hollister, Tirso Alonso:
Optimizing user experience through design of the spoken language understanding (SLU) module. 2513-2516 - Jeremy H. Wright, David A. Kapilow, Alicia Abella:
Interactive visualization of human-machine dialogs. 2517-2520
TTS Inventory
- Matthew P. Aylett:
Synthesising hyperarticulation in unit selection TTS. 2521-2524 - Daniel Tihelka:
Symbolic prosody driven unit selection for highly natural synthetic speech. 2525-2528 - Jindrich Matousek, Zdenek Hanzlícek, Daniel Tihelka:
Hybrid syllable/triphone speech synthesis. 2529-2532 - Francisco Campillo Díaz, José Luis Alba, Eduardo Rodríguez Banga:
A neural network approach for the design of the target cost function in unit-selection speech synthesis. 2533-2536 - Christian Weiss:
FSM and k-nearest-neighbor for corpus based video-realistic audio-visual synthesis. 2537-2540 - Gui-Lin Chen, Ke-Song Han, Zhen-Li Yu, Dong-Jian Yue, Yi-Qing Zu:
An embedded and concatenative approach to TTS of multiple languages. 2541-2544 - Tony Ezzat, Ethan Meyers, James R. Glass, Tomaso A. Poggio:
Morphing spectral envelopes using audio flow. 2545-2548 - Vincent Colotte, Richard Beaufort:
Linguistic features weighting for a text-to-speech system without prosody model. 2549-2552 - Ingunn Amdal, Torbjørn Svendsen:
Unit selection synthesis database development using utterance verification. 2553-2556 - Yong Zhao, Lijuan Wang, Min Chu, Frank K. Soong, Zhigang Cao:
Refining phoneme segmentations using speaker-adaptive context dependent boundary models. 2557-2560 - Yining Chen, Yong Zhao, Min Chu:
Customizing base unit set with speech database in TTS systems. 2561-2564 - Soufiane Rouibia, Olivier Rosec:
Unit selection for speech synthesis based on a new acoustic target cost. 2565-2568 - Dan Chazan, Ron Hoory, Zvi Kons, Ariel Sagi, Slava Shechtman, Alexander Sorin:
Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling. 2569-2572 - Francesc Alías, Ignasi Iriondo Sanz, Lluís Formiga, Xavier Gonzalvo, Carlos Monzo, Xavier Sevillano:
High quality Spanish restricted-domain TTS oriented to a weather forecast application. 2573-2576 - Ingmund Bjørkan, Torbjørn Svendsen, Snorre Farner:
Comparing spectral distance measures for join cost optimization in concatenative speech synthesis. 2577-2580 - Maria João Barros, Ranniery Maia, Keiichi Tokuda, Fernando Gil Resende, Diamantino Freitas:
HMM-based European Portuguese TTS system. 2581-2584 - Wael Hamza, John F. Pitrelli:
Combining the flexibility of speech synthesis with the naturalness of pre-recorded audio: a comparison of two approaches to phrase-splicing TTS. 2585-2588 - Guntram Strecha, Oliver Jokisch, Matthias Eichner, Rüdiger Hoffmann:
Codec integrated voice conversion for embedded speech synthesis. 2589-2592 - David Sündermann, Guntram Strecha, Antonio Bonafonte, Harald Höge, Hermann Ney:
Evaluation of VTLN-based voice conversion for embedded speech synthesis. 2593-2596 - Juri Isogai, Junichi Yamagishi, Takao Kobayashi:
Model adaptation and adaptive training using ESAT algorithm for HMM-based speech synthesis. 2597-2600 - Tien Ying Fung, Yuk-Chi Li, Eddie Sio, Icarus Lee, Helen M. Meng, P. C. Ching:
Embedded Cantonese TTS for multi-device access to web content. 2601-2604 - Karl Schnell, Arild Lacroix:
Model based analysis of a diphone database for improved unit concatenation. 2605-2608
Robust Speech Recognition I-IV
- Ning Ma, Phil D. Green:
Context-dependent word duration modelling for robust speech recognition. 2609-2612 - Julien Epps, Eric H. C. Choi:
An energy search approach to variable frame rate front-end processing for robust ASR. 2613-2616 - Roberto Gemello, Franco Mana, Renato de Mori:
Non-linear estimation of voice activity to improve automatic recognition of noisy speech. 2617-2620 - Yusuke Kida, Tatsuya Kawahara:
Voice activity detection based on optimally weighted combination of multiple features. 2621-2624 - Pei Ding:
Soft decision strategy and adaptive compensation for robust speech recognition against impulsive noise. 2625-2628 - Nicolás Morales, Doroteo T. Toledano, John H. L. Hansen, José Colás, Javier Garrido Salas:
Statistical class-based MFCC enhancement of filtered and band-limited speech for robust ASR. 2629-2632 - Hemant Misra, Hervé Bourlard:
Spectral entropy feature in full-combination multi-stream for robust ASR. 2633-2636 - Wooil Kim, Richard M. Stern, Hanseok Ko:
Environment-independent mask estimation for missing-feature reconstruction. 2637-2640 - André Coy, Jon Barker:
Soft harmonic masks for recognising speech in the presence of a competing speaker. 2641-2644 - Lech Szymanski, Martin Bouchard:
Comb filter decomposition for robust ASR. 2645-2648 - Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari, Kiyohiro Shikano:
Investigating the role of the Lombard reflex in non-audible murmur (NAM) recognition. 2649-2652 - Evan Ruzanski, John H. L. Hansen, Don Finan, James Meyerhoff, William Norris, Terry Wollert:
Improved "TEO" feature-based automatic stress detection using physiological and acoustic speech sensors. 2653-2656 - Takeshi S. Kobayakawa:
Spectral subtraction using elliptic integral for multiplication factor. 2657-2660 - Longbiao Wang, Norihide Kitaoka, Seiichi Nakagawa:
Robust distant speech recognition based on position dependent CMN using a novel multiple microphone processing technique. 2661-2664 - H. Tanaka, Hiroshi Fujimura, Chiyomi Miyajima, Takanori Nishino, Katunobu Itou, Kazuya Takeda:
Data collection and evaluation of speech recognition for motorbike riders. 2665-2668 - Agustín Álvarez Marquina, Pedro Gómez, Victor Nieto Lluis, Rafael Martínez, Victoria Rodellar:
Application of a first-order differential microphone for efficient voice activity detection in a car platform. 2669-2672 - Panji Setiawan, Suhadi Suhadi, Tim Fingscheidt, Sorel Stan:
Robust speech recognition for mobile devices in car noise. 2673-2676 - Péter Mihajlik, Zoltán Tobler, Zoltán Tüske, Géza Gordos:
Evaluation and optimization of noise robust front-end technologies for the automatic recognition of Hungarian telephone speech. 2677-2680 - Gang Chen, Douglas D. O'Shaughnessy, Hesham Tolba:
A performance investigation of noisy voice recognition over IP telephony networks. 2681-2684 - Akinori Ito, Takashi Kanayama, Motoyuki Suzuki, Shozo Makino:
Internal noise suppression for speech recognition by small robots. 2685-2688 - Florian Kraft, Robert G. Malkin, Thomas Schaaf, Alex Waibel:
Temporal ICA for classification of acoustic events in a kitchen environment. 2689-2692 - Jan Felix Krebber:
"hello - is anybody at home?" - about the minimum word accuracy of a smart home spoken dialogue system. 2693-2696 - Hans-Günter Hirsch, Harald Finster:
The simulation of realistic acoustic input scenarios for speech recognition systems. 2697-2700 - Michael Walsh, Gregory M. P. O'Hare, Julie Carson-Berndsen:
An agent-based framework for speech investigation. 2701-2704
Speech Coding
- Stephen So, Kuldip K. Paliwal:
Switched split vector quantisation of line spectral frequencies for wideband speech coding. 2705-2708 - Changchun Bao, Jason Lukasiak, Christian H. Ritz:
A novel voicing cut-off determination for low bit-rate harmonic speech coding. 2709-2712 - Hauke Krüger, Peter Vary:
A partial decorrelation scheme for improved predictive open loop quantization with noise shaping. 2713-2716 - Venkatesh Krishnan, Thomas P. Barnwell III, David V. Anderson:
Using dynamic codebook re-ordering to exploit inter-frame correlation in MELP coders. 2717-2720 - Adriane Swalm Durey, Venkatesh Krishnan, Thomas P. Barnwell III:
Enhanced speech coding based on phonetic class segmentation. 2721-2724 - Ali Erdem Ertan, Thomas P. Barnwell III:
A pitch-synchronous pitch-cycle modification method for designing a hybrid i-MELP/waveform-matching speech coder. 2725-2728 - Joon-Hyuk Chang, Jong Won Shin, Seung Yeol Lee, Nam Soo Kim:
A new structural preprocessor for low-bit rate speech coding. 2729-2732 - Tiago H. Falk, Wai-Yip Chan, Peter Kabal:
An improved GMM-based voice quality predictor. 2733-2736 - Jan S. Erkelens:
High-quality memoryless subband coding of impulse responses at 22 bits per frame. 2737-2740 - Shi-Han Chen, Kuo-Guan Wu, Chih-Chung Kuo:
A study of variable pulse allocation for MPE and CELP coders based on PESQ analysis. 2741-2744 - José L. Pérez-Córdoba, Antonio M. Peinado, Angel M. Gomez, Antonio J. Rubio:
Joint source-channel coding of LSP parameters for bursty channels. 2745-2748
Gender and Age Issues in Speech and Language Research I, II
- Daniel Elenius, Mats Blomberg:
Adaptation and normalization experiments in speech recognition for 4 to 8 year old children. 2749-2752 - Wim Jansen, Hugo Van hamme:
PROSPECT features and their application to missing data techniques for vocal tract length normalization. 2753-2756 - Andreas Hagen, Bryan L. Pellom:
Data driven subword unit modeling for speech recognition and its application to interactive reading tutors. 2757-2760 - Anton Batliner, Mats Blomberg, Shona D'Arcy, Daniel Elenius, Diego Giuliani, Matteo Gerosa, Christian Hacker, Martin J. Russell, Stefan Steidl, Michael Wong:
The PF_STAR children's speech corpus. 2761-2764 - Linda Bell, Johan Boye, Joakim Gustafson, Mattias Heldner, Anders Lindström, Mats Wirén:
The Swedish NICE corpus - spoken dialogues between children and embodied characters in a computer game scenario. 2765-2768 - Yusuke Miyauchi, Nao Hodoshima, Keiichi Yasu, Nahoko Hayashi, Takayuki Arai, Mitsuko Shindo:
A preprocessing technique for improving speech intelligibility in reverberant environments: the effect of steady-state suppression on elderly people. 2769-2772
Discourse and Dialogue I, II
- Norbert Pfleger, Markus Löckelt:
Synchronizing dialogue contributions of human users and virtual characters in a virtual reality environment. 2773-2776 - Anand Venkataraman, Yang Liu, Elizabeth Shriberg, Andreas Stolcke:
Does active learning help automatic dialog act tagging in meeting data? 2777-2780 - Dan Bohus, Alexander I. Rudnicky:
A principled approach for rejection threshold optimization in spoken dialog systems. 2781-2784 - David Pérez-Piñar López, Carmen García-Mateo:
Application of confidence measures for dialogue systems through the use of parallel speech recognizers. 2785-2788 - Sophie Rosset, Delphine Tribout:
Multi-level information and automatic dialog acts detection in human-human spoken dialogs. 2789-2792 - Rieks op den Akker, Harry Bunt, Simon Keizer, Boris W. van Schooten:
From question answering to spoken dialogue: towards an information search assistant for interactive multimodal information extraction. 2793-2796
Text-to-Speech I, II
- Ulrich Reubold, Alexander Steffen:
Pitch-effects in diphone recording: are logatomes inappropriate? 2797-2800 - Tomoki Toda, Keiichi Tokuda:
Speech parameter generation algorithm considering global variance for HMM-based speech synthesis. 2801-2804 - Makoto Tachibana, Junichi Yamagishi, Takashi Masuko, Takao Kobayashi:
Performance evaluation of style adaptation for hidden semi-Markov model based speech synthesis. 2805-2808 - Gabriel Webster, Tina Burrows, Katherine M. Knill:
A comparison of methods for speaker-dependent pronunciation tuning for text-to-speech synthesis. 2809-2812 - Ann K. Syrdal, Alistair Conkie:
Perceptually-based data-driven join costs: comparing join types. 2813-2816 - Yannis Pantazis, Yannis Stylianou, Esther Klabbers:
Discontinuity detection in concatenated speech synthesis based on nonlinear speech analysis. 2817-2820
Language and Dialect Identification I, II
- Tingyao Wu, Dirk Van Compernolle, Jacques Duchateau, Qian Yang, Jean-Pierre Martens:
Improving the discrimination between native accents when recorded over different channels. 2821-2824 - Isabel Trancoso, António Joaquim Serralheiro, Céu Viana, Diamantino Caseiro:
Aligning and recognizing spoken books in different varieties of Portuguese. 2825-2828 - Bin Ma, Haizhou Li, Chin-Hui Lee:
An acoustic segment modeling approach to automatic language identification. 2829-2832 - Dong Zhu, Martine Adda-Decker, Fabien Antoine:
Different size multilingual phone inventories and context-dependent acoustic models for language identification. 2833-2836 - Sheng Gao, Bin Ma, Haizhou Li, Chin-Hui Lee:
A text categorization approach to automatic language identification. 2837-2840 - Giampiero Salvi:
Advances in regional accent clustering in Swedish. 2841-2844
Speech Recognition in Ubiquitous Networking and Context-Aware Computing
- David Pearce, Jonathan Engelsma, James C. Ferrans, John Johnson:
An architecture for seamless access to distributed multimodal services. 2845-2848 - Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg, Haitian Xu:
Robust speech recognition in ubiquitous networking and context-aware computing. 2849-2852 - Valentin Ion, Reinhold Haeb-Umbach:
Unified probabilistic approach to error concealment for distributed speech recognition. 2853-2856 - Alastair Bruce James, Ben Milner:
Combining packet loss compensation methods for robust distributed speech recognition. 2857-2860 - Trond Skogstad, Torbjørn Svendsen:
Distributed ASR using speech coder data for efficient feature vector representation. 2861-2864 - Sadaoki Furui, Tomohisa Ichiba, Takahiro Shinozaki, Edward W. D. Whittaker, Koji Iwano:
Cluster-based modeling for ubiquitous speech recognition. 2865-2868
Phonetics and Phonology I, II
- Danny R. Moates, Zinny S. Bond, Russell Fox, Verna Stockmal:
The feature [sonorant] in lexical access. 2869-2872 - Simone Mikuteit:
Voice and aspiration in German and East Bengali stops: a cross-language study. 2873-2876 - Irene Jacobi, Louis C. W. Pols, Jan Stroop:
Polder Dutch: aspects of the /ei/-lowering in standard Dutch. 2877-2880 - Eric Castelli, René Carré:
Production and perception of Vietnamese vowels. 2881-2884 - Vu Ngoc Tuan, Christophe d'Alessandro, Alexis Michaud:
Using open quotient for the characterisation of Vietnamese glottalised tones. 2885-2888 - John Hajek, Mary Stevens:
On the acoustic characterization of ejective stops in Waima'a. 2889-2892 - Mary Stevens, John Hajek:
Spirantization of /p t k/ in Sienese Italian and so-called semi-fricatives. 2893-2896 - Barbara Gili Fivela, Claudio Zmarich:
Italian geminates under speech rate and focalization changes: kinematic, acoustic, and perception data. 2897-2900 - Sunhee Kim:
Durational characteristics of Korean Lombard speech. 2901-2904 - Toshiko Isei-Jaakkola, Satoshi Asakawa:
A cross-linguistic study of vowel quantity in different word structures: Japanese, Finnish and Czech. 2905-2908 - Laura Mori, Melissa Barkat-Defradas:
Acoustic properties of foreign accent: VOT variations in Moroccan-accented Italian. 2909-2912 - Andréia S. Rauber, Paola Escudero, Ricardo Augusto Hoffmann Bion, Barbara O. Baptista:
The interrelation between the perception and production of English vowels by native speakers of Brazilian Portuguese. 2913-2916 - Julia Hoelterhoff:
Recognition of German obstruents. 2917-2920 - Radek Skarnitzl, Jan Volín:
Czech voiced labiodental continuant discrimination from basic acoustic data. 2921-2924 - Jean-Baptiste Maj, Anne Bonneau, Dominique Fohr, Yves Laprie:
An elitist approach for extracting automatically well-realized speech sounds with high confidence. 2925-2928 - Na'im R. Tyson:
Applying multiple regression models for predicting word duration in a corpus of spontaneous speech. 2929-2932 - Catarina Oliveira, Lurdes Castro Moutinho, António J. S. Teixeira:
On European Portuguese automatic syllabification. 2933-2936 - Aimilios Chalamandaris, Spyros Raptis, Pirros Tsiakoulis:
Rule-based grapheme-to-phoneme method for the Greek. 2937-2940 - Constandinos Kalimeris, George K. Mikros, Stelios Bakamidis:
Assimilation and deletion phenomena involving word-final /n/ and word-initial /p, t, k/ in modern Greek: a codification of the observed variation intended for use in TTS synthesis. 2941-2944 - Christian Weiss, Bianca Aschenberner:
A German viseme-set for automatic transcription of input text used for audio-visual speech synthesis. 2945-2948 - Johanna-Pascale Roy:
Visual perception of anticipatory rounding gestures in French. 2949-2952
Acoustic Processing for ASR I-III
- Michael Jonas, James G. Schmolze:
Hierarchical clustering of mixture tying using a partially observable Markov decision process. 2953-2956 - Pierre Ouellet, Gilles Boulianne, Patrick Kenny:
Flavors of Gaussian warping. 2957-2960 - Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer, Dan Chazan:
Phoneme alignment based on discriminative learning. 2961-2964 - Jussi Leppänen, Imre Kiss:
Comparison of low footprint acoustic modeling techniques for embedded ASR systems. 2965-2968 - Atiwong Suchato, Proadpran Punyabukkana:
Factors in classification of stop consonant place of articulation. 2969-2972 - Arthur R. Toth, Alan W. Black:
Cross-speaker articulatory position data for phonetic feature prediction. 2973-2976 - Daniel Povey:
Improvements to fMPE for discriminative training of features. 2977-2980 - Xin Lei, Mei-Yuh Hwang, Mari Ostendorf:
Incorporating tone-related MLP posteriors in the feature representation for Mandarin ASR. 2981-2984 - Yan Han, Johan de Veth, Lou Boves:
Speech trajectory clustering for improved speech recognition. 2985-2988 - Andrey Temko, Dusan Macho, Climent Nadeu:
Selection of features and combination of classifiers using a fuzzy approach for acoustic event classification. 2989-2992 - Jan Stadermann, Wolfram Koska, Gerhard Rigoll:
Multi-task learning strategies for a recurrent neural net in a hybrid tied-posteriors acoustic model. 2993-2996 - Florian Hönig, Georg Stemmer, Christian Hacker, Fabio Brugnara:
Revising Perceptual Linear Prediction (PLP). 2997-3000 - Joel Pinto, R. N. V. Sitaram:
Confidence measures in speech recognition based on probability distribution of likelihoods. 3001-3004 - Frank Diehl, Asunción Moreno, Enric Monte:
Continuous local codebook features for multi- and cross-lingual acoustic phonetic modelling. 3005-3008 - Antonio Miguel, Eduardo Lleida, Richard C. Rose, Luis Buera, Alfonso Ortega:
Augmented state space acoustic decoding for modeling local variability in speech. 3009-3012 - Dimitrios Dimitriadis, Petros Maragos, Alexandros Potamianos:
Auditory Teager energy cepstrum coefficients for robust speech recognition. 3013-3016 - Yasser Hifny, Steve Renals, Neil D. Lawrence:
A hybrid Maxent/HMM based ASR system. 3017-3020 - Hakan Erdogan:
Regularizing linear discriminant analysis for speech recognition. 3021-3024 - Yadong Wang, Steven Greenberg, Jayaganesh Swaminathan, Ramdas Kumaresan, David Poeppel:
Comprehensive modulation representation for automatic speech recognition. 3025-3028 - Qiang Fu, Biing-Hwang Juang:
Segment-based phonetic class detection using minimum verification error (MVE) training. 3029-3032 - Yi Liu, Pascale Fung:
Acoustic and phonetic confusions in accented speech recognition. 3033-3036 - Mario E. Munich, Qiguang Lin:
Auditory image model features for automatic speech recognition. 3037-3040 - Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari, Kiyohiro Shikano:
Applications of NAM microphones in speech recognition for privacy in human-machine communication. 3041-3044 - Joe Frankel, Simon King:
A hybrid ANN/DBN approach to articulatory feature recognition. 3045-3048
Speaker Characterization and Recognition I-IV
- Daniel Moraru, Mathieu Ben, Guillaume Gravier:
Experiments on speaker tracking and segmentation in radio broadcast news. 3049-3052 - Emanuele Dalmasso, Pietro Laface, Daniele Colibro, Claudio Vair:
Unsupervised segmentation and verification of multi-speaker conversational speech. 3053-3056 - Sacha Krstulovic, Frédéric Bimbot, Delphine Charlet, Olivier Boëffard:
Focal speakers: a speaker selection method able to deal with heterogeneous similarity criteria. 3057-3060 - Mathieu Ben, Guillaume Gravier, Frédéric Bimbot:
A model space framework for efficient speaker detection. 3061-3064 - Nicolas Scheffer, Jean-François Bonastre:
Speaker detection using acoustic event sequences. 3065-3068 - Wei-Ho Tsai, Hsin-Min Wang:
Speaker clustering of unknown utterances based on maximum purity estimation. 3069-3072 - Petra Zochová, Vlasta Radová:
Modified DISTBIC algorithm for speaker change detection. 3073-3076 - Gilles Gonon, Rémi Gribonval, Frédéric Bimbot:
Decision trees with improved efficiency for fast speaker verification. 3077-3080 - Nicolas Eveno, Laurent Besacier:
A speaker independent "liveness" test for audio-visual biometrics. 3081-3084 - Shingo Kuroiwa, Yoshiyuki Umeda, Satoru Tsuge, Fuji Ren:
Distributed speaker recognition using speaker-dependent VQ codebook and earth mover's distance. 3085-3088 - Ka-Yee Leung, Man-Wai Mak, Man-Hung Siu, Sun-Yuan Kung:
Speaker verification via articulatory feature-based conditional pronunciation modeling with vowel and consonant mixture models. 3089-3092 - Jixu Chen, Beiqian Dai, Jun Sun:
Prosodic features based on wavelet analysis for speaker verification. 3093-3096 - Mohamed Mihoubi, Douglas D. O'Shaughnessy, Pierre Dumouchel:
Relevant information extraction for discriminative training applied to speaker identification. 3097-3100 - Jérôme Louradour, Khalid Daoudi:
Conceiving a new sequence kernel and applying it to SVM speaker verification. 3101-3104 - Jing Deng, Thomas Fang Zheng, Jian Liu, Wenhu Wu:
The predictive differential amplitude spectrum for robust speaker recognition in stationary noises. 3105-3108 - Michael Mason, Robbie Vogt, Brendan Baker, Sridha Sridharan:
Data-driven clustering for blind feature mapping in speaker verification. 3109-3112 - Xi Zhou, Zhiqiang Yao, Beiqian Dai:
Improved covariance modeling for GMM in speaker identification. 3113-3116 - Robbie Vogt, Brendan Baker, Sridha Sridharan:
Modelling session variability in text-independent speaker verification. 3117-3120 - Mihalis Siafarikas, Todor Ganchev, Nikolaos D. Fakotakis, George K. Kokkinakis:
Overlapping wavelet packet features for speaker verification. 3121-3124 - An-rong Yin, Xiang Xie, Jingming Kuang:
Using Hadamard ECOC in multi-class problems based on SVM. 3125-3128
Robust Speech Recognition I-IV
- Hank Liao, Mark J. F. Gales:
Joint uncertainty decoding for noise robust speech recognition. 3129-3132 - Vincent Vanhoucke:
Confidence scoring and rejection using multi-pass speech recognition. 3133-3136 - Cheng-Lung Lee, Wen-Whei Chang:
Memory-enhanced MMSE-based channel error mitigation for distributed speech recognition. 3137-3140 - Takashi Fukuda, Muhammad Ghulam, Tsuneo Nitta:
Designing multiple distinctive phonetic feature extractors for canonicalization by using clustering technique. 3141-3144 - Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:
Efficient blind dereverberation framework for automatic speech recognition. 3145-3148 - Matthias Wölfel, John W. McDonough:
Combining multi-source far distance speech recognition strategies: beamforming, blind channel and confusion network combination. 3149-3152
Speech Coding and Quality Assessment
- Akira Takahashi, Atsuko Kurashima, Chiharu Morioka, Hideaki Yoshino:
Objective quality assessment of wideband speech by an extension of ITU-T recommendation P.862. 3153-3156 - Marc Werner, Peter Vary:
Quality control for UMTS-AMR speech channels. 3157-3160 - Wei Chen, Peter Kabal, Turaj Zakizadeh Shabestary:
Perceptual postfilter estimation for low bit rate speech coders using Gaussian mixture models. 3161-3164 - Kengo Fujita, Tsuneo Kato, Hideaki Yamada, Hisashi Kawai:
SNR-dependent background noise compensation of PESQ values for cellular phone speech. 3165 - Gil Ho Lee, Jae Sam Yoon, Hong Kook Kim:
A MFCC-based CELP speech coder for server-based speech recognition in network environments. 3169-3172 - Volodya Grancharov, Jonas Samuelsson, W. Bastiaan Kleijn:
Distortion measures for vector quantization of noisy spectrum. 3173-3176
Spoken Language Translation I, II
- Evgeny Matusov, Stephan Kanthak, Hermann Ney:
On the integration of speech recognition and statistical machine translation. 3177-3180 - V. H. Quan, Marcello Federico, Mauro Cettolo:
Integrated n-best re-ranking for spoken language translation. 3181-3184 - Josep Maria Crego, José B. Mariño, Adrià de Gispert:
An n-gram-based statistical machine translation decoder. 3185-3188 - Liang Gu, Yuqing Gao:
Use of maximum entropy in natural word generation for statistical concept-based speech-to-speech translation. 3189-3192 - Adrià de Gispert, José B. Mariño, Josep Maria Crego:
Improving statistical machine translation by classifying and generalizing inflected verb forms. 3193-3196 - Abdulvohid Bozarov, Yoshinori Sagisaka, Ruiqiang Zhang, Gen-ichiro Kikui:
Improved speech recognition word lattice translation by confidence measure. 3197-3200
Speech Inversion
- Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Kiyoshi Honda:
Vocal tract area function inversion by linear regression of cepstrum. 3201-3204 - Olov Engwall:
Introducing visual cues in acoustic-to-articulatory inversion. 3205-3208 - Victor N. Sorokin, Alexander S. Leonov, I. S. Makarov, A. I. Tsyplikhin:
Speech inversion and re-synthesis. 3209-3212 - Mark A. Huckvale, Ian S. Howard:
Teaching a vocal tract simulation to imitate stop consonants. 3213-3216 - Blaise Potard, Yves Laprie:
Using phonetic constraints in acoustic-to-articulatory inversion. 3217-3220 - Asterios Toutios, Konstantinos G. Margaritis:
A support vector approach to the acoustic-to-articulatory mapping. 3221-3224
Prosody Modelling and Speech Technology I, II
- Daniel Hirst, Cyril Auran:
Analysis by synthesis of speech prosody: the Prozed environment. 3225-3228 - Stephen Cox:
A discriminative approach to phrase break modelling. 3229-3232 - Ian Read, Stephen Cox:
Stochastic and syntactic techniques for predicting phrase breaks. 3233-3236 - Gerasimos Xydas, Panagiotis Zervas, Georgios Kouroupetroglou, Nikolaos D. Fakotakis, George K. Kokkinakis:
Tree-based prediction of prosodic phrase breaks on top of shallow textual features. 3237-3240 - Honghui Dong, Jianhua Tao, Bo Xu:
Chinese prosodic phrasing with a constraint-based approach. 3241-3244 - Minghui Dong, Kim-Teng Lua, Haizhou Li:
A probabilistic approach to prosodic word prediction for Mandarin Chinese TTS. 3245-3248 - João Paulo Ramos Teixeira, Diamantino Freitas, Hiroya Fujisaki:
Evaluation of a system for F0 contour prediction for European Portuguese. 3249-3252 - Ke Li, Yoshinori Sagisaka:
Analysis on command sequences of a F0 generation model for Mandarin speech and its application to their automatic extraction. 3253-3256 - Keikichi Hirose, Yusuke Furuyama, Nobuaki Minematsu:
Corpus-based extraction of F0 contour generation process model parameters. 3257-3260 - David Escudero Mancebo, Valentín Cardeñoso-Payo:
Optimized selection of intonation dictionaries in corpus based intonation modelling. 3261-3264 - Qinghua Sun, Keikichi Hirose, Wentao Gu, Nobuaki Minematsu:
Generation of fundamental frequency contours for Mandarin speech synthesis based on tone nucleus model. 3265-3268 - Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen:
On the inter-syllable coarticulation effect of pitch modeling for Mandarin speech. 3269-3272 - Matej Rojc, Pablo Daniel Agüero, Antonio Bonafonte, Zdravko Kacic:
Training the tilt intonation model using the JEMA methodology. 3273-3276 - Dagen Wang, Shrikanth S. Narayanan:
Piecewise linear stylization of pitch via wavelet analysis. 3277-3280 - Harald Romsdorfer, Beat Pfister:
Phonetic labeling and segmentation of mixed-lingual prosody databases. 3281-3284 - Edmilson Morais, Fábio Violaro:
Exploratory analysis of linguistic data based on genetic algorithm for robust modeling of the segmental duration of speech. 3285-3288 - Dafydd Gibbon, Flaviane Romani Fernandes:
Annotation-mining for rhythm model comparison in Brazilian Portuguese. 3289-3292 - Tohru Nagano, Shinsuke Mori, Masafumi Nishimura:
A stochastic approach to phoneme and accent estimation. 3293-3296 - Jason M. Brenier, Daniel M. Cer, Daniel Jurafsky:
The detection of emphatic words using acoustic and lexical features. 3297-3300 - Dinoj Surendran, Gina-Anne Levow, Yi Xu:
Tone recognition in Mandarin using focus. 3301-3304 - Mikolaj Wypych:
An automatic intonation recognizer for the Polish language based on machine learning and expert knowledge. 3305-3308 - Atsuhiro Sakurai:
Generalized envelope matching technique for time-scale modification of speech (GEM-TSM). 3309-3312
Topics in Speech Recognition
- Yang Liu, Elizabeth Shriberg, Andreas Stolcke, Mary P. Harper:
Comparing HMM, maximum entropy, and conditional random fields for disfluency detection. 3313-3316 - Bhiksha Raj, Rita Singh, Paris Smaragdis:
Recognizing speech from simultaneous speakers. 3317-3320 - Vincent Wan, James Carmichael:
Polynomial dynamic time warping kernel support vector machines for dysarthric speech recognition with sparse training data. 3321-3324 - R. Lejeune, J. Baude, C. Tchong, Hubert Crepy, Claire Waast-Richard:
Flavoured acoustic model and combined spelling to sound for asymmetrical bilingual environment. 3325-3328 - Chris D. Bartels, Kevin Duh, Jeff A. Bilmes, Katrin Kirchhoff, Simon King:
Genetic triangulation of graphical models for speech and language processing. 3329-3332 - Guillermo Aradilla, Jithendra Vepa, Hervé Bourlard:
Improving speech recognition using a data-driven approach. 3333-3336 - Shigeki Matsuda, Wolfgang Herbordt, Satoshi Nakamura:
Outlier detection for acoustic model training using robust statistics. 3337-3340 - Jonathan Le Roux, Erik McDermott:
Optimization methods for discriminative training. 3341-3344 - Patrick Cardinal, Gilles Boulianne, Michel Comeau:
Segmentation of recordings based on partial transcriptions. 3345-3348 - Hussien Seid, Björn Gambäck:
A speaker independent continuous speech recognizer for Amharic. 3349-3352 - Tetsuji Ogawa, Tetsunori Kobayashi:
Optimizing the structure of partly-hidden Markov models using weighted likelihood-ratio maximization criterion. 3353-3356 - C. Santhosh Kumar, V. P. Mohandas, Haizhou Li:
Multilingual speech recognition: a unified approach. 3357-3360 - Tomás Bartos, Ludek Müller:
Detection of recognition errors based on classifiers trained on artificially created data. 3361-3364 - Jinyu Li, Chin-Hui Lee:
On designing and evaluating speech event detectors. 3365-3368 - Joseph Razik, Odile Mella, Dominique Fohr, Jean Paul Haton:
Local word confidence measure using word graph and n-best list. 3369-3372 - Xiaolin Ren, Xin He, Yaxin Zhang:
Mandarin/English mixed-lingual name recognition for mobile phone. 3373-3376 - Javier Ferreiros, Rubén San Segundo, Fernando Fernández Martínez, Luis Fernando D'Haro, Valentín Sama, Roberto Barra-Chicote, Pedro Mellén:
New word-level and sentence-level confidence scoring using graph theory calculus and its evaluation on speech understanding. 3377-3380 - Masanobu Nakamura, Koji Iwano, Sadaoki Furui:
Analysis of spectral space reduction in spontaneous speech and its effects on speech recognition performances. 3381-3384 - Simon King, Chris D. Bartels, Jeff A. Bilmes:
SVitchboard 1: small vocabulary tasks from Switchboard. 3385-3388
Discourse and Dialogue I, II
- Wieneke Wesseling, R. J. J. H. van Son:
Timing of experimentally elicited minimal responses as quantitative evidence for the use of intonation in projecting TRPs. 3389-3392 - Shinya Yamada, Toshihiko Itoh, Kenji Araki:
Linguistic and acoustic features depending on different situations - the experiments considering speech recognition rate. 3393-3396 - Dirk Bühler, Stefan W. Hamerich:
Towards VoiceXML compilation for portable embedded applications in ubiquitous environments. 3397-3400 - Eva Strangert:
Prosody in public speech: analyses of a news announcement and a political interview. 3401-3404 - Tanveer A. Faruquie, Pankaj Kankar, Nitendra Rajput, Abhishek Verma:
An architecture for pluggable disambiguation mechanism for RDC based voice applications. 3409-3412 - Nitendra Rajput, Amit Anil Nanavati, Abhishek Kumar, Neeraj Chaudhary:
Adapting dialog call-flows for pervasive devices. 3413-3416 - Ulf Krum, Hartwig Holzapfel, Alex Waibel:
Clarification questions to improve dialogue flow and speech recognition in spoken dialogue systems. 3417-3420 - Fernando Fernández Martínez, Javier Ferreiros, Valentín Sama, Juan Manuel Montero, Rubén San Segundo, Javier Macías Guarasa, Rafael García:
Speech interface for controlling an hi-fi audio system based on a Bayesian belief networks approach for dialog modeling. 3421-3424
Spoken Language Understanding I, II
- Matthias Thomae, Tibor Fábián, Robert Lieb, Günther Ruske:
Hierarchical language models for one-stage speech interpretation. 3425-3428 - Nick J.-C. Wang:
Spoken language understanding using layered n-gram modeling. 3429-3432 - Mihai Surdeanu, Jordi Turmo, Eli Comelles:
Named entity recognition from spontaneous open-domain speech. 3433-3436 - Imed Zitouni, Hui Jiang, Qiru Zhou:
Discriminative training and support vector machine for natural language call routing. 3437-3440 - Jihyun Eun, Minwoo Jeong, Gary Geunbae Lee:
A multiple classifier-based concept-spotting approach for robust spoken language understanding. 3441-3444 - Robert Lieb, Matthias Thomae, Günther Ruske, Daniel Bobbert, Frank Althoff:
A flexible and integrated interface between speech recognition, speech interpretation and dialog management. 3445-3448 - Tomohiro Ohno, Shigeki Matsubara, Hideki Kashioka, Naoto Kato, Yasuyoshi Inagaki:
Incremental dependency parsing of Japanese spoken monologue based on clause boundaries. 3449-3452 - Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki:
Situation based speech recognition for structuring baseball live games. 3453-3456 - Hélène Bonneau-Maynard, Sophie Rosset, Christelle Ayache, Anne Kuhn, Djamel Mostefa:
Semantic annotation of the French media dialog corpus. 3457-3460 - Ralf Engel:
Robust and efficient semantic parsing of free word order languages in spoken dialogue systems. 3461-3464 - Catherine Kobus, Géraldine Damnati, Lionel Delphin-Poulat, Renato de Mori:
Conceptual language model design for spoken language understanding. 3465-3468 - Luís Seabra Lopes, António J. S. Teixeira, Marcelo Quinderé, Mário Rodrigues:
From robust spoken language understanding to knowledge acquisition and management. 3469-3472 - Cheng Wu, Xiang Li, Hong-Kwang Jeff Kuo, E. E. Jan, Vaibhava Goel, David M. Lubensky:
Improving end-to-end performance of call classification through data confusion reduction and model tolerance enhancement. 3473-3476