INTERSPEECH 2010: Makuhari, Japan
- Takao Kobayashi, Keikichi Hirose, Satoshi Nakamura:
11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Chiba, Japan, September 26-30, 2010. ISCA 2010
Keynotes
- Steve J. Young:
Still talking to machines (cognitively speaking). 1-10
- Tohru Ifukube:
Sound-based assistive technology supporting "seeing", "hearing" and "speaking" for the disabled and the elderly. 11-19
- Chiu-yu Tseng:
Beyond sentence prosody. 20-29
Special Session: Models of Speech - In Search of Better Representations
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman, Louis Goldstein, Carol Y. Espy-Wilson, Mark Hasegawa-Johnson:
A procedure for estimating gestural scores from natural speech. 30-33
- Yen-Liang Shue, Gang Chen, Abeer Alwan:
On the interdependencies between voice quality, glottal gaps, and voice-source related acoustic measures. 34-37
- Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino:
Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems. 38-41
- Sadao Hiroya, Takemi Mochida:
Phase equalization-based autoregressive model of speech signals. 42-45
- Yi Xu, Santitham Prom-on:
Articulatory-functional modeling of speech prosody: a review. 46-49
- Humberto M. Torres, Hansjörg Mixdorff, Jorge A. Gurlekian, Hartmut R. Pfitzinger:
Two new estimation methods for a superpositional intonation model. 50-53
ASR: Acoustic Models I-III
- Simon Wiesler, Georg Heigold, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney:
A discriminative splitting criterion for phonetic decision trees. 54-57
- Mark J. F. Gales, Kai Yu:
Canonical state models for automatic speech recognition. 58-61
- Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen:
Restructuring exponential family mixture models. 62-65
- Françoise Beaufays, Vincent Vanhoucke, Brian Strope:
Unsupervised discovery and training of maximally dissimilar cluster models. 66-69
- Khe Chai Sim:
Probabilistic state clustering using conditional random field for context-dependent acoustic modelling. 70-73
- Xie Sun, Yunxin Zhao:
Integrate template matching and statistical modeling for speech recognition. 74-77
- George Saon, Hagen Soltau:
Boosting systems for LVCSR. 1341-1344
- Vaibhava Goel, Tara N. Sainath, Bhuvana Ramabhadran, Peder A. Olsen, David Nahamoo, Dimitri Kanevsky:
Incorporating sparse representation phone identification features in automatic speech recognition using exponential families. 1345-1348
- Xin Chen, Yunxin Zhao:
Integrating MLP features and discriminative training in data sampling based ensemble acoustic modeling. 1349-1352
- Jui-Ting Huang, Mark Hasegawa-Johnson:
Semi-supervised training of Gaussian mixture models by conditional entropy minimization. 1353-1356
- Guangchuan Shi, Yu Shi, Qiang Huo:
A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR. 1357-1360
- Roger Hsiao, Florian Metze, Tanja Schultz:
Improvements to generalized discriminative feature transformation for speech recognition. 1361-1364
- Karel Veselý, Lukás Burget, Frantisek Grézl:
Parallel training of neural networks for speech recognition. 2934-2937
- Rita Singh, Benjamin Lambert, Bhiksha Raj:
The use of sense in unsupervised training of acoustic models for ASR systems. 2938-2941
- Jun Du, Yu Hu, Hui Jiang:
Boosted mixture learning of Gaussian mixture HMMs for speech recognition. 2942-2945
- Volker Leutnant, Reinhold Haeb-Umbach:
On the exploitation of hidden Markov models and linear dynamic models in a hybrid decoder architecture for continuous speech recognition. 2946-2949
- Alberto Abad, Thomas Pellegrini, Isabel Trancoso, João Paulo Neto:
Context dependent modelling approaches for hybrid speech recognizers. 2950-2953
- Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi:
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination. 2954-2957
- Hank Liao, Christopher Alberti, Michiel Bacchiani, Olivier Siohan:
Decision tree state clustering with word and syllable features. 2958-2961
- Hiroshi Fujimura, Takashi Masuko, Mitsuyoshi Tachimori:
A duration modeling technique with incremental speech rate normalization. 2962-2965
- Martin Wöllmer, Yang Sun, Florian Eyben, Björn W. Schuller:
Long short-term memory networks for noise robust speech recognition. 2966-2969
- Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada:
One-model speech recognition and synthesis based on articulatory movement HMMs. 2970-2973
- Xiaodong Cui, Jian Xue, Pierre L. Dognin, Upendra V. Chaudhari, Bowen Zhou:
Acoustic modeling with bootstrap and restructuring for low-resourced languages. 2974-2977
- Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Katoh:
Lecture speech recognition by combining word graphs of various acoustic models. 2978-2981
- Khe Chai Sim, Shilin Liu:
Semi-parametric trajectory modelling using temporally varying feature mapping for speech recognition. 2982-2985
- Dong Yu, Li Deng:
Deep-structured hidden conditional random fields for phonetic recognition. 2986-2989
- Jonathan Malkin, Jeff A. Bilmes:
Semi-supervised learning for improved expression of uncertainty in discriminative classifiers. 2990-2993
- Peder A. Olsen, Vaibhava Goel, Charles A. Micchelli, John R. Hershey:
Modeling posterior probabilities using the linear exponential family. 2994-2997
Spoken Dialogue Systems I, II
- Fabrice Lefèvre, François Mairesse, Steve J. Young:
Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation. 78-81
- Rajesh Balchandran, Leonid Rachevsky, Bhuvana Ramabhadran, Miroslav Novak:
Techniques for topic detection based processing in spoken dialog systems. 82-85
- Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin:
Optimizing spoken dialogue management with fitted value iteration. 86-89
- Filip Jurcícek, Blaise Thomson, Simon Keizer, François Mairesse, Milica Gasic, Kai Yu, Steve J. Young:
Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems. 90-93
- Alexander Schmitt, Michael Scholz, Wolfgang Minker, Jackson Liscombe, David Suendermann:
Is it possible to predict task completion in automated troubleshooters?. 94-97
- David Suendermann, Jackson Liscombe, Roberto Pieraccini:
Minimally invasive surgery for spoken dialog systems. 98-101
Spoken Dialogue Systems II
- Ramón López-Cózar, David Griol:
New technique to enhance the performance of spoken dialogue systems based on dialogue states-dependent language models and grammatical rules. 2998-3001
- Lluís F. Hurtado, Joaquin Planells, Encarna Segarra, Emilio Sanchis, David Griol:
A stochastic finite-state transducer approach to spoken dialog management. 3002-3005
- Romain Laroche, Philippe Bretier, Ghislain Putois:
Enhanced monitoring tools and online dialogue optimisation merged into a new spoken dialogue system design experience. 3006-3009
- Romain Laroche, Ghislain Putois, Philippe Bretier:
Optimising a handcrafted dialogue system design. 3010-3013
- Felix Putze, Tanja Schultz:
Utterance selection for speech acts in a cognitive tourguide scenario. 3014-3017
- Gabriel Parent, Maxine Eskénazi:
Lexical entrainment of real users in the let's go spoken dialog system. 3018-3021
- Silvia Quarteroni, Meritxell González, Giuseppe Riccardi, Sebastian Varges:
Combining user intention and error modeling for statistical dialog simulators. 3022-3025
- Jaakko Hakulinen, Markku Turunen, Raúl Santos de la Cámara, Nigel T. Crook:
Parallel processing of interruptions and feedback in companions affective dialogue system. 3026-3029
- Antoine Raux, Neville Mehta, Deepak Ramachandran, Rakesh Gupta:
Dynamic language modeling using Bayesian networks for spoken dialog systems. 3030-3033
- Sunao Hara, Norihide Kitaoka, Kazuya Takeda:
Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act n-gram. 3034-3037
- Wei-Bin Liang, Chung-Hsien Wu, Yu-Cheng Hsiao:
Dialogue act detection in error-prone spoken dialogue systems using partial sentence tree and latent dialogue act matrix. 3038-3041
- Tatsuya Kawahara, Kouhei Sumi, Zhi-Qiang Chang, Katsuya Takanashi:
Detection of hot spots in poster conversations based on reactive tokens of audience. 3042-3045
- Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi:
Psychological evaluation of a group communication activation robot in a party game. 3046-3049
- Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno:
Analyzing user utterances in barge-in-able spoken dialogue system for improving identification accuracy. 3050-3053
- Mattias Heldner, Jens Edlund, Julia Hirschberg:
Pitch similarity in the vicinity of backchannels. 3054-3057
- Khiet P. Truong, Ronald Poppe, Dirk Heylen:
A rule-based backchannel prediction model using pitch and pause information. 3058-3061
Speech Perception: Factors Influencing Perception
- Paul Boersma, Katerina Chládková:
Detecting categorical perception in continuous discrimination data. 102-105
- Titia Benders, Paola Escudero:
The interrelation between the stimulus range and the number of response categories in vowel categorization. 106-109
- Marie Nilsenová, Martijn Goudbeek, Luuk Kempen:
The relation between pitch perception preference and emotion identification. 110-113
- Takashi Otake, James M. McQueen, Anne Cutler:
Competition in the perception of spoken Japanese words. 114-117
- Makiko Sadakata, Lotte van der Zanden, Kaoru Sekiyama:
Influence of musical training on perception of L2 speech. 118-121
- Donald Derrick, Bryan Gick:
Full body aero-tactile integration in speech perception. 122-125
Prosody: Models
- Tomás Dubeda, Katalin Mády:
Nucleus position within the intonation phrase: a typological study of English, Czech and Hungarian. 126-129
- Yong-cheol Lee, Satoshi Nambu:
Focus-sensitive operator or focus inducer: always and only. 130-133
- Jiahong Yuan, Mark Liberman:
F0 declination in English and Mandarin broadcast news speech. 134-137
- Katrin Schweitzer, Michael Walsh, Bernd Möbius, Hinrich Schütze:
Frequency of occurrence effects on pitch accent realisation. 138-141
- César González Ferreras, Carlos Vivaracho-Pascual, David Escudero Mancebo, Valentín Cardeñoso-Payo:
On the automatic toBI accent type identification from data. 142-145
- Andrew Rosenberg:
AutoBI - a tool for automatic toBI annotation. 146-149
Speech Synthesis: Unit Selection and Others
- Volker Strom, Simon King:
A classifier-based target cost for unit selection speech synthesis trained on perceptual data. 150-153
- Wei Zhang, Xiaodong Cui:
Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech. 154-157
- Mitsuaki Isogai, Hideyuki Mizuno:
Speech database reduction method for corpus-based TTS system. 158-161
- Heng Lu, Zhen-Hua Ling, Si Wei, Li-Rong Dai, Ren-Hua Wang:
Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier. 162-165
- Hanna Silén, Elina Helander, Jani Nurminen, Konsta Koppinen, Moncef Gabbouj:
Using robust viterbi algorithm and HMM-modeling in unit selection TTS to replace units of poor quality. 166-169
- Yeon-Jun Kim, Marc C. Beutnagel:
Automatic detection of abnormal stress patterns in unit selection synthesis. 170-173
- Daniel Tihelka, Jirí Kala, Jindrich Matousek:
Enhancements of viterbi search for fast unit selection synthesis. 174-177
- Thomas Ewender, Beat Pfister:
Accurate pitch marking for prosodic modification of speech segments. 178-181
- Shifeng Pan, Meng Zhang, Jianhua Tao:
A novel hybrid approach for Mandarin speech synthesis. 182-185
- Josafá de Jesus Aguiar Pontes, Sadaoki Furui:
Modeling liaison in French by using decision trees. 186-189
- Jian Luan, Jian Li:
Improvement on plural unit selection and fusion. 190-193
- Alok Parlikar, Alan W. Black, Stephan Vogel:
Improving speech synthesis of machine translation output. 194-197
- Ghislain Putois, Jonathan Chevelu, Cédric Boidin:
Paraphrase generation to improve text-to-speech synthesis. 198-201
ASR: Search, Decoding and Confidence Measures I, II
- Chang Woo Han, Shin Jae Kang, Chul Min Lee, Nam Soo Kim:
Phone mismatch penalty matrices for two-stage keyword spotting via multi-pass phone recognizer. 202-205
- Petr Motlícek, Fabio Valente, Philip N. Garner:
English spoken term detection in multilingual recordings. 206-209
- Icksang Han, Chiyoun Park, Jeongmi Cho, Jeongsu Kim:
A hybrid approach to robust word lattice generation via acoustic-based word detection. 210-213
- Volker Steinbiss, Martin Sundermeyer, Hermann Ney:
Direct observation of pruning errors (DOPE): a search analysis tool. 214-217
- David Rybach, Michael Riley:
Direct construction of compact context-dependency transducers from data. 218-221
- Miroslav Novak:
Incremental composition of static decoding graphs with label pushing. 222-225
- Zhanlei Yang, Wenju Liu:
A novel path extension framework using steady segment detection for Mandarin speech recognition. 226-229
- Ralf Schlüter, Markus Nußbaum-Thom, Hermann Ney:
On the relation of Bayes risk, word error, and word posteriors in ASR. 230-233
- David Nolden, Hermann Ney, Ralf Schlüter:
Time conditioned search in automatic speech recognition reconsidered. 234-237
- Satoshi Kobashikawa, Taichi Asami, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi:
Efficient data selection for speech recognition based on prior confidence estimation using speech and context independent models. 238-241
- Atsunori Ogawa, Atsushi Nakamura:
A novel confidence measure based on marginalization of jointly estimated error cause probabilities. 242-245
- Julien Fayolle, Fabienne Moreau, Christian Raymond, Guillaume Gravier, Patrick Gros:
CRF-based combination of contextual features to improve a posteriori word-level confidence measures. 1942-1945
- Martin Wöllmer, Florian Eyben, Björn W. Schuller, Gerhard Rigoll:
Recognition of spontaneous conversational speech using long short-term memory phoneme predictions. 1946-1949
- Thomas Pellegrini, Isabel Trancoso:
Improving ASR error detection with non-decoder based features. 1950-1953
- Ladan Golipour, Douglas D. O'Shaughnessy:
Phoneme classification and lattice rescoring based on a k-NN approach. 1954-1957
- Jeff A. Bilmes, Hui Lin:
Online adaptive learning for speech recognition decoding. 1958-1961
- Takaaki Hori, Shinji Watanabe, Atsushi Nakamura:
Improvements of search error risk minimization in viterbi beam search for speech recognition. 1962-1965
Special-Purpose Speech Applications
- Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko:
Evaluation of a silent speech interface based on magnetic sensing. 246-249
- Rubén San Segundo, Verónica López-Ludeña, Raquel Martín, Syaheerah L. Lutfi, Javier Ferreiros, Ricardo de Córdoba, José Manuel Pardo:
Advanced speech communication system for deaf people. 250-253
- Sethserey Sam, Eric Castelli, Laurent Besacier:
Unsupervised acoustic model adaptation for multi-origin non native ASR. 254-257
- Dilek Hakkani-Tür, Dimitra Vergyri, Gökhan Tür:
Speech-based automated cognitive status assessment. 258-261
- Toru Imai, Shinichi Homma, Akio Kobayashi, Takahiro Oku, Shoei Sato:
Speech recognition with a seamlessly updated language model for real-time closed-captioning. 262-265
- Takuya Nishimoto, Takayuki Watanabe:
The comparison between the deletion-based methods and the mixing-based methods for audio CAPTCHA systems. 266-269
- Martine Adda-Decker, Lori Lamel, Natalie D. Snoeren:
Comparing mono- & multilingual acoustic seed models for a low e-resourced language: a case-study of luxembourgish. 270-273
- R. J. J. H. van Son, Irene Jacobi, Frans J. M. Hilgers:
Manipulating treacheoesophageal speech. 274-277
- David Imseng, Hervé Bourlard, Mathew Magimai-Doss:
Towards mixed language speech recognition systems. 278-281
- Etienne Barnard, Johan Schalkwyk, Charl Johannes van Heerden, Pedro J. Moreno:
Voice search for development. 282-285
- Gina-Anne Levow, Susan Duncan, Edward T. King:
Cross-cultural investigation of prosody in verbal feedback in interactional rapport. 286-289
- Mary Tai Knox, Gerald Friedland:
Multimodal speaker diarization using oriented optical flow histograms. 290-293
- Catherine Middag, Yvan Saeys, Jean-Pierre Martens:
Towards an ASR-free objective analysis of pathological speech. 294-297
Speech Analysis
- Keith W. Godin, John H. L. Hansen:
Session variability contrasts in the MARP corpus. 298-301
- Kazuhiro Kondo, Yusuke Takano:
Estimation of two-to-one forced selection intelligibility scores by speech recognizers using noise-adapted models. 302-305
- Thomas Schaaf, Florian Metze:
Analysis of gender normalization using MLP and VTLN features. 306-309
- Guillaume Aimetti, Roger K. Moore, Louis ten Bosch:
Discovering an optimal set of minimally contrasting acoustic speech units: a point of focus for whole-word pattern matching. 310-313
- Themos Stafylakis, Xavier Anguera:
Improvements to the equal-parameter BIC for speaker diarization. 314-317
- Nima Mesgarani, Samuel Thomas, Hynek Hermansky:
A multistream multiresolution framework for phoneme recognition. 318-321
- Giampiero Salvi, Fabio Tesser, Enrico Zovato, Piero Cosi:
Cluster analysis of differential spectral envelopes on emotional speech. 322-325
- Samuel R. Bowman, Karen Livescu:
Modeling pronunciation variation with context-dependent articulatory feature decision trees. 326-329
- Bhiksha Raj, Kevin W. Wilson, Alexander Krueger, Reinhold Haeb-Umbach:
Ungrounded independent non-negative factor analysis. 330-333
- John R. Hershey, Peder A. Olsen, Steven J. Rennie:
Signal interaction and the devil function. 334-337
Systems for LVCSR
- Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya Kawahara:
Semi-automated update of automatic transcription system for the Japanese national congress. 338-341
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Language model cross adaptation for LVCSR system combination. 342-345
- Shinji Watanabe, Takaaki Hori, Atsushi Nakamura:
Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data. 346-349
- Pavel Kveton, Miroslav Novak:
Accelerating hierarchical acoustic likelihood computation on graphics processors. 350-353
- Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno:
Search by voice in Mandarin Chinese. 354-357
- Thomas Hain, Lukás Burget, John Dines, Philip N. Garner, Asmaa El Hannani, Marijn Huijbregts, Martin Karafiát, Mike Lincoln, Vincent Wan:
The AMIDA 2009 meeting transcription system. 358-361
Speaker Characterization and Recognition I-IV
- William M. Campbell, Zahi N. Karam:
Simple and efficient speaker comparison using approximate KL divergence. 362-365
- Hanwu Sun, Bin Ma, Chien-Lin Huang, Trung Hieu Nguyen, Haizhou Li:
The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems. 366-369
- Chien-Lin Huang, Hanwu Sun, Bin Ma, Haizhou Li:
Speaker characterization using long-term and temporal information. 370-373
- Sergio Perez-Gomez, Daniel Ramos, Javier Gonzalez-Dominguez, Joaquin Gonzalez-Rodriguez:
Score-level compensation of extreme speech duration variability in speaker verification. 374-377
- Alberto Abad, Isabel Trancoso:
Speaker recognition experiments using connectionist transformation network features. 378-381
- Yun Lei, John H. L. Hansen:
Speaker recognition using supervised probabilistic principal component analysis. 382-385
- Benjamin Bigot, Julien Pinquier, Isabelle Ferrané, Régine André-Obrecht:
Looking for relevant features for speaker role recognition. 1057-1060
- Marcel Kockmann, Lukás Burget, Ondrej Glembek, Luciana Ferrer, Jan Cernocký:
Prosodic speaker verification using subspace multinomial models with intersession compensation. 1061-1064
- Eryu Wang, Kong-Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Li-Rong Dai:
The estimation and kernel metric of spectral correlation for text-independent speaker verification. 1065-1068
- Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen, Pasi Fränti:
Improving monaural speaker identification by double-talk detection. 1069-1072
- B. Avinash, S. Guruprasad, B. Yegnanarayana:
Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals. 1073-1076
- Qingsong Liu, Wei Huang, Dongxing Xu, Hongbin Cai, Beiqian Dai:
A fast implementation of factor analysis for speaker verification. 1077-1080
- Ce Zhang, Rong Zheng, Bo Xu:
An investigation into direct scoring methods without SVM training in speaker verification. 1437-1440
- Reda Jourani, Khalid Daoudi, Régine André-Obrecht, Driss Aboutajdine:
Large margin Gaussian mixture models for speaker identification. 1441-1444
- Rong Zheng, Bo Xu:
On the use of Gaussian component information in the generative likelihood ratio estimation for speaker verification. 1445-1448
- Man-Wai Mak, Wei Rao:
Acoustic vector resampling for GMMSVM-based speaker verification. 1449-1452
- Konstantin Biatov:
A fast speaker indexing using vector quantization and second order statistics with adaptive threshold computation. 1453-1456
- Gang Wang, Xiaojun Wu, Thomas Fang Zheng:
Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech. 1457-1460
- Claudio Garretón, Néstor Becerra Yoma:
On enhancing feature sequence filtering with filter-bank energy transformation in speaker verification with telephone speech. 1461-1464
- Donglai Zhu, Bin Ma, Kong-Aik Lee, Cheung-Chi Leung, Haizhou Li:
MAP estimation of subspace transform for speaker recognition. 1465-1468
- Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming:
A longest matching segment approach for text-independent speaker recognition. 1469-1472
- Ville Hautamäki, Tomi Kinnunen, Mohaddeseh Nosratighods, Kong-Aik Lee, Bin Ma, Haizhou Li:
Approaching human listener accuracy with modern speaker verification. 1473-1476
- Jouni Pohjalainen, Rahim Saeidi, Tomi Kinnunen, Paavo Alku:
Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions. 1477-1480
- Guoli Ye, Brian Mak:
The use of subvector quantization and discrete densities for fast GMM computation for speaker verification. 1481-1484
- Fred S. Richardson, Joseph P. Campbell:
Transcript-dependent speaker recognition using mixer 1 and 2. 2102-2105
- Thomas Drugman, Thierry Dutoit:
On the potential of glottal signatures for speaker recognition. 2106-2109
- R. Padmanabhan, Hema A. Murthy:
Acoustic feature diversity and speaker verification. 2110-2113
- Omid Dehzangi, Bin Ma, Engsiong Chng, Haizhou Li:
A discriminative performance metric for GMM-UBM speaker identification. 2114-2117
- Xavier Anguera, Jean-François Bonastre:
A novel speaker binary key derived from anchor models. 2118-2121
- Weiqiang Zhang, Yan Deng, Liang He, Jia Liu:
Variant time-frequency cepstral features for speaker recognition. 2122-2125
- Ning Wang, P. C. Ching, Tan Lee:
Exploitation of phase information for speaker recognition. 2126-2129
- Yanhua Long, Li-Rong Dai, Bin Ma, Wu Guo:
Effects of the phonological relevance in speaker verification. 2130-2133
- Gabriel Hernández Sierra, Jean-François Bonastre, Driss Matrouf, José R. Calvo:
Topological representation of speech for speaker recognition. 2134-2137
- Seyed Omid Sadjadi, John H. L. Hansen:
Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions. 2138-2141
- Xiang Zhang, Chuan Cao, Lin Yang, Hongbin Suo, Jianping Zhang, Yonghong Yan:
Speaker recognition using the resynthesized speech via spectrum modeling. 2142-2145
Source Separation
- Robert Peharz, Michael Stark, Franz Pernkopf, Yannis Stylianou:
A factorial sparse coder model for single channel source separation. 386-389
- Yasmina Benabderrahmane, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Oriented PCA method for blind speech separation of convolutive mixtures. 390-393
- Hsin-Lung Hsieh, Jen-Tzung Chien:
Online Gaussian process for nonstationary speech separation. 394-397
- Meng Yu, Wenye Ma, Jack Xin, Stanley J. Osher:
Convexity and fast speech extraction by split bregman method. 398-401
- Wenye Ma, Meng Yu, Jack Xin, Stanley J. Osher:
Reducing musical noise in blind source separation by time-domain sparse filters and split bregman method. 402-405
- John Woodruff, Rohit Prabhavalkar, Eric Fosler-Lussier, DeLiang Wang:
Combining monaural and binaural evidence for reverberant speech segregation. 406-409
Speech Synthesis: HMM-Based Speech Synthesis I, II
- Heiga Zen:
Speaker and language adaptive training for HMM-based polyglot speech synthesis. 410-413
- Kai Yu, Heiga Zen, François Mairesse, Steve J. Young:
Context adaptive training with factorized decision trees for HMM-based speech synthesis. 414-417
- Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev:
Roles of the average voice in speaker-adaptive HMM-based speech synthesis. 418-421
- Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong:
An HMM trajectory tiling (HTT) approach to high quality TTS. 422-425
- Yining Chen, Zhi-Jie Yan, Frank K. Soong:
A perceptual study of acceleration parameters in HMM-based TTS. 426-429
- Shuji Yokomizo, Takashi Nose, Takao Kobayashi:
Evaluation of prosodic contextual factors for HMM-based speech synthesis. 430-433
- Slava Shechtman, Alexander Sorin:
Sinusoidal model parameterization for HMM-based TTS system. 805-808
- Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Hisashi Kawai:
Improved training of excitation for HMM-based parametric speech synthesis. 809-812
- June Sig Sung, Doo Hwa Hong, Kyung Hwan Oh, Nam Soo Kim:
Excitation modeling based on waveform interpolation for HMM-based speech synthesis. 813-816
- Xin Zhuang, Yao Qian, Frank K. Soong, Yi-Jian Wu, Bo Zhang:
Formant-based frequency warping for improving speaker adaptation in HMM TTS. 817-820
- Hongwei Hu, Martin J. Russell:
Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis. 821-824
- Zhen-Hua Ling, Yu Hu, Li-Rong Dai:
Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis. 825-828
- Matt Shannon, William Byrne:
Autoregressive clustering for HMM speech synthesis. 829-832
- Nicholas Pilkington, Heiga Zen:
An implementation of decision tree-based context clustering on graphics processing units. 833-836
- Alexander Gutkin, Xavi Gonzalvo, Stefan Breuer, Paul Taylor:
Quantized HMMs for low footprint text-to-speech synthesis. 837-840
- Oliver Watts, Junichi Yamagishi, Simon King:
The role of higher-level linguistic features in HMM-based speech synthesis. 841-844
- Ayami Mase, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
HMM-based singing voice synthesis system using pitch-shifted pseudo training data. 845-848
- Jinfu Ni, Hisashi Kawai:
An unsupervised approach to creating web audio contents-based HMM voices. 849-852
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi:
Conversational spontaneous speech synthesis using average voice model. 853-856
Multi-Modal Signal Processing
- Jonas Hörnstein, José Santos-Victor:
Learning words and speech units through natural interactions. 434-437
- Qingju Liu, Wenwu Wang, Philip J. B. Jackson:
Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement. 438-441
- Hiroaki Kawashima, Yu Horii, Takashi Matsuyama:
Speech estimation in non-stationary noise environments using timing structures between mouth movements and sound signals. 442-445
- Lijuan Wang, Xiaojun Qian, Wei Han, Frank K. Soong:
Synthesizing photo-real talking head via trajectory-guided sample selection. 446-449
- Victoria M. Florescu, Lise Crevier-Buchman, Bruce Denby, Thomas Hueber, Antonia Colazo-Simon, Claire Pillot-Loiseau, Pierre Roussel-Ragot, Cédric Gendrot, Sophie Quattrocchi:
Silent vs vocalized articulation for a portable ultrasound-based silent speech interface. 450-453
- Gregor Hofer, Korin Richmond:
Comparison of HMM and TMDN methods for lip synchronisation. 454-457
Paralanguage
- Florian Schiel, Christian Heinrich, Veronika Neumeyer:
Rhythm and formant features for automatic alcohol detection. 458-461
- Irena Yanushevskaya, Christer Gobl, John Kane, Ailbhe Ní Chasaide:
An exploration of voice source correlates of focus. 462-465
- James D. Harnsberger, Rahul Shrivastav, W. S. Brown Jr.:
Modeling perceived vocal age in american English. 466-469
- Marie-José Caraty, Claude Montacié:
Multivariate analysis of vocal fatigue in continuous reading. 470-473
- Alexander Kain, Jan P. H. van Santen:
Frequency-domain delexicalization using surrogate vowels. 474-477
- Florian Metze, Anton Batliner, Florian Eyben, Tim Polzehl, Björn W. Schuller, Stefan Steidl:
Emotion recognition using imperfect speech recognition. 478-481
- Gang Liu, Yun Lei, John H. L. Hansen:
A novel feature extraction strategy for multi-stream robust emotion identification. 482-485
- Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger:
Setup for acoustic-visual speech synthesis by concatenating bimodal units. 486-489
- Bart Jochems, Martha A. Larson, Roeland Ordelman, Ronald Poppe, Khiet P. Truong:
Towards affective state modeling in narrative and conversational settings. 490-493
- Narichika Nomoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi:
Detection of anger emotion in dialog speech using prosody feature and temporal relation of utterances. 494-497
- Benjamin Roustan, Marion Dohen:
Gesture and speech coordination: the influence of the relationship between manual gesture and speech. 498-501
- Hynek Boril, Seyed Omid Sadjadi, Tristan Kleinschmidt, John H. L. Hansen:
Analysis and detection of cognitive load and frustration in drivers' speech. 502-505
- Akira Sasou, Yasuharu Hashimoto, Katsuhiko Sakaue:
Acoustic-based recognition of head gestures accompanying speech. 506-509
- Sandro Castronovo, Angela Mahr, Margarita Pentcheva, Christian A. Müller:
Multimodal dialog in the car: combining speech and turn-and-push dial to control comfort functions. 510-513
- Danil Korchagin, Philip N. Garner, Petr Motlícek:
Hands free audio analysis from home entertainment. 514-517
- Shaikh Mostafa Al Masum, Antonio Rui Ferreira Rebordão, Keikichi Hirose:
Affective story teller: a TTS system for emotional expressivity. 518-521
ASR: Speaker Adaptation, Robustness Against Reverberation
- Shweta Ghai, Rohit Sinha:
Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization. 522-525
- Bo Li, Khe Chai Sim:
Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems. 526-529
- Ravichander Vipperla, Steve Renals, Joe Frankel:
Augmentation of adaptation data. 530-533
- Lukás Machlica, Zbynek Zajíc, Ludek Müller:
Discriminative adaptation based on fast combination of DMAP and dfMLLR. 534-537
- Doddipatla Rama Sanand, Ralf Schlüter, Hermann Ney:
Revisiting VTLN using linear transformation on conventional MFCC. 538-541
- Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Speaker adaptation based on nonlinear spectral transform for speech recognition. 542-545
- Tetsuo Kosaka, Takashi Ito, Masaharu Katoh, Masaki Kohda:
Speaker adaptation based on system combination using speaker-class models. 546-549
- Yongwon Jeong, Young Rok Song, Hyung Soon Kim:
Speaker adaptation in transformation space using two-dimensional PCA. 550-553
- Jan Trmal, Jan Zelinka, Ludek Müller:
On speaker adaptive training of artificial neural networks. 554-557
- Yongjun He, Jiqing Han:
Model synthesis for band-limited speech recognition. 558-561
- Takahiro Fukumori, Masanori Morise, Takanobu Nishiura:
Performance estimation of reverberant speech recognition based on reverberant criteria RSR-dn with acoustic parameters. 562-565
- Armin Sehr, Christian Hofmann, Roland Maas, Walter Kellermann:
A novel approach for matched reverberant training of HMMs using data pairs. 566-569
- Hari Krishna Maganti, Marco Matassoni:
An auditory based modulation spectral feature for reverberant speech recognition. 570-573
- Martin Wolf, Climent Nadeu:
On the potential of channel selection for recognition of reverberated speech with multiple microphones. 574-577
- Randy Gomez, Tatsuya Kawahara:
An improved wavelet-based dereverberation for robust automatic speech recognition. 578-581
- Rico Petrick, Thomas Fehér, Masashi Unoki, Rüdiger Hoffmann:
Methods for robust speech recognition in reverberant environments: a comparison. 582-585
Language Learning, TTS, and Other Applications
- Masayuki Suzuki, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose:
Integration of multilayer regression analysis with structure-based pronunciation assessment. 586-589 - Joost van Doremalen, Catia Cucchiarini, Helmer Strik:
Using non-native error patterns to improve pronunciation verification. 590-593 - Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose:
Regularized-MLLR speaker adaptation for computer-assisted language learning system. 594-597 - Kuniaki Hirabayashi, Seiichi Nakagawa:
Automatic evaluation of English pronunciation by Japanese speakers using various acoustic features and pattern recognition techniques. 598-601 - Hsien-Cheng Liao, Jiang-Chun Chen, Sen-Chia Chang, Ying-Hua Guan, Chin-Hui Lee:
Decision tree based tone modeling with corrective feedbacks for automatic Mandarin tone assessment. 602-605 - Jingli Lu, Ruili Wang, Liyanage C. De Silva, Yang Gao, Jia Liu:
CASTLE: a computer-assisted stress teaching and learning environment for learners of English as a second language. 606-609 - Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:
Automatic reference independent evaluation of prosody quality using multiple knowledge fusions. 610-613 - Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat:
Landmark-based automated pronunciation error detection. 614-617 - Zhiwei Shuang, Shiyin Kang, Yong Qin, Li-Rong Dai, Lianhong Cai:
HMM based TTS for mixed language text. 618-621 - Hui Liang, John Dines:
An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation. 622-625 - Tatsuya Kawahara, Norihiro Katsumaru, Yuya Akita, Shinsuke Mori:
Classroom note-taking system for hearing impaired students using automatic speech recognition adapted to lectures. 626-629 - Paul R. Dixon, Sadaoki Furui:
Exploring web-browser based runtimes engines for creating ubiquitous speech interfaces. 630-632
Pitch and Glottal-Waveform Estimation and Modeling I, II
- Xuejing Sun, Sameer Gadre:
Efficient three-stage pitch estimation for packet loss concealment. 633-636 - Keiichi Funaki:
On evaluation of the f0 estimation based on time-varying complex speech analysis. 637-640 - Feng Huang, Tan Lee:
Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks. 641-644 - Tianyu T. Wang, Thomas F. Quatieri:
Multi-pitch estimation by a joint 2-d representation of pitch and pitch dynamics. 645-648 - Pirros Tsiakoulis, Alexandros Potamianos:
On the effect of fundamental frequency on amplitude and frequency modulation patterns in speech resonances. 649-652 - M. Shahidur Rahman, Tetsuya Shimamura:
Pitch determination using autocorrelation function in spectral domain. 653-656 - Thomas Drugman, Thierry Dutoit:
Chirp complex cepstrum-based decomposition for asynchronous glottal analysis. 657-660 - Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle:
Exploiting glottal formant parameters for glottal inverse filtering and parameterization. 661-664 - Nicolas Sturmel, Christophe d'Alessandro, Boris Doval:
Glottal parameters estimation on speech using the zeros of the z-transform. 665-668 - Sri Harish Reddy Mallidi, Kishore Prahallad, Suryakanth V. Gangashetty, B. Yegnanarayana:
Significance of pitch synchronous analysis for speaker recognition using AANN models. 669-672 - Gang Chen, Xue Feng, Yen-Liang Shue, Abeer Alwan:
On using voice source measures in automatic gender classification of children's speech. 673-676 - Wei Chu, Abeer Alwan:
SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech. 2590-2593 - Jung Ook Hong, Patrick J. Wolfe:
Robust and efficient pitch estimation using an iterative ARMA technique. 2594-2597 - Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Hidehisa Nagano, Kunio Kashino:
Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases. 2598-2601 - Martin Heckmann, Claudius Gläser, Frank Joublin, Kazuhiro Nakadai:
Applying geometric source separation for improved pitch extraction in human-robot interaction. 2602-2605 - John Kane, Mark Kane, Christer Gobl:
A spectral LF model based approach to voice source parameterisation. 2606-2609 - Thomas Drugman, Thierry Dutoit:
Glottal-based analysis of the lombard effect. 2610-2613
Open Vocabulary Spoken Document Retrieval (Special Session)
- Yoshiaki Itoh, Hiromitsu Nishizaki, Xinhui Hu, Hiroaki Nanjo, Tomoyosi Akiba, Tatsuya Kawahara, Seiichi Nakagawa, Tomoko Matsui, Yoichi Yamashita, Kiyoaki Aikawa:
Constructing Japanese test collections for spoken term detection. 677-680 - Satoshi Natori, Hiromitsu Nishizaki, Yoshihiro Sekiguchi:
Japanese spoken term detection using syllable transition network derived from multiple speech recognizers' outputs. 681-684 - Sha Meng, Weiqiang Zhang, Jia Liu:
Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression. 685-688 - Taisuke Kaneko, Tomoyosi Akiba:
Metric subspace indexing for fast spoken term detection. 689-692 - Chun-an Chan, Lin-Shan Lee:
Unsupervised spoken-term detection with spoken queries using segment-based dynamic time warping. 693-696 - Daniel Schneider, Timo Mertens, Martha A. Larson, Joachim Köhler:
Contextual verification for open vocabulary spoken term detection. 697-700 - Javier Tejedor, Doroteo T. Toledano, Miguel Bautista, Simon King, Dong Wang, José Colás:
Augmented set of features for confidence estimation in spoken term detection. 701-704 - Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Cluster-based language model for spoken document retrieval using NMF-based document clustering. 705-708
Robust ASR
- Rogier C. van Dalen, Mark J. F. Gales:
Asymptotically exact noise-corrupted speech likelihoods. 709-712 - Ramón Fernandez Astudillo, Reinhold Orglmeister:
A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation. 713-716 - Bhiksha Raj, Tuomas Virtanen, Sourish Chaudhuri, Rita Singh:
Non-negative matrix factorization based compensation of music for automatic speech recognition. 717-720 - Kris Demuynck, Xueru Zhang, Dirk Van Compernolle, Hugo Van hamme:
Feature versus model based noise robustness. 721-724 - Ji Hun Park, Seon Man Kim, Jae Sam Yoon, Hong Kook Kim, Sung Joo Lee, Yunkeun Lee:
SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment. 725-728 - Chanwoo Kim, Richard M. Stern, Kiwan Eom, Jaewon Lee:
Automatic selection of thresholds for signal separation algorithms based on interaural delay. 729-732
Language and Dialect Identification
- Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert:
Channel detectors for system fusion in the context of NIST LRE 2009. 733-736 - Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng:
Selecting phonotactic features for language recognition. 737-740 - Abualsoud Hanani, Michael J. Carey, Martin J. Russell:
Improved language recognition using mixture components statistics. 741-744 - Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez-Fuentes, Germán Bordel:
Using cross-decoder co-occurrences of phone n-grams in SVM-based phonotactic language recognition. 745-748 - Oscar Koller, Alberto Abad, Isabel Trancoso, Céu Viana:
Exploiting variety-dependent phones in portuguese variety identification applied to broadcast news transcription. 749-752 - Fadi Biadsy, Julia Hirschberg, Michael Collins:
Dialect recognition using a phone-GMM-supervector-based SVM kernel. 753-756
Technologies for Learning and Education
- Xiaojun Qian, Frank K. Soong, Helen M. Meng:
Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT). 757-760 - Liang-Yu Chen, Jyh-Shing Roger Jang:
Automatic pronunciation scoring using learning to rank and DP-based score segmentation. 761-764 - Wai Kit Lo, Shuang Zhang, Helen M. Meng:
Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system. 765-768 - Minh Duong, Jack Mostow:
Adapting a duration synthesis model to rate children's oral reading prosody. 769-772 - Su-Youn Yoon, Lei Chen, Klaus Zechner:
Predicting word accuracy for the automatic speech recognition of non-native speech. 773-776 - Taotao Zhu, Dengfeng Ke, Zhenbiao Chen, Bo Xu:
A new approach for automatic tone error detection in strong accented Mandarin based on dominant set. 777-780
Emotional Speech
- S. R. Mahadeva Prasanna, D. Govind:
Analysis of excitation source information in emotional speech. 781-784 - Dongrui Wu, Thomas D. Parsons, Shrikanth S. Narayanan:
Acoustic feature analysis in speech emotion primitives estimation. 785-788 - Lan-Ying Yeh, Tai-Shih Chi:
Spectro-temporal modulations for robust speech emotion recognition. 789-792 - Chi-Chun Lee, Matthew Black, Athanasios Katsamanis, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples. 793-796 - Emily Mower, Kyu Jeong Han, Sungbok Lee, Shrikanth S. Narayanan:
A cluster-profile representation of emotion using agglomerative hierarchical clustering. 797-800 - Björn W. Schuller, Laurence Devillers:
Incremental acoustic valence recognition: an inter-corpus perspective on features, matching, and performance in a gating paradigm. 801-804
New Paradigms in ASR I, II
- Xiaodong Wang, Kunihiko Owa, Makoto Shozakai:
Mandarin digit recognition assisted by selective tone distinction. 857-860 - Kazuhiko Abe, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Brazilian portuguese acoustic model training based on data borrowing from other language. 861-864 - Ngoc Thang Vu, Tim Schlippe, Franziska Kraus, Tanja Schultz:
Rapid bootstrapping of five eastern european languages using the rapid language adaptation toolkit. 865-868 - Houwei Cao, Tan Lee, P. C. Ching:
Cross-lingual speaker adaptation via Gaussian component mapping. 869-872 - Mohamed Elmahdy, Rainer Gruhn, Wolfgang Minker, Slim Abdennadher:
Cross-lingual acoustic modeling for dialectal Arabic speech recognition. 873-876 - Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Cross-lingual and multi-stream posterior features for low resource LVCSR systems. 877-880 - Shiva Sundaram, Jerome R. Bellegarda:
Latent perceptual mapping: a new acoustic modeling framework for speech recognition. 881-884 - Richard Dufour, Fethi Bougares, Yannick Estève, Paul Deléglise:
Unsupervised model adaptation on targeted speech segments for LVCSR system combination. 885-888 - Irene Ayllón Clemente, Martin Heckmann, Alexander Denecke, Britta Wrede, Christian Goerick:
Incremental word learning using large-margin discriminative training and variance floor estimation. 889-892 - Tuomas Virtanen, Jort F. Gemmeke, Antti Hurmalainen:
State-based labelling for a sparse representation of speech and its application to robust speech recognition. 893-896 - Mirko Hannemann, Stefan Kombrink, Martin Karafiát, Lukás Burget:
Similarity scoring for recognizing repeated out-of-vocabulary words. 897-900 - Dino Seppi, Dirk Van Compernolle:
Data pruning for template-based automatic speech recognition. 901-904 - Man-Hung Siu, Herbert Gish, Arthur Chan, William Belfield:
Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision. 2838-2841 - Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo:
An analysis of sparseness and regularization in exemplar-based methods for speech classification. 2842-2845 - Abdel-rahman Mohamed, Dong Yu, Li Deng:
Investigation of full-sequence training of deep belief networks for speech recognition. 2846-2849 - Yow-Bang Wang, Lin-Shan Lee:
Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram. 2850-2853 - Geoffrey Zweig, Patrick Nguyen, Jasha Droppo, Alex Acero:
Continuous speech recognition with a TF-IDF acoustic model. 2854-2857 - Geoffrey Zweig, Patrick Nguyen:
SCARF: a segmental conditional random field toolkit for speech recognition. 2858-2861
Speech Production: Various Approaches
- Akiko Amano-Kusumoto, John-Paul Hosom, Alexander Kain:
Speaking style dependency of formant targets. 905-908 - Tatsuya Kitamura:
Similarity of effects of emotions on the speech organ configuration with and without speaking. 909-912 - Daniel Bone, Samuel Kim, Sungbok Lee, Shrikanth S. Narayanan:
A study of intra-speaker and inter-speaker affective variability using electroglottograph and inverse filtered glottal waveforms. 913-916 - Ken-Ichi Sakakibara, Hiroshi Imagawa, Miwako Kimura, Hisayuki Yokonishi, Niro Tayama:
Modal analysis of vocal fold vibrations using laryngotopography. 917-920 - Martti Vainio, Matti Airas, Juhani Järvikivi, Paavo Alku:
Laryngeal voice quality in the expression of focus. 921-924 - Masako Fujimoto, Kikuo Maekawa, Seiya Funatsu:
Laryngeal characteristics during the production of geminate consonants. 925-928 - Julien Cisonni, Kazunori Nozaki, Annemie Van Hirtum, Shigeo Wada:
Numerical study of turbulent flow-induced sound production in presence of a tooth-shaped obstacle: towards sibilant [s] physical modeling. 929-932 - Iris Hanique, Barbara Schuppler, Mirjam Ernestus:
Morphological and predictability effects on schwa reduction: the case of dutch word-initial syllables. 933-936 - Samer Al Moubayed, Gopal Ananthakrishnan:
Acoustic-to-articulatory inversion based on local regression. 937-940 - Mirjam Broersma:
Korean lenis, fortis, and aspirated stops: effect of place of articulation on acoustic realization. 941-944 - Toru Nakashika, Ryuki Tachibana, Masafumi Nishimura, Tetsuya Takiguchi, Yasuo Ariki:
Speech synthesis by modeling harmonics structure with multiple function. 945-948 - Makoto Otani, Tatsuya Hirahara:
Physics of body-conducted silent speech - production, propagation and representation of non-audible murmur. 949-952
Speech Enhancement
- Subhojit Chakladar, Nam Soo Kim, Yu Gwang Jin, Tae Gyoon Kang:
Multichannel noise reduction using low order RTF estimate. 953-956 - Inho Lee, Jongsung Yoon, Yoonjae Lee, Hanseok Ko:
Reinforced blocking matrix with cross channel projection for speech enhancement. 957-960 - Ning Cheng, Wenju Liu, Lan Wang:
Masking property based microphone array post-filter design. 961-964 - Yusuke Sato, Tetsuya Hoya, Hovagim Bakardjian, Andrzej Cichocki:
Reduction of broadband noise in speech signals by multilinear subspace analysis. 965-968 - Jungpyo Hong, Seung Ho Han, Sangbae Jeong, Minsoo Hahn:
Novel probabilistic control of noise reduction for improved microphone array beamforming. 969-972 - Kai Li, Qiang Fu, Yonghong Yan:
Speech enhancement using improved generalized sidelobe canceller in frequency domain with multi-channel postfiltering. 973-976 - Jani Even, Carlos Toshinori Ishi, Hiroshi Saruwatari, Norihiro Hagita:
Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface. 977-980 - Ajay Srinivasamurthy, Thippur V. Sreenivas:
Multi-channel iterative dereverberation based on codebook constrained iterative multi-channel wiener filter. 981-984 - Anand Joseph Xavier Medabalimi, Sri Harish Reddy Mallidi, B. Yegnanarayana:
Speaker-dependent mapping of source and system features for enhancement of throat microphone speech. 985-988 - Jun Cai, Stefano Marini, Pierre Malarme, Francis Grenez, Jean Schoentgen:
An analytic modeling approach to enhancing throat microphone speech commands for keyword spotting. 989-992 - Stephen So, Kamil K. Wójcicki, Kuldip K. Paliwal:
Single-channel speech enhancement using kalman filtering in the modulation domain. 993-996 - Miao Yao, Weiqian Liang:
Integrated feedback and noise reduction algorithm in digital hearing aids via oscillation detection. 997-1000 - Charles Mercier, Roch Lefebvre:
A blind signal-to-noise ratio estimator for high noise speech recordings. 1001-1004
Fact and Replica of Speech Production (Special Session)
- Hiroshi Imagawa, Ken-Ichi Sakakibara, Isao T. Tokuda, Mamiko Otsuka, Niro Tayama:
Estimation of glottal area function using stereo-endoscopic high-speed digital imaging. 1005-1008 - Kazunori Nozaki, Youhei Ohnishi, Takashi Suda, Shigeo Wada, Shinji Shimojo:
Toward aero-acoustical analysis of the sibilant /s/: an oral cavity modeling. 1009-1012 - Kunitoshi Motoki:
Effects of wall impedance on transmission and attenuation of higher-order modes in vocal-tract model. 1013-1016 - Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube:
Articulatory synthesis and perception of plosive-vowel syllables with virtual consonant targets. 1017-1020 - Kotaro Fukui, Toshihiro Kusano, Yoshikazu Mukaeda, Yuto Suzuki, Atsuo Takanishi, Masaaki Honda:
Speech robot mimicking human articulatory motion. 1021-1024 - Takayuki Arai:
Mechanical vocal-tract models for speech dynamics. 1025-1028 - Michael C. Brady:
Prosodic timing analysis for articulatory re-synthesis using a bank of resonators with an adaptive oscillator. 1029-1032
ASR: Language Modeling
- Ahmad Emami, Stanley F. Chen, Abraham Ittycheriah, Hagen Soltau, Bing Zhao:
Decoding with shrinkage-based language models. 1033-1036 - Stanley F. Chen, Stephen M. Chu:
Enhanced word classing for model M. 1037-1040 - Junho Park, Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Improved neural network based language modelling and adaptation. 1041-1044 - Tomás Mikolov, Martin Karafiát, Lukás Burget, Jan Cernocký, Sanjeev Khudanpur:
Recurrent neural network based language model. 1045-1048 - Preethi Jyothi, Eric Fosler-Lussier:
Discriminative language modeling using simulated ASR errors. 1049-1052 - Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara:
Learning a language model from continuous speech. 1053-1056
Single-Channel Speech Enhancement
- Stephen So, Kuldip K. Paliwal:
Fast converging iterative kalman filtering for speech enhancement using long and overlapped tapered windows with large side lobe attenuation. 1081-1084 - Xuejing Sun, Kuan-Chieh Yen, Rogerio Guedes Alves:
Robust noise estimation using minimum correction with harmonicity control. 1085-1088 - Mahdi Triki:
New insights into subspace noise tracking. 1089-1092 - Mahdi Triki, Kees Janse:
Bias considerations for minimum subspace noise tracking. 1093-1096 - Ji Ming, Ramji Srinivasan, Danny Crookes:
A corpus-based approach to speech enhancement from nonstationary noise. 1097-1100 - Zhe Chen, You-Chi Cheng, Fuliang Yin, Chin-Hui Lee:
Bandwidth expansion of speech based on wavelet transform modulus maxima vector mapping. 1101-1104
Speech Synthesis: Miscellaneous Topics
- Kalu U. Ogbureke, Peter Cahill, Julie Carson-Berndsen:
Hidden Markov models with context-sensitive observations for grapheme-to-phoneme conversion. 1105-1108 - Brian Langner, Stephan Vogel, Alan W. Black:
Evaluating a dialog language generation system: comparing the mountain system to other NLG approaches. 1109-1112 - Wesley Mattheyses, Lukas Latacz, Werner Verhelst:
Active appearance models for photorealistic visual speech synthesis. 1113-1116 - Jerome R. Bellegarda:
Latent affective mapping: a novel framework for the data-driven analysis of emotion in text. 1117-1120 - Anna C. Janska, Robert A. J. Clark:
Native and non-native speaker judgements on the quality of synthesized speech. 1121-1124 - Dominic Espinosa, Michael White, Eric Fosler-Lussier, Chris Brew:
Machine learning for text selection with expressive unit-selection voices. 1125-1128
Prosody: Basics & Applications
- Alexei V. Ivanov, Giuseppe Riccardi, Sucheta Ghosh, Sara Tonelli, Evgeny A. Stepanov:
Acoustic correlates of meaning structure in conversational speech. 1129-1132 - Nicolas Obin, Xavier Rodet, Anne Lacheret:
HMM-based prosodic structure model using rich linguistic context. 1133-1136 - Charlotte Wollermann, Bernhard Schröder, Ulrich Schade:
Audiovisual congruence and pragmatic focus marking. 1137-1140 - Margaret Zellers, Michele Gubian, Brechtje Post:
Redescribing intonational categories with functional data analysis. 1141-1144 - Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:
Exploring goodness of prosody by diverse matching templates. 1145-1148 - Mickael Rouvier, Richard Dufour, Georges Linarès, Yannick Estève:
A language-identification inspired method for spontaneous speech detection. 1149-1152 - Gérard Bailly, Amélie Lelong:
Speech dominoes and phonetic convergence. 1153-1156 - Mátyás Brendel, Riccardo Zaccarelli, Laurence Devillers:
A quick sequential forward floating feature selection algorithm for emotion detection from speech. 1157-1160 - Géza Kiss, Jan P. H. van Santen:
Automated vocal emotion recognition using phoneme class specific features. 1161-1164 - Adrian Pass, Jianguo Zhang, Darryl Stewart:
Feature selection for pose invariant lip biometrics. 1165-1168 - Hussein Hussein, Rüdiger Hoffmann:
Signal-based accent and phrase marking using the fujisaki model. 1169-1172 - Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:
A study of interplay between articulatory movement and prosodic characteristics in emotional speech production. 1173-1176
ASR: Feature Extraction I, II
- Shang-wen Li, Liang-Che Sun, Lin-Shan Lee:
Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features. 1177-1180 - Suman V. Ravuri, Nelson Morgan:
Using spectro-temporal features to improve AFE feature extraction for ASR. 1181-1184 - Ibon Saratxaga, Inma Hernáez, Igor Odriozola, Eva Navas, Iker Luengo, Daniel Erro:
Using harmonic phase information to improve ASR rate. 1185-1188 - Kazumasa Yamamoto, Eiichi Sueyoshi, Seiichi Nakagawa:
Speech recognition using long-term phase information. 1189-1192 - Jan Zelinka, Jan Trmal, Ludek Müller:
Low-dimensional space transforms of posteriors in speech recognition. 1193-1196 - Christian Plahl, Ralf Schlüter, Hermann Ney:
Hierarchical bottle neck features for LVCSR. 1197-1200 - Frantisek Grézl, Martin Karafiát:
Hierarchical neural net architectures for feature extraction in ASR. 1201-1204 - Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:
Mutual information analysis for feature and sensor subset selection in surface electromyography based speech recognition. 1205-1208 - Bernd T. Meyer, Birger Kollmeier:
Learning from human errors: prediction of phoneme confusions based on modified ASR training. 1209-1212 - Bo Li, Khe Chai Sim:
Hidden logistic linear regression for support vector machine based phone verification. 2614-2617 - Tim Ng, Bing Zhang, Long Nguyen:
Jointly optimized discriminative features for speech recognition. 2618-2621 - Florian Müller, Alfred Mertins:
Invariant integration features combined with speaker-adaptation methods. 2622-2625 - Mark Raugas, Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:
Multi resolution discriminative models for subvocalic speech recognition. 2626-2629 - Fabio Valente, Mathew Magimai-Doss, Christian Plahl, Suman V. Ravuri, Wen Wang:
A comparative large scale study of MLP features for Mandarin ASR. 2630-2633 - Cong-Thanh Do, Dominique Pastor, Gaël Le Lan, André Goalic:
Recognizing cochlear implant-like spectrally reduced speech with HMM-based ASR: experiments with MFCCs and PLP coefficients. 2634-2637
Speech Perception: Cross Language and Age
- Kazuhiro Kondo, Takayuki Kanda, Yosuke Kobayashi, Hiroyuki Yagyu:
Speech intelligibility of diagonally localized speech with competing noise using bone-conduction headphones. 1213-1216 - Pierre L. Divenyi:
Masking of vowel-analog transitions by vowel-analog distracters. 1217-1220 - François Pellegrino, Emmanuel Ferragne, Fanny Meunier:
2010, a speech oddity: phonetic transcription of reversed speech. 1221-1224 - Hsin-Yi Lin, Janice Fon:
Perception on pitch reset at discourse boundaries. 1225-1228 - Marjorie Dole, Michel Hoen, Fanny Meunier:
Effect of spatial separation on speech-in-noise comprehension in dyslexic adults. 1229-1232 - Ellen Marklund, Francisco Lacerda, Anna Ericsson:
Speech categorization context effects in seven- to nine-month-old infants. 1233-1236 - Diane Kewley-Port, Larry E. Humes, Daniel Fogerty:
Changes in temporal processing of speech across the adult lifespan. 1237-1240 - Jared Bernstein, Jian Cheng, Masanori Suzuki:
Fluency and structural complexity as predictors of L2 oral proficiency. 1241-1244 - Marco van de Ven, Benjamin V. Tucker, Mirjam Ernestus:
Semantic facilitation in bilingual everyday speech comprehension. 1245-1248 - Bo-ren Hsieh, Ho-hsien Pan:
L2 experience and non-native vowel categorization of L1-Mandarin speakers. 1249-1252 - Mirjam Wester:
Cross-lingual talker discrimination. 1253-1256 - Takashi Otake:
Dajare is not the lowest form of wit. 1257-1260
SLP Systems
- Rafael Torres, Shota Takeuchi, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Comparison of methods for topic classification in a speech-oriented guidance system. 1261-1264 - Pere Comas, Jordi Turmo, Lluís Màrquez:
Using dependency parsing and machine learning for factoid question answering on spoken documents. 1265-1268 - Carolina Parada, Abhinav Sethy, Mark Dredze, Frederick Jelinek:
A spoken term detection framework for recovering out-of-vocabulary words using the web. 1269-1272 - Hung-yi Lee, Chia-Ping Chen, Ching-feng Yeh, Lin-Shan Lee:
Improved spoken term detection by discriminative training of acoustic models based on user relevance feedback. 1273-1276 - Sebastian Tschöpel, Daniel Schneider:
A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts. 1277-1280 - Noboru Kanedera, Tetsuo Funada, Seiichi Nakagawa:
Lecture subtopic retrieval by retrieval keyword expansion using subordinate concept. 1281-1284 - Hiroaki Nanjo, Yusuke Iyonaga, Takehiko Yoshimi:
Spoken document retrieval for oral presentations integrating global document similarities into local document similarities. 1285-1288 - Joseph Polifroni, Stephanie Seneff:
Combining word-based features, statistical language models, and parsing for named entity recognition. 1289-1292 - Azeddine Zidouni, Sophie Rosset, Hervé Glotin:
Efficient combined approach for named entity recognition in spoken language. 1293-1296 - Sree Harsha Yella, Vasudeva Varma, Kishore Prahallad:
Prominence based scoring of speech segments for automatic speech-to-speech summarization. 1297-1300 - Zihan Liu, Lei Xie, Wei Feng:
Maximum lexical cohesion for fine-grained news story segmentation. 1301-1304 - Xiaoxuan Wang, Lei Xie, Bin Ma, Engsiong Chng, Haizhou Li:
Phoneme lattice based texttiling towards multilingual story segmentation. 1305-1308
Quality of Experiencing Speech Services (Special Session)
- Anton Schlesinger, Marinus M. Boone:
The characterization of the relative information content by spectral features for the objective intelligibility assessment of nonlinearly processed speech. 1309-1312 - Marcel Wältermann, Alexander Raake, Sebastian Möller:
Analytical assessment and distance modeling of speech transmission quality. 1313-1316 - Nicolas Côté, Vincent Koehl, Valérie Gautier-Turbin, Alexander Raake, Sebastian Möller:
An intrusive super-wideband speech quality model: DIAL. 1317-1320 - Sebastian Egger, Raimund Schatz, Stefan Scherer:
It takes two to tango - assessing the impact of delay on conversational interactivity on perceived speech quality. 1321-1324 - Sebastian Möller, Florian Hinterleitner, Tiago H. Falk, Tim Polzehl:
Comparison of approaches for instrumentally predicting the quality of text-to-speech systems. 1325-1328 - Imre Kiss, Joseph Polifroni, Chao Wang, Ghinwa F. Choueiter, Mike Phillips:
A hybrid architecture for mobile voice user interfaces. 1329-1332 - Markku Turunen, Jaakko Hakulinen, Tomi Heimonen:
Assessment of spoken and multimodal applications: lessons learned from laboratory and field studies. 1333-1336 - Klaus-Peter Engelbrecht, Hamed Ketabdar, Sebastian Möller:
Improving cross database prediction of dialogue quality using mixture of experts. 1337-1340
Language Processing
- Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot:
Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations. 1365-1368 - Saturnino Luz, Jing Su:
The relevance of timing, pauses and overlaps in dialogues: detecting topic changes in scenario based meetings. 1369-1372 - Richard Dufour, Benoît Favre:
Semi-supervised part-of-speech tagging in speech applications. 1373-1376 - Frédéric Tantini, Christophe Cerisara, Claire Gardent:
Memory-based active learning for French broadcast news. 1377-1380 - Dan Gillick:
Can conversational word usage be used to predict speaker demographics?. 1381-1384 - Chao-Hong Liu, Chung-Hsien Wu:
Prosodic word-based error correction in speech recognition using prosodic word expansion and contextual information. 1385-1388
Speech and Audio Segmentation
- Sarah Hoffmann, Beat Pfister:
Fully automatic segmentation for prosodic speech corpora. 1389-1392 - Vahid Khanagha, Khalid Daoudi, Oriol Pont, Hussein M. Yahia:
A novel text-independent phonetic segmentation algorithm based on the microcanonical multiscale formalism. 1393-1396 - You-Yu Lin, Yih-Ru Wang, Yuan-Fu Liao:
Phone boundary detection using sample-based acoustic parameters. 1397-1400 - Utpala Musti, Asterios Toutios, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger:
HMM-based automatic visual speech segmentation using facial data. 1401-1404 - David Wang, Robert Vogt, Sridha Sridharan:
Bayes factor based speaker segmentation for speaker diarization. 1405-1408 - Qiang Huang, Stephen J. Cox:
Using high-level information to detect key audio events in a tennis game. 1409-1412
Prosody: Analysis
- Catherine Lai:
What do you mean, you're uncertain?: the interpretation of cue words and rising intonation in dialogue. 1413-1416 - Yi-Fen Liu, Shu-Chuan Tseng, Jyh-Shing Roger Jang, C.-H. Alvin Chen:
Coping imbalanced prosodic unit boundary detection with linguistically-motivated prosodic features. 1417-1420 - Zhigang Chen, Guoping Hu, Wei Jiang:
Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction. 1421-1424 - Yujia Li, Tan Lee:
Perception-based automatic approximation of F0 contours in Cantonese speech. 1425-1428 - Raul Fernandez, Bhuvana Ramabhadran:
Discriminative training and unsupervised adaptation for labeling prosodic events with limited training data. 1429-1432 - Erin Cvejic, Jeesun Kim, Chris Davis, Guillaume Gibert:
Prosody for the eyes: quantifying visual prosody using guided principal component analysis. 1433-1436
Systems for LVCSR and Rich Transcription
- Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen:
Parallel lexical-tree based LVCSR on multi-core processors. 1485-1488 - Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer:
Exploring recognition network representations for efficient speech inference on highly parallel platforms. 1489-1492 - Diamantino Caseiro:
WFST compression for automatic speech recognition. 1493-1496 - Ivan Bulyko:
Speech recognizer optimization under speed constraints. 1497-1500 - Florian Metze, Roger Hsiao, Qin Jin, Udhyakumar Nallasamy, Tanja Schultz:
The 2010 CMU GALE speech-to-text system. 1501-1504 - Tin Lay Nwe, Hanwu Sun, Bin Ma, Haizhou Li:
Speaker diarization in meeting audio for single distant microphone. 1505-1508 - Fernando Batista, Helena Moniz, Isabel Trancoso, Hugo Meinedo, Ana Isabel Mata, Nuno J. Mamede:
Extending the punctuation module for European Portuguese. 1509-1512 - Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Utilizing a noisy-channel approach for Korean LVCSR. 1513-1516 - Markus Nußbaum-Thom, Simon Wiesler, Martin Sundermeyer, Christian Plahl, Stefan Hahn, Ralf Schlüter, Hermann Ney:
The RWTH 2009 quaero ASR evaluation system for English and German. 1517-1520
Phonetics
- Benjamin Munson, Renata Solum:
When is indexical information about speech activated? Evidence from a cross-modal priming experiment. 1521-1524 - Benjamin Munson:
The influence of actual and perceived sexual orientation on diadochokinetic rate in women and men. 1525-1528 - Kristine M. Yu:
Laryngealization and features for Chinese tonal recognition. 1529-1532 - Viet Son Nguyen, Eric Castelli, René Carré:
Production and perception of Vietnamese short vowels in V1V2 context. 1533-1536 - Gertraud Fenk-Oczlon, August Fenk:
Measuring basic tempo across languages and some implications for speech rhythm. 1537-1540 - Yukari Hirata, Shigeaki Amano:
Durational structure of Japanese single/geminate stops in three- and four-mora words spoken at varied rates. 1541-1544 - Shin-ichiro Sano, Tomohiko Ooigawa:
Distribution and trichotomic realization of voiced velars in Japanese - an experimental study. 1545-1548 - Jagoda Sieczkowska, Bernd Möbius, Grzegorz Dogil:
Specification in context - devoicing processes in Polish, French, American English and German sonorants. 1549-1552 - Kuniko Y. Nielsen:
Phonetic imitation of Japanese vowel devoicing. 1553-1556 - Mary Stevens, John Hajek:
Post-aspiration in standard Italian: some first cross-regional acoustic evidence. 1557-1560 - Mirko Grimaldi, Andrea Calabrese, Francesco Sigona, Luigia Garrapa, Bianca Sisinni:
Articulatory grounding of Southern Salentino harmony processes. 1561-1564 - Yuuki Tanida, Taiji Ueno, Satoru Saito, Matthew A. Lambon Ralph:
Effects of accent typicality and phonotactic frequency on nonword immediate serial recall performance in Japanese. 1565-1567 - Osamu Fujimura:
How abstract is phonetics? 1568-1571
Speech Production: Vocal Tract Modeling and Imaging
- Adam C. Lammert, Michael I. Proctor, Shrikanth S. Narayanan:
Data-driven analysis of realtime vocal tract MRI using correlated image regions. 1572-1575 - Michael I. Proctor, Daniel Bone, Athanasios Katsamanis, Shrikanth S. Narayanan:
Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis. 1576-1579 - Yoon-Chul Kim, Shrikanth S. Narayanan, Krishna S. Nayak:
Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order. 1580-1583 - Erik Bresch, Athanasios Katsamanis, Louis Goldstein, Shrikanth S. Narayanan:
Statistical multi-stream modeling of real-time MRI articulatory speech data. 1584-1587 - Gopal Ananthakrishnan, Pierre Badin, Julián Andrés Valdés Vargas, Olov Engwall:
Predicting unseen articulations from multi-speaker articulatory models. 1588-1591 - Chao Qin, Miguel Á. Carreira-Perpiñán:
Estimating missing data sequences in x-ray microbeam recordings. 1592-1595 - Chao Qin, Miguel Á. Carreira-Perpiñán, Mohsen Farhadloo:
Adaptation of a tongue shape model by local feature transformations. 1596-1599 - Sungbok Lee, Shrikanth S. Narayanan:
Vocal tract contour analysis of emotional speech by the functional data curve representation. 1600-1603 - Adam C. Lammert, Louis Goldstein, Khalil Iskarous:
Locally-weighted regression for estimating the forward kinematics of a geometric vocal tract model. 1604-1607 - Michael Reimer, Frank Rudzicz:
Identifying articulatory goals from kinematic data using principal differential analysis. 1608-1611 - Zuheng Ming, Denis Beautemps, Gang Feng, Sébastien Schmerber:
Estimation of speech lip features from discrete cosinus transform. 1612-1615 - Farzaneh Ahmadi, Ian Vince McLoughlin, Hamid R. Sharifzadeh:
Autoregressive modelling for linear prediction of ultrasonic speech. 1616-1619
Speech Intelligibility Enhancement for All Ages, Health Conditions and Environments (Special Session)
- Takayuki Arai, Nao Hodoshima:
Enhanced speech yielding higher intelligibility for all listeners and environments. 1620-1623 - Seyed Omid Sadjadi, Sanjay A. Patil, John H. L. Hansen:
Quality conversion of non-acoustic signals for facilitating human-to-human speech communication under harsh acoustic conditions. 1624-1627 - Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion. 1628-1631 - Gibak Kim, Philipos C. Loizou:
A new binary mask based on noise constraints for improved speech intelligibility. 1632-1635 - Yan Tang, Martin Cooke:
Energy reallocation strategies for speech enhancement in known noise conditions. 1636-1639 - Jing Chen, Thomas Baer, Brian C. J. Moore:
Effects of enhancement of spectral changes on speech quality and subjective speech intelligibility. 1640-1643
ASR: Acoustic Model Adaptation
- Catherine Breslin, K. K. Chin, Mark J. F. Gales, Kate M. Knill, Haitian Xu:
Prior information for rapid speaker adaptation. 1644-1647 - Jonas Lööf, Ralf Schlüter, Hermann Ney:
Discriminative adaptation for log-linear acoustic models. 1648-1651 - Dimitra Vergyri, Lori Lamel, Jean-Luc Gauvain:
Automatic speech recognition of multiple accented English data. 1652-1655 - Jinyu Li, Yu Tsao, Chin-Hui Lee:
Shrinkage model adaptation in automatic speech recognition. 1656-1659 - Jinyu Li, Dong Yu, Yifan Gong, Li Deng:
Unscented transform with online distortion estimation for HMM adaptation. 1660-1663 - Michael L. Seltzer, Alex Acero:
HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition. 1664-1667
SLP Systems for Information Extraction/Retrieval
- Dong Wang, Simon King, Nicholas W. D. Evans, Raphaël Troncy:
CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection. 1668-1671 - Chia-Ping Chen, Hung-yi Lee, Ching-feng Yeh, Lin-Shan Lee:
Improved spoken term detection by feature space pseudo-relevance feedback. 1672-1675 - Aren Jansen, Kenneth Church, Hynek Hermansky:
Towards spoken term discovery at scale with zero resources. 1676-1679 - Evandro B. Gouvêa, Tony Ezzat:
Vocabulary independent spoken query: a case for subword units. 1680-1683 - Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen:
Extractive speech summarization - from the view of decision theory. 1684-1687 - Gabriel Murray, Giuseppe Carenini, Raymond T. Ng:
The impact of ASR on abstractive vs. extractive meeting summaries. 1688-1691
Speech Representation
- Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, Geoffrey E. Hinton:
Binary coding of speech spectrograms using a deep auto-encoder. 1692-1695 - Juhan Nam, Gautham J. Mysore, Joachim Ganseman, Kyogu Lee, Jonathan S. Abel:
A super-resolution spectrogram using coupled PLCA. 1696-1699 - Georgios Tzedakis, Yannis Pantazis, Olivier Rosec, Yannis Stylianou:
Fast least-squares solution for sinusoidal, harmonic and quasi-harmonic models. 1700-1703 - Afsaneh Asaei, Hervé Bourlard, Philip N. Garner:
Sparse component analysis for speech recognition in multi-speaker environment. 1704-1707 - Trond Skogstad, Torbjørn Svendsen:
Intra-frame variability as a predictor of frame classifiability. 1708-1711 - Tetsuya Shimamura, Ngoc Dinh Nguyen:
Autocorrelation and double autocorrelation based spectral representations for a noisy word recognition system. 1712-1715
Voice Conversion
- Elina Helander, Hanna Silén, Joaquín Míguez, Moncef Gabbouj:
Maximum a posteriori voice conversion using sequential Monte Carlo methods. 1716-1719 - Pierre Lanchantin, Xavier Rodet:
Dynamic model selection for spectral voice conversion. 1720-1723 - Takashi Nose, Takao Kobayashi:
Speaker-independent HMM-based voice conversion using quantized fundamental frequency. 1724-1727 - Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu:
Probabilistic integration of joint density model and speaker model for voice conversion. 1728-1731 - Zhizheng Wu, Tomi Kinnunen, Engsiong Chng, Haizhou Li:
Text-independent F0 transformation with non-parallel data for voice conversion. 1732-1735 - Xiaodan Zhuang, Lijuan Wang, Frank K. Soong, Mark Hasegawa-Johnson:
A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion. 1736-1739
Prosody: Language-Specific Models
- Anastasia Karlsson, David House, Jan-Olof Svantesson, Damrong Tayanin:
Influence of lexical tones on intonation in Kammu. 1740-1743 - Satoshi Nambu, Yong-cheol Lee:
Phonetic realization of second occurrence focus in Japanese. 1744-1747 - Jianjing Kuang:
Prosodic grouping and relative clause disambiguation in Mandarin. 1748-1751 - Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoying Xu:
Text-based unstressed syllable prediction in Mandarin. 1752-1755 - Tomás Dubeda:
"flat pitch accents" in Czech. 1756-1759 - Tomás Dubeda:
Positional variability of pitch accents in Czech. 1760-1763 - Shyamal Kr. Das Mandal, Arup Saha, Tulika Basu, Keikichi Hirose, Hiroya Fujisaki:
Modeling of sentence-medial pauses in Bangla readout speech: occurrence and duration. 1764-1767 - Adrian Leemann, Lucy Zuberbühler:
Declarative sentence intonation patterns in 8 Swiss German dialects. 1768-1771 - Je Hun Jeon, Yang Liu:
Syllable-level prominence detection with acoustic evidence. 1772-1775 - Sankalan Prasad, Kalika Bali:
Prosody cues for classification of the discourse particle "hã" in Hindi. 1776-1779 - Yuan Jia, Aijun Li:
Interaction of syntax-marked focus and wh-question induced focus in standard Chinese. 1780-1783 - Samer Al Moubayed, Jonas Beskow:
Prominence detection in Swedish using syllable correlates. 1784-1787 - Na Zhi, Daniel Hirst, Pier Marco Bertinetto:
Automatic analysis of the intonation of a tone language: applying the Momel algorithm to spontaneous standard Chinese (Beijing). 1788-1791 - Raymond W. M. Ng, Cheung-Chi Leung, Ville Hautamäki, Tan Lee, Bin Ma, Haizhou Li:
Towards long-range prosodic attribute modeling for language recognition. 1792-1795 - Robert Schubert, Oliver Jokisch, Diane Hirschfeld:
A modified parameterization of the Fujisaki model. 1796-1799
ASR: Language Modeling and Speech Understanding I
- Saeedeh Momtazi, Friedrich Faubel, Dietrich Klakow:
Within and across sentence boundary language model. 1800-1803 - Ruhi Sarikaya, Stanley F. Chen, Abhinav Sethy, Bhuvana Ramabhadran:
Impact of word classing on shrinkage-based language models. 1804-1807 - Stanislas Oger, Vladimir Popescu, Georges Linarès:
Combination of probabilistic and possibilistic language models. 1808-1811 - Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk:
On-demand language model interpolation for mobile speech input. 1812-1815 - Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz:
Text normalization based on statistical machine translation and internet user support. 1816-1819 - Tanel Alumäe, Mikko Kurimo:
Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension. 1820-1823 - Christian Gillot, Christophe Cerisara, David Langlois, Jean Paul Haton:
Similar n-gram language model. 1824-1827 - Markpong Jongtaveesataporn, Sadaoki Furui:
Topic and style-adapted language modeling for Thai broadcast news ASR. 1828-1831 - Ahmad Emami, Hong-Kwang Jeff Kuo, Imed Zitouni, Lidia Mangu:
Augmented context features for Arabic speech recognition. 1832-1835 - Lucía Ortega, Isabel Galiano, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra:
A statistical segment-based approach for spoken language understanding. 1836-1839 - Benjamin Lecouteux, Raphaël Rubino, Georges Linarès:
Improving back-off models with bag of words and hollow-grams. 2418-2421 - Ciprian Chelba, Thorsten Brants, Will Neveitt, Peng Xu:
Study on interaction between entropy pruning and Kneser-Ney smoothing. 2422-2425 - Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki, Koichi Shinoda:
Dynamic language model adaptation using keyword category classification. 2426-2429 - Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa:
Integration of cache-based model and topic dependent class model with soft clustering and soft voting. 2430-2433 - Frédéric Duvert, Renato de Mori:
Conditional models for detecting lambda-functions in a spoken language understanding system. 2434-2437 - Md. Akmal Haidar, Douglas D. O'Shaughnessy:
Novel weighting scheme for unsupervised language model adaptation using latent Dirichlet allocation. 2438-2441 - Qun Feng Tan, Kartik Audhkhasi, Panayiotis G. Georgiou, Emil Ettelaie, Shrikanth S. Narayanan:
Automatic speech recognition system channel modeling. 2442-2445 - Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Round-robin discrimination model for reranking ASR hypotheses. 2446-2449 - Hasim Sak, Murat Saraclar, Tunga Güngör:
On-the-fly lattice rescoring for real-time automatic speech recognition. 2450-2453
First and Second Language Acquisition
- Angela Cooper, Yue Wang:
Cantonese tone word learning by tone and non-tone language speakers. 1840-1843 - Anne Cutler, Janise Shanley:
Validation of a training method for L2 continuous-speech segmentation. 1844-1847 - Jiahong Yuan:
Linguistic rhythm in foreign accent. 1848-1849 - Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka:
The effect of a word embedded in a sentence and speaking rate variation on the perceptual training of geminate and singleton consonant distinction. 1850-1853 - Chiharu Tsurutani:
Foreign accent matters most when timing is wrong. 1854-1857 - Hyejin Hong, Jina Kim, Minhwa Chung:
Effects of Korean learners' consonant cluster reduction strategies on English speech recognition performance. 1858-1861 - June S. Levitt, William F. Katz:
The effects of EMA-based augmented visual feedback on the English speakers' acquisition of the Japanese flap: a perceptual study. 1862-1865 - Hinako Masuda, Takayuki Arai:
Perception of voiceless fricatives by Japanese listeners of advanced and intermediate level English proficiency. 1866-1869 - Lya Meister, Einar Meister:
Perception of Estonian vowel categories by native and non-native speakers. 1870-1873 - Qin Shi, Kun Li, Shilei Zhang, Stephen M. Chu, Ji Xiao, Zhijian Ou:
Spoken English assessment system for non-native speakers using acoustic and prosodic features. 1874-1877 - Elena E. Lyakso, Olga V. Frolova, Anna V. Kurazhova, Julia S. Gaikova:
Russian infants and children's sounds and speech corpuses for language acquisition studies. 1878-1881 - Julia Monnin, Hélène Loevenbruck:
Language-specific influence on phoneme development: French and Drehu data. 1882-1885 - Jeffrey J. Holliday, Mary E. Beckman, Chanelle Mays:
Did you say susi or shushi? measuring the emergence of robust fricative contrasts in English- and Japanese-acquiring children. 1886-1889
Spoken Language Resources, Systems and Evaluation I, II
- Josef R. Novak, Paul R. Dixon, Sadaoki Furui:
An empirical comparison of the t3, juicer, HDecode and sphinx3 decoders. 1890-1893 - Philip N. Garner, John Dines:
Tracter: a lightweight dataflow framework. 1894-1897 - Marelie H. Davel, Febe de Wet:
Verifying pronunciation dictionaries using conflict analysis. 1898-1901 - Brandon Roy, Soroush Vosoughi, Deb Roy:
Automatic estimation of transcription accuracy and difficulty. 1902-1905 - Benjamin Lambert, Rita Singh, Bhiksha Raj:
Creating a linguistic plausibility dataset with non-expert annotators. 1906-1909 - Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition. 1910-1913 - Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, Mike LeBeau:
Building transcribed speech corpora quickly and cheaply for many languages. 1914-1917 - Heidi Christensen, Jon Barker, Ning Ma, Phil D. Green:
The CHiME corpus: a resource and a challenge for computational hearing in multisource environments. 1918-1921 - Wen Cao, Dongning Wang, Jinsong Zhang, Ziyu Xiong:
Developing a Chinese L2 speech database of Japanese learners with narrow-phonetic labels for computer assisted pronunciation training. 1922-1925 - Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa:
How children acquire situation understanding skills?: a developmental analysis utilizing multimodal speech behavior corpus. 1926-1929 - Ina Wechsung, Stefan Schaffer, Robert Schleicher, Anja Naumann, Sebastian Möller:
The influence of expertise and efficiency on modality selection strategies and perceived mental effort. 1930-1933 - Christine Kühnel, Benjamin Weiss, Sebastian Möller:
Parameters describing multimodal interaction - definitions and three usage scenarios. 1934-1937 - Alexander Zgorzelski, Alexander Schmitt, Tobias Heinroth, Wolfgang Minker:
Repair strategies on trial: which error recovery do users like best?. 1938-1941 - Maryam Kamvar, Doug Beeferman:
Say what? why users choose to speak their web queries. 1966-1969 - Jonathan Teutenberg, Catherine Inez Watson:
The effect of audience familiarity on the perception of modified accent. 1970-1973 - Korin Richmond, Robert A. J. Clark, Susan Fitt:
On generating combilex pronunciations via morphological analysis. 1974-1977 - Florian Gödde, Sebastian Möller:
Say it as you mean it - analyzing free user comments in the VOICE awards corpus. 1978-1981 - Viktor Rozgic, Bo Xiao, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
A new multichannel multi modal dyadic interaction database. 1982-1985 - Dau-Cheng Lyu, Tien Ping Tan, Engsiong Chng, Haizhou Li:
SEAME: a Mandarin-English code-switching speech corpus in South-East Asia. 1986-1989
Speech Production: Analysis
- Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, Ricardo Gutierrez-Osuna:
Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database. 1990-1993 - Vikram Ramanarayanan, Dani Byrd, Louis Goldstein, Shrikanth S. Narayanan:
Investigating articulatory setting - pauses, ready position, and rest - using real-time MRI. 1994-1997 - Chao Qin, Miguel Á. Carreira-Perpiñán:
Articulatory inversion of American English /ɹ/ by conditional density modes. 1998-2001 - Atef Ben Youssef, Pierre Badin, Gérard Bailly:
Can tongue be recovered from face? the answer of data-driven statistical models. 2002-2005 - Francisco Torreira, Mirjam Ernestus:
Phrase-medial vowel devoicing in spontaneous French. 2006-2009 - Chierh Cheng, Yi Xu, Michele Gubian:
Exploring the mechanism of tonal contraction in Taiwan Mandarin. 2010-2013
Paralanguage Cognition
- Benjamin Weiss, Felix Burkhardt:
Voice attributes affecting likability perception. 2014-2017 - Kristiina Jokinen, Kazuaki Harada, Masafumi Nishida, Seiichi Yamamoto:
Turn-alignment using eye-gaze and speech in conversational interaction. 2018-2021 - Tet Fei Yap, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi:
An investigation of formant frequencies for cognitive load classification. 2022-2025 - Martijn Goudbeek, Mirjam Broersma:
Language specific effects of emotion on phoneme duration. 2026-2029 - Matthew Black, Athanasios Katsamanis, Chi-Chun Lee, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Automatic classification of married couples' behavior using audio features. 2030-2033 - Gideon Kowadlo, Patrick Ye, Ingrid Zukerman:
Influence of gestural salience on the interpretation of spoken requests. 2034-2037
Robust ASR Against Noise
- Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein:
Robust word recognition using articulatory trajectories and gestures. 2038-2041 - Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino:
Performance estimation of noisy speech recognition considering recognition task complexity. 2042-2045 - Friedrich Faubel, Dietrich Klakow:
Estimating noise from noisy speech features with a monte carlo variant of the expectation maximization algorithm. 2046-2049 - Satoshi Tamura, Eriko Hishikawa, Wataru Taguchi, Satoru Hayamizu:
Template-based spectral estimation using microphone array for speech recognition. 2050-2053 - Aleem Mushtaq, Yu Tsao, Chin-Hui Lee:
A particle filter feature compensation approach to robust speech recognition. 2054-2057 - Chanwoo Kim, Richard M. Stern:
Nonlinear enhancement of onset for robust speech recognition. 2058-2061 - Shirin Badiezadegan, Richard C. Rose:
Mask estimation in non-stationary noise environments for missing feature based robust speech recognition. 2062-2065 - Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson:
Robust automatic speech recognition with decoder oriented ideal binary mask estimation. 2066-2069 - Gökhan Ince, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun-ichi Imura:
A robust speech recognition system against the ego noise of a robot. 2070-2073 - Kuo-Hao Wu, Chia-Ping Chen:
Empirical mode decomposition for noise-robust automatic speech recognition. 2074-2077 - Wooil Kim, Jun-Won Suh, John H. L. Hansen:
An effective feature compensation scheme tightly matched with speech recognizer employing SVM-based GMM generation. 2078-2081 - Jort F. Gemmeke, Tuomas Virtanen:
Artificial and online acquired noise dictionaries for noise robust ASR. 2082-2085 - Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Voice activity detection based on conditional random fields using multiple features. 2086-2089 - Yong Zhao, Biing-Hwang Juang:
A comparative study of noise estimation algorithms for VTS-based robust speech recognition. 2090-2093 - Frank Seide, Pei Zhao:
On using missing-feature theory with cepstral features - approximations to the multivariate integral. 2094-2097 - Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:
Using a DBN to integrate sparse classification and GMM-based ASR. 2098-2101
Voice Conversion and Speech Synthesis
- Axel Röbel:
Shape-invariant speech transformation with the phase vocoder. 2146-2149 - Kayoko Yanagisawa, Mark A. Huckvale:
A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity. 2150-2153 - Esther Klabbers, Alexander Kain, Jan P. H. van Santen:
Evaluation of speaker mimic technology for personalizing SGD voices. 2154-2157 - Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Adaptive voice-quality control based on one-to-many eigenvoice conversion. 2158-2161 - Fernando Villavicencio, Jordi Bonada:
Applying voice conversion to concatenative singing-voice synthesis. 2162-2165 - Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu:
Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model. 2166-2169 - Ming Lei, Yi-Jian Wu, Frank K. Soong, Zhen-Hua Ling, Li-Rong Dai:
A hierarchical F0 modeling method for HMM-based speech synthesis. 2170-2173 - Javier Latorre, Mark J. F. Gales, Heiga Zen:
Training a parametric-based logF0 model with the minimum generation error criterion. 2174-2177 - Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu:
Improving Mandarin segmental duration prediction with automatically extracted syntax features. 2178-2181 - Daniel R. van Niekerk, Etienne Barnard:
An intonation model for TTS in Sepedi. 2182-2185 - Michael Pucher, Dietmar Schabus, Junichi Yamagishi:
Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners. 2186-2189 - Gabriel Webster, Sacha Krstulovic, Kate M. Knill:
A comparison of pronunciation modeling approaches for HMM-TTS. 2190-2193 - Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:
HMM-based text-to-articulatory-movement prediction and analysis of critical articulators. 2194-2197
Detection, Classification, and Segmentation
- Jiaxing Ye, Takumi Kobayashi, Tetsuya Higuchi:
Audio-based sports highlight detection by Fourier local auto-correlations. 2198-2201 - Hynek Boril, Abhijeet Sangwan, Taufiq Hasan, John H. L. Hansen:
Automatic excitement-level detection for sports highlights generation. 2202-2205 - Jörg-Hendrik Bach, Jörn Anemüller:
Detecting novel objects in acoustic scenes through classifier incongruence. 2206-2209 - Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:
A multidomain approach for automatic home environmental sound classification. 2210-2213 - Patrick Cardinal, Vishwa Gupta, Gilles Boulianne:
Content-based advertisement detection. 2214-2217 - Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:
Identification of abnormal audio events based on probabilistic novelty detection. 2218-2221 - Norbert Braunschweiler, Mark J. F. Gales, Sabine Buchholz:
Lightly supervised recognition for automatic alignment of large coherent speech recordings. 2222-2225 - Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman:
Incremental diarization of telephone conversations. 2226-2229 - Srikanth Cherla, V. Ramasubramanian:
Audio analytics by template modeling and 1-pass DP based decoding. 2230-2233 - Mariusz Ziólko, Jakub Galka, Bartosz Ziólko, Tomasz Drwiega:
Perceptual wavelet decomposition for speech segmentation. 2234-2237 - Venkatesh Keri, Kishore Prahallad:
A comparative study of constrained and unconstrained approaches for segmentation of speech signal. 2238-2241 - Morgan Sonderegger, Joseph Keshet:
Automatic discriminative measurement of voice onset time. 2242-2245 - Yi Ren Leng, Tran Huy Dat, Norihide Kitaoka, Haizhou Li:
Selective gammatone filterbank feature for robust sound event recognition. 2246-2249
Compressive Sensing for Speech and Language Processing (Special Session)
- Allen Y. Yang, Zihan Zhou, Yi Ma, Shankar Sastry:
Towards a robust face recognition system using compressive sensing. 2250-2253 - Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Abhinav Sethy:
Sparse representation features for speech recognition. 2254-2257 - Abhinav Sethy, Tara N. Sainath, Bhuvana Ramabhadran, Dimitri Kanevsky:
Data selection for language modeling using sparse representations. 2258-2261 - Jort F. Gemmeke, Ulpu Remes, Kalle J. Palomäki:
Observation uncertainty measures for sparse imputation. 2262-2265 - Tara N. Sainath, Sameer Maskey, Dimitri Kanevsky, Bhuvana Ramabhadran, David Nahamoo, Julia Hirschberg:
Sparse representations for text categorization. 2266-2269 - Garimella S. V. S. Sivaram, Sriram Ganapathy, Hynek Hermansky:
Sparse auto-associative neural networks: theory and application to speech recognition. 2270-2273
ASR: Lexical and Pronunciation Modeling
- Chi Hu, Xiaodan Zhuang, Mark Hasegawa-Johnson:
FSM-based pronunciation modeling using articulatory phonological code. 2274-2277 - Denis Jouvet, Dominique Fohr, Irina Illina:
Detailed pronunciation variant modeling for speech transcription. 2278-2281 - Line Adde, Bert Réveil, Jean-Pierre Martens, Torbjørn Svendsen:
A minimum classification error approach to pronunciation variation modeling of non-native proper names. 2282-2285 - Antoine Laurent, Sylvain Meignier, Téva Merlin, Paul Deléglise:
Acoustics-based phonetic transcription method for proper nouns. 2286-2289 - Tim Schlippe, Sebastian Ochs, Tanja Schultz:
Wiktionary as a source for automatic pronunciation extraction. 2290-2293 - Ibrahim Badr, Ian McGraw, James R. Glass:
Learning new word pronunciations from spoken examples. 2294-2297
Speaker Recognition and Diarization
- I-Fan Chen, Shih-Sian Cheng, Hsin-Min Wang:
Phonetic subspace mixture model for speaker diarization. 2298-2301 - Martin Zelenák, Carlos Segura, Javier Hernando:
Overlap detection for speaker diarization by fusing spectral and spatial features. 2302-2305 - Alfred Dielmann, Giulia Garau, Hervé Bourlard:
Floor holder detection and end of speaker turn prediction in meetings. 2306-2309 - Carlos Vaquero, Alfonso Ortega, Jesús Antonio Villalba López, Antonio Miguel, Eduardo Lleida:
Confidence measures for speaker segmentation and their relation to speaker verification. 2310-2313 - Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre:
Decoupling session variability modelling and speaker characterisation. 2314-2317 - Cheung-Chi Leung, Donglai Zhu, Kong-Aik Lee, Bin Ma, Haizhou Li:
Incorporating MAP estimation and covariance transform for SVM based speaker recognition. 2318-2321
Speech and Audio Classification
- Stéphane Rossignol, Olivier Pietquin:
Single-speaker/multi-speaker co-channel speech classification. 2322-2325 - Oriol Vinyals, Gerald Friedland, Nelson Morgan:
Discriminative training for hierarchical clustering in speaker diarization. 2326-2329 - Jürgen T. Geiger, Frank Wallhoff, Gerhard Rigoll:
GMM-UBM based open-set online speaker diarization. 2330-2333 - Ladan Golipour, Douglas D. O'Shaughnessy:
A segment-based non-parametric approach for monophone recognition. 2334-2337 - Taras Butko, Climent Nadeu:
A fast one-pass-training feature selection technique for GMM-based acoustic event detection with audio-visual data. 2338-2341 - Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition. 2342-2345
Emotion Recognition
- Ling He, Margaret Lech, Nicholas B. Allen:
On the importance of glottal flow spectral energy for the recognition of emotions in speech. 2346-2349 - Laurence Devillers, Christophe Vaudable, Clément Chastagnol:
Real-life emotion-related states detection in call centers: a cross-corpora study. 2350-2353 - Ali Hassan, Robert I. Damper:
Multi-class and hierarchical SVMs for emotion recognition. 2354-2357 - David Philippou-Hübner, Bogdan Vlasenko, Tobias Grosser, Andreas Wendemuth:
Determining optimal features for emotion recognition from speech by applying an evolutionary algorithm. 2358-2361 - Martin Wöllmer, Angeliki Metallinou, Florian Eyben, Björn W. Schuller, Shrikanth S. Narayanan:
Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling. 2362-2365 - Kartik Audhkhasi, Shrikanth S. Narayanan:
Data-dependent evaluator modeling and its application to emotional valence classification from speech. 2366-2369
Speech Coding, Modeling, and Transmission
- Zhanyu Ma, Arne Leijon:
Modelling speech line spectral frequencies with dirichlet mixture models. 2370-2373 - Zhanyu Ma, Arne Leijon:
PDF-optimized LSF vector quantization based on beta mixture models. 2374-2377 - José Enrique García Laínez, Alfonso Ortega, Antonio Miguel, Eduardo Lleida:
Non-linear predictive vector quantization of feature vectors for distributed speech recognition. 2378-2381 - Lasse Laaksonen, Mikko Tammi, Vladimir Malenovsky, Tommy Vaillancourt, Mi Suk Lee, Tomofumi Yamanashi, Masahiro Oshikiri, Claude Lamblin, Balázs Kövesi, Lei Miao, Deming Zhang, Jon Gibbs, Holly Francois:
Superwideband extension of G.718 and G.729.1 speech codecs. 2382-2385 - José L. Carmona, Angel M. Gomez, Antonio M. Peinado, José L. Pérez-Córdoba, José A. González:
A multipulse FEC scheme based on amplitude estimation for CELP codecs over packet networks. 2386-2389 - Anssi Rämö, Henri Toukomaa:
Voice quality evaluation of recent open source codecs. 2390-2393 - Bengt J. Borgström, Per Henrik Borgström, Abeer Alwan:
Efficient HMM-based estimation of missing features, with applications to packet loss concealment. 2394-2397 - Xiaoqiang Xiao, Robert M. Nickel:
Speech inventory based discriminative training for joint speech enhancement and low-rate speech coding. 2398-2401 - Qipeng Gong, Peter Kabal:
Quality-based playout buffering with FEC for conversational VoIP. 2402-2405 - Masatsune Tamura, Takehiko Kagoshima, Masami Akamine:
Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding. 2406-2409 - Sundar Harshavardhan, Chandra Sekhar Seelamantula, Thippur V. Sreenivas:
A multimodal density function estimation approach to formant tracking. 2410-2413 - Heikki Rasilo, Unto K. Laine, Okko Johannes Räsänen:
Estimation studies of vocal tract shape trajectory using a variable length and lossy Kelly-Lochbaum model. 2414-2417
Speech Perception: Processing and Intelligibility
- Serajul Haque, Roberto Togneri:
A feature extraction method for automatic speech recognition based on the cochlear nucleus. 2454-2457 - Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky:
A phoneme recognition framework based on auditory spectro-temporal receptive fields. 2458-2461 - Amy V. Beeston, Guy J. Brown:
Perceptual compensation for effects of reverberation in speech identification: a computer model based on auditory efferent processing. 2462-2465 - Barbara Schuppler, Mirjam Ernestus, Wim A. van Dommelen, Jacques C. Koreman:
Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties. 2466-2469 - Matthew Robertson, Guy J. Brown, Wendy Lecluyse, Manasa Panda, Christine M. Tan:
A speech-in-noise test based on spoken digits: comparison of normal and impaired listeners using a computer model. 2470-2473 - Takayuki Kagomiya, Seiji Nakagawa:
Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of paralinguistic information: a comparison with cochlear implant simulator. 2474-2477 - Tim Jürgens, Stefan Fredelake, Ralf M. Meyer, Birger Kollmeier, Thomas Brand:
Challenging the speech intelligibility index: macroscopic vs. microscopic prediction of sentence recognition in normal and hearing-impaired listeners. 2478-2481 - Verena N. Uslar, Thomas Brand, Mirko Hanke, Rebecca Carroll, Esther Ruigendijk, Cornelia Hamann, Birger Kollmeier:
Does sentence complexity interfere with intelligibility in noise? Evaluation of the Oldenburg linguistically and audiologically controlled sentence test (OLACS). 2482-2485 - Juan-Pablo Ramirez, Hamed Ketabdar, Alexander Raake:
Intelligibility predictions for speech against fluctuating masker. 2486-2489 - Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano:
An effect of formant amplitude in vowel perception. 2490-2493 - Christopher I. Petkov, Benjamin Wilson:
Functional imaging of brain regions sensitive to communication sounds in primates. 2494-2497
Spoken Language Understanding and Spoken Language Translation I, II
- Ye-Yi Wang:
Strategies for statistical spoken language understanding with small amount of data - an empirical study. 2498-2501 - Bassam Jabaian, Laurent Besacier, Fabrice Lefèvre:
Investigating multiple approaches for SLU portability to a new language. 2502-2505 - Anja Austermann, Seiji Yamada, Kotaro Funakoshi, Mikio Nakano:
Learning naturally spoken commands for a robot. 2506-2509 - Amparo Albalate, Aparna Suchindranath, David Suendermann, Wolfgang Minker:
A semi-supervised cluster-and-label approach for utterance classification. 2510-2513 - Silvia Quarteroni, Giuseppe Riccardi:
Classifying dialog acts in human-human and human-machine spoken conversations. 2514-2517 - Fei Liu, Yang Liu:
Exploring speaker characteristics for meeting summarization. 2518-2521 - Shasha Xie, Hui Lin, Yang Liu:
Semi-supervised extractive speech summarization via co-training algorithm. 2522-2525 - Asli Celikyilmaz, Dilek Hakkani-Tür:
Extractive summarization using a latent variable model. 2526-2529 - Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Hierarchical classification for speech-to-speech translation. 2530-2533 - Matthias Paulik, Alex Waibel:
Rapid development of speech translation using consecutive interpretation. 2534-2537 - Sameer Maskey, Steven J. Rennie, Bowen Zhou:
Combining many alignments for speech to speech translation. 2538-2541 - Pierre Gotab, Géraldine Damnati, Frédéric Béchet, Lionel Delphin-Poulat:
Online SLU model adaptation with a partial oracle. 2862-2865 - Om Deshmukh, Harish Doddala, Ashish Verma, Karthik Visweswariah:
Role of language models in spoken fluency evaluation. 2866-2869 - Sibel Yaman, Dilek Hakkani-Tür, Gökhan Tür:
Social role discovery from spoken language using dynamic Bayesian networks. 2870-2873 - Michelle Hewlett Sanchez, Gökhan Tür, Luciana Ferrer, Dilek Hakkani-Tür:
Domain adaptation and compensation for emotion detection. 2874-2877 - Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan:
Phrase alignment confidence for statistical machine translation. 2878-2881 - Ian R. Lane, Alex Waibel:
Named-entity projection and data-driven morphological decomposition for field maintainable speech-to-speech translation systems. 2882-2885
Social Signals in Speech (Special Session)
- Paul M. Brunet, Marcela Charfuelan, Roderick Cowie, Marc Schröder, Hastings Donnan, Ellen Douglas-Cowie:
Detecting politeness and efficiency in a cooperative social interaction. 2542-2545 - Nick Campbell, Stefan Scherer:
Comparing measures of synchrony and alignment in dialogue speech timing with respect to turn-taking activity. 2546-2549 - Emina Kurtic, Guy J. Brown, Bill Wells:
Resources for turn competition in overlap in multi-party conversations: speech rate, pausing and duration. 2550-2553 - Khiet P. Truong, Dirk Heylen:
Disambiguating the functions of conversational sounds with prosody: the case of 'yeah'. 2554-2557 - Marcela Charfuelan, Marc Schröder, Ingmar Steiner:
Prosody and voice quality of vocal social signals: the case of dominance in scenario meetings. 2558-2561 - Daniel Neiberg, Joakim Gustafson:
The prosody of Swedish conversational grunts. 2562-2565
Physiology and Pathology of Spoken Language
- Christophe Mertens, Francis Grenez, Lise Crevier-Buchman, Jean Schoentgen:
Reliable tracking based on speech sample salience of vocal cycle length perturbations. 2566-2569 - Hideki Kasuya, Hajime Yoshida, Satoshi Ebihara, Hiroki Mori:
Longitudinal changes of selected voice source parameters. 2570-2573 - Ali Alpan, Jean Schoentgen, Youri Maryn, Francis Grenez:
Automatic perceptual categorization of disordered connected speech. 2574-2577 - Heejin Kim, Panying Rong, Torrey M. Loucks, Mark Hasegawa-Johnson:
Kinematic analysis of tongue movement control in spastic dysarthria. 2578-2581 - Irene Jacobi, Lisette van der Molen, Maya van Rossum, Frans J. M. Hilgers:
Pre- and short-term posttreatment vocal functioning in patients with advanced head and neck cancer treated with concomitant chemoradiotherapy. 2582-2585 - Joan K. Y. Ma, Rüdiger Hoffmann:
Acoustic analysis of intonation in Parkinson's disease. 2586-2589
Speaker Diarization
- Carlos Vaquero, Oriol Vinyals, Gerald Friedland:
A hybrid approach to online speaker diarization. 2638-2641 - Simon Bozonnet, Nicholas W. D. Evans, Xavier Anguera, Oriol Vinyals, Gerald Friedland, Corinne Fredouille:
System output combination for improved speaker diarization. 2642-2645 - Simon Bozonnet, Nicholas W. D. Evans, Corinne Fredouille, Dong Wang, Raphaël Troncy:
An integrated top-down/bottom-up approach to speaker diarization. 2646-2649 - Deepu Vijayasenan, Fabio Valente, Hervé Bourlard:
Advances in fast multistream diarization based on the information bottleneck framework. 2650-2653 - Giulia Garau, Alfred Dielmann, Hervé Bourlard:
Audio-visual synchronisation for speaker diarisation. 2654-2657 - Kyu Jeong Han, Shrikanth S. Narayanan:
An improved cluster model selection method for agglomerative hierarchical speaker clustering using incremental Gaussian mixture models. 2658-2661 - Nigel G. Ward, Olac Fuentes, Alejandro Vega:
Dialog prediction for a general model of turn-taking. 2662-2665 - Tobias Herbig, Franz Gerl, Wolfgang Minker:
Speaker tracking in an unsupervised speech controlled system. 2666-2669 - Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo:
MultiBIC: an improved speaker segmentation technique for TV shows. 2670-2673
Multi-Modal ASR, Including Audio-Visual ASR
- John-Paul Hosom, Tom Jakobs, Allen Baker, Susan Fager:
Automatic speech recognition for assistive writing in speech supplemented word prediction. 2674-2677 - Alexey Karpov, Andrey Ronzhin, Konstantin Markov, Milos Zelezný:
Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition. 2678-2681 - Louis H. Terry, Karen Livescu, Janet B. Pierrehumbert, Aggelos K. Katsaggelos:
Audio-visual anticipatory coarticulation modeling by human and machine. 2682-2685 - Matthias Janke, Michael Wand, Tanja Schultz:
Impact of lack of acoustic feedback in EMG-based silent speech recognition. 2686-2689 - Chong-Jia Ni, Wenju Liu, Bo Xu:
Using prosody to improve Mandarin automatic speech recognition. 2690-2693 - Satoshi Tamura, Masato Ishikawa, Takashi Hashiba, Shin'ichi Takeuchi, Satoru Hayamizu:
A robust audio-visual speech recognition using audio-visual voice activity detection. 2694-2697 - Dorothea Kolossa, Jike Chong, Steffen Zeiler, Kurt Keutzer:
Efficient manycore CHMM speech recognition for audiovisual and multistream data. 2698-2701 - Takami Yoshida, Kazuhiro Nakadai:
Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots. 2702-2705 - Panikos Heracleous, Norihiro Hagita:
Non-audible murmur recognition based on fusion of audio and visual streams. 2706-2709
Speaker and Language Recognition
- Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel:
Improved n-gram phonotactic models for language recognition. 2710-2713 - Sirinoot Boonsuk, Donglai Zhu, Bin Ma, Atiwong Suchato, Proadpran Punyabukkana, Nattanun Thatphithakkul, Chai Wutiwiwatchai:
A study of term weighting in phonotactic approach to spoken language recognition. 2714-2717 - Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee:
Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition. 2718-2721 - David Imseng, Mathew Magimai-Doss, Hervé Bourlard:
Hierarchical multilayer perceptron based language identification. 2722-2725 - Alvin F. Martin, Craig S. Greenberg:
The NIST 2010 speaker recognition evaluation. 2726-2729 - Shih-Sian Cheng, I-Fan Chen, Hsin-Min Wang:
Bayesian speaker recognition using Gaussian mixture model and Laplace approximation. 2730-2733 - Tomi Kinnunen, Rahim Saeidi, Johan Sandberg, Maria Hansson-Sandsten:
What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering. 2734-2737 - Achintya Kumar Sarkar, Srinivasan Umesh:
Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework. 2738-2741 - Zahi N. Karam, William M. Campbell:
Graph-embedding for speaker recognition. 2742-2745 - Chang Huai You, Haizhou Li, Kong-Aik Lee:
A hybrid modeling strategy for GMM-SVM speaker recognition with adaptive relevance factor. 2746-2749 - Sundar Harshavardhan, Thippur V. Sreenivas:
Robust mixture modeling using t-distribution: application to speaker ID. 2750-2753 - Chi-Sang Jung, Kyu Jeong Han, Hyunson Seo, Shrikanth S. Narayanan, Hong-Goo Kang:
A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification. 2754-2757
Source Localization and Separation
- Kohei Hayashida, Masanori Morise, Takanobu Nishiura:
Near field sound source localization based on cross-power spectrum phase analysis with multiple microphones. 2758-2761 - Jinho Choi, Chang D. Yoo:
A maximum a posteriori sound source localization in reverberant and noisy conditions. 2762-2765 - Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto:
Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model. 2766-2769 - Duc Thanh Chau, Junfeng Li, Masato Akagi:
A DOA estimation algorithm based on equalization-cancellation theory. 2770-2773 - Tania Habib, Harald Romsdorfer:
Concurrent speaker localization using multi-band position-pitch (M-PoPi) algorithm with spectro-temporal pre-processing. 2774-2777 - Ji-Hyun Song, Kyu-Ho Lee, Yun-Sik Park, Sang-Ick Kang, Joon-Hyuk Chang:
On using Gaussian mixture model for double-talk detection in acoustic echo suppression. 2778-2781 - Cemil Demir, A. Taylan Cemgil, Murat Saraclar:
Catalog-based single-channel speech-music separation. 2782-2785 - Ke Hu, DeLiang Wang:
Unvoiced speech segregation based on CASA and spectral subtraction. 2786-2789 - Ke Hu, DeLiang Wang:
Unsupervised sequential organization for cochannel speech separation. 2790-2793
INTERSPEECH 2010 Paralinguistic Challenge (Special Session)
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Felix Burkhardt, Laurence Devillers, Christian A. Müller, Shrikanth S. Narayanan:
The INTERSPEECH 2010 paralinguistic challenge. 2794-2797 - Florian Lingenfelser, Johannes Wagner, Thurid Vogt, Jonghwa Kim, Elisabeth André:
Age and gender classification from speech using decision level fusion and ensemble based techniques. 2798-2801 - Je Hun Jeon, Rui Xia, Yang Liu:
Level of interest sensing in spoken dialog using multi-level fusion of acoustic and lexical evidence. 2802-2805 - Phuoc Nguyen, Trung Le, Dat Tran, Xu Huang, Dharmendra Sharma:
Fuzzy support vector machines for age and gender classification. 2806-2809 - Rok Gajsek, Janez Zibert, Tadej Justin, Vitomir Struc, Bostjan Vesnicer, France Mihelic:
Gender and affect recognition based on GMM and GMM-UBM modeling with relevance MAP estimation. 2810-2813 - Royi Porat, Dan Lange, Yaniv Zigel:
Age recognition based on speech signals using weights supervector. 2814-2817 - Hugo Meinedo, Isabel Trancoso:
Age and gender classification using fusion of acoustic and prosodic features. 2818-2821 - Marcel Kockmann, Lukás Burget, Jan Cernocký:
Brno University of Technology system for INTERSPEECH 2010 paralinguistic challenge. 2822-2825 - Ming Li, Chi-Sang Jung, Kyu Jeong Han:
Combining five acoustic level modeling methods for automatic speaker age and gender recognition. 2826-2829 - Tobias Bocklet, Georg Stemmer, Viktor Zeißler, Elmar Nöth:
Age and gender recognition based on multiple systems - early vs. late fusion. 2830-2833 - Michael Feld, Felix Burkhardt, Christian A. Müller:
Automatic speaker age and gender recognition in the car for tailoring dialog and mobile services. 2834-2837
Signal Processing for Music and Song
- Kiyoaki Aikawa, Junko Uenuma, Tomoko Akitake:
Acoustic correlates of voice quality improvement by voice training. 2886-2889 - Minghui Dong, Paul Y. Chan, Ling Cen, Haizhou Li, Jason Teo, Ping Jen Kua:
Phonetic segmentation of singing voice using MIDI and parallel speech. 2890-2893 - Keijiro Saino, Makoto Tachibana, Hideki Kenmochi:
A singing style modeling system for singing voice synthesizers. 2894-2897 - Jingzhou Yang, Jia Liu, Weiqiang Zhang:
A fast query by humming system based on notes. 2898-2901 - Seokhwan Jo, Sihyun Joo, Chang D. Yoo:
Melody pitch estimation based on range estimation and candidate extraction using harmonic structure model. 2902-2905 - Jihoon Park, Kwang-Ki Kim, Jeongil Seo, Minsoo Hahn:
Modified spatial audio object coding scheme with harmonic extraction and elimination structure for interactive audio service. 2906-2909
Modeling First Language Acquisition
- Christina Bergmann, Michele Gubian, Lou Boves:
Modelling the effect of speaker familiarity and noise on infant word recognition. 2910-2913 - Kouki Miyazawa, Hideaki Kikuchi, Reiko Mazuka:
Unsupervised learning of vowels from continuous speech based on self-organized phoneme acquisition model. 2914-2917 - Andrew R. Plummer, Mary E. Beckman, Mikhail Belkin, Eric Fosler-Lussier, Benjamin Munson:
Learning speaker normalization using semisupervised manifold alignment. 2918-2921 - Okko Johannes Räsänen:
Fully unsupervised word learning from continuous speech using transitional probabilities of atomic acoustic events. 2922-2925 - Louis ten Bosch, Lou Boves:
Language acquisition and cross-modal associations: computational simulation of the result of infant studies. 2926-2929 - Maarten Versteegh, Louis ten Bosch, Lou Boves:
Active word learning under uncertain input conditions. 2930-2933
Discourse and Dialogue
- Rémi Lavalley, Chloé Clavel, Patrice Bellot, Marc El-Bèze:
Combining text categorization and dialog modeling for speaker role identification on call center conversations. 3062-3065 - Akira Nakamura, Satoru Hayamizu:
Topic-dependent n-gram models based on optimization of context lengths in LDA. 3066-3069 - Nicolas Obin, Volker Dellwo, Anne Lacheret, Xavier Rodet:
Expectations for discourse genre identification: a prosodic study. 3070-3073 - Ramón Granell, Stephen G. Pulman, Carlos D. Martínez-Hinarejos, José-Miguel Benedí:
Dialogue act tagging and segmentation with a single perceptron. 3074-3077 - Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa:
Improving the readability of class lecture ASR results using a confusion network. 3078-3081
Voice Activity and Turn Detection
- Sang-Kyun Kim, Jae-Hun Choi, Sang-Ick Kang, Ji-Hyun Song, Joon-Hyuk Chang:
Toward detecting voice activity employing soft decision in second-order conditional MAP. 3082-3085 - Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Voice activity detection in a regularized reproducing kernel Hilbert space. 3086-3089 - Ji Wu, Xiao-Lei Zhang, Wei Li:
A new VAD framework using statistical model and human knowledge based empirical rule. 3090-3093 - Mark C. Huggins, Brett Y. Smolenski, Aaron D. Lawson:
Adaptive high accuracy approaches to speech activity detection in noisy and hostile audio environments. 3094-3097 - Prasanta Kumar Ghosh, Andreas Tsiartas, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Robust voice activity detection in stereo recording with crosstalk. 3098-3101 - Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani:
Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization. 3102-3105 - Bowon Lee, Debargha Muhkerjee:
Spectral entropy-based voice activity detector for videoconferencing systems. 3106-3109 - David Dean, Sridha Sridharan, Robert Vogt, Michael Mason:
The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. 3110-3113 - Tao Yu, John H. L. Hansen:
A Bayesian approach to voice activity detection using multiple statistical models and discriminative training. 3114-3117 - Houman Ghaemmaghami, Brendan Baker, Robert Vogt, Sridha Sridharan:
Noise robust voice activity detection using features extracted from the time-domain autocorrelation function. 3118-3121 - Tasuku Oonishi, Koji Iwano, Sadaoki Furui:
VAD-measure-embedded decoder with online model adaptation. 3122-3125 - Shiwen Deng, Jiqing Han:
Robust statistical voice activity detection using a likelihood ratio sign test. 3126-3129 - Alexei V. Ivanov, Giuseppe Riccardi:
Automatic turn segmentation in spoken conversations. 3130-3133 - Yohei Kawaguchi, Masahito Togami, Yasunari Obuchi:
Turn taking-based conversation detection by using DOA estimation. 3134-3137