ASRU 2011: Waikoloa, HI, USA
- David Nahamoo, Michael Picheny:
2011 IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2011, Waikoloa, HI, USA, December 11-15, 2011. IEEE 2011, ISBN 978-1-4673-0365-1
Acoustic Modeling
- Simon Wiesler, Ralf Schlüter, Hermann Ney:
A convergence analysis of log-linear training and its application to speech recognition. 1-6
- Muhammad Ali Tahir, Ralf Schlüter, Hermann Ney:
Discriminative splitting of Gaussian/log-linear mixture HMMs for speech recognition. 7-11
- Ryuki Tachibana, Takashi Fukuda, Upendra V. Chaudhari, Bhuvana Ramabhadran, Puming Zhan:
Frame-level AnyBoost for LVCSR with the MMI Criterion. 12-17
- Shi-Xiong Zhang, Mark J. F. Gales:
Extending noise robust structured support vector machines to larger vocabulary tasks. 18-23
- Frank Seide, Gang Li, Xie Chen, Dong Yu:
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription. 24-29
- Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran, Petr Fousek, Petr Novák, Abdel-rahman Mohamed:
Making Deep Belief Networks effective for large vocabulary continuous speech recognition. 30-35
- Martin Wöllmer, Björn W. Schuller, Gerhard Rigoll:
A novel bottleneck-BLSTM front-end for feature-level context modeling in conversational speech recognition. 36-41
- Karel Veselý, Martin Karafiát, Frantisek Grézl:
Convolutive Bottleneck Network features for LVCSR. 42-47
- Wen-Lin Zhang, Wei-Qiang Zhang, Bi-Cheng Li:
Speaker adaptation based on speaker-dependent eigenphone estimation. 48-52
- Peder A. Olsen, Jing Huang, Vaibhava Goel, Steven J. Rennie:
Sparse Maximum A Posteriori adaptation. 53-58
- Tara N. Sainath, David Nahamoo, Dimitri Kanevsky, Bhuvana Ramabhadran, Parikshit M. Shah:
A convex hull approach to sparse representations for exemplar-based speech recognition. 59-64
- George Saon, Jen-Tzung Chien:
Some properties of Bayesian sensing hidden Markov models. 65-70
- Dan Gillick, Larry Gillick, Steven Wegmann:
Don't multiply lightly: Quantifying problems with the acoustic model assumptions in speech recognition. 71-76
- Rohit Prabhavalkar, Eric Fosler-Lussier, Karen Livescu:
A factored conditional random field model for articulatory feature forced transcription. 77-82
- Hiroshi Fujimura, Masanobu Nakamura, Yusuke Shinohara, Takashi Masuko:
N-Best rescoring by adaboost phoneme classifiers for isolated word recognition. 83-88
- Hung-An Chang, James R. Glass:
Multi-level context-dependent acoustic modeling for automatic speech recognition. 89-94
- Matthias Paulik, Panchi Panchapagesan:
Leveraging large amounts of loosely transcribed corporate videos for acoustic model training. 95-100
ASR Robustness
- Jort F. Gemmeke, Hugo Van hamme:
An hierarchical exemplar-based sparse model of speech, with an application to ASR. 101-106
- Khe Chai Sim, Minh-Thang Luong:
A Trajectory-based Parallel Model Combination with a unified static and dynamic parameter compensation for noisy speech recognition. 107-112
- Yongqiang Wang, Mark J. F. Gales:
Improving reverberant VTS for hands-free robust speech recognition. 113-118
- Anton Ragni, Mark J. F. Gales:
Derivative kernels for noise robust ASR. 119-124
- Rogier C. van Dalen, Mark J. F. Gales:
A variational perspective on noise-robust speech recognition. 125-130
- Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson:
Robust speech recognition using articulatory gestures in a Dynamic Bayesian Network framework. 131-136
- Steven J. Rennie, Pierre L. Dognin, Petr Fousek:
Matched-condition robust Dynamic Noise Adaptation. 137-140
- Mickael Rouvier, Mohamed Bouallegue, Driss Matrouf, Georges Linarès:
Factor analysis based session variability compensation for Automatic Speech Recognition. 141-145
- Michael L. Seltzer, Alex Acero:
Factored adaptation for separable compensation of speaker and environmental variability. 146-151
- Martin Karafiát, Lukás Burget, Pavel Matejka, Ondrej Glembek, Jan Cernocký:
iVector-based discriminative adaptation for automatic speech recognition. 152-157
- Daniel Povey, Geoffrey Zweig, Alex Acero:
Speaker adaptation with an Exponential Transform. 158-163
- Sid-Ahmed Selouani:
Evolutionary discriminative speaker adaptation. 164-168
- Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda:
Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation. 169-172
- Yasunari Obuchi, Ryu Takeda, Masahito Togami:
Bidirectional OM-LSA speech estimator for noise robust speech recognition. 173-178
- Ken'ichi Kumatani, John W. McDonough, Bhiksha Raj:
Maximum kurtosis beamforming with a subspace filter for distant speech recognition. 179-184
- Cemil Demir, Ali Taylan Cemgil, Murat Saraclar:
Gain estimation approaches in catalog-based single-channel speech-music separation. 185-190
- Hiroko Murakami, Koichi Shinoda, Sadaoki Furui:
Designing text corpus using phone-error distribution for acoustic modeling. 191-195
Language Modeling and ASR Systems
- Tomás Mikolov, Anoop Deoras, Daniel Povey, Lukás Burget, Jan Cernocký:
Strategies for training large scale neural network language models. 196-201
- Hasim Sak, Murat Saraclar, Tunga Gungor:
Discriminative reranking of ASR hypotheses with morpholexical and N-best-list features. 202-207
- Hong-Kwang Jeff Kuo, Ebru Arisoy, Lidia Mangu, George Saon:
Minimum Bayes risk discriminative language models for Arabic speech recognition. 208-213
- Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur:
Efficient discriminative training of long-span language models. 214-219
- Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur:
Adapting n-gram maximum entropy language models with conditional entropy regularization. 220-225
- Puyang Xu, Sanjeev Khudanpur, Asela Gunawardana:
Randomized maximum entropy language models. 226-230
- Jia Cui, Stanley F. Chen, Bowen Zhou:
Efficient representation and fast look-up of Maximum Entropy language models. 231-236
- Stanley F. Chen, Abhinav Sethy, Bhuvana Ramabhadran:
Pruning exponential language models. 237-242
- Timo Mertens, Stephanie Seneff:
Subword-based automatic lexicon learning for Speech Recognition. 243-248
- Upendra V. Chaudhari, Xiaodong Cui, Bowen Zhou, Rong Zhang:
An investigation of heuristic, manual and statistical pronunciation derivation for Pashto. 249-253
- Timo Mertens, Kit Thambiratnam, Frank Seide:
Subword-based multi-span pronunciation adaptation for recognizing accented speech. 254-259
- Horia Cucu, Laurent Besacier, Corneliu Burileanu, Andi Buzo:
Investigating the role of machine translated text in ASR domain adaptation: Unsupervised and semi-supervised methods. 260-265
- Hagen Soltau, Lidia Mangu, Fadi Biadsy:
From Modern Standard Arabic to Levantine ASR: Leveraging GALE for dialects. 266-271
- Lidia Mangu, Hong-Kwang Kuo, Stephen M. Chu, Brian Kingsbury, George Saon, Hagen Soltau, Fadi Biadsy:
The IBM 2011 GALE Arabic speech transcription system. 272-277
- Fethi Bougares, Yannick Estève, Paul Deléglise, Georges Linarès:
Bag of n-gram driven decoding for LVCSR system harnessing. 278-282
- Izhak Shafran, Richard Sproat, Mahsa Yarmohammadi, Brian Roark:
Efficient determinization of tagged word lattices using categorial and lexicographic semirings. 283-288
TTS, Dialog and MLSP
- William Yang Wang, Kallirroi Georgila:
Automatic detection of unnatural word-level segments in unit-selection speech synthesis. 289-294
- Chai Wutiwiwatchai, Ausdang Thangthai, Ananlada Chotimongkol, Chatchawarn Hansakunbuntheung, Nattanun Thatphithakkul:
Accent level adjustment in bilingual Thai-English text-to-speech synthesis. 295-299
- Jerome R. Bellegarda:
Sentiment analysis of text-to-speech input using latent affective mapping. 300-305
- José Lopes, Maxine Eskénazi, Isabel Trancoso:
Towards choosing better primes for spoken dialog systems. 306-311
- Milica Gasic, Filip Jurcícek, Blaise Thomson, Kai Yu, Steve J. Young:
On-line policy optimisation of spoken dialogue systems via live interaction with human subjects. 312-317
- Toyomi Meguro, Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka:
Wizard of Oz evaluation of listening-oriented dialogue control using POMDP. 318-323
- Jingjing Liu, Stephanie Seneff:
A dialogue system for accessing drug reviews. 324-329
- Ryuichiro Higashinaka, Noriaki Kawamae, Kugatsu Sadamitsu, Yasuhiro Minami, Toyomi Meguro, Kohji Dohsaka, Hirohito Inagaki:
Building a conversational model from two-tweets. 330-335
- Mitsuru Takaoka, Hiromitsu Nishizaki, Yoshihiro Sekiguchi:
Utterance verification using garbage words for a hospital appointment system with speech interface. 336-341
- Shuai Huang, Damianos G. Karakos, Glen A. Coppersmith, Kenneth Ward Church, Sabato Marco Siniscalchi:
Bootstrapping a spoken language identification system using unsupervised integrated sensing and processing decision trees. 342-347
- David Imseng, Ramya Rasipuram, Mathew Magimai-Doss:
Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition. 348-353
- Yanmin Qian, Ji Xu, Daniel Povey, Jia Liu:
Strategies for using MLP based features with limited target-language training data. 354-358
- Frantisek Grézl, Martin Karafiát, Milos Janda:
Study of probabilistic and Bottle-Neck features in multilingual environment. 359-364
- Liang Lu, Arnab Ghoshal, Steve Renals:
Regularized subspace Gaussian mixture models for cross-lingual speech recognition. 365-370
- Christian Plahl, Ralf Schlüter, Hermann Ney:
Cross-lingual portability of Chinese and English neural network features for French and German LVCSR. 371-376
- Luis Javier Rodríguez, Mikel Peñagarikano, Amparo Varona, Mireia Díez, Germán Bordel, David Martínez González, Jesús Antonio Villalba López, Antonio Miguel, Alfonso Ortega, Eduardo Lleida, Alberto Abad, Oscar Koller, Isabel Trancoso, Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo, Rahim Saeidi, Mehdi Soufifar, Tomi Kinnunen, Torbjørn Svendsen, Pasi Fränti:
Multi-site heterogeneous system fusions for the Albayzin 2010 Language Recognition Evaluation. 377-382
Spoken Document Retrieval and Spoken Language Understanding
- Tsung-wei Tu, Hung-yi Lee, Lin-Shan Lee:
Improved spoken term detection using support vector machines with acoustic and context features from pseudo-relevance feedback. 383-388
- Berlin Chen, Pei-Ning Chen, Kuan-Yu Chen:
Query modeling for spoken document retrieval. 389-394
- Timothy J. Hazen, Man-Hung Siu, Herbert Gish, Steve Lowe, Arthur Chan:
Topic modeling for spoken documents using only phonetic information. 395-400
- Aren Jansen, Benjamin Van Durme:
Efficient spoken term discovery using randomized algorithms. 401-406
- Damianos G. Karakos, Mark Dredze, Ken Ward Church, Aren Jansen, Sanjeev Khudanpur:
Estimating document frequencies in a speech corpus. 407-412
- Weiqun Xu, Changchun Bao, Yali Li, Jielin Pan, Yonghong Yan:
Robust understanding of spoken Chinese through character-based tagging and prior knowledge exploitation. 413-418
- Dilek Hakkani-Tür, Gökhan Tür, Larry P. Heck, Asli Celikyilmaz, Ashley Fidler, Dustin Hillard, Rukmini Iyer, Sarangarajan Parthasarathy:
Employing web search query click logs for multi-domain spoken language understanding. 419-424
- Asli Celikyilmaz, Dilek Hakkani-Tür, Gökhan Tür, Ashley Fidler, Dustin Hillard:
Exploiting distance based similarity in topic models for user intent detection. 425-430
- Liva Ralaivola, Benoît Favre, Pierre Gotab, Frédéric Béchet, Géraldine Damnati:
Applying Multiclass Bandit algorithms to call-type classification. 431-436
- Babak Loni, Seyedeh Halleh Khoshnevis, Pascal Wiggers:
Latent semantic analysis for question classification with neural networks. 437-442
- Bin Zhang, Alex Marin, Brian Hutchinson, Mari Ostendorf:
Analyzing conversations using rich phrase patterns. 443-448
- Anthony P. Stark, Izhak Shafran, Jeffrey A. Kaye:
Supervised and unsupervised feature selection for inferring social nature of telephone conversations from their content. 449-454
- Yangyang Shi, Pascal Wiggers, Catholijn M. Jonker:
Socio-situational setting classification based on language use. 455-460
- Klaus Zechner, Xiaoming Xi, Lei Chen:
Evaluating prosodic features for automated scoring of non-native read speech. 461-466
- Di Lu, Takuya Nishimoto, Nobuaki Minematsu:
Decision of response timing for incremental speech recognition with reinforcement learning. 467-472
New Applications in Speech Processing
- Lei Chen:
Applying feature bagging for more accurate and robust automated speaking assessment. 473-477
- Tobias Bocklet, Elmar Nöth, Georg Stemmer, Hana Ruzickova, Jan Rusz:
Detection of persons with Parkinson's disease by acoustic, vocal, and prosodic analysis. 478-483
- Emily Tucker Prud'hommeaux, Brian Roark:
Alignment of spoken narratives for automated neuropsychological assessment. 484-489
- Jiahong Yuan, Mark Liberman:
Automatic detection of "g-dropping" in American English using forced alignment. 490-493
- Shunta Ishii, Tomoki Toda, Hiroshi Saruwatari, Sakriani Sakti, Satoshi Nakamura:
Blind noise suppression for Non-Audible Murmur recognition with stereo signal processing. 494-499
- Chao Zhang, Yi Liu, Chin-Hui Lee:
Detection-based accented speech recognition using articulatory features. 500-505
- Alfonso M. Canterla, Magne Hallstein Johnsen:
Minimum detection error training of subword detectors. 506-511
- Mohamed Bouallegue, Driss Matrouf, Mickael Rouvier, Georges Linarès:
Subspace Gaussian Mixture Models for vectorial HMM-states representation. 512-516
- Guangpu Huang, Meng Joo Er:
A novel neural-based pronunciation modeling method for robust speech recognition. 517-522
- Zixing Zhang, Felix Weninger, Martin Wöllmer, Björn W. Schuller:
Unsupervised learning in cross-corpus acoustic emotion recognition. 523-528
- Sankaranarayanan Ananthakrishnan, Aravind Namandi Vembu, Rohit Prasad:
Model-based parametric features for emotion recognition from speech. 529-534
- Jason D. Williams, I. Dan Melamed, Tirso Alonso, Barbara Hollister, Jay G. Wilpon:
Crowd-sourcing for difficult transcription of speech. 535-540
- Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa:
Detection of precisely transcribed parts from inexact transcribed corpus. 541-546
- Md. Jahangir Alam, Tomi Kinnunen, Patrick Kenny, Pierre Ouellet, Douglas D. O'Shaughnessy:
Multi-taper MFCC features for speaker verification using I-vectors. 547-552
- Ekaterina Gonina, Gerald Friedland, Henry Cook, Kurt Keutzer:
Fast speaker diarization using a high-level scripting language. 553-558
- Xinhui Zhou, Daniel Garcia-Romero, Ramani Duraiswami, Carol Y. Espy-Wilson, Shihab A. Shamma:
Linear versus mel frequency cepstral coefficients for speaker recognition. 559-564