INTERSPEECH 2014: Singapore
- Haizhou Li, Helen M. Meng, Bin Ma, Engsiong Chng, Lei Xie:
15th Annual Conference of the International Speech Communication Association, INTERSPEECH 2014, Singapore, September 14-18, 2014. ISCA 2014
Keynote
- Anne Cutler:
Learning about speech. - K. J. Ray Liu:
Decision learning in data science: where John Nash meets social media. - Lori Lamel:
Language diversity: speech processing in a multi-lingual context. - William S.-Y. Wang:
Sound patterns in language. - Li Deng:
Achievements and challenges of deep learning - from speech analysis and recognition to language and multimodal processing.
Deep Neural Networks for Speech Generation and Synthesis (Special Session)
- Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoffrey Zweig, Christopher J. Rossbach, Jon Currey:
An introduction to computational networks and the computational network toolkit (invited talk).
Multi-Lingual ASR
- Yu Zhang, Ekapol Chuangsuwanich, James R. Glass:
Language ID-based training of multilingual stacked bottleneck features. 1-5 - Van Hai Do, Xiong Xiao, Chng Eng Siong, Haizhou Li:
Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR. 6-10 - Ngoc Thang Vu, Yuanfan Wang, Marten Klose, Zlatka Mihaylova, Tanja Schultz:
Improving ASR performance on non-native speech using multilingual and crosslingual information. 11-15 - Kate M. Knill, Mark J. F. Gales, Anton Ragni, Shakti P. Rath:
Language independent and unsupervised acoustic models for speech recognition and keyword spotting. 16-20 - Peter Bell, Joris Driesen, Steve Renals:
Cross-lingual adaptation with multi-task adaptive networks. 21-25 - Marzieh Razavi, Mathew Magimai-Doss:
On recognition of non-native speech using probabilistic lexical model. 26-30
Prosody Processing
- Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation. 31-35 - Daniel R. van Niekerk, Etienne Barnard:
A target approximation intonation model for Yorùbá TTS. 36-40 - Anandaswarup Vadapalli, Kishore Prahallad:
Learning continuous-valued word representations for phrase break prediction. 41-45 - Hao Che, Jianhua Tao, Ya Li:
Improving Mandarin prosodic boundary prediction with rich syntactic features. 46-50 - Rasmus Dall, Marcus Tomalin, Mirjam Wester, William J. Byrne, Simon King:
Investigating automatic & human filled pause insertion for speech synthesis. 51-55 - Rasmus Dall, Mirjam Wester, Martin Corley:
The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech. 56-60
Speaker Recognition - Applications
- Elie Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu, Sébastien Marcel:
Introducing i-vectors for joint anti-spoofing and speaker verification. 61-65 - Ryan Leary, Walter Andrews:
Random projections for large-scale speaker search. 66-70 - Corinne Fredouille, Delphine Charlet:
Analysis of i-vector framework for speaker identification in TV-shows. 71-75 - Antoine Laurent, Nathalie Camelin, Christian Raymond:
Boosting bonsai trees for efficient features combination: application to speaker role identification. 76-80 - Yves Raimond, Thomas Nixon:
Identifying contributors in the BBC world service archive. 81-85 - Finnian Kelly, Rahim Saeidi, Naomi Harte, David A. van Leeuwen:
Effect of long-term ageing on i-vector speaker verification. 86-90
Phonetics and Phonology 1, 2
- Maarten Versteegh, Amanda Seidl, Alejandrina Cristià:
Acoustic correlates of phonological status. 91-95 - Manu Airaksinen, Paavo Alku:
Parameterization of the glottal source with the phase plane plot. 96-100 - Phil Rose:
Transcribing tone - a likelihood-based quantitative evaluation of Chao's tone letters. 101-105 - Diyana Hamzah, James Sneed German:
Intonational phonology and prosodic hierarchy in Malay. 106-110 - Uwe D. Reichel, Katalin Mády:
Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for Hungarian. 111-115 - George Christodoulides, Mathieu Avanzi:
An evaluation of machine learning methods for prominence detection in French. 116-119
Open Domain Situated Conversational Interaction (Special Session)
- Aasish Pappu, Alexander I. Rudnicky:
Learning situated knowledge bases through dialog. 120-124 - Teruhisa Misu:
Crowdsourcing for situated dialog systems in a moving car. 125-129 - Ryuichiro Higashinaka, Toyomi Meguro, Kenji Imamura, Hiroaki Sugiyama, Toshiro Makino, Yoshihiro Matsuo:
Evaluating coherence in open domain conversational systems. 130-134 - Frédéric Béchet, Alexis Nasr, Benoît Favre:
Adapting dependency parsing to spontaneous speech for open domain spoken language understanding. 135-139 - Milica Gasic, Dongho Kim, Pirros Tsiakoulis, Catherine Breslin, Matthew Henderson, Martin Szummer, Blaise Thomson, Steve J. Young:
Incremental on-line adaptation of POMDP-based dialogue managers to extended domains. 140-144 - Jean-Philippe Robichaud, Paul A. Crook, Puyang Xu, Omar Zia Khan, Ruhi Sarikaya:
Hypotheses ranking for robust domain classification and tracking in dialogue systems. 145-149
Speech Production: Models and Acoustics
- Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan:
Motor control primitives arising from a learned dynamical systems model of speech articulation. 150-154 - Chia-Hsin Yeh, Chiung-Yao Wang, Jung-Yueh Tu:
Nonword repetition of Taiwanese disyllabic tonal sequences in adults with language attrition. 155-158 - Andreas Windmann, Juraj Simko, Petra Wagner:
A unified account of prominence effects in an optimization-based model of speech timing. 159-163 - Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:
Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints. 164-168 - Prasad Sudhakar, Prasanta Kumar Ghosh:
Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: benefit to speech recognition. 169-173 - Jun Wang, William F. Katz, Thomas F. Campbell:
Contribution of tongue lateral to consonant production. 174-178 - Min Liu, Shuju Shi, Jinsong Zhang:
A preliminary study on acoustic correlates of tone2+tone2 disyllabic word stress in Mandarin. 179-183 - Mohammad Abuoudeh, Olivier Crouzet:
Vowel length impact on locus equation parameters: an investigation on Jordanian Arabic. 184-188 - Philip J. Roberts, Henning Reetz, Aditi Lahiri:
Corpus-testing a fricative discriminator; or, just how invariant is this invariant? 189-192 - Brian O. Bush, Alexander Kain:
Modeling coarticulation in continuous speech. 193-197 - Khalid Daoudi, Blaise Bertrac:
On classification between normal and pathological voices using the MEEI-KayPENTAX database: issues and consequences. 198-202 - Véronique Bukmaier, Jonathan Harrington, Ulrich Reubold, Felicitas Kleber:
Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic change. 203-207
Extraction of Para-Linguistic Information
- Rahul Gupta, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:
Predicting client's inclination towards target behavior change in motivational interviewing and investigating the role of laughter. 208-212 - Bo Xiao, Daniel Bone, Maarten Van Segbroeck, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Modeling therapist empathy through prosody in drug addiction counseling. 213-217 - Daniel Bone, Chi-Chun Lee, Alexandros Potamianos, Shrikanth S. Narayanan:
An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based model. 218-222 - Kun Han, Dong Yu, Ivan Tashev:
Speech emotion recognition using deep neural network and extreme learning machine. 223-227 - Khiet P. Truong, Gerben J. Westerhof, Franciska de Jong, Dirk Heylen:
An annotation scheme for sighs in spontaneous dialogue. 228-232 - Lei He, Volker Dellwo:
Speaker idiosyncratic variability of intensity across syllables. 233-237 - Soroosh Mariooryad, Reza Lotfian, Carlos Busso:
Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora. 238-242 - Saeid Safavi, Martin J. Russell, Peter Jancovic:
Identification of age-group from children's speech by computers and humans. 243-247
Spoken Language Understanding
- Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès, Renato De Mori:
Theme identification in human-human conversations with features from specific speaker type hidden spaces. 248-252 - Alex Marin, Roman Holenstein, Ruhi Sarikaya, Mari Ostendorf:
Learning phrase patterns for text classification using a knowledge graph and unlabeled data. 253-257 - Puyang Xu, Ruhi Sarikaya:
Targeted feature dropout for robust slot filling in natural language understanding. 258-262 - Sz-Rung Shiang, Hung-yi Lee, Lin-Shan Lee:
Spoken question answering using tree-structured conditional random fields and two-layer random walk. 263-267 - Ruhi Sarikaya, Asli Celikyilmaz, Anoop Deoras, Minwoo Jeong:
Shrinkage based features for slot tagging with conditional random fields. 268-272 - Yangyang Shi, Yi-Cheng Pan, Mei-Yuh Hwang:
Cluster based Chinese abbreviation modeling. 273-277 - Xiantao Zhang, Dongchen Li, Xihong Wu:
Parsing named entity as syntactic structure. 278-282 - Gökhan Tür, Anoop Deoras, Dilek Hakkani-Tür:
Detecting out-of-domain utterances addressed to a virtual personal assistant. 283-287 - Spiros Georgiladakis, Christina Unger, Elias Iosif, Sebastian Walter, Philipp Cimiano, Euripides G. M. Petrakis, Alexandros Potamianos:
Fusion of knowledge-based and data-driven approaches to grammar induction. 288-292 - Denys Katerenchuk, Andrew Rosenberg:
Improving named entity recognition with prosodic features. 293-297 - Suman V. Ravuri, Andreas Stolcke:
Neural network models for lexical addressee detection. 298-302 - Valerie Freeman, Julian Chan, Gina-Anne Levow, Richard A. Wright, Mari Ostendorf, Victoria Zayats:
Manipulating stance and involvement using collaborative tasks: an exploratory comparison. 303-307
Spoken Dialogue Systems
- Fabrizio Ghigi, Maxine Eskénazi, M. Inés Torres, Sungjin Lee:
Incremental dialog processing in a task-oriented dialog. 308-312 - Naoki Hotta, Kazunori Komatani, Satoshi Sato, Mikio Nakano:
Detecting incorrectly-segmented utterances for posteriori restoration of turn-taking and ASR results. 313-317 - Hany Hassan, Lee Schwartz, Dilek Hakkani-Tür, Gökhan Tür:
Segmentation and disfluency removal for conversational speech translation. 318-322 - Shinji Watanabe, John R. Hershey, Tim K. Marks, Youichi Fujii, Yusuke Koji:
Cost-level integration of statistical and rule-based dialog managers. 323-327 - Dongho Kim, Catherine Breslin, Pirros Tsiakoulis, Milica Gasic, Matthew Henderson, Steve J. Young:
Inverse reinforcement learning for micro-turn management. 328-332 - John Kane, Irena Yanushevskaya, Céline De Looze, Brian Vaughan, Ailbhe Ní Chasaide:
Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactions. 333-337
DNN Architectures and Robust Recognition
- Hasim Sak, Andrew W. Senior, Françoise Beaufays:
Long short-term memory recurrent neural network architectures for large scale acoustic modeling. 338-342 - George Saon, Hagen Soltau, Ahmad Emami, Michael Picheny:
Unfolded recurrent neural networks for speech recognition. 343-347 - Vikrant Singh Tomar, Richard C. Rose:
Manifold regularized deep neural networks. 348-352 - Bo Li, Khe Chai Sim:
Modeling long temporal contexts for robust DNN-based speech recognition. 353-357 - Feipeng Li, Phani S. Nidadavolu, Hynek Hermansky:
A long, deep and wide artificial neural net for robust speech recognition in unknown noise. 358-362 - Ladislav Seps, Jirí Málek, Petr Cerva, Jan Nouza:
Investigation of deep neural networks for robust recognition of nonlinearly distorted speech. 363-367
Speaker Recognition - Evaluation and Forensics
- Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Alvin F. Martin, Alan McCree, Mark A. Przybocki, Douglas A. Reynolds:
Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challenge. 368-372 - David A. van Leeuwen, Niko Brümmer:
Constrained speaker linking. 373-377 - Sergey Novoselov, Timur Pekhovsky, Konstantin Simonchik, Andrey Shulipa:
RBM-PLDA subsystem for the NIST i-vector challenge. 378-382 - Stephen H. Shum, Najim Dehak, James R. Glass:
Limited labels for unlimited data: active learning for speaker recognition. 383-387 - Niko Brümmer, Albert Swart:
Bayesian calibration for forensic evidence reporting. 388-392 - Shunichi Ishihara:
Replicate mismatch between test/background and development databases: the impact on the performance of likelihood ratio-based forensic voice comparison. 393-397
Speech Production I, II
- Manu Airaksinen, Tom Bäckström, Paavo Alku:
Automatic estimation of the lip radiation effect in glottal inverse filtering. 398-402 - Marcelo de Oliveira Rosa:
Simulation of 3D larynges with asymmetric distribution of viscoelastic properties in their vocal folds. 403-407 - Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura:
Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods. 408-412 - Jangwon Kim, Donna Erickson, Sungbok Lee, Shrikanth S. Narayanan:
A study of invariant properties and variation patterns in the converter/distributor model for emotional speech. 413-417 - Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer:
A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformation. 418-421 - Tokihiko Kaburagi:
Estimation of vocal-tract shape from speech spectrum and speech resynthesis based on a generative model. 422-426
INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Julien Epps, Florian Eyben, Fabien Ringeval, Erik Marchi, Yue Zhang:
The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load. 427-431 - Jouni Pohjalainen, Paavo Alku:
Filtering and subspace selection for spectral features in detecting speech under physical stress. 432-436 - Ming Li:
Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens. 437-441 - Heysem Kaya, Tugçe Özkaptan, Albert Ali Salah, Sadik Fikret Gürgen:
Canonical correlation analysis and local Fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction. 442-446 - How Jing, Ting-Yao Hu, Hung-Shin Lee, Wei-Chen Chen, Chi-Chun Lee, Yu Tsao, Hsin-Min Wang:
Ensemble of machine learning algorithms for cognitive and physical speaker load detection. 447-451 - Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth:
Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks. 452-456
Hearing and Perception
- Nandini Iyer, Eric R. Thompson, Brian D. Simpson, Griffin D. Romigh:
Revisiting the right-ear advantage for speech: implications for speech displays. 457-461 - Louis ten Bosch, Mirjam Ernestus, Lou Boves:
Comparing reaction time sequences from human participants and computational models. 462-466 - Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu:
Detecting the number of competing speakers - human selective hearing versus spectrogram distance based estimator. 467-470 - Guo Li, Gang Peng:
The influence of sensory memory and attention on the context effect in talker normalization. 471-475 - Payton Lin, Fei Chen, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao:
Automatic speech recognition with primarily temporal envelope information. 476-480 - Ying-Hui Lai, Fei Chen, Yu Tsao:
An adaptive envelope compression strategy for speech processing in cochlear implants. 481-484 - Brian S. Helfer, Thomas F. Quatieri, James R. Williamson, Laurel Keyes, Benjamin Evans, W. Nicholas Greene, Trina Vian, Joseph Lacirignola, Trey E. Shenk, Thomas M. Talavage, Jeff Palmer, Kristin Heaton:
Articulatory dynamics and coordination in classifying cognitive change with preclinical mTBI. 485-489 - Nozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A hearing impairment simulation method using audiogram-based approximation of auditory characteristics. 490-494 - Dongmei Wang, James M. Kates, John H. L. Hansen:
Investigation of the relative perceptual importance of temporal envelope and temporal fine structure between tonal and non-tonal languages. 495-498 - Daniel Fogerty, Fei Chen:
Vowel spectral contributions to English and Mandarin sentence intelligibility. 499-503 - Vinay Kumar Mittal, B. Yegnanarayana:
Significance of aperiodicity in the pitch perception of expressive voices. 504-508
Cross-Linguistic Studies
- Mirjam Wester, María Luisa García Lecumberri, Martin Cooke:
DIAPIX-FL: a symmetric corpus of problem-solving dialogues in first and second languages. 509-513 - Christophe Coupé, Yoon Mi Oh, François Pellegrino, Egidio Marsico:
Cross-linguistic investigations of oral and silent reading. 514-518 - Juul Coumans, Roeland van Hout, Odette Scharenborg:
Non-native word recognition in noise: the role of word-initial and word-final information. 519-523 - Janice Wing Sze Wong:
The effects of high and low variability phonetic training on the perception and production of English vowels /e/-/æ/ by Cantonese ESL learners with high and low L2 proficiency levels. 524-528 - Pepi Burgos, Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik:
Dutch vowel production by Spanish learners: duration and spectral features. 529-533 - Angelos Lengeris, Katerina Nicolaidis:
English consonant confusions by Greek listeners in quiet and noise and the role of phonological short-term memory. 534-538 - Sylvain Detey, Isabelle Racine, Julien Eychenne, Yuji Kawaguchi:
Corpus-based L2 phonological data and semi-automatic perceptual analysis: the case of nasal vowels produced by beginner Japanese learners of French. 539-543 - Gábor Pintér, Shinobu Mizuguchi, Koichi Tateishi:
Perception of prosodic prominence and boundaries by L1 and L2 speakers of English. 544-547 - Rose Thomas Kalathottukaren, Suzanne C. Purdy, Elaine Ballard:
Prosody perception, reading accuracy, nonliteral language comprehension, and music and tonal pitch discrimination in school aged children. 548-552 - Polina Drozdova, Roeland van Hout, Odette Scharenborg:
Phoneme category retuning in a non-native language. 553-557 - Bo-Chang Chiou, Chia-Ping Chen:
Speech emotion recognition with cross-lingual databases. 558-561
Speaker Diarization
- Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Tatsuya Kawahara:
Speaker diarization using eye-gaze information in multi-party conversations. 562-566 - Che-Wei Huang, Bo Xiao, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Unsupervised speaker diarization using Riemannian manifold clustering. 567-571 - Héctor Delgado, Corinne Fredouille, Javier Serrano:
Towards a complete binary key system for the speaker diarization task. 572-576 - Houman Ghaemmaghami, David Dean, Sridha Sridharan:
An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archives. 577-581 - Binyam Gebrekidan Gebre, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, Tom Heskes:
Speaker diarization using gesture and speech. 582-586 - Grégor Dupuy, Sylvain Meignier, Yannick Estève:
Is incremental cross-show speaker diarization efficient for processing large volumes of data? 587-591 - Pranay Dighe, Marc Ferras, Hervé Bourlard:
Detecting and labeling speakers on overlapping speech using vector Taylor series. 592-596 - Sree Harsha Yella, Petr Motlícek, Hervé Bourlard:
Phoneme background model for information bottleneck based speaker diarization. 597-601 - Marc Ferras, Stefano Masneri, Oliver Schreer, Hervé Bourlard:
Diarizing large corpora using multi-modal speaker linking. 602-606 - Frédéric Béchet, Meriem Bendris, Delphine Charlet, Géraldine Damnati, Benoît Favre, Mickael Rouvier, Rémi Auguste, Benjamin Bigot, Richard Dufour, Corinne Fredouille, Georges Linarès, Jean Martinet, Grégory Senay, Pierre Tirilly:
Multimodal understanding for person recognition in video broadcasts. 607-611
Robust ASR 1, 2
- James Gibson, Maarten Van Segbroeck, Shrikanth S. Narayanan:
Comparing time-frequency representations for directional derivative features. 612-615 - Jun Du, Qing Wang, Tian Gao, Yong Xu, Li-Rong Dai, Chin-Hui Lee:
Robust speech recognition with speech enhanced deep neural networks. 616-620 - Emmanuel Vincent, Aggelos Gkiokas, Dominik Schnitzer, Arthur Flexer:
An investigation of likelihood normalization for robust ASR. 621-625 - Constantin Spille, Bernd T. Meyer:
Identifying the human-machine differences in complex binaural scenes: what can be learned from our auditory system. 626-630 - Jürgen T. Geiger, Zixing Zhang, Felix Weninger, Björn W. Schuller, Gerhard Rigoll:
Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling. 631-635 - Shilin Liu, Khe Chai Sim:
Joint adaptation and adaptive training of TVWR for robust automatic speech recognition. 636-640
Implementation of Language Model Algorithms
- Xie Chen, Yongqiang Wang, Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch. 641-645 - David Nolden, Ralf Schlüter, Hermann Ney:
Word pair approximation for more efficient decoding with high-order language models. 646-650 - Heike Adel, Katrin Kirchhoff, Ngoc Thang Vu, Dominic Telaar, Tanja Schultz:
Comparing approaches to convert recurrent neural networks into backoff language models for efficient decoding. 651-655 - David Nolden, Hagen Soltau, Daniel Povey, Pegah Ghahremani, Lidia Mangu, Hermann Ney:
Removing redundancy from lattices. 656-660 - Martin Sundermeyer, Zoltán Tüske, Ralf Schlüter, Hermann Ney:
Lattice decoding and rescoring with long-span neural network language models. 661-665 - Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang, Andreas Stolcke, Benoît Dumoulin:
Word-phrase-entity language models: getting more mileage out of n-grams. 666-670
Speaker Recognition - Noise and Channel Robustness
- Sourjya Sarkar, K. Sreenivasa Rao:
A novel boosting algorithm for improved i-vector based speaker verification in noisy environments. 671-675 - William M. Campbell:
Using deep belief networks for vector-based speaker recognition. 676-680 - Yun Lei, Luciana Ferrer, Mitchell McLaren, Nicolas Scheffer:
A deep neural network speaker verification system targeting microphone speech. 681-685 - Mitchell McLaren, Yun Lei, Nicolas Scheffer, Luciana Ferrer:
Application of convolutional neural networks to speaker recognition in noisy conditions. 686-690 - Jason W. Pelecanos, Weizhong Zhu, Sibel Yaman:
SVM based speaker recognition: harnessing trials with multiple enrollment sessions. 691-695 - Laura Fernández Gallardo, Michael Wagner, Sebastian Möller:
I-vector speaker verification based on phonetic information under transmission channel effects. 696-700
Speech Production I, II
- Andrés Benítez, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan:
A real-time MRI study of articulatory setting in second language speech. 701-705 - Takayuki Arai:
Retroflex and bunched English /r/ with physical models of the human vocal tract. 706-710 - Panying Rong, Yana Yunusova, James D. Berry, Lorne Zinman, Jordan R. Green:
Parameterization of articulatory pattern in speakers with ALS. 711-715 - P. Sujith, Prasanta Kumar Ghosh:
Missing samples estimation in electromagnetic articulography data using equality constrained Kalman smoother. 716-720 - An Ji, Michael T. Johnson, Jeffrey Berry:
Palate-referenced articulatory features for acoustic-to-articulator inversion. 721-725 - Hidetsugu Uchida, Kohei Wakamiya, Tokihiko Kaburagi:
A study on the improvement of measurement accuracy of the three-dimensional electromagnetic articulography. 726-730
INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)
- Claude Montacié, Marie-José Caraty:
High-level speech event analysis for cognitive load classification. 731-735 - Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma:
On the use of Bhattacharyya based GMM distance and neural net features for identification of cognitive load levels. 736-740 - Mark A. Huckvale:
Prediction of cognitive load from speech with the VOQAL voice quality toolbox for the INTERSPEECH 2014 computational paralinguistics challenge. 741-745 - Jia Min Karen Kua, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah:
The UNSW submission to INTERSPEECH 2014 ComParE cognitive load challenge. 746-750 - Maarten Van Segbroeck, Ruchir Travadi, Colin Vaz, Jangwon Kim, Matthew P. Black, Alexandros Potamianos, Shrikanth S. Narayanan:
Classification of cognitive load from speech using an i-vector framework. 751-755
Speech Synthesis I-III
- Xiao Zang, Zhiyong Wu, Helen M. Meng, Jia Jia, Lianhong Cai:
Using conditional random fields to predict focus word pair in spontaneous spoken English. 756-760 - Richard Sproat, Keith B. Hall:
Applications of maximum entropy rankers to problems in spoken language processing. 761-764 - Xavi Gonzalvo, Monika Podsiadlo:
Text-to-speech with cross-lingual neural network-based grapheme-to-phoneme models. 765-769 - Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi:
Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis. 770-774 - B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan:
Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languages. 775-779 - Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, Junichi Yamagishi, Javier Latorre:
An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis. 780-784 - Hemant A. Patil, Tanvina B. Patel:
Chaotic mixed excitation source for speech synthesis. 785-789 - Alexander Sorin, Slava Shechtman, Vincent Pollet:
Refined inter-segment joining in multi-form speech synthesis. 790-794 - Ran Zhang, Zhengqi Wen, Jianhua Tao, Ya Li, Bing Liu, Xiaoyan Lou:
A hierarchical Viterbi algorithm for Mandarin hybrid speech synthesis system. 795-799
Multi-Lingual Cross-Lingual and Low-Resource ASR
- Yajie Miao, Florian Metze:
Improving language-universal feature extraction with deep maxout and convolutional neural networks. 800-804 - Raul Fernandez, Jia Cui, Andrew Rosenberg, Bhuvana Ramabhadran, Xiaodong Cui:
Exploiting vocal-source features to improve ASR accuracy for low-resource languages. 805-809 - Anton Ragni, Kate M. Knill, Shakti P. Rath, Mark J. F. Gales:
Data augmentation for low resource languages. 810-814 - Denis Jouvet, Dominique Fohr:
About combining forward and backward-based decoders for selecting data for unsupervised training of acoustic models. 815-819 - Frantisek Grézl, Martin Karafiát:
Combination of multilingual and semi-supervised training for under-resourced languages. 820-824 - Ngoc Thang Vu, Jochen Weiner, Tanja Schultz:
Investigating the learning effect of multilingual bottle-neck features for ASR. 825-829 - Yajie Miao, Hao Zhang, Florian Metze:
Distributed learning of multilingual DNN feature extractors using GPUs. 830-834 - Shakti P. Rath, Kate M. Knill, Anton Ragni, Mark J. F. Gales:
Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages. 835-839 - Jia Cui, Bhuvana Ramabhadran, Xiaodong Cui, Andrew Rosenberg, Brian Kingsbury, Abhinav Sethy:
Recent improvements in neural network acoustic modeling for LVCSR in low resource languages. 840-844 - Yan Huang, Malcolm Slaney, Michael L. Seltzer, Yifan Gong:
Towards better performance with heterogeneous training data in acoustic modeling using deep neural networks. 845-849
Speech Estimation and Sound Source Separation
- Takuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura, Hirokazu Kameoka:
A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models. 850-854 - Colin Vaz, Dimitrios Dimitriadis, Shrikanth S. Narayanan:
Enhancing audio source separability using spectro-temporal regularization with NMF. 855-859 - Sayeh Mirzaei, Hugo Van hamme, Yaser Norouzi:
Blind speech source localization, counting and separation for 2-channel convolutive mixtures in a reverberant environment. 860-864 - Felix Weninger, Jonathan Le Roux, John R. Hershey, Shinji Watanabe:
Discriminative NMF and its application to single-channel source separation. 865-869 - Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino:
Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information. 870-874 - Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma, Haizhou Li:
A graph-based Gaussian component clustering approach to unsupervised acoustic modeling. 875-879 - Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen:
A speech system for estimating daily word counts. 880-884 - Xugang Lu, Yu Tsao, Shigeki Matsuda, Chiori Hori:
Ensemble modeling of denoising autoencoder for speech spectrum restoration. 885-889
Feature Extraction and Modeling for ASR 1, 2
- Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney:
Acoustic modeling with deep neural networks using raw time signal for LVCSR. 890-894 - Vikramjit Mitra, Wen Wang, Horacio Franco, Yun Lei, Chris Bartels, Martin Graciarena:
Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions. 895-899 - Tara N. Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran, David Nahamoo:
Deep scattering spectra with deep neural networks for LVCSR tasks. 900-904 - Shuo-Yiin Chang, Nelson Morgan:
Robust CNN-based speech recognition with Gabor filter kernels. 905-909 - Liang Lu, Steve Renals:
Probabilistic linear discriminant analysis with bottleneck features for speech recognition. 910-914 - Thomas Schatz, Vijayaditya Peddinti, Xuan-Nga Cao, Francis R. Bach, Hynek Hermansky, Emmanuel Dupoux:
Evaluating speech features with the minimal-pair ABX task (II): resistance to noise. 915-919
Speech Analysis I, II
- Marija Tabain, Andrew Butcher, Gavan Breen, Richard Beare:
Lateral formants in three central Australian languages. 920-924 - Alina Khasanova, Jennifer Cole, Mark Hasegawa-Johnson:
Detecting articulatory compensation in acoustic data through linear regression modeling. 925-929 - Jinxi Guo, Angli Liu, Harish Arsikere, Abeer Alwan, Steven M. Lulich:
The relationship between the second subglottal resonance and vowel class, standing height, trunk length, and F0 variation for Mandarin speakers. 930-934 - Nisha Meenakshi, Chiranjeevi Yarra, B. K. Yamini, Prasanta Kumar Ghosh:
Comparison of speech quality with and without sensors in electromagnetic articulograph AG 501 recording. 935-939 - Luciana Albuquerque, Catarina Oliveira, António J. S. Teixeira, Pedro Sá-Couto, João Freitas, Miguel Sales Dias:
Impact of age in the production of European Portuguese vowels. 940-944 - Chengzhu Yu, John H. L. Hansen, Douglas W. Oard:
'Houston, we have a solution': a case study of the analysis of astronaut speech during NASA Apollo 11 for long-term speaker modeling. 945-948
Speech Technologies and Applications
- David Harwath, Alexander Gruenstein, Ian McGraw:
Choosing useful word alternates for automatic speech recognition correction interfaces. 949-953 - Xie Chen, Mark J. F. Gales, Kate M. Knill, Catherine Breslin, Langzhou Chen, K. K. Chin, Vincent Wan:
An initial investigation of long-term adaptation for meeting transcription. 954-958 - Tim Ng, Roger Hsiao, Le Zhang, Damianos G. Karakos, Sri Harish Reddy Mallidi, Martin Karafiát, Karel Veselý, Igor Szöke, Bing Zhang, Long Nguyen, Richard M. Schwartz:
Progress in the BBN keyword search system for the DARPA RATS program. 959-963 - Jan Nouza, Petr Cerva, Jindrich Zdánský, Karel Blavka, Marek Bohac, Jan Silovský, Josef Chaloupka, Michaela Kucharová, Ladislav Seps, Jirí Málek, Michal Rott:
Speech-to-text technology to transcribe and disclose 100,000+ hours of bilingual documents from historical Czech and Czechoslovak radio archive. 964-968 - Emre Yilmaz, Joris Pelemans, Hugo Van hamme:
Automatic assessment of children's reading with the FLaVoR decoding using a phone confusion model. 969-972 - M. Ali Basha Shaik, Zoltán Tüske, Muhammad Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney:
RWTH LVCSR systems for Quaero and EU-Bridge: German, Polish, Spanish and Portuguese. 973-977
Source Separation and Computational Auditory Scene Analysis
- Matthias Zöhrer, Franz Pernkopf:
Single channel source separation with general stochastic networks. 978-982 - Yu Ting Yeung, Tan Lee, Cheung-Chi Leung:
Large-margin conditional random fields for single-microphone speech separation. 983-987 - Ingrid Jafari, Roberto Togneri, Sven Nordholm:
On the use of the Watson mixture model for clustering-based under-determined blind source separation. 988-992 - Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi:
Binary mask estimation based on frequency modulations. 993-997 - Po-Kai Yang, Chung-Chien Hsu, Jen-Tzung Chien:
Bayesian factorization and selection for speech and music separation. 998-1002 - Michael Wohlmayr, Ludwig Mohr, Franz Pernkopf:
Self-adaption in single-channel source separation. 1003-1007
Speech Technologies for Ambient Assisted Living (Special Session)
- Michel Vacher, Benjamin Lecouteux, François Portet:
Multichannel automatic recognition of voice command in a multi-room smart home: an experiment involving seniors and users with visual impairment. 1008-1012 - Oliver Walter, Vladimir Despotovic, Reinhold Haeb-Umbach, Jort F. Gemmeke, Bart Ons, Hugo Van hamme:
An evaluation of unsupervised acoustic model training for a dysarthric speech interface. 1013-1017 - José A. González, Lam Aun Cheah, Jie Bai, Stephen R. Ell, James M. Gilbert, Roger K. Moore, Phil D. Green:
Analysis of phonetic similarity in a silent speech interface based on permanent magnetic articulography. 1018-1022 - Alexey Karpov, Lale Akarun, Hülya Yalçin, Alexander L. Ronzhin, Baris Evrim Demiröz, Aysun Çoban, Milos Zelezný:
Audio-visual signal processing in a multimodal assisted living environment. 1023-1027 - Mirco Ravanelli, Maurizio Omologo:
On the selection of the impulse responses for distant-speech recognition based on contaminated speech training. 1028-1032 - I. Casanueva, Heidi Christensen, Thomas Hain, Phil D. Green:
Adaptive speech recognition and dialogue management for users with speech disorders. 1033-1037 - Bea Yu, Thomas F. Quatieri, James R. Williamson, James C. Mundt:
Prediction of cognitive performance in an animal fluency task based on rate and articulatory markers. 1038-1042 - Carlos Toshinori Ishi, Hiroaki Hatano, Norihiro Hagita:
Analysis of laughter events in real science classes by using multiple environment sensor data. 1043-1047
DNN for ASR
- Tara N. Sainath, I-Hsin Chung, Bhuvana Ramabhadran, Michael Picheny, John A. Gunnels, Brian Kingsbury, George Saon, Vernon Austel, Upendra V. Chaudhari:
Parallel deep neural network training for LVCSR tasks using Blue Gene/Q. 1048-1052 - Samy Bengio, Georg Heigold:
Word embeddings for speech recognition. 1053-1057 - Frank Seide, Hao Fu, Jasha Droppo, Gang Li, Dong Yu:
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs. 1058-1062 - Ryu Takeda, Naoyuki Kanda, Nobuo Nukaga:
Boundary contraction training for acoustic models based on discrete deep neural networks. 1063-1067 - Yotaro Kubo, Jun Suzuki, Takaaki Hori, Atsushi Nakamura:
Restructuring output layers of deep neural networks using minimum risk parameter clustering. 1068-1072 - William Chan, Ian R. Lane:
Distributed asynchronous optimization of convolutional neural networks. 1073-1077 - László Tóth:
Convolutional deep maxout networks for phone recognition. 1078-1082 - Dongpeng Chen, Brian Mak, Sunil Sivadas:
Joint sequence training of phone and grapheme acoustic model based on multi-task learning deep neural networks. 1083-1087 - Roger Hsiao, Tim Ng, Le Zhang, Shivesh Ranjan, Stavros Tsakalidis, Long Nguyen, Richard M. Schwartz:
Improving semi-supervised deep neural network for keyword search in low resource languages. 1088-1091 - Chao Liu, Zhiyong Zhang, Dong Wang:
Pruning deep neural networks by optimal brain damage. 1092-1095
Speaker Recognition - General Topics
- Anderson R. Avila, Milton Orlando Sarria-Paja, Francisco J. Fraga, Douglas D. O'Shaughnessy, Tiago H. Falk:
Improving the performance of far-field speaker verification using multi-condition training: the case of GMM-UBM and i-vector systems. 1096-1100 - Hung-Shin Lee, Yu Tsao, Hsin-Min Wang, Shyh-Kang Jeng:
Clustering-based i-vector formulation for speaker recognition. 1101-1105 - Harish Arsikere, Hitesh Anand Gupta, Abeer Alwan:
Speaker recognition via fusion of subglottal features and MFCCs. 1106-1110 - Hanwu Sun, Bin Ma:
The NIST SRE summed channel speaker recognition system. 1111-1114 - Laura Fernández Gallardo, Michael Wagner, Sebastian Möller:
Advantages of wideband over narrowband channels for speaker verification employing MFCCs and LFCCs. 1115-1119 - Ming Li, Wenbo Liu:
Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features. 1120-1124 - T. Asha, M. S. Saranya, D. S. Karthik Pandia, Srikanth R. Madikeri, Hema A. Murthy:
Feature Switching in the i-vector framework for speaker verification. 1125-1129 - Jinghua Zhong, Weiwu Jiang, Wei Rao, Man-Wai Mak, Helen M. Meng:
PLDA modeling in the Fishervoice subspace for speaker verification. 1130-1134 - Alvin F. Martin, Craig S. Greenberg, Vincent M. Stanford, John M. Howard, George R. Doddington, John J. Godfrey:
Performance factor analysis for the 2012 NIST speaker recognition evaluation. 1135-1138 - Hiroshi Fujimura:
Simultaneous gender classification and voice activity detection using deep neural networks. 1139-1143
Speech Processing with Multi-Modalities
- Ahmed Hussen Abdelaziz, Dorothea Kolossa:
Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons. 1144-1148 - Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata:
Lipreading using convolutional neural network. 1149-1153 - Fei Tao, Carlos Busso:
Lipreading approach for isolated digits recognition under whisper and neutral speech. 1154-1158 - Kenta Masaka, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki:
Multimodal exemplar-based voice conversion using lip features in noisy environments. 1159-1163 - Yunbin Deng, James T. Heaton, Geoffrey S. Meltzner:
Towards a practical silent speech recognition system. 1164-1168 - João Freitas, Artur J. Ferreira, Mário A. T. Figueiredo, António J. S. Teixeira, Miguel Sales Dias:
Enhancing multimodal silent speech interfaces with feature selection. 1169-1173 - William F. Katz, Thomas F. Campbell, Jun Wang, Eric Farrar, Jessie Colette Eubanks, Arvind Balasubramanian, Balakrishnan Prabhakaran, Rob Rennaker:
Opti-speech: a real-time, 3D visual feedback system for speech training. 1174-1178 - Jun Wang, Ashok Samal, Jordan R. Green:
Across-speaker articulatory normalization for speaker-independent silent speech recognition. 1179-1183 - Marlene Zahner, Matthias Janke, Michael Wand, Tanja Schultz:
Conversion from facial myoelectric signals to speech: a unit selection approach. 1184-1188 - Michael Wand, Tanja Schultz:
Towards real-life application of EMG-based speech recognition by using unsupervised adaptation. 1189-1193 - Yuan Liang, Koji Iwano, Koichi Shinoda:
Simple gesture-based error correction interface for smartphone speech recognition. 1194-1198
Normalization and Discriminative Training Methods
- Kshitiz Kumar, Chaojun Liu, Yifan Gong:
Normalization of ASR confidence classifier scores via confidence mapping. 1199-1203 - Tanel Alumäe:
Neural network phone duration model for speech recognition. 1204-1208 - Hasim Sak, Oriol Vinyals, Georg Heigold, Andrew W. Senior, Erik McDermott, Rajat Monga, Mark Z. Mao:
Sequence discriminative distributed training of long short-term memory recurrent neural networks. 1209-1213 - Zhen Huang, Jinyu Li, Chao Weng, Chin-Hui Lee:
Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition. 1214-1218 - Hao Tang, Kevin Gimpel, Karen Livescu:
A comparison of training approaches for discriminative segmental models. 1219-1223 - Erik McDermott, Georg Heigold, Pedro J. Moreno, Andrew W. Senior, Michiel Bacchiani:
Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data. 1224-1228
Paralinguistic and Extralinguistic Information
- Hrishikesh Rao, Jonathan C. Kim, Mark A. Clements, Agata Rozga, Daniel S. Messinger:
Detection of children's paralinguistic events in interaction with caregivers. 1229-1233 - Massimo Pettorino, Elisa Pellegrino:
Age and rhythmic variations: a study on Italian. 1234-1237 - Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski:
Probabilistic acoustic volume analysis for speech affected by depression. 1238-1242 - Elif Bozkurt, Orith Toledo-Ronen, Alexander Sorin, Ron Hoory:
Exploring modulation spectrum features for speech-based depression level classification. 1243-1247 - Florian Hönig, Anton Batliner, Elmar Nöth, Sebastian Schnieder, Jarek Krajewski:
Automatic modelling of depressed speech: relevant features and relevance of gender. 1248-1252 - P. Gangamohan, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana:
Excitation source features for discrimination of anger and happy emotions. 1253-1257
Text Processing for Speech Synthesis
- Ke Wu, Cyril Allauzen, Keith B. Hall, Michael Riley, Brian Roark:
Encoding linear models as weighted finite-state transducers. 1258-1262 - Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion. 1263-1267 - Wei Zhang, Robert A. J. Clark, Yongyuan Wang:
Unsupervised language filtering using the latent Dirichlet allocation. 1268-1272 - BalaKrishna Kolluru, Vincent Wan, Javier Latorre, Kayoko Yanagisawa, Mark J. F. Gales:
Generating multiple-accent pronunciations for TTS using joint sequence model interpolation. 1273-1277 - Gustavo Mendonça, Sandra M. Aluísio:
Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese. 1278-1282 - Matthew P. Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter, Thomas Merritt:
A flexible front-end for HTS. 1283-1287
Cross-language Perception and Production
- Kimiko Tsukada, Felicity Cox, John Hajek:
Cross-language perception of Japanese singleton and geminate consonants: preliminary data from non-native learners of Japanese and native speakers of Italian and Australian English. 1288-1292 - Samra Alispahic, Paola Escudero, Karen E. Mulak:
Difficulty in discriminating non-native vowels: are Dutch vowels easier for Australian English than Spanish listeners? 1293-1296 - Jing Yang, Robert Allen Fox:
Acoustic properties of shared vowels in bilingual Mandarin-English children. 1297-1301 - María Luisa García Lecumberri, Roberto Barra-Chicote, Rubén Pérez Ramón, Junichi Yamagishi, Martin Cooke:
Generating segmental foreign accent. 1302-1306 - Bistra Andreeva, Grazyna Demenko, Bernd Möbius, Frank Zimmerer, Jeanin Jügler, Magdalena Oleskowicz-Popiel:
Differences of pitch profiles in Germanic and Slavic languages. 1307-1311 - Mathieu Avanzi, Guri Bordal, Gélase Nimbona:
The obligatory contour principle in African and European varieties of French. 1312-1316
Text-Dependent Speaker Verification With Short Utterances (Special Session)
- Nicolas Scheffer, Yun Lei:
Content matching for short duration speaker recognition. 1317-1321 - Anthony Larcher, Kong-Aik Lee, Pablo Luis Sordo Martinez, Trung Hieu Nguyen, Bin Ma, Haizhou Li:
Extended RSR2015 for text-dependent speaker verification over VHF channel. 1322-1326 - Tianfan Fu, Yanmin Qian, Yuan Liu, Kai Yu:
Tandem deep features for text-dependent speaker verification. 1327-1331 - Patrick Kenny, Themos Stafylakis, Md. Jahangir Alam, Pierre Ouellet, Marcel Kockmann:
In-domain versus out-of-domain training for text-dependent JFA. 1332-1336 - Hagai Aronowitz, Asaf Rendel:
Domain adaptation for text dependent speaker verification. 1337-1341 - Antonio Miguel, Jesús Antonio Villalba López, Alfonso Ortega, Eduardo Lleida, Carlos Vaquero:
Factor analysis with sampling methods for text dependent speaker recognition. 1342-1346
Speech and Audio Analysis
- Ewout van den Berg, Bhuvana Ramabhadran:
Dictionary-based pitch tracking with dynamic programming. 1347-1351 - Hongbing Hu, Stephen A. Zahorian, Peter Guzewich, Jiang Wu:
Acoustic features for robust classification of Mandarin tones. 1352-1356 - Anastasia Karlsson, Håkan Lundström, Jan-Olof Svantesson:
Preservation of lexical tones in singing in a tone language. 1357-1360 - Theodora Yakoumaki, George P. Kafentzis, Yannis Stylianou:
Emotional speech classification using adaptive sinusoidal modelling. 1361-1365 - Shengbei Wang, Masashi Unoki, Nam Soo Kim:
Formant enhancement based speech watermarking for tampering detection. 1366-1370 - Tom Barker, Hugo Van hamme, Tuomas Virtanen:
Modelling primitive streaming of simple tone sequences through factorisation of modulation pattern tensors. 1371-1375 - Biswajit Dev Sarma, S. R. M. Prasanna:
Detection of vowel onset points in voiced aspirated sounds of Indian languages. 1376-1380 - Akira Sasou:
Accuracy evaluation of esophageal voice analysis based on automatic topology generated-voicing source HMM. 1381-1385 - Xuejun Zhang, Xiang Xie:
Audio watermarking based on multiple echoes hiding for FM radio. 1386-1390
Cross-Lingual and Adaptive Language Modeling
- Petr Motlícek, David Imseng, Milos Cernak, Namhoon Kim:
Development of bilingual ASR system for MediaParl corpus. 1391-1394 - Jie Li, Rong Zheng, Bo Xu:
Investigation of cross-lingual bottleneck features in hybrid ASR systems. 1395-1399 - Oluwapelumi Giwa, Marelie H. Davel:
Language identification of individual words with joint sequence models. 1400-1404 - Xavier Anguera, Jordi Luque, Ciro Gracia:
Audio-to-text alignment for speech recognition with very limited resources. 1405-1409 - Hoang Gia Ngo, Nancy F. Chen, Sunil Sivadas, Bin Ma, Haizhou Li:
A minimal-resource transliteration framework for Vietnamese. 1410-1414 - Heike Adel, Dominic Telaar, Ngoc Thang Vu, Katrin Kirchhoff, Tanja Schultz:
Combining recurrent neural networks and factored language models during decoding of code-switching speech. 1415-1419 - Zoltán Tüske, Pavel Golik, David Nolden, Ralf Schlüter, Hermann Ney:
Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages. 1420-1424 - Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi:
Mixture of latent words language models for domain adaptation. 1425-1429 - Robert Herms, Marc Ritter, Thomas Wilhelm-Stein, Maximilian Eibl:
Improving spoken document retrieval by unsupervised language model adaptation using utterance-based web search. 1430-1433 - Jen-Tzung Chien, Ying-Lan Chang:
The nested Indian buffet process for flexible topic modeling. 1434-1437 - Kirill Levin, Irina Ponomareva, Anna Bulusheva, German A. Chernykh, Ivan Medennikov, Nickolay Merkin, Alexey Prudnikov, Natalia A. Tomashenko:
Automated closed captioning for Russian live broadcasting. 1438-1442
Pronunciation Modeling and Learning
- Lei Wang, Rong Tong:
Pronunciation modeling of foreign words for Mandarin ASR by considering the effect of language transfer. 1443-1447 - Attapol T. Rutherford, Fuchun Peng, Françoise Beaufays:
Pronunciation learning for named-entities through crowd-sourcing. 1448-1452 - Barbara Schuppler, Martine Adda-Decker, Juan Andres Morales-Cordovilla:
Pronunciation variation in read and conversational Austrian German. 1453-1457 - Maider Lehr, Kyle Gorman, Izhak Shafran:
Discriminative pronunciation modeling for dialectal speech recognition. 1458-1462 - Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Jérôme Farinas, Marina Robert:
The goodness of pronunciation algorithm applied to disordered speech. 1463-1467 - Angeliki Metallinou, Jian Cheng:
Using deep neural networks to improve proficiency assessment for children English language learners. 1468-1472 - Han Lu, Sheng-syun Shen, Sz-Rung Shiang, Hung-yi Lee, Lin-Shan Lee:
Alignment of spoken utterances with slide content for easier learning with recorded lectures using structured support vector machine (SVM). 1473-1477 - Richeng Duan, Jinsong Zhang, Wen Cao, Yanlu Xie:
A preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learners. 1478-1481
Show and Tell Session 1
- Kele Xu, Yin Yang, A. Jaumard-Hakoun, Martine Adda-Decker, Angélique Amelot, Samer Al Kork, Lise Crevier-Buchman, Patrick Chawah, Gérard Dreyfus, Thibaut Fux, Claire Pillot-Loiseau, Pierre Roussel, Maureen Stone, Bruce Denby:
3D tongue motion visualization based on ultrasound image sequences. 1482-1483 - Donald Derrick, Tom De Rybel, Greg A. O'Beirne, Jennifer Hay:
Listen with your skin: aerotak speech perception enhancement system. 1484-1485 - László Czap:
Speech assistant system. 1486-1487 - Rafael E. Banchs, Seokhwan Kim:
Spoken dialogue system for restaurant recommendation and reservation. 1488-1489 - Hayakawa Akira, Nick Campbell, Saturnino Luz:
Interlingual map task corpus collection. 1490-1491 - Jordi Centelles, Marta R. Costa-jussà, Rafael E. Banchs:
A client mobile application for Chinese-Spanish statistical machine translation. 1492-1493 - Alberto Benin, Piero Cosi, Giuseppe Riccardo Leone, Giulio Paci:
LuciaWebGL: a new WebGL-based talking head. 1494-1495 - Babak Naderi, Tim Polzehl, André Beyer, Tibor Pilz, Sebastian Möller:
Crowdee: mobile crowdsourcing micro-task platform for celebrating the diversity of languages. 1496-1497 - Roger K. Moore:
On the use of the 'pure data' programming language for teaching and public outreach in speech processing. 1498-1499 - Aleksandr Dubinsky:
Syncwords: a platform for semi-automated closed captioning and subtitles. 1500-1501 - Robert A. J. Clark:
Simple4All. 1502-1503
Statistical Parametric Speech Synthesis
- Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, Simon King:
Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech. 1504-1508 - Thomas Merritt, Tuomo Raitio, Simon King:
Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis. 1509-1513 - Javier Latorre, Vincent Wan, Kayoko Yanagisawa:
Voice expression conversion with factorised HMM-TTS models. 1514-1518 - Kayoko Yanagisawa, Langzhou Chen, Mark J. F. Gales:
Noise-robust TTS speaker adaptation with statistics smoothing. 1519-1523 - Sandrine Brognaux, Benjamin Picart, Thomas Drugman:
Speech synthesis in various communicative situations: impact of pronunciation variations. 1524-1528 - Ming-Qi Cai, Zhen-Hua Ling, Li-Rong Dai:
Formant-controlled speech synthesis using hidden trajectory model. 1529-1533
Voice Activity Detection
- Xiao-Lei Zhang, DeLiang Wang:
Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection. 1534-1538 - Abhay Prasad, Prasanta Kumar Ghosh, Shrikanth S. Narayanan:
Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection. 1539-1543 - Ali Ziaei, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen, Douglas W. Oard:
Speech activity detection for NASA Apollo space missions: challenges and solutions. 1544-1548 - Ming Tu, Xiang Xie, Yishan Jiao:
Towards improving statistical model based voice activity detection. 1549-1552 - Ian Vince McLoughlin:
The use of low-frequency ultrasound for voice activity detection. 1553-1557 - Jeff Ma:
Improving the speech activity detection for the DARPA RATS phase-3 evaluation. 1558-1562
Disordered Speech
- Duc Le, Emily Mower Provost:
Modeling pronunciation, rhythm, and intonation for automatic assessment of speech quality in aphasia rehabilitation. 1563-1567 - Sofia Strömbergsson, Christina Tånnander, Jens Edlund:
Ranking severity of speech errors by their phonological impact in context. 1568-1572 - Juan Rafael Orozco-Arroyave, Florian Hönig, Julián D. Arias-Londoño, Jesús Francisco Vargas-Bonilla, Sabine Skodda, Jan Rusz, Elmar Nöth:
Automatic detection of Parkinson's disease from words uttered in three different languages. 1573-1577 - Jason Lilley, Susan Nittrouer, H. Timothy Bunnell:
Automating an objective measure of pediatric speech intelligibility. 1578-1582 - Mostafa Ali Shahin, Beena Ahmed, Jacqueline McKechnie, Kirrie J. Ballard, Ricardo Gutierrez-Osuna:
A comparison of GMM-HMM and DNN-HMM based pronunciation verification techniques for use in the assessment of childhood apraxia of speech. 1583-1587 - Jeffrey Berry, Andrew Kolb, Cassandra North, Michael T. Johnson:
Acoustic and kinematic characteristics of vowel production through a virtual vocal tract in dysarthria. 1588-1592
Speech and Multimodal Resources
- Michael Wand, Matthias Janke, Tanja Schultz:
The EMG-UKA corpus for electromyographic speech processing. 1593-1597 - Pei Xuan Lee, Darren Wee, Hilary Si Yin Toh, Boon Pang Lim, Nancy F. Chen, Bin Ma:
A whispered Mandarin corpus for speech technology applications. 1598-1602 - Roberto Gretter:
Euronews: a multilingual benchmark for ASR and LID. 1603-1607 - Antigoni Tsiami, Isidoros Rodomagoulakis, Panagiotis Giannoulis, Athanasios Katsamanis, Gerasimos Potamianos, Petros Maragos:
ATHENA: a Greek multi-sensory database for home automation control. 1608-1612 - Marco Matassoni, Ramón Fernandez Astudillo, Athanasios Katsamanis, Mirco Ravanelli:
The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones. 1613-1617 - Diogo Henriques, Isabel Trancoso, Daniel Mendes, Alfredo Ferreira:
Verbal description of LEGO blocks. 1618-1622
Phase Importance in Speech Processing Applications (Special Session)
- Pejman Mowlaee, Rahim Saeidi, Yannis Stylianou:
Phase importance in speech processing applications. 1623-1627 - Estefanía Cano, Mark D. Plumbley, Christian Dittmar:
Phase-based harmonic/percussive separation. 1628-1632 - Gilles Degottex, Nicolas Obin:
Phase distortion statistics as a representation of the glottal source: application to the classification of voice qualities. 1633-1637 - Gilles Degottex, Daniel Erro:
A measure of phase randomness for the harmonic model in speech synthesis. 1638-1642 - Emma Jokinen, Marko Takanen, Hannu Pulakka, Paavo Alku:
Enhancement of speech intelligibility in near-end noise conditions with phase modification. 1643-1647 - S. Aswin Shanmugam, Hema A. Murthy:
A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimation. 1648-1652 - Maria Koutsogiannaki, Olympia Simantiraki, Gilles Degottex, Yannis Stylianou:
The importance of phase on voice quality assessment. 1653-1657 - Karthika Vijayan, Vinay Kumar, K. Sri Rama Murty:
Feature extraction from analytic phase of speech signals for speaker verification. 1658-1662 - Jon Sánchez, Ibon Saratxaga, Inma Hernáez, Eva Navas, Daniel Erro:
A cross-vocoder study of speaker independent synthetic speech detection using phase information. 1663-1667
Phonetics and Phonology 1, 2
- Gang Chen, Soo Jin Park, Jody Kreiman, Abeer Alwan:
Investigating the effect of F0 and vocal intensity on harmonic magnitudes: data from high-speed laryngeal videoendoscopy. 1668-1672 - Elisabeth Delais-Roussarie, Damien Lolive, Hiyon Yoo, Nelly Barbot, Olivier Rosec:
Adapting prosodic chunking algorithm and synthesis system to specific style: the case of dictation. 1673-1677 - Jae-Hyun Sung:
The articulation of lexical and post-lexical palatalization in Korean. 1678-1682 - Diana Archangeli, Samuel Johnston, Jae-Hyun Sung, Muriel Fisher, Michael Hammond, Andrew Carnie:
Articulation and neutralization: a preliminary study of lenition in Scottish Gaelic. 1683-1687 - Kanae Amino, Hisanori Makinae, Tatsuya Kitamura:
Nasality in speech and its contribution to speaker individuality. 1688-1692 - Jason Brown, Eden Matene:
Is speech rhythm an intrinsic property of language? 1693-1697 - Anke Jackschina, Barbara Schuppler, Rudolf Muhr:
Where /ar/ the /r/s in Standard Austrian German? 1698-1702 - Fang Hu, Minghui Zhang:
Diphthongized vowels in the Yi County Hui Chinese dialect. 1703-1707 - Volker Dellwo, Peggy Mok, Mathias Jenny:
Rhythmic variability between some Asian languages: results from an automatic analysis of temporal characteristics. 1708-1711 - Angelika Braun, Daniela Decker:
Listener estimation of speaker age based on whispered speech. 1712-1716 - Benjawan Kasisopa, Virginie Attina, Denis Burnham:
The Lombard effect with Thai lexical tones: an acoustic analysis of articulatory modifications in noise. 1717-1721
Spoken Term Detection and Document Retrieval
- Peng Yang, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection. 1722-1726 - Julien van Hout, Vikramjit Mitra, Yun Lei, Dimitra Vergyri, Martin Graciarena, Arindam Mandal, Horacio Franco:
Recent improvements in SRI's keyword detection system for noisy audio. 1727-1731 - Mitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai:
Utilizing state-level distance vector representation for improved spoken term detection by text and spoken queries. 1732-1736 - Raghavendra Reddy Pappagari, Shekhar Nayak, K. Sri Rama Murty:
Unsupervised spoken word retrieval using Gaussian-Bernoulli restricted Boltzmann machines. 1737-1741 - Basil George, Abhijeet Saxena, Gautam Varma Mantena, Kishore Prahallad, B. Yegnanarayana:
Unsupervised query-by-example spoken term detection using bag of acoustic words and non-segmental dynamic time warping. 1742-1746 - Jie Li, Xiaorui Wang, Bo Xu:
An empirical study of multilingual and low-resource spoken term detection using deep neural networks. 1747-1751 - Peter F. Schulam, Murat Akbacak:
Diagnostic techniques for spoken keyword discovery. 1752-1756 - Sho Kawasaki, Tomoyosi Akiba:
Robust retrieval models for false positive errors in spoken documents. 1757-1761 - Yuan-ming Liou, Yi-Sheng Fu, Hung-yi Lee, Lin-Shan Lee:
Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual features. 1762-1766 - Guillaume Gravier, Nathan Souviraà-Labastie, Sébastien Campion, Frédéric Bimbot:
Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion. 1767-1771 - Fernando García, Emilio Sanchis, Ferran Pla:
Semantically based search in a social speech task. 1772-1776
Prosody and Paralinguistic Information
- Vinay Kumar Mittal, B. Yegnanarayana:
Study of changes in glottal vibration characteristics during laughter. 1777-1781 - Stavros Ntalampiras, Ilyas Potamitis:
On predicting the unpleasantness level of a sound event. 1782-1785 - Bilal Piot, Olivier Pietquin, Matthieu Geist:
Predicting when to laugh with structured classification. 1786-1790 - Benjamin Weiss, Katrin Schoenenberg:
Conversational structures affecting auditory likeability. 1791-1795 - Mathieu Avanzi, George Christodoulides, Damien Lolive, Elisabeth Delais-Roussarie, Nelly Barbot:
Towards the adaptation of prosodic models for expressive text-to-speech synthesis. 1796-1800 - Sho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus. 1801-1805 - Chiu-yu Tseng, Chao-yu Su:
Learning L2 prosody is more difficult than you realize - F0 characteristics and chunking size of L1 English, TW L2 English and TW L1 Mandarin. 1806-1810 - Khiet P. Truong, Jürgen Trouvain:
Investigating prosodic relations between initiating and responding laughs. 1811-1815 - Dmytro Prylipko, Olga Egorow, Ingo Siegert, Andreas Wendemuth:
Application of image processing methods to filled pauses detection from spontaneous speech. 1816-1820 - Sofoklis Kakouros, Okko Räsänen:
Perception of sentence stress in English infant directed speech. 1821-1825 - Noor Alhusna Madzlan, Jing Guang Han, Francesca Bonin, Nick Campbell:
Automatic recognition of attitudes in video blogs - prosodic and visual feature analysis. 1826-1830 - Denys Katerenchuk, David Guy Brizan, Andrew Rosenberg:
"was that your mother on the phone?": classifying interpersonal relationships between dialog participants with lexical and acoustic properties. 1831-1835
Features and Robustness in Speaker and Language Recognition
- Rohan Kumar Das, S. Abhiram, S. R. M. Prasanna, A. G. Ramakrishnan:
Combining source and system information for limited data speaker verification. 1836-1840 - Mireia Díez, Amparo Varona, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel:
New insight into the use of phone log-likelihood ratios as features for language recognition. 1841-1845 - Sriram Ganapathy, Kyu Jeong Han, Samuel Thomas, Mohamed Kamal Omar, Maarten Van Segbroeck, Shrikanth S. Narayanan:
Robust language identification using convolutional neural network features. 1846-1850 - Chengzhu Yu, Gang Liu, John H. L. Hansen:
Acoustic feature transformation using UBM-based LDA for speaker recognition. 1851-1854 - Man-Wai Mak:
SNR-dependent mixture of PLDA for noise robust speaker verification. 1855-1859 - Seyed Omid Sadjadi, Jason W. Pelecanos, Weizhong Zhu:
Nearest neighbor discriminant analysis for robust speaker recognition. 1860-1864
Topic Spotting and Summarization of Spoken Documents
- Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu:
Enhanced language modeling for extractive speech summarization with sentence relatedness information. 1865-1869 - Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linarès, Driss Matrouf, Renato De Mori:
I-vector based representation of highly imperfect automatic transcriptions. 1870-1874 - Catherine Lai, Steve Renals:
Incorporating lexical and prosodic information at different levels for meeting summarization. 1875-1879 - Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato De Mori:
Subspace Gaussian mixture models for dialogues classification. 1880-1884 - Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato De Mori:
Factor analysis based semantic variability compensation for automatic conversation representation. 1885-1889 - Abdessalam Bouchekif, Géraldine Damnati, Delphine Charlet:
Speech cohesion for topic segmentation of spoken contents. 1890-1894
DNN Learning
- Yan Huang, Dong Yu, Chaojun Liu, Yifan Gong:
A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models. 1895-1899 - Michiel Bacchiani, Andrew W. Senior, Georg Heigold:
Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognition. 1900-1904 - Navdeep Jaitly, Vincent Vanhoucke, Geoffrey E. Hinton:
Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models. 1905-1909 - Jinyu Li, Rui Zhao, Jui-Ting Huang, Yifan Gong:
Learning small-size DNN with output-distribution-based criteria. 1910-1914 - Li Deng, John C. Platt:
Ensemble deep learning for speech recognition. 1915-1919 - Yucan Zhou, Qinghua Hu, Jie Liu, Yuan Jia:
Learning conditional random field with hierarchical representations for dialogue act recognition. 1920-1923
Perception of Emotion and Prosody
- Cristiane Hsu, Yi Xu:
Can adolescents with autism perceive emotional prosody? 1924-1928 - Juliane Schmidt, Esther Janse, Odette Scharenborg:
Age, hearing loss and the perception of affective utterances in conversational speech. 1929-1933 - Zhaojun Yang, Shrikanth S. Narayanan:
Analysis of emotional effect on speech-body gesture interplay. 1934-1938 - Cyrielle Chappuis, Didier Grandjean:
When voices get emotional: a study of emotion-enhanced memory and impairment during emotional prosody exposure. 1939-1943 - Margaret Zellers:
Perception of pitch tails at potential turn boundaries in Swedish. 1944-1948 - Robert Fuchs:
Towards a perceptual model of speech rhythm: integrating the influence of f0 on perceived duration. 1949-1953
Deep Neural Networks for Speech Generation and Synthesis (Special Session)
- Ling-Hui Chen, Tuomo Raitio, Cassia Valentini-Botinhao, Junichi Yamagishi, Zhen-Hua Ling:
DNN-based stochastic postfilter for HMM-based speech synthesis. 1954-1958 - Shiyin Kang, Helen M. Meng:
Statistical parametric speech synthesis using weighted multi-distribution deep belief network. 1959-1963 - Yuchen Fan, Yao Qian, Feng-Long Xie, Frank K. Soong:
TTS synthesis with bidirectional LSTM based recurrent neural networks. 1964-1968 - Tuomo Raitio, Antti Suni, Lauri Juvela, Martti Vainio, Paavo Alku:
Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort. 1969-1973
Speech Analysis and Perception
- Shufang Xu:
Acoustic investigation of /th/ lenition in Brunei Mandarin. 1974-1977 - Ting Wang, Hongwei Ding, Jianjing Kuang, Qiuwu Ma:
Mapping emotions into acoustic space: the role of voice quality. 1978-1982 - Nagaraj Mahajan, Nima Mesgarani, Hynek Hermansky:
Principal components of auditory spectro-temporal receptive fields. 1983-1987 - Marwa Thlithi, Thomas Pellegrini, Julien Pinquier, Régine André-Obrecht:
Segmentation in singer turns with the Bayesian information criterion. 1988-1992 - Catherine Inez Watson:
Mappings between vocal tract area functions, vocal tract resonances and speech formants for multiple speakers. 1993-1997 - Sebastian Arndt, Markus Wenzel, Jan-Niklas Antons, Friedemann Köster, Sebastian Möller, Gabriel Curio:
A next step towards measuring perceived quality of speech through physiology. 1998-2001 - Fei Chen, Sharon W. K. Wong, Lena L. N. Wong:
Effect of spectral degradation to the intelligibility of vowel sentences. 2002-2005 - Jeffrey Berry, John Jaeger IV, Melissa Wiedenhoeft, Brittany Bernal, Michael T. Johnson:
Consonant context effects on vowel sensorimotor adaptation. 2006-2010 - Gérard Bailly, Amélie Martin:
Assessing objective characterizations of phonetic convergence. 2011-2015 - Michael I. Mandel, Sarah E. Yoho, Eric W. Healy:
Generalizing time-frequency importance functions across noises, talkers, and phonemes. 2016-2020 - Yatin Mahajan, Jeesun Kim, Chris Davis:
Does elderly speech recognition in noise benefit from spectral and visual cues? 2021-2025 - Kornel Laskowski:
On the conversant-specificity of stochastic turn-taking models. 2026-2030
Intelligibility Enhancement and Predictive Measures
- Toshihiro Sakano, Yosuke Kobayashi, Kazuhiro Kondo:
Single-ended estimation of speech intelligibility using the ITU P.563 feature set. 2031-2035 - Emma Jokinen, Ulpu Remes, Marko Takanen, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku:
Spectral tilt modelling with GMMs for intelligibility enhancement of narrowband telephone speech. 2036-2040 - Friedemann Köster, Sebastian Möller:
Analyzing perceptual dimensions of conversational speech quality. 2041-2045 - Vincent Aubanel, Chris Davis, Jeesun Kim:
Interplay of informational content and energetic masking in speech perception in noise. 2046-2049 - Tudor-Catalin Zorila, Yannis Stylianou:
On spectral and time domain energy reallocation for speech-in-noise intelligibility enhancement. 2050-2054 - Fei Chen, Yi Hu:
Objective quality evaluation of noise-suppressed speech: effects of temporal envelope and fine-structure cues. 2055-2058 - Dongmei Wang, Philipos C. Loizou, John H. L. Hansen:
Noisy speech enhancement based on long term harmonic model to improve speech intelligibility for hearing impaired listeners. 2059-2062 - Cassia Valentini-Botinhao, Mirjam Wester:
Using linguistic predictability and the Lombard effect to increase the intelligibility of synthetic speech in noise. 2063-2067 - Maryam Al Dabel, Jon Barker:
Speech pre-enhancement using a discriminative microscopic intelligibility model. 2068-2072 - Mark J. Harvilla, Richard M. Stern:
Least squares signal declipping for robust speech recognition. 2073-2077
Speech and Language Processing - General Topics
- Haihua Xu, Hang Su, Chng Eng Siong, Haizhou Li:
Semi-supervised training for bottle-neck feature based DNN-HMM hybrid systems. 2078-2082 - Olga Kapralova, John Alex, Eugene Weinstein, Pedro J. Moreno, Olivier Siohan:
A big data approach to acoustic model training corpus selection. 2083-2087 - Patrick Cardinal, Ahmed Ali, Najim Dehak, Yu Zhang, Tuka Al Hanai, Yifan Zhang, James R. Glass, Stephan Vogel:
Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera. 2088-2092 - Martin Sundermeyer, Ralf Schlüter, Hermann Ney:
rwthlm - the RWTH Aachen University neural network language modeling toolkit. 2093-2097 - Wei-Chen Cheng, Stanley Kok, Hoai Vu Pham, Hai Leong Chieu, Kian Ming Adam Chai:
Language modeling with sum-product networks. 2098-2102 - Xiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammad Sadegh Rasooli, Owen Rambow, Nizar Habash, Vaibhava Goel:
Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA babel program. 2103-2107 - Shammur Absar Chowdhury, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Ioannis Klasinas:
Cross-language transfer of semantic annotation via targeted crowdsourcing. 2108-2112 - Dilek Hakkani-Tür, Asli Celikyilmaz, Larry P. Heck, Gökhan Tür, Geoffrey Zweig:
Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding. 2113-2117 - Philip N. Garner, David Imseng, Thomas Meyer:
Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch. 2118-2122 - Salima Harrat, Karima Meftouh, Mourad Abbas, Kamel Smaïli:
Building resources for Algerian Arabic dialects. 2123-2127
Show and Tell Session 1, 1
- Patrick Chawah, Samer Al Kork, Thibaut Fux, Martine Adda-Decker, Angélique Amelot, Nicolas Audibert, Bruce Denby, Gérard Dreyfus, A. Jaumard-Hakoun, Claire Pillot-Loiseau, Pierre Roussel, Maureen Stone, Kele Xu, Lise Crevier-Buchman:
An educational platform to capture, visualize and analyze rare singing. 2128-2129 - Kwang Myung Jeon, Chan Jun Chun, Woo Kyeong Seong, Hong Kook Kim, Myung Kyu Choi:
Single-channel speech enhancement based on non-negative matrix factorization and online noise adaptation. 2130-2131 - Dieter Maurer, Peggy Mok, Daniel Friedrichs, Volker Dellwo:
Intelligibility of high-pitched vowel sounds in the singing and speaking of a female Cantonese opera singer. 2132-2133 - Pejman Mowlaee, Mario Kaoru Watanabe, Rahim Saeidi:
Iterative refinement of amplitude and phase in single-channel speech enhancement. 2134-2135 - Sophie Roekhaut, Sandrine Brognaux, Richard Beaufort, Thierry Dutoit:
eLite-HTS: an NLP tool for French HMM-based speech synthesis. 2136-2137 - Andreea I. Niculescu, Rafael E. Banchs, Ridong Jiang, Seokhwan Kim, Kheng Hui Yeo, Arthur Niswar:
SARA - Singapore's automated responsive assistant for the touristic domain. 2138-2139 - Andrew R. Plummer, Eric Riebling, Anuj Kumar, Florian Metze, Eric Fosler-Lussier, Rebecca Bates:
The speech recognition virtual kitchen: launch party. 2140-2141 - Kyle Marek-Spartz, Benjamin Knoll, Robert Bill, S. Thomas Christie, Serguei V. S. Pakhomov:
System for automated speech and language analysis (SALSA). 2142-2143 - Ikuyo Masuda-Katsuse:
Pronunciation practice support system for children who have difficulty correctly pronouncing words. 2144-2145 - Joris Driesen, Alexandra Birch, Simon Grimsey, Saeid Safarfashandi, Juliet Gauthier, Matt Simpson, Steve Renals:
Automated production of true-cased punctuated subtitles for weather and news broadcasts. 2146-2147 - Minghui Dong, Siu Wa Lee, Haizhou Li, Paul Y. Chan, Xuejian Peng, Jochen Walter Ehnes, Dong-Yan Huang:
I2R speech2singing perfects everyone's singing. 2148-2149
Language, Dialect and Accent Recognition
- Luciana Ferrer, Yun Lei, Mitchell McLaren, Nicolas Scheffer:
Spoken language recognition based on senone posteriors. 2150-2154 - Javier Gonzalez-Dominguez, Ignacio López-Moreno, Hasim Sak, Joaquin Gonzalez-Rodriguez, Pedro J. Moreno:
Automatic language identification using long short-term memory recurrent neural networks. 2155-2159 - Brecht Desplanques, Kris Demuynck, Jean-Pierre Martens:
Robust language recognition via adaptive language factor extraction. 2160-2164 - Hamid Behravan, Ville Hautamäki, Sabato Marco Siniscalchi, Elie Khoury, Tommi Kurki, Tomi Kinnunen, Chin-Hui Lee:
Dialect levelling in Finnish: a universal speech attribute approach. 2165-2169 - Mingming Chen, Zhanlei Yang, Hao Zheng, Wenju Liu:
Improving native accent identification using deep neural networks. 2170-2174 - Marie-José Kolly, Adrian Leemann, Volker Dellwo:
Foreign accent recognition based on temporal information contained in lowpass-filtered speech. 2175-2179
Adaptation 1, 2
- Penny Karanasou, Yongqiang Wang, Mark J. F. Gales, Philip C. Woodland:
Adaptation of deep neural network acoustic models using factorised i-vectors. 2180-2184 - Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura, Steven J. Rennie, Vaibhava Goel:
Regularized feature-space discriminative adaptation for robust ASR. 2185-2188 - Yajie Miao, Hao Zhang, Florian Metze:
Towards speaker adaptive training of deep neural network acoustic models. 2189-2193 - Arseniy Gorin, Denis Jouvet:
Component structuring and trajectory modeling for speech recognition. 2194-2198 - Rama Doddipatla, Madina Hasan, Thomas Hain:
Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognition. 2199-2203 - Zhao You, Bo Xu:
Improving wideband acoustic models using mixed-bandwidth training data via DNN adaptation. 2204-2208
Speaker Localization
- Kushagra Singhal, Rajesh M. Hegde:
A sparse reconstruction method for speech source localization using partial dictionaries over a spherical microphone array. 2209-2213 - Weiwei Cui, Jaeyeon Cho, Seungyeol Lee:
A robust TDOA estimation method for in-car-noise environments. 2214-2217 - Lorin Netsch, Jacek Stachurski:
Robust low-resource sound localization in correlated noise. 2218-2222 - Dongwen Ying, Ruohua Zhou, Junfeng Li, Jielin Pan, Yonghong Yan:
Direction-of-arrival estimation of multiple speakers using a planar array. 2223-2227 - Wei Xue, Shan Liang, Wenju Liu:
Weighted spatial bispectrum correlation matrix for DOA estimation in the presence of interferences. 2228-2232 - Mariem Bouafif, Zied Lachiri:
Multi-sources separation for sound source localization. 2233-2237
Speech Analysis I, II
- Yi Luan, Richard A. Wright, Mari Ostendorf, Gina-Anne Levow:
Relating automatic vowel space estimates to talker intelligibility. 2238-2242 - Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino:
Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation. 2243-2247 - Christian Fischer Pedersen, Tom Bäckström:
Sparse time-frequency representation of speech by the Vandermonde transform. 2248-2252 - Mahesh Kumar Nandwana, John H. L. Hansen:
Analysis and identification of human scream: implications for speaker recognition. 2253-2257 - Dongmei Wang, Philipos C. Loizou, John H. L. Hansen:
F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification. 2258-2262 - Malcolm Slaney, Michael L. Seltzer:
The influence of pitch and noise on the discriminability of filterbank features. 2263-2267
Deep Neural Networks for Speech Generation and Synthesis (Special Session)
- Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory:
Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. 2268-2272 - Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai:
Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree. 2273-2277 - Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki:
High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion. 2278-2282 - Feng-Long Xie, Yao Qian, Yuchen Fan, Frank K. Soong, Haifeng Li:
Sequence error (SE) minimization training of neural network for voice conversion. 2283-2287 - Florent Bocquelet, Thomas Hueber, Laurent Girin, Pierre Badin, Blaise Yvert:
Robust articulatory speech synthesis using deep neural networks for BCI applications. 2288-2292
Speech Synthesis I-III
- Diandra Fabre, Thomas Hueber, Pierre Badin:
Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression. 2293-2297 - Patrick Lumban Tobing, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura, Ayu Purwarianti:
Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models. 2298-2302 - Chuang Ding, Pengcheng Zhu, Lei Xie, Dongmei Jiang, Zhong-Hua Fu:
Speech-driven head motion synthesis using neural networks. 2303-2307 - Peng Song, Yun Jin, Wenming Zheng, Li Zhao:
Text-independent voice conversion using speaker model alignment method from non-parallel speech. 2308-2312 - Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai:
Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes. 2313-2317 - Gerard Sanchez, Hanna Silén, Jani Nurminen, Moncef Gabbouj:
Hierarchical modeling of F0 contours for voice conversion. 2318-2321 - Kento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo, Hirokazu Kameoka:
Speech prosody generation for text-to-speech synthesis based on generative model of F0 contours. 2322-2326 - Xiayu Chen, Yang Zhang, Mark Hasegawa-Johnson:
An iterative approach to decision tree training for context dependent speech synthesis. 2327-2331 - Thi Thu Trang Nguyen, Albert Rilliard, Do Dat Tran, Christophe d'Alessandro:
Prosodic phrasing modeling for Vietnamese TTS using syntactic information. 2332-2336 - Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi:
Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling. 2337-2341 - Qiang Fang, Jianguo Wei, Fang Hu:
Reconstruction of mistracked articulatory trajectories. 2342-2345
Speech Representation, Detection and Classification
- Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso A. Poggio:
Phone classification by a hierarchy of invariant representation layers. 2346-2350 - Mark Sinclair, Peter Bell, Alexandra Birch, Fergus McInnes:
A semi-Markov model for speech segmentation with an utterance-break prior. 2351-2355 - G. Aneeja, B. Yegnanarayana:
Speech detection in transient noises. 2356-2360 - Yongjun He, Guanglu Sun, Guibin Zheng, Jiqing Han:
Evaluation of dictionary for sparse coding in speech processing. 2361-2364 - Colin Vaz, Vikram Ramanarayanan, Shrikanth S. Narayanan:
Joint filtering and factorization for recovering latent structure from noisy speech data. 2365-2369 - Ascensión Gallardo-Antolín, Juan Manuel Montero, Simon King:
A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis. 2370-2374 - Taichi Asami, Ryo Masumura, Hirokazu Masataki, Sumitaka Sakauchi:
Read and spontaneous speech classification based on variance of GMM supervectors. 2375-2379 - Navid Shokouhi, Seyed Omid Sadjadi, John H. L. Hansen:
Co-channel speech detection via spectral analysis of frequency modulated sub-bands. 2380-2384 - Stephen Voinea, Chiyuan Zhang, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso A. Poggio:
Word-level invariant representations from acoustic waveforms. 2385-2389 - Paul Dalsgaard, Ove Andersen:
On closed form calculation of line spectral frequencies (LSF). 2390-2394 - Chahid Ouali, Pierre Dumouchel, Vishwa Gupta:
Robust features for content-based audio copy detection. 2395-2399 - Yi Jiang, DeLiang Wang, Runsheng Liu:
Binaural deep neural network classification for reverberant speech segregation. 2400-2404
Feature Extraction and Modeling for ASR 1, 2
- Jürgen T. Geiger, Jort F. Gemmeke, Björn W. Schuller, Gerhard Rigoll:
Investigating NMF speech enhancement for neural network based acoustic models. 2405-2409 - Jason Lilley, James J. Mahshie, H. Timothy Bunnell:
Automatic speech feature classification for children with cochlear implants. 2410-2414 - Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
Sequential maximum mutual information linear discriminant analysis for speech recognition. 2415-2419 - Shabnam Ghaffarzadegan, Hynek Boril, John H. L. Hansen:
Model and feature based compensation for whispered speech recognition. 2420-2424 - Amir R. Moghimi, Bhiksha Raj, Richard M. Stern:
Post-masking: a hybrid approach to array processing for speech recognition. 2425-2429 - Fernando de-la-Calle-Silos, Francisco José Valverde Albacete, Ascensión Gallardo-Antolín, Carmen Peláez-Moreno:
ASR feature extraction with morphologically-filtered power-normalized cochleograms. 2430-2434 - Angel Mario Castro Martinez, Niko Moritz, Bernd T. Meyer:
Should deep neural nets have ears? the role of auditory features in deep learning approaches. 2435-2439 - Charles Fox, Thomas Hain:
Extending Limabeam with discrimination and coarse gradients. 2440-2444 - Sankar Mukherjee, Shyamal Kumar Das Mandal:
Generation of F0 contour using deep Boltzmann machine and twin Gaussian process hybrid model for Bengali language. 2445-2449 - Juan Andres Morales-Cordovilla, Hannes Pessentheiner, Martin Hagmüller, Gernot Kubin:
Room localization for distant speech recognition. 2450-2453 - Sara Bahaadini, Afsaneh Asaei, David Imseng, Hervé Bourlard:
Posterior-based sparse representation for automatic speech recognition. 2454-2458
Spoken Term Detection for Low-Resource Languages I, II
- Xavier Anguera, Luis Javier Rodríguez-Fuentes, Igor Szöke, Andi Buzo, Florian Metze, Mikel Peñagarikano:
Query-by-example spoken term detection on multilingual unconstrained speech. 2459-2463 - Victor Soto, Lidia Mangu, Andrew Rosenberg, Julia Hirschberg:
A comparison of multiple methods for rescoring keyword search lists for low resource languages. 2464-2468 - Damianos G. Karakos, Richard M. Schwartz:
Subword and phonetic search for detecting out-of-vocabulary keywords. 2469-2473 - Yun Wang, Florian Metze:
An in-depth comparison of keyword specific thresholding and sum-to-one score normalization. 2474-2478 - Hung-yi Lee, Yu Zhang, Ekapol Chuangsuwanich, James R. Glass:
Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages. 2479-2483 - Viet Bac Le, Lori Lamel, Abdelkhalek Messaoudi, William Hartmann, Jean-Luc Gauvain, Cécile Woehrling, Julien Despres, Anindya Roy:
Developing STT and KWS systems using limited language resources. 2484-2488
Voice Conversion
- Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Masami Akamine:
GMM-based bandwidth extension using sub-band basis spectrum model. 2489-2493 - Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech. 2494-2498 - Siu Wa Lee, Zhizheng Wu, Minghui Dong, Xiaohai Tian, Haizhou Li:
A comparative study of spectral transformation techniques for singing voice synthesis. 2499-2503 - Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose:
Application of matrix variate Gaussian mixture model to statistical voice conversion. 2504-2508 - Zhizheng Wu, Chng Eng Siong, Haizhou Li:
Joint nonnegative matrix factorization for exemplar-based voice conversion. 2509-2513 - Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Statistical singing voice conversion with direct waveform modification based on the spectrum differential. 2514-2518
Speech and Audio Segmentation and Classification
- Daniel P. W. Ellis, Hiroyuki Satoh, Zhuo Chen:
Detecting proximity from personal audio recordings. 2519-2523 - Huy Phan, Marco Maaß, Radoslaw Mazur, Alfred Mertins:
Acoustic event detection and localization with regression forests. 2524-2528 - Marc Ferras, Hervé Bourlard:
Multi-source posteriors for speech activity detection on public talks. 2529-2532 - Jonathan William Dennis, Tran Huy Dat, Chng Eng Siong:
Analysis of spectrogram image methods for sound event classification. 2533-2537 - Aharon Satt, Ron Hoory, Alexandra König, Pauline Aalten, Philippe H. Robert:
Speech-based automatic and robust detection of very early dementia. 2538-2542 - Ganna Raboshchuk, Climent Nadeu, Omid Ghahabi, Sergi Solvez, Blanca Muñoz Mahamud, Ana Riverola de Veciana, Santiago Navarro Hervas:
On the acoustic environment of a neonatal intensive care unit: initial description, and detection of equipment alarms. 2543-2547
Language Acquisition
- Robert Allen Fox, Ewa Jacewicz, Florence Hardjono:
Non-native perception of regionally accented speech in a multitalker context. 2548-2552 - Giuseppina Turco, Elisabeth Delais-Roussarie:
A crosslinguistic and acquisitional perspective on intonational rises in French. 2553-2557 - Jung-Yueh Tu, Yuwen Hsiung, Min-Da Wu, Yao-Ting Sung:
Error patterns of Mandarin disyllabic tones by Japanese learners. 2558-2562 - Victoria Leong, Marina Kalashnikova, Denis Burnham, Usha Goswami:
Infant-directed speech enhances temporal rhythmic structure in the envelope. 2563-2567 - Dilu Wewalaarachchi, Leher Singh:
Influences of tone sandhi on word recognition in preschool children. 2568-2571 - Hwee Hwee Goh, Charlene Hu, Kheng Hui Yeo, Leher Singh:
Lexical representation of consonant, vowels and tones in early childhood. 2572-2574
Speech Perception
- Ana A. Francisco, Alexandra Jesse, Margriet A. Groen, James M. McQueen:
Audiovisual temporal sensitivity in typical and dyslexic adult readers. 2575-2579 - Donald Derrick, Greg A. O'Beirne, Tom De Rybel, Jennifer Hay:
Aero-tactile integration in fricatives: converting audio to air flow information for speech perception enhancement. 2580-2584 - Guangting Mai:
Relative importance of AM and FM cues for speech comprehension: effects of speaking rate and their implications for neurophysiological processing of speech. 2585-2589 - Louise Stringer, Paul Iverson:
The effect of regional and non-native accents on word recognition processes: a comparison of EEG responses in quiet to speech recognition in noise. 2590-2594 - Manson Cheuk-Man Fong, James W. Minett, Thierry Blu, William S.-Y. Wang:
Towards a neural measure of perceptual distance - classification of electroencephalographic responses to synthetic vowels. 2595-2599 - Odette Scharenborg, Eric Sanders, Bert Cranen:
Collecting a corpus of Dutch noise-induced 'slips of the ear'. 2600-2604
Language and Lexical Modeling
- Tuka Al Hanai, James R. Glass:
Lexical modeling for Arabic ASR: a systematic approach. 2605-2609 - Luiza Orosanu, Denis Jouvet:
Hybrid language models for speech transcription. 2610-2614 - Ankur Gandhe, Florian Metze, Ian R. Lane:
Neural network language models for low resource languages. 2615-2619 - Siva Reddy Gangireddy, Fergus McInnes, Steve Renals:
Feed forward pre-training for recurrent neural network language models. 2620-2624 - Brandon C. Roy, Soroush Vosoughi, Deb Roy:
Grounding language models in spatiotemporal context. 2625-2629 - Shahab Jalalvand, Daniele Falavigna:
Direct word graph rescoring using A* search and RNNLM. 2630-2634 - Ciprian Chelba, Tomás Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson:
One billion word benchmark for measuring progress in statistical language modeling. 2635-2639 - Andrea Schnall, Martin Heckmann:
Integrating sequence information in the audio-visual detection of word prominence in a human-machine interaction scenario. 2640-2644 - Fadi Biadsy, Keith B. Hall, Pedro J. Moreno, Brian Roark:
Backoff inspired features for maximum entropy language models. 2645-2649 - Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff, Tanja Schultz:
BioKIT - real-time decoder for biosignal processing. 2650-2654 - David F. Harwath, James R. Glass:
Speech recognition without a lexicon - bridging the gap between graphemic and phonetic systems. 2655-2659
Speech Enhancement (Single- and Multi-Channel) 1, 2
- Shengkui Zhao, Douglas L. Jones:
A new auxiliary-vector algorithm with conjugate orthogonality for speech enhancement. 2660-2664 - Neehar Jathar, Preeti Rao:
Acoustic characteristics of critical message utterances in noise applied to speech intelligibility enhancement. 2665-2669 - Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee:
Dynamic noise aware training for speech enhancement based on deep neural networks. 2670-2674 - Pasi Pertilä, Joonas Nikunen:
Microphone array post-filtering using supervised machine learning for speech enhancement. 2675-2679 - Senthil Kumar Mani, Jitendra Kumar Dhiman, K. Sri Rama Murty:
Novel speech duration modifier for packet based communication system. 2680-2684 - Ding Liu, Paris Smaragdis, Minje Kim:
Experiments on deep learning for speech denoising. 2685-2689 - Nasser Mohammadiha, Simon Doclo:
Single-channel dynamic exemplar-based speech enhancement. 2690-2694 - Akihiro Kato, Ben Milner:
Using hidden Markov models for speech enhancement. 2695-2699 - Lukas Pfeifenberger, Franz Pernkopf:
Blind source extraction based on a direction-dependent a-priori SNR. 2700-2704 - Carlos Eduardo Cancino Chacón, Pejman Mowlaee:
Least squares phase estimation of mixed signals. 2705-2709 - Ji Ming, Danny Crookes:
Speech enhancement from additive noise and channel distortion - a corpus-based approach. 2710-2714
Robust ASR 1, 2
- Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim, Richard M. Stern:
Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression. 2715-2718 - Rui Zhao, Jinyu Li, Yifan Gong:
Variable-component deep neural network for robust speech recognition. 2719-2723 - Yu-Chen Kao, Yi-Ting Wang, Berlin Chen:
Effective modulation spectrum factorization for robust speech recognition. 2724-2728 - Suman V. Ravuri:
Hybrid MLP/structured-SVM tandem systems for large vocabulary and robust ASR. 2729-2733 - Chanwoo Kim, Kean K. Chin, Michiel Bacchiani, Richard M. Stern:
Robust speech recognition using temporal masking and thresholding algorithm. 2734-2738 - Xurong Xie, Rongfeng Su, Xunying Liu, Lan Wang:
Deep neural network bottleneck features for generalized variable parameter HMMs. 2739-2743 - Suliang Bu, Yanmin Qian, Kai Yu:
A novel dynamic parameters calculation approach for model compensation. 2744-2748 - Naoaki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa:
Speech recognition based on Itakura-Saito divergence and dynamics/sparseness constraints from mixed sound of speech and music by non-negative matrix factorization. 2749-2753 - Yong-Joo Chung:
Noise robust speech recognition based on noise-adapted HMMs using speech feature compensation. 2754-2758 - Md. Jahangir Alam, Patrick Kenny, Pierre Dumouchel, Douglas D. O'Shaughnessy:
Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition. 2759-2763
Spoken Term Detection for Low-Resource Languages I, II
- William Hartmann, Viet Bac Le, Abdelkhalek Messaoudi, Lori Lamel, Jean-Luc Gauvain:
Comparing decoding strategies for subword-based keyword spotting in low-resourced languages. 2764-2768 - Min Ma, Justin Richards, Victor Soto, Julia Hirschberg, Andrew Rosenberg:
Strategies for rescoring keyword search results using word-burst and acoustic features. 2769-2773 - Di Xu, Florian Metze:
Word-based probabilistic phonetic retrieval for low-resource spoken term detection. 2774-2778 - I-Fan Chen, Nancy F. Chen, Chin-Hui Lee:
A keyword-boosted sMBR criterion to enhance keyword search performance in deep neural network based acoustic modeling. 2779-2783 - Justin T. Chiu, Yun Wang, Jan Trmal, Daniel Povey, Guoguo Chen, Alexander I. Rudnicky:
Combination of FST and CN search in spoken term detection. 2784-2788 - Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal, Sanjeev Khudanpur:
Low-resource open vocabulary keyword search using point process models. 2789-2793
Speech Coding and Transmission
- Tom Bäckström, Christian R. Helmrich:
Decorrelated innovative codebooks for ACELP using factorization of autocorrelation matrix. 2794-2798 - Milos Cernak, Alexandros Lazaridis, Philip N. Garner, Petr Motlícek:
Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding. 2799-2803 - Hannu Pulakka, Anssi Rämö, Ville Myllylä, Henri Toukomaa, Paavo Alku:
Subjective voice quality evaluation of artificial bandwidth extension: comparing different audio bandwidths and speech codecs. 2804-2808 - Zhong-Hua Fu, Lei Xie:
Stereo acoustic echo suppression using widely linear filtering in the frequency domain. 2809-2813 - Bong-Ki Lee, Inyoung Hwang, Jihwan Park, Joon-Hyuk Chang:
Enhanced muting method in packet loss concealment of ITU-T G.722 using sigmoid function with on-line optimized parameters. 2814-2818 - Chao Wu, Kaiyu Jiang, Yanmeng Guo, Qiang Fu, Yonghong Yan:
A robust step-size control algorithm for frequency domain acoustic echo cancellation. 2819-2823
Speech Enhancement (Single- and Multi-Channel) 1, 2
- Zhiyuan Zhou, Zhaogui Ding, Weifeng Li, Zhiyong Wu, Longbiao Wang, Qingmin Liao:
Multi-channel speech enhancement using sparse coding on local time-frequency structures. 2824-2827 - Seyedmahdad Mirsamadi, John H. L. Hansen:
Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications. 2828-2832 - Zhuo Chen, Brian McFee, Daniel P. W. Ellis:
Speech enhancement by low-rank and convolutive dictionary spectrogram decomposition. 2833-2837 - Xabier Jaureguiberry, Emmanuel Vincent, Gaël Richard:
Multiple-order non-negative matrix factorization for speech enhancement. 2838-2842 - Tae Gyoon Kang, Kisoo Kwon, Jong Won Shin, Nam Soo Kim:
NMF-based speech enhancement incorporating deep neural network. 2843-2846 - Sukanya Sonowal, Kisoo Kwon, Nam Soo Kim, Jong Won Shin:
A data-driven approach to speech enhancement using Gaussian process. 2847-2851
Unsupervised or Corrective Lexical Modeling
- E. Byambakhishig, Katsuyuki Tanaka, Ryo Aihara, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki:
Error correction of automatic speech recognition based on normalized web distance. 2852-2856 - Erinç Dikici, Murat Saraçlar:
Unsupervised training methods for discriminative language modeling. 2857-2861 - Long Qin, Alexander I. Rudnicky:
Building a vocabulary self-learning speech recognition system. 2862-2866 - Tim Schlippe, Matthias Merz, Tanja Schultz:
Methods for efficient semi-automatic pronunciation dictionary bootstrapping. 2867-2871 - Murat Akbacak, Dilek Hakkani-Tür, Gökhan Tür:
Rapidly building domain-specific entity-centric language models using semantic web knowledge sources. 2872-2876 - Ann Lee, James R. Glass:
Context-dependent pronunciation error pattern discovery with limited annotations. 2877-2881
Meta Data
- Ashtosh Sapru, Hervé Bourlard:
Detecting speaker roles and topic changes in multiparty conversations using latent topic models. 2882-2886 - Chenglin Xu, Lei Xie, Guangpu Huang, Xiong Xiao, Engsiong Chng, Haizhou Li:
A deep neural network approach for sentence boundary detection in broadcast news. 2887-2891 - Rahul Gupta, Sankaranarayanan Ananthakrishnan, Zhaojun Yang, Shrikanth S. Narayanan:
Variable Span disfluency detection in ASR transcripts. 2892-2896 - Camille Dutrey, Chloé Clavel, Sophie Rosset, Ioana Vasilescu, Martine Adda-Decker:
A CRF-based approach to automatic disfluency detection in a French call-centre corpus. 2897-2901 - Madina Hasan, Rama Doddipatla, Thomas Hain:
Multi-pass sentence-end detection of lecture speech. 2902-2906 - Victoria Zayats, Mari Ostendorf, Hannaneh Hajishirzi:
Multi-domain disfluency and repair detection. 2907-2911
Speech Synthesis I-III
- Langzhou Chen, Norbert Braunschweiler:
Enabling controllability for continuous expression space. 2912-2916 - Takashi Nose, Akinori Ito:
Analysis of spectral enhancement using global variance in HMM-based speech synthesis. 2917-2921 - Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, Junichi Yamagishi:
Intelligibility analysis of fast synthesized speech. 2922-2926 - Susana Palmaz López-Peláez, Robert A. J. Clark:
Speech synthesis reactive to dynamic noise environmental conditions. 2927-2931 - Timo Baumann:
Partial representations improve the prosody of incremental speech synthesis. 2932-2936 - Pirros Tsiakoulis, Catherine Breslin, Milica Gasic, Matthew Henderson, Dongho Kim, Steve J. Young:
Dialogue context sensitive speech synthesis using factorized decision trees. 2937-2941 - Xin Wang, Zhen-Hua Ling, Li-Rong Dai:
Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis. 2942-2946 - Dhananjaya N. Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Kalle J. Palomäki, Mircea Giurgiu, Mikko Kurimo:
On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech. 2947-2951 - Cong-Thanh Do, Marc Evrard, A. Leman, Christophe d'Alessandro, Albert Rilliard, J.-L. Crebouw:
Objective evaluation of HMM-based speech synthesis system using Kullback-Leibler divergence. 2952-2956 - Javier Latorre, Kayoko Yanagisawa, Vincent Wan, BalaKrishna Kolluru, Mark J. F. Gales:
Speech intonation for TTS: study on evaluation methodology. 2957-2961
Adaptation 1, 2
- Thomas Pellegrini, Vahid Hedayati, Isabel Trancoso, Annika Hämäläinen, Miguel Sales Dias:
Speaker age estimation for elderly speech recognition in European Portuguese. 2962-2966 - Maryam Najafian, Andrea DeMarco, Stephen J. Cox, Martin J. Russell:
Unsupervised model selection for recognition of regional accented speech. 2967-2971 - Wen-Lin Zhang, Dan Qu, Wei-Qiang Zhang, Bi-Cheng Li:
Speaker adaptation based on sparse and low-rank eigenphone matrix estimation. 2972-2976 - Yan Huang, Dong Yu, Chaojun Liu, Yifan Gong:
Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation. 2977-2981 - Syed Shahnawazuddin, Rohit Sinha:
A low complexity model adaptation approach involving sparse coding over multiple dictionaries. 2982-2986 - Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta:
Effect of frequency weighting on MLP-based speaker canonicalization. 2987-2991 - Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Chao Weng, Chin-Hui Lee:
Feature space maximum a posteriori linear regression for adaptation of deep neural networks. 2992-2996 - Natalia A. Tomashenko, Yuri Y. Khokhlov:
Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing. 2997-3001 - Martin Karafiát, Frantisek Grézl, Karel Veselý, Mirko Hannemann, Igor Szöke, Jan Cernocký:
BUT 2014 Babel system: analysis of adaptation in NN based systems. 3002-3006 - Mickael Rouvier, Benoît Favre:
Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers? 3007-3011
Language Recognition
- Bing Jiang, Yan Song, Si Wei, Ian Vince McLoughlin, Li-Rong Dai:
Task-aware deep bottleneck features for spoken language identification. 3012-3016 - Rong Tong, Bin Ma, Haizhou Li:
Virtual example for phonotactic language recognition. 3017-3021 - Wei-Wei Liu, Wei-Qiang Zhang, Jia Liu:
Phonotactic language recognition based on time-gap-weighted lattice kernels. 3022-3026 - Maarten Van Segbroeck, Ruchir Travadi, Shrikanth S. Narayanan:
UBM fused total variability modeling for language identification. 3027-3031 - Mireia Díez, Mikel Peñagarikano, Germán Bordel, Amparo Varona, Luis Javier Rodríguez-Fuentes:
On the complementarity of short-time Fourier analysis windows of different lengths for improved language recognition. 3032-3036 - Ruchir Travadi, Maarten Van Segbroeck, Shrikanth S. Narayanan:
Modified-prior i-vector estimation for language identification of short duration utterances. 3037-3041 - Luis Fernando D'Haro, Ricardo de Córdoba, Christian Salamea Palacios, Javier Ferreiros:
Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers. 3042-3046 - Oldrich Plchot, Mireia Díez, Mehdi Soufifar, Lukás Burget:
PLLR features in language recognition system for RATS. 3047-3051 - Yin-Lai Yeong, Tien-Ping Tan:
Language identification of code switching sentences and multilingual sentences of under-resourced languages by using multi structural word information. 3052-3055