INTERSPEECH 2012: Portland, Oregon, USA
- 13th Annual Conference of the International Speech Communication Association, INTERSPEECH 2012, Portland, Oregon, USA, September 9-13, 2012. ISCA 2012
An Information-Extraction Approach to Speech Analysis and Processing
- Chin-Hui Lee:
An Information-Extraction Approach to Speech Analysis and Processing. 1-5
ASR: Deep Neural Networks I
- Dong Yu, Li Deng, Frank Seide:
Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks. 6-9
- Brian Kingsbury, Tara N. Sainath, Hagen Soltau:
Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization. 10-13
- George Saon, Brian Kingsbury:
Discriminative feature-space transforms using deep neural networks. 14-17
- Zoltán Tüske, Ralf Schlüter, Hermann Ney, Martin Sundermeyer:
Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both? 18-21
- Andrew L. Maas, Quoc V. Le, Tyler M. O'Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng:
Recurrent Neural Networks for Noise Reduction in Robust ASR. 22-25
- Xie Chen, Adam Eversole, Gang Li, Dong Yu, Frank Seide:
Pipelined Back-Propagation for Context-Dependent Deep Neural Networks. 26-29
Language Recognition
- Hynek Boril, Abhijeet Sangwan, John H. L. Hansen:
Arabic Dialect Identification - 'Is the Secret in the Silence?' and Other Observations. 30-33
- Craig S. Greenberg, Alvin F. Martin, Mark A. Przybocki:
The 2011 NIST Language Recognition Evaluation. 34-37
- Luis Javier Rodríguez-Fuentes, Mikel Peñagarikano, Amparo Varona, Mireia Díez, Germán Bordel, Alberto Abad, David Martínez González, Jesús Antonio Villalba López, Alfonso Ortega, Eduardo Lleida:
The BLZ Submission to the NIST 2011 LRE: Data Collection, System Development and Performance. 38-41
- Luis Fernando D'Haro, Ondrej Glembek, Oldrich Plchot, Pavel Matejka, Mehdi Soufifar, Ricardo de Córdoba, Jan Cernocký:
Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts. 42-45
- Alan McCree, Bengt J. Borgstrom:
Supervector LDA: A New Approach to Reduced-Complexity I-vector Language Recognition. 46-49
- Pavel Matejka, Oldrich Plchot, Mehdi Soufifar, Ondrej Glembek, Luis Fernando D'Haro, Karel Veselý, Frantisek Grézl, Jeff Z. Ma, Spyros Matsoukas, Najim Dehak:
Patrol Team Language Identification System for DARPA RATS P1 Evaluation. 50-53
Communication Disorders and Assistive Technologies
- Fang Hu, Yungang Wu, Wen Xu, Demin Han:
Articulatory Strategies in Obstruent Production in Mandarin Esophageal Speech. 54-57
- Marion Bechet, Fabrice Hirsch, Camille Fauth, Rudolph Sock:
Consonantal space area in Children with a Cleft Palate An acoustic Study. 58-61
- Milton Orlando Sarria-Paja, Tiago H. Falk:
Automated Dysarthria Severity Classification for Improved Objective Intelligibility Assessment of Spastic Dysarthric Speech. 62-65
- Abdellah Kacha, Francis Grenez, Jean Schoentgen:
Assessment of Disordered Voices Using Empirical Mode Decomposition in the Log-Spectral Domain. 66-69
- Anna Katharina Fuchs, Martin Hagmüller:
Learning an Artificial F0-Contour for ALT Speech. 70-73
- Korin Richmond, Steve Renals:
Ultrax: An Animated Midsagittal Vocal Tract Display for Speech Therapy. 74-77
Voice Conversion
- Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Yih-Ru Wang, Sin-Horng Chen:
A Study of Mutual Information for GMM-Based Spectral Conversion. 78-81
- Na Li, Yu Qiao:
Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion. 82-85
- Daniel Erro, Eva Navas, Inma Hernáez:
Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation. 86-89
- Winston S. Percybrooks, Elliot Moore:
A HMM approach to residual estimation for high resolution voice conversion. 90-93
- Tomoki Toda, Takashi Muramatsu, Hideki Banno:
Implementation of Computationally Efficient Real-Time Voice Conversion. 94-97
- Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose:
Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion. 98-101
Speaker Trait Challenge - Part 1
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Elmar Nöth, Alessandro Vinciarelli, Felix Burkhardt, Rob van Son, Felix Weninger, Florian Eyben, Tobias Bocklet, Gelareh Mohammadi, Benjamin Weiss:
The INTERSPEECH 2012 Speaker Trait Challenge. 254-257
- Tim Polzehl, Katrin Schoenenberg, Sebastian Möller, Florian Metze, Gelareh Mohammadi, Alessandro Vinciarelli:
On Speaker-Independent Personality Perception and Prediction from Speech. 258-261
- Kartik Audhkhasi, Angeliki Metallinou, Ming Li, Shrikanth S. Narayanan:
Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network. 262-265
- Clément Chastagnol, Laurence Devillers:
Personality traits detection using a parallelized modified SFFS algorithm. 266-269
- Jouni Pohjalainen, Serdar Kadioglu, Okko Räsänen:
Feature Selection for Speaker Traits. 270-273
- Johannes Wagner, Florian Lingenfelser, Elisabeth André:
A Frame Pruning Approach for Paralinguistic Recognition Tasks. 274-277
- Alexei Ivanov, Xin Chen:
Modulation Spectrum Analysis for Speaker Personality Trait Recognition. 278-281
- Nicholas Cummins, Julien Epps, Jia Min Karen Kua:
A Comparison of Classification Paradigms for Speaker Likeability Determination. 282-285
- Dingchao Lu, Fei Sha:
Predicting Likability of Speakers with Gaussian Processes. 286-289
- Raymond Brueckner, Björn W. Schuller:
Likability Classification - A Not so Deep Neural Network Approach. 290-293
- Dongrui Wu:
Genetic Algorithm Based Feature Selection for Speaker Trait Classification. 294-297
Phonetics and Phonology
- Felix Weninger, Björn W. Schuller:
Discrimination of Linguistic and Non-Linguistic Vocalizations in Spontaneous Speech: Intra- and Inter-Corpus Perspectives. 102-105
- Mathieu Avanzi, Pauline Dubosson, Sandra Schwab, Nicolas Obin:
Accentual Transfer from Swiss-German to French. A Study of "Français Fédéral". 106-109
- Stefanie Jannedy, Melanie Weirich:
Phonology & the Interpretation of Fine Phonetic Detail in Berlin German. 110-113
- Carlos Toshinori Ishi, Chaoran Liu, Hiroshi Ishiguro, Norihiro Hagita:
Evaluation of a formant-based speech-driven lip motion generation. 114-117
- Jeffrey Kallay, Jeffrey J. Holliday:
Using spectral measures to differentiate Mandarin and Korean sibilant fricatives. 118-121
- Hua-Li Jian, Richard Konopka:
EFL Conversational Triads: Foreigner-directed Speech and Hyperarticulation. 122-125
- Iris Chuoying Ouyang, Khalil Iskarous:
Syllable perception depends on tone perception. 126-129
- Masako Fujimoto, Seiya Funatsu, Ichiro Fujimoto:
How consonants, dialect and speech rate affect vowel devoicing? 134-137
Enhancement
- Thomas Fehér, Dietmar Richter, Oliver Jokisch, Rüdiger Hoffmann:
Distance-Dependent Noise Reduction for Two-Channel Microphones. 138-141
- Wei Xue, Wenju Liu:
Direction of Arrival Estimation Based on Subband Weighting for Noisy Conditions. 142-145
- Jorge I. Marin-Hurtado, David V. Anderson:
Binaural Noise Reduction Using Frequency-Warped FIR Filters. 146-149
- Meng Yu, Jack Xin:
Exploring Off Time Nature for Speech Enhancement. 150-153
- Xulei Bao, Jie Zhu:
Model-based Single-Channel Dereverberation in Noisy Acoustical Environments. 154-157
- Majid Mirbagheri, Sahar Akram, Shihab A. Shamma:
An Auditory Inspired Multimodal Framework for Speech Enhancement. 158-161
- Oldooz Hazrati, Jaewook Lee, Philipos C. Loizou:
Binary Mask Estimation for Improved Speech Intelligibility in Reverberant Environments. 162-165
- Petko Nikolov Petkov, W. Bastiaan Kleijn, Gustav Eje Henter:
Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech. 166-169
Language Modeling
- Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney:
Morpheme Level Feature-based Language Models for German LVCSR. 170-173
- Hitoshi Yamamoto, Paul R. Dixon, Shigeki Matsuda, Chiori Hori, Hideki Kashioka:
Tied-State Mixture Language Model for WFST-based Speech Recognition. 174-177
- Tanel Alumäe, Kaarel Kaljurand:
Maximum Entropy Language Model Adaptation for Mobile Speech Input. 178-181
- Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlícek:
Supervised and unsupervised Web-based language model domain adaptation. 182-185
- Yik-Cheung Tam, Paul Vozila:
A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling. 186-189
- Youzheng Wu, Kazuhiko Abe, Paul R. Dixon, Chiori Hori, Hideki Kashioka:
Leveraging Social Annotation for Topic Language Model Adaptation. 190-193
- Martin Sundermeyer, Ralf Schlüter, Hermann Ney:
LSTM Neural Networks for Language Modeling. 194-197
- Puyang Xu, Brian Roark, Sanjeev Khudanpur:
Phrasal Cohort Based Unsupervised Discriminative Language Modeling. 198-201
- Damianos G. Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Tucker Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraclar, Daniel M. Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith B. Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley:
Deriving conversation-based features from unlabeled speech for discriminative language modeling. 202-205
- Erinç Dikici, Arda Çelebi, Murat Saraclar:
Performance Comparison of Training Algorithms for Semi-Supervised Discriminative Language Modeling. 206-209
- Kapil Thadani, Fadi Biadsy, Daniel M. Bikel:
On-the-fly Topic Adaptation for YouTube Video Transcription. 210-213
Spoken Language Understanding and Dialog
- Bassam Jabaian, Fabrice Lefèvre, Laurent Besacier:
Portability of Semantic Annotations for Fast Development of Dialogue Corpora. 214-217
- Zoraida Callejas, Ramón López-Cózar:
Optimization of Dialog Strategies using Automatic Dialog Simulation and Statistical Dialog Management Techniques. 218-221
- Hiroaki Sugiyama, Toyomi Meguro, Yasuhiro Minami:
Preference-learning based Inverse Reinforcement Learning for Dialog Control. 222-225
- Raveesh Meena, Gabriel Skantze, Joakim Gustafson:
A Data-driven Approach to Understanding Spoken Route Directions in Human-Robot Dialogue. 226-229
- Kazunori Komatani, Akira Hirano, Mikio Nakano:
Detecting System-directed Utterances using Dialogue-level Features. 230-233
- Joaquin Planells, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra:
An Online Generated Transducer to Increase Dialog Manager Coverage. 234-237
- Abe Kazemzadeh, James Gibson, Juanchen Li, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
A Sequential Bayesian Dialog Agent for Computational Ethnography. 238-241
- Frank Seide, Sean McDirmid:
ClippyScript: A Programming Language for Multi-Domain Dialogue Systems. 242-245
- Klaus-Peter Engelbrecht, Sebastian Möller:
Correlation Between Model-based Approximations of Grounding-related Cognition and User Judgments. 246-249
- Keith Vertanen, Per Ola Kristensson:
Spelling as a Complementary Strategy for Speech Recognition. 2294-2297
ASR: Noise Robustness
- Ken'ichi Kumatani, Bhiksha Raj, Rita Singh, John W. McDonough:
Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition. 298-301
- Felix Weninger, Martin Wöllmer, Björn W. Schuller:
Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise. 302-305
- Liang Lu, K. K. Chin, Arnab Ghoshal, Steve Renals:
Noise Compensation for Subspace Gaussian Mixture Models. 306-309
- Yang Sun, Mathew M. Doss, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:
Combination of Sparse Classification and Multilayer Perceptron for Noise-robust ASR. 310-313
- Weifeng Li, Hervé Bourlard:
Sub-band based Log-energy and Its Dynamic Range Stretching for Robust In-car Speech Recognition. 314-317
- Mohamed Bouallegue, Driss Matrouf, Georges Linarès, Mickael Rouvier:
Subspace Gaussian Mixture Models Based on Noise Compensation for Speech Recognition. 318-321
Spoken Language Understanding and Dialog II
- Florian Kretzschmar, Sebastian Möller:
"Help Me, I Need More User Tests!" User Simulations as Supportive Tool in the Development Process of Spoken Dialogue Systems. 322-325
- Silke M. Witt:
Caller Response Timing Patterns in Spoken Dialog Systems. 326-329
- Dilek Hakkani-Tür, Gökhan Tür, Larry P. Heck, Ashley Fidler, Asli Celikyilmaz:
A Discriminative Classification-Based Approach to Information State Updates for a Multi-Domain Dialog System. 330-333
- Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tür, Larry P. Heck:
Learning When to Listen: Detecting System-Addressed Speech in Human-Human-Computer Dialog. 334-337
- Gökhan Tür, Minwoo Jeong, Ye-Yi Wang, Dilek Hakkani-Tür, Larry P. Heck:
Exploiting the Semantic Web for Unsupervised Natural Language Semantic Parsing. 338-341
- Andrew Fandrianto, Maxine Eskénazi:
Prosodic Entrainment in an Information-Driven Dialog System. 342-345
Paralinguistics I
- Fabien Ringeval, Mohamed Chetouani, Björn W. Schuller:
Novel Metrics of Speech Rhythm for the Assessment of Emotion. 346-349
- Martin Wöllmer, Florian Eyben, Björn W. Schuller, Gerhard Rigoll:
Temporal and Situational Context Modeling for Improved Dominance Recognition in Meetings. 350-353
- Marc Swerts, Kitty Leuverink, Madelène Munnik, Vera Nijveld:
Audiovisual correlates of basic emotions in blind and sighted people. 354-357
- Houwei Cao, Ragini Verma, Ani Nenkova:
Combining Ranking and Classification to Improve Emotion Recognition in Spontaneous Speech. 358-361
- Zixing Zhang, Björn W. Schuller:
Active Learning by Sparse Instance Tracking and Classifier Confidence in Acoustic Emotion Recognition. 362-365
- Viktor Rozgic, Sankaranarayanan Ananthakrishnan, Shirin Saleem, Rohit Kumar, Aravind Namandi Vembu, Rohit Prasad:
Emotion Recognition using Acoustic and Lexical Features. 366-369
Pitch and Harmonic Analysis
- Phillip L. De Leon, Bryan Stewart, Junichi Yamagishi:
Synthetic Speech Discrimination using Pitch Pattern Statistics Derived from Image Analysis. 370-373
- Zhengqi Wen, Hideki Kawahara, Jianhua Tao:
Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis. 374-377
- Feng Huang, Tan Lee:
Robust Pitch Estimation Using l1-regularized Maximum Likelihood Estimation. 378-381
- Gilles Degottex, Yannis Stylianou:
A full-band adaptive harmonic representation of speech. 382-385
- Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino:
Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation. 386-389
- Kota Yoshizato, Hirokazu Kameoka, Daisuke Saito, Shigeki Sagayama:
Hidden Markov Convolutive Mixture Model for Pitch Contour Analysis of Speech. 390-393
Speaker Trait Challenge - Part 2
- Benjamin Weiss, Felix Burkhardt:
Is 'not bad' good enough? Aspects of unknown voices' likability. 510-513
- Michelle Hewlett Sanchez, Aaron Lawson, Dimitra Vergyri, Harry Bratt:
Multi-System Fusion of Extended Context Prosodic and Cepstral Features for Paralinguistic Speaker Trait Classification. 514-517
- Harm Buisman, Eric O. Postma:
The log-Gabor method: speech classification using spectrogram image analysis. 518-521
- Yazid Attabi, Pierre Dumouchel:
Anchor Models and WCCN Normalization For Speaker Trait Classification. 522-525
- Claude Montacié, Marie-José Caraty:
Pitch and Intonation Contribution to Speakers' Traits Classification. 526-529
- Gopala Krishna Anumanchipalli, Hugo Meinedo, Miguel M. F. Bugalho, Isabel Trancoso, Luís C. Oliveira, Alan W. Black:
Text-dependent pathological voice detection. 530-533
- Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth S. Narayanan:
Intelligibility classification of pathological speech using fusion of multiple high level descriptors. 534-537
- Anthony P. Stark, Alireza Bayestehtashk, Meysam Asgari, Izhak Shafran:
Interspeech Pathology Challenge: Investigations into Speaker and Sentence Specific Effects. 538-541
- Xinhui Zhou, Daniel Garcia-Romero, Nima Mesgarani, Maureen L. Stone, Carol Y. Espy-Wilson, Shihab A. Shamma:
Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations. 542-545
- Dong-Yan Huang, Yongwei Zhu, Dajun Wu, Rongshan Yu:
Detecting Intelligibility by Linear Dimensionality Reduction and Normalized Voice Quality Hierarchical Features. 546-549
Perceptual Learning and Perceptual Cues to Segments and Tones
- Matthias J. Sjerps, James M. McQueen, Holger Mitterer:
Extrinsic normalization for vocal tracts depends on the signal, not on attention. 394-397
- Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki:
Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers. 402-405
- Natthawut Kertkeidkachorn, Surapol Vorapatratorn, Sirinart Tangruamsub, Proadpran Punyabukkana, Atiwong Suchato:
Contribution of Spectral Shapes to Tone Perception. 414-417
- Julien Meyer:
Pitch and phonological perception of tone in the Suruí language of Rondônia (Brazil): identification task of LHL and LHH tonal patterns. 422-425
- Rui Cao, Ratree Wayland, Edith Kaan:
The Role of Creaky Voice in Mandarin Tone 2 and Tone 3 Perception. 426-429
- K. S. Nataraj, Prem C. Pandey:
Detection of Transition Segments in VCV Utterances for Estimation of the Place of Closure of Oral Stops for Speech Training. 406-409
- Odette Scharenborg, Esther Janse, Andrea Weber:
Perceptual Learning of /f/-/s/ by Older Listeners. 398-401
- Cyril Dubois, Rudolph Sock:
Audiovisual discrimination of CV syllables: a simultaneous fMRI-EEG study. 410-413
- Charturong Tantibundhit, Chutamanee Onsuwan, P. Phienphanich, Chai Wutiwiwatchai:
Methodological Issues in Assessing Perceptual Representation of Consonant Sounds in Thai. 418-421
- Michael D. Tyler, Mona Faris:
Can litheners retune native categories acroth a thoneme boundary? 430-433
Speech Synthesis: Prosody
- Eric Morley, Esther Klabbers, Jan P. H. van Santen, Alexander Kain, Seyed Hamidreza Mohammadi:
Synthetic F0 Can Effectively Convey Speaker ID in Delexicalized Speech. 434-437
- Timo Baumann, David Schlangen:
Evaluating Prosodic Processing for Incremental Speech Synthesis. 438-441
- Kazuhiko Iwata, Tetsunori Kobayashi:
Expressing Speaker's Intentions through Sentence-Final Intonations for Japanese Conversational Speech Synthesis. 442-445
- Alok Parlikar, Alan W. Black:
Modeling Pause-Duration for Style-Specific Speech Synthesis. 446-449
- Martin Gruber:
Enumerating Differences Between Various Communicative Functions for Purposes of Czech Expressive Speech Synthesis in Limited Domain. 450-453
- Christoph Norrenbrock, Florian Hinterleitner, Ulrich Heute, Sebastian Möller:
Quality Analysis of Macroprosodic F0 Dynamics in Text-to-Speech Signals. 454-457
- Hiroya Hashimoto, Keikichi Hirose, Nobuaki Minematsu:
Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis. 458-461
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi:
Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation. 462-465
- Fanbo Meng, Zhiyong Wu, Helen M. Meng, Jia Jia, Lianhong Cai:
Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data. 466-469
- Sarah Hoffmann, Beat Pfister:
Employing Sentence Structure: Syntax Trees as Prosody Generators. 470-473
- Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Kunio Kashino:
A Stochastic Model of Singing Voice F0 Contours for Characterizing Expressive Dynamic Components. 474-477
Speaker Diarization and Age Recognition
- Jan Silovský, Petr Cerva, Jindrich Zdánský, Jan Nouza:
Study on Integration of Speaker Diarization with Speaker Adaptive Speech Recognition for Broadcast Transcription. 478-481
- Stephen Shum, Najim Dehak, Jim Glass:
On the Use of Spectral and Iterative Methods for Speaker Diarization. 482-485
- Mary Tai Knox, Nikki Mirghafori, Gerald Friedland:
Where did I go wrong?: Identifying troublesome segments for speaker diarization systems. 486-489
- Sree Harsha Yella, Fabio Valente:
Speaker diarization of overlapping speech based on silence distribution in meeting recordings. 490-493
- Simon Bozonnet, Ravichander Vipperla, Nicholas W. D. Evans:
Phone Adaptive Training for Speaker Diarization. 494-497
- Finnian Kelly, Andrzej Drygajlo, Naomi Harte:
Compensating for Ageing and Quality variation in Speaker Verification. 498-501
- David A. van Leeuwen, Mohamad Hasan Bahari:
Calibration of probabilistic age recognition. 502-505
- Mohamad Hasan Bahari, Mitchell McLaren, Hugo Van hamme, David A. van Leeuwen:
Age Estimation from Telephone Speech using i-vectors. 506-509
ASR: Discriminative Training
- Shakti P. Rath, Martin Karafiát, Ondrej Glembek, Jan Cernocký:
A factorized representation of FMLLR transform based on QR-decomposition. 551-554
- Vikrant Singh Tomar, Richard C. Rose:
A Correlational Discriminant Approach to Feature Extraction for Robust Speech Recognition. 555-558
- Chao Weng, Biing-Hwang Juang, Daniel Povey:
Discriminative Training Using Non-uniform Criteria for Keyword Spotting on Spontaneous Speech. 559-562
- Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu:
Discriminative Reranking for LVCSR Leveraging Invariant Structure. 563-566
- Ting-Yao Hu, Yu Tsao, Lin-Shan Lee:
Discriminative Fuzzy Clustering Maximum a Posterior Linear Regression for Speaker Adaptation. 567-570
- Muhammad Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney:
Simultaneous Discriminative Training and Mixture Splitting of HMMs for Speech Recognition. 571-574
Single Channel Speech Enhancement
- Laura E. Boucheron, Phillip L. De Leon:
Low-SNR, Speaker-Dependent Speech Enhancement using GMMs and MFCCs. 575-578
- Maria Koutsogiannaki, Michèle Pettinato, Cassie Mayo, Varvara Kandia, Yannis Stylianou:
Can modified casual speech reach the intelligibility of clear speech? 579-582
- Michael Carlin, Nicolas Malyska, Thomas F. Quatieri:
Speech Enhancement Using Sparse Convolutive Non-negative Matrix Factorization with Basis Adaptation. 583-586
- Dorothea Kolossa, Robert M. Nickel, Steffen Zeiler, Rainer Martin:
Inventory-Based Audio-Visual Speech Enhancement. 587-590
- Emma Jokinen, Paavo Alku, Martti Vainio:
Utilization of the Lombard effect in post-filtering for intelligibility enhancement of telephone speech. 591-594
- Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis:
Speech Enhancement by Online Non-negative Spectrogram Decomposition in Non-stationary Noise Environments. 595-598
Conversation and Interaction I
- Léo Varnet, Julien Meyer, Michel Hoen, Fanny Meunier:
Phoneme resistance during speech-in-speech comprehension. 599-602
- Hugo Quené, Will Schuerman:
Smile with a smile. 603-606
- Rebecca Lunsford, Peter A. Heeman, Jan P. H. van Santen:
Interactions Between Turn-taking Gaps, Disfluencies and Social Obligation. 607-610
- Maeva Garnier, Lucie Ménard, Gabrielle Richard:
Effect of being seen on the production of visible speech cues. A pilot study on Lombard speech. 611-614
- Marcin Wlodarczak, Juraj Simko, Petra Wagner:
Temporal entrainment in overlapped speech: Cross-linguistic study. 615-618
- Chi-Chun Lee, Athanasios Katsamanis, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Based on Isolated Saliency or Causal Integration? Toward a Better Understanding of Human Annotation Process using Multiple Instance Learning and Sequential Probability Ratio Test. 619-622
Speech Synthesis: Intelligibility
- Ann K. Syrdal, H. Timothy Bunnell, Susan R. Hertz, Taniya Mishra, Murray F. Spiegel, Corine A. Bickley, Deborah Rekart, Matthew J. Makashay:
Text-To-Speech Intelligibility Across Speech Rates. 623-626
- Linfang Wang, Lijuan Wang, Yan Teng, Zhe Geng, Frank K. Soong:
Objective Intelligibility Assessment of Text-to-Speech System using Template Constrained Generalized Posterior Probability. 627-630
- Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King:
Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise. 631-634
- Tudor-Catalin Zorila, Varvara Kandia, Yannis Stylianou:
Speech-in-noise intelligibility improvement based on spectral shaping and dynamic range compression. 635-638
- Daniel Erro, Yannis Stylianou, Eva Navas, Inma Hernáez:
Implementation of Simple Spectral Techniques to Enhance the Intelligibility of Speech using a Harmonic Model. 639-642
- Seyed Hamidreza Mohammadi, Alexander Kain, Jan P. H. van Santen:
Making Conversational Vowels More Clear. 643-646
Speech and Language Technologies for STEM
- Diane J. Litman, Heather Friedberg, Katherine Forbes-Riley:
Prosodic Cues to Disengagement and Uncertainty in Physics Tutorial Dialogues. 755-758
- Wayne H. Ward, Daniel Bolaños, Ronald A. Cole:
Spoken Dialogs With a Virtual Science Tutor. 759-762
- Petr Cerva, Jan Silovský, Jindrich Zdánský, Jan Nouza, Jirí Málek:
Real-Time Lecture Transcription using ASR for Czech Hearing Impaired or Deaf Students. 763-766
- Lei Chen, Su-Youn Yoon:
Application of Structural Events Detected on ASR Outputs for Automated Speaking Assessment. 767-770
- Oscar Saz, Maxine Eskénazi:
Addressing Confusions in Spoken Language in ESL Pronunciation Tutors. 771-774
- Xiaojun Qian, Helen M. Meng, Frank K. Soong:
The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training. 775-778
- Catia Cucchiarini, Joost van Doremalen, Helmer Strik:
Practice and feedback in L2 speaking: an evaluation of the DISCO CALL system. 779-782
- Thomas Hueber, Atef Ben Youssef, Gérard Bailly, Pierre Badin, Frédéric Elisei:
Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training. 783-786
Prosody I
- Chiharu Tsurutani, Shunichi Ishihara:
Naturalness Judgement of Prosodic Variation of Japanese Utterances with Prosody Modified Stimuli. 647-650
- Mathieu Avanzi, Pauline Dubosson, Sandra Schwab:
Effects of Dialectal Origin on Articulation Rate in French. 651-654
- Chiao-Hua Hsieh, Chen-Yu Chiang, Yih-Ru Wang, Hsiu-Min Yu, Sin-Horng Chen:
A New Approach of Speaking Rate Modeling for Mandarin Speech Prosody. 655-658
- David Doukhan, Albert Rilliard, Sophie Rosset, Christophe d'Alessandro:
Modelling pause duration as a function of contextual length. 659-662
- Bei Wang, Chenxia Li, Qian Wu, Xiaxia Zhang, Baofeng Wang, Yi Xu:
Production and Perception of Focus in PFC and non-PFC Languages: Comparing Beijing Mandarin and Hainan Tsat. 663-666
- Xiaxia Zhang, Bei Wang, Qian Wu, Yi Xu:
Prosodic Realization of Focus in Statement and Question in Tibetan (Lhasa Dialect). 667-670
- Martti Vainio, Daniel Aalto, Antti Suni, Anja Arnhold, Tuomo Raitio, Henri Seijo, Juhani Järvikivi, Paavo Alku:
Effect of noise type and level on focus related fundamental frequency changes. 671-674
- Anal Warsi, Tulika Basu, Debasis Mazumdar:
Role of Prosody in Automatic Modality Recognition of Bangla Speech. 675-678
- Bettina Braun:
Where to associate stressed additive particles? Evidence from speech prosody. 679-682
- Matthew Benton:
From PVI to Perception: A Return to the Roots of Rhythm in Broadcast News. 683-686
- Julien Meyer, Laure Dentel, Frank Seifart:
A methodology for the study of rhythm in drummed forms of languages: application to Bora Manguaré of Amazon. 687-690
Speech Analysis
- Jouni Pohjalainen, Tuomo Raitio, Hannu Pulakka, Paavo Alku:
Automatic Detection of High Vocal Effort in Telephone Speech. 691-694
- D. Gomathi, Sathya Adithya Thati, Karthik Venkat Sridaran, Bayya Yegnanarayana:
Analysis of Mimicry Speech. 695-698
- Christian H. Kasess, Wolfgang Kreuzer, Ewald Enzinger, Nadja Kerschhofer-Puhalo:
Estimation of the vocal tract shape of nasals using a Bayesian scheme. 699-702
- Peter Birkholz, Philippe Daechert, Christiane Neuschaefer-Rube:
Advances in combined electro-optical palatography. 703-706
- Byung Suk Lee, Daniel P. W. Ellis:
Noise Robust Pitch Tracking by Subband Autocorrelation Classification. 707-710
- Alexander Sepúlveda, Rodrigo Capobianco Guido, Germán Castellanos-Domínguez:
Inference of Critical Articulator Position for Fricative Consonants. 711-714
- Markus Brückl:
Vocal Tremor Measurement Based on Autocorrelation of Contours. 715-718
- Chatchawarn Hansakunbuntheung, Ananlada Chotimongkol, Sumonmas Thatphithakkul, Patcharika Chootrakool:
Model-based Duration-difference Approach on Accent Evaluation of L2 Learner. 719-722
Dialog Systems
- Thomas Hueber, Gérard Bailly, Bruce Denby:
Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface. 723-726
- Tatsuya Kawahara, Takuma Iwatate, Katsuya Takanashi:
Prediction of Turn-Taking by Combining Prosodic and Eye-Gaze Information in Poster Conversations. 727-730
- Ina Wechsung, Klaus-Peter Engelbrecht, Sebastian Möller:
Using Quality Ratings to Predict Modality Choice in Multimodal Systems. 731-734
- Fuming Fang, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa, Sadaoki Furui, Toshimitsu Musha:
HMM Based Continuous EOG Recognition for Eye-input Speech Interface. 735-738
- Jason Lilley, Amanda Stent, Ilija Zeljkovic:
A Random, Semantically Appropriate Sentence Generator for Speaker Verification. 739-742
- Daniel Macías Galindo, Wilson Wong, Lawrence Cavedon, John Thangarajah:
Coherent Topic Transition in a Conversational Agent. 743-746
- Peter A. Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan Selfridge:
Using Reinforcement Learning for Dialogue Management Policies: Towards Understanding MDP Violations and Convergence. 747-750
- Ramón López-Cózar, Zoraida Callejas, David Griol:
Enhancing Speech Understanding in Spoken Dialogue Systems by Means of a New Frame-Correction Technique. 751-754
- Zoraida Callejas, David Griol, Klaus-Peter Engelbrecht:
Assessment of user simulators for spoken dialogue systems by means of subspace multidimensional clustering. 250-253
ASR: Bayesian Modeling
- Keith Kintzley, Aren Jansen, Hynek Hermansky:
MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors. 787-790
- Samuel Thomas, Sriram Ganapathy, Aren Jansen, Hynek Hermansky:
Data-driven Posterior Features for Low Resource Speech Recognition Applications. 791-794
- Xiaodong Cui, Mohamed Afify, George Saon, Vaibhava Goel:
Sparse Bayesian Factor Analysis for Stereo-based Stochastic Mapping. 795-798
- Niklas Vanhainen, Giampiero Salvi:
Word Discovery with Beta Process Factor Analysis. 799-802
- Seong-Jun Hahm, Atsunori Ogawa, Masakiyo Fujimoto, Takaaki Hori, Atsushi Nakamura:
Speaker Adaptation Using Variational Bayesian Linear Regression in Normalized Feature Space. 803-806
- Alexander Krueger, Oliver Walter, Volker Leutnant, Reinhold Haeb-Umbach:
Bayesian Feature Enhancement for ASR of Noisy Reverberant Real-World Data. 807-810
Computer Assisted Language Learning I
- Emre Yilmaz, Dirk Van Compernolle, Hugo Van hamme:
Robust Tracking for Automatic Reading Tutors. 811-814
- Huang Hao, Jianming Wang, Halidan Abudureyimu:
Maximum F1-Score Discriminative Training for Automatic Mispronunciation Detection in Computer-Assisted Language Learning. 815-818
- Yow-Bang Wang, Lin-Shan Lee:
Error Pattern Detection Integrating Generative and Discriminative Learning for Computer-Aided Pronunciation Training. 819-822
- Florian Hönig, Tobias Bocklet, Korbinian Riedhammer, Anton Batliner, Elmar Nöth:
The Automatic Assessment of Non-native Prosody: Combining Classical Prosodic Analysis with Acoustic Modelling. 823-826
- Theban Stanley, Kadri Hacioglu:
Improving L1-Specific Phonological Error Diagnosis in Computer Assisted Pronunciation Training. 827-830
- Jort F. Gemmeke, Janneke van de Loo, Guy De Pauw, Joris Driesen, Hugo Van hamme, Walter Daelemans:
A Self-Learning Assistive Vocal Interface Based on Vocabulary Learning and Grammar Induction. 831-834
Conversation and Interaction II
- Gina-Anne Levow, Susan Duncan:
Contrasting Cues to Verbal and Non-Verbal Backchannels in Multi-lingual Dyadic Rapport. 835-838
- Sofia Strömbergsson, Jens Edlund, David House:
Prosodic measurements and question types in the Spontal corpus of Swedish dialogues. 839-842
- Khiet P. Truong, Dirk Heylen:
Measuring prosodic alignment in cooperative task-based conversations. 843-846
- Kornel Laskowski, Mattias Heldner, Jens Edlund:
On the Dynamics of Overlap in Multi-Party Conversation. 847-850
- Khiet P. Truong, Jürgen Trouvain:
On the acoustics of overlapping laughter in conversational speech. 851-854
- Agustín Gravano, Julia Hirschberg:
A Corpus-Based Study of Interruptions in Spoken Dialogue. 855-858
Speech Analysis and Modeling
- George P. Kafentzis, Olivier Rosec, Yannis Stylianou:
On the Modeling of Voiceless Stop Sounds of Speech using Adaptive Quasi-Harmonic Models. 859-862 - Raymond W. M. Ng, Thomas Hain, Keikichi Hirose:
An alignment matching method to explore pseudosyllable properties across different corpora. 863-866 - Benigno Uria, Iain Murray, Steve Renals, Korin Richmond:
Deep Architectures for Articulatory Inversion. 867-870 - Katharine Henry, Morgan Sonderegger, Joseph Keshet:
Automatic Measurement of Positive and Negative Voice Onset Time. 871-874 - Vahid Khanagha, Khalid Daoudi:
Efficient multipulse approximation of speech excitation using the most singular manifold. 875-878 - Aren Jansen, Samuel Thomas, Hynek Hermansky:
Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition. 879-882
Analysis of Spoken Disorders in Health Applications - Part 1
- Maider Lehr, Emily Tucker Prud'hommeaux, Izhak Shafran, Brian Roark:
Fully Automated Neuropsychological Assessment for Detecting Mild Cognitive Impairment. 1039-1042 - Daniel Bone, Matthew P. Black, Chi-Chun Lee, Marian E. Williams, Pat Levitt, Sungbok Lee, Shrikanth S. Narayanan:
Spontaneous-Speech Acoustic-Prosodic Features of Children with Autism and the Interacting Psychologist. 1043-1046 - Constantijn Kaland, Emiel Krahmer, Marc Swerts:
Contrastive intonation in autism: The effect of speaker- and listener-perspective. 1047-1050 - Christina Hagedorn, Michael I. Proctor, Louis Goldstein, Maria Luisa Gorno-Tempini, Shrikanth S. Narayanan:
Characterizing Covert Articulation in Apraxic Speech Using real-time MRI. 1051-1054 - Alberto Abad, Anna Pompili, Ângela Costa, Isabel Trancoso:
Automatic word naming recognition for treatment and assessment of aphasia. 1055-1058 - Thomas F. Quatieri, Nicolas Malyska:
Vocal-Source Biomarkers for Depression: A Link to Psychomotor Activity. 1059-1062
Language Learning and Cross-Language Production and Perception
- Odette Scharenborg, Marijt J. Witteman, Andrea Weber:
Computational Modelling of the Recognition of Foreign-Accented Speech. 883-886 - Lya Meister, Einar Meister:
The production and perception of Estonian quantity degrees by native and non-native speakers. 887-890 - Makiko Sadakata, Mizuki Shingai, Alex Brandmeyer, Kaoru Sekiyama:
Perception of the moraic obstruent /Q/: a cross-linguistic study. 891-894 - Tomoko Nariai, Kazuyo Tanaka, Tatsuya Kawahara:
Comparative Analysis of Intensity between Native Speakers and Japanese Speakers of English. 895-898 - Christos Koniaris, Olov Engwall, Giampiero Salvi:
Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations. 899-902 - Sheng Li, Lan Wang:
Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data. 903-906 - Chakir Zeroual, Diamantis Gafos, Phil Hoole, John H. Esling:
Physiological and acoustic study of word initial post-lexical gemination in Moroccan Arabic. 907-910 - Michael D. Tyler, Sarah Fenwick:
Perceptual Assimilation of Arabic Voiceless Fricatives by English Monolinguals. 911-914 - Okko Räsänen:
Non-auditory cognitive capabilities in computational modeling of early language acquisition. 915-918 - Okko Räsänen, Heikki Rasilo, Unto K. Laine:
Modeling spoken language acquisition with a generic cognitive architecture for associative learning. 919-922
Enhancement and Coding
- Dongmei Wang, Philipos C. Loizou:
Pitch Estimation Based on Long Frame Harmonic Model and Short Frame Average Correlation Coefficient. 923-926 - Sebastian Möller, Marcel Wältermann, Nicolas Côté:
Diagnostic Prediction of Transmitted Speech Quality: A New Framework for Signal-based and Parametric Models. 927-930 - Tom Bäckström:
Enumerative Algebraic Coding for ACELP. 931-934 - Atanu Saha, Tetsuya Shimamura:
Speech Enhancement With Bivariate Gamma Model. 935-938 - Marek B. Trawicki, Michael T. Johnson:
Improvements of the Beta-Order Minimum Mean-Square Error (MMSE) Spectral Amplitude Estimator using Chi Priors. 939-942 - Philip Harding, Ben Milner:
Enhancing Speech by Reconstruction from Robust Acoustic Features. 943-946 - Srikanth Raj Chetupally, Thippur V. Sreenivas:
Joint Pitch-Analysis Formant-Synthesis framework for CS recovery of speech. 947-950 - Shan Liang, Wei Jiang, Wenju Liu:
A new noise-tracking algorithm for generalizing binary time-frequency (T-F) masking to ratio masking. 951-954 - Yan Tang, Martin Cooke:
Optimised spectral weightings for noise-dependent speech intelligibility enhancement. 955-958
Speech Synthesis: Adaptation
- Langzhou Chen, Mark J. F. Gales, Vincent Wan, Javier Latorre, Masami Akamine:
Exploring Rich Expressive Information from Audiobook Data Using Cluster Adaptive Training. 959-962 - Ji He, Yao Qian, Frank K. Soong, Sheng Zhao:
Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS. 963-966 - Christophe Veaux, Junichi Yamagishi, Simon King:
Using HMM-based Speech Synthesis to Reconstruct the Voice of Individuals with Degenerative Speech Disorders. 967-970 - Javier Latorre, Vincent Wan, Mark J. F. Gales, Langzhou Chen, K. K. Chin, Kate M. Knill, Masami Akamine:
Speech factorization for HMM-TTS based on cluster adaptive training. 971-974 - June Sig Sung, Doo Hwa Hong, Hyun Woo Koo, Nam Soo Kim:
Factored MLLR Adaptation Algorithm for HMM-based Expressive TTS. 975-978 - Dietmar Schabus, Michael Pucher, Gregor Hofer:
Speaker-adaptive visual speech synthesis in the HMM-framework. 979-982 - Viviane de Franca Oliveira, Sayaka Shiota, Yoshihiko Nankaku, Keiichi Tokuda:
Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis based on Perceptual Characteristics and Speaker Interpolation. 983-986 - Mauro Nicolao, Javier Latorre, Roger K. Moore:
C2H: A Computational Model of H&H-based Phonetic Contrast in Synthetic Speech. 987-990 - Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:
Vowel Creation by Articulatory Control in HMM-based Parametric Speech Synthesis. 991-994 - Rasmus Dall, Christophe Veaux, Junichi Yamagishi, Simon King:
Analysis of speaker clustering strategies for HMM-based speech synthesis. 995-998
Search and Decoding
- Kuan-Yu Chen, Hao-Chin Chang, Berlin Chen, Hsin-Min Wang:
Word Relevance Modeling for Speech Recognition. 999-1002 - Frank Duckhorn, Rüdiger Hoffmann:
Using context-free grammars for embedded speech recognition with Weighted Finite-State Transducers. 1003-1006 - Richard Dufour, Géraldine Damnati, Delphine Charlet, Frédéric Béchet:
Automatic transcription error recovery for Person Name Recognition. 1007-1010 - Satoshi Kobashikawa, Takaaki Hori, Yoshikazu Yamaguchi, Taichi Asami, Hirokazu Masataki, Satoshi Takahashi:
Efficient Beam Width Control to Suppress Excessive Speech Recognition Computation Time Based on Prior Score Range Normalization. 1011-1014 - David Nolden, Ralf Schlüter, Hermann Ney:
Search Space Pruning Based on Anticipated Path Recombination in LVCSR. 1015-1018 - Ian McGraw, Alexander Gruenstein:
Estimating Word-Stability During Incremental Speech Recognition. 1019-1022 - Stefan Ziegler, Bogdan Ludusan, Guillaume Gravier:
Using broad phonetic classes to guide search in automatic speech recognition. 1023-1026 - João Miranda, João Paulo Neto, Alan W. Black:
Parallel combination of multilingual speech streams for improved ASR. 1027-1030 - Fethi Bougares, Mickael Rouvier, Yannick Estève, Georges Linarès:
Low latency combination of parallelized single-pass LVCSR systems. 1031-1034 - Jungsuk Kim, Jike Chong, Ian R. Lane:
Efficient On-The-Fly Hypothesis Rescoring in a Hybrid GPU/CPU-based Large Vocabulary Continuous Speech Recognition Engine. 1035-1038
Dynamic Decoding
- Preethi Jyothi, Eric Fosler-Lussier, Karen Livescu:
Discriminatively learning factorized finite state pronunciation models from dynamic Bayesian networks. 1063-1066 - Anoop Deoras, Ruhi Sarikaya, Gökhan Tür, Dilek Hakkani-Tür:
Joint Decoding for Speech Recognition and Semantic Tagging. 1067-1070 - M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney:
Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR. 1071-1074 - Paul R. Dixon, Chiori Hori, Hideki Kashioka:
A Specialized WFST Approach for Class Models and Dynamic Vocabulary. 1075-1078 - Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose:
Dynamic Grammars with Lookahead Composition for WFST-based Speech Recognition. 1079-1082 - Todd Shore, Friedrich Faubel, Hartmut Helmke, Dietrich Klakow:
Knowledge-Based Word Lattice Rescoring in a Dynamic Context. 1083-1086
Speaker Recognition I
- Richard D. McClanahan, Phillip L. De Leon:
Mixture Component Clustering for Efficient Speaker Verification. 1087-1090 - Taufiq Hasan, John H. L. Hansen:
Front-end Channel Compensation using Mixture-dependent Feature Transformations for i-Vector Speaker Recognition. 1091-1094 - William M. Campbell, Elliot Singer:
Query-by-Example using Speaker Content Graphs. 1095-1098 - Hanwu Sun, Bin Ma:
Unsupervised NAP Training Data Design for Speaker Recognition. 1099-1102 - George R. Doddington:
The Role of Score Calibration in Speaker Recognition. 1103-1106 - Takafumi Hattori, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
A Bayesian Approach to Speaker Recognition Based on GMMs Using Multiple Model Structures. 1107-1110
Development of Speech Production and Perception
- Ellen Marklund, Francisco Lacerda, Iris-Corinna Schwarz, Ulla Sundberg:
Similarities in fundamental frequency in infant speech segmentation models. 1111-1114 - Ulrika Marklund, Ulla Sundberg, Iris-Corinna Schwarz, Francisco Lacerda:
Phonological complexity and vocabulary size in 30-month-old Swedish children. 1115-1118 - Jeesun Kim, Chris Davis, Christine Kitamura:
Auditory-visual speech to infants and adults: signals and correlations. 1119-1122 - Dongxin Xu, Jill Gilkerson, Jeffery Richards:
Objective Child Vocal Development Measurement with Naturalistic Daylong Audio Recording. 1123-1126 - Kyoko Nagao, Mark Paullin, Vilena Livinsky, James B. Polikoff, Linda D. Vallino, Thierry G. Morlet, N. Carolyn Schanen, H. Timothy Bunnell:
Speech Production-Perception Relationships in Children with Speech Delay. 1127-1130 - Sofia Strömbergsson:
Synthetic correction of deviant speech - children's perception of phonologically modified recordings of their own speech. 1131-1134
HMM Synthesis I
- Vincent Wan, Javier Latorre, K. K. Chin, Langzhou Chen, Mark J. F. Gales, Heiga Zen, Kate M. Knill, Masami Akamine:
Combining multiple high quality corpora for improving HMM-TTS. 1135-1138 - Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, Sakriani Sakti, Satoshi Nakamura:
An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis. 1139-1142 - Heng Lu, Simon King:
Using Bayesian Networks to find relevant context features for HMM-based speech synthesis. 1143-1146 - Xiang Yin, Zhen-Hua Ling, Ming Lei, Li-Rong Dai:
Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis. 1147-1150 - Vataya Chunwijitra, Takashi Nose, Takao Kobayashi:
A speech parameter generation algorithm using local variance for HMM-based speech synthesis. 1151-1154 - Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine:
Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP. 1155-1158
Analysis of Spoken Disorders in Health Applications - Part 2
- Thomas Drugman, Jérôme Urbain, Nathalie Bauwens, Ricardo Chessini, Anne-Sophie Aubriot, Patrick Lebecque, Thierry Dutoit:
Audio and Contact Microphones for Cough Detection. 1303-1306 - Nancy F. Chen, Wade Shen, Joseph P. Campbell:
Analyzing and Interpreting Automatically Learned Rules Across Dialects. 1307-1310 - Andrey N. Raev, Yuri Matveev, Tatiana Goloshchapova:
The Effect of Use of Drugs on Speaker's Fundamental Frequency and Formants. 1311-1314 - Marc Swerts, Cees de Bie:
On the assessment of audiovisual cues to speaker confidence by preteens with typical development (TD) and a-typical development (AD). 1315-1318 - Theodora Chaspari, Chi-Chun Lee, Shrikanth S. Narayanan:
Interplay between verbal response latency and physiology of children with autism during ECA interactions. 1319-1322 - Myung Jong Kim, Hoirin Kim:
Combination of Multiple Speech Dimensions for Automatic Assessment of Dysarthric Speech Intelligibility. 1323-1326 - Jun Wang, Ashok Samal, Jordan R. Green, Frank Rudzicz:
Whole-Word Recognition from Articulatory Movements for Silent Speech Interfaces. 1327-1330 - Shou-Chun Yin, Richard C. Rose, Yun Tang:
Verifying Session Level Pronunciation Accuracy in a Speech Therapy Application. 1331-1334 - Daryush D. Mehta, Rebecca Woodbury Listfield, Harold A. Cheyne II, James T. Heaton, Shengran W. Feng, Matías Zanartu, Robert E. Hillman:
Duration of ambulatory monitoring needed to accurately estimate voice use. 1335-1338 - Khairun-nisa Hassanali, Yang Liu, Thamar Solorio:
Evaluating NLP Features for Automatic Prediction of Language Impairment Using Child Speech Transcripts. 1339-1342 - Géza Kiss, Jan P. H. van Santen, Emily Tucker Prud'hommeaux, Lois M. Black:
Quantitative Analysis of Pitch in Speech of Children with Neurodevelopmental Disorders. 1343-1346
Paralinguistics II
- Felix Weninger, Erik Marchi, Björn W. Schuller:
Improving Recognition of Speaker States and Traits by Cumulative Evidence: Intoxication, Sleepiness, Age and Gender. 1159-1162 - Ni Ding, Julien Epps:
Speaker Clustering in Emotion Recognition. 1163-1166 - Samuel Kim, Sree Harsha Yella, Fabio Valente:
Automatic detection of conflict escalation in spoken conversations. 1167-1170 - Uwe D. Reichel:
The entropy of intoxicated speech - lexical creativity and heavy tongues. 1171-1174 - Daniel Bone, Chi-Chun Lee, Shrikanth S. Narayanan:
A Robust Unsupervised Arousal Rating Framework using Prosody with Cross-Corpora Evaluation. 1175-1178 - Carlos Busso, Tauhidur Rahman:
Unveiling the Acoustic Properties that Describe the Valence Dimension. 1179-1182 - Fabio Valente, Samuel Kim, Petr Motlícek:
Annotation and Recognition of Personality Traits in Spoken Conversations from the AMI Meetings Corpus. 1183-1186 - Shao-Ren Lyu:
The Effects of Lexical Tones and Nasal Coda /-n/ to Sadness in Taiwan Hakka. 1187-1190
ASR: Robust Modeling
- David Imseng, John Dines, Petr Motlícek, Philip N. Garner, Hervé Bourlard:
Comparing different acoustic modeling techniques for multilingual boosting. 1191-1194 - Yongqiang Wang, Mark J. F. Gales:
Model-based approaches to adaptive training in reverberant environments. 1195-1198 - Mark J. F. Gales, Federico Flego:
Model-Based Approaches for Degraded Channel Modelling in Robust ASR. 1199-1202 - William Hartmann, Eric Fosler-Lussier:
Improved Model Selection for the ASR-Driven Binary Mask. 1203-1206 - Simon Wiesler, Ralf Schlüter, Hermann Ney:
Accelerated Batch Learning of Convex Log-linear Models for LVCSR. 1207-1210 - Janne Pylkkönen, Mikko Kurimo:
Improving Discriminative Training for Robust Acoustic Models in Large Vocabulary Continuous Speech Recognition. 1211-1214 - Scott Novotney, Ivan Bulyko, Richard M. Schwartz, Sanjeev Khudanpur, Owen Kimball:
Semi-Supervised Methods for Improving Keyword Search of Unseen Terms. 1215-1218 - Xiangang Li, Dan Su, Zaihu Pang, Xihong Wu:
Probabilistic Speaker-Class based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition. 1219-1222 - Xiao Yao, Takatoshi Jitsuhiro, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda:
Classification of Stressed Speech Using Physical Parameters Derived from Two-Mass Model. 1223-1226 - Jun Du, Qiang Huo:
IVN-Based Joint Training Of GMM And HMMs Using An Improved VTS-Based Feature Compensation For Noisy Speech Recognition. 1227-1230
ASR: Robust Features I
- Niko Moritz, Jörn Anemüller, Birger Kollmeier:
Amplitude Modulation Filters as Feature Sets for Robust ASR: Constant Absolute or Relative Bandwidth? 1231-1234 - Cemil Demir, Ali Taylan Cemgil, Murat Saraçlar:
Effect of speech priors in single-channel speech-music separation for ASR. 1235-1238 - Arun Narayanan, DeLiang Wang:
On the Role of Binary Mask Pattern in Automatic Speech Recognition. 1239-1242 - Tatsuya Kawahara, Randy Gomez:
Dereverberation based on Wavelet Packet Filtering for Robust Automatic Speech Recognition. 1243-1246 - Trausti T. Kristjansson, Thad Hughes:
Spectral Intersections for Non-Stationary Signal Separation. 1247-1250 - Kyohei Odani, Longbiao Wang, Atsuhiko Kai:
Speech Recognition by Denoising and Dereverberation Based on Spectral Subtraction in a Real Noisy Reverberant Environment. 1251-1254 - Hilman Ferdinandus Pardede, Koichi Shinoda, Koji Iwano:
Q-Gaussian based spectral subtraction for robust speech recognition. 1255-1258 - Bernd T. Meyer, Constantin Spille, Birger Kollmeier, Nelson Morgan:
Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition. 1259-1262 - Peter Li, Xie Sun:
Feature extraction based on hearing system signal processing for robust large vocabulary speech recognition. 1263-1266 - Harish Arsikere, Gary K. F. Leung, Steven M. Lulich, Abeer Alwan:
Automatic estimation of the first two subglottal resonances in children's speech with application to speaker normalization in limited-data conditions. 1267-1270
Computer Assisted Language Learning II
- Yurie Iribe, Takurou Mori, Kouichi Katsurada, Goh Kawai, Tsuneo Nitta:
Real-time Visualization of English Pronunciation on an IPA Chart Based on Articulatory Feature Extraction. 1271-1274 - Je Hun Jeon, Su-Youn Yoon:
Acoustic Feature-based Non-scorable Response Detection for an Automated Speaking Proficiency Assessment. 1275-1278 - Jorge Wuth, Néstor Becerra Yoma, Leopoldo Benavides, Hiram Vivanco:
Pronunciation quality evaluation of sentences by combining word based scores. 1279-1282 - Peter Bell, Myroslava O. Dzikovska, Amy Isard:
Designing a spoken language interface for a tutorial dialogue system. 1283-1286 - Long Zhang, Haifeng Li:
Automatic Pronunciation Error Detection Based on Extended Pronunciation Space Using the Unsupervised Clustering of Pronunciation Errors. 1287-1290 - Thomas Pellegrini, Ângela Costa, Isabel Trancoso:
Less errors with TTS? A dictation experiment with foreign language learners. 1291-1294 - Liang-Yu Chen, Jyh-Shing Roger Jang:
Improvement in Automatic Pronunciation Scoring using Additional Basic Scores and Learning to Rank. 1295-1298 - Jian Cheng:
Automatic Tone Assessment of Non-Native Mandarin Speakers. 1299-1302
ASR: Robust Features II
- Michael A. Carlin, Kailash Patil, Sridhar Krishna Nemala, Mounya Elhilali:
Robust phoneme recognition based on biomimetic speech contours. 1348-1351 - Kaisheng Yao, Yifan Gong, Chaojun Liu:
A Feature Space Transformation Method for Personalization using Generalized I-Vector Clustering. 1352-1355 - T. J. Tsai, Nelson Morgan:
Longer Features: They do a speech detector good. 1356-1359 - Md. Jahangir Alam, Patrick Kenny, Douglas D. O'Shaughnessy:
Robust Feature Extraction for Speech Recognition by Enhancing Auditory Spectrum. 1360-1363 - Florian Müller, Alfred Mertins:
Enhancing Vocal Tract Length Normalization with Elastic Registration for Automatic Speech Recognition. 1364-1367 - Hannes Pessentheiner, Stefan Petrik, Harald Romsdorfer:
Beamforming using uniform circular arrays for distant speech recognition in reverberant environments and double talk scenarios. 1368-1371
ASR: Rich Transcription
- Ales Prazák, Zdenek Loose, Jan Trmal, Josef V. Psutka, Josef Psutka:
Novel Approach to Live Captioning Through Re-speaking: Tailoring Speech Recognition to Re-speaker's Needs. 1372-1375 - Jáchym Kolár, Lori Lamel:
Development and Evaluation of Automatic Punctuation for French and English Speech-to-Text. 1376-1379 - Shajith Ikbal, Sachindra Joshi, Ashish Verma, Om D. Deshmukh:
Spoken Document Clustering Using Word Confusion Networks. 1380-1383 - Xuancong Wang, Hwee Tou Ng, Khe Chai Sim:
Dynamic Conditional Random Fields for Joint Sentence Boundary and Punctuation Prediction. 1384-1387 - Fabio Brugnara, Daniele Falavigna, Diego Giuliani, Roberto Gretter:
Analysis of the Characteristics of Talk-show TV Programs. 1388-1391 - Andrew Rosenberg:
Rethinking The Corpus: Moving towards Dynamic Linguistic Resources. 1392-1395
Phonetics and Phonology
- Marianna Nadeu:
Effects of stress and speech rate on vowel quality in Catalan and Spanish. 1396-1399 - Michael McAuliffe, Molly Babel:
Predictability affects vowel dispersion and dynamics in the Buckeye Corpus. 1400-1403 - Robert Allen Fox, Ewa Jacewicz:
Dialectal and generational variations in vowels in spontaneous speech. 1404-1407 - Christian DiCanio, Hosung Nam, Douglas H. Whalen, H. Timothy Bunnell, Jonathan D. Amith, Rey Castillo García:
Assessing agreement level between forced alignment models with data from endangered language documentation corpora. 130-133 - Ying Chen, Vsevolod Kapatsinski, Susan Guion-Anderson:
Acoustic Cues of Vowel Quality to Coda Nasal Perception in Southern Min. 1412-1415 - Miguel Simonet, José Ignacio Hualde, Marianna Nadeu:
Lenition of /d/ in spontaneous Spanish and Catalan. 1416-1419
HMM Synthesis II
- Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku:
Wideband Parametric Speech Synthesis Using Warped Linear Prediction. 1420-1423 - Thomas Drugman, John Kane, Christer Gobl:
Modeling the Creaky Excitation for Parametric Speech Synthesis. 1424-1427 - Zhengqi Wen, Jianhua Tao:
Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis. 1428-1431 - Nobuyuki Nishizawa, Tsuneo Kato:
Speech synthesis using a non-maximally decimated filter bank for embedded systems. 1432-1435 - Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj:
Ways to Implement Global Variance in Statistical Speech Synthesis. 1436-1439 - Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine:
HMM-based speech synthesis using sub-band basis spectrum model. 1440-1443
Glottal Source Processing: from Analysis to Applications
- Thomas Drugman, John Kane, Christer Gobl:
Resonator-based creaky voice detection. 1592-1595 - Vinay Kumar Mittal, N. Dhananjaya, Bayya Yegnanarayana:
Effect of Tongue Tip Trilling on the Glottal Excitation Source. 1596-1599 - Gang Chen, Yen-Liang Shue, Jody Kreiman, Abeer Alwan:
Estimating the voice source in noise. 1600-1603 - Alan Pinheiro, Tuomo Raitio, Danyane Gomes, Paavo Alku:
Voice source analysis using biomechanical modeling and glottal inverse filtering. 1604-1607 - Carlo Drioli, Andrea Calanca:
Speech modeling and processing by low-dimensional dynamic glottal models. 1608-1611 - Paavo Alku, Jouni Pohjalainen, Martti Vainio, Anne-Maria Laukkanen, Brad H. Story:
Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction. 1612-1615 - Akira Sasou:
Automatic Topology Generation of Glottal Source HMM. 1616-1619 - Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Tuomo Raitio, Nicolas Obin, Paavo Alku, Junichi Yamagishi, Juan Manuel Montero:
Towards Glottal Source Controllability in Expressive Speech Synthesis. 1620-1623 - Ali Alpan, Jean Schoentgen, Francis Grenez:
Combining temporal and cepstral features for the automatic perceptual categorization of disordered connected speech. 1624-1627 - Rui Sun, Elliot Moore II:
A Preliminary Study on Cross-Databases Emotion Recognition using the Glottal Features in Speech. 1628-1631 - Ranniery Maia:
Analysis on the Importance of Short-Term Speech Parameterizations for Emotional Statistical Parametric Speech Synthesis. 1632-1635 - Christophe Mertens, Francis Grenez, Jean Schoentgen:
Analysis of vocal tremor and jitter by empirical mode decomposition of glottal cycle length time series. 1636-1639 - Harri Auvinen, Tuomo Raitio, Samuli Siltanen, Paavo Alku:
Utilizing Markov Chain Monte Carlo (MCMC) Method for Improved Glottal Inverse Filtering. 1640-1643 - Stefan Huber, Axel Röbel, Gilles Degottex:
Glottal source shape parameter estimation using phase minimization variants. 1644-1647 - Keith W. Godin, Taufiq Hasan, John H. L. Hansen:
Glottal Waveform Analysis of Physical Task Stress Speech. 1648-1651 - Juan F. Torres, Elliot Moore:
Speaker Discrimination Ability of Glottal Waveform Features. 1652-1655
Hearing
- Okko Räsänen:
Average Spectrotemporal Structure of Continuous Speech Matches with the Frequency Resolution of Human Hearing. 1444-1447 - Ibon Saratxaga, Inma Hernáez, Michael Pucher, Eva Navas, Iñaki Sainz:
Perceptual Importance of the Phase Related Information in Speech. 1448-1451 - Andrea Grigorescu, Marek Rudnicki, Michael Isik, Werner Hemmert, Stefano Rini:
Improving the Entropy Estimate of Neuronal Firings of Modeled Cochlear Nucleus Neurons. 1452-1455 - Kyoko Nagao, Mark Paullin, James B. Polikoff, Jason Lilley, H. Timothy Bunnell:
Perception of Synthetic Speech in Adult Users of Cochlear Implants. 1456-1459 - Odette Scharenborg, Esther Janse:
Hearing Loss and the Use of Acoustic Cues in Phonetic Categorisation of Fricatives. 1460-1463 - Nao Hodoshima, Takayuki Arai, Kiyohiro Kurisu:
Intelligibility of speech spoken in noise/reverberation for older adults in reverberant environments. 1464-1467 - Andrew Hines, Naomi Harte:
Improved Speech Intelligibility with a Chimaera Hearing Aid Algorithm. 1468-1471 - Elizabeth Godoy, Yannis Stylianou:
Unsupervised Acoustic Analyses of Normal and Lombard Speech, with Spectral Envelope Transformation to Improve Intelligibility. 1472-1475 - Akiko Amano-Kusumoto, Justin M. Aronoff, Motokuni Itoh, Sigfrid D. Soli:
The effect of dichotic processing on the perception of binaural cues. 1476-1479 - Nima Mesgarani, Edward Chang:
Speech and speaker separation in human auditory cortex. 1480-1483 - Jens Edlund, Mattias Heldner, Joakim Gustafson:
On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone. 1484-1487
Degraded Speech and Enhancement
- Sira Gonzalez, Mike Brookes:
Sibilant Speech Detection in Noise. 1488-1491 - Kit Thambiratnam, Weiwu Zhu, Frank Seide:
Voice Activity Detection Using Speech Recognizer Feedback. 1492-1495 - Dushyant Sharma, Gaston Hilkhuysen, Patrick A. Naylor, Nikolay D. Gaubitch, Mark A. Huckvale, Mike Brookes:
Descriptive Vocabulary Development for Degraded Speech. 1496-1499 - Ryo Yokoyama, Yu Nasu, Koichi Shinoda, Koji Iwano:
Overlapped Speech Detection in Meeting Using Cross-Channel Spectral Subtraction and Spectrum Similarity. 1500-1503 - Xugang Lu, Shigeki Matsuda, Chiori Hori, Hideki Kashioka:
Speech restoration based on deep learning autoencoder with layer-wised pretraining. 1504-1507 - Rupayan Chakraborty, Climent Nadeu, Taras Butko:
Detection and Positioning of Overlapped Sounds in a Room Environment. 1508-1511 - Deepak K. T., Biswajit Dev Sarma, S. R. Mahadeva Prasanna:
Foreground Speech Segmentation using Zero Frequency Filtered Signal. 1512-1515 - Patrick Reidy, Mary E. Beckman:
The Effect of Spectral Estimator on Common Spectral Measures for Sibilant Fricatives. 1516-1519
Source Separation and Computational Auditory Scene Analysis
- Emad M. Grais, Hakan Erdogan:
Gaussian Mixture Gain Priors for Regularized Nonnegative Matrix Factorization in Single-Channel Source Separation. 1520-1523 - Shivesh Ranjan, Karen L. Payton, Pejman Mowlaee:
Speaker Independent Single Channel Source Separation using Sinusoidal Features. 1524-1527 - Yuxuan Wang, DeLiang Wang:
Boosting Classification Based Speech Separation Using Temporal Dynamics. 1528-1531 - Yuxuan Wang, Kun Han, DeLiang Wang:
Acoustic Features for Classification Based Speech Separation. 1532-1535 - Emad M. Grais, Hakan Erdogan:
Hidden Markov Models as Priors for Regularized Nonnegative Matrix Factorization in Single-Channel Source Separation. 1536-1539 - Ji Ming, Ramji Srinivasan, Danny Crookes:
Unconstrained Speech Separation by Composition of Longest Segments. 1540-1543 - Yi Zhang, Yunxin Zhao:
Modulation domain blind source separation for noisy speech mixture. 1544-1547 - Pejman Mowlaee, Rahim Saeidi, Rainer Martin:
Phase estimation for signal reconstruction in single-channel source separation. 1548-1551 - Jen-Tzung Chien, Hsin-Lung Hsieh:
Bayesian Group Sparse Learning for Nonnegative Matrix Factorization. 1552-1555
Speaker Recognition II
- Michael T. Johnson, Jianglin Wang:
Residual Phase Cepstrum Coefficients with Application to Cross-lingual Speaker Verification. 1556-1559 - Chunyan Liang, Jinchao Yang, Lin Yang, Yonghong Yan:
Speaker Verification Using Neighborhood Preserving Embedding. 1560-1563 - Chunyan Liang, Xiang Zhang, Lin Yang, Yonghong Yan:
Discriminative Decision Function Based Scoring Method in Joint Factor Analysis for Speaker Verification. 1564-1567 - Taufiq Hasan, John H. L. Hansen:
Integrated Feature Normalization and Enhancement for robust Speaker Recognition using Acoustic Factor Analysis. 1568-1571 - Lukás Machlica, Zbynek Zajíc:
Factor Analysis and Nuisance Attribute Projection Revisited. 1572-1575 - Sheng Chen, Mingxing Xu:
Compensation of Intrinsic Variability with Factor Analysis Modeling for Robust Speaker Verification. 1576-1579 - Anthony Larcher, Kong-Aik Lee, Bin Ma, Haizhou Li:
RSR2015: Database for Text-Dependent Speaker Verification using Multiple Pass-Phrases. 1580-1583 - Volker Dellwo, Adrian Leemann, Marie-José Kolly:
Speaker idiosyncratic rhythmic features in the speech signal. 1584-1587 - Yun Lei, Lukás Burget, Nicolas Scheffer:
Bilinear Factor Analysis for iVector Based Speaker Verification. 1588-1591
Language Modeling: New Models and Features
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Paraphrastic Language Models. 1656-1659 - Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur:
Efficient Structured Language Modeling for Speech Recognition. 1660-1663 - Yangyang Shi, Pascal Wiggers, Catholijn M. Jonker:
Towards Recurrent Neural Networks Language Models with Linguistic and Contextual Features. 1664-1667 - Gwénolé Lecorvé, Petr Motlícek:
Conversion of Recurrent Neural Network Language Models to Weighted Finite State Transducers for Automatic Speech Recognition. 1668-1671 - Hong-Kwang Kuo, Ebru Arisoy, Ahmad Emami, Paul Vozila:
Large Scale Hierarchical Neural Network Language Models. 1672-1675 - Brian Hutchinson, Mari Ostendorf, Maryam Fazel:
A Sparse Plus Low Rank Maximum Entropy Language Model. 1676-1679
Speaker Verification
- Ye Jiang, Kong-Aik Lee, Zhenmin Tang, Bin Ma, Anthony Larcher, Haizhou Li:
PLDA Modeling in I-Vector and Supervector Space for Speaker Verification. 1680-1683 - Konstantin Simonchik, Timur Pekhovsky, Andrey Shulipa, Anton Afanasyev:
Supervized Mixture of PLDA Models for Cross-Channel Speaker Verification. 1684-1687 - Federico Alegre, Ravichander Vipperla, Nicholas W. D. Evans:
Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals. 1688-1691 - Themos Stafylakis, Patrick Kenny, Mohammed Senoussaoui, Pierre Dumouchel:
PLDA using Gaussian Restricted Boltzmann Machines with application to Speaker Verification. 1692-1695 - Seyed Omid Sadjadi, Taufiq Hasan, John H. L. Hansen:
Mean Hilbert Envelope Coefficients (MHEC) for Robust Speaker Recognition. 1696-1699 - Zhizheng Wu, Chng Eng Siong, Haizhou Li:
Detecting Converted Speech and Natural Speech for anti-Spoofing Attack in Speaker Recognition. 1700-1703
Speech Intelligibility in Quiet and in Noise
- Julián Villegas, Martin Cooke:
Maximising objective speech intelligibility by local f0 modulation. 1704-1707 - Catherine Mayo, Vincent Aubanel, Martin Cooke:
Effect of prosodic changes on speech intelligibility. 1708-1711 - Saya Kawase, Yue Wang:
Effects of visual speech information on native listener judgments of L2 consonant intelligibility. 1712-1715 - Guy J. Brown, Amy V. Beeston, Kalle J. Palomäki:
Perceptual compensation for the effects of reverberation on consonant identification: A comparison of human and machine performance. 1716-1719 - Michael Fitzpatrick, Jeesun Kim, Chris Davis:
The Intelligibility of Lombard Speech: Communicative setting matters. 1720-1723 - João Felipe Santos, Stefano Cosentino, Oldooz Hazrati, Philipos C. Loizou, Tiago H. Falk:
Performance Comparison of Intrusive Objective Speech Intelligibility and Quality Metrics for Cochlear Implant Users. 1724-1727
Speech Tools Demo
- Florian Metze, Eric Fosler-Lussier:
The Speech Recognition Virtual Kitchen: An Initial Prototype. 1872-1873 - Uwe D. Reichel:
PermA and Balloon: Tools for string alignment and text processing. 1874-1877 - Slim Ouni, Loic Mangeonjean, Ingmar Steiner:
VisArtico: a visualization tool for articulatory data. 1878-1881 - Przemyslaw Lenkiewicz, Dieter Van Uytvanck, Peter Wittenburg, Sebastian Drude:
Towards Automated Annotation of Audio and Video Recordings by Application of Advanced Web-services. 1882-1885 - Simone Ashby, Sílvia Barbosa, Silvia Brandão, José Pedro Ferreira, Maarten Janssen, Catarina Silva, Mário Eduardo Viaro:
A Rule Based Pronunciation Generator and Regional Accent Databank for Portuguese. 1886-1887 - Roger Chappel, Kuldip K. Paliwal:
Speech Enhancement for Android (SEA): A Speech Processing Demonstration Tool for Android Based Smart Phones and Tablets. 1888-1891 - Jacob Okamoto, Serguei V. S. Pakhomov, Elizabeth Shriberg, Andreas Stolcke:
ProTK: An Improved Prosody Toolkit. 1892-1893 - Suzanne Boyce, Harriet J. Fell, Joel MacAuslan:
SpeechMark: Landmark Detection Tool for Speech Analysis. 1894-1897 - Javier Tejedor, Fernando J. López-Colino, Jordi Porta, José Colás:
An On-Line, Cloud-Based Spanish-Spanish Sign Language Translation System. 2127-2128
Audio Analysis, Estimation and Classification
- Sourish Chaudhuri, Rita Singh, Bhiksha Raj:
Exploiting Temporal Sequence Structure for Semantic Analysis of Multimedia. 1728-1731 - Hong Liu, Xiaofei Li:
Time Delay Estimation for Speech Signal Based on FOC-Spectrum. 1732-1735 - Ziqiang Shi, Tieran Zheng, Jiqing Han, Shiwen Deng:
Low-rank Audio Signal Classification Under Soft Margin and Trace Norm Constraints. 1736-1739 - Carlos Segura, Javier Hernando:
GCC-PHAT based Head Orientation Estimation. 1740-1743 - Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj:
Plagiarism Detection in Polyphonic Music using Monaural Signal Separation. 1744-1747 - Mariem Bouafif, Zied Lachiri:
TDOA Estimation for Multiple Speakers in Underdetermined Case. 1748-1751 - Toru Nakashika, Christophe Garcia, Tetsuya Takiguchi:
Local-feature-map Integration Using Convolutional Neural Networks for Music Genre Classification. 1752-1755 - Jeffrey Berry, Ian R. Fasel, Luciano Fadiga, Diana Archangeli:
Training Deep Nets with Imbalanced and Unlabeled Data. 1756-1759
Adaptation for ASR
- Taichi Asami, Satoshi Kobashikawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi:
Speech Data Clustering Based on Phoneme Error Trend for Unsupervised Acoustic Model Adaptation. 1760-1763 - Wooil Kim, John H. L. Hansen:
Gaussian Map based Acoustic Model Adaptation Using Untranscribed Data for Speech Recognition in Severely Adverse Environments. 1764-1767 - Danning Jiang, Dimitri Kanevsky, Vaibhava Goel, Yong Qin:
Investigating Performance of the Discriminative Methods for Long-Term Speaker Adaptation. 1768-1771 - Bo Li, Khe Chai Sim:
A Two-stage Speaker Adaptation Approach for Subspace Gaussian Mixture Model based Nonnative Speech Recognition. 1772-1775 - Heidi Christensen, Stuart P. Cunningham, Charles Fox, Phil D. Green, Thomas Hain:
A comparative study of adaptive, automatic recognition of disordered speech. 1776-1779 - Seçkin Uluskan, John H. L. Hansen:
Phoneme Class Based Adaptation for Mismatch Acoustic Modeling of Distant Noisy Speech. 1780-1783 - Zoi Roupakia, Anton Ragni, Mark J. F. Gales:
Rapid Nonlinear Speaker Adaptation for Large-Vocabulary Continuous Speech Recognition. 1784-1787 - I-Fan Chen, Chin-Hui Lee:
A Study on Using Word-Level HMMs to Improve ASR Performance over State-of-the-Art Phone-Level Acoustic Modeling for LVCSR. 1788-1791 - Michael L. Seltzer, Alex Acero:
Factored adaptation using a combination of feature-space and model-space transforms. 1792-1795
Robust Speech Recognition I
- Heyun Huang, Louis ten Bosch, Bert Cranen, Lou Boves:
Exploring Discriminative Speech Trajectory Structures. 1796-1799 - Ehsan Variani, Hynek Hermansky:
Estimating Classifier Performance in Unknown Noise. 1800-1803 - Azarakhsh Jalalvand, Fabian Triefenbach, Jean-Pierre Martens:
Continuous Digit Recognition in Noise: Reservoirs can do an excellent job! 1804-1807 - Janne Pylkkönen, Mikko Kurimo:
Optimization-Based Control for the Extended Baum-Welch Algorithm. 1808-1811 - Marc René Schädler, Birger Kollmeier:
Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems. 1812-1815 - Feipeng Li, Sri Harish Reddy Mallidi, Hynek Hermansky:
Phone recognition in critical bands using sub-band temporal modulations. 1816-1819 - Ramya Rasipuram, Mathew Magimai-Doss:
Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation. 1820-1823 - Sriram Ganapathy, Hynek Hermansky:
Analysis of Temporal Resolution in Frequency Domain Linear Prediction. 1828-1831 - Bing Zhang, Richard M. Schwartz, Stavros Tsakalidis, Long Nguyen, Spyros Matsoukas:
White Listing and Score Normalization for Keyword Spotting of Noisy Speech. 1832-1835
Rich Transcription II
- Saeid Safavi, Maryam Najafian, Abualsoud Hanani, Martin J. Russell, Peter Jancovic, Michael J. Carey:
Speaker Recognition for Children's Speech. 1836-1839 - Germán Bordel, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Amparo Varona:
A simple and efficient method to align very long speech signals to acoustically imperfect transcriptions. 1840-1843 - Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki:
Estimation of Talker's Head Orientation Based on Discrimination of the Shape of Cross-power Spectrum Phase Coefficients. 1844-1847 - Ann Lee, James R. Glass:
Sentence Detection Using Multiple Annotations. 1848-1851 - Delphine Charlet, Géraldine Damnati:
A speaker-role based approach for detecting politicians in TV broadcast news. 1852-1855 - Guangting Mai:
Relative Importance of Temporal Envelope and Fine Structure Cues in Low- and High-Order Harmonic Regions for Mandarin Lexical-tone Recognition. 1856-1859 - Nitya Tiwari, Prem C. Pandey, Pandurangarao N. Kulkarni:
Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment. 1860-1863 - Taniya Mishra, Vivek Kumar Rangarajan Sridhar, Alistair Conkie:
Word Prominence Detection using Robust yet Simple Prosodic Features. 1864-1867 - Amit Srivastava, Saurabh Khanwalkar, Gretchen Markiewicz, Guruprasad Saikumar:
Online Story Segmentation of Multilingual Streaming Broadcast News. 1868-1871
Adaptation & Robust Modeling
- Yanzhang He, Eric Fosler-Lussier:
Efficient Segmental Conditional Random Fields for One-Pass Phone Recognition. 1898-1901 - Udhyakumar Nallasamy, Florian Metze, Tanja Schultz:
Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition. 1902-1905 - Jinyu Li, Michael L. Seltzer, Yifan Gong:
Efficient VTS Adaptation Using Jacobian Approximation. 1906-1909 - Milos Cernak, David Imseng, Hervé Bourlard:
Robust triphone mapping for acoustic modeling. 1910-1913 - Weibin Zhang, Pascale Fung:
Sparse banded precision matrices for low resource speech recognition. 1914-1917 - Abdul Waheed Mohammed, Marco Matassoni, Hari Krishna Maganti, Maurizio Omologo:
Semi-Blind Model Adaptation using Piece-wise Energy Decay Curve for Large Reverberant Environments. 1918-1921
Multi-Channel Speech Enhancement
- Keisuke Kinoshita, Marc Delcroix, Mehrez Souden, Tomohiro Nakatani:
Example-based speech enhancement with joint utilization of spatial, spectral & temporal cues of speech and noise. 1926-1929 - Shengkui Zhao, Douglas L. Jones:
A Fast-Converging Adaptive Frequency-Domain MVDR Beamformer for Speech Enhancement. 1930-1933 - Rita Singh, Ken'ichi Kumatani, John W. McDonough, Chen Liu:
A signal-separation-based array postfilter for distant speech recognition. 1934-1937 - Meng Yu, Frank K. Soong:
Constrained Multichannel Speech Dereverberation. 1938-1941 - Meng Yu, Ryan Ritch, Jack Xin:
A Triple-Microphone Real-Time Speech Enhancement Algorithm Based on Approximate Array Analytical Solutions. 1942-1945
Prosody II
- Ratree Wayland, Donruethai Laphasradakul, Edith Kaan, Cao Rui:
Perception of Pitch Contours among Native Tone Listeners. 1946-1948 - Yosuke Igarashi, Hanae Koiso:
Pitch range control of Japanese boundary pitch movements. 1949-1952 - Grace Kuo:
Perceived prosodic boundaries in Taiwanese and their acoustic correlates. 1953-1956 - Luying Hou, Yuan Jia, Aijun Li:
Phonetic Foreignization of Mandarin for Dubbing in Imported Western Movies. 1957-1960 - Helena Moniz, Fernando Batista, Isabel Trancoso, Ana Isabel Mata:
Prosodic context-based analysis of disfluencies. 1961-1964 - Britta Lintfert, Bernd Möbius:
Describing the development of intonational categories using a target-oriented parametric approach. 1965-1968
Voice Activity Detection
- Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas, Xinhui Zhou, Nima Mesgarani, Karel Veselý, Pavel Matejka:
Developing a Speech Activity Detection System for the DARPA RATS Program. 1969-1972 - Mohamed Omar:
Speech Activity Detection for Noisy Data Using Adaptation Techniques. 1973-1976 - Ananya Misra:
Speech/Nonspeech Segmentation in Web Videos. 1977-1980 - Philip Harding, Ben Milner:
On the use of Machine Learning Methods for Speech and Voicing Classification. 1981-1984 - Samuel Thomas, Sri Harish Reddy Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab A. Shamma, Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas:
Acoustic and Data-driven Features for Robust Speech Activity Detection. 1985-1988 - Shuo Wang, Wenjun Wu:
A Two-step NMF Based Algorithm for Single Channel Speech Separation. 1989-1992
Systems Demo
- Peter Bell, Myroslava O. Dzikovska, Amy Isard:
A tutorial dialogue system with unrestricted spoken input. 2113-2114 - Xie Sun, Peter Li, Manli Zhu, Qiru Zhou:
Integrating Adaptive Beam-forming and Auditory Features for Robust Large Vocabulary Speech Recognition. 2115-2116 - Hansjörg Hofmann, Ute Ehrlich, Klaus Bader, Ilona Nothelfer, André Berton:
A Natural In-Car Speech Interface to Internet Services Using Hybrid ASR. 2117-2118 - Ronald A. Cole, Daniel Bolaños, Wayne H. Ward, J. T. Carmer, Eric Borts, Edward Svirsky:
How Marni Helps English Language Learners Acquire Oral Reading Fluency. 2119-2120 - Victor S. Finomore:
Demonstration of Advanced Multi-Modal, Network-Centric Communication Management Suite. 2121-2122 - Joris Pelemans, Kris Demuynck, Patrick Wambacq:
Dutch Automatic Speech Recognition on the Web: Towards a General Purpose System. 2123-2126
Perception and Production
- Michael C. W. Yip:
Meaning inhibition and sentence processing in Chinese: Evidence from negative priming. 1993-1996 - Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno:
Similar Speaker Selection Technique Based on Distance Metric Learning with Perceptual Voice Quality Similarity. 1997-2000 - Molly Babel, Grant McGuire:
Gendered sound symbolism and masking effects in speech processing. 2001-2004 - Louis ten Bosch, Odette Scharenborg:
Modeling Cue Trading in Human Word Recognition. 2005-2008 - David Cheng-Huan Li, Elsi Kaiser:
Accounting for Speech Rate in Spoken Word Recognition. 2009-2012 - Iris Hanique, Mirjam Ernestus:
The processes underlying two frequent casual speech phenomena in Dutch: A production experiment. 2013-2016 - Peter Birkholz, Phil Hoole:
Intrinsic velocity differences of lip and jaw movements: preliminary results. 2017-2020 - Malte C. Viebahn, Mirjam Ernestus, James M. McQueen:
Co-occurrence of reduced word forms in natural speech. 2021-2024 - Ikuyo Yoshinaga, Jiangping Kong:
Voice Production Mechanisms of Vibrato in Noh. 2025-2028 - Juan Rafael Orozco-Arroyave, Julián D. Arias-Londoño, Jesús Francisco Vargas-Bonilla, Elmar Nöth:
Automatic detection of hypernasal speech signals using nonlinear and entropy measurements. 2029-2032 - Vincent Aubanel, Martin Cooke, Emma Foster, María Luisa García Lecumberri, Cassie Mayo:
Effects of the availability of visual information and presence of competing conversations on speech production. 2033-2036
Language and Accent Recognition
- Shuai Huang, Glen A. Coppersmith, Damianos G. Karakos:
Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification. 2037-2040 - Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel:
Phonotactic Language Recognition Using MLP Features. 2041-2044 - Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez-Fuentes, Mireia Díez, Germán Bordel:
The EHU Systems for the NIST 2011 Language Recognition Evaluation. 2045-2048 - Mikel Peñagarikano, Amparo Varona, Mireia Díez, Luis Javier Rodríguez-Fuentes, Germán Bordel:
Study of Different Backends in a State-Of-the-Art Language Recognition System. 2049-2052 - Sibel Yaman, Jason W. Pelecanos, Mohamed Kamal Omar:
On the Use of Non-Linear Polynomial Kernel SVMs in Language Recognition. 2053-2056 - Bing Jiang, Yan Song, Wu Guo, Li-Rong Dai:
Exemplar-Based Sparse Representation for Language Recognition on I-Vectors. 2057-2060 - Yu-Chin Shih, Hung-Shin Lee, Hsin-Min Wang, Shyh-Kang Jeng:
Subspace-Based Feature Representation and Learning for Language Recognition. 2061-2064 - Changhuai You, Haizhou Li, Bin Ma, Kong-Aik Lee:
Effect of Relevance Factor of Maximum a posteriori Adaptation for GMM-SVM in Speaker and Language Recognition. 2065-2068 - Amparo Varona, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel, Mireia Díez:
Using Time-Synchronous Phone Co-occurrences in a SVM-Phonotactic Dialect Recognition System. 2069-2072 - Mahnoosh Mehrabani, Joseph Tepperman, Emily Nava:
Nativeness Classification with Suprasegmental Features on the Accent Group Level. 2073-2076
Voice Search and Spoken Document Retrieval
- Hung-yi Lee, Po-wei Chou, Lin-Shan Lee:
Open-Vocabulary Retrieval of Spoken Content with Shorter/Longer Queries Considering Word/Subword-based Acoustic Feature Similarity. 2077-2080 - Byungki Byun, Ilseo Kim, Sabato Marco Siniscalchi, Chin-Hui Lee:
Consumer-level multimedia event detection through unsupervised audio signal modeling. 2081-2084 - Qin Jin, Peter Franz Schulam, Shourabh Rawat, Susanne Burger, Duo Ding, Florian Metze:
Event-based Video Retrieval Using Audio. 2085-2088 - Xiaodan Zhuang, Stavros Tsakalidis, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan:
Compact Audio Representation for Event Detection in Consumer Media. 2089-2092 - Chao Liu, Dong Wang, Javier Tejedor:
N-gram FST Indexing for Spoken Term Detection. 2093-2096 - Haruka Majima, Rafael Torres, Yoko Fujita, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Spoken Inquiry Discrimination Using Bag-of-Words for Speech-Oriented Guidance System. 2097-2100 - Stavros Tsakalidis, Xiaodan Zhuang, Roger Hsiao, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan:
Robust Event Detection From Spoken Content In Consumer Domain Videos. 2101-2104 - Stephanie Pancoast, Murat Akbacak:
Bag-of-Audio-Words Approach for Multimedia Event Classification. 2105-2108 - Ken-ichi Iso, Edward Whittaker, Tadashi Emori, Junpei Miyake:
Improvements in Japanese Voice Search. 2109-2112
Sparse, Template-Based Representations
- Tara N. Sainath, David Nahamoo, Dimitri Kanevsky, Bhuvana Ramabhadran:
Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks. 2130-2133 - Jort F. Gemmeke, Hugo Van hamme:
Advances in noise robust digit recognition using hybrid exemplar-based techniques. 2134-2137 - Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen:
Group Sparsity for Speaker Identity Discrimination in Factorisation-based Speech Recognition. 2138-2141 - Yang Sun, Bert Cranen, Jort F. Gemmeke, Louis ten Bosch, Lou Boves, Mathew M. Doss:
Using Sparse Classification Outputs as Feature Observations for Noise-robust ASR. 2142-2145 - Serena Soldo, Mathew Magimai-Doss, Hervé Bourlard:
Synthetic References for Template-based ASR using posterior features. 2146-2149 - Dong Wang, Javier Tejedor:
Heterogeneous Convolutive Non-Negative Sparse Coding. 2150-2153
Speaker Diarization
- Jürgen T. Geiger, Ravichander Vipperla, Simon Bozonnet, Nicholas W. D. Evans, Björn W. Schuller, Gerhard Rigoll:
Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization. 2154-2157 - Beatriz Martínez-González, José Manuel Pardo, Julián D. Echeverry-Correa, José A. Vallejo-Pinto, Roberto Barra-Chicote:
Selection of TDOA Parameters for MDM Speaker Diarization. 2158-2161 - Orith Toledo-Ronen, Hagai Aronowitz:
Confidence for Speaker Diarization using PCA Spectral Ratio. 2162-2165 - Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi:
Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model. 2166-2169 - Deepu Vijayasenan, Fabio Valente:
DiarTk : An Open Source Toolkit for Research in Multistream Speaker Diarization and its Application to Meetings Recordings. 2170-2173 - Grégor Dupuy, Mickael Rouvier, Sylvain Meignier, Yannick Estève:
I-vectors and ILP clustering adapted to cross-show speaker diarization. 2174-2177
Speech Production: Imaging and Models
- Assaf Israel, Michael I. Proctor, Louis Goldstein, Khalil Iskarous, Shrikanth S. Narayanan:
Emphatic segments and emphasis spread in Lebanese Arabic: a Real-time Magnetic Resonance Imaging Study. 2178-2181 - Ryan Shosted, Bradley P. Sutton, Abbas Benmamoun:
Using magnetic resonance to image the pharynx during Arabic speech: Static and dynamic aspects. 2182-2185 - Julián Andrés Valdés Vargas, Pierre Badin, Laurent Lamalle:
Articulatory speaker normalisation based on MRI-data using three-way linear decomposition methods. 2186-2189 - Takayuki Arai:
Vowels Produced by Sliding Three-tube Model with Different Lengths. 2190-2193 - Tokihiko Kaburagi, Tetsuro Takano, Yuki Sakamoto:
Estimating the Vocal-Tract Area Function From Formants Using a Sensitivity Function and Least Square. 2194-2197 - Jorge C. Lucero, Laura L. Koenig, Susanne Fuchs:
Modeling source-tract interaction in speech production: Voicing onset vs. vowel height after a voiceless obstruent. 2198-2201
Speech Synthesis
- Bajibabu Bollepalli, Alan W. Black, Kishore Prahallad:
Modelling a Noisy-channel for Voice Conversion Using Articulatory Features. 2202-2205 - Anna C. Janska, Erich Schröger, Thomas Jacobsen, Robert A. J. Clark:
Asymmetries in the perception of synthesized speech. 2206-2209 - Erica Greene, Taniya Mishra, Patrick Haffner, Alistair Conkie:
Predicting Character-Appropriate Voices for a TTS-based Storyteller System. 2210-2213 - Alexander Sorin, Slava Shechtman, Vincent Pollet:
Psychoacoustic Segment Scoring for Multi-Form Speech Synthesis. 2214-2217 - Gérard Bailly, Cécilia Gouvernayre:
Pauses and respiratory markers of the structure of book reading. 2218-2221 - Blaise Potard, Matthew P. Aylett, Christopher J. Pidcock:
Proper Name Splicing in Computer Games with TTS. 2222-2225
Prosodic Prominence: Annotation, Prediction, Applications
- David Escudero Mancebo, Eva Estebas-Vilaplana:
Visualizing tool for evaluating inter-label similarity in prosodic labeling experiments. 2382-2385 - Petra Wagner, Fabio Tamburini, Andreas Windmann:
Objective, Subjective and Linguistic Roads to Perceptual Prominence - How are they compared and why? 2386-2389 - Martin Heckmann:
Audio-visual Evaluation and Detection of Word Prominence in a Human-Machine Interaction Scenario. 2390-2393 - Denis Arnold, Petra Wagner, Bernd Möbius:
Obtaining prominence judgments from naïve listeners - Influence of rating scales, linguistic levels and normalisation. 2394-2397 - Leonardo Badino, Robert A. J. Clark:
Towards Hierarchical Prosodic Prominence Generation in TTS Synthesis. 2398-2401 - Francesco Cutugno, Enrico Leone, Bogdan Ludusan, Antonio Origlia:
Investigating syllabic prominence with Conditional Random Fields and Latent-Dynamic Conditional Random Fields. 2402-2405 - Barbara Samlowski, Petra Wagner, Bernd Möbius:
Disentangling lexical, morphological, syntactic and semantic influences on German prominence - Evidence from a production study. 2406-2409 - Andrew Rosenberg:
Using Prominence and Phrasing Predictions to Improve Weighted Dictionary Pronunciation Models. 2410-2413 - Jean-Philippe Goldman, Mathieu Avanzi, Anne-Catherine Simon, Antoine Auchlin:
A Continuous Prominence Score Based On Acoustic Features. 2414-2417 - Christopher Sappok, Denis Arnold:
More on the Normalization of Syllable Prominence Ratings. 2418-2421 - Tim Mahrt, Jennifer Cole, Margaret M. Fleck, Mark Hasegawa-Johnson:
F0 and the Perception of Prominence. 2422-2425 - Bistra Andreeva, William J. Barry, Magdalena Wolska:
Language differences in the perceptual weight of prominence-lending properties. 2426-2429
Paralinguistics III
- Jun Deng, Björn W. Schuller:
Confidence Measures in Speech Emotion Recognition Based on Semi-supervised Learning. 2226-2229 - Rui Xia, Yang Liu:
Using i-Vector Space Model for Emotion Recognition. 2230-2233 - Nicolas Obin:
Cries and Whispers - Classification of Vocal Effort in Expressive Speech. 2234-2237 - Pouria Fewzee, Fakhri Karray:
Emotional Speech: A Spectral Analysis. 2238-2241 - Andrew Rosenberg:
Classifying Skewed Data: Importance Weighting to Optimize Average Recall. 2242-2245 - Catharine Oertel, Marcin Wlodarczak, Jens Edlund, Petra Wagner, Joakim Gustafson:
Gaze Patterns in Turn-Taking. 2246-2249 - Natalie Fecher:
The 'Audio-Visual Face Cover Corpus': Investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear. 2250-2253 - Dogan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:
A Case Study: Detecting Counselor Reflections in Psychotherapy for Addictions using Linguistic Features. 2254-2257
Speech and Speaker Segmentation
- Mahnoosh Mehrabani, John H. L. Hansen:
Speaker Clustering for a Mixture of Singing and Reading. 2258-2261 - Sayan Ghosh, T. V. Sreenivas:
Automatic Speech Segmentation Using Probabilistic Latent Component Modeling. 2262-2265 - Jonathan William Dennis, Tran Huy Dat, Engsiong Chng:
Overlapping Sound Event Recognition using Local Spectrogram Features with the Generalised Hough Transform. 2266-2269 - Ozlem Kalinli:
Automatic Phoneme Segmentation Using Auditory Attention Features. 2270-2273 - Jia Min Karen Kua, Tharmarajah Thiruvaran, Eliathamby Ambikairajah:
A Non-Uniform Filterbank for Speaker Recognition. 2274-2277 - Jaime Lorenzo-Trueba, Beatriz Martínez-González, Roberto Barra-Chicote, Verónica López-Ludeña, Javier Ferreiros, Junichi Yamagishi, Juan Manuel Montero:
Towards an Unsupervised Speaking Style Voice Building Framework: Multi-Style Speaker Diarization. 2278-2281 - Seyed Hamidreza Mohammadi, Hossein Sameti, Mahsa Sadat Elyasi Langarani, Amirhossein Tavanaei:
KNNDIST: A Non-Parametric Distance Measure for Speaker Segmentation. 2282-2285 - Wei Feng, Xuecheng Nie, Liang Wan, Lei Xie, Jianmin Jiang:
Lexical Story Co-Segmentation of Chinese Broadcast News. 2286-2289 - Montri Karnjanadecha, Stephen A. Zahorian:
Toward an Optimum Feature Set and HMM Model Parameters for Automatic Phonetic Alignment of Spontaneous Speech. 2290-2293
Spoken Language Understanding
- Tim Schlippe, Sebastian Ochs, Ngoc Thang Vu, Tanja Schultz:
Automatic Error Recovery for Pronunciation Dictionaries. 2298-2301 - Grégory Senay, Georges Linarès:
Confidence measure for speech indexing based on Latent Dirichlet Allocation. 2302-2305 - Christophe Cerisara, Alejandra Lorenzo:
Mixed probabilistic and deterministic dependency parsing. 2306-2309 - Shoko Yamahata, Yoshikazu Yamaguchi, Atsunori Ogawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi:
Automatic Vocabulary Adaptation Based on Semantic Similarity and Speech Recognition Confidence Measure. 2310-2313 - Nigel G. Ward, Alejandro Vega:
Towards Empirical Dialog-State Modeling and its Use in Language Modeling. 2314-2317 - Keigo Kubo, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Evaluation of Many-to-Many Alignment Algorithm by Automatic Pronunciation Annotation Using Web Text Mining. 2318-2321 - Sokol Koço, Cécile Capponi, Frédéric Béchet:
Applying multiview learning algorithms to human-human conversation classification. 2322-2325 - Yuya Akita, Makoto Watanabe, Tatsuya Kawahara:
Automatic Transcription of Lecture Speech using Language Model Based on Speaking-Style Transformation of Proceeding Texts. 2326-2329 - Chen Li, Yang Liu:
Normalization of Text Messages Using Character- and Phone-based Machine Translation Approaches. 2330-2333 - Aisha S. Azim, Xiaoxuan Wang, Khe Chai Sim:
A Weighted Combination of Speech with Text-based Models for Arabic Diacritization. 2334-2337 - Matthew Stephen Seigel, Philip C. Woodland:
Using Sub-word-level Information for Confidence Estimation with Conditional Random Field Models. 2338-2341
Spoken Language Applications
- Hung-yi Lee, Yu-Yu Chou, Yow-Bang Wang, Lin-Shan Lee:
Supervised Spoken Document Summarization jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine. 2342-2345 - Yun-Nung Chen, Florian Metze:
Integrating Intra-Speaker Topic Modeling and Temporal-Based Inter-Speaker Topic Modeling in Random Walk for Improved Multi-Party Meeting Summarization. 2346-2349 - Junlan Feng, Bernard Renger:
Language Modeling for Voice-Enabled Social TV Using Tweets. 2350-2353 - Rohit Kumar, Rohit Prasad, Sankaranarayanan Ananthakrishnan, Aravind Namandi Vembu, David Stallard, Stavros Tsakalidis, Prem Natarajan:
Detecting OOV Named-Entities in Conversational Speech. 2354-2357 - Sameer Maskey, Bowen Zhou:
Unsupervised Deep Belief Features for Speech Translation. 2358-2361 - Alicia Pérez, José M. Alcaide, M. Inés Torres:
EuskoParl: a speech and text Spanish-Basque parallel corpus. 2362-2365 - Hyuksu Ryu, Sunhee Kim, Minhwa Chung:
Comparing transcription agreement on non-native English speech corpus between native and non-native annotators. 2366-2369 - Jun Ogata, Masataka Goto:
PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds. 2370-2373 - Lei Xie, Yinqing Xu, Lilei Zheng, Qiang Huang, Bingfeng Li:
Speech Pattern Discovery using Audio-Visual Fusion and Canonical Correlation Analysis. 2374-2377 - Sameer Maskey, Andrew Rosenberg:
Power Mean Pyramid Scores for Summarization Evaluation. 2378-2381
Spoken Term and Unseen Word Detection
- Haiyang Li, Jiqing Han, Tieran Zheng, Guibin Zheng:
A Novel Confidence Measure Based on Context Consistency for Spoken Term Detection. 2430-2433 - Panagiota Karanasou, Lukás Burget, Dimitra Vergyri, Murat Akbacak, Arindam Mandal:
Discriminatively trained phoneme confusion model for keyword spotting. 2434-2437 - Keith Kintzley, Aren Jansen, Kenneth Church, Hynek Hermansky:
Inverting the Point Process Model for Fast Phonetic Keyword Search. 2438-2441 - Atta Norouzian, Aren Jansen, Richard C. Rose, Samuel Thomas:
Exploiting Discriminative Point Process Models for Spoken Term Detection. 2442-2445 - Ivan Bulyko, Jose Herrero, Chris Mihelich, Owen Kimball:
Subword speech recognition for detection of unseen words. 2446-2449 - Long Qin, Alexander I. Rudnicky:
OOV Word Detection using Hybrid Models with Mixed Types of Fragments. 2450-2453
Voice Search and Spoken Document Retrieval II
- Jingjing Liu, Scott Cyphers, Panupong Pasupat, Ian McGraw, James R. Glass:
A Conversational Movie Search System Based on Conditional Random Fields. 2454-2457 - Tsung-Hsien Wen, Hung-yi Lee, Lin-Shan Lee:
Interactive Spoken Content Retrieval with Different Types of Actions Optimized By a Markov Decision Process. 2458-2461 - Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk:
Voice Query Refinement. 2462-2465 - Aren Jansen, Benjamin Van Durme:
Indexing Raw Acoustic Features for Scalable Zero Resource Search. 2466-2469 - Julien Fayolle, Murat Saraclar, Fabienne Moreau, Christian Raymond, Guillaume Gravier:
Lexical-phonetic automata for spoken utterance indexing and retrieval. 2470-2473 - Ian McGraw, Scott Cyphers, Panupong Pasupat, Jingjing Liu, James R. Glass:
Automating Crowd-supervised Learning for Spoken Language Systems. 2474-2477
Speech and Age Differences
- Soroush Vosoughi, Deb Roy:
An Automatic Child-Directed Speech Detector for the Study of Child Language Development. 2478-2481 - Andrew R. Plummer:
Aligning manifolds to model the earliest phonological abstraction in infant-caretaker vocal imitation. 2482-2485 - Yoko Saikachi, Mafuyu Kitahara, Ken'ya Nishikawa, Ai Kanato, Reiko Mazuka:
The F0 fall delay of lexical pitch accent in Japanese Infant-directed speech. 2486-2489 - Irina Shport:
Children's Productions of Multi-Syllabic Lexical Stress Patterns in Different Prosodic Positions. 2490-2493 - Melissa A. Redford, Laura Dilley, Jessica Gamache, Elizabeth Wieland:
Prosodic Marking of Continuation versus Completion in Children's Narratives. 2494-2497 - Daniel Fogerty, Diane Kewley-Port, Larry E. Humes:
Judging temporal onset differences for concurrent vowels: Results for young, middle-aged, and older adults. 2498-2501
Acoustic Classification
- Pengfei Hu, Wenju Liu, Wei Jiang:
Combining frame and segment based models for environmental sound classification. 2502-2505 - Yi Ren Leng, Tran Huy Dat:
Using Blob Detection in Missing Feature Linear-Frequency Cepstral Coefficients for Robust Sound Event Recognition. 2506-2509 - Kailash Patil, Mounya Elhilali:
Goal-Oriented Auditory Scene Recognition. 2510-2513 - Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen:
Prof-Life-Log: Audio Environment Detection for Naturalistic Audio Streams. 2514-2517 - Po-Sen Huang, Jianchao Yang, Mark Hasegawa-Johnson, Feng Liang, Thomas S. Huang:
Pooling Robust Shift-Invariant Sparse Representations of Acoustic Signals. 2518-2521 - Lee Ngee Tan, Kantapon Kaewtip, Martin L. Cody, Charles E. Taylor, Abeer Alwan:
Evaluation of a Sparse Representation-Based Classifier For Bird Phrase Classification Under Limited Data Conditions. 2522-2525
New Trends in Vowel Nasalization: The Articulation of Nasal Vowels
- Georgia Zellou:
Nasality from Moroccan Arabic Nasal and Pharyngeal Consonants: Patterns of Airflow and Nasalance. 2678-2681 - Véronique Delvaux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies:
Inter-gestural timing in French nasal vowels: A comparative study of (Liège, Tournai) Northern French vs. (Marseille, Toulouse) Southern French. 2682-2685 - Georgia Zellou, Rebecca Scarborough:
Nasal Coarticulation and Contrastive Stress. 2686-2689 - Catarina Oliveira, Paula Martins, Samuel S. Silva, António J. S. Teixeira:
An MRI study of the oral articulation of European Portuguese nasal vowels. 2690-2693 - Rebecca Scarborough, Georgia Zellou:
Acoustic and Perceptual Similarity in Coarticulatorily Nasalized Vowels. 2694-2697 - Panying Rong, Ryan Shosted, David Kuehn:
Articulatory differences between oral and nasal vowels based on the simulation of a speaker-adaptive articulatory model. 2698-2701
Speech Synthesis: Selected Topics
- Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose, Chiori Hori, Hideki Kashioka, Paul R. Dixon:
Improving WFST-based G2P Conversion with Alignment Constraints and RNNLM N-best Rescoring. 2526-2529 - Jian Luan:
Expand CRF to Model Long Distance Dependencies in Prosodic Break Prediction. 2530-2533 - Nanette Veilleux, Jonathan Barnes, Alejna Brugos, Stefanie Shattuck-Hufnagel:
Perceptual Foundations for Naturalistic Variability in the Prosody of Synthetic Speech. 2534-2537 - Stefan Hahn, Paul Vozila, Maximilian Bisani:
Comparison of Grapheme-to-Phoneme Methods on Large Pronunciation Dictionaries and LVCSR Tasks. 2538-2541 - Frédéric Berthommier, Laurent Girin, Louis-Jean Boë:
A Simple Hybrid Acoustic / Morphologically-Constrained Technique for the Synthesis of Stop Consonants in Various Vocalic Contexts. 2542-2545 - Kishore Prahallad, Naresh Kumar Elluru, Venkatesh Keri, Rajendran S, Alan W. Black:
The IIIT-H Indic Speech Databases. 2546-2549 - Rubén San Segundo, Juan Manuel Montero, Verónica López-Ludeña, Simon King:
Detecting Acronyms from Capital Letter Sequences in Spanish. 2550-2553 - Patrick Lehnen, Stefan Hahn, Vlad-Andrei Guta, Hermann Ney:
Hidden Conditional Random Fields with M-to-N Alignments for Grapheme-to-Phoneme Conversion. 2554-2557 - Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran:
Phrase Boundary Assignment from Text in Multiple Domains. 2558-2561 - Nobuaki Minematsu, Shumpei Kobayashi, Shinya Shimizu, Keikichi Hirose:
Improved Prediction of Japanese Word Accent Sandhi Using CRF. 2562-2565 - Asterios Toutios, Shinji Maeda:
Articulatory VCV Synthesis from EMA Data. 2566-2569
ASR: Deep Neural Networks II
- Oriol Vinyals, Li Deng:
Are Sparse Representations Rich Enough for Acoustic Modeling? 2570-2573 - Yeming Xiao, Zhen Zhang, Shang Cai, Jielin Pan, Yonghong Yan:
A Initial Attempt on Task-Specific Adaptation for Deep Neural Network-based Large Vocabulary Continuous Speech Recognition. 2574-2577 - Navdeep Jaitly, Patrick Nguyen, Andrew W. Senior, Vincent Vanhoucke:
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition. 2578-2581 - Yanmin Qian, Jia Liu:
Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition. 2582-2585 - Ngoc Thang Vu, Wojtek Breiter, Florian Metze, Tanja Schultz:
Initialization Schemes for Multilayer Perceptron Training and their Impact on ASR Performance using Multilingual Data. 2586-2589 - Sabato Marco Siniscalchi, Jinyu Li, Chin-Hui Lee:
Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models. 2590-2593 - Yotaro Kubo, Takaaki Hori, Atsushi Nakamura:
Integrating Deep Neural Networks into Structural Classification Approach based on Weighted Finite-State Transducers. 2594-2597 - Li Deng, Brian Hutchinson, Dong Yu:
Parallel Training for Deep Stacking Networks. 2598-2601 - Yanmin Qian, Jia Liu:
Articulatory Feature based Multilingual MLPs for Low-Resource Speech Recognition. 2602-2605 - Ramón Fernandez Astudillo, Alberto Abad, João Paulo Neto:
Uncertainty driven Compensation of Multi-Stream MLP Acoustic Models for Robust ASR. 2606-2609
Robust Speech Recognition II
- Frank Diehl, Philip C. Woodland:
Complementary Phone Error Training. 2610-2613 - Markus Nußbaum-Thom, Zoltán Tüske, Georg Heigold, Ralf Schlüter, Hermann Ney:
Posterior-Scaled MPE: Novel Discriminative Training Criteria. 2614-2617 - Pei Ding, Liqiang He:
Improve the Implementation of Pitch Features for Mandarin Digit String Recognition Task. 2618-2621 - Hsin-Ju Hsieh, Jeih-Weih Hung, Berlin Chen:
Exploring Joint Equalization of Spatial-Temporal Contextual Statistics of Speech Features for Robust Speech Recognition. 2622-2625 - Shigeki Matsuda, Naoya Ito, Kosuke Tsujino, Hideki Kashioka, Shigeki Sagayama:
Speaker-Dependent Voice Activity Detection Robust to Background Speech Noise. 2626-2629 - José A. González, Antonio M. Peinado, Angel M. Gomez, Ning Ma:
Log-spectral feature reconstruction based on an occlusion model for noise robust speech recognition. 2630-2633 - Ahmed Hussen Abdelaziz, Dorothea Kolossa:
Decoding of Uncertain Features Using the Posterior Distribution of the Clean Data for Robust Speech Recognition. 2634-2637 - Ning Ma, Jon Barker:
Coupling identification and reconstruction of missing features for noise-robust automatic speech recognition. 2638-2641 - Bogdan Ludusan, Stefan Ziegler, Guillaume Gravier:
Integrating Stress Information in Large Vocabulary Continuous Speech Recognition. 2642-2645 - Jen-Tzung Chien, Cheng-Chun Chiang:
Group Sparse Hidden Markov Models for Speech Recognition. 2646-2649
Speaker Recognition III
- Johann Poignant, Hervé Bredin, Viet Bac Le, Laurent Besacier, Claude Barras, Georges Quénot:
Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast. 2650-2653 - Yali Zhao, Lei Xie, Zhonghua Fu:
Mask Estimation and Refinement for MFT-based Robust Speaker Verification. 2654-2657 - Hai Yang, Chunyan Liang, Yunfei Xu, Lin Yang, Yonghong Yan:
Sparse Probabilistic Linear Discriminant Analysis for Speaker Verification. 2658-2661 - Achintya Kumar Sarkar, Driss Matrouf, Pierre-Michel Bousquet, Jean-François Bonastre:
Study of the Effect of I-vector Modeling on Short and Mismatch Utterance Duration for Speaker Verification. 2662-2665 - Chien-Lin Huang, Chiori Hori, Hideki Kashioka, Bin Ma:
Ensemble Classifiers Using Unsupervised Data Selection for Speaker Recognition. 2666-2669 - Songgun Hyon, Hongcui Wang, Chen Zhao, Jianguo Wei, Jianwu Dang:
A method of speaker identification based on phoneme mean F-ratio contribution. 2670-2673 - Jeremiah Remus, Jenniffer Estrada, Stephanie A. C. Schuckers:
Mitigating Effects of Recording Condition Mismatch in Speaker Recognition Using Partial Least Squares. 2674-2677