18th Interspeech 2017: Stockholm, Sweden
- Francisco Lacerda:
18th Annual Conference of the International Speech Communication Association, Interspeech 2017, Stockholm, Sweden, August 20-24, 2017. ISCA 2017
ISCA Medal 2017 Ceremony
- Haizhou Li:
ISCA Medal for Scientific Achievement. 1
Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1
- Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee:
The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection. 2-6 - Roberto Font, Juan M. Espín, María José Cano:
Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge. 7-11 - Hemant A. Patil, Madhu R. Kamble, Tanvina B. Patel, Meet H. Soni:
Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection. 12-16 - Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, Ming Li:
Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion. 17-21 - Sarfaraz Jelil, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha:
Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features. 22-26 - Marcin Witkowski, Stanislaw Kacprzak, Piotr Zelasko, Konrad Kowalczyk, Jakub Galka:
Audio Replay Attack Detection Using High-Frequency Features. 27-31 - Xianliang Wang, Yanhong Xiao, Xuan Zhu:
Feature Selection Based on CQCCs for Automatic Speaker Verification Spoofing. 32-36
Special Session: Speech Technology for Code-Switching in Multilingual Communities
- Emre Yilmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David A. van Leeuwen:
Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech. 37-41 - Emre Yilmaz, Henk van den Heuvel, David A. van Leeuwen:
Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection. 42-46 - Vikram Ramanarayanan, David Suendermann-Oeft:
Jee haan, I'd like both, por favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog. 47-51 - Sai Krishna Rallabandi, Alan W. Black:
On Building Mixed Lingual Speech Synthesis Systems. 52-56 - Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Sunayana Sitaram, Alan W. Black:
Speech Synthesis for Mixed-Language Navigation Instructions. 57-61 - Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel:
Addressing Code-Switching in French/Algerian Arabic Speech. 62-66 - Gualberto A. Guzmán, Joseph Ricard, Jacqueline Serigos, Barbara E. Bullock, Almeida Jacqueline Toribio:
Metrics for Modeling Code-Switching Across Corpora. 67-71 - Ewald van der Westhuizen, Thomas Niesler:
Synthesising isiZulu-English Code-Switch Bigrams Using Word Embeddings. 72-76 - Victor Soto, Julia Hirschberg:
Crowdsourcing Universal Part-of-Speech Tags for Code-Switching. 77-81
Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2
- Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin:
Audio Replay Attack Detection with Deep Learning Frameworks. 82-86 - Zhe Ji, Zhi-Yi Li, Peng Li, MaoBo An, Shengxiang Gao, Dan Wu, Faru Zhao:
Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017. 87-91 - Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng:
A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification. 92-96 - Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland:
Replay Attack Detection Using DNN for Channel Discrimination. 97-101 - Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu:
ResNet and Model Fusion for Automatic Spoofing Detection. 102-106 - K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala:
SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017. 107-111
Conversational Telephone Speech Recognition
- William Hartmann, Roger Hsiao, Tim Ng, Jeff Z. Ma, Francis Keith, Man-Hung Siu:
Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features. 112-116 - Jeremy Heng Meng Wong, Mark J. F. Gales:
Student-Teacher Training with Diverse Decision Tree Ensembles. 117-121 - Xiaodong Cui, Vaibhava Goel, George Saon:
Embedding-Based Speaker Adaptive Training of Deep Neural Networks. 122-126 - Jeff Z. Ma, Francis Keith, Tim Ng, Man-Hung Siu, Owen Kimball:
Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer. 127-131 - George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall:
English Conversational Telephone Speech Recognition by Humans and Machines. 132-136 - Andreas Stolcke, Jasha Droppo:
Comparing Human and Machine Errors in Conversational Speech Transcription. 137-141
Multimodal Paralinguistics
- Volha Petukhova, Manoj Raju, Harry Bunt:
Multimodal Markers of Persuasive Speech: Designing a Virtual Debate Coach. 142-146 - Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth S. Narayanan, Ruth B. Grossman:
Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder. 147-151 - Alec Burmania, Carlos Busso:
A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors. 152-156 - Gaurav Fotedar, Prasanta Kumar Ghosh:
An Information Theoretic Analysis of the Temporal Synchrony Between Head Gestures and Prosodic Patterns in Spontaneous Speech. 157-161 - Dong-Yan Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Minghui Dong, Xinguo Yu, Haizhou Li:
Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques. 162-165 - Marion Dohen, Benjamin Roustan:
Co-Production of Speech and Pointing Gestures in Clear and Perturbed Interactive Tasks: Multimodal Designation Strategies. 166-170
Dereverberation, Echo Cancellation and Speech
- Peter Guzewich, Stephen A. Zahorian:
Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing. 171-175 - Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt:
Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance. 176-180 - Jan Franzen, Tim Fingscheidt:
A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems. 181-185 - Dongmei Wang, John H. L. Hansen:
Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients. 186-190 - David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera:
Improving Speech Intelligibility in Binaural Hearing Aids by Estimating a Time-Frequency Mask with a Weighted Least Squares Classifier. 191-195 - Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee:
Simulations of High-Frequency Vocoder on Mandarin Speech Recognition for Acoustic Hearing Preserved Cochlear Implant. 196-200
Acoustic and Articulatory Phonetics
- Zainab Hermes, Marissa S. Barlaz, Ryan Shosted, Zhi-Pei Liang, Bradley P. Sutton:
Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: An rt-MRI Study. 201-205 - Benjamin Elie, Yves Laprie:
Glottal Opening and Strategies of Production of Fricatives. 206-209 - Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best:
Acoustics and Articulation of Medial versus Final Coronal Stop Gemination Contrasts in Moroccan Arabic. 210-214 - Giuseppina Turco, Karim Shoul, Rachid Ridouane:
How are Four-Level Length Distinctions Produced? Evidence from Moroccan Arabic. 215-218 - Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida:
Vowels in the Barunga Variety of North Australian Kriol. 219-223 - Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah:
Nature of Contrast and Coarticulation: Evidence from Mizo Tones and Assamese Vowel Harmony. 224-228
Multimodal and Articulatory Synthesis
- João Paulo Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell:
The Influence of Synthetic Voice on the Evaluation of a Virtual Character. 229-233 - Amelia Jane Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network. 234-238 - Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer:
An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis. 239-243 - Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth S. Narayanan:
VCV Synthesis Using Task Dynamics to Animate a Factor-Based Articulatory Model. 244-248 - Joseph Mendelson, Matthew P. Aylett:
Beyond the Listening Test: An Interactive Approach to TTS Evaluation. 249-253 - Beiming Cao, Myung Jong Kim, Jan P. H. van Santen, Ted Mau, Jun Wang:
Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis. 254-258
Neural Networks for Language Modeling
- Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar:
Approaches for Neural-Network Language Model Adaptation. 259-263 - Youssef Oualil, Dietrich Klakow:
A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models. 264-268 - Xie Chen, Anton Ragni, Xunying Liu, Mark J. F. Gales:
Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition. 269-273 - Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran:
Fast Neural Network Language Model Lookups at N-Gram Speeds. 274-278 - Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon:
Empirical Exploration of Novel Architectures and Objectives for Language Models. 279-283 - Karel Benes, Murali Karthick Baskar, Lukás Burget:
Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks. 284-288
Pathological Speech and Language
- Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen:
Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis. 289-293 - Duc Le, Keli Licata, Emily Mower Provost:
Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study. 294-298 - Nicanor García, Juan Rafael Orozco-Arroyave, Luis Fernando D'Haro, Najim Dehak, Elmar Nöth:
Evaluation of the Neurological State of People with Parkinson's Disease Using i-Vectors. 299-303 - Yu-Ren Chien, Michal Borský, Jón Guðnason:
Objective Severity Assessment from Disordered Voice Using Estimated Glottal Airflow. 304-308 - Florian B. Pokorny, Björn W. Schuller, Peter B. Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter:
Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-Based Approach. 309-313 - Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson's Disease. 314-318
Speech Analysis and Representation 1
- Linxue Bai, Peter Jancovic, Martin J. Russell, Philip Weber, Stephen M. Houghton:
Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs. 319-323 - Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le:
An Investigation of Crowd Speech for Room Occupancy Estimation. 324-328 - Karthika Vijayan, Jitendra Kumar Dhiman, Chandra Sekhar Seelamantula:
Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals. 329-333 - Alexsandro R. Meireles, Antônio R. M. Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros:
Musical Speech: A New Methodology for Transcribing Speech Prosody. 334-338 - K. S. Nataraj, Prem C. Pandey, Hirak Dasgupta:
Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training. 339-343 - Tom Bäckström:
Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source. 344-348
Perception of Dialects and L2
- Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini:
End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives. 349-353 - Ewa Jacewicz, Robert Allen Fox:
Dialect Perception by Older Children. 354-358 - Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima:
Perception of Non-Contrastive Variations in American English by Japanese Learners: Flaps are Less Favored Than Stops. 359-363 - Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts:
L1 Perceptions of L2 Prosody: The Interplay Between Intonation, Rhythm, and Speech Rate and Their Contribution to Accentedness and Comprehensibility. 364-368 - Izumi Takiguchi:
Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese. 369-373 - Yuanyuan Zhang, Hongwei Ding:
A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners. 374-378
Far-field Speech Recognition
- Chanwoo Kim, Ananya Misra, Kean K. Chin, Thad Hughes, Arun Narayanan, Tara N. Sainath, Michiel Bacchiani:
Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home. 379-383 - Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani:
Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. 384-388 - Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie:
Factorial Modeling for Effective Suppression of Directional Noise. 389-393 - Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee:
On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones. 394-398 - Bo Li, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon:
Acoustic Modeling for Google Home. 399-403 - Seyedmahdad Mirsamadi, John H. L. Hansen:
On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition. 404-408
Speech Analysis and Representation 2
- Masanori Morise, Genta Miyashita, Kenji Ozawa:
Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System. 409-413 - Erfan Loweimi, Jon Barker, Oscar Saz-Torralba, Thomas Hain:
Robust Source-Filter Separation of Speech Signal in the Phase Domain. 414-418 - Simon Stone, Peter Steiner, Peter Birkholz:
A Time-Warping Pitch Tracking Algorithm Considering Fast f0 Changes. 419-423 - Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation. 424-428 - Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan:
Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments. 429-433 - Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh:
Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis. 434-438 - Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-Weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao:
Wavelet Speech Enhancement Based on Robust Principal Component Analysis. 439-443 - Bidisha Sharma, S. R. Mahadeva Prasanna:
Vowel Onset Point Detection Using Sonority Information. 444-448 - Unto K. Laine:
Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies. 449-453 - Christian Kroos, Mark D. Plumbley:
Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks. 454-458
Speech and Audio Segmentation and Classification 2
- Jia Dai, Wei Xue, Wenju Liu:
Multilingual i-Vector Based Statistical Modeling for Music Genre Classification. 459-463 - Banriskhem K. Khonglah, K. T. Deepak, S. R. Mahadeva Prasanna:
Indoor/Outdoor Audio Classification Using Foreground Speech Segmentation. 464-468 - Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan:
Attention Based CLDNNs for Short-Duration Acoustic Scene Classification. 469-473 - Xianjun Xia, Roberto Togneri, Ferdous Ahmed Sohel, David Huang:
Frame-Wise Dynamic Threshold Based Polyphonic Acoustic Event Detection. 474-478 - Inseon Jang, Chunghyun Ahn, Jeongil Seo, Younseon Jang:
Enhanced Feature Extraction for Speech Detection in Media Audio. 479-483 - Sukanya Sonowal, Tushar Sandhan, In Kyu Choi, Nam Soo Kim:
Audio Classification Using Class-Specific Learned Descriptors. 484-487 - Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj:
Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. 488-492 - Matthias Zöhrer, Franz Pernkopf:
Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks. 493-497 - Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger:
Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi. 498-502 - G. Nisha Meenakshi, Prasanta Kumar Ghosh:
A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the 'Color' of Whispered Phonemes and Deep Neural Network. 503-507
Search, Computational Strategies and Language Modeling
- Ian Williams, Petar S. Aleksic:
Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition. 508-512 - Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel:
Comparison of Decoding Strategies for CTC Acoustic Models. 513-517 - Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur:
Phone Duration Modeling for LVCSR Using Neural Networks. 518-522 - Jan Chorowski, Navdeep Jaitly:
Towards Better Decoding and Language Model Integration in Sequence to Sequence Models. 523-527 - Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu:
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. 528-532 - Xu Xiang, Yanmin Qian, Kai Yu:
Binary Deep Neural Networks for Speech Recognition. 533-537 - Akshay Chandrashekaran, Ian R. Lane:
Hierarchical Constrained Bayesian Optimization for Feature, Acoustic Model and Decoder Parameter Optimization. 538-542 - Shohei Toyama, Daisuke Saito, Nobuaki Minematsu:
Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition. 543-547 - Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas C. Raykar, Lili Kotlerman, Guy Lev:
Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks. 548-552 - Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow:
Estimation of Gap Between Current Language Models and Human Performance. 553-557 - Anna Moró, György Szaszák:
A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery. 558-562
Speech Perception
- Lei Wang, Fei Chen:
Factors Affecting the Intelligibility of Low-Pass Filtered Speech. 563-566 - Shiyu Wang, Fei Chen:
Phonetic Restoration of Temporally Reversed Speech. 567-570 - Mako Ishida:
Simultaneous Articulatory and Acoustic Distortion in L1 and L2 Listening: Locally Time-Reversed "Fast" Speech. 571-575 - L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler:
Lexically Guided Perceptual Learning in Mandarin Chinese. 576-580 - Chris Davis, Chee Seng Chong, Jeesun Kim:
The Effect of Spectral Profile on the Intelligibility of Emotional Speech in Noise. 581-585 - Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker:
Whether Long-Term Tracking of Speech Rate Affects Perception Depends on Who is Talking. 586-590 - Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto:
Emotional Thin-Slicing: A Proposal for a Short- and Long-Term Division of Emotional Speech. 591-595 - Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux:
Predicting Epenthetic Vowel Quality from Acoustics. 596-600 - Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson:
The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds. 601-605 - Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Eje Henter, Junichi Yamagishi:
Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car. 606-610 - Oliver Niebuhr, Jana Winkler:
The Relative Cueing Power of F0 and Duration in German Prominence Perception. 611-615 - Luciana Marques, Rebecca Scarborough:
Perception and Acoustics of Vowel Nasality in Brazilian Portuguese. 616-620 - Jonny Kim, Katie Drager:
Sociophonetic Realizations Guide Subsequent Lexical Access. 621-625
Speech Production and Perception
- Samuel Silva, António J. S. Teixeira:
Critical Articulators Identification from RT-MRI of the Vocal Tract. 626-630 - Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan:
Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images. 631-635 - Sasan Asadiabadi, Engin Erzin:
Vocal Tract Airway Tissue Boundary Tracking for rtMRI Using Shape and Appearance Priors. 636-640 - T. V. Ananthapadmanabha, A. G. Ramakrishnan, Shubham Sharma:
An Objective Critical Distance Measure Based on the Relative Level of Spectral Valley. 641-644 - Tanner Sorensen, Zisis Iason Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam C. Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna S. Nayak, Shrikanth S. Narayanan:
Database of Volumetric and Real-Time Vocal Tract MRI for Speech Science. 645-649 - Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang:
The Influence on Realization and Perception of Lexical Tones from Affricate's Aspiration. 650-654 - Matthias K. Franken, Frank Eisner, Jan-Mathijs Schoffelen, Daniel J. Acheson, Peter Hagoort, James M. McQueen:
Audiovisual Recalibration of Vowel Categories. 655-658 - Judith Peters, Marieke Hoetjes:
The Effect of Gesture on Persuasive Speech. 659-663 - Wei Lai:
Auditory-Visual Integration of Talker Gender in Cantonese Tone Perception. 664-668 - Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent L. Gracco:
Event-Related Potentials Associated with Somatosensory Effect in Audio-Visual Speech Perception. 669-673 - Lena F. Renner, Marcin Wlodarczak:
When a Dog is a Cat and How it Changes Your Pupil Size: Pupil Dilation in Response to Information Mismatch. 674-678 - Win Thuzar Kyaw, Yoshinori Sagisaka:
Cross-Modal Analysis Between Phonation Differences and Texture Images Based on Sentiment Correlations. 679-683 - Daryush D. Mehta, Patrick C. Chwalek, Thomas F. Quatieri, Laura J. Brattain:
Wireless Neck-Surface Accelerometer and Microphone on Flex Circuit with Application to Noise-Robust Monitoring of Lombard Speech. 684-688 - Andrea Bandini, Aravind Namasivayam, Yana Yunusova:
Video-Based Tracking of Jaw Movements During Speech: Preliminary Results and Future Directions. 689-693 - S. B. Sunil Kumar, K. Sreenivasa Rao, Tanumay Mandal:
Accurate Synchronization of Speech and EGG Signal Using Phase Information. 694-698 - Anna Sara H. Romøren, Aoju Chen:
The Acquisition of Focal Lengthening in Stockholm Swedish. 699-703
Multi-lingual Models and Adaptation for ASR
- Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu:
Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition. 704-708 - Olivier Siohan:
CTC Training of Multi-Phone Acoustic Models for Speech Recognition. 709-713 - Sibo Tong, Philip N. Garner, Hervé Bourlard:
An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation. 714-718 - Martin Karafiát, Murali Karthick Baskar, Pavel Matejka, Karel Veselý, Frantisek Grézl, Lukás Burget, Jan Cernocký:
2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation. 719-723 - Marco Matassoni, Alessio Brutti, Daniele Falavigna:
Optimizing DNN Adaptation for Recognition of Enhanced Speech. 724-728 - Younggwan Kim, Hyungjun Lim, Jahyun Goo, Hoirin Kim:
Deep Least Squares Regression for Speaker Adaptation. 729-733 - Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson:
Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition. 734-738 - Neethu Mariam Joy, Sandeep Reddy Kothinti, Srinivasan Umesh, Basil Abraham:
Generalized Distillation Framework for Speaker Normalization. 739-743 - Lahiru Samarakoon, Brian Mak, Khe Chai Sim:
Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models. 744-748 - Joachim Fainberg, Steve Renals, Peter Bell:
Factorised Representations for Neural Network Adaptation to Diverse Acoustic Environments. 749-753
Prosody and Text Processing
- Richard Sproat, Navdeep Jaitly:
An RNN Model of Text Normalization. 754-758 - Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran:
Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels. 759-763 - Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami:
Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis. 764-768 - Jinfu Ni, Yoshinori Shiga, Hisashi Kawai:
Global Syllable Vectors for Building TTS Front-End with Deep Learning. 769-773 - Ishin Fukuoka, Kazuhiko Iwata, Tetsunori Kobayashi:
Prosody Control of Utterance Sequence for Information Delivering. 774-778 - Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai:
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer. 779-783 - Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li, Bin Liu:
Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction. 784-788 - Bo Chen, Tianling Bian, Kai Yu:
Discrete Duration Model for Speech Synthesis. 789-793 - Bo Chen, Jiahao Lai, Kai Yu:
Comparison of Modeling Target in LSTM-RNN Duration Model. 794-798 - Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi:
Learning Word Vector Representations Based on Acoustic Counts. 799-803 - Éva Székely, Joseph Mendelson, Joakim Gustafson:
Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies. 804-808
Show & Tell 1
- Alp Öktem, Mireia Farrús, Leo Wanner:
Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora. 809-810 - Svetlana Vetchinnikova, Anna Mauranen, Nina Mikusová:
ChunkitApp: Investigating the Relevant Units of Online Speech Processing. 811-812 - Markus Jochim:
Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control. 813-814 - Anne S. Warlaumont, Mark VanDam, Elika Bergelson, Alejandrina Cristià:
HomeBank: A Repository for Long-Form Real-World Audio Recordings of Children. 815-816 - Peter Bell, Joachim Fainberg, Catherine Lai, Mark Sinclair:
A System for Real Time Collaborative Transcription Correction. 817-818 - Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu:
MoPAReST - Mobile Phone Assisted Remote Speech Therapy Platform. 819-820
Show & Tell 2
- Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte:
An Apparatus to Investigate Western Opera Singing Skill Learning Using Performance and Result Biofeedback, and Measuring its Neural Correlates. 821-822 - Christoph Draxler:
PercyConfigurator - Perception Experiments as a Service. 823-824 - Askars Salimbajevs, Indra Ikauniece:
System for Speech Transcription and Post-Editing in Microsoft Word. 825-826 - Ji Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung:
Emojive! Collecting Emotion Data from Speech and Facial Expression Using Mobile Game App. 827-828 - Mietta Lennes, Jussi Piitulainen, Martin Matthiesen:
Mylly - The Mill: A New Platform for Processing Speech and Text Corpora Easily and Efficiently. 829-830 - Kyori Suzuki, Ian Wilson, Hayato Watanabe:
Visual Learning 2: Pronunciation App Using Ultrasound, Video, and MRI. 831-832
Keynote 1: James Allen
- James Allen:
Dialogue as Collaborative Problem Solving. 833
Special Session: Speech and Human-Robot Interaction
- Brian Stasak, Julien Epps, Roland Goecke:
Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect. 834-838 - José Novoa, Jorge Wuth, Juan Pablo Escudero, Josué Fredes, Rodrigo Mahú, Richard M. Stern, Néstor Becerra Yoma:
Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction. 839-843 - Bekir Berker Türker, Zana Buçinca, Engin Erzin, Yücel Yemez, T. Metin Sezgin:
Analysis of Engagement and User Experience with a Laughter Responsive Social Robot. 844-848 - Alice Baird, Shahin Amiriparian, Nicholas Cummins, Alyssa M. Alcorn, Anton Batliner, Sergey Pugachevskiy, Michael Freitag, Maurice Gerczuk, Björn W. Schuller:
Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results. 849-853 - Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Joseph Mendelson, Jonas Beskow, Joakim Gustafson:
Crowd-Sourced Design of Artificial Attentive Listeners. 854-858 - Leonardo Lancia, Thierry Chaminade, Noël Nguyen, Laurent Prévot:
Studying the Link Between Inter-Speaker Coordination and Speech Imitation Through Human-Machine Interactions. 859-863
Special Session: Incremental Processing and Responsive Behaviour
- Samuel Delalez, Christophe d'Alessandro:
Adjusting the Frame: Biphasic Performative Control of Speech Rhythm. 864-868 - Raheleh Saryazdi, Craig G. Chambers:
Attentional Factors in Listeners' Uptake of Gesture Cues During Speech Processing. 869-873 - Carlos Toshinori Ishi, Takashi Minato, Hiroshi Ishiguro:
Motion Analysis in Vocalized Surprise Expressions. 874-878 - Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel:
Enhancing Backchannel Prediction Using Word Embeddings. 879-883 - Eran Raveh, Ingmar Steiner, Bernd Möbius:
A Computational Model for Phonetically Responsive Spoken Dialogue Systems. 884-888 - Eustace Ebhotemhen, Volha Petukhova, Dietrich Klakow:
Incremental Dialogue Act Recognition: Token- vs Chunk-Based Classification. 889-893
Special Session: Acoustic Manifestations of Social Characteristics
- Oliver Niebuhr:
Clear Speech - Mere Speech? How Segmental and Prosodic Speech Reduction Shape the Impression That Speakers Create on Listeners. 894-898 - Charlotte Kouklia, Nicolas Audibert:
Relationships Between Speech Timing and Perceived Hostility in a French Corpus of Political Debates. 899-903 - Laura Fernández Gallardo, Benjamin Weiss:
Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution. 904-908 - Carlos Toshinori Ishi, Jun Arai, Norihiro Hagita:
Prosodic Analysis of Attention-Drawing Speech. 909-913 - Adrian P. Simpson, Riccarda Funk, Frederik Palmer:
Perceptual and Acoustic Correlates of Gender in the Prepubertal Voice. 914-918 - Katrin Schweitzer, Michael Walsh, Antje Schweitzer:
To See or not to See: Interlocutor Visibility and Likeability Influence Convergence in Intonation. 919-923 - Melanie Weirich, Adrian P. Simpson:
Acoustic Correlates of Parental Role and Gender Identity in the Speech of Expecting Parents. 924-928 - Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrão, Anna Pompili, Ramón Fernandez Astudillo, Joana Campos, Ana Paiva, Isabel Trancoso:
A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains. 929-933 - Rachael Tatman, Conner Kasten:
Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions. 934-938
Neural Network Acoustic Models for ASR 1
- Rohit Prabhavalkar, Kanishka Rao, Tara N. Sainath, Bo Li, Leif Johnson, Navdeep Jaitly:
A Comparison of Sequence-to-Sequence Models for Speech Recognition. 939-943 - Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney:
CTC in the Context of Generalized Full-Sum HMM Training. 944-948 - Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan:
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. 949-953 - Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith:
Multitask Learning with CTC and Segmental CRF for Speech Recognition. 954-958 - Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo:
Direct Acoustics-to-Word Models for English Conversational Speech Recognition. 959-963 - Bo Li, Tara N. Sainath:
Reducing the Computational Complexity of Two-Dimensional LSTMs. 964-968
Models of Speech Production
- Jorge C. Lucero:
Functional Principal Component Analysis of Vocal Tract Area Functions. 969-973 - Ganesh Sivaraman, Carol Y. Espy-Wilson, Martijn Wieling:
Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and Languages. 974-978 - Takayuki Arai:
Integrated Mechanical Model of [r]-[l] and [b]-[m]-[w] Producing Consonant Cluster [br]. 979-983 - Leonardo Badino, Luca Franceschi, Raman Arora, Michele Donini, Massimiliano Pontil:
A Speaker Adaptive DNN Training Approach for Speaker-Independent Acoustic Inversion. 984-988 - Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation Analysis. 989-993 - Tanner Sorensen, Asterios Toutios, Johannes Töger, Louis Goldstein, Shrikanth S. Narayanan:
Test-Retest Repeatability of Articulatory Strategies Using Real-Time Magnetic Resonance Imaging. 994-998
Speaker Recognition
- David Snyder, Daniel Garcia-Romero, Daniel Povey, Sanjeev Khudanpur:
Deep Neural Network Embeddings for Text-Independent Speaker Verification. 999-1003 - Jesús Villalba, Niko Brümmer, Najim Dehak:
Tied Variational Autoencoder Backends for i-Vector Speaker Recognition. 1004-1008 - Shivesh Ranjan, John H. L. Hansen:
Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features. 1009-1013 - Suwon Shon, Seongkyu Mun, Wooil Kim, Hanseok Ko:
Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information. 1014-1018 - Abbas Khosravani, Mohammad Mehdi Homayounpour:
Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker Verification. 1019-1023 - Jesús Jorrín, Paola García, Luis Buera:
DNN Bottleneck Features for Speaker Clustering. 1024-1028
Phonation and Voice Quality
- Kätlin Aare, Pärtel Lippus, Juraj Simko:
Creak as a Feature of Lexical Stress in Estonian. 1029-1033 - Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl:
Cross-Speaker Variation in Voice Source Correlates of Focus and Deaccentuation. 1034-1038 - Sishir Kalita, Wendy Lalhminghlui, Luke Horo, Priyankoo Sarmah, S. R. Mahadeva Prasanna, Samarendra Dandapat:
Acoustic Characterization of Word-Final Glottal Stops in Mizo and Assam Sora. 1039-1043 - Parham Mokhtari, Hiroshi Ando:
Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse Filtering. 1044-1048 - Yaniv Sheena, Mísa Hejná, Yossi Adi, Joseph Keshet:
Automatic Measurement of Pre-Aspiration. 1049-1053 - Kiranpreet Nara:
Acoustic and Electroglottographic Study of Breathy and Modal Vowels as Produced by Heritage and Native Gujarati Speakers. 1054-1058
Speech Synthesis Prosody
- Xin Wang, Shinji Takaki, Junichi Yamagishi:
An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis. 1059-1063 - Viacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Thomas Merritt, Thomas Drugman:
Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information. 1064-1068 - Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura:
Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement. 1069-1073 - Nobukatsu Hojo, Yasuhito Ohsugi, Yusuke Ijima, Hirokazu Kameoka:
DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation. 1074-1078 - Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson:
Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis. 1079-1083 - Simon Betz, Jana Voße, Sina Zarrieß, Petra Wagner:
Increasing Recall of Lengthening Detection via Semi-Automatic Classification. 1084-1088
Emotion Recognition
- Aharon Satt, Shai Rozenberg, Ron Hoory:
Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms. 1089-1093 - Ruo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono:
Interaction and Transition Model for Speech Emotion Recognition in Dialogue. 1094-1097 - John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost:
Progressive Neural Networks for Transfer Learning in Emotion Recognition. 1098-1102 - Srinivas Parthasarathy, Carlos Busso:
Jointly Predicting Arousal, Valence and Dominance with Multi-Task Learning. 1103-1107 - Duc Le, Zakaria Aldeneh, Emily Mower Provost:
Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network. 1108-1112 - Jaebok Kim, Gwenn Englebienne, Khiet P. Truong, Vanessa Evers:
Towards Speech Emotion Recognition "in the Wild" Using Aggregated Corpora and Deep Multi-Task Learning. 1113-1117
WaveNet and Novel Paradigms
- Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
Speaker-Dependent WaveNet Vocoder. 1118-1122 - Yu Gu, Zhen-Hua Ling:
Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension. 1123-1127 - Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi:
Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis. 1128-1132 - Srikanth Ronanki, Oliver Watts, Simon King:
A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis. 1133-1137 - Kazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda:
Statistical Voice Conversion with WaveNet-Based Waveform Generation. 1138-1142 - Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silén, Jakub Vít:
Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders. 1143-1147
Models of Speech Perception
- Alexander Kain, Max Del Giudice, Kris Tjaden:
A Comparison of Sentence-Level Speech Intelligibility Metrics. 1148-1152 - Toshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson:
An Auditory Model of Speaker Size Perception for Voiced Speech Sounds. 1153-1157 - Louis ten Bosch, Lou Boves, Mirjam Ernestus:
The Recognition of Compounds: A Computational Account. 1158-1162 - Mohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen:
Humans do not Maximize the Probability of Correct Decision When Recognizing DANTALE Words in Noise. 1163-1167 - Rainer Huber, Constantin Spille, Bernd T. Meyer:
Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition. 1168-1172 - Chris Neufeld:
Modeling Categorical Perception with the Receptive Fields of Auditory Neurons. 1173-1177
Source Separation and Auditory Scene Analysis
- Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation. 1178-1182 - Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Katerina Zmolíková, Tomohiro Nakatani:
Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources. 1183-1187 - Shadi Pirhosseinloo, Kostas Kokkinakis:
Time-Frequency Masking for Blind Source Separation with Preserved Spatial Cues. 1188-1192 - Jen-Tzung Chien, Kuan-Ting Kuo:
Variational Recurrent Neural Networks for Speech Separation. 1193-1197 - Valentin Andrei, Horia Cucu, Corneliu Burileanu:
Detecting Overlapped Speech on Short Timeframes Using Deep Learning. 1198-1202 - Xu Li, Junfeng Li, Yonghong Yan:
Ideal Ratio Mask Estimation Using Deep Neural Networks for Monaural Speech Segregation in Noisy Reverberant Conditions. 1203-1207
Prosody: Tone and Intonation
- Sergio I. Quiroz, Marzena Zygis:
The Vocative Chant and Beyond: German Calling Melodies Under Routine and Urgent Contexts. 1208-1212 - Juraj Simko, Antti Suni, Katri Hiovain, Martti Vainio:
Comparing Languages Using Hierarchical Prosodic Analysis. 1213-1217 - Martin Ho Kwan Ip, Anne Cutler:
Intonation Facilitates Prediction of Focus Even in the Presence of Lexical Tones. 1218-1222 - Katharina Zahner, Heather Kember, Bettina Braun:
Mind the Peak: When Museum is Temporarily Understood as Musical in Australian English. 1223-1227 - Luca Rognoni, Judith Bishop, Miriam Corris:
Pashto Intonation Patterns. 1228-1232 - Kikuo Maekawa:
A New Model of Final Lowering in Spontaneous Monologue. 1233-1237
Emotion Modeling
- Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai:
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space. 1238-1242 - Saurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael AbdAlmageed, Carol Y. Espy-Wilson:
Adversarial Auto-Encoders for Speech Based Emotion Recognition. 1243-1247 - Ting Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah:
An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture Regression. 1248-1252 - Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin G. McInnis, Emily Mower Provost:
Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition. 1253-1257 - Ailbhe Ní Chasaide, Irena Yanushevskaya, Christer Gobl:
Voice-to-Affect Mapping: Inferences on Language Voice Baseline Settings. 1258-1262 - Michael Neumann, Ngoc Thang Vu:
Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech. 1263-1267
Voice Conversion 1
- Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities. 1268-1272 - Wei-Ning Hsu, Yu Zhang, James R. Glass:
Learning Latent Representations for Speech Generation and Transformation. 1273-1277 - Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus. 1278-1282 - Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino:
Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks. 1283-1287 - Luc Ardaillon, Axel Roebel:
A Mouth Opening Effect Based on Pole Modification for Expressive Singing Voice Transformation. 1288-1292 - Seyed Hamidreza Mohammadi, Alexander Kain:
Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and Conversion. 1293-1297
Neural Network Acoustic Models for ASR 2
- Hasim Sak, Matt Shannon, Kanishka Rao, Françoise Beaufays:
Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping. 1298-1302 - Golan Pundak, Tara N. Sainath:
Highway-LSTM and Recurrent Highway Networks for Speech Recognition. 1303-1307 - Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio:
Improving Speech Recognition by Revising Gated Recurrent Units. 1308-1312 - Jen-Tzung Chien, Chen Shen:
Stochastic Recurrent Neural Network for Speech Recognition. 1313-1317 - Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf:
Frame and Segment Level Recurrent Neural Networks for Phone Classification. 1318-1322 - Kyu Jeong Han, Seongjun Hahm, Byung-Hak Kim, Jungsuk Kim, Ian R. Lane:
Deep Learning-Based Telephony Speech Recognition in the Wild. 1323-1327
Speaker Recognition Evaluation
- Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Bin Ma, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah:
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016. 1328-1332 - Pedro A. Torres-Carrasquillo, Fred Richardson, Shahan C. Nercessian, Douglas E. Sturim, William M. Campbell, Youngjune Gwon, Swaroop Vattam, Najim Dehak, Sri Harish Reddy Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Réda Dehak:
The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System. 1333-1337 - Daniele Colibro, Claudio Vair, Emanuele Dalmasso, Kevin Farrell, Gennady Karvitsky, Sandro Cumani, Pietro Laface:
Nuance - Politecnico di Torino's 2016 NIST Speaker Recognition Evaluation System. 1338-1342 - Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen:
UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation. 1343-1347 - Oldrich Plchot, Pavel Matejka, Anna Silnova, Ondrej Novotný, Mireia Díez Sánchez, Johan Rohdin, Ondrej Glembek, Niko Brümmer, Albert Swart, Jesús Jorrín-Prieto, Paola García, Luis Buera, Patrick Kenny, Md. Jahangir Alam, Gautam Bhattacharya:
Analysis and Description of ABC Submission to NIST SRE 2016. 1348-1352 - Seyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig S. Greenberg, Douglas A. Reynolds, Elliot Singer, Lisa P. Mason, Jaime Hernandez-Cordero:
The 2016 NIST Speaker Recognition Evaluation. 1353-1357
Glottal Source Modeling
- Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino:
A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis. 1358-1362 - Ana Ramírez López, Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku:
Speaking Style Conversion from Normal to Lombard Speech Using a Glottal Vocoder and Bayesian GMMs. 1363-1367 - Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku:
Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech System. 1368-1372 - Alexander Sorin, Slava Shechtman, Asaf Rendel:
Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities. 1373-1377 - Rodrigo Manríquez, Sean D. Peterson, Pavel Prado, Patricio Orio, Matías Zañartu:
Modeling Laryngeal Muscle Activation Noise for Low-Order Physiological Based Speech Synthesis. 1378-1382 - Felipe Espic, Cassia Valentini-Botinhao, Simon King:
Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis. 1383-1387
Prosody: Rhythm, Stress, Quantity and Phrasing
- Heather Kember, Ann-Kathrin Grohe, Katharina Zahner, Bettina Braun, Andrea Weber, Anne Cutler:
Similar Prosodic Structure Perceived Differently in German and English. 1388-1392 - Luying Hou, Bert Le Bruyn, René Kager:
Disambiguate or not? - The Role of Prosody in Unambiguous and Potentially Ambiguous Anaphora Production in Strictly Mandarin Parallel Structures. 1393-1397 - Angeliki Athanasopoulou, Irene Vogel, Hossep Dolatian:
Acoustic Properties of Canonical and Non-Canonical Stress in French, Turkish, Armenian and Brazilian Portuguese. 1398-1402 - Leendert Plug, Rachel Smith:
Phonological Complexity, Segment Rate and Speech Tempo Perception. 1403-1406 - Jing Yang, Yu Zhang, Aijun Li, Li Xu:
On the Duration of Mandarin Tones. 1407-1411 - Otto Ewald, Eva Liina Asu, Susanne Schötz:
The Formant Dynamics of Long Close Vowels in Three Varieties of Swedish. 1412-1416
Speech Recognition for Language Learning
- Yao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland:
Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children's Speech. 1417-1421 - Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu:
Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW. 1422-1426 - Chong Min Lee, Su-Youn Yoon, Xihao Wang, Matthew Mulholland, Ikkyu Choi, Keelan Evanini:
Off-Topic Spoken Response Detection Using Siamese Convolutional Neural Networks. 1427-1431 - Vipul Arora, Aditi Lahiri, Henning Reetz:
Phonological Feature Based Mispronunciation Detection and Diagnosis Using Multi-Task DNNs and Active Learning. 1432-1436 - Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão:
Detection of Mispronunciations and Disfluencies in Children Reading Aloud. 1437-1441 - David Escudero Mancebo, César González Ferreras, Lourdes Aguilar, Eva Estebas-Vilaplana:
Automatic Assessment of Non-Native Prosody by Measuring Distances on Prosodic Label Sequences. 1442-1446
Stance, Credibility, and Deception
- Nigel G. Ward, Jason C. Carlson, Olac Fuentes, Diego Castán, Elizabeth Shriberg, Andreas Tsiartas:
Inferring Stance from Prosody. 1447-1451 - Gina-Anne Levow, Richard A. Wright:
Exploring Dynamic Measures of Stance in Spoken Interaction. 1452-1456 - Valentin Barrière, Chloé Clavel, Slim Essid:
Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields. 1457-1461 - Qinyi Luo, Rahul Gupta, Shrikanth S. Narayanan:
Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception Prediction. 1462-1466 - Anne Schröder, Simon Stone, Peter Birkholz:
The Sound of Deception - What Makes a Speaker Credible? 1467-1471 - Gideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg:
Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection. 1472-1476
Short Utterances Speaker Recognition
- Albert Swart, Niko Brümmer:
A Generative Model for Score Normalization in Speaker Recognition. 1477-1481 - Subhadeep Dey, Srikanth R. Madikeri, Petr Motlícek, Marc Ferras:
Content Normalization for Text-Dependent Speaker Verification. 1482-1486 - Chunlei Zhang, Kazuhito Koishida:
End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances. 1487-1491 - Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo:
Adversarial Network Bottleneck Features for Noise Robust Speaker Verification. 1492-1496 - Shuai Wang, Yanmin Qian, Kai Yu:
What Does the Speaker Embedding Encode? 1497-1501 - Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee:
Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification. 1502-1506 - Jinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng:
DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances. 1507-1511 - Ville Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen:
Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions. 1512-1516 - Gautam Bhattacharya, Jahangir Alam, Patrick Kenny:
Deep Speaker Embeddings for Short-Duration Speaker Verification. 1517-1521 - Soo Jin Park, Gary Yeung, Jody Kreiman, Patricia A. Keating, Abeer Alwan:
Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems. 1522-1526 - Kong-Aik Lee, Haizhou Li:
Gain Compensation for Fast i-Vector Extraction Over Short Duration. 1527-1531 - Hee-Soo Heo, Jee-weon Jung, Il-Ho Yang, Sung-Hyun Yoon, Ha-Jin Yu:
Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification. 1532-1536
Speaker Characterization and Recognition
- Chen Chen, Jiqing Han, Yilin Pan:
Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least Squares. 1537-1541 - Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang:
Deep Speaker Feature Learning for Text-Independent Speaker Verification. 1542-1546 - Pierre-Michel Bousquet, Mickael Rouvier:
Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification. 1547-1551 - Alan McCree, Gregory Sell, Daniel Garcia-Romero:
Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition. 1552-1556 - Bengt J. Borgström, Elliot Singer, Douglas A. Reynolds, Seyed Omid Sadjadi:
Improving the Effectiveness of Speaker Verification Domain Adaptation with Inadequate In-Domain Data. 1557-1561 - Zhili Tan, Man-Wai Mak:
i-Vector DNN Scoring and Calibration for Noise Robust Speaker Verification. 1562-1566 - Pavel Matejka, Ondrej Novotný, Oldrich Plchot, Lukás Burget, Mireia Díez Sánchez, Jan Cernocký:
Analysis of Score Normalization in Multilingual Speaker Recognition. 1567-1571 - Anna Silnova, Lukás Burget, Jan Cernocký:
Alternative Approaches to Neural Network Based Speaker Verification. 1572-1575 - Ruchir Travadi, Shrikanth S. Narayanan:
A Distribution Free Formulation of the Total Variability Model. 1576-1580 - Md. Hafizur Rahman, Ivan Himawan, David Dean, Sridha Sridharan:
Domain Mismatch Modeling of Out-Domain i-Vectors for PLDA Speaker Verification. 1581-1585
Acoustic Models for ASR 1
- Gaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar, Sanjeev Khudanpur, Yonghong Yan:
An Exploration of Dropout with LSTMs. 1586-1590 - Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee:
Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition. 1591-1595 - Dung T. Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani:
Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling. 1596-1600 - Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani:
Forward-Backward Convolutional LSTM for Acoustic Modeling. 1601-1605 - Sercan Ömer Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates:
Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting. 1606-1610 - Chunyang Wu, Mark J. F. Gales:
Deep Activation Mixture Model for Speech Recognition. 1611-1615 - Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura:
Ensembles of Multi-Scale VGG Acoustic Models. 1616-1620 - Tamás Grósz, Gábor Gosztolya, László Tóth:
Training Context-Dependent DNN Acoustic Models Using Probabilistic Sampling. 1621-1625 - Tamás Grósz, Gábor Gosztolya, László Tóth:
A Comparative Evaluation of GMM-Free State Tying Methods for ASR. 1626-1630
Acoustic Models for ASR 2
- Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur:
Backstitch: Counteracting Finite-Sample Bias via Negative Steps. 1631-1635 - Ryu Takeda, Kazuhiro Nakadai, Kazunori Komatani:
Node Pruning Based on Entropy of Weights and Node Activity for Small-Footprint Acoustic Model Based on Deep Neural Networks. 1636-1640 - Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani:
End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow. 1641-1645 - Khe Chai Sim, Arun Narayanan:
An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication. 1646-1650 - Zoltán Tüske, Wilfried Michel, Ralf Schlüter, Hermann Ney:
Parallel Neural Network Features for Improved Tandem Acoustic Modeling. 1651-1655 - Qingming Tang, Weiran Wang, Karen Livescu:
Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis. 1656-1660
Dialog Modeling
- Ryo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka:
Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks. 1661-1665 - Marcin Wlodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare:
Improving Prediction of Speech Activity Using Multi-Participant Respiratory State. 1666-1670 - Peter A. Heeman, Rebecca Lunsford:
Turn-Taking Offsets and Dialogue Context. 1671-1675 - Angelika Maier, Julian Hough, David Schlangen:
Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue Systems. 1676-1680 - Yuichi Ishimoto, Takehiro Teraoka, Mika Enomoto:
End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech. 1681-1685 - Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:
Turn-Taking Estimation Model Based on Joint Embedding of Lexical and Prosodic Contents. 1686-1690 - Hirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara:
Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTC. 1691-1695 - Zahra Rahimi, Anish Kumar, Diane J. Litman, Susannah Paletz, Mingzhi Yu:
Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic Levels. 1696-1700 - Justine Reverdy, Carl Vogel:
Measuring Synchrony in Task-Based Dialogues. 1701-1705 - Paul A. Crook, Alex Marin:
Sequence to Sequence Modeling for User Simulation in Dialog Systems. 1706-1710 - Vikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft:
Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human-Machine Spoken Dialog Interactions. 1711-1715 - Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono:
Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls. 1716-1720 - Stefan Ultes, Pawel Budzianowski, Iñigo Casanueva, Nikola Mrksic, Lina Maria Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gasic, Steve J. Young:
Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning. 1721-1725 - Shizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara:
Analysis of the Relationship Between Prosodic Features of Fillers and its Forms or Occurrence Positions. 1726-1730 - Syeda Narjis Fatima, Engin Erzin:
Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions. 1731-1735
L1 and L2 Acquisition
- Micha Elsner, Kiwako Ito:
An Automatically Aligned Corpus of Child-Directed Speech. 1736-1740 - Ocke-Schwen Bohn, Trine Askjær-Jørgensen:
A Comparison of Danish Listeners' Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences. 1741-1744 - Felicitas Kleber:
On the Role of Temporal Variability in the Acquisition of the German Vowel Length Contrast. 1745-1749 - Patrick F. Reidy, Mary E. Beckman, Jan Edwards, Benjamin Munson:
A Data-Driven Approach for Perceptually Validated Acoustic Features for Children's Sibilant Fricative Productions. 1750-1754 - Yujia Xiao, Frank K. Soong:
Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference. 1755-1759 - Si Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang:
Mechanisms of Tone Sandhi Rule Application by Non-Native Speakers. 1760-1764 - Seth Wiener:
Changes in Early L2 Cue-Weighting of Non-Native Speech: Evidence from Learners of Mandarin Chinese. 1765-1769 - Ying Chen, Eric Pederson:
Directing Attention During Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin Speakers. 1770-1774 - Dean Luo, Ruxin Luo, Lixin Wang:
Prosody Analysis of L2 English for Naturalness Evaluation Through Speech Modification. 1775-1778 - Gintare Grigonyte, Gerold Schneider:
Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production. 1779-1783 - Adriana Hanulíková, Jenny Ekström:
Lexical Adaptation to a Novel Accent in German: A Comparison Between German, Swedish, and Finnish Listeners. 1784-1788 - Alejandra Keidel Fernández, Thomas Hörberg:
Qualitative Differences in L3 Learners' Neurophysiological Response to L1 versus L2 Transfer. 1789-1793 - Johan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva:
Articulation Rate in Swedish Child-Directed Speech Increases as a Function of the Age of the Child Even When Surprisal is Controlled for. 1794-1798 - Kaile Zhang, Gang Peng:
The Relationship Between the Perception and Production of Non-Native Tones. 1799-1803 - Ellen Marklund, Elísabet Eir Cortes, Johan Sjons:
MMN Responses in Adults After Exposure to Bimodal and Unimodal Frequency Distributions of Rotated Speech. 1804-1808
Voice, Speech and Hearing Disorders
- Visar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig:
Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad Ali. 1809-1813 - Antonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi:
Cepstral and Entropy Analyses in Vowels Excerpted from Continuous Speech of Dysphonic and Control Speakers. 1814-1818 - Andrea Bandini, Jordan R. Green, Lorne Zinman, Yana Yunusova:
Classification of Bulbar ALS from Kinematic Features of the Jaw and Lips: Towards Computer-Mediated Assessment. 1819-1823 - Nagaraj Adiga, Vikram C. M., Keerthi Pullela, S. R. Mahadeva Prasanna:
Zero Frequency Filter Based Analysis of Voice Disorders. 1824-1828 - Nikitha K., Sishir Kalita, Vikram C. M., M. Pushpavathi, S. R. Mahadeva Prasanna:
Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space Area. 1829-1833 - Imed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier:
Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech. 1834-1838 - Philipp Klumpp, Thomas Janu, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Apkinson - A Mobile Monitoring Solution for Parkinson's Disease. 1839-1843 - Jan Hlavnicka, Tereza Tykalová, Roman Cmejla, Jirí Klempír, Evzen Ruzicka, Jan Rusz:
Dysprosody Differentiate Between Parkinson's Disease, Progressive Supranuclear Palsy, and Multiple System Atrophy. 1844-1848 - Ming Tu, Visar Berisha, Julie Liss:
Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural Networks. 1849-1853 - Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu:
Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition. 1854-1858 - Jason Lilley, Madhavi Vedula Ratnagiri, H. Timothy Bunnell:
Prediction of Speech Delay from Acoustic Measurements. 1859-1863 - Aijun Li, Hua Zhang, Wen Sun:
The Frequency Range of "The Ling Six Sounds" in Standard Chinese. 1864-1868 - Wentao Gu, Jiao Yin, James J. Mahshie:
Production of Sustained Vowels and Categorical Perception of Tones in Mandarin Among Cochlear-Implanted Children. 1869-1873
Source Separation and Voice Activity Detection
- Anurag Kumar, Benjamin Elizalde, Bhiksha Raj:
Audio Content Based Geotagging in Multimedia. 1874-1878 - Zhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan:
Time Delay Histogram Based Speech Source Separation Using a Planar Array. 1879-1883 - Gayadhar Pradhan, Avinash Kumar, Syed Shahnawazuddin:
Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech Sequence. 1884-1888 - Wei Gao, Roberto Togneri, Victor Sreeram:
A Contrast Function and Algorithm for Blind Separation of Audio Signals. 1889-1893 - Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li:
Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source. 1894-1898 - Feng Guo, Yuhang Cao, Zheng Liu, Jiaen Liang, Baoqing Li, Xiaobing Yuan:
Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern. 1899-1903 - Xianyun Wang, Changchun Bao, Feng Bao:
A Mask Estimation Method Integrating Data Field Model for Speech Enhancement. 1904-1908 - Matt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada:
Improved End-of-Query Detection for Streaming Speech Recognition. 1909-1913 - Di He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen:
Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AED. 1914-1918 - Jeroen Zegers, Hugo Van hamme:
Improving Source Separation via Multi-Speaker Representations. 1919-1923 - Bing Yang, Hong Liu, Cheng Pang:
Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector. 1924-1928 - Girija Ramesan Karthik, Prasanta Kumar Ghosh:
Subband Selection for Binaural Speech Source Localization. 1929-1933 - Bo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu:
Unmixing Convolutive Mixtures by Exploiting Amplitude Co-Modulation: Methods and Evaluation on Mandarin Speech Recordings. 1934-1937 - Fei Tao, Carlos Busso:
Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection. 1938-1942 - Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Björn Hoffmeister:
Domain-Specific Utterance End-Point Detection for Speech Recognition. 1943-1947 - Vinay Kothapally, John H. L. Hansen:
Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments. 1948-1952
Speech-enhancement
- Yi-Chiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang:
A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement. 1953-1957 - Hui Zhang, Xueliang Zhang, Guanglai Gao:
Multi-Target Ensemble Learning for Monaural Speech Separation. 1958-1962 - Atsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search. 1963-1967 - Femke B. Gelderblom, Tron V. Tronstad, Erlend Magnus Viggen:
Subjective Intelligibility of Deep Neural Network-Based Speech Enhancement. 1968-1972 - Maria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh:
Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility. 1973-1977 - Hans-Günter Hirsch, Michael Gref:
On the Influence of Modifying Magnitude and Phase Spectrum to Enhance Noisy Speech Signals. 1978-1982 - Robert Rehr, Timo Gerkmann:
MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement. 1983-1987 - Ricard Marxer, Jon Barker:
Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement. 1988-1992 - Se Rim Park, Jinwon Lee:
A Fully Convolutional Neural Network for Speech Enhancement. 1993-1997 - Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino:
Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization. 1998-2002 - Danny Websdale, Ben Milner:
A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation. 2003-2007 - Daniel Michelsanti, Zheng-Hua Tan:
Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification. 2008-2012 - Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson:
Speech Enhancement Using Bayesian Wavenet. 2013-2017 - Xueliang Zhang, DeLiang Wang:
Binaural Reverberant Speech Separation Based on Deep Neural Networks. 2018-2022 - Tudor-Catalin Zorila, Yannis Stylianou:
On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening Enhancement. 2023-2027
Show & Tell 3
- Ralf Meermeier, Sean Colbath:
Applications of the BBN Sage Speech Processing Platform. 2028-2029 - Milos Cernak, Alain Komaty, Amir Mohammadi, André Anjos, Sébastien Marcel:
Bob Speaks Kaldi. 2030-2031 - Michal Lenarczyk:
Real Time Pitch Shifting with Formant Structure Preservation Using the Phase Vocoder. 2032-2033 - Nivedita Chennupati, B. H. V. S. Narayana Murthy, B. Yegnanarayana:
A Signal Processing Approach for Speaker Separation Using SFF Analysis. 2034-2035 - Georg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef G. Bauer, Jakub Nowicki, Tobias Bocklet, Hannah R. Colett, Ohad Falik, Michael Deisher, Sylvia J. Downing:
Speech Recognition and Understanding on Hardware-Accelerated DSP. 2036-2037 - Sho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristià:
MetaLab: A Repository for Meta-Analyses on Language Development, and More. 2038-2039
Show & Tell 4
- Adrien Daniel:
Evolving Recurrent Neural Networks That Process and Classify Raw Audio in a Streaming Fashion. 2040-2041 - Milana Milosevic, Ulrike Glavitsch:
Combining Gaussian Mixture Models and Segmental Feature Models for Speaker Recognition. 2042-2043 - Gerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn W. Schuller:
"Did you laugh enough today?" - Deep Neural Networks for Mobile and Wearable Laughter Trackers. 2044-2045 - Kwang Myung Jeon, Nam Kyun Kim, Chan Woong Kwak, Jung Min Moon, Hong Kook Kim:
Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public Transportation. 2046-2047 - Sean U. N. Wood, Jean Rouat:
Real-Time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson. 2048-2049 - Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo:
Reading Validation for Pronunciation Evaluation in the Digitala Project. 2050-2051
Keynote 2: Catherine Pelachaud
- Catherine Pelachaud:
Conversing with Social Agents That Smile and Laugh. 2052
Special Session: Digital Revolution for Under-resourced Languages 1
- Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick Baskar, Martin Karafiát, Lukás Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth S. Narayanan:
Team ELISA System for DARPA LORELEI Speech Evaluation 2016. 2053-2057 - Péter Mihajlik, Lili Szabó, Balázs Tarján, András Balog, Krisztina Rábai:
First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe Region. 2058-2062 - Catherine Inez Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, J. King:
The Motivation and Development of MPAi, a Māori Pronunciation Aid. 2063-2067 - Siyuan Feng, Tan Lee:
On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling. 2068-2072 - Amit Das, Mark Hasegawa-Johnson, Karel Veselý:
Deep Auto-Encoder Based Multi-Task Learning Using Probabilistic Transcriptions. 2073-2077 - Alexander Gutkin, Richard Sproat:
Areal and Phylogenetic Features for Multilingual Speech Synthesis. 2078-2082
Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition
- Kathleen Currie Hall, Scott Mackie, Michael Fry, Oksana Tkachman:
SLPAnnotator: Tools for Implementing Sign Language Phonetic Annotation. 2083-2087 - Iris-Corinna Schwarz, Noor Botros, Alekzandra Lord, Amelie Marcusson, Henrik Tidelius, Ellen Marklund:
The LENA System Applied to Swedish: Reliability of the Adult Word Count Estimate. 2088-2092 - Marisa Casillas, Andrei Amatuni, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Elika Bergelson:
What do Babies Hear? Analyses of Child- and Adult-Directed Speech. 2093-2097 - Marisa Casillas, Elika Bergelson, Anne S. Warlaumont, Alejandrina Cristià, Melanie Soderstrom, Mark VanDam, Han Sloetjes:
A New Workflow for Semi-Automatized Annotations: Tests with Long-Form Naturalistic Recordings of Children's Language Environments. 2098-2102 - Christina Bergmann, Sho Tsuji, Alejandrina Cristià:
Top-Down versus Bottom-Up Theories of Phonological Acquisition: A Big Data Approach. 2103-2107 - Sho Tsuji, Alejandrina Cristià:
Which Acoustic and Phonological Factors Shape Infants' Vowel Discrimination? Exploiting Natural Variation in InPhonDB. 2108-2112
Special Session: Digital Revolution for Under-resourced Languages 2
- Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl:
The ABAIR Initiative: Bringing Spoken Irish into the Digital Space. 2113-2117 - Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John A. Quinn, Thomas Niesler:
Very Low Resource Radio Browsing for Agile Developmental and Humanitarian Monitoring. 2118-2122 - Nikolaos Malandrakis, Ondrej Glembek, Shrikanth S. Narayanan:
Extracting Situation Frames from Non-English Speech: Evaluation Framework and Pilot Results. 2123-2127 - Daniil Kocharov, Tatiana Kachkovskaia, Pavel A. Skrelin:
Eliciting Meaningful Units from Speech. 2128-2132 - Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty:
Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications. 2133-2137 - Elodie Gauthier, Laurent Besacier, Sylvie Voisin:
Machine Assisted Analysis of Vowel Length Contrasts in Wolof. 2138-2142 - Thomas Glarner, Benedikt T. Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach:
Leveraging Text Data for Word Segmentation for Underresourced Languages. 2143-2147 - Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu:
Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization. 2148-2152 - Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy:
Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource Languages. 2153-2157 - Basil Abraham, Tejaswi Seeram, Srinivasan Umesh:
Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages. 2158-2162 - Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jón Guðnason:
Building an ASR Corpus Using Althingi's Parliamentary Speeches. 2163-2167 - Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister:
Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software. 2168-2172 - Jón Guðnason, Matthías Pétursson, Róbert Kjaran, Simon Klüpfel, Anna Björk Nikulásdóttir:
Building ASR Corpora Using Eyra. 2173-2177 - Daniel R. van Niekerk, Charl Johannes van Heerden, Marelie H. Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha:
Rapid Development of TTS Corpora for Four South African Languages. 2178-2182 - Alexander Gutkin:
Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages. 2183-2187 - Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King:
Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili. 2188-2192
Special Session: Computational Models in Child Language Acquisition
- Rong Tong, Nancy F. Chen, Bin Ma:
Multi-Task Learning for Mispronunciation Detection on Singapore Children's Mandarin Speech. 2193-2197 - Elin Larsen, Alejandrina Cristià, Emmanuel Dupoux:
Relating Unsupervised Word Segmentation to Reported Vocabulary Acquisition. 2198-2202 - Mats Wirén, Kristina N. Björkenstam, Robert Östling:
Modelling the Informativeness of Non-Verbal Cues in Parent-Child Interaction. 2203-2207 - Ellen Marklund, David Pagmar, Tove Gerholm, Lisa Gustavsson:
Computational Simulations of Temporal Vocalization Behavior in Adult-Child Interaction. 2208-2212 - Sofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam:
Approximating Phonotactic Input in Children's Linguistic Environments from Orthographic Transcripts. 2213-2217 - Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux:
Learning Weakly Supervised Multimodal Phoneme Embeddings. 2218-2222
Special Session: Voice Attractiveness
- Yasunari Obuchi:
Personalized Quantification of Voice Attractiveness in Multidimensional Merit Space. 2223-2227 - Hans Rutger Bosker:
The Role of Temporal Amplitude Modulations in the Political Arena: Hillary Clinton vs. Donald Trump. 2228-2232 - Laura Fernández Gallardo, Rafael Zequeira Jiménez, Sebastian Möller:
Perceptual Ratings of Voice Likability Collected Through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing. 2233-2237 - Jürgen Trouvain, Frank Zimmerer:
Attractiveness of French Voices for German Listeners - Results from Native and Non-Native Read Speech. 2238-2242 - Antje Schweitzer, Natalie Lewandowski, Daniel Duran:
Social Attractiveness in Dialogs. 2243-2247 - Eszter Novák-Tót, Oliver Niebuhr, Aoju Chen:
A Gender Bias in the Acoustic-Melodic Features of Charismatic Speech? 2248-2252 - Jan Michalsky, Heike Schoormann:
Pitch Convergence as an Effect of Perceived Attractiveness and Likability. 2253-2256 - Li Jiao, Chengxia Wang, Cristiane Hsu, Peter Birkholz, Yi Xu:
Does Posh English Sound Attractive? 2257-2261 - Timo Baumann:
Large-Scale Speaker Ranking from Crowdsourced Pairwise Listener Ratings. 2262-2266
Speech Production and Physiology
- Rosario Signorello, Sergio Hassid, Didier Demolin:
Aerodynamic Features of French Fricatives. 2267-2271 - Antoine Serrurier, Pierre Badin, Louis-Jean Boë, Laurent Lamalle, Christiane Neuschaefer-Rube:
Inter-Speaker Variability: Speaker Normalisation and Quantitative Estimation of Articulatory Invariants in Speech Production for French. 2272-2276 - Nimisha Patil, Timothy Greer, Reed Blaylock, Shrikanth S. Narayanan:
Comparison of Basic Beatboxing Articulations Between Expert and Novice Artists Using Real-Time Magnetic Resonance Imaging. 2277-2281 - Keyi Tang, Negar M. Harandi, Jonghye Woo, Georges El Fakhri, Maureen Stone, Sidney S. Fels:
Speaker-Specific Biomechanical Model-Based Investigation of a Simple Speech Task Based on Tagged-MRI. 2282-2286 - Reed Blaylock, Nimisha Patil, Timothy Greer, Shrikanth S. Narayanan:
Sounds of the Human Vocal Tract. 2287-2291 - Yasufumi Uezu, Tokihiko Kaburagi:
A Simulation Study on the Effect of Glottal Boundary Conditions on Vocal Tract Formants. 2292-2296
Speech and Harmonic Analysis
- P. Gangamohan, B. Yegnanarayana:
A Robust and Alternative Approach to Zero Frequency Filtering Method for Epoch Extraction. 2297-2300 - Kanru Hua:
Improving YANGsaf F0 Estimator with Adaptive Kalman Filter. 2301-2305 - Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula:
A Spectro-Temporal Demodulation Technique for Pitch Estimation. 2306-2310 - Kenichiro Miwa, Masashi Unoki:
Robust Method for Estimating F0 of Complex Tone Based on Pitch Perception of Amplitude Modulated Signal. 2311-2315 - Simon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt:
Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution Spectra. 2316-2320 - Masanori Morise:
Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals. 2321-2325
Dialog and Prosody
- Sabrina Stehwien, Ngoc Thang Vu:
Prosodic Event Recognition Using Convolutional Neural Networks with Context Information. 2326-2330 - Ramiro H. Gálvez, Stefan Benus, Agustín Gravano, Marián Trnka:
Prosodic Facilitation and Interference While Judging on the Veracity of Synthesized Statements. 2331-2335 - Margaret Zellers, Antje Schweitzer:
An Investigation of Pitch Matching Across Adjacent Turns in a Corpus of Spontaneous German. 2336-2340 - Sankar Mukherjee, Alessandro D'Ausilio, Noël Nguyen, Luciano Fadiga, Leonardo Badino:
The Relationship Between F0 Synchrony and Speech Convergence in Dyadic Interaction. 2341-2345 - Jordi Luque, Carlos Segura, Ariadna Sánchez, Martí Umbert, Luis Angel Galindo:
The Role of Linguistic and Prosodic Cues on the Prediction of Self-Reported Satisfaction in Contact Centre Phone Calls. 2346-2350 - Pablo Brusco, Juan Manuel Pérez, Agustín Gravano:
Cross-Linguistic Study of the Production of Turn-Taking Cues in American English and Argentine Spanish. 2351-2355
Social Signals, Styles, and Interaction
- Olga Egorow, Andreas Wendemuth:
Emotional Features for Speech Overlaps Classification. 2356-2360 - Chin-Po Chen, Xian-Hong Tseng, Susan Shur-Fen Gau, Chi-Chun Lee:
Computing Multimodal Dyadic Behaviors During Spontaneous Diagnosis Interviews Toward Automatic Categorization of Autism Spectrum Disorder. 2361-2365 - Yun-Shao Lin, Chi-Chun Lee:
Deriving Dyad-Level Interaction Representation Using Interlocutors Structural and Expressive Multimodal Behavior Features. 2366-2370 - Raymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn W. Schuller:
Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective. 2371-2375 - Gábor Gosztolya:
Optimized Time Series Filters for Detecting Laughter and Filler Events. 2376-2380 - Fasih Haider, Fahim A. Salim, Saturnino Luz, Carl Vogel, Owen Conlan, Nick Campbell:
Visual, Laughter, Applause and Spoken Expression Features for Predicting Engagement Within TED Talks. 2381-2385
Acoustic Model Adaptation
- Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong:
Large-Scale Domain Adaptation via Teacher-Student Learning. 2386-2390 - Waquar Ahmad, Syed Shahnawazuddin, Hemant Kumar Kathania, Gayadhar Pradhan, Arun B. Samaddar:
Improving Children's Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion. 2391-2395 - Xurong Xie, Xunying Liu, Tan Lee, Lan Wang:
RNN-LDA Clustering for Feature Based DNN Adaptation. 2396-2400 - Harish Arsikere, Sri Garimella:
Robust Online i-Vectors for Unsupervised Adaptation of DNN Acoustic Models: A Study in the Context of Digital Voice Assistants. 2401-2405 - Ajay Srinivasamurthy, Petr Motlícek, Ivan Himawan, György Szaszák, Youssef Oualil, Hartmut Helmke:
Semi-Supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control. 2406-2410 - Taesup Kim, Inchul Song, Yoshua Bengio:
Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition. 2411-2415
Cognition and Brain Studies
- Hans Rutger Bosker, Anne Kösem:
An Entrained Rhythm's Frequency, Not Phase, Influences Temporal Sampling of Speech. 2416-2420 - Xiao Wang, Yanhui Zhang, Gang Peng:
Context Regularity Indexed by Auditory N1 and P2 Event-Related Potentials. 2421-2425 - Sakshi Verma, K. L. Prateek, Karthik Pandia, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema A. Murthy:
Discovering Language in Marmoset Vocalization. 2426-2430 - Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura:
Subject-Independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response During Speech Perception. 2431-2435 - Noémie te Rietmolen, Radouane El Yagoubi, Alain Ghio, Corine Astésano:
The Phonological Status of the French Initial Accent and its Role in Semantic Processing: An Event-Related Potentials Study. 2436-2440 - Bin Zhao, Jianwu Dang, Gaoyan Zhang:
A Neuro-Experimental Evidence for the Motor Theory of Speech Perception. 2441-2445
Noise Robust Speech Recognition
- Purvi Agrawal, Sriram Ganapathy:
Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR. 2446-2450 - Masato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara:
Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition. 2451-2455 - Dong Yu, Xuankai Chang, Yanmin Qian:
Recognizing Multi-Talker Speech with Permutation Invariant Training. 2456-2460 - Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken'ichi Furuya, Shinji Watanabe, Jonathan Le Roux:
Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information. 2461-2465 - Erfan Loweimi, Jon Barker, Thomas Hain:
Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR. 2466-2470 - Brian John King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, Sree Hari Krishnan Parthasarathi, Björn Hoffmeister:
Robust Speech Recognition via Anchor Word Representations. 2471-2475
Topic Spotting, Entity Extraction and Semantic Analysis
- Ankur Bapna, Gökhan Tür, Dilek Hakkani-Tür, Larry P. Heck:
Towards Zero-Shot Frame Semantic Parsing for Domain Scaling. 2476-2480 - Despoina Georgiadou, Vassilios Diakoloukas, Vassilios Tsiaras, Vassilios Digalakis:
ClockWork-RNN Based Architectures for Slot Filling. 2481-2485 - Mohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset:
Investigating the Effect of ASR Tuning on Named Entity Recognition. 2486-2490 - Marco Dinarelli, Vedran Vukotic, Christian Raymond:
Label-Dependency Coding in Simple Recurrent Networks for Spoken Language Understanding. 2491-2495 - Zhong Meng, Biing-Hwang Juang:
Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational Speech. 2496-2500 - Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur:
Topic Identification for Speech Without ASR. 2501-2505
Dialog Systems
- Bing Liu, Ian R. Lane:
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog. 2506-2510 - Heriberto Cuayáhuitl, Seunghak Yu:
Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates. 2511-2515 - Ali Orkan Bayer, Evgeny A. Stepanov, Giuseppe Riccardi:
Towards End-to-End Spoken Dialogue Systems with Turn Embeddings. 2516-2520 - Oleg Akhtiamov, Maxim Sidorov, Alexey A. Karpov, Wolfgang Minker:
Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer Interaction. 2521-2525 - Vikram Ramanarayanan, Chee Wee Leong, David Suendermann-Oeft:
Rushing to Judgement: How do Laypeople Rate Caller Engagement in Thin-Slice Videos of Human-Machine Dialog? 2526-2530 - Ivan Kraljevski, Diane Hirschfeld:
Hyperarticulation of Corrections in Multilingual Dialogue Systems. 2531-2535
Lexical and Pronunciation Modeling
- Benjamin Milde, Christoph Schmidt, Joachim Köhler:
Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion. 2536-2540 - Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur:
Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework. 2541-2545 - Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig:
Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text. 2546-2550 - Peter Smit, Sami Virpioja, Mikko Kurimo:
Improved Subword Modeling for WFST-Based Speech Recognition. 2551-2555 - Antoine Bruguier, Danushen Gnanapragasam, Leif Johnson, Kanishka Rao, Françoise Beaufays:
Pronunciation Learning with RNN-Transducers. 2556-2560 - Einat Naaman, Yossi Adi, Joseph Keshet:
Learning Similarity Functions for Pronunciation Variations. 2561-2565
Language Recognition
- Gregory Gelly, Jean-Luc Gauvain:
Spoken Language Identification Using LSTM-Based Angular Proximity. 2566-2570 - Ma Jin, Yan Song, Ian Vince McLoughlin, Wu Guo, Li-Rong Dai:
End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling. 2571-2575 - Qian Zhang, John H. L. Hansen:
Dialect Recognition Based on Unsupervised Bottleneck Features. 2576-2580 - Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li:
Investigating Scalability in Hierarchical Language Identification System. 2581-2585 - Yao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A. Pugh, Patrick L. Lange, Hillary R. Molloy, Frank K. Soong:
Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech. 2586-2590 - Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James R. Glass:
QMDIS: QCRI-MIT Advanced Dialect Identification System. 2591-2595
Speaker Database and Anti-spoofing
- K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala:
Detection of Replay Attacks Using Single Frequency Filtering Cepstral Coefficients. 2596-2600 - Hardik B. Sailor, Madhu R. Kamble, Hemant A. Patil:
Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection. 2601-2605 - Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah:
Independent Modelling of High and Low Energy Speech Frames for Spoofing Detection. 2606-2610 - Achintya Kumar Sarkar, Md. Sahidullah, Zheng-Hua Tan, Tomi Kinnunen:
Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data. 2611-2615 - Arsha Nagrani, Joon Son Chung, Andrew Zisserman:
VoxCeleb: A Large-Scale Speaker Identification Dataset. 2616-2620 - Karen Jones, Stephanie M. Strassel, Kevin Walker, David Graff, Jonathan Wright:
Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology. 2621-2624
Speech Translation
- Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen:
Sequence-to-Sequence Models Can Directly Translate Foreign Speech. 2625-2629 - Takatomo Kano, Sakriani Sakti, Satoshi Nakamura:
Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation. 2630-2634 - Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico:
Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors. 2635-2639 - Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura:
Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis. 2640-2644 - Eunah Cho, Jan Niehues, Alex Waibel:
NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation. 2645-2649
Multi-channel Speech Enhancement
- Lukas Drude, Reinhold Haeb-Umbach:
Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings. 2650-2654 - Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani:
Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures. 2655-2659 - Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf:
Eigenvector-Based Speech Mask Estimation Using Logistic Regression. 2660-2664 - Sean U. N. Wood, Jean Rouat:
Real-Time Speech Enhancement with GCC-NMF. 2665-2669 - Youna Ji, Jun Byun, Young-Cheol Park:
Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy Environment. 2670-2674 - Yang Zhang, Dinei Florêncio, Mark Hasegawa-Johnson:
Glottal Model Based Speech Beamforming for ad-hoc Microphone Arrays. 2675-2679
Speech Recognition: Applications in Medical Practice
- Yuanyuan Liu, Tan Lee, P. C. Ching, Thomas K. T. Law, Kathy Y. S. Lee:
Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features. 2680-2684 - Emre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik:
Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech. 2685-2689 - Daniel V. Smith, Alex Sneddon, Lauren Ward, Andreas Duenser, Jill Freyne, David Silvera-Tawil, Angela Morgan:
Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech. 2690-2694 - Neethu Mariam Joy, Srinivasan Umesh, Basil Abraham:
On Improving Acoustic Models for TORGO Dysarthric Speech Database. 2695-2699 - Olympia Simantiraki, Paulos Charonyktakis, Anastasia Pampouchidou, Manolis Tsiknakis, Martin Cooke:
Glottal Source Features for Automatic Speech-Based Depression Assessment. 2700-2704 - Roozbeh Sadeghian, J. David Schaffer, Stephen A. Zahorian:
Speech Processing Approach for Diagnosing Dementia in an Early Stage. 2705-2709
Language models for ASR
- Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro:
Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals. 2710-2714 - Salil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain:
Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary Features. 2715-2719 - Mittul Singh, Youssef Oualil, Dietrich Klakow:
Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition. 2720-2724 - Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy:
Sparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap. 2725-2729 - Manoj Kumar, Daniel Bone, Kelly McWilliams, Shanna Williams, Thomas D. Lyon, Shrikanth S. Narayanan:
Multi-Scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken Interactions. 2730-2734 - Weiwu Zhu:
Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition. 2735-2738
Speech Recognition: Technologies for New Applications and Paradigms
- Dimitrios Dimitriadis, Petr Fousek:
Developing On-Line Speaker Diarization System. 2739-2743 - Shreyas Seshadri, Ulpu Remes, Okko Räsänen:
Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing. 2744-2748 - Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão:
Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords. 2749-2753 - Su-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini:
Off-Topic Spoken Response Detection with Word Embeddings. 2754-2758 - Wei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee:
Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models. 2759-2763 - Shoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa:
Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides. 2764-2768 - Myung Jong Kim, Beiming Cao, Ted Mau, Jun Wang:
Multiview Representation Learning via Deep CCA for Silent Speech Recognition. 2769-2773 - Kate M. Knill, Mark J. F. Gales, Konstantinos Kyriakopoulos, Anton Ragni, Yu Wang:
Use of Graphemic Lexicons for Spoken Language Assessment. 2774-2778 - Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li:
Distilling Knowledge from an Ensemble of Models for Punctuation Prediction. 2779-2783 - Ernest Pusateri, Bharat Ram Ambati, Elizabeth Brooks, Ondrej Plátek, Donald McAllaster, Venki Nagesha:
A Mostly Data-Driven Approach to Inverse Text Normalization. 2784-2788 - Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, Boon Pang Lim:
Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering Approach. 2789-2793 - William Gale, Sarangarajan Parthasarathy:
Experiments in Character-Level Neural Network Models for Punctuation. 2794-2798 - Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen:
Multi-Channel Apollo Mission Speech Transcripts Calibration. 2799-2803
Speaker and Language Recognition Applications
- Mitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson:
Calibration Approaches for Language Detection. 2804-2808
- Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps:
Bidirectional Modelling for Short Duration Language Identification. 2809-2813
- Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai:
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification. 2814-2818
- Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida:
Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition. 2819-2823
- Sungrack Yun, Hye Jin Jang, Taesu Kim:
Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels. 2824-2828
- Ignacio Viñals, Alfonso Ortega, Jesús Antonio Villalba López, Antonio Miguel, Eduardo Lleida:
Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering. 2829-2833
- Miquel India, José A. R. Fonollosa, Javier Hernando:
LSTM Neural Network-Based Speaker Segmentation Using Acoustic and Language Modelling. 2834-2838
- Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-François Bonastre:
Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization. 2839-2843
- Moez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn:
Homogeneity Measure Impact on Target and Non-Target Trials in Forensic Voice Comparison. 2844-2848
- Yosef A. Solewicz, Michael Jessen, David van der Vloed:
Null-Hypothesis LLR: A Proposal for Forensic Automatic Speaker Recognition. 2849-2853
- Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao:
The Opensesame NIST 2016 Speaker Recognition Evaluation System. 2854-2858
- Nagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B. K, H. Kashyap, K. Sri Rama Murty, Sriram Ganapathy, Rohit Sinha, S. R. Mahadeva Prasanna:
IITG-Indigo System for NIST 2016 SRE Challenge. 2859-2863
- Abhinav Misra, Shivesh Ranjan, John H. L. Hansen:
Locally Weighted Linear Discriminant Analysis for Robust Speaker Verification. 2864-2868
- Suwon Shon, Seongkyu Mun, Hanseok Ko:
Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition. 2869-2873
Spoken Document Processing
- Shane Settle, Keith D. Levin, Herman Kamper, Karen Livescu:
Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings. 2874-2878
- Daisuke Kaneko, Ryota Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh:
Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection. 2879-2883
- Yuri Y. Khokhlov, Natalia A. Tomashenko, Ivan Medennikov, Aleksei Romanenko:
Fast and Accurate OOV Decoder on High-Level Features. 2884-2888
- Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen:
Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval. 2889-2893
- Hiroto Tasaki, Tomoyosi Akiba:
Incorporating Acoustic Features for Spontaneous Speech Driven Content Retrieval. 2894-2898
- Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-yi Lee, Lin-Shan Lee:
Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification. 2899-2903
- Masatoshi Tsuchiya, Ryo Minamiguchi:
Automatic Alignment Between Classroom Lecture Utterances and Slide Components. 2904-2908
- Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo:
Compensating Gender Variability in Query-by-Example Search on Speech Using Voice Conversion. 2909-2913
- Anjishnu Kumar, Pavankumar Reddy Muddireddy, Markus Dreyer, Björn Hoffmeister:
Zero-Shot Learning Across Heterogeneous Overlapping Domains. 2914-2918
- Emiru Tsunoo, Peter Bell, Steve Renals:
Hierarchical Recurrent Neural Network for Story Segmentation. 2919-2923
- Abdessalam Bouchekif, Delphine Charlet, Géraldine Damnati, Nathalie Camelin, Yannick Estève:
Evaluating Automatic Topic Segmentation as a Segment Retrieval Task. 2924-2928
- Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon:
Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps. 2929-2933
- Jan Svec, Josef V. Psutka, Lubos Smídl, Jan Trmal:
A Relevance Score Estimation for Spoken Term Detection Based on RNN-Generated Pronunciation Embeddings. 2934-2938
Speech Intelligibility
- Laura Fernández Gallardo, Sebastian Möller, John Beerends:
Predicting Automatic Speech Recognition Performance Over Communication Channels from Instrumental Speech Quality and Intelligibility Scores. 2939-2943
- Cassia Valentini-Botinhao, Junichi Yamagishi:
Speech Intelligibility in Cars: The Effect of Speaking Style, Noise and Listener Age. 2944-2948
- Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani:
Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio. 2949-2953
- Yafan Chen, Yong Xu, Jun Yang:
Intelligibilities of Mandarin Chinese Sentences with Spectral "Holes". 2954-2957
- Lauren Ward, Ben G. Shirley, Yan Tang, William J. Davies:
The Effect of Situation-Specific Non-Speech Acoustic Cues on the Intelligibility of Speech in Noise. 2958-2962
- Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen:
On the Use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure. 2963-2967
- Constantin Spille, Bernd T. Meyer:
Listening in the Dips: Comparing Relevant Features for Speech Recognition in Humans and Machines. 2968-2972
Articulatory and Acoustic Phonetics
- Kosuke Sugai:
Mental Representation of Japanese Mora; Focusing on its Intrinsic Duration. 2973-2977
- Jia Ying, Christopher Carignan, Jason A. Shaw, Michael I. Proctor, Donald Derrick, Catherine T. Best:
Temporal Dynamics of Lateral Channel Formation in /l/: 3D EMA Data from Australian English. 2978-2982
- Nicola Klingler, Sylvia Moosmüller, Hannes Scheutz:
Vowel and Consonant Sequences in three Bavarian Dialects of Austria. 2983-2987
- Amel Issa:
Acoustic Cues to the Singleton-Geminate Contrast: The Case of Libyan Arabic Sonorants. 2988-2992
- Erika Brandt, Frank Zimmerer, Bistra Andreeva, Bernd Möbius:
Mel-Cepstral Distortion of German Vowels in Different Information Density Contexts. 2993-2997
- Tomás Boril, Pavel Sturm, Radek Skarnitzl, Jan Volín:
Effect of Formant and F0 Discontinuity on Perceived Vowel Duration: Impacts for Concatenative Speech Synthesis. 2998-3002
- Marija Tabain, Richard Beare:
An Ultrasound Study of Alveolar and Retroflex Consonants in Arrernte: Stressed and Unstressed Syllables. 3003-3007
- Christer Gobl:
Reshaping the Transformed LF Model: Generating the Glottal Source from the Waveshape Parameter Rd. 3008-3012
- Stefan Benus, Juraj Simko, Mona Lehtinen:
Kinematic Signatures of Prosody in Lombard Speech. 3013-3017
- Markus Jochim, Felicitas Kleber:
What do Finnish and Central Bavarian Have in Common? Towards an Acoustically Based Quantity Typology. 3018-3022
- Bhanu Teja Nellore, RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana:
Locating Burst Onsets Using SFF Envelope and Phase Information. 3023-3027
- Hongwei Ding, Yuanyuan Zhang, Hongchao Liu, Chu-Ren Huang:
A Preliminary Phonetic Investigation of Alphabetic Words in Mandarin Chinese. 3028-3032
- Thomas Schatz, Rory Turnbull, Francis R. Bach, Emmanuel Dupoux:
A Quantitative Measure of the Impact of Coarticulation on Phone Discriminability. 3033-3037
Music and Audio Processing
- Kin Wah Edward Lin, Hans Anderson, Clifford So, Simon Lui:
Sinusoidal Partials Tracking for Singing Analysis Using the Heuristic of the Minimal Frequency and Magnitude Difference. 3038-3042
- Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maaß, Radoslaw Mazur, Alfred Mertins:
Audio Scene Classification with Deep Recurrent Neural Networks. 3043-3047
- Maria Sandsten, Isabella Reinhold, Josefin Starkhammar:
Automatic Time-Frequency Analysis of Echolocation Signals Using the Matched Gaussian Multitaper Spectrogram. 3048-3052
- Jindrich Matousek, Daniel Tihelka:
Classification-Based Detection of Glottal Closure Instants from Speech Signals. 3053-3057
- Xiaoke Qi, Jianhua Tao:
A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network. 3058-3062
- Luis M. T. Jesus, Bruno Rocha, Andreia Hall:
Laryngeal Articulation During Trumpet Performance: An Exploratory Study. 3063-3067
- Jian Guan, Xuan Wang, Pengming Feng, Jing Dong, Wenwu Wang:
Matrix of Polynomials Model Based Polynomial Dictionary Learning Method for Acoustic Impulse Response Modeling. 3068-3072
- Rakib Hyder, Shabnam Ghaffarzadegan, Zhe Feng, John H. L. Hansen, Taufiq Hasan:
Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image Features. 3073-3077
- Xue Feng, Brigitte Richardson, Scott Amman, James R. Glass:
An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification. 3078-3082
- Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley:
Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging. 3083-3087
- Jing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, Manman Zhu:
An Audio Based Piano Performance Evaluation Method Using Deep Neural Network Based Acoustic Modeling. 3088-3092
- Shreyan Chowdhury, Tanaya Guha, Rajesh M. Hegde:
Music Tempo Estimation Using Sub-Band Synchrony. 3093-3096
- Yun Wang, Florian Metze:
A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification. 3097-3101
- Naziba Mostafa, Pascale Fung:
A Note Based Query By Humming System Using Convolutional Neural Network. 3102-3106
- Hardik B. Sailor, Dharmesh M. Agrawal, Hemant A. Patil:
Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification. 3107-3111
- Meet H. Soni, Rishabh Tak, Hemant A. Patil:
Novel Shifted Real Spectrum for Exact Signal Reconstruction. 3112-3116
Disorders Related to Speech and Language
- Jochen Weiner, Mathis Engelbart, Tanja Schultz:
Manual and Automatic Transcriptions in Dementia Detection from Speech. 3117-3121
- Rahul Gupta, Saurabh Sahu, Carol Y. Espy-Wilson, Shrikanth S. Narayanan:
An Affect Prediction Approach Through Depression Severity Parameter Incorporation in Neural Networks. 3122-3126
- Stephanie Gillespie, Yash-Yee Logan, Elliot Moore, Jacqueline Laures-Gore, Scott Russell, Rupal Patel:
Cross-Database Models for the Classification of Dysarthria Presence. 3127-3131
- Michal Novotný, Jan Rusz, K. Spálenka, Jirí Klempír, Dana Horáková, Evzen Ruzicka:
Acoustic Evaluation of Nasality in Cerebellar Syndromes. 3132-3136
- Simone Hantke, Hesam Sagha, Nicholas Cummins, Björn W. Schuller:
Emotional Speech of Mentally and Physically Disabled Individuals: Introducing the EmotAsS Database and First Findings. 3137-3141
- Carla Agurto, Raquel Norel, Rachel Ostrand, Gillinder Bedi, Harriet de Wit, Matthew J. Baggott, Matthew G. Kirkpatrick, Margaret Wardle, Guillermo A. Cecchi:
Phonological Markers of Oxytocin and MDMA Ingestion. 3142-3146
- Bahman Mirheidari, Daniel Blackburn, Kirsty Harkness, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen:
An Avatar-Based System for Identifying Individuals Likely to Develop Dementia. 3147-3151
- Yue Zhang, Felix Weninger, Björn W. Schuller:
Cross-Domain Classification of Drowsiness in Speech: The Case of Alcohol Intoxication and Sleep Deprivation. 3152-3156
- Paula Lopez-Otero, Laura Docío Fernández, Alberto Abad, Carmen García-Mateo:
Depression Detection Using Automatic Transcriptions of De-Identified Speech. 3157-3161
- Sebastian Wankerl, Elmar Nöth, Stefan Evert:
An N-Gram Based Approach to the Automatic Diagnosis of Alzheimer's Disease from Spoken Language. 3162-3166
- Karel Mundnich, Md. Nasir, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Exploiting Intra-Annotator Rating Consistency Through Copeland's Method for Estimation of Ground Truth Labels in Couples' Therapy. 3167-3171
- Massimo Pettorino, Wentao Gu, Pawel Pólrola, Ping Fan:
Rhythmic Characteristics of Parkinsonian Speech: A Study on Mandarin and Polish. 3172-3176
Prosody
- Jung-Yueh Tu, Janice Wing Sze Wong, Jih-Ho Cha:
Trisyllabic Tone 3 Sandhi Patterns in Mandarin Produced by Cantonese Speakers. 3177-3180
- Heete Sahkai, Meelis Mihkla:
Intonation of Contrastive Topic in Estonian. 3181-3185
- Lixia Hao, Wei Zhang, Yanlu Xie, Jinsong Zhang:
Reanalyze Fundamental Frequency Peak Delay in Mandarin. 3186-3190
- Amandine Michelas, Cecile Cau, Maud Champagne-Lavau:
How Does the Absence of Shared Knowledge Between Interlocutors Affect the Production of French Prosodic Forms? 3191-3195
- Michael Wagner, Michael McAuliffe:
Three Dimensions of Sentence Prosody and Their (Non-)Interactions. 3196-3200
- Janine Kleinhans, Mireia Farrús, Agustín Gravano, Juan Manuel Pérez, Catherine Lai, Leo Wanner:
Using Prosody to Classify Discourse Relations. 3201-3205
- Elizabeth Godoy, James R. Williamson, Thomas F. Quatieri:
Canonical Correlation Analysis and Prediction of Perceived Rhythmic Prominences and Pitch Tones in Speech. 3206-3210
- Sofoklis Kakouros, Okko Räsänen, Paavo Alku:
Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions. 3211-3215
- Jianjing Kuang:
Creaky Voice as a Function of Tonal Categories and Prosodic Boundaries. 3216-3220
- Radek Skarnitzl, Anders Eriksson:
The Acoustics of Word Stress in Czech as a Function of Speaking Style. 3221-3225
- Petra Wagner, Nataliya Bryhadyr:
What You See is What You Get Prosodically Less - Visibility Shapes Prosodic Prominence Production in Spontaneous Interaction. 3226-3230
- Yu-Yin Hsu, Anqi Xu:
Focus Acoustics in Mandarin Nominals. 3231-3235
- Malin Svensson Lundmark, Gilbert Ambrazaitis, Otto Ewald:
Exploring Multidimensionality: Acoustic and Articulatory Correlates of Swedish Word Accents. 3236-3240
- Karin Puga, Robert Fuchs, Jane Setter, Peggy Mok:
The Perception of English Intonation Patterns by German L2 Speakers of English. 3241-3245
Speaker States and Traits
- Emilia Parada-Cabaleiro, Alice Baird, Anton Batliner, Nicholas Cummins, Simone Hantke, Björn W. Schuller:
The Perception of Emotions in Noisified Nonsense Speech. 3246-3250
- James Gibson, Dogan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:
Attention Networks for Modeling Behaviors in Addiction Counseling. 3251-3255
- Torsten Wörtwein, Tadas Baltrusaitis, Eugene Laksana, Luciana Pennant, Elizabeth S. Liebson, Dost Öngür, Justin T. Baker, Louis-Philippe Morency:
Computational Analysis of Acoustic Descriptors in Psychotic Patients. 3256-3260
- Ya-Tse Wu, Hsuan-Yu Chen, Yu-Hsien Liao, Li-Wei Kuo, Chi-Chun Lee:
Modeling Perceivers Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion Recognition. 3261-3265
- Bogdan Vlasenko, Hesam Sagha, Nicholas Cummins, Björn W. Schuller:
Implementing Gender-Dependent Vowel-Level Analysis for Boosting Speech-Based Depression Recognition. 3266-3270
- Farhad Bin Siddique, Pascale Fung:
Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets. 3271-3275
- Yoshiko Arimoto, Hiroki Mori:
Emotion Category Mapping to Emotional Space by Cross-Corpus Emotion Labeling. 3276-3280
- Cédric Fayet, Arnaud Delhay, Damien Lolive, Pierre-François Marteau:
Big Five vs. Prosodic Features as Cues to Detect Abnormality in SSPNET-Personality Corpus. 3281-3285
- Hayakawa Akira, Carl Vogel, Saturnino Luz, Nick Campbell:
Speech Rate Comparison When Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task. 3286-3290
- Shao-Yen Tseng, Brian R. Baucom, Panayiotis G. Georgiou:
Approaching Human Performance in Behavior Estimation in Couples Therapy Using Deep Sentence Embeddings. 3291-3295
- Md. Nasir, Brian R. Baucom, Craig J. Bryan, Shrikanth S. Narayanan, Panayiotis G. Georgiou:
Complexity in Speech and its Relation to Emotional Bond in Therapist-Patient Interactions During Suicide Risk Assessment Interviews. 3296-3300
- Zhaocheng Huang, Julien Epps:
An Investigation of Emotion Dynamics and Kalman Filtering for Speech-Based Emotion Prediction. 3301-3305
Language Understanding and Generation
- Kugatsu Sadamitsu, Yukinori Homma, Ryuichiro Higashinaka, Yoshihiro Matsuo:
Zero-Shot Learning for Natural Language Understanding Using Domain-Independent Sequential Structure and Question Types. 3306-3310
- Naoki Sawada, Ryo Masumura, Hiromitsu Nishizaki:
Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification. 3311-3315
- Mohamed Morchid:
Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language Understanding. 3316-3319
- Mandy Korpusik, Zachary Collins, James R. Glass:
Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions. 3320-3324
- Titouan Parcollet, Mohamed Morchid, Georges Linarès:
Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone Conversations. 3325-3328
- Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato De Mori:
ASR Error Management for Improving Spoken Language Understanding. 3329-3333
- Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, Bowen Zhou:
Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural Networks. 3334-3338
- Neha Nayak, Dilek Hakkani-Tür, Marilyn A. Walker, Larry P. Heck:
To Plan or not to Plan? Discourse Planning in Slot-Value Informed Sequence to Sequence Models for Language Generation. 3339-3343
- Matthieu Riou, Bassam Jabaian, Stéphane Huet, Fabrice Lefèvre:
Online Adaptation of an Attention-Based Neural Network for Natural Language Generation. 3344-3348
- Carlos D. Martínez-Hinarejos, Zuzanna Parcheta:
Spanish Sign Language Recognition with Different Topology Hidden Markov Models. 3349-3353
- Michelle Renee Morales, Stefan Scherer, Rivka Levitan:
OpenMM: An Open-Source Multimodal Feature Extraction Tool. 3354-3358
- Yuyun Huang, Emer Gilmartin, Nick Campbell:
Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition. 3359-3363
Voice Conversion 2
- Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang:
Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks. 3364-3368
- Toru Nakashika:
CAB: An Energy-Based Speaker Clustering Model for Rapid Adaptation in Non-Parallel Voice Conversion. 3369-3373
- Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki:
Phoneme-Discriminative Features for Dysarthric Speech Conversion. 3374-3378
- Jie Wu, Dong-Yan Huang, Lei Xie, Haizhou Li:
Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion. 3379-3383
- Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi:
Speaker Dependent Approach for Enhancing a Glossectomy Patient's Speech via GMM-Based Voice Conversion. 3384-3388
- Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi:
Generative Adversarial Network-Based Postfilter for STFT Spectrograms. 3389-3393
- Bajibabu Bollepalli, Lauri Juvela, Paavo Alku:
Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech Synthesis. 3394-3398
- Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi, Yasuo Ariki:
Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional Data. 3399-3403
- Rama Doddipatla, Norbert Braunschweiler, Ranniery Maia:
Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors. 3404-3408
- Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai:
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion. 3409-3413
- Miguel Varela Ramos, Alan W. Black, Ramón Fernandez Astudillo, Isabel Trancoso, Nuno Fonseca:
Segment Level Voice Conversion with Recurrent Neural Networks. 3414-3418
Show & Tell 5
- Roger K. Moore, Ben Mitchinson:
Creating a Voice for MiRo, the World's First Commercial Biomimetic Robot. 3419-3420
- Mónica Domínguez, Mireia Farrús, Leo Wanner:
A Thematicity-Based Prosody Enrichment Tool for CTS. 3421-3422
- Martin Gruber, Jindrich Matousek, Zdenek Hanzlícek, Jakub Vít, Daniel Tihelka:
WebSubDub - Experimental System for Creating High-Quality Alternative Audio Track for TV Broadcasting. 3423-3424
- Markéta Juzová, Daniel Tihelka, Jindrich Matousek, Zdenek Hanzlícek:
Voice Conservation and TTS System for People Facing Total Laryngectomy. 3425-3426
- Atish Shankar Ghone, Rachana Nerpagar, Pranaw Kumar, Arun Baby, S. Aswin Shanmugam, M. Sasikumar, Hema A. Murthy:
TBT (Toolkit to Build TTS): A High Performance Framework to Build Multiple Language HTS Voice. 3427-3428
- Reima Karhila, Sari Ylinen, Seppo Enarvi, Kalle J. Palomäki, Aleksander Nikulin, Olli Rantula, Vertti Viitanen, Krupakar Dhinakaran, Anna-Riikka Smolander, Heini Kallio, Katja Junttila, Maria Uther, Perttu Hämäläinen, Mikko Kurimo:
SIAK - A Game for Foreign Language Pronunciation Learning. 3429-3430
Show & Tell 6
- Staffan Larsson, Alexander Berman, Andreas Krona, Fredrik Kronlid:
Integrating the Talkamatic Dialogue Manager with Alexa. 3431-3432
- Farhia Ahmed, Pierrette Bouillon, Chelle Destefano, Johanna Gerlach, Sonia Halimi, Angela Hooper, Manny Rayner, Hervé Spechbach, Irene Strasly, Nikos Tsourakis:
A Robust Medical Speech-to-Speech/Speech-to-Sign Phraselator. 3433-3434
- Frank Duckhorn, Markus Huber, Werner Meyer, Oliver Jokisch, Constanze Tschöpe, Matthias Wolff:
Towards an Autarkic Embedded Cognitive User Interface. 3435-3436
- Genta Indra Winata, Onno Kampman, Yang Yang, Anik Dey, Pascale Fung:
Nora the Empathetic Psychologist. 3437-3438
- Hassan Alam, Aman Kumar, Manan Vyas, Tina Werner, Rachmat Hartono:
Modifying Amazon's Alexa ASR Grammar and Lexicon - A Case Study. 3439-3440
Keynote 3: Björn Lindblom
- Björn Lindblom:
Re-Inventing Speech - The Biological Way. 3441
Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Elika Bergelson, Jarek Krajewski, Christoph Janott, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Guillermo Hidalgo, Sebastian Schnieder, Clemens Heiser, Winfried Hohenhorst, Michael Herzog, Maximilian Schmitt, Kun Qian, Yue Zhang, George Trigeorgis, Panagiotis Tzirakis, Stefanos Zafeiriou:
The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & Snoring. 3442-3446
- Jarek Krajewski, Sebastian Schnieder, Anton Batliner:
Description of the Upper Respiratory Tract Infection Corpus (URTIC).
- Christoph Janott, Anton Batliner:
Description of the Munich-Passau Snore Sound Corpus (MPSSC).
- Elika Bergelson, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont:
Description of the Homebank Child/Adult Addressee Corpus (HB-CHAAC).
- Mark A. Huckvale, András Beke:
It Sounds Like You Have a Cold! Testing Voice Features for the Interspeech 2017 Computational Paralinguistics Cold Challenge. 3447-3451
- Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li:
End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum. 3452-3456
- Johannes Wagner, Thiago Fraga-Silva, Yvan Josse, Dominik Schiller, Andreas Seiderer, Elisabeth André:
Infected Phonemes: How a Cold Impairs Speech on a Phonetic Level. 3457-3461
- Akshay Kalkunte Suresh, Srinivasa Raghavan K. M., Prasanta Kumar Ghosh:
Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy Condition. 3462-3466
- Tin Lay Nwe, Tran Huy Dat, Wen Zheng Terence Ng, Bin Ma:
An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance Based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN. 3467-3471
Special Session: State of the Art in Physics-based Voice Simulation
- Tatsuya Kitamura, Hironori Takemoto, Hisanori Makinae, Tetsutaro Yamaguchi, Koutaro Maki:
Acoustic Analysis of Detailed Three-Dimensional Shape of the Human Nasal Cavity and Paranasal Sinuses. 3472-3476
- Marc Arnela, Saeed Dabbaghchian, Oriol Guasch, Olov Engwall:
A Semi-Polar Grid Strategy for the Three-Dimensional Finite Element Simulation of Vowel-Vowel Sequences. 3477-3481
- Arvind Vasudevan, Victor Zappi, Peter Anderson, Sidney S. Fels:
A Fast Robust 1D Flow Model for a Self-Oscillating Coupled 2D FEM Vocal Fold Simulation. 3482-3486
- Tiina Murtola, Jarmo Malinen:
Waveform Patterns in Pitch Glides Near a Vocal Tract Resonance. 3487-3491
- Niyazi Cem Degirmenci, Johan Jansson, Johan Hoffman, Marc Arnela, Patricia Sánchez-Martín, Oriol Guasch, Sten Ternström:
A Unified Numerical Simulation of Vowel Production That Comprises Phonation and the Emitted Sound. 3492-3496
- Saeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch:
Synthesis of VV Utterances from Muscle Activation to Sound with a 3D Model. 3497-3501
Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2
- M. V. Achuth Rao, Shivani Yadav, Prasanta Kumar Ghosh:
A Dual Source-Filter Model of Snore Audio for Snorer Group Classification. 3502-3506
- Michael Freitag, Shahin Amiriparian, Nicholas Cummins, Maurice Gerczuk, Björn W. Schuller:
An 'End-to-Evolution' Hybrid Approach for Snore Sound Classification. 3507-3511
- Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Alice Baird, Björn W. Schuller:
Snore Sound Classification Using Image-Based Deep Spectrum Features. 3512-3516
- David Tavarez, Xabier Sarasola, Agustín Alonso, Jon Sánchez, Luis Serrano, Eva Navas, Inma Hernáez:
Exploring Fusion Methods and Feature Space for the Classification of Paralinguistic Information. 3517-3521
- Gábor Gosztolya, Róbert Busa-Fekete, Tamás Grósz, László Tóth:
DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification. 3522-3526
- Heysem Kaya, Alexey A. Karpov:
Introducing Weighted Kernel Classifiers for Handling Imbalanced Paralinguistic Corpora: Snoring, Addressee and Cold. 3527-3531
- Stefan Steidl:
The INTERSPEECH 2017 Computational Paralinguistics Challenge: A Summary of Results.
- Björn W. Schuller, Anton Batliner:
Discussion.
Discriminative Training for ASR
- Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu:
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition. 3532-3536
- Matt Shannon:
Optimizing Expected Word Error Rate via Sampling for Speech Recognition. 3537-3541
- Tara N. Sainath, Vijayaditya Peddinti, Olivier Siohan, Arun Narayanan:
Annealed f-Smoothing as a Mechanism to Speed up Neural Network Training. 3542-3546
- Zhong Meng, Biing-Hwang Juang:
Non-Uniform MCE Training of Deep Long Short-Term Memory Recurrent Neural Networks for Keyword Spotting. 3547-3551
- Pranay Dighe, Afsaneh Asaei, Hervé Bourlard:
Exploiting Eigenposteriors for Semi-Supervised Training of DNN Acoustic Models with Sequence Discrimination. 3552-3556
- Ming-Han Yang, Hung-Shin Lee, Yu-Ding Lu, Kuan-Yu Chen, Yu Tsao, Berlin Chen, Hsin-Min Wang:
Discriminative Autoencoders for Acoustic Modeling. 3557-3561
Speaker Diarization
- Zbynek Zajíc, Marek Hrúz, Ludek Müller:
Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement. 3562-3566
- Arindam Jati, Panayiotis G. Georgiou:
Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold Using Deep Neural Networks with an Evaluation on Speaker Segmentation. 3567-3571
- Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier:
A Triplet Ranking-Based Neural Network for Speaker Diarization and Linking. 3572-3576
- Yishai Cohen, Itshak Lapidot:
Estimating Speaker Clustering Quality Using Logistic Regression. 3577-3581
- Guillaume Wisniewski, Hervé Bredin, Gregory Gelly, Claude Barras:
Combining Speaker Turn Embedding and Incremental Structure Prediction for Low-Latency Speaker Diarization. 3582-3586
- Hervé Bredin:
pyannote.metrics: A Toolkit for Reproducible Evaluation, Diagnostic, and Error Analysis of Speaker Diarization Systems. 3587-3591
Spoken Term Detection
- Zhipeng Chen, Ji Wu:
A Rescoring Approach for Keyword Search Using Lattice Context Information. 3592-3596
- Jan Trmal, Matthew Wiesner, Vijayaditya Peddinti, Xiaohui Zhang, Pegah Ghahremani, Yiming Wang, Vimal Manohar, Hainan Xu, Daniel Povey, Sanjeev Khudanpur:
The Kaldi OpenKWS System: Improving Low Resource Keyword Search. 3597-3601
- Yuri Y. Khokhlov, Ivan Medennikov, Aleksei Romanenko, Valentin Mendelev, Maxim Korenevsky, Alexey Prudnikov, Natalia A. Tomashenko, Alexander Zatvornitsky:
The STC Keyword Search System for OpenKWS 2016 Evaluation. 3602-3606
- Ming Sun, David Snyder, Yixin Gao, Varun K. Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni:
Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting. 3607-3611
- Masayuki Suzuki, Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, Kenneth Ward Church, Mark Drake:
Symbol Sequence Search from Telephone Conversation. 3612-3616
- Batuhan Gündogdu, Murat Saraclar:
Similarity Learning Based Query Modeling for Keyword Search. 3617-3621
Noise Reduction
- Suman Samui, Indrajit Chakrabarti, Soumya K. Ghosh:
Deep Recurrent Neural Network Based Monaural Speech Separation Using Recurrent Temporal Restricted Boltzmann Machines. 3622-3626
- Qizheng Huang, Changchun Bao, Xianyun Wang:
Improved Codebook-Based Speech Enhancement Based on MBE Model. 3627-3631
- Zhuo Chen, Yan Huang, Jinyu Li, Yifan Gong:
Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection. 3632-3636
- Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen:
Exploring Low-Dimensional Structures of Modulation Spectra for Robust Speech Recognition. 3637-3641
- Santiago Pascual, Antonio Bonafonte, Joan Serrà:
SEGAN: Speech Enhancement Generative Adversarial Network. 3642-3646
- Soumi Maiti, Michael I. Mandel:
Concatenative Resynthesis Using Twin Networks. 3647-3651
Speech Recognition: Multimodal Systems
- Themos Stafylakis, Georgios Tzimiropoulos:
Combining Residual Networks with LSTMs for Lipreading. 3652-3656
- Kwanchiva Thangthai, Richard W. Harvey:
Improving Computer Lipreading via DNN Sequence Discriminative Training Techniques. 3657-3661
- Michael Wand, Jürgen Schmidhuber:
Improving Speaker-Independent Lipreading with Domain-Adversarial Training. 3662-3666
- Ahmed Hussen Abdelaziz:
Turbo Decoders for Audio-Visual Continuous Speech Recognition. 3667-3671
- Tamás Gábor Csapó, Tamás Grósz, Gábor Gosztolya, László Tóth, Alexandra Markó:
DNN-Based Ultrasound-to-Speech Conversion for a Silent Speech Interface. 3672-3676
- Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu:
Visually Grounded Learning of Keyword Prediction from Untranscribed Speech. 3677-3681
Neural Network Acoustic Models for ASR 3
- Jen-Tzung Chien, Chen Shen:
Deep Neural Factorization for Speech Recognition. 3682-3686
- Karel Veselý, Lukás Burget, Jan Cernocký:
Semi-Supervised DNN Training with Word Selection for ASR. 3687-3691
- Junfeng Hou, Shiliang Zhang, Li-Rong Dai:
Gaussian Prediction Based Attention for Online End-to-End Speech Recognition. 3692-3696
- Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran:
Efficient Knowledge Distillation from an Ensemble of Teachers. 3697-3701
- Rohit Prabhavalkar, Tara N. Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly:
An Analysis of "Attention" in Sequence-to-Sequence Models. 3702-3706
- Hagen Soltau, Hank Liao, Hasim Sak:
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition. 3707-3711
Robust Speaker Recognition
- Jinxi Guo, Usha Amrutha Nookala, Abeer Alwan:
CNN-Based Joint Mapping of Short and Long Utterance i-Vectors for Speaker Verification Using Short Utterances. 3712-3716
- Shivesh Ranjan, Abhinav Misra, John H. L. Hansen:
Curriculum Learning Based Probabilistic Linear Discriminant Analysis for Noise Robust Speaker Recognition. 3717-3721
- Shivangi Mahto, Hitoshi Yamamoto, Takafumi Koshinaka:
i-Vector Transformation Using a Novel Discriminative Denoising Autoencoder for Noise-Robust Speaker Recognition. 3722-3726
- Qiongqiong Wang, Takafumi Koshinaka:
Unsupervised Discriminative Training of PLDA for Domain Adaptation in Speaker Verification. 3727-3731
- Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Marcel Kockmann:
Speaker Verification Under Adverse Conditions Using i-Vector Adaptation and Neural Networks. 3732-3736
- Diego Castán, Mitchell McLaren, Luciana Ferrer, Aaron Lawson, Alicia Lozano-Diez:
Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data. 3737-3741
Multimodal Resources and Annotation
- Karima Abidi, Mohamed Amine Menacer, Kamel Smaïli:
CALYOU: A Comparable Spoken Algerian Corpus Harvested from YouTube. 3742-3746
- Abhishek Narwekar, Prasanta Kumar Ghosh:
PRAV: A Phonetically Rich Audio Visual Corpus. 3747-3751
- Ahmed Hussen Abdelaziz:
NTCD-TIMIT: A New Database and Baseline for Noise-Robust Audio-Visual Speech Recognition. 3752-3756
- David M. Howcroft, Dietrich Klakow, Vera Demberg:
The Extended SPaRKy Restaurant Corpus: Designing a Corpus with Variable Information Density. 3757-3761
- André Mansikkaniemi, Peter Smit, Mikko Kurimo:
Automatic Construction of the Finnish Parliament Speech Corpus. 3762-3766
- Omnia Abdo, Sherif M. Abdou, Mervat Fashal:
Building Audio-Visual Phonetically Annotated Arabic Corpus for Expressive Text to Speech. 3767-3771
Forensic Phonetics and Sociophonetic Varieties
- Vincent Hughes, Paul Foulkes:
What is the Relevant Population? Considerations for the Computation of Likelihood Ratios in Forensic Voice Comparison. 3772-3776 - Véronique Delvaux, Lise Caucheteux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies:
Voice Disguise vs. Impersonation: Acoustic and Perceptual Measurements of Vocal Flexibility in Non Experts. 3777-3781 - Yaru Wu, Martine Adda-Decker, Cécile Fougeron, Lori Lamel:
Schwa Realization in French: Using Automatic Speech Processing to Study Phonological and Socio-Linguistic Factors in Large Corpora. 3782-3786 - Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Justus Roux:
The Social Life of Setswana Ejectives. 3787-3791 - Lea S. Kohtz, Oliver Niebuhr:
How Long is Too Long? How Pause Features After Requests Affect the Perceived Willingness of Affirmative Answers. 3792-3796 - Iona Gessinger, Eran Raveh, Sébastien Le Maguer, Bernd Möbius, Ingmar Steiner:
Shadowing Synthesized Speech - Segmental Analysis of Phonetic Convergence. 3797-3801
Speech and Audio Segmentation and Classification 1
- Shabnam Ghaffarzadegan, Attila Reiss, Mirko Ruhs, Robert Dürichen, Zhe Feng:
Occupancy Detection in Commercial and Residential Environments Using Audio Signal. 3802-3806 - Tran Huy Dat, Wen Zheng Terence Ng, Yi Ren Leng:
Data Augmentation, Missing Feature Mask and Kernel Classification for Through-the-Wall Acoustic Surveillance. 3807-3811 - Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Gabor Simko, Carolina Parada:
Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition. 3812-3816 - Arun Baby, Jeena J. Prakash, S. Rupak Vignesh, Hema A. Murthy:
Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages. 3817-3821 - Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee:
Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme Boundaries. 3822-3826 - Ruiqing Yin, Hervé Bredin, Claude Barras:
Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks. 3827-3831
Noise Robust and Far-field ASR
- Cong-Thanh Do, Yannis Stylianou:
Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder. 3832-3836 - Masakiyo Fujimoto:
Factored Deep Convolutional Neural Networks for Noise Robust Speech Recognition. 3837-3841 - Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan:
Global SNR Estimation of Speech Signals for Unknown Noise Conditions Using Noise Adapted Non-Linear Regression. 3842-3846 - Fengpei Ge, Kehuang Li, Bo Wu, Sabato Marco Siniscalchi, Yonghong Yan, Chin-Hui Lee:
Joint Training of Multi-Channel-Condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition. 3847-3851 - Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling. 3852-3856 - Yu Zhang, Pengyuan Zhang, Yonghong Yan:
Attention-Based LSTM with Multi-Task Learning for Distant Speech Recognition. 3857-3861 - Hengguan Huang, Brian Mak:
To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple Histories. 3862-3866 - Suyoun Kim, Ian R. Lane:
End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition. 3867-3871 - Anjali Menon, Chanwoo Kim, Richard M. Stern:
Robust Speech Recognition Based on Binaural Auditory Processing. 3872-3876 - Joe Caroselli, Izhak Shafran, Arun Narayanan, Richard Rose:
Adaptive Multichannel Dereverberation for Automatic Speech Recognition. 3877-3881
Styles, Varieties, Forensics and Tools
- Urban Zihlmann:
The Effects of Real and Placebo Alcohol on Deaffrication. 3882-3886 - Michael McAuliffe, Elias Stengel-Eskin, Michaela Socolof, Morgan Sonderegger:
Polyglot and Speech Corpus Tools: A System for Representing, Integrating, and Querying Speech Corpora. 3887-3891 - Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo:
Mapping Across Feature Spaces in Forensic Voice Comparison: The Contribution of Auditory-Based Voice Quality to (Semi-)Automatic System Testing. 3892-3896 - Pablo Arantes, Anders Eriksson, Suska Gutzeit:
Effect of Language, Speaking Style and Speaker on Long-Term F0 Estimation. 3897-3901 - Jan Volín, Tereza Tykalová, Tomás Boril:
Stability of Prosodic Characteristics Across Age and Gender Groups. 3902-3906 - Julien Plante-Hébert, Victor J. Boucher, Boutheina Jemel:
Electrophysiological Correlates of Familiar Voice Recognition. 3907-3910 - Jamison Cooper-Leavitt, Lori Lamel, Annie Rialland, Martine Adda-Decker, Gilles Adda:
Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme Deletion. 3911-3915 - Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl:
Rd as a Control Parameter to Explore Affective Correlates of the Tense-Lax Continuum. 3916-3920 - Plínio Almeida Barbosa, Sandra Madureira, Philippe Boula de Mareüil:
Cross-Linguistic Distinctions Between Professional and Non-Professional Speaking Styles. 3921-3925 - Cédric Gendrot:
Perception and Production of Word-Final /ʁ/ in French. 3926-3930 - N. P. Narendra, Manu Airaksinen, Paavo Alku:
Glottal Source Estimation from Coded Telephone Speech Using a Deep Neural Network. 3931-3935 - George Christodoulides, Mathieu Avanzi, Anne-Catherine Simon:
Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert Listeners. 3936-3940 - Michael Levit, Yan Huang, Shuangyu Chang, Yifan Gong:
Don't Count on ASR to Transcribe for You: Breaking Bias with Two Crowds. 3941-3945 - Manu Airaksinen, Paavo Alku:
Effects of Training Data Variety in Generating Glottal Pulses from Acoustic Features with DNNs. 3946-3950 - Simone Hantke, Zixing Zhang, Björn W. Schuller:
Towards Intelligent Crowdsourcing for Audio Data Annotation: Integrating Active Learning in the Real World. 3951-3955
Speech Synthesis: Data, Evaluation, and Novel Paradigms
- Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi:
Principles for Learning Controllable TTS from Annotated and Latent Variation. 3956-3960 - Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Sampling-Based Speech Parameter Generation Using Moment-Matching Networks. 3961-3965 - Vincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Domenico Batzu:
Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets. 3966-3970 - Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg:
Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data. 3971-3975 - Andrew Rosenberg, Bhuvana Ramabhadran:
Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion Scores. 3976-3980 - Nagaraj Adiga, S. R. Mahadeva Prasanna:
Phase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis. 3981-3985 - José A. González, Lam Aun Cheah, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger K. Moore, Ed Holdsworth:
Evaluation of a Silent Speech Interface Based on Magnetic Sensing and Deep Learning for a Phonetically Rich Vocabulary. 3986-3990 - David Greenwood, Stephen D. Laycock, Iain A. Matthews:
Predicting Head Pose from Speech with a Conditional Variational Autoencoder. 3991-3995 - Mirjam Wester, David A. Braude, Blaise Potard, Matthew P. Aylett, Francesca Shaw:
Real-Time Reactive Speech Synthesis: Incorporating Interruptions. 3996-4000 - Merlijn Blaauw, Jordi Bonada:
A Neural Parametric Singing Synthesizer. 4001-4005 - Yuxuan Wang, R. J. Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc V. Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous:
Tacotron: Towards End-to-End Speech Synthesis. 4006-4010 - Tim Capes, Paul Coles, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani, Qiong Hu, Nancy Huddleston, Melvyn Hunt, Jiangchuan Li, Matthias Neeracher, Kishore Prahallad, Tuomo Raitio, Ramya Rasipuram, Greg Townsend, Becci Williamson, David Winarsky, Zhizheng Wu, Hepeng Zhang:
Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System. 4011-4015 - Daan van Esch, Richard Sproat:
An Expanded Taxonomy of Semiotic Classes for Text Normalization. 4016-4020 - Toru Nakashika, Shinji Takaki, Junichi Yamagishi:
Complex-Valued Restricted Boltzmann Machine for Direct Learning of Frequency Spectra. 4021-4025
Show & Tell 7
- Bartosz Ziólko, Tomasz Pedzimaz, Szymon Piotr Palka:
Soundtracing for Realtime Speech Adjustment to Environmental Conditions in 3D Simulations. 4026-4027 - Takayuki Arai:
Vocal-Tract Model with Static Articulators: Lips, Teeth, Tongue, and More. 4028-4029 - Ikuyo Masuda-Katsuse:
Remote Articulation Test System Based on WebRTC. 4030-4031 - H. Timothy Bunnell, Jason Lilley, Kathleen McGrath:
The ModelTalker Project: A Web-Based Voice Banking Pipeline for ALS/MND Patients. 4032-4033 - Wilbert Heeringa, Hans Van de Velde:
Visible Vowels: A Tool for the Visualization of Vowel Variation. 4034-4035