20th Interspeech 2019: Graz, Austria
- Gernot Kubin, Zdravko Kacic:
20th Annual Conference of the International Speech Communication Association, Interspeech 2019, Graz, Austria, September 15-19, 2019. ISCA 2019
ISCA Medal 2019 Keynote Speech
- Keiichi Tokuda:
Statistical Approach to Speech Synthesis: Past, Present and Future.
Spoken Language Processing for Children’s Speech
- Fei Wu, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:
Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network. 1-5 - Gary Yeung, Abeer Alwan:
A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception. 6-10 - Robert Gale, Liu Chen, Jill Dolata, Jan P. H. van Santen, Meysam Asgari:
Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques. 11-15 - Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals:
Ultrasound Tongue Imaging for Diarization and Alignment of Child Speech Therapy Sessions. 16-20 - Anastassia Loukina, Beata Beigman Klebanov, Patrick L. Lange, Yao Qian, Binod Gyawali, Nitin Madnani, Abhinav Misra, Klaus Zechner, Zuowei Wang, John Sabatini:
Automated Estimation of Oral Reading Fluency During Summer Camp e-Book Reading with MyTurnToRead. 21-25 - Vanessa Lopes, João Magalhães, Sofia Cavaco:
Sustained Vowel Game: A Computer Therapy Game for Children with Dysphonia. 26-30
Dynamics of Emotional Speech Exchanges in Multimodal Communication
- Anna Esposito, Terry Amorese, Marialucia Cuciniello, Maria Teresa Riviello, Antonietta Maria Esposito, Alda Troncone, Gennaro Cordasco:
The Dependability of Voice on Elders' Acceptance of Humanoid Agents. 31-35 - Oliver Niebuhr, Uffe Schjoedt:
God as Interlocutor - Real or Imaginary? Prosodic Markers of Dialogue Speech and Expected Efficacy in Spoken Prayer. 36-40 - Michelle Cohn, Georgia Zellou:
Expressiveness Influences Human Vocal Alignment Toward voice-AI. 41-45 - Catherine Lai, Beatrice Alex, Johanna D. Moore, Leimin Tian, Tatsuro Hori, Gianpiero Francesca:
Detecting Topic-Oriented Speaker Stance in Conversational Speech. 46-50 - Jilt Sebastian, Piero Pierucci:
Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts. 51-55 - Marvin Rajwadi, Cornelius Glackin, Julie A. Wall, Gérard Chollet, Nigel Cannings:
Explaining Sentiment Classification. 56-60 - Ricardo Kleinlein, Cristina Luna Jiménez, Juan Manuel Montero, Zoraida Callejas, Fernando Fernández Martínez:
Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models. 61-65
End-to-End Speech Recognition
- Ralf Schlüter:
Survey Talk: Modeling in Automatic Speech Recognition: Beyond Hidden Markov Models. - Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Alex Waibel:
Very Deep Self-Attention Networks for End-to-End Speech Recognition. 66-70 - Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde:
Jasper: An End-to-End Convolutional Neural Acoustic Model. 71-75 - Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition. 76-80 - Yonatan Belinkov, Ahmed Ali, James R. Glass:
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition. 81-85
Speech Enhancement: Multi-Channel
- Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa:
Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder. 86-90 - Kristina Tesch, Robert Rehr, Timo Gerkmann:
On Nonlinear Spatial Filtering in Multichannel Speech Enhancement. 91-95 - Juan M. Martín-Doñas, Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, Antonio M. Peinado:
Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation. 96-100 - Saeed Bagheri, Daniele Giacobello:
Exploiting Multi-Channel Speech Presence Probability in Parametric Multi-Channel Wiener Filter. 101-105 - Masahito Togami, Tatsuya Komatsu:
Variational Bayesian Multi-Channel Speech Dereverberation Under Noisy Environments with Probabilistic Convolutive Transfer Function. 106-110 - Tomohiro Nakatani, Keisuke Kinoshita:
Simultaneous Denoising and Dereverberation for Low-Latency Applications Using Frame-by-Frame Online Unified Convolutional Beamformer. 111-115
Speech Production: Individual Differences and the Brain
- Cathryn Snyder, Michelle Cohn, Georgia Zellou:
Individual Variation in Cognitive Processing Style Predicts Differences in Phonetic Imitation of Device and Human Voices. 116-120 - Aravind Illa, Prasanta Kumar Ghosh:
An Investigation on Speaker Specific Articulatory Synthesis with Speaker Independent Articulatory Inversion. 121-125 - Xiaohan Zhang, Chongke Bi, Kiyoshi Honda, Wenhuan Lu, Jianguo Wei:
Individual Difference of Relative Tongue Size and its Acoustic Effects. 126-130 - Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada:
Individual Differences of Airflow and Sound Generation in the Vocal Tract of Sibilant /s/. 131-135 - Shashwat Uttam, Yaman Kumar, Dhruva Sahrawat, Mansi Aggarwal, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent:
Hush-Hush Speak: Speech Reconstruction Using Silent Videos. 136-140 - Pramit Saha, Muhammad Abdul-Mageed, Sidney S. Fels:
SPEAK YOUR MIND! Towards Imagined Speech Recognition with Hierarchical Deep Learning. 141-145
Speech Signal Characterization 1
- Yu-An Chung, Wei-Ning Hsu, Hao Tang, James R. Glass:
An Unsupervised Autoregressive Model for Speech Representation Learning. 146-150 - Feng Huang, Péter Balázs:
Harmonic-Aligned Frame Mask Based on Non-Stationary Gabor Transform with Application to Content-Dependent Speaker Comparison. 151-155 - Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das:
Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual. 156-160 - Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio:
Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks. 161-165 - Bhanu Teja Nellore, Sri Harsha Dumpala, Karan Nathwani, Suryakanth V. Gangashetty:
Excitation Source and Vocal Tract System Based Acoustic Features for Detection of Nasals in Continuous Speech. 166-170 - Aggelina Chatziagapi, Georgios Paraskevopoulos, Dimitris Sgouropoulos, Georgios Pantazopoulos, Malvina Nikandrou, Theodoros Giannakopoulos, Athanasios Katsamanis, Alexandros Potamianos, Shrikanth Narayanan:
Data Augmentation Using GANs for Speech Emotion Recognition. 171-175
Neural Waveform Generation
- Zvi Kons, Slava Shechtman, Alexander Sorin, Carmel Rabinovitz, Ron Hoory:
High Quality, Lightweight and Adaptable TTS Using LPCNet. 176-180 - Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal:
Towards Achieving Robust Universal Neural Vocoding. 181-185 - Paarth Neekhara, Chris Donahue, Miller S. Puckette, Shlomo Dubnov, Julian J. McAuley:
Expediting TTS Synthesis with Adversarial Vocoding. 186-190 - Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas K. Maier:
Analysis by Adversarial Synthesis - A Novel Approach for Speech Vocoding. 191-195 - Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation. 196-200 - Xiaohai Tian, Eng Siong Chng, Haizhou Li:
A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data. 201-205
Attention Mechanism for Speaker State Recognition
- Kyu Jeong Han, Ramon Prieto, Tao Ma:
Survey Talk: When Attention Meets Speech Applications: Speech & Speaker Recognition Perspective. - Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Björn W. Schuller:
Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition. 206-210 - Jeng-Lin Li, Chi-Chun Lee:
Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile. 211-215 - Ascensión Gallardo-Antolín, Juan Manuel Montero:
A Saliency-Based Attention LSTM Model for Cognitive Load Classification from Speech. 216-220 - Adria Mallol-Ragolta, Ziping Zhao, Lukas Stappen, Nicholas Cummins, Björn W. Schuller:
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews. 221-225
ASR Neural Network Training - 1
- Andrea Carmantini, Peter Bell, Steve Renals:
Untranscribed Web Audio for Low Resource Speech Recognition. 226-230 - Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney:
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention. 231-235 - Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. 236-240 - Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong:
Speaker Adaptation for Attention-Based End-to-End Speech Recognition. 241-245 - Peidong Wang, Jia Cui, Chao Weng, Dong Yu:
Large Margin Training for Attention Based End-to-End Speech Recognition. 246-250 - Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny:
Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition. 251-255
Zero-Resource ASR
- Benjamin Milde, Chris Biemann:
SparseSpeech: Unsupervised Acoustic Unit Discovery with Memory-Augmented Sequence Autoencoders. 256-260 - Lucas Ondel, Hari Krishna Vydana, Lukás Burget, Jan Cernocký:
Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery. 261-265 - Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa:
Speaker Adversarial Training of DPGMM-Based Feature Extractor for Zero-Resource Languages. 266-270 - Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen:
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data. 271-275 - Emmanuel Azuh, David Harwath, James R. Glass:
Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio. 276-280 - Siyuan Feng, Tan Lee:
Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation. 281-285
Sociophonetics
- Shawn L. Nissen, Sharalee Blunck, Anita Dromey, Christopher Dromey:
Listeners' Ability to Identify the Gender of Preadolescent Children in Different Linguistic Contexts. 286-290 - Wiebke Ahlers, Philipp Meer:
Sibilant Variation in New Englishes: A Comparative Sociophonetic Study of Trinidadian and American English /s(tr)/-Retraction. 291-295 - Michele Gubian, Jonathan Harrington, Mary Stevens, Florian Schiel, Paul Warren:
Tracking the New Zealand English NEAR/SQUARE Merger Using Functional Principal Components Analysis. 296-300 - Iona Gessinger, Bernd Möbius, Bistra Andreeva, Eran Raveh, Ingmar Steiner:
Phonetic Accommodation in a Wizard-of-Oz Experiment: Intonation and Segments. 301-305 - Oliver Niebuhr, Jan Michalsky:
PASCAL and DPA: A Pilot Study on Using Prosodic Competence Scores to Predict Communicative Skills for Team Working and Public Speaking. 306-310 - Jan Michalsky, Heike Schoormann, Thomas Schultze:
Towards the Prosody of Persuasion in Competitive Negotiation. The Relationship Between f0 and Negotiation Success in Same Sex Sales Tasks. 311-315
Resources – Annotation – Evaluation
- Jacob Sager, Ravi Shankar, Jacob Reinhold, Archana Venkataraman:
VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English. 316-320 - Jia Xin Koh, Aqilah Mislan, Kevin Khoo, Brian Ang, Wilson Ang, Charmaine Ng, Ying-Ying Tan:
Building the Singapore English National Speech Corpus. 321-325 - Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon:
Challenging the Boundaries of Speech Recognition: The MALACH Corpus. 326-330 - Pravin Bhaskar Ramteke, Sujata Supanekar, Pradyoth Hegde, Hanna Nelson, Venkataraja Aithal, Shashidhar G. Koolagudi:
NITK Kids' Speech Corpus. 331-335 - Ahmed Ali, Salam Khalifa, Nizar Habash:
Towards Variability Resistant Dialectal Speech Evaluation. 336-340 - Per Fallgren, Zofia Malisz, Jens Edlund:
How to Annotate 100 Hours in 45 Minutes. 341-345
Speaker Recognition and Diarization
- Mireia Díez, Lukás Burget, Shuai Wang, Johan Rohdin, Jan Cernocký:
Bayesian HMM Based x-Vector Clustering for Speaker Diarization. 346-350 - Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka:
Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration. 351-355 - Suwon Shon, Najim Dehak, Douglas A. Reynolds, James R. Glass:
MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation. 356-360 - Zhifu Gao, Yan Song, Ian McLoughlin, Pengcheng Li, Yiheng Jiang, Li-Rong Dai:
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System. 361-365 - Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras:
LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization. 366-370 - Joon Son Chung, Bong-Jin Lee, Icksang Han:
Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings. 371-375 - Jiamin Xie, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:
Multi-PLDA Diarization on Children's Speech. 376-380 - Alan McCree, Gregory Sell, Daniel Garcia-Romero:
Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings. 381-385 - Omid Ghahabi, Volker Fischer:
Speaker-Corrupted Embeddings for Online Speaker Diarization. 386-390 - Tae Jin Park, Kyu Jeong Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis G. Georgiou, Shrikanth Narayanan:
Speaker Diarization with Lexical Information. 391-395 - Laurent El Shafey, Hagen Soltau, Izhak Shafran:
Joint Speech Recognition and Speaker Diarization via Sequence Transduction. 396-400 - Sandro Cumani:
Normal Variance-Mean Mixtures for Unsupervised Score Calibration. 401-405 - Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka:
Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding. 406-410 - Emre Yilmaz, Adem Derinel, Kun Zhou, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen:
Large-Scale Speaker Diarization of Radio Broadcast Archives. 411-415 - Harishchandra Dubey, Abhijeet Sangwan, John H. L. Hansen:
Toeplitz Inverse Covariance Based Robust Speaker Clustering for Naturalistic Audio Streams. 416-420
ASR for Noisy and Far-Field Speech
- György Kovács, László Tóth, Dirk Van Compernolle, Marcus Liwicki:
Examining the Combination of Multi-Band Processing and Channel Dropout for Robust Speech Recognition. 421-425 - Meet H. Soni, Ashish Panda:
Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition. 426-430 - Long Wu, Hangting Chen, Li Wang, Pengyuan Zhang, Yonghong Yan:
Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning. 431-435 - Ji Ming, Danny Crookes:
Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition. 436-440 - Meet H. Soni, Sonal Joshi, Ashish Panda:
Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions. 441-445 - Shashi Kumar, Shakti P. Rath:
Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition. 446-450 - Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani:
End-to-End SpeakerBeam for Single Channel Target Speech Recognition. 451-455 - I-Hung Hsu, Ayush Jaiswal, Premkumar Natarajan:
NIESR: Nuisance Invariant End-to-End Speech Recognition. 456-460 - Takahito Suzuki, Jun Ogata, Takashi Tsunakawa, Masafumi Nishida, Masafumi Nishimura:
Knowledge Distillation for Throat Microphone Speech Recognition. 461-465 - Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu:
Improved Speaker-Dependent Separation for CHiME-5 Challenge. 466-470 - Peidong Wang, Ke Tan, DeLiang Wang:
Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling. 471-475 - Peidong Wang, DeLiang Wang:
Enhanced Spectral Features for Distortion-Independent Acoustic Modeling. 476-480 - Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian J. McAuley, Farinaz Koushanfar:
Universal Adversarial Perturbations for Speech Recognition Systems. 481-485 - Masakiyo Fujimoto, Hisashi Kawai:
One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features. 486-490 - Bin Liu, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li:
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition. 491-495
Social Signals Detection and Speaker Traits Analysis
- Zixiaofan Yang, Bingyan Hu, Julia Hirschberg:
Predicting Humor by Learning from Time-Aligned Comments. 496-500 - Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov:
Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information. 501-505 - Guozhen An, Rivka Levitan:
Mitigating Gender and L1 Differences to Improve State and Trait Recognition. 506-509 - Felix Weninger, Yang Sun, Junho Park, Daniel Willett, Puming Zhan:
Deep Learning Based Mandarin Accent Identification for Accent Robust ASR. 510-514 - Gábor Gosztolya, László Tóth:
Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data. 515-519 - Hiroki Mori, Tomohiro Nagata, Yoshiko Arimoto:
Conversational and Social Laughter Synthesis with WaveNet. 520-523 - Bogdan Ludusan, Petra Wagner:
Laughter Dynamics in Dyadic Conversations. 524-528 - Khiet P. Truong, Jürgen Trouvain, Michel-Pierre Jansen:
Towards an Annotation Scheme for Complex Laughter in Speech Corpora. 529-533 - Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Meßner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller:
Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test. 534-538 - Alice Baird, Eduardo Coutinho, Julia Hirschberg, Björn W. Schuller:
Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results. 539-543 - Oliver Niebuhr, Kerstin Fischer:
Do not Hesitate! - Unless You Do it Shortly or Nasally: How the Phonetics of Filled Pauses Determine Their Subjective Frequency and Perceived Speaker Performance. 544-548 - Juan Camilo Vásquez-Correa, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech. 549-553
Applications of Language Technologies
- Ching-Ting Chang, Shun-Po Chuang, Hung-yi Lee:
Code-Switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation. 554-558 - Moritz Meier, Celeste Mason, Felix Putze, Tanja Schultz:
Comparative Analysis of Think-Aloud Methods for Everyday Activities in the Context of Cognitive Robotics. 559-563 - Doug Beeferman, William Brannon, Deb Roy:
RadioTalk: A Large-Scale Corpus of Talk Radio Transcripts. 564-568 - Salima Mdhaffar, Yannick Estève, Nicolas Hernandez, Antoine Laurent, Richard Dufour, Solen Quiniou:
Qualitative Evaluation of ASR Adaptation in a Lecture Context: Application to the PASTEL Corpus. 569-573 - Federico Marinelli, Alessandra Cervone, Giuliano Tortoreto, Evgeny A. Stepanov, Giuseppe Di Fabbrizio, Giuseppe Riccardi:
Active Annotation: Bootstrapping Annotation Lexicon and Guidelines for Supervised NLU Learning. 574-578 - Gerardo Roa Dabike, Jon Barker:
Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System. 579-583 - Qiang Huang, Thomas Hain:
Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention. 584-588 - Jazmín Vidal, Luciana Ferrer, Leonardo Brambilla:
EpaDB: A Database for Development of Pronunciation Assessment Systems. 589-593 - Katrin Angerbauer, Heike Adel, Ngoc Thang Vu:
Automatic Compression of Subtitles with Neural Networks and its Effect on User Experience. 594-598 - Hongyin Luo, Mitra Mohtarami, James R. Glass, Karthik Krishnamurthy, Brigitte Richardson:
Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering. 599-603
Speech and Audio Characterization and Segmentation
- Sarah E. Gutz, Jun Wang, Yana Yunusova, Jordan R. Green:
Early Identification of Speech Changes Due to Amyotrophic Lateral Sclerosis Using Machine Classification. 604-608 - Mohamed Ismail Yasar Arafath K, Aurobinda Routray:
Automatic Detection of Breath Using Voice Activity Detection and SVM Classifier with Application on News Reports. 609-613 - Hee-Soo Heo, Jee-weon Jung, Hye-jin Shim, Ha-Jin Yu:
Acoustic Scene Classification Using Teacher-Student Learning with Soft-Labels. 614-618 - Yanping Chen, Hongxia Jin:
Rare Sound Event Detection Using Deep Learning and Data Augmentation. 619-623 - Bidisha Sharma, Haizhou Li:
A Combination of Model-Based and Feature-Based Strategy for Speech-to-Singing Alignment. 624-628 - Yosi Shrem, Matthew Goldrick, Joseph Keshet:
Dr.VOT: Measuring Positive and Negative Voice Onset Time in the Wild. 629-633 - Jun Hui, Yue Wei, Shutao Chen, Richard Hau Yue So:
Effects of Base-Frequency and Spectral Envelope on Deep-Learning Speech Separation and Recognition Models. 634-638 - Nirmesh J. Shah, Hemant A. Patil:
Phone Aware Nearest Neighbor Technique Using Spectral Transition Measure for Non-Parallel Voice Conversion. 639-643 - Ravi Shankar, Archana Venkataraman:
Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification. 644-648 - Lukás Mateju, Petr Cerva, Jindrich Zdánský:
An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs. 649-653 - Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha:
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks. 654-658
Neural Techniques for Voice Conversion and Waveform Generation
- Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou:
Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks. 659-663 - Ju-Chieh Chou, Hung-yi Lee:
One-Shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization. 664-668 - Hui Lu, Zhiyong Wu, Dongyang Dai, Runnan Li, Shiyin Kang, Jia Jia, Helen Meng:
One-Shot Voice Conversion with Global Speaker Embeddings. 669-673 - Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder. 674-678 - Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo:
StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion. 679-683 - Yusuke Kurita, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
Robustness of Statistical Voice Conversion Based on Direct Waveform Modification Against Background Sounds. 684-688 - Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma:
Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks. 689-693 - Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku:
GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram. 694-698 - Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim:
Probability Density Distillation with Generative Adversarial Networks for High-Quality Parallel Waveform Generation. 699-703 - Seyed Hamidreza Mohammadi, Taehwan Kim:
One-Shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams. 704-708 - Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang:
Investigation of F0 Conditioning and Fully Convolutional Networks in Variational Autoencoder Based Voice Conversion. 709-713 - Songxiang Liu, Yuewen Cao, Xixin Wu, Lifa Sun, Xunying Liu, Helen Meng:
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams. 714-718 - Li-Wei Chen, Hung-yi Lee, Yu Tsao:
Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech. 719-723 - Shaojin Ding, Ricardo Gutierrez-Osuna:
Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion. 724-728 - Cory Stephenson, Gokce Keskin, Anil Thomas, Oguz H. Elibol:
Semi-Supervised Voice Conversion with Amortized Variational Inference. 729-733
Model Adaptation for ASR
- Subhadeep Dey, Petr Motlícek, Trung Bui, Franck Dernoncourt:
Exploiting Semi-Supervised Training Through a Dropout Regularization in End-to-End Speech Recognition. 734-738 - Chanwoo Kim, Minkyu Shin, Abhinav Garg, Dhananjaya Gowda:
Improved Vocal Tract Length Perturbation for a State-of-the-Art End-to-End Speech Recognition System. 739-743 - Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan:
Multi-Accent Adaptation Based on Gate Mechanism. 744-748 - Pengcheng Guo, Sining Sun, Lei Xie:
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition. 749-753 - Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney:
Cumulative Adaptation for BLSTM Acoustic Models. 754-758 - Xurong Xie, Xunying Liu, Tan Lee, Lan Wang:
Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features. 759-763 - Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, Toshiyuki Kumakura:
End-to-End Adaptation with Backpropagation Through WFST for On-Device Speech Recognition System. 764-768 - Leda Sari, Samuel Thomas, Mark A. Hasegawa-Johnson:
Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks. 769-773 - Khe Chai Sim, Petr Zadrazil, Françoise Beaufays:
An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models. 774-778 - Abhinav Jain, Vishwanath P. Singh, Shakti P. Rath:
A Multi-Accent Acoustic Model Using Mixture of Experts for Speech Recognition. 779-783 - Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael P. Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, Avinatan Hassidim, Yossi Matias:
Personalizing ASR for Dysarthric and Accented Speech with Limited Data. 784-788
Dialogue Speech Understanding
- Denis Peskov, Joe Barrow, Pedro Rodriguez, Graham Neubig, Jordan L. Boyd-Graber:
Mitigating Noisy Inputs for Question Answering. 789-793 - Rahul Gupta, Aman Alok, Shankar Ananthakrishnan:
One-vs-All Models for Asynchronous Training: An Empirical Analysis. 794-798 - Gabriel Marzinotto, Géraldine Damnati, Frédéric Béchet:
Adapting a FrameNet Semantic Parser for Spoken Language Understanding Using Adversarial Learning. 799-803 - Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès:
M2H-GAN: A GAN-Based Mapping from Machine to Human Transcripts for Speech Understanding. 804-808 - Munir Georges, Krzysztof Czarnowski, Tobias Bocklet:
Ultra-Compact NLU: Neuronal Network Binarization as Regularization. 809-813 - Loren Lugosch, Mirco Ravanelli, Patrick Ignoto, Vikrant Singh Tomar, Yoshua Bengio:
Speech Model Pre-Training for End-to-End Spoken Language Understanding. 814-818 - Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis G. Georgiou:
Spoken Language Intent Detection Using Confusion2Vec. 819-823 - Natalia A. Tomashenko, Antoine Caubrière, Yannick Estève:
Investigating Adaptation and Transfer Learning for End-to-End Spoken Language Understanding from Speech. 824-828 - Yuanfeng Song, Di Jiang, Xueyang Wu, Qian Xu, Raymond Chi-Wing Wong, Qiang Yang:
Topic-Aware Dialogue Speech Recognition with Transfer Learning. 829-833 - Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hosana Kamiyama, Takanobu Oba, Satoshi Kobashikawa, Yushi Aono:
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models. 834-838 - Jen-Tzung Chien, Wei Xiang Lieow:
Meta Learning for Hyperparameter Optimization in Dialogue System. 839-843 - Kyle Williams:
Zero Shot Intent Classification Using Long-Short Term Memory Networks. 844-848 - Mandy Korpusik, Zoe Liu, James R. Glass:
A Comparison of Deep Learning Methods for Language Understanding. 849-853 - Yuka Kobayashi, Takami Yoshida, Kenji Iwata, Hiroshi Fujimura:
Slot Filling with Weighted Multi-Encoders for Out-of-Domain Values. 854-858
Speech Production and Silent Interfaces
- Nadee Seneviratne, Ganesh Sivaraman, Carol Y. Espy-Wilson:
Multi-Corpus Acoustic-to-Articulatory Speech Inversion. 859-863 - Debadatta Dash, Alan Wisler, Paul Ferrari, Jun Wang:
Towards a Speaker Independent Speech-BCI Using Speaker Adaptation. 864-868 - Janaki Sheth, Ariel Tankus, Michelle Tran, Lindy Comstock, Itzhak Fried, William Speier:
Identifying Input Features for Development of Real-Time Translation of Neural Signals to Text. 869-873 - Samuel S. Silva, António J. S. Teixeira, Conceição Cunha, Nuno Almeida, Arun A. Joseph, Jens Frahm:
Exploring Critical Articulator Identification from 50Hz RT-MRI Data of the Vocal Tract. 874-878 - Ioannis K. Douros, Anastasiia Tsukanova, Karyna Isaieva, Pierre-André Vuissoz, Yves Laprie:
Towards a Method of Dynamic Vocal Tract Shapes Generation by Combining Static 3D and Dynamic 2D MRI Speech Data. 879-883 - Oksana Rasskazova, Christine Mooshammer, Susanne Fuchs:
Temporal Coordination of Articulatory and Respiratory Events Prior to Speech Initiation. 884-888 - Michele Gubian, Manfred Pastätter, Marianne Pouplier:
Zooming in on Spatiotemporal V-to-C Coarticulation with Functional PCA. 889-893 - Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth, Alexandra Markó:
Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder. 894-898 - Eugen Klein, Jana Brunner, Phil Hoole:
Assessing Acoustic and Articulatory Dimensions of Speech Motor Adaptation with Random Forests. 899-903 - Hironori Takemoto, Tsubasa Goto, Yuya Hagihara, Sayaka Hamanaka, Tatsuya Kitamura, Yukiko Nota, Kikuo Maekawa:
Speech Organ Contour Extraction Using Real-Time MRI and Machine Learning Method. 904-908 - K. G. van Leeuwen, P. Bos, Stefano Trebeschi, Maarten J. A. van Alphen, Luuk Voskuilen, Ludi E. Smeele, Ferdi van der Heijden, R. J. J. H. van Son:
CNN-Based Phoneme Classifier from Vocal Tract MRI Learns Embedding Consistent with Articulatory Topology. 909-913 - Doris Mücke, Anne Hermes, Sam Tilsen:
Strength and Structure: Coupling Tones with Oral Constriction Gestures. 914-918
Speech Signal Characterization 2
- W. Bastiaan Kleijn, Felicia S. C. Lim, Michael Chinen, Jan Skoglund:
Salient Speech Representations Based on Cloned Networks. 919-923 - Manoj Kumar Ramanathi, Chiranjeevi Yarra, Prasanta Kumar Ghosh:
ASR Inspired Syllable Stress Detection for Pronunciation Evaluation Without Using a Supervised Classifier and Syllable Level Features. 924-928 - Renuka Mannem, Jhansi Mallela, Aravind Illa, Prasanta Kumar Ghosh:
Acoustic and Articulatory Feature Based Speech Rate Estimation Using a Convolutional Dense Neural Network. 929-933 - Sebastian Springenberg, Egor Lakomkin, Cornelius Weber, Stefan Wermter:
Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics. 934-938 - Georgios Paraskevopoulos, Efthymios Tzinis, Nikolaos Ellinas, Theodoros Giannakopoulos, Alexandros Potamianos:
Unsupervised Low-Rank Representations for Speech Emotion Recognition. 939-943 - Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula:
On the Suitability of the Riesz Spectro-Temporal Envelope for WaveNet Based Speech Synthesis. 944-948 - Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang, Li Zhao, Björn W. Schuller:
Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition. 949-953 - Sweekar Sudhakara, Manoj Kumar Ramanathi, Chiranjeevi Yarra, Prasanta Kumar Ghosh:
An Improved Goodness of Pronunciation (GoP) Measure for Pronunciation Evaluation with DNN-HMM System Considering HMM Transition Probabilities. 954-958 - Atreyee Saha, Chiranjeevi Yarra, Prasanta Kumar Ghosh:
Low Resource Automatic Intonation Classification Using Gated Recurrent Unit (GRU) Networks Pre-Trained with Synthesized Pitch Patterns. 959-963
Applications in Language Learning and Healthcare
- Juan Camilo Vásquez-Correa, Tomás Arias-Vergara, Philipp Klumpp, M. Strauss, Arne Küderle, Nils Roth, Sebastian P. Bayerl, Nicanor García-Ospina, Paula Andrea Pérez-Toro, L. Felipe Parra-Gallego, Cristian David Ríos-Urrego, Daniel Escobar-Grisales, Juan Rafael Orozco-Arroyave, Björn M. Eskofier, Elmar Nöth:
Apkinson: A Mobile Solution for Multimodal Assessment of Patients with Parkinson's Disease. 964-965 - Gábor Kiss, Dávid Sztahó, Klára Vicsi:
Depression State Assessment: Application for Detection of Depression by Speech. 966-967 - Chiranjeevi Yarra, Aparna Srinivasan, Sravani Gottimukkala, Prasanta Kumar Ghosh:
SPIRE-fluent: A Self-Learning App for Tutoring Oral Fluency to Second Language English Learners. 968-969 - Shawn L. Nissen, Rebecca Nissen:
Using Real-Time Visual Biofeedback for Second Language Instruction. 970-971 - Avin Miwardelli, Ian Gallagher, Jenny Gibson, Napoleon Katsos, Kate M. Knill, Helena Wood:
Splash: Speech and Language Assessment in Schools and Homes. 972-973 - Colin T. Annand, Maurice Lamb, Sarah Dugan, Sarah R. Li, Hannah M. Woeste, T. Douglas Mast, Michael A. Riley, Jack A. Masterson, Neeraja Mahalingam, Kathryn J. Eary, Caroline Spencer, Suzanne Boyce, Stephanie Jackson, Anoosha Baxi, Reneé Seward:
Using Ultrasound Imaging to Create Augmented Visual Biofeedback for Articulatory Practice. 974-975 - Vasiliy Radostev, Serge Berger, Justin Tabrizi, Pasha Kamyshev, Hisami Suzuki:
Speech-Based Web Navigation for Limited Mobility Users. 976-977
Keynote 2: Tanja Schultz
- Tanja Schultz:
Biosignal Processing for Human-Machine Interaction.
The Second DIHARD Speech Diarization Challenge (DIHARD II)
- Neville Ryant, Kenneth Church, Christopher Cieri, Alejandrina Cristià, Jun Du, Sriram Ganapathy, Mark Liberman:
The Second DIHARD Diarization Challenge: Dataset, Task, and Baselines. 978-982 - Prachi Singh, Harsha Vardhan, Sriram Ganapathy, Ahilan Kanagasundaram:
LEAP Diarization System for the Second DIHARD Challenge. 983-987 - Ignacio Viñals, Pablo Gimeno, Alfonso Ortega Giménez, Antonio Miguel, Eduardo Lleida:
ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge. 988-992 - Zbynek Zajíc, Marie Kunesová, Marek Hrúz, Jan Vanek:
UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge. 993-997 - Tae Jin Park, Manoj Kumar, Nikolaos Flemotomos, Monisankha Pal, Raghuveer Peri, Rimita Lahiri, Panayiotis G. Georgiou, Shrikanth Narayanan:
The Second DIHARD Challenge: System Description for USC-SAIL Team. 998-1002 - Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Anastasia Avdeeva, Artem Gorlanov, Alexandr Kozlov:
Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II. 1003-1007
The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge - O
- Massimiliano Todisco, Xin Wang, Ville Vestman, Md. Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Tomi H. Kinnunen, Kong Aik Lee:
ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection. 1008-1012
The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge - P
- Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak:
ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual Networks. 1013-1017 - Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm:
Ensemble Models for Spoofing Detection in Automatic Speaker Verification. 1018-1022 - Weicheng Cai, Haiwei Wu, Danwei Cai, Ming Li:
The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion. 1023-1027 - Radoslaw Bialobrzeski, Michal Kosmider, Mateusz Matuszewski, Marcin Plata, Alexander Rakowski:
Robust Bayesian and Light Neural Networks for Voice Spoofing Detection. 1028-1032 - Galina Lavrentyeva, Sergey Novoselov, Andzhukaev Tseren, Marina Volkova, Artem Gorlanov, Alexandr Kozlov:
STC Antispoofing Systems for the ASVspoof2019 Challenge. 1033-1037 - Yexin Yang, Hongji Wang, Heinrich Dinkel, Zhengyang Chen, Shuai Wang, Yanmin Qian, Kai Yu:
The SJTU Robust Anti-Spoofing System for the ASVspoof 2019 Challenge. 1038-1042 - K. N. R. K. Raju Alluri, Anil Kumar Vuppala:
IIIT-H Spoofing Countermeasures for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2019. 1043-1047 - Rongjin Li, Miao Zhao, Zheng Li, Lin Li, Qingyang Hong:
Anti-Spoofing Speaker Verification System with Multi-Feature Integration and Multi-Task Learning. 1048-1052 - Jennifer Williams, Joanna Rownicka:
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features. 1053-1057 - Rohan Kumar Das, Jichen Yang, Haizhou Li:
Long Range Acoustic Features for Spoofed Speech Detection. 1058-1062 - Su-Yu Chang, Kai-Cheng Wu, Chia-Ping Chen:
Transfer-Representation Learning for Detecting Spoofing Attacks with Converted and Synthesized Speech in Automatic Speaker Verification System. 1063-1067 - Alejandro Gómez Alanís, Antonio M. Peinado, José A. González, Angel M. Gomez:
A Light Convolutional GRU-RNN Deep Feature Extractor for ASV Spoofing Detection. 1068-1072 - Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukás Burget, Jan Cernocký:
Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge. 1073-1077 - Moustafa Alzantot, Ziqi Wang, Mani B. Srivastava:
Deep Residual Neural Networks for Audio Spoofing Detection. 1078-1082 - Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu:
Replay Attack Detection with Complementary High-Resolution Information Using End-to-End DNN for the ASVspoof 2019 Challenge. 1083-1087
The Zero Resource Speech Challenge 2019: TTS Without T
- Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux:
The Zero Resource Speech Challenge 2019: TTS Without T. 1088-1092 - Siyuan Feng, Tan Lee, Zhiyuan Peng:
Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling. 1093-1097 - Bolaji Yusuf, Alican Gök, Batuhan Gündogdu, Oyku Deniz Kose, Murat Saraclar:
Temporally-Aware Acoustic Unit Discovery for Zerospeech 2019 Challenge. 1098-1102 - Ryan Eloff, André Nortje, Benjamin van Niekerk, Avashna Govender, Leanne Nortje, Arnu Pretorius, Elan Van Biljon, Ewald van der Westhuizen, Lisa van Staden, Herman Kamper:
Unsupervised Acoustic Unit Discovery for Speech Synthesis Using Discrete Latent-Variable Neural Networks. 1103-1107 - Andy T. Liu, Po-Chun Hsu, Hung-yi Lee:
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion. 1108-1112 - Karthik Pandia D. S, Hema A. Murthy:
Zero Resource Speech Synthesis Using Transcripts Derived from Perceptual Acoustic Units. 1113-1117 - Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura:
VQVAE Unsupervised Unit Discovery and Multi-Scale Code2Spec Inverter for Zerospeech Challenge 2019. 1118-1122
Speech Translation
- Jan Niehues:
Survey Talk: A Survey on Speech Translation. - Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu:
Direct Speech-to-Speech Translation with a Sequence-to-Sequence Model. 1123-1127 - Yuchen Liu, Hao Xiong, Jiajun Zhang, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong:
End-to-End Speech Translation with Knowledge Distillation. 1128-1132 - Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi:
Adapting Transformer to End-to-End Spoken Language Translation. 1133-1137 - Steven Hillis, Anushree Prasanna Kumar, Alan W. Black:
Unsupervised Phonetic and Word Level Discovery for Speech to Speech Translation for Unwritten Languages. 1138-1142
Speaker Recognition 1
- Gautam Bhattacharya, Md. Jahangir Alam, Patrick Kenny:
Deep Speaker Recognition: Modular or Monolithic? 1143-1147 - Shuai Wang, Johan Rohdin, Lukás Burget, Oldrich Plchot, Yanmin Qian, Kai Yu, Jan Cernocký:
On the Usage of Phonetic Information for Text-Independent Speaker Embedding Extraction. 1148-1152 - Mirco Ravanelli, Yoshua Bengio:
Learning Speaker Representations with Mutual Information. 1153-1157 - Lanhua You, Wu Guo, Li-Rong Dai, Jun Du:
Multi-Task Learning with High-Order Statistics for x-Vector Based Text-Independent Speaker Verification. 1158-1162 - Zhanghao Wu, Shuai Wang, Yanmin Qian, Kai Yu:
Data Augmentation Using Variational Autoencoder for Embedding Based Speaker Verification. 1163-1167 - Lanhua You, Wu Guo, Li-Rong Dai, Jun Du:
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification. 1168-1172
Dialogue Understanding
- Riyaz Ahmad Bhat, John Chen, Rashmi Prasad, Srinivas Bangalore:
Neural Transition Systems for Modeling Hierarchical Semantic Representations. 1173-1177 - Vedran Vukotic, Christian Raymond:
Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding. 1178-1182 - Avik Ray, Yilin Shen, Hongxia Jin:
Iterative Delexicalization for Improved Spoken Language Understanding. 1183-1187 - Swapnil Bhosale, Imran A. Sheikh, Sri Harsha Dumpala, Sunil Kumar Kopparapu:
End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios. 1188-1192 - Hiroaki Takatsu, Katsuya Yokoyama, Yoichi Matsuyama, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi:
Recognition of Intentions of Users' Short Responses for Conversational News Delivery System. 1193-1197 - Antoine Caubrière, Natalia A. Tomashenko, Antoine Laurent, Emmanuel Morin, Nathalie Camelin, Yannick Estève:
Curriculum-Based Transfer Learning for an Effective End-to-End Spoken Language Understanding and Domain Portability. 1198-1202
Speech in the Brain
- Debadatta Dash, Paul Ferrari, Jun Wang:
Spatial and Spectral Fingerprint in the Brain: Speaker Identification from Single Trial MEG Signals. 1203-1207 - Annika Nijveld, Louis ten Bosch, Mirjam Ernestus:
ERP Signal Analysis with Temporal Resolution Using a Time Window Bank. 1208-1212 - Louis ten Bosch, Kimberley Mulder, Louis Boves:
Phase Synchronization Between EEG Signals as a Function of Differences Between Stimuli Characteristics. 1213-1217 - Mariya Kharaman, Manluolan Xu, Carsten Eulitz, Bettina Braun:
The Processing of Prosodic Cues to Rhetorical Question Interpretation: Psycholinguistic and Neurolinguistics Evidence. 1218-1222 - Odette Scharenborg, Jiska Koemans, Cybelle Smith, Mark A. Hasegawa-Johnson, Kara D. Federmeier:
The Neural Correlates Underlying Lexically-Guided Perceptual Learning. 1223-1227 - Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura:
Speech Quality Evaluation of Synthesized Japanese Speech Using EEG. 1228-1232
Far-Field Speech Recognition
- Yiteng Huang, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Li Wan:
Multi-Microphone Adaptive Noise Cancellation for Robust Hotword Detection. 1233-1237 - Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma:
Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition. 1238-1242 - Yuri Y. Khokhlov, Alexander Zatvornitskiy, Ivan Medennikov, Ivan Sorokin, Tatiana Prisyach, Aleksei Romanenko, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Mariya Korenevskaya, Oleg Petrov:
R-Vectors: New Technique for Adaptation to Room Acoustics. 1243-1247 - Naoyuki Kanda, Christoph Böddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach:
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR. 1248-1252 - Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach:
Unsupervised Training of Neural Mask-Based Beamforming. 1253-1257 - Feng Ma, Li Chai, Jun Du, Diyuan Liu, Zhongfu Ye, Chin-Hui Lee:
Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge. 1258-1262
Speaker and Language Recognition 1
- Ming Li, Weicheng Cai, Danwei Cai:
Survey Talk: End-to-End Deep Neural Network Based Speaker and Language Recognition. - Bharat Padi, Anand Mohan, Sriram Ganapathy:
Attention Based Hybrid i-Vector BLSTM Model for Language Recognition. 1263-1267 - Jee-weon Jung, Hee-Soo Heo, Ju-ho Kim, Hye-jin Shim, Ha-Jin Yu:
RawNet: Advanced End-to-End Deep Neural Network Using Raw Waveforms for Text-Independent Speaker Verification. 1268-1272 - Wei Rao, Chenglin Xu, Eng Siong Chng, Haizhou Li:
Target Speaker Extraction for Multi-Talker Speaker Verification. 1273-1277 - Hanna Mazzawi, Xavi Gonzalvo, Aleks Kracun, Prashant Sridhar, Niranjan Subrahmanya, Ignacio López-Moreno, Hyun-Jin Park, Patrick Violette:
Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale. 1278-1282
Speech Synthesis: Towards End-to-End
- Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jianhua Tao:
Forward-Backward Decoding for Regularizing End-to-End TTS. 1283-1287 - Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
A New GAN-Based End-to-End TTS Training Algorithm. 1288-1292 - Mutian He, Yan Deng, Lei He:
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS. 1293-1297 - Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi:
Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet. 1298-1302 - Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa:
Training Multi-Speaker Neural Text-to-Speech Systems Using Speaker-Imbalanced Speech Corpora. 1303-1307 - Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Real-Time Neural Text-to-Speech with Sequence-to-Sequence Acoustic Model and WaveGlow or Single Gaussian WaveRNN Vocoders. 1308-1312
Semantic Analysis and Classification
- Sushant Kafle, Cecilia Ovesdotter Alm, Matt Huenerfauth:
Fusion Strategy for Prosodic and Lexical Representations of Word Importance. 1313-1317 - Jen-Tzung Chien, Chun-Wei Wang:
Self Attention in Variational Sequential Learning for Summarization. 1318-1322 - Zhongkai Sun, Prathusha Kameswara Sarma, William A. Sethares, Erik P. Bucy:
Multi-Modal Sentiment Analysis Using Deep Canonical Correlation Analysis. 1323-1327 - Yilin Shen, Wenhu Chen, Hongxia Jin:
Interpreting and Improving Deep Neural SLU Models via Vocabulary Importance. 1328-1332 - Máté Ákos Tündik, Valér Kaszás, György Szaszák:
Assessing the Semantic Space Bias Caused by ASR Error Propagation and its Effect on Spoken Document Summarization. 1333-1337 - Peisong Huang, Peijie Huang, Wencheng Ai, Jiande Ding, Jinchuan Zhang:
Latent Topic Attention for Domain Classification. 1338-1342
Speech and Audio Source Separation and Scene Analysis 1
- Chaitanya Narisetty:
A Unified Bayesian Source Modelling for Determined Blind Source Separation. 1343-1347 - Naoya Takahashi, Sudarsanam Parthasaarathy, Nabarun Goswami, Yuki Mitsufuji:
Recursive Speech Separation for Unknown Number of Speakers. 1348-1352 - Pieter Appeltans, Jeroen Zegers, Hugo Van hamme:
Practical Applicability of Deep Neural Networks for Overlapping Speaker Separation. 1353-1357 - Zhaoyi Gu, Jing Lu, Kai Chen:
Speech Separation Using Independent Vector Analysis with an Amplitude Variable Gaussian Mixture Model. 1358-1362 - Gene-Ping Yang, Chao-I Tuan, Hung-yi Lee, Lin-Shan Lee:
Improved Speech Separation with Time-and-Frequency Cross-Domain Joint Embedding and Clustering. 1363-1367 - Gordon Wichern, Joe Antognini, Michael Flynn, Licheng Richard Zhu, Emmett McQuinn, Dwight Crow, Ethan Manilow, Jonathan Le Roux:
WHAM!: Extending Speech Separation to Noisy Environments. 1368-1372
Speech Intelligibility
- Andreas Nautsch:
Survey Talk: Preserving Privacy in Speaker and Speech Characterisation. - Carol Chermaz, Cassia Valentini-Botinhao, Henning F. Schepker, Simon King:
Evaluating Near End Listening Enhancement Algorithms in Realistic Environments. 1373-1377 - Amin Edraki, Wai-Yip Chan, Jesper Jensen, Daniel Fogerty:
Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation. 1378-1382 - Zhuohuang Zhang, Yi Shen:
Listener Preference on the Local Criterion for Ideal Binary-Masked Speech. 1383-1387 - Tuan Dinh, Alexander Kain, Kris Tjaden:
Using a Manifold Vocoder for Spectral Voice and Style Conversion. 1388-1392
ASR Neural Network Architectures 1
- Patrick von Platen, Chao Zhang, Philip C. Woodland:
Multi-Span Acoustic Modelling Using Raw Waveform Signals. 1393-1397 - André Merboldt, Albert Zeyer, Ralf Schlüter, Hermann Ney:
An Analysis of Local Monotonic Attention Variants. 1398-1402 - Eric Sun, Jinyu Li, Yifan Gong:
Layer Trajectory BLSTM. 1403-1407 - Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration. 1408-1412 - Shucong Zhang, Erfan Loweimi, Yumo Xu, Peter Bell, Steve Renals:
Trainable Dynamic Subsampling for End-to-End Speech Recognition. 1413-1417 - Ding Zhao, Tara N. Sainath, David Rybach, Pat Rondon, Deepti Bhatia, Bo Li, Ruoming Pang:
Shallow-Fusion End-to-End Contextual Biasing. 1418-1422
Speech and Language Analytics for Mental Health
- Md. Nasir, Sandeep Nallan Chakravarthula, Brian R. W. Baucom, David C. Atkins, Panayiotis G. Georgiou, Shrikanth Narayanan:
Modeling Interpersonal Linguistic Coordination in Conversations Using Word Mover's Distance. 1423-1427 - Wenchao Du, Louis-Philippe Morency, Jeffrey F. Cohn, Alan W. Black:
Bag-of-Acoustic-Words for Mental Health Assessment: A Deep Autoencoding Approach. 1428-1432 - Rohit Voleti, Stephanie Woolridge, Julie M. Liss, Melissa Milanovic, Christopher R. Bowie, Visar Berisha:
Objective Assessment of Social Skills Using Automated Language Analysis for Identification of Schizophrenia and Bipolar Disorder. 1433-1437 - Katie Matton, Melvin G. McInnis, Emily Mower Provost:
Into the Wild: Transitioning from Recognizing Mood in Clinical Interactions to Personal Conversations for Individuals with Bipolar Disorder. 1438-1442 - Morteza Rohanian, Julian Hough, Matthew Purver:
Detecting Depression with Word-Level Multimodal Fusion. 1443-1447 - Carol Y. Espy-Wilson, Adam C. Lammert, Nadee Seneviratne, Thomas F. Quatieri:
Assessing Neuromotor Coordination in Depression Using Inverted Vocal Tract Variables. 1448-1452
Dialogue Modelling
- Shachi Paul, Rahul Goel, Dilek Hakkani-Tür:
Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues. 1453-1457 - Rahul Goel, Shachi Paul, Dilek Hakkani-Tür:
HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking. 1458-1462 - Jirí Martínek, Pavel Král, Ladislav Lenc, Christophe Cerisara:
Multi-Lingual Dialogue Act Recognition with Deep Learning Methods. 1463-1467 - Guan-Lin Chao, Ian R. Lane:
BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer. 1468-1472 - David Griol, Zoraida Callejas:
Discovering Dialog Rules by Means of an Evolutionary Approach. 1473-1477 - Xi C. Chen, Adithya Sagar, Justine T. Kao, Tony Y. Li, Christopher Klein, Stephen Pulman, Ashish Garg, Jason D. Williams:
Active Learning for Domain Classification in a Commercial Spoken Personal Assistant. 1478-1482
Speaker Recognition Evaluation
- Seyed Omid Sadjadi, Craig S. Greenberg, Elliot Singer, Douglas A. Reynolds, Lisa P. Mason, Jaime Hernandez-Cordero:
The 2018 NIST Speaker Recognition Evaluation. 1483-1487 - Jesús Villalba, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Fred Richardson, Suwon Shon, François Grondin, Réda Dehak, Leibny Paola García-Perera, Daniel Povey, Pedro A. Torres-Carrasquillo, Sanjeev Khudanpur, Najim Dehak:
State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18. 1488-1492 - Daniel Garcia-Romero, David Snyder, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur:
x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition. 1493-1496 - Kong Aik Lee, Ville Hautamäki, Tomi H. Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado, Massimiliano Todisco:
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences. 1497-1501 - Elie Khoury, Khaled Lakhdhar, Andrew Vaughan, Ganesh Sivaraman, Parav Nagarsheth:
Pindrop Labs' Submission to the First Multi-Target Speaker Detection and Identification Challenge. 1502-1505 - Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur:
Speaker Recognition Benchmark Using the CHiME-5 Corpus. 1506-1510
Speech Synthesis: Data and Evaluation
- David Ayllón, Héctor A. Sánchez-Hevia, Carol Figueroa, Pierre Lanchantin:
Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems. 1511-1515 - Fang-Yu Kuo, Iris Chuoying Ouyang, Sandesh Aryal, Pierre Lanchantin:
Selection and Training Schemes for Improving TTS Voice Built on Found Data. 1516-1520 - David A. Braude, Matthew P. Aylett, Caoimhín Laoide-Kemp, Simone Ashby, Kristen M. Scott, Brian Ó Raghallaigh, Anna Braudo, Alex Brouwer, Adriana Stan:
All Together Now: The Living Audio Dataset. 1521-1525 - Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu:
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech. 1526-1530 - Meysam Shamsi, Damien Lolive, Nelly Barbot, Jonathan Chevelu:
Corpus Design Using Convolutional Auto-Encoder Embeddings for Audio-Book Synthesis. 1531-1535 - Nobukatsu Hojo, Noboru Miyazaki:
Evaluating Intention Communication by TTS Using Explicit Definitions of Illocutionary Act Performance. 1536-1540 - Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang:
MOSNet: Deep Learning-Based Objective Assessment for Voice Conversion. 1541-1545 - Jason Fong, Pilar Oplustil Gallegos, Zack Hodari, Simon King:
Investigating the Robustness of Sequence-to-Sequence Text-to-Speech Models to Imperfectly-Transcribed Training Data. 1546-1550 - Avashna Govender, Anita E. Wagner, Simon King:
Using Pupil Dilation to Measure Cognitive Load When Listening to Text-to-Speech in Quiet and in Noise. 1551-1555 - Ioannis K. Douros, Jacques Felblinger, Jens Frahm, Karyna Isaieva, Arun A. Joseph, Yves Laprie, Freddy Odille, Anastasiia Tsukanova, Dirk Voit, Pierre-André Vuissoz:
A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research. 1556-1560 - Jia-Xiang Chen, Zhen-Hua Ling, Li-Rong Dai:
A Chinese Dataset for Identifying Speakers in Novels. 1561-1565 - Kyubyong Park, Thomas Mulc:
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages. 1566-1570
Model Training for ASR
- Ievgen Karaulov, Dmytro Tkanov:
Attention Model for Articulatory Features Detection. 1571-1575 - Sibo Tong, Apoorv Vyas, Philip N. Garner, Hervé Bourlard:
Unbiased Semi-Supervised LF-MMI Training Using Dropout. 1576-1580 - Xiaodong Cui, Michael Picheny:
Acoustic Model Optimization Based on Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition. 1581-1585 - Nirmesh J. Shah, Hardik B. Sailor, Hemant A. Patil:
Whether to Pretrain DNN or not?: An Empirical Analysis for Voice Conversion. 1586-1590 - Mohit Goyal, Varun Srivastava, Prathosh A. P.:
Detection of Glottal Closure Instants from Raw Speech Using Convolutional Neural Networks. 1591-1595 - Joachim Fainberg, Ondrej Klejch, Steve Renals, Peter Bell:
Lattice-Based Lightly-Supervised Acoustic Model Training. 1596-1600 - Wilfried Michel, Ralf Schlüter, Hermann Ney:
Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR. 1601-1605 - Ryo Masumura, Hiroshi Sato, Tomohiro Tanaka, Takafumi Moriya, Yusuke Ijima, Takanobu Oba:
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders. 1606-1610 - Abdelwahab Heba, Thomas Pellegrini, Jean-Pierre Lorré, Régine André-Obrecht:
Char+CV-CTC: Combining Graphemes and Consonant/Vowel Units for CTC-Based ASR Using Multitask Learning. 1611-1615 - Gakuto Kurata, Kartik Audhkhasi:
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation. 1616-1620 - Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata:
Direct Neuron-Wise Fusion of Cognate Neural Networks. 1621-1625 - Pranav Ladkat, Oleg Rybakov, Radhika Arava, Sree Hari Krishnan Parthasarathi, I-Fan Chen, Nikko Strom:
Two Tiered Distributed Training Algorithm for Acoustic Modeling. 1626-1630 - Pin-Tuan Huang, Hung-Shin Lee, Syu-Siang Wang, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang:
Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR. 1631-1635 - Gakuto Kurata, Kartik Audhkhasi:
Multi-Task CTC Training with Auxiliary Feature Reconstruction for End-to-End Speech Recognition. 1636-1640 - Mohan Li, Yuanjiang Cao, Weicong Zhou, Min Liu:
Framewise Supervised Training Towards End-to-End Speech Recognition Models: First Results. 1641-1645
Network Architectures for Emotion and Paralinguistics Recognition
- Efthymios Georgiou, Charilaos Papaioannou, Alexandros Potamianos:
Deep Hierarchical Fusion with Application in Sentiment Analysis. 1646-1650 - Vikramjit Mitra, Sue Booker, Erik Marchi, David Scott Farrar, Ute Dorothea Peitz, Bridget Cheng, Ermine Teves, Anuj Mehta, Devang Naik:
Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice. 1651-1655 - Jack Parry, Dimitri Palaz, Georgia Clarke, Pauline Lecomte, Rebecca Mead, Michael Berger, Gregor Hofer:
Analysis of Deep Learning Architectures for Cross-Corpus Speech Emotion Recognition. 1656-1660 - Bo Wang, Maria Liakata, Hao Ni, Terry J. Lyons, Alejo J. Nevado-Holgado, Kate Saunders:
A Path Signature Approach for Speech Emotion Recognition. 1661-1665 - Olga Egorow, Tarik Mrech, Norman Weißkirchen, Andreas Wendemuth:
Employing Bottleneck and Convolutional Features for Speech-Based Physical Load Detection on Limited Data Amounts. 1666-1670 - Jinming Zhao, Shizhe Chen, Jingjun Liang, Qin Jin:
Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling. 1671-1675 - Shun-Chang Zhong, Yun-Shao Lin, Chun-Min Chang, Yi-Ching Liu, Chi-Chun Lee:
Predicting Group Performances Using a Personality Composite-Network Architecture During Collaborative Task. 1676-1680 - Gao-Yi Chao, Yun-Shao Lin, Chun-Min Chang, Chi-Chun Lee:
Enforcing Semantic Consistency for Cross Corpus Valence Regression from Speech Using Adversarial Discrepancy Learning. 1681-1685 - Shuiyang Mao, P. C. Ching, Tan Lee:
Deep Learning of Segment-Level Feature Representation with Multiple Instance Learning for Utterance-Level Speech Emotion Recognition. 1686-1690 - Andreas Triantafyllopoulos, Gil Keren, Johannes Wagner, Ingmar Steiner, Björn W. Schuller:
Towards Robust Speech Emotion Recognition Using Deep Residual Networks for Speech Enhancement. 1691-1695 - Zhixuan Li, Liang He, Jingyang Li, Li Wang, Wei-Qiang Zhang:
Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition. 1696-1700 - Md Asif Jalal, Erfan Loweimi, Roger K. Moore, Thomas Hain:
Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition. 1701-1705
Acoustic Phonetics
- Sonia D'Apolito, Barbara Gili Fivela:
L2 Pronunciation Accuracy and Context: A Pilot Study on the Realization of Geminates in Italian as L2 by French Learners. 1706-1710 - Nisad Jamakovic, Robert Fuchs:
The Monophthongs of Formal Nigerian English: An Acoustic Analysis. 1711-1715 - Pablo Arantes, Anders Eriksson:
Quantifying Fundamental Frequency Modulation as a Function of Language, Speaking Style and Speaker. 1716-1720 - Niamh E. Kelly, Lara Keshishian:
The Voicing Contrast in Stops and Affricates in the Western Armenian of Lebanon. 1721-1725 - Adèle Jatteau, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker, Nicolas Audibert:
" Gra[f] e!" Word-Final Devoicing of Obstruents in Standard French: An Acoustic Study Based on Large Corpora. 1726-1730 - Chih-Hsiang Huang, Huang-Cheng Chou, Yi-Tong Wu, Chi-Chun Lee, Yi-Wen Liu:
Acoustic Indicators of Deception in Mandarin Daily Conversations Recorded from an Interactive Game. 1731-1735 - Barbara Schuppler, Margaret Zellers:
Prosodic Effects on Plosive Duration in German and Austrian German. 1736-1740 - Cibu Johny, Alexander Gutkin, Martin Jansche:
Cross-Lingual Consistency of Phonological Features: An Empirical Study. 1741-1745 - Fanny Guitard-Ivent, Gabriele Chignoli, Cécile Fougeron, Laurianne Georgeton:
Are IP Initial Vowels Acoustically More Distinct? Results from LDA and CNN Classifications. 1746-1750 - Xizi Wei, Melvyn Hunt, Adrian Skilling:
Neural Network-Based Modeling of Phonetic Durations. 1751-1755 - Janina Molczanow, Beata Lukaszewicz, Anna Lukaszewicz:
An Acoustic Study of Vowel Undershoot in a System with Several Degrees of Prominence. 1756-1760 - Stephanie Berger, Oliver Niebuhr, Margaret Zellers:
A Preliminary Study of Charismatic Speech on YouTube: Correlating Prosodic Variation with Counts of Subscribers, Views and Likes. 1761-1765 - Shan Luo:
Phonetic Detail Encoding in Explaining the Size of Speech Planning Window. 1766-1770 - Dina El Zarka, Barbara Schuppler, Francesco Cangemi:
Acoustic Cues to Topic and Narrow Focus in Egyptian Arabic. 1771-1775 - Kowovi Comivi Alowonou, Jianguo Wei, Wenhuan Lu, Zhicheng Liu, Kiyoshi Honda, Jianwu Dang:
Acoustic and Articulatory Study of Ewe Vowels: A Comparative Study of Male and Female. 1776-1780
Speech Enhancement: Noise Attenuation
- Ya'nan Guo, Ziping Zhao, Yide Ma, Björn W. Schuller:
Speech Augmentation via Speaker-Specific Noise in Unseen Environment. 1781-1785 - Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren:
UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-Noise Ratio Condition. 1786-1790 - Santiago Pascual, Joan Serrà, Antonio Bonafonte:
Towards Generalized Speech Enhancement with Generative Adversarial Networks. 1791-1795 - Xiaoqi Li, Yaxing Li, Meng Li, Shan Xu, Yuanjie Dong, Xinrong Sun, Shengwu Xiong:
A Convolutional Neural Network with Non-Local Module for Speech Enhancement. 1796-1800 - Yu-Chen Lin, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo:
IA-NET: Acceleration and Compression of Speech Enhancement Using Integer-Adder Deep Neural Network. 1801-1805 - Li Chai, Jun Du, Chin-Hui Lee:
KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement. 1806-1810 - Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega Giménez, Eduardo Lleida:
Speech Enhancement with Wide Residual Networks in Reverberant Environments. 1811-1815 - Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke:
A Scalable Noisy Speech Dataset and Online Subjective Test Framework. 1816-1820 - Nagaraj Adiga, Yannis Pantazis, Vassilis Tsiaras, Yannis Stylianou:
Speech Enhancement for Noise-Robust Speech Synthesis Using Wasserstein GAN. 1821-1825 - P. V. Muhammed Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou:
A Non-Causal FFTNet Architecture for Speech Enhancement. 1826-1830 - Daniel T. Braithwaite, W. Bastiaan Kleijn:
Speech Enhancement with Variance Constrained Autoencoders. 1831-1835
Language Learning and Databases
- Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales:
A Deep Learning Approach to Automatic Characterisation of Rhythm in Non-Native English Speech. 1836-1840 - Danny Merkx, Stefan L. Frank, Mirjam Ernestus:
Language Learning Using Speech to Image Retrieval. 1841-1845 - Lucy Skidmore, Roger K. Moore:
Using Alexa for Flashcard-Based Learning. 1846-1850 - John H. L. Hansen, Aditya Joglekar, Meena Chandra Shekhar, Vinay Kothapally, Chengzhu Yu, Lakshmish Kaushik, Abhijeet Sangwan:
The 2019 Inaugural Fearless Steps Challenge: A Giant Leap for Naturalistic Audio. 1851-1855 - Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-yi Lee, Lin-Shan Lee:
Completely Unsupervised Phoneme Recognition by a Generative Adversarial Network Harmonized with Iteratively Refined Hidden Markov Models. 1856-1860 - Tasavat Trisitichoke, Shintaro Ando, Daisuke Saito, Nobuaki Minematsu:
Analysis of Native Listeners' Facial Microexpressions While Shadowing Non-Native Speech - Potential of Shadowers' Facial Expressions for Comprehensibility Prediction. 1861-1865 - Reima Karhila, Anna-Riikka Smolander, Sari Ylinen, Mikko Kurimo:
Transparent Pronunciation Scoring Using Articulatorily Weighted Phoneme Edit Distance. 1866-1870 - Su-Youn Yoon, Chong Min Lee, Klaus Zechner, Keelan Evanini:
Development of Robust Automated Scoring Models Using Adversarial Input for Oral Proficiency Assessment. 1871-1875 - Yiting Lu, Mark J. F. Gales, Kate M. Knill, P. P. Manakul, Linlin Wang, Yu Wang:
Impact of ASR Performance on Spoken Grammatical Error Detection. 1876-1880 - Seung Hee Yang, Minhwa Chung:
Self-Imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training. 1881-1885
Emotion and Personality in Conversation
- Chiori Hori, Anoop Cherian, Tim K. Marks, Takaaki Hori:
Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog. 1886-1890 - Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, Dilek Hakkani-Tür:
Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations. 1891-1895 - Uliyana Kubasova, Gabriel Murray, McKenzie Braley:
Analyzing Verbal and Nonverbal Features for Predicting Group Performance. 1896-1900 - Victor R. Martinez, Nikolaos Flemotomos, Victor Ardulov, Krishna Somandepalli, Simon B. Goldberg, Zac E. Imel, David C. Atkins, Shrikanth Narayanan:
Identifying Therapist and Client Personae for Therapeutic Alliance Estimation. 1901-1905 - Kristin Haake, Sarah Schimke, Simon Betz, Sina Zarrieß:
Do Hesitations Facilitate Processing of Partially Defective System Utterances? An Exploratory Eye Tracking Study. 1906-1910 - Bin Li, Yuan Jia:
Influence of Contextuality on Prosodic Realization of Information Structure in Chinese Dialogues. 1911-1915 - Kristijan Gjoreski, Aleksandar Gjoreski, Ivan Kraljevski, Diane Hirschfeld:
Cross-Lingual Transfer Learning for Affective Spoken Dialogue Systems. 1916-1920 - Mingzhi Yu, Emer Gilmartin, Diane J. Litman:
Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue. 1921-1925 - Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin G. McInnis, Emily Mower Provost:
Identifying Mood Episodes Using Dialogue Features from Clinical Interviews. 1926-1930 - Nichola Lubold, Stephanie A. Borrie, Tyson S. Barrett, Megan M. Willi, Visar Berisha:
Do Conversational Partners Entrain on Articulatory Precision? 1931-1935 - Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang:
Conversational Emotion Analysis via Attention Mechanisms. 1936-1940
Voice Quality, Speech Perception, and Prosody
- Emma O'Neill, Julie Carson-Berndsen:
The Effect of Phoneme Distribution on Perceptual Similarity in English. 1941-1945 - Sofoklis Kakouros, Antti Suni, Juraj Simko, Martti Vainio:
Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features. 1946-1950 - Sharon Peperkamp, Alvaro Martin Iturralde Zurita:
Compensation for French Liquid Deletion During Auditory Sentence Processing. 1951-1955 - Daniil Kocharov, Tatiana Kachkovskaia, Pavel A. Skrelin:
Prosodic Factors Influencing Vowel Reduction in Russian. 1956-1960 - Christer Gobl, Ailbhe Ní Chasaide:
Time to Frequency Domain Mapping of the Voice Source: The Influence of Open Quotient and Glottal Skew on the Low End of the Source Spectrum. 1961-1965 - Eleanor Chodroff, Jennifer S. Cole:
Testing the Distinctiveness of Intonational Tunes: Evidence from Imitative Productions in American English. 1966-1970 - Sangwook Park, David K. Han, Mounya Elhilali:
A Study of a Cross-Language Perception Based on Cortical Analysis Using Biomimetic STRFs. 1971-1975 - Pavel Sturm, Jan Volín:
Perceptual Evaluation of Early versus Late F0 Peaks in the Intonation Structure of Czech Question-Word Questions. 1976-1980 - Anneliese Kelterer, Barbara Schuppler:
Acoustic Correlates of Phonation Type in Chichimec. 1981-1985 - Yu-Ren Chien, Michal Borský, Jón Guðnason:
F0 Variability Measures Based on Glottal Closure Instants. 1986-1989 - Lauri Tavi, Tanel Alumäe, Stefan Werner:
Recognition of Creaky Voice from Emergency Calls. 1990-1994
Speech Signal Characterization 3
- Shuzhuang Xu, Hiroshi Shimodaira:
Direct F0 Estimation with Neural-Network-Based Regression. 1995-1999 - Tanay Sharma, Rohith Chandrashekar Aralikatti, Dilip Kumar Margam, Abhinav Thanda, Sharad Roy, Pujitha Appan Kandala, Shankar M. Venkatesan:
Real Time Online Visual End Point Detection Using Unidirectional LSTM. 2000-2004 - Luc Ardaillon, Axel Roebel:
Fully-Convolutional Network for Pitch Estimation of Speech Signals. 2005-2009 - Mingye Dong, Jie Wu, Jian Luan:
Vocal Pitch Extraction in Polyphonic Music Using Convolutional Residual Network. 2010-2014 - Bidisha Sharma, Rohan Kumar Das, Haizhou Li:
Multi-Level Adaptive Speech Activity Detector for Speech in Naturalistic Environments. 2015-2019 - Bidisha Sharma, Rohan Kumar Das, Haizhou Li:
On the Importance of Audio-Source Separation for Singer Identification in Polyphonic Music. 2020-2024 - Hiroko Terasawa, Kenta Wakasa, Hideki Kawahara, Ken-Ichi Sakakibara:
Investigating the Physiological and Acoustic Contrasts Between Choral and Operatic Singing. 2025-2029 - Ruixi Lin, Charles Costello, Charles Jankowski, Vishwas Mruthyunjaya:
Optimizing Voice Activity Detection for Noisy Conditions. 2030-2034 - Taiki Yamamoto, Ryota Nishimura, Masayuki Misaki, Norihide Kitaoka:
Small-Footprint Magic Word Detection Method Using Convolutional LSTM Neural Network. 2035-2039 - Chitralekha Gupta, Emre Yilmaz, Haizhou Li:
Acoustic Modeling for Automatic Lyrics-to-Audio Alignment. 2040-2044 - Anastasios Vafeiadis, Eleftherios Fanioudakis, Ilyas Potamitis, Konstantinos Votis, Dimitrios Giakoumis, Dimitrios Tzovaras, Liming Chen, Raouf Hamzaoui:
Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection. 2045-2049 - Tokihiko Kaburagi:
A Study of Soprano Singing in Light of the Source-Filter Interaction. 2050-2054
Speech Synthesis: Pronunciation, Multilingual, and Low Resource
- Yuxiang Zou, Linhao Dong, Bo Xu:
Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring. 2055-2059 - Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu:
Building a Mixed-Lingual Neural TTS System with Only Monolingual Data. 2060-2064 - Alex Sokolov, Tracy Rohlin, Ariya Rastrow:
Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion. 2065-2069 - Jason Taylor, Korin Richmond:
Analysis of Pronunciation Learning in End-to-End Speech Synthesis. 2070-2074 - Yuan-Jui Chen, Tao Tu, Cheng-chieh Yeh, Hung-yi Lee:
End-to-End Text-to-Speech for Low-Resource Languages by Cross-Lingual Transfer Learning. 2075-2079 - Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, R. J. Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran:
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning. 2080-2084 - Markéta Juzová, Daniel Tihelka, Jakub Vít:
Unified Language-Independent DNN-Based G2P Converter. 2085-2089 - Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng:
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT. 2090-2094 - Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth:
Transformer Based Grapheme-to-Phoneme Conversion. 2095-2099 - Harry Bleyan, Sandy Ritchie, Jonas Fromseier Mortensen, Daan van Esch:
Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages. 2100-2104 - Mengnan Chen, Minchuan Chen, Shuang Liang, Jun Ma, Lei Chen, Shaojun Wang, Jing Xiao:
Cross-Lingual, Multi-Speaker Text-To-Speech Synthesis Using Neural Speaker Embedding. 2105-2109 - Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, Ming Li:
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-Level Embedding Features. 2110-2114 - Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu:
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion. 2115-2119
Cross-Lingual and Multilingual ASR
- Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze:
Multilingual Speech Recognition with Corpus Relatedness Sampling. 2120-2124 - Harish Arsikere, Ashtosh Sapru, Sri Garimella:
Multi-Dialect Acoustic Modeling Using Phone Mapping and Online i-Vectors. 2125-2129 - Anjuli Kannan, Arindrima Datta, Tara N. Sainath, Eugene Weinstein, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, Seungji Lee:
Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model. 2130-2134 - Carlos Mendes, Alberto Abad, João Paulo Neto, Isabel Trancoso:
Recognition of Latin American Spanish Using Multi-Task Learning. 2135-2139 - Thibault Viglino, Petr Motlícek, Milos Cernak:
End-to-End Accented Speech Recognition. 2140-2144 - Sheng Li, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai:
End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition. 2145-2149 - Karan Taneja, Satarupa Guha, Preethi Jyothi, Basil Abraham:
Exploiting Monolingual Speech Corpora for Code-Mixed Speech Recognition. 2150-2154 - Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak:
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models. 2155-2159 - Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma:
Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data. 2160-2164 - Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Eng Siong Chng, Haizhou Li:
On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition. 2165-2169 - Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie:
Towards Language-Universal Mandarin-English Speech Recognition. 2170-2174
Spoken Term Detection, Confidence Measure, and End-to-End Speech Recognition
- Prakhar Swarup, Roland Maas, Sri Garimella, Sri Harish Mallidi, Björn Hoffmeister:
Improving ASR Confidence Scores for Alexa Using Acoustic and Hypothesis Embeddings. 2175-2179 - Shiliang Zhang, Ming Lei, Zhijie Yan:
Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition. 2180-2184 - Cal Peyser, Hao Zhang, Tara N. Sainath, Zelin Wu:
Improving Performance of End-to-End ASR on Numeric Sequences. 2185-2189 - Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Chenghao Zhao, Cunhang Fan:
A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting. 2190-2194 - Chieh-Chi Kao, Ming Sun, Yixin Gao, Shiv Vitaladevuni, Chao Wang:
Sub-Band Convolutional Neural Networks for Small-Footprint Spoken Term Classification. 2195-2199 - Sheng Li, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai:
Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese. 2200-2204 - Jiaqi Guo, Yongbin You, Yanmin Qian, Kai Yu:
Joint Decoding of CTC Based Systems for Speech Recognition. 2205-2209 - Tomohiro Tanaka, Ryo Masumura, Takafumi Moriya, Takanobu Oba, Yushi Aono:
A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge. 2210-2214 - Karan Malhotra, Shubham Bansal, Sriram Ganapathy:
Active Learning Methods for Low Resource End-to-End Speech Recognition. 2215-2219 - Martin Karafiát, Murali Karthick Baskar, Shinji Watanabe, Takaaki Hori, Matthew Wiesner, Jan Cernocký:
Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems. 2220-2224 - Michal Zapotoczny, Piotr Pietrzak, Adrian Lancucki, Jan Chorowski:
Lattice Generation in Attention-Based Speech Recognition Models. 2225-2229 - Martin Jansche, Alexander Gutkin:
Sampling from Stochastic Finite Automata with Applications to CTC Decoding. 2230-2234 - Lukasz Dudziak, Mohamed S. Abdelfattah, Ravichander Vipperla, Stefanos Laskaridis, Nicholas D. Lane:
ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning. 2235-2239 - Yashesh Gaur, Jinyu Li, Zhong Meng, Yifan Gong:
Acoustic-to-Phrase Models for Speech Recognition. 2240-2244 - Ruizhi Li, Gregory Sell, Hynek Hermansky:
Performance Monitoring for End-to-End Speech Recognition. 2245-2249
Speech Perception
- Michelle Cohn, Georgia Zellou, Santiago Barreda:
The Role of Musical Experience in the Perceptual Weighting of Acoustic Cues for the Obstruent Coda Voicing Contrast in American English. 2250-2254 - Natalie Lewandowski, Daniel Duran:
Individual Differences in Implicit Attention to Phonetic Detail in Speech Perception. 2255-2259 - Kaylah Lalonde:
Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit. 2260-2264 - Martijn Bentum, Louis ten Bosch, Antal van den Bosch, Mirjam Ernestus:
Listening with Great Expectations: An Investigation of Word Form Anticipations in Naturalistic Speech. 2265-2269 - Martijn Bentum, Louis ten Bosch, Antal van den Bosch, Mirjam Ernestus:
Quantifying Expectation Modulation in Human Speech Processing. 2270-2274 - Daniel R. Turner, Ann R. Bradlow, Jennifer S. Cole:
Perception of Pitch Contours in Speech and Nonspeech. 2275-2279 - Louis ten Bosch, Lou Boves, Kimberley Mulder:
Analyzing Reaction Time and Error Sequences in Lexical Decision Experiments. 2280-2284 - Li Liu, Jianze Li, Gang Feng, Xiao-Ping (Steven) Zhang:
Automatic Detection of the Temporal Segmentation of Hand Movements in British English Cued Speech. 2285-2289 - Yuriko Yokoe:
Place Shift as an Autonomous Process: Evidence from Japanese Listeners. 2290-2294 - Julien Meyer, Laure Dentel, Silvain Gerber, Rachid Ridouane:
A Perceptual Study of CV Syllables in Both Spoken and Whistled Speech: A Tashlhiyt Berber Perspective. 2295-2299 - Han-Chi Hsieh, Wei-Zhong Zheng, Ko-Chiang Chen, Ying-Hui Lai:
Consonant Classification in Mandarin Based on the Depth Image Feature: A Pilot Study. 2300-2304 - Shiri Lev-Ari, Robin Dodsworth, Jeff Mielke, Sharon Peperkamp:
The Different Roles of Expectations in Phonetic and Lexical Processing. 2305-2309 - Bruno Ferenc Segedin, Michelle Cohn, Georgia Zellou:
Perceptual Adaptation to Device and Human Voices: Learning and Generalization of a Phonetic Shift Across Real and Voice-AI Talkers. 2310-2314 - Katerina Papadimitriou, Gerasimos Potamianos:
End-to-End Convolutional Sequence Learning for ASL Fingerspelling Recognition. 2315-2319
Topics in Speech and Audio Signal Processing
- Krishna Somandepalli, Naveen Kumar, Arindam Jati, Panayiotis G. Georgiou, Shrikanth Narayanan:
Multiview Shared Subspace Learning Across Speakers and Speech Commands. 2320-2324 - Chelzy Belitz, Hussnain Ali, John H. L. Hansen:
A Machine Learning Based Clustering Protocol for Determining Hearing Aid Initial Configurations from Pure-Tone Audiograms. 2325-2329 - Truc Nguyen, Franz Pernkopf:
Acoustic Scene Classification with Mismatched Devices Using CliqueNets and Mixup Data Augmentation. 2330-2334 - Mohsin Y. Ahmed, Md. Mahbubur Rahman, Jilong Kuang:
DeepLung: Smartphone Convolutional Neural Network-Based Inference of Lung Anomalies for Pulmonary Patients. 2335-2339 - Roger K. Moore, Lucy Skidmore:
On the Use/Misuse of the Term 'Phoneme'. 2340-2344 - Hannah Muckenhirn, Vinayak Abrol, Mathew Magimai-Doss, Sébastien Marcel:
Understanding and Visualizing Raw Waveform-Based CNNs. 2345-2349 - Kevin Kilgour, Mauricio Zuluaga, Dominik Roblek, Matthew Sharifi:
Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms. 2350-2354 - Yuan Gong, Jian Yang, Jacob Huber, Mitchell MacKnight, Christian Poellabauer:
ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems. 2355-2359 - Balamurali B. T., Jer-Ming Chen:
Analyzing Intra-Speaker and Inter-Speaker Vocal Tract Impedance Characteristics in a Low-Dimensional Feature Space Using t-SNE. 2360-2363
Speech Processing and Analysis
- Geon Woo Lee, Jung Hyuk Lee, Seong Ju Kim, Hong Kook Kim:
Directional Audio Rendering Using a Neural Network Based Personalized HRTF. 2364-2365 - Wikus Pienaar, Daan Wissing:
Online Speech Processing and Analysis Suite. 2366-2367 - Dieter Maurer, Heidy Suter, Christian d'Hereuse, Volker Dellwo:
Formant Pattern and Spectral Shape Ambiguity of Vowel Sounds, and Related Phenomena of Vowel Acoustics - Exemplary Evidence. 2368-2369 - Anton Noll, Jonathan Stuefer, Nicola Klingler, Hannah Leykum, Carina Lozo, Jan Luttenberger, Michael Pucher, Carolin Schmid:
Sound Tools eXtended (STx) 5.0 - A Powerful Sound Analysis Tool Optimized for Speech. 2370-2371 - Mohamed Eldesouki, Naassih Gopee, Ahmed Ali, Kareem Darwish:
FarSpeech: Arabic Natural Language Processing for Live Arabic Speech. 2372-2373 - Fasih Haider, Saturnino Luz:
A System for Real-Time Privacy Preserving Data Collection for Ambient Assisted Living. 2374-2375 - Chitralekha Gupta, Karthika Vijayan, Bidisha Sharma, Xiaoxue Gao, Haizhou Li:
NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion. 2376-2377
Keynote 3: Manfred Kaltenbacher
- Manfred Kaltenbacher:
Physiology and Physics of Voice Production.
The Interspeech 2019 Computational Paralinguistics Challenge (ComParE)
- Björn W. Schuller, Anton Batliner, Christian Bergler, Florian B. Pokorny, Jarek Krajewski, Margaret Cychosz, Ralf Vollmann, Sonja-Dana Roelen, Sebastian Schnieder, Elika Bergelson, Alejandrina Cristià, Amanda Seidl, Anne S. Warlaumont, Lisa Yankowitz, Elmar Nöth, Shahin Amiriparian, Simone Hantke, Maximilian Schmitt:
The INTERSPEECH 2019 Computational Paralinguistics Challenge: Styrian Dialects, Continuous Sleepiness, Baby Sounds & Orca Activity. 2378-2382 - S. Pavankumar Dubagunta, Mathew Magimai-Doss:
Using Speech Production Knowledge for Raw Waveform Modelling Based Styrian Dialect Identification. 2383-2387 - Daniel Elsner, Stefan Langer, Fabian Ritz, Robert Müller, Steffen Illium:
Deep Neural Baselines for Computational Paralinguistics. 2388-2392 - Thomas Kisler, Raphael Winkelmann, Florian Schiel:
Styrian Dialect Classification: Comparing and Fusing Classifiers Based on a Feature Selection Using a Genetic Algorithm. 2393-2397 - Sung-Lin Yeh, Gao-Yi Chao, Bo-Hao Su, Yu-Lin Huang, Meng-Han Lin, Yin-Chun Tsai, Yu-Wen Tai, Zheng-Chi Lu, Chieh-Yu Chen, Tsung-Ming Tai, Chiu-Wang Tseng, Cheng-Kuang Lee, Chi-Chun Lee:
Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition. 2398-2402 - Peter Wu, Sai Krishna Rallabandi, Alan W. Black, Eric Nyberg:
Ordinal Triplet Loss: Investigating Sleepiness Detection from Speech. 2403-2407 - Vijay Ravi, Soo Jin Park, Amber Afshan, Abeer Alwan:
Voice Quality and Between-Frame Entropy for Sleepiness Estimation. 2408-2412 - Gábor Gosztolya:
Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds. 2413-2417 - Rohan Kumar Das, Haizhou Li:
Instantaneous Phase and Long-Term Acoustic Cues for Orca Activity Detection. 2418-2422 - Dominik Schiller, Tobias Huber, Florian Lingenfelser, Michael Dietz, Andreas Seiderer, Elisabeth André:
Relevance-Based Feature Masking: Improving Neural Network Based Whale Classification Through Explainable Artificial Intelligence. 2423-2427 - Marie-José Caraty, Claude Montacié:
Spatial, Temporal and Spectral Multiresolution Analysis for the INTERSPEECH 2019 ComParE Challenge. 2428-2432 - Haiwei Wu, Weiqing Wang, Ming Li:
The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge. 2433-2437
The VOiCES from a Distance Challenge — O
- Mahesh Kumar Nandwana, Julien van Hout, Colleen Richey, Mitchell McLaren, María Auxiliadora Barrios, Aaron Lawson:
The VOiCES from a Distance Challenge 2019. 2438-2442 - Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Galina Lavrentyeva, Vladimir Volokhov, Alexandr Kozlov:
STC Speaker Recognition Systems for the VOiCES from a Distance Challenge. 2443-2447 - Pavel Matejka, Oldrich Plchot, Hossein Zeinali, Ladislav Mosner, Anna Silnova, Lukás Burget, Ondrej Novotný, Ondrej Glembek:
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. 2448-2452 - Ivan Medennikov, Yuri Y. Khokhlov, Aleksei Romanenko, Ivan Sorokin, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Tatiana Prisyach, Mariya Korenevskaya, Oleg Petrov, Alexander Zatvornitskiy:
The STC ASR System for the VOiCES from a Distance Challenge 2019. 2453-2457 - Tze Yuang Chong, Kye Min Tan, Kah Kuan Teh, Chang Huai You, Hanwu Sun, Tran Huy Dat:
The I2R's ASR System for the VOiCES from a Distance Challenge 2019. 2458-2462
The VOiCES from a Distance Challenge — P
- Mahesh Kumar Nandwana, Julien van Hout, Colleen Richey, Mitchell McLaren, Maria Alejandra Barrios, Aaron Lawson:
The VOiCES from a Distance Challenge 2019. - Sergey Novoselov, Aleksei Gusev, Artem Ivanov, Timur Pekhovsky, Andrey Shulipa, Galina Lavrentyeva, Vladimir Volokhov, Alexandr Kozlov:
STC Speaker Recognition Systems for the VOiCES from a Distance Challenge. - Pavel Matejka, Oldrich Plchot, Hossein Zeinali, Ladislav Mosner, Anna Silnova, Lukás Burget, Ondrej Novotný, Ondrej Glembek:
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. - Ivan Medennikov, Yuri Y. Khokhlov, Aleksei Romanenko, Ivan Sorokin, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Tatiana Prisyach, Mariya Korenevskaya, Oleg Petrov, Alexander Zatvornitskiy:
The STC ASR System for the VOiCES from a Distance Challenge 2019. - Tze Yuang Chong, Kye Min Tan, Kah Kuan Teh, Chang Huai You, Hanwu Sun, Tran Huy Dat:
The I2R's ASR System for the VOiCES from a Distance Challenge 2019. - Arindam Jati, Raghuveer Peri, Monisankha Pal, Tae Jin Park, Naveen Kumar, Ruchir Travadi, Panayiotis G. Georgiou, Shrikanth Narayanan:
Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech. 2463-2467 - David Snyder, Jesús Villalba, Nanxin Chen, Daniel Povey, Gregory Sell, Najim Dehak, Sanjeev Khudanpur:
The JHU Speaker Recognition System for the VOiCES 2019 Challenge. 2468-2472 - Jonathan Huang, Tobias Bocklet:
Intel Far-Field Speaker Recognition System for VOiCES Challenge 2019. 2473-2477 - Hanwu Sun, Kah Kuan Teh, Ivan Kukanov, Tran Huy Dat:
The I2R's Submission to VOiCES Distance Speaker Recognition Challenge 2019. 2478-2482 - Yulong Liang, Lin Yang, Xuyang Wang, Yingjie Li, Chen Jia, Junjie Wang:
The LeVoice Far-Field Speech Recognition System for VOiCES from a Distance Challenge 2019. 2483-2487 - Yiming Wang, David Snyder, Hainan Xu, Vimal Manohar, Phani Sankar Nidadavolu, Daniel Povey, Sanjeev Khudanpur:
The JHU ASR System for VOiCES from a Distance Challenge 2019. 2488-2492 - Danwei Cai, Xiaoyi Qin, Weicheng Cai, Ming Li:
The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge. 2493-2497
Voice Quality Characterization for Clinical Voice Assessment: Voice Production, Acoustics, and Auditory Perception
- Yermiyahu Hauptman, Ruth Aloni-Lavi, Itshak Lapidot, Tanya Gurevich, Yael Manor, Stav Naor, Noa Diamant, Irit Opher:
Identifying Distinctive Acoustic and Spectral Features in Parkinson's Disease. 2498-2502 - Carlo Drioli, Philipp Aichinger:
Aerodynamics and Lumped-Masses Combined with Delay Lines for Modeling Vertical and Anterior-Posterior Phase Differences in Pathological Vocal Fold Vibration. 2503-2507 - Sudarsana Reddy Kadiri, Paavo Alku:
Mel-Frequency Cepstral Coefficients of Voice Source Waveforms for Classification of Phonation Types in Speech. 2508-2512 - Sunghye Cho, Mark Liberman, Neville Ryant, Meredith Cola, Robert T. Schultz, Julia Parish-Morris:
Automatic Detection of Autism Spectrum Disorder in Children Using Acoustic and Text Features from Brief Natural Conversations. 2513-2517 - Jean Schoentgen, Philipp Aichinger:
Analysis and Synthesis of Vocal Flutter and Vocal Jitter. 2518-2522 - Felix Schaeffler, Stephen Jannetts, Janet Beck:
Reliability of Clinical Voice Parameters Captured with Smartphones - Measurements of Added Noise and Spectral Tilt. 2523-2527 - Meredith Moore, Michael Saxon, Hemanth Venkateswara, Visar Berisha, Sethuraman Panchanathan:
Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make. 2528-2532
Prosody
- Nigel G. Ward:
Survey Talk: Prosody Research and Applications: The State of the Art. - Simon Roessig, Doris Mücke, Lena Pagel:
Dimensions of Prosodic Prominence in an Attractor Model. 2533-2537 - Antti Suni, Marcin Wlodarczak, Martti Vainio, Juraj Simko:
Comparative Analysis of Prosodic Characteristics Using WaveNet Embeddings. 2538-2542 - Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl:
The Role of Voice Quality in the Perception of Prominence in Synthetic Speech. 2543-2547 - Rachel Albar, Hiyon Yoo:
Phonological Awareness of French Rising Contours in Japanese Learners. 2548-2552
Speech and Audio Classification 1
- Masaki Okawa, Takuya Saito, Naoki Sawada, Hiromitsu Nishizaki:
Audio Classification of Bit-Representation Waveform. 2553-2557 - Manjunath Mulimani, Shashidhar G. Koolagudi:
Locality-Constrained Linear Coding Based Fused Visual Features for Robust Acoustic Event Classification. 2558-2562 - Yu-Han Shen, Ke-Xin He, Wei-Qiang Zhang:
Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection. 2563-2567 - Logan Ford, Hao Tang, François Grondin, James R. Glass:
A Deep Residual Network for Large-Scale Acoustic Scene Analysis. 2568-2572 - Chandan K. A. Reddy, Ross Cutler, Johannes Gehrke:
Supervised Classifiers for Audio Impairments with Noisy Labels. 2573-2577 - Lorenzo Tarantino, Philip N. Garner, Alexandros Lazaridis:
Self-Attention for Speech Emotion Recognition. 2578-2582
Singing and Multimodal Synthesis
- Eliya Nachmani, Lior Wolf:
Unsupervised Singing Voice Conversion. 2583-2587 - Juheon Lee, Hyeong-Seok Choi, Chang-Bin Jeon, Junghyun Koo, Kyogu Lee:
Adversarially Trained End-to-End Korean Singing Voice Synthesis System. 2588-2592 - Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai:
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling. 2593-2597 - Sara Dahmani, Vincent Colotte, Valérian Girard, Slim Ouni:
Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis. 2598-2602 - David Ayllón, Fernando Villavicencio, Pierre Lanchantin:
A Strategy for Improved Phone-Level Lyrics-to-Audio Alignment for Speech-to-Singing Synthesis. 2603-2607 - Théo Biasutto-Lervat, Sara Dahmani, Slim Ouni:
Modeling Labial Coarticulation with Bidirectional Gated Recurrent Networks and Transfer Learning. 2608-2612
ASR Neural Network Training — 2
- Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le:
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. 2613-2617 - Kartik Audhkhasi, George Saon, Zoltán Tüske, Brian Kingsbury, Michael Picheny:
Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition. 2618-2622 - Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Ta Li, Yonghong Yan:
Online Hybrid CTC/Attention Architecture for End-to-End Speech Recognition. 2623-2627 - Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David S. Kung, Michael Picheny:
A Highly Efficient Distributed Deep Learning System for Automatic Speech Recognition. 2628-2632 - Wangyou Zhang, Xuankai Chang, Yanmin Qian:
Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System. 2633-2637 - Tobias Menne, Ilya Sklyar, Ralf Schlüter, Hermann Ney:
Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech. 2638-2642
Bilingualism, L2, and Non-Nativeness
- Ann R. Bradlow:
Survey Talk: Recognition of Foreign-Accented Speech: Challenges and Opportunities for Human and Computer Speech Communication. - John S. Novak III, Daniel Bunn, Robert V. Kenyon:
The Effects of Time Expansion on English as a Second Language Individuals. 2643-2647 - Shuju Shi, Chilin Shih, Jinsong Zhang:
Capturing L1 Influence on L2 Pronunciation by Simulating Perceptual Space Using Acoustic Features. 2648-2652 - Juqiang Chen, Catherine T. Best, Mark Antoniou:
Cognitive Factors in Thai-Naïve Mandarin Speakers' Imitation of Thai Lexical Tones. 2653-2657 - Annie Tremblay, Mirjam Broersma:
Foreign-Language Knowledge Enhances Artificial-Language Segmentation. 2658-2662
Spoken Term Detection
- Abdalghani Abujabal, Judith Gaspers:
Neural Named Entity Recognition from Subword Units. 2663-2667 - Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty, Najim Dehak:
Unsupervised Acoustic Segmentation and Clustering Using Siamese Network Embeddings. 2668-2672 - Bolaji Yusuf, Murat Saraclar:
An Empirical Evaluation of DTW Subsampling Methods for Keyword Search. 2673-2677 - Zixiaofan Yang, Julia Hirschberg:
Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages. 2678-2682 - Liming Wang, Mark A. Hasegawa-Johnson:
Multimodal Word Discovery and Retrieval with Phone Sequence and Image Concepts. 2683-2687 - Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier:
Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-Resource Settings. 2688-2692
Speech and Audio Source Separation and Scene Analysis 2
- Wei Xue, Ying Tong, Guohong Ding, Chao Zhang, Tao Ma, Xiaodong He, Bowen Zhou:
Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation. 2693-2697 - François Grondin, James R. Glass:
Multiple Sound Source Localization with SVD-PHAT. 2698-2702 - Wangyou Zhang, Ying Zhou, Yanmin Qian:
Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking. 2703-2707 - Yoshiki Masuyama, Masahito Togami, Tatsuya Komatsu:
Multichannel Loss Function for Supervised Speech Source Separation by Mask-Based Beamforming. 2708-2712 - Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li:
Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction. 2713-2717 - Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani:
Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues. 2718-2722
Speech Enhancement: Single Channel 2
- François G. Germain, Qifeng Chen, Vladlen Koltun:
Speech Denoising with Deep Feature Losses. 2723-2727 - Quan Wang, Hannah Muckenhirn, Kevin W. Wilson, Prashant Sridhar, Zelin Wu, John R. Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio López-Moreno:
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking. 2728-2732 - Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai:
Incorporating Symbolic Sequential Modeling for Speech Enhancement. 2733-2737 - Pejman Mowlaee, Daniel Scheran, Johannes Stahl, Sean U. N. Wood, W. Bastiaan Kleijn:
Maximum a posteriori Speech Enhancement Based on Double Spectrum. 2738-2742 - Jian Yao, Ahmad Al-Dahle:
Coarse-to-Fine Optimization for Speech Enhancement. 2743-2747 - Like Hui, Siyuan Ma, Mikhail Belkin:
Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement. 2748-2752
Multimodal ASR
- Florian Metze:
Survey Talk: Multimodal Processing of Speech and Language. - Nilay Shrivastava, Astitwa Saxena, Yaman Kumar, Rajiv Ratn Shah, Amanda Stent, Debanjan Mahata, Preeti Kaur, Roger Zimmermann:
MobiVSR : Efficient and Light-Weight Neural Network for Visual Speech Recognition on Mobile Devices. 2753-2757 - Pujitha Appan Kandala, Abhinav Thanda, Dilip Kumar Margam, Rohith Chandrashekar Aralikatti, Tanay Sharma, Sharad Roy, Shankar M. Venkatesan:
Speaker Adaptation for Lip-Reading Using Visual Identity Vectors. 2758-2762 - Alexandros Koumparoulis, Gerasimos Potamianos:
MobiLipNet: Resource-Efficient Deep Learning Based Lipreading. 2763-2767 - Leyuan Qu, Cornelius Weber, Stefan Wermter:
LipSound: Neural Mel-Spectrogram Reconstruction for Lip Reading. 2768-2772
ASR Neural Network Architectures 2
- Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu:
Two-Pass End-to-End Speech Recognition. 2773-2777 - Max W. Y. Lam, Jun Wang, Xunying Liu, Helen Meng, Dan Su, Dong Yu:
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition. 2778-2782 - Dhananjaya Gowda, Abhinav Garg, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim:
Multi-Task Multi-Resolution Char-to-BPE Cross-Attention Decoder for End-to-End Speech Recognition. 2783-2787 - Kyu Jeong Han, Jing Huang, Yun Tang, Xiaodong He, Bowen Zhou:
Multi-Stride Self-Attention for Speech Recognition. 2788-2792 - Shoukang Hu, Xurong Xie, Shansong Liu, Max W. Y. Lam, Jianwei Yu, Xixin Wu, Xunying Liu, Helen Meng:
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition. 2793-2797 - Liang Lu, Eric Sun, Yifan Gong:
Self-Teaching Networks. 2798-2802
Training Strategy for Speech Emotion Recognition
- Yuanchao Li, Tianyu Zhao, Tatsuya Kawahara:
Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning. 2803-2807 - Maximilian Schmitt, Nicholas Cummins, Björn W. Schuller:
Continuous Emotion Recognition in Speech - Do We Need Recurrence? 2808-2812 - Anda Ouyang, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
Speech Based Emotion Prediction: Can a Linear Model Work? 2813-2817 - Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono:
Speech Emotion Recognition Based on Multi-Label Emotion Existence Model. 2818-2822 - Cristina Gorrostieta, Reza Lotfian, Kye Taylor, Richard Brutti, John Kane:
Gender De-Biasing in Speech Emotion Recognition. 2823-2827 - Fang Bao, Michael Neumann, Ngoc Thang Vu:
CycleGAN-Based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition. 2828-2832
Voice Conversion for Style, Accent, and Emotion
- Bajibabu Bollepalli, Lauri Juvela, Paavo Alku:
Lombard Speech Synthesis Using Transfer Learning in a Tacotron Text-to-Speech System. 2833-2837 - Shreyas Seshadri, Lauri Juvela, Paavo Alku, Okko Räsänen:
Augmented CycleGANs for Continuous Scale Normal-to-Lombard Speaking Style Conversion. 2838-2842 - Guanlong Zhao, Shaojin Ding, Ricardo Gutierrez-Osuna:
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams. 2843-2847 - Ravi Shankar, Jacob Sager, Archana Venkataraman:
A Multi-Speaker Emotion Morphing Model Using Highway Networks and Maximum Likelihood Objective. 2848-2852 - Itshak Lapidot, Jean-François Bonastre:
Effects of Waveform PMF on Anti-Spoofing Detection. 2853-2857 - Jian Gao, Deep Chakraborty, Hamidou Tembine, Olaitan Olaleye:
Nonparallel Emotional Speech Conversion. 2858-2862
Speaker Recognition 2
- Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukás Burget:
Self-Supervised Speaker Embeddings. 2863-2867 - Andreas Nautsch, Jose Patino, Amos Treiber, Themos Stafylakis, Petr Mizera, Massimiliano Todisco, Thomas Schneider, Nicholas W. D. Evans:
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation. 2868-2872 - Yi Liu, Liang He, Jia Liu:
Large Margin Softmax Loss for Speaker Verification. 2873-2877 - Amirhossein Hajavi, Ali Etemad:
A Deep Neural Network for Short-Segment Speaker Recognition. 2878-2882 - Jianfeng Zhou, Tao Jiang, Zheng Li, Lin Li, Qingyang Hong:
Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function. 2883-2887 - Suwon Shon, Hao Tang, James R. Glass:
VoiceID Loss: Speech Enhancement for Speaker Verification. 2888-2892
Speaker Recognition and Anti-Spoofing
- Anderson R. Avila, Jahangir Alam, Douglas D. O'Shaughnessy, Tiago H. Falk:
Blind Channel Response Estimation for Replay Attack Detection. 2893-2897 - Ankur T. Patil, Rajul Acharya, Pulikonda Krishna Aditya Sai, Hemant A. Patil:
Energy Separation-Based Instantaneous Frequency Estimation for Cochlear Cepstral Feature for Replay Spoof Detection. 2898-2902 - Victoria Mingote, Antonio Miguel, Dayana Ribas, Alfonso Ortega Giménez, Eduardo Lleida:
Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems. 2903-2907 - Lei Fan, Qing-Yuan Jiang, Ya-Qi Yu, Wu-Jun Li:
Deep Hashing for Speaker Identification and Retrieval. 2908-2912 - Mirko Marras, Pawel Korus, Nasir D. Memon, Gianni Fenu:
Adversarial Optimization for Dictionary Attacks on Speaker Verification. 2913-2917 - Tharshini Gunendradasan, Eliathamby Ambikairajah, Julien Epps, Haizhou Li:
An Adaptive-Q Cochlear Model for Replay Spoofing Detection. 2918-2922 - Sungrack Yun, Janghoon Cho, Jungyun Eum, Wonil Chang, Kyuwoong Hwang:
An End-to-End Text-Independent Speaker Verification Framework with a Keyword Adversarial Network. 2923-2927 - Soonshin Seo, Daniel Jun Rim, Minkyu Lim, Donghyun Lee, Hosung Park, Junseok Oh, Changmin Kim, Ji-Hwan Kim:
Shortcut Connections Based Deep Speaker Embeddings for End-to-End Speaker Verification System. 2928-2932 - Chang Huai You, Jichen Yang, Huy Dat Tran:
Device Feature Extractor for Replay Spoofing Detection. 2933-2937 - Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu:
Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training. 2938-2942 - Ahilan Kanagasundaram, Sridha Sridharan, Sriram Ganapathy, Prachi Singh, Clinton Fookes:
A Study of x-Vector Based Speaker Recognition on Short Utterances. 2943-2947 - Nanxin Chen, Jesús Villalba, Najim Dehak:
Tied Mixture of Factor Analyzers Layer to Combine Frame Level Representations in Neural Speaker Embeddings. 2948-2952 - Buddhi Wickramasinghe, Eliathamby Ambikairajah, Julien Epps:
Biologically Inspired Adaptive-Q Filterbanks for Replay Spoofing Attack Detection. 2953-2957 - Pierre-Michel Bousquet, Mickael Rouvier:
On Robustness of Unsupervised Domain Adaptation for Speaker Recognition. 2958-2962 - Suwon Shon, Younggun Lee, Taesu Kim:
Large-Scale Speaker Retrieval on Random Speaker Variability Subspace. 2963-2967
Rich Transcription and ASR Systems
- Takuya Yoshioka, Dimitrios Dimitriadis, Andreas Stolcke, William Hinthorn, Zhuo Chen, Michael Zeng, Xuedong Huang:
Meeting Transcription Using Asynchronous Distant Microphones. 2968-2972 - Samuel Thomas, Kartik Audhkhasi, Zoltán Tüske, Yinghui Huang, Michael Picheny:
Detection and Recovery of OOVs for Improved English Broadcast News Captioning. 2973-2977 - Muhammad Umar Farooq, Farah Adeeba, Sahar Rauf, Sarmad Hussain:
Improving Large Vocabulary Urdu Speech Recognition System Using Deep Neural Networks. 2978-2982 - Min Tang:
Hybrid Arbitration Using Raw ASR String and NLU Information - Taking the Best of Both Embedded World and Cloud World. 2983-2987 - György Szaszák, Máté Ákos Tündik:
Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach. 2988-2992 - Thomas Pellegrini, Jérôme Farinas, Estelle Delpech, François Lancelot:
The Airbus Air Traffic Control Speech Recognition 2018 Challenge: Towards ATC Automatic Transcription and Call Sign Detection. 2993-2997 - Dan Oneata, Horia Cucu:
Kite: Automatic Speech Recognition for Unmanned Aerial Vehicles. 2998-3002 - Xiaofei Wang, Jinyi Yang, Ruizhi Li, Samik Sadhu, Hynek Hermansky:
Exploring Methods for the Automatic Detection of Errors in Manual Transcription. 3003-3007 - Astik Biswas, Raghav Menon, Ewald van der Westhuizen, Thomas Niesler:
Improved Low-Resource Somali Speech Recognition by Semi-Supervised Acoustic and Language Model Training. 3008-3012 - Inga Rún Helgadóttir, Anna Björk Nikulásdóttir, Michal Borský, Judy Y. Fong, Róbert Kjaran, Jón Guðnason:
The Althingi ASR System. 3013-3017 - Vishwa Gupta, Lise Rebout, Gilles Boulianne, Pierre André Ménard, Jahangir Alam:
CRIM's Speech Transcription and Call Sign Detection System for the ATC Airbus Challenge Task. 3018-3022
Speech and Language Analytics for Medical Applications
- Tomasz Rutowski, Amir Harati, Yang Lu, Elizabeth Shriberg:
Optimizing Speech-Input Length for Speaker-Independent Depression Classification. 3023-3027 - Mary Pietrowicz, Carla Agurto, Raquel Norel, Elif Eyigöz, Guillermo A. Cecchi, Zarina R. Bilgrami, Cheryl Corcoran:
A New Approach for Automating Analysis of Responses on Verbal Fluency Tests from Subjects At-Risk for Schizophrenia. 3028-3032 - Laetitia Jeancolas, Graziella Mangone, Jean-Christophe Corvol, Marie Vidailhet, Stéphane Lehéricy, Badr-Eddine Benkelfat, Habib Benali, Dijana Petrovska-Delacrétaz:
Comparison of Telephone Recordings and Professional Microphone Recordings for Early Detection of Parkinson's Disease, Using Mel-Frequency Cepstral Coefficients with Gaussian Mixture Models. 3033-3037 - Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard:
Spectral Subspace Analysis for Automatic Assessment of Pathological Speech Intelligibility. 3038-3042 - Carolina De Pasquale, Charlie Cullen, Brian Vaughan:
An Investigation of Therapeutic Rapport Through Prosody in Brief Psychodynamic Psychotherapy. 3043-3047 - Alice Rueda, Juan Camilo Vásquez-Correa, Cristian David Ríos-Urrego, Juan Rafael Orozco-Arroyave, Sridhar Krishnan, Elmar Nöth:
Feature Representation of Pathophysiology of Parkinsonian Dysarthria. 3048-3052 - Charles C. Onu, Jonathan Lebensold, William L. Hamilton, Doina Precup:
Neural Transfer Learning for Cry-Based Diagnosis of Perinatal Asphyxia. 3053-3057 - Hui-Ting Hong, Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee:
Investigating the Variability of Voice Quality and Pain Levels as a Function of Multiple Clinical Parameters. 3058-3062 - José Vicente Egas López, Juan Rafael Orozco-Arroyave, Gábor Gosztolya:
Assessing Parkinson's Disease from Speech Using Fisher Vectors. 3063-3067 - Philipp Klumpp, Juan Camilo Vásquez-Correa, Tino Haderlein, Elmar Nöth:
Feature Space Visualization with Spatial Similarity Maps for Pathological Speech Data. 3068-3072 - Sandeep Nallan Chakravarthula, Haoqi Li, Shao-Yen Tseng, Maija Reblin, Panayiotis G. Georgiou:
Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions Using Speech and Language. 3073-3077 - Ying Qin, Tan Lee, Anthony Pak-Hin Kong:
Automatic Assessment of Language Impairment Based on Raw ASR Output. 3078-3082
Speech Perception in Adverse Listening Conditions
- Zhen Fu, Xihong Wu, Jing Chen:
Effects of Spectral and Temporal Cues to Mandarin Concurrent-Vowels Identification for Normal-Hearing and Hearing-Impaired Listeners. 3083-3087 - Vicky Zayats, Trang Tran, Richard A. Wright, Courtney Mansfield, Mari Ostendorf:
Disfluencies and Human Speech Transcription Errors. 3088-3092 - Sandra I. Parhammer, Miriam Ebersberg, Jenny Tippmann, Katja Stärk, Andreas Opitz, Barbara Hinger, Sonja Rossi:
The Influence of Distraction on Speech Processing: How Selective is Selective Attention? 3093-3097 - Valérie Hazan, Outi Tuomainen, Linda Taschenberger:
Subjective Evaluation of Communicative Effort for Younger and Older Adults in Interactive Tasks with Energetic and Informational Masking. 3098-3102 - Chris Davis, Jeesun Kim:
Perceiving Older Adults Producing Clear and Lombard Speech. 3103-3107 - Tomás Arias-Vergara, Juan Rafael Orozco-Arroyave, Milos Cernak, Sandra Gollwitzer, Maria Schuster, Elmar Nöth:
Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users. 3108-3112 - Nao Hodoshima:
Effects of Urgent Speech and Congruent/Incongruent Text on Speech Intelligibility in Noise and Reverberation. 3113-3117 - Nursadul Mamun, Ria Ghosh, John H. L. Hansen:
Quantifying Cochlear Implant Users' Ability for Speaker Identification Using CI Auditory Stimuli. 3118-3122 - E. Felker, Mirjam Ernestus, Mirjam Broersma:
Lexically Guided Perceptual Learning of a Vowel Shift in an Interactive L2 Listening Context. 3123-3127 - Maximillian Paulus, Valérie Hazan, Patti Adank:
Talker Intelligibility and Listening Effort with Temporally Modified Speech. 3128-3132 - Lauren Ward, Catherine Robinson, Matthew Paradis, Katherine M. Tucker, Ben G. Shirley:
R2SPIN: Re-Recording the Revised Speech Perception in Noise Test. 3133-3137 - Fei Chen:
Contributions of Consonant-Vowel Transitions to Mandarin Tone Identification in Simulated Electric-Acoustic Hearing. 3138-3142
Speech Enhancement: Single Channel 1
- Shadi Pirhosseinloo, Jonathan S. Brumberg:
Monaural Speech Enhancement with Dilated Convolutions. 3143-3147 - Chien-Feng Liao, Yu Tsao, Hung-yi Lee, Hsin-Min Wang:
Noise Adaptive Speech Enhancement Using Domain Adversarial Training. 3148-3152 - Meng Ge, Longbiao Wang, Nan Li, Hao Shi, Jianwu Dang, Xiangang Li:
Environment-Dependent Attention-Driven Recurrent Convolutional Neural Network for Robust Speech Enhancement. 3153-3157 - Manuel Pariente, Antoine Deleforge, Emmanuel Vincent:
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement Using Variational Autoencoders. 3158-3162 - Ju Lin, Sufeng Niu, Zice Wei, Xiang Lan, Adriaan J. de Lind van Wijngaarden, Melissa C. Smith, Kuang-Ching Wang:
Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction. 3163-3167 - Ryandhimas E. Zezario, Szu-Wei Fu, Xugang Lu, Hsin-Min Wang, Yu Tsao:
Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric. 3168-3172 - Fu-Kai Chuang, Syu-Siang Wang, Jeih-weih Hung, Yu Tsao, Shih-Hau Fang:
Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement. 3173-3177 - Yun Liu, Hui Zhang, Xueliang Zhang, Yuhang Cao:
Investigation of Cost Function for Supervised Monaural Speech Separation. 3178-3182 - Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Jiqing Han, Anyan Shi:
Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation. 3183-3187 - Xianyun Wang, Changchun Bao:
Masking Estimation with Phase Restoration of Clean Speech for Monaural Speech Enhancement. 3188-3192 - Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega Giménez, Eduardo Lleida:
Progressive Speech Enhancement with Residual Connections. 3193-3197
Speech Recognition and Beyond
- Langzhou Chen, Volker Leutnant:
Acoustic Model Bootstrapping Using Semi-Supervised Learning. 3198-3202 - Gautam Mantena, Ozlem Kalinli, Ossama Abdel-Hamid, Don McAllaster:
Bandwidth Embeddings for Mixed-Bandwidth Speech Recognition. 3203-3207 - Shreya Khare, Rahul Aralikatte, Senthil Mani:
Adversarial Black-Box Attacks on Automatic Speech Recognition Systems Using Multi-Objective Evolutionary Optimization. 3208-3212 - Bilal Soomro, Anssi Kanervisto, Trung Ngo Trong, Ville Hautamäki:
Towards Debugging Deep Neural Networks by Generating Speech Utterances. 3213-3217 - Haisong Ding, Kai Chen, Qiang Huo:
Compression of CTC-Trained Acoustic Models by Dynamic Frame-Wise Distillation or Segment-Wise N-Best Hypotheses Imitation. 3218-3222 - Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen:
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers. 3223-3227 - Mortaza Doulaty, Thomas Hain:
Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition. 3228-3232 - Wenjie Li, Pengyuan Zhang, Yonghong Yan:
Target Speaker Recovery and Recognition Network with Average x-Vector and Global Training. 3233-3237 - Motoyuki Suzuki, Sho Tomita, Tomoki Morita:
Lyrics Recognition from Singing Voice Focused on Correspondence Between Voice and Notes. 3238-3241 - Wei-Ning Hsu, David Harwath, James R. Glass:
Transfer Learning from Audio-Visual Grounding to Speech Recognition. 3242-3246
Emotion Modeling and Analysis
- Hui Luo, Jiqing Han:
Cross-Corpus Speech Emotion Recognition Using Semi-Supervised Transfer Non-Negative Matrix Factorization with Adaptation Regularization. 3247-3251 - Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi:
Modeling User Context for Valence Prediction from Narratives. 3252-3256 - Rupayan Chakraborty, Ashish Panda, Meghna Pandharipande, Sonal Joshi, Sunil Kumar Kopparapu:
Front-End Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition. 3257-3261 - Xingfeng Li, Masato Akagi:
The Contribution of Acoustic Features Analysis to Model Emotion Perceptual Process for Language Diversity. 3262-3266 - Rajeev Rajan, Haritha U. G., Sujitha A. C., Rejisha T. M.:
Design and Development of a Multi-Lingual Speech Corpora (TaMaR-EmoDB) for Emotion Analysis. 3267-3271 - Kusha Sridhar, Carlos Busso:
Speech Emotion Recognition with a Reject Option. 3272-3276 - Zhenghao Jin, Houwei Cao:
Development of Emotion Rankers Based on Intended and Perceived Emotion Labels. 3277-3281 - John Gideon, Heather T. Schatten, Melvin G. McInnis, Emily Mower Provost:
Emotion Recognition from Natural Phone Conversations in Individuals with and without Recent Suicidal Ideation. 3282-3286 - Deniece S. Nazareth, Ellen Tournier, Sarah Leimkötter, Esther Janse, Dirk Heylen, Gerben J. Westerhof, Khiet P. Truong:
An Acoustic and Lexical Analysis of Emotional Valence in Spontaneous Speech: Autobiographical Memory Recall in Older Adults. 3287-3291 - Yi Zhao, Atsushi Ando, Shinji Takaki, Junichi Yamagishi, Satoshi Kobashikawa:
Does the Lombard Effect Improve Emotional Communication in Noise? - Analysis of Emotional Speech Acted in Noise. 3292-3296 - Soumaya Gharsellaoui, Sid-Ahmed Selouani, Mohammed Sidi Yakoub:
Linear Discriminant Differential Evolution for Feature Selection in Emotional Speech Recognition. 3297-3301 - Saurabh Sahu, Vikramjit Mitra, Nadee Seneviratne, Carol Y. Espy-Wilson:
Multi-Modal Learning for Speech Emotion Recognition: An Analysis and Comparison of ASR Outputs with Ground Truth Transcription. 3302-3306
Articulatory Phonetics
- Laura Spinu, Maida Percival, Alexei Kochetov:
Articulatory Characteristics of Secondary Palatalization in Romanian Fricatives. 3307-3311 - Louise Ratko, Michael I. Proctor, Felicity Cox:
Articulation of Vowel Length Contrasts in Australian English. 3312-3316 - Andrea Deme, Márton Bartók, Tekla Etelka Gráczi, Tamás Gábor Csapó, Alexandra Markó:
V-to-V Coarticulation Induced Acoustic and Articulatory Variability of Vowels: The Effect of Pitch-Accent. 3317-3321 - Hannah King, Emmanuel Ferragne:
The Contribution of Lip Protrusion to Anglo-English /r/: Evidence from Hyper- and Non-Hyperarticulated Speech. 3322-3326 - Alexandra Markó, Márton Bartók, Tamás Gábor Csapó, Tekla Etelka Gráczi, Andrea Deme:
Articulatory Analysis of Transparent Vowel /iː/ in Harmonic and Antiharmonic Hungarian Stems: Is There a Difference? 3327-3331 - Conceição Cunha, Samuel S. Silva, António J. S. Teixeira, Catarina Oliveira, Paula Martins, Arun A. Joseph, Jens Frahm:
On the Role of Oral Configurations in European Portuguese Nasal Vowels. 3332-3336
Speech and Audio Classification 2
- Yan Xiong, Visar Berisha, Chaitali Chakrabarti:
Residual + Capsule Networks (ResCap) for Simultaneous Single-Channel Overlapped Keyword Recognition. 3337-3341 - Che-Wei Huang, Roland Maas, Sri Harish Mallidi, Björn Hoffmeister:
A Study for Improving Device-Directed Speech Detection Toward Frictionless Human-Machine Interaction. 3342-3346 - Hang Su, Borislav Dzodzo, Xixin Wu, Xunying Liu, Helen Meng:
Unsupervised Methods for Audio Classification from Lecture Discussion Recordings. 3347-3351 - Takanori Ashihara, Yusuke Shinohara, Hiroshi Sato, Takafumi Moriya, Kiyoaki Matsui, Takaaki Fukutomi, Yoshikazu Yamaguchi, Yushi Aono:
Neural Whispered Speech Detection with Imbalanced Learning. 3352-3356 - Christian Bergler, Manuel Schmitt, Rachael Xi Cheng, Andreas K. Maier, Volker Barth, Elmar Nöth:
Deep Learning for Orca Call Type Identification - A Fully Unsupervised Approach. 3357-3361 - Niccolò Sacchi, Alexandre Nanchen, Martin Jaggi, Milos Cernak:
Open-Vocabulary Keyword Spotting with Audio and Text Embeddings. 3362-3366 - Qiang Gao, Shutao Sun, Yaping Yang:
ToneNet: A CNN Model of Tone Classification of Mandarin Chinese. 3367-3371 - Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha:
Temporal Convolution for Real-Time Keyword Spotting on Mobile Devices. 3372-3376 - Zhiying Huang, Shiliang Zhang, Ming Lei:
Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation. 3377-3381 - Hansi Yang, Wei-Qiang Zhang:
Music Genre Classification Using Duplicated Convolutional Layers in Neural Networks. 3382-3386 - Nehory Carmi, Azaria Cohen, Mireille Avigal, Anat Lerner:
A Storyteller's Tale: Literature Audiobooks Genre Classification Using CNN and RNN Architectures. 3387-3390
Speech Coding and Evaluation
- Min-Jae Hwang, Hong-Goo Kang:
Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment. 3391-3395 - Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim:
Cascaded Cross-Module Residual Learning Towards Lightweight End-to-End Speech Coding. 3396-3400 - Tom Bäckström:
End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework. 3401-3405 - Jean-Marc Valin, Jan Skoglund:
A Real-Time Wideband Neural Vocoder at 1.6kb/s Using LPCNet. 3406-3410 - Guillaume Fuchs, Chamran Ashour, Tom Bäckström:
Super-Wideband Spectral Envelope Modeling for Speech Coding. 3411-3415 - Xinyu Li, Venkata Chebiyyam, Katrin Kirchhoff:
Speech Audio Super-Resolution for Speech Recognition. 3416-3420 - Deepika Gupta, Hanumant Singh Shekhawat:
Artificial Bandwidth Extension Using H∞ Optimization. 3421-3425 - Gabriel Mittag, Sebastian Möller:
Quality Degradation Diagnosis for Voice Networks - Estimating the Perceived Noisiness, Coloration, and Discontinuity of Transmitted Speech. 3426-3430 - Li Chai, Jun Du, Chin-Hui Lee:
A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition. 3431-3435 - Sebastian Möller, Gabriel Mittag, Thilo Michael, Vincent Barriac, Hitoshi Aoki:
Extending the E-Model Towards Super-Wideband and Fullband Speech Communication Scenarios. 3436-3440
Feature Extraction for ASR
- Samik Sadhu, Hynek Hermansky:
Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions. 3441-3445 - Chenda Li, Yanmin Qian:
Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech. 3446-3450 - Purvi Agrawal, Sriram Ganapathy:
Unsupervised Raw Waveform Representation Learning for ASR. 3451-3455 - David B. Ramsay, Kevin Kilgour, Dominik Roblek, Matthew Sharifi:
Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition. 3456-3459 - Alexandre Riviello, Jean-Pierre David:
Binary Speech Features for Keyword Spotting Tasks. 3460-3464 - Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli:
wav2vec: Unsupervised Pre-Training for Speech Recognition. 3465-3469 - Sunghye Cho, Mark Liberman, Yong-cheol Lee:
Automatic Detection of Prosodic Focus in American English. 3470-3474 - Raghav Menon, Herman Kamper, Ewald van der Westhuizen, John A. Quinn, Thomas Niesler:
Feature Exploration for Almost Zero-Resource ASR-Free Keyword Spotting Using a Multilingual Bottleneck Extractor and Correspondence Autoencoders. 3475-3479 - Erfan Loweimi, Peter Bell, Steve Renals:
On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters. 3480-3484
Lexicon and Language Model for Speech Recognition
- Lyan Verwimp, Jerome R. Bellegarda:
Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models? 3485-3489 - Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen:
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR. 3490-3494 - Chang Liu, Zhen Zhang, Pengyuan Zhang, Yonghong Yan:
Character-Aware Sub-Word Level Language Modeling for Uyghur and Turkish ASR. 3495-3499 - Ernest Pusateri, Christophe Van Gysel, Rami Botros, Sameer Badaskar, Mirko Hannemann, Youssef Oualil, Ilya Oparin:
Connecting and Comparing Language Model Interpolation Techniques. 3500-3504 - Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Haihua Xu, Eng Siong Chng:
Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation. 3505-3509 - Jianwei Yu, Max W. Y. Lam, Shoukang Hu, Xixin Wu, Xu Li, Yuewen Cao, Xunying Liu, Helen Meng:
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models. 3510-3514 - Wiehan Agenbag, Thomas Niesler:
Improving Automatically Induced Lexicons for Highly Agglutinating Languages Using Data-Driven Morphological Segmentation. 3515-3519 - Alejandro Coucheiro-Limeres, Fernando Fernández Martínez, Rubén San Segundo, Javier Ferreiros López:
Attention-Based Word Vector Prediction with LSTMs and its Application to the OOV Problem in ASR. 3520-3524 - Yingying Gao, Junlan Feng, Ying Liu, Leijing Hou, Xin Pan, Yong Ma:
Code-Switching Sentence Generation by Bert and Generative Adversarial Networks. 3525-3529 - Sandy Ritchie, Richard Sproat, Kyle Gorman, Daan van Esch, Christian Schallhart, Nikos Bampounis, Benoît Brard, Jonas Fromseier Mortensen, Millie Holt, Eoin Mahon:
Unified Verbalization for Speech Recognition & Synthesis Across Languages. 3530-3534 - Dravyansh Sharma, Melissa Wilson, Antoine Bruguier:
Better Morphology Prediction for Better Speech Systems. 3535-3539
First and Second Language Acquisition
- Anke Sennema, Silke Hamann:
Vietnamese Learners Tackling the German /ʃt/ in Perception. 3540-3543 - Scott Lewis, Adib Mehrabi, Esther de Leeuw:
An Articulatory-Acoustic Investigation into GOOSE-Fronting in German-English Bilinguals Residing in London, UK. 3544-3548 - Sabrina Jenne, Ngoc Thang Vu:
Multimodal Articulation-Based Pronunciation Error Detection with Spectrogram and Acoustic Features. 3549-3553 - Anouschka Foltz, Sarah Cooper, Tamsin M. McKelvey:
Using Prosody to Discover Word Order Alternations in a Novel Language. 3554-3558 - Ann R. Bradlow:
Speaking Rate, Information Density, and Information Rate in First-Language and Second-Language Speech. 3559-3563 - Calbert Graham, Francis Nolan:
Articulation Rate as a Metric in Spoken Language Assessment. 3564-3568 - Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, Xiangang Li:
Learning Alignment for Multimodal Emotion Recognition from Speech. 3569-3573 - Sharon Peperkamp, Monica Hegde, Maria Julia Carbajal:
Liquid Deletion in French Child-Directed Speech. 3574-3578 - Amanda Seidl, Anne S. Warlaumont, Alejandrina Cristià:
Towards Detection of Canonical Babbling by Citizen Scientists: Performance as a Function of Clip Length. 3579-3583 - Bogdan Ludusan, Annett Jorschick, Reiko Mazuka:
Nasal Consonant Discrimination in Infant- and Adult-Directed Speech. 3584-3588 - Ellen Marklund, Johan Sjons, Lisa Gustavsson, Elísabet Eir Cortes:
No Distributional Learning in Adults from Attended Listening to Non-Speech. 3589-3593 - Okko Räsänen, Khazar Khorrami:
A Computational Model of Early Language Acquisition from Audiovisual Experiences of Young Infants. 3594-3598 - Dan Du, Jinsong Zhang:
The Production of Chinese Affricates /ts/ and /tsh/ by Native Urdu Speakers. 3599-3603
Speech and Audio Classification 3
- Xinyu Li, Venkata Chebiyyam, Katrin Kirchhoff:
Multi-Stream Network with Temporal Attention for Environmental Sound Classification. 3604-3608 - Gianmarco Cerutti, Rahul Prasad, Alessio Brutti, Elisabetta Farella:
Neural Network Distillation on IoT Platforms for Sound Event Detection. 3609-3613 - Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai:
Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection. 3614-3618 - Xue Bai, Jun Du, Zi-Rui Wang, Chin-Hui Lee:
A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models. 3619-3623 - Ke-Xin He, Yu-Han Shen, Wei-Qiang Zhang:
Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection. 3624-3628 - Wei Xia, Kazuhito Koishida:
Sound Event Detection in Multichannel Audio Using Convolutional Time-Frequency-Channel Squeeze and Excitation. 3629-3633 - Lam Dang Pham, Ian McLoughlin, Huy Phan, Ramaswamy Palaniappan:
A Robust Framework for Acoustic Scene Classification. 3634-3638 - Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang:
Compression of Acoustic Event Detection Models with Quantized Distillation. 3639-3643 - Jiaxu Chen, Jing Hao, Kai Chen, Di Xie, Shicai Yang, Shiliang Pu:
An End-to-End Audio Classification System Based on Raw Waveforms and Mix-Training Strategy. 3644-3648 - Shilei Zhang, Yong Qin, Kewei Sun, Yonghua Lin:
Few-Shot Audio Classification with Attentional Graph Neural Networks. 3649-3653 - Kangkang Lu, Chuan-Sheng Foo, Kah Kuan Teh, Huy Dat Tran, Vijay Ramaseshan Chandrasekhar:
Semi-Supervised Audio Classification with Consistency-Based Regularization. 3654-3658
Speech and Speaker Recognition
- Jan Mizgajski, Adrian Szymczak, Robert Glowski, Piotr Szymanski, Piotr Zelasko, Lukasz Augustyniak, Mikolaj Morzy, Yishay Carmiel, Jeff Hodson, Lukasz Wójciak, Daniel Smoczyk, Adam Wróbel, Bartosz Borowik, Adam Artajew, Marcin Baran, Cezary Kwiatkowski, Marzena Zyla-Hoppe:
Avaya Conversational Intelligence: A Real-Time System for Spoken Language Understanding in Human-Human Call Center Conversations. 3659-3660 - Shounan An, Youngsoo Kim, Hu Xu, Jinwoo Lee, Myungwoo Lee, Insoo Oh:
Robust Keyword Spotting via Recycle-Pooling for Mobile Game. 3661-3662 - Adam Chýlek, Lubos Smídl, Jan Svec:
Multimodal Dialog with the MALACH Audiovisual Archive. 3663-3664 - Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha:
SpeechMarker: A Voice Based Multi-Level Attendance Application. 3665-3666 - Jibin Wu, Zihan Pan, Malu Zhang, Rohan Kumar Das, Yansong Chua, Haizhou Li:
Robust Sound Recognition: A Neuromorphic Approach. 3667-3668 - Shoukang Hu, Shansong Liu, Heng Fai Chang, Mengzhe Geng, Jiani Chen, Lau Wing Chung, To Ka Hei, Jianwei Yu, Ka Ho Wong, Xunying Liu, Helen Meng:
The CUHK Dysarthric Speech Recognition Systems for English and Cantonese. 3669-3670
Speech Annotation and Labelling
- Florian Schiel, Thomas Kisler:
BAS Web Services for Automatic Subtitle Creation and Anonymization. 3671-3672 - Jana Voße, Petra Wagner:
A User-Friendly and Adaptable Re-Implementation of an Acoustic Prominence Detection and Annotation Tool. 3673-3674 - Mónica Domínguez, Patrick Louis Rohrer, Juan Soler Company:
PyToBI: A Toolkit for ToBI Labeling Under Python. 3675-3676 - Golan Levy, Raquel Sitman, Ido Amir, Eduard Golshtein, Ran Mochary, Eilon Reshef, Roi Reichart, Omri Allouche:
GECKO - A Tool for Effective Annotation of Human Conversations. 3677-3678 - Roger Yu-Hsiang Lo, Kathleen Currie Hall:
SLP-AA: Tools for Sign Language Phonetic and Phonological Research. 3679-3680 - Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze:
SANTLR: Speech Annotation Toolkit for Low Resource Languages. 3681-3682
Speech Synthesis
- Martin Gruber, Jakub Vít, Jindrich Matousek:
Web-Based Speech Synthesis Editor. 3683-3684 - Olivier Perrotin, Ian McLoughlin:
GFM-Voc: A Real-Time Voice Quality Modification System. 3685-3686 - Éva Székely, Gustav Eje Henter, Jonas Beskow, Joakim Gustafson:
Off the Cuff: Exploring Extemporaneous Speech Delivery with TTS. 3687-3688 - Lucas Kessler, Cecilia Ovesdotter Alm, Reynold Bailey:
Synthesized Spoken Names: Biases Impacting Perception. 3689-3690 - Luís Bernardo, Mathieu Giquel, Sebastião Quintas, Paulo Dimas, Helena Moniz, Isabel Trancoso:
Unbabel Talk - Human Verified Translations for Voice Instant Messaging. 3691-3692 - Azam Rabiee, Tae-Ho Kim, Soo-Young Lee:
Adjusting Pleasure-Arousal-Dominance for Continuous Emotional Text-to-Speech Synthesizer. 3693-3694
Keynote 4: Mirella Lapata
- Mirella Lapata:
Learning Natural Language Interfaces with Neural Models.
Privacy in Speech and Audio Interfaces
- Andreas Nautsch, Catherine Jasserand, Els Kindt, Massimiliano Todisco, Isabel Trancoso, Nicholas W. D. Evans:
The GDPR & Speech Data: Reflections of Legal and Technology Communities, First Steps Towards a Common Understanding. 3695-3699 - Brij Mohan Lal Srivastava, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent:
Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion? 3700-3704 - Alexandru Nelus, Silas Rech, Timm Koppelmann, Henrik Biermann, Rainer Martin:
Privacy-Preserving Siamese Feature Extraction for Gender Recognition versus Speaker Identification. 3705-3709 - Alexandru Nelus, Janek Ebbers, Reinhold Haeb-Umbach, Rainer Martin:
Privacy-Preserving Variational Information Feature Extraction for Domestic Activity Monitoring versus Speaker Identification. 3710-3714 - Patricia Thaine, Gerald Penn:
Extracting Mel-Frequency and Bark-Frequency Cepstral Coefficients from Encrypted Signals. 3715-3719 - Pablo Pérez Zarazaga, Sneha Das, Tom Bäckström, Vishnu Vidyadhara Raju Vegesna, Anil Kumar Vuppala:
Sound Privacy: A Conversational Speech Corpus for Quantifying the Experience of Privacy. 3720-3724
Speech Technologies for Code-Switching in Multilingual Communities
- Victor Soto, Julia Hirschberg:
Improving Code-Switched Language Modeling Performance Using Cognate Features. 3725-3729 - Grandee Lee, Xianghu Yue, Haizhou Li:
Linguistically Motivated Parallel Data Augmentation for Code-Switch Language Modeling. 3730-3734 - Sai Krishna Rallabandi, Alan W. Black:
Variational Attention Using Articulatory Priors for Generating Code Mixed Speech Using Monolingual Corpora. 3735-3739 - Qinyi Wang, Emre Yilmaz, Adem Derinel, Haizhou Li:
Code-Switching Detection Using ASR-Generated Language Posteriors. 3740-3744 - Astik Biswas, Emre Yilmaz, Febe de Wet, Ewald van der Westhuizen, Thomas Niesler:
Semi-Supervised Acoustic Model Training for Five-Lingual Code-Switched ASR. 3745-3749 - Emre Yilmaz, Samuel Cohen, Xianghu Yue, David A. van Leeuwen, Haizhou Li:
Multi-Graph Decoding for Code-Switching ASR. 3750-3754 - Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey:
End-to-End Multilingual Multi-Speaker Speech Recognition. 3755-3759
Speech Synthesis: Articulatory and Physical Approaches
- Oriol Guasch:
Survey Talk: Realistic Physics-Based Computational Voice Production. - Debasish Ray Mohapatra, Victor Zappi, Sidney S. Fels:
An Extended Two-Dimensional Vocal Tract Model for Fast Acoustic Simulation of Single-Axis Symmetric Three-Dimensional Tubes. 3760-3764 - Peter Birkholz, Susanne Drechsel, Simon Stone:
Perceptual Optimization of an Enhanced Geometric Vocal Fold Model for Articulatory Speech Synthesis. 3765-3769 - Yingming Gao, Simon Stone, Peter Birkholz:
Articulatory Copy Synthesis Based on a Genetic Algorithm. 3770-3774 - Abdolreza Sabzi Shahrebabaki, Negar Olfati, Ali Shariq Imran, Sabato Marco Siniscalchi, Torbjørn Svendsen:
A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion. 3775-3779
Sequence-to-Sequence Speech Recognition
- Zoltán Tüske, Kartik Audhkhasi, George Saon:
Advancing Sequence-to-Sequence Based Speech Recognition. 3780-3784 - Awni Y. Hannun, Ann Lee, Qiantong Xu, Ronan Collobert:
Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions. 3785-3789 - Murali Karthick Baskar, Shinji Watanabe, Ramón Fernandez Astudillo, Takaaki Hori, Lukás Burget, Jan Cernocký:
Semi-Supervised Sequence-to-Sequence ASR Using Unpaired Speech and Text. 3790-3794 - Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen:
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition. 3795-3799 - Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen:
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition. 3800-3804 - Felix Weninger, Jesús Andrés-Ferrer, Xinwei Li, Puming Zhan:
Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR. 3805-3809
Search Methods for Speech Recognition
- Anna V. Rúnarsdóttir, Inga Rún Helgadóttir, Jón Guðnason:
Lattice Re-Scoring During Manual Editing for Automatic Error Correction of ASR Transcripts. 3810-3814 - Daisuke Fukunaga, Yoshiki Tanaka, Yuichi Kageyama:
GPU-Based WFST Decoding with Extra Large Language Model. 3815-3819 - Javier Jorge, Adrià Giménez, Javier Iranzo-Sánchez, Jorge Civera, Albert Sanchís, Alfons Juan:
Real-Time One-Pass Decoder for Speech Recognition Using LSTM Language Models. 3820-3824 - Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Niko Moritz, Jonathan Le Roux:
Vectorized Beam Search for CTC-Attention-Based Speech Recognition. 3825-3829 - Jack Serrino, Leonid Velikovich, Petar S. Aleksic, Cyril Allauzen:
Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition. 3830-3834 - Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition. 3835-3839
Audio Signal Characterization
- Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang:
Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition. 3840-3844 - Huy Phan, Oliver Y. Chén, Lam Dang Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, Alfred Mertins:
Spatio-Temporal Attention Pooling for Audio Scene Classification. 3845-3849 - Qiuying Shi, Hui Luo, Jiqing Han:
Subspace Pooling Based Temporal Features Extraction for Audio Event Recognition. 3850-3854 - Jingyang Zhang, Wenhao Ding, Jintao Kang, Liang He:
Multi-Scale Time-Frequency Attention for Acoustic Event Detection. 3855-3859 - Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du:
Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events. 3860-3864 - Xiaoke Qi, Lu Wang:
Parameter-Transfer Learning for Low-Resource Individualization of Head-Related Transfer Functions. 3865-3869
Speech and Voice Disorders 1
- Lei Liu, Meng Jian, Wentao Gu:
Prosodic Characteristics of Mandarin Declarative and Interrogative Utterances in Parkinson's Disease. 3870-3874 - Laureano Moro-Velázquez, Jaejin Cho, Shinji Watanabe, Mark A. Hasegawa-Johnson, Odette Scharenborg, Heejin Kim, Najim Dehak:
Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson's Disease. 3875-3879 - Tianqi Wang, Chongyuan Lian, Jingshen Pan, Quanlei Yan, Feiqi Zhu, Manwa L. Ng, Lan Wang, Nan Yan:
Towards the Speech Features of Mild Cognitive Impairment: Universal Evidence from Structured and Unstructured Connected Speech of Chinese. 3880-3884 - Jiarui Wang, Ying Qin, Zhiyuan Peng, Tan Lee:
Child Speech Disorder Detection with Siamese Recurrent Network Using Speech Attribute Features. 3885-3889 - Daniel Korzekwa, Roberto Barra-Chicote, Bozena Kostek, Thomas Drugman, Mateusz Lajszczak:
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech. 3890-3894 - Camille Noufi, Adam C. Lammert, Daryush D. Mehta, James R. Williamson, Gregory A. Ciccarelli, Douglas E. Sturim, Jordan R. Green, Thomas F. Campbell, Thomas F. Quatieri:
Vocal Biomarker Assessment Following Pediatric Traumatic Brain Injury: A Retrospective Cohort Study. 3895-3899
Neural Networks for Language Modeling
- Odette Scharenborg:
Survey Talk: Reaching Over the Gap: Cross- and Interdisciplinary Research on Human and Automatic Speech Processing. - Atsunori Ogawa, Marc Delcroix, Shigeki Karita, Tomohiro Nakatani:
Improved Deep Duel Model for Rescoring N-Best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders. 3900-3904 - Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney:
Language Modeling with Deep Transformers. 3905-3909 - Anirudh Raju, Denis Filimonov, Gautam Tiwari, Guitang Lan, Ariya Rastrow:
Scalable Multi Corpora Neural Language Models for ASR. 3910-3914 - Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert:
Who Needs Words? Lexicon-Free Speech Recognition. 3915-3919
Representation Learning of Emotion and Paralinguistics
- Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps:
Direct Modelling of Speech Emotion from Raw Speech. 3920-3924 - Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma, Najim Dehak:
Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN. 3925-3929 - Miao Cao, Chun Yang, Fang Zhou, Xu-Cheng Yin:
Pyramid Memory Block and Timestep Attention for Speech Emotion Recognition. 3930-3934 - Christopher Oates, Andreas Triantafyllopoulos, Ingmar Steiner, Björn W. Schuller:
Robust Speech Emotion Recognition Under Different Encoding Conditions. 3935-3939 - Gábor Gosztolya:
Using the Bag-of-Audio-Word Feature Representation of ASR DNN Posteriors for Paralinguistic Classification. 3940-3944 - Jennifer Williams, Simon King:
Disentangling Style Factors from Speaker Representations. 3945-3949
World’s Languages and Varieties
- Yu-Yin Hsu, Anqi Xu:
Sentence Prosody and Wh-Indeterminates in Taiwan Mandarin. 3950-3954 - Fang Hu, Youjue He:
Frication as a Vowel Feature? - Evidence from the Rui'an Wu Chinese Dialect. 3955-3959 - Zhenrui Zhang, Fang Hu:
Vowels and Diphthongs in the Xupu Xiang Chinese Dialect. 3960-3964 - Luciana Albuquerque, Catarina Oliveira, António J. S. Teixeira, Pedro Sá-Couto, Daniela Figueiredo:
Age-Related Changes in European Portuguese Vowel Acoustics. 3965-3969 - Wendy Lalhminghlui, Viyazonuo Terhiija, Priyankoo Sarmah:
Vowel-Tone Interaction in Two Tibeto-Burman Languages. 3970-3974 - Jenifer Vega Rodríguez:
The Vowel System of Korebaju. 3975-3979
Adaptation and Accommodation in Conversation
- Omnia Ibrahim, Gabriel Skantze, Sabine Stoll, Volker Dellwo:
Fundamental Frequency Accommodation in Multi-Party Human-Robot Game Interactions: The Effect of Winning or Losing. 3980-3984 - Petra Wagner, Nataliya Bryhadyr, Marin Schröer:
Pitch Accent Trajectories Across Different Conditions of Visibility and Information Structure - Evidence from Spontaneous Dyadic Interaction. 3985-3989 - Simon Betz, Sina Zarrieß, Éva Székely, Petra Wagner:
The Greennn Tree - Lengthening Position Influences Uncertainty Perception. 3990-3994 - Yuke Si, Longbiao Wang, Jianwu Dang, Mengfei Wu, Aijun Li:
CNN-BLSTM Based Question Detection from Dialogs Considering Phase and Context Information. 3995-3999 - Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff:
Mirroring to Build Trust in Digital Assistants. 4000-4004 - Eran Raveh, Ingo Siegert, Ingmar Steiner, Iona Gessinger, Bernd Möbius:
Three's a Crowd? Effects of a Second Human on Vocal Accommodation with a Voice Assistant. 4005-4009
Speaker and Language Recognition 2
- Qing Wang, Pengcheng Guo, Sining Sun, Lei Xie, John H. L. Hansen:
Adversarial Regularization for End-to-End Robust Speaker Verification. 4010-4014 - João Monteiro, Jahangir Alam, Tiago H. Falk:
Combining Speaker Recognition and Metric Learning for Speaker-Dependent Representation Learning. 4015-4019 - Yang Zhang, Lantian Li, Dong Wang:
VAE-Based Regularization for Deep Speaker Embedding. 4020-4024 - Victoria Mingote, Diego Castán, Mitchell McLaren, Mahesh Kumar Nandwana, Alfonso Ortega Giménez, Eduardo Lleida, Antonio Miguel:
Language Recognition Using Triplet Neural Networks. 4025-4029 - Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim:
Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification. 4030-4034 - Hee-Soo Heo, Jee-weon Jung, Il-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu:
End-to-End Losses Based on Speaker Basis Vectors and All-Speaker Hard Negative Mining for Speaker Verification. 4035-4039 - Yiheng Jiang, Yan Song, Ian McLoughlin, Zhifu Gao, Li-Rong Dai:
An Effective Deep Embedding Learning Architecture for Speaker Verification. 4040-4044 - Xiaoyi Qin, Danwei Cai, Ming Li:
Far-Field End-to-End Text-Dependent Speaker Verification Based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation. 4045-4049 - Zongze Ren, Guofu Yang, Shugong Xu:
Two-Stage Training for Chinese Dialect Recognition. 4050-4054 - Ryota Kaminishi, Haruna Miyamoto, Sayaka Shiota, Hitoshi Kiya:
Investigation on Blind Bandwidth Extension with a Non-Linear Function and its Evaluation of x-Vector-Based Speaker Verification. 4055-4059 - Umair Khan, Miquel India, Javier Hernando:
Auto-Encoding Nearest Neighbor i-Vectors for Speaker Verification. 4060-4064 - Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei:
Towards a Fault-Tolerant Speaker Verification System: A Regularization Approach to Reduce the Condition Number. 4065-4069 - Hassan Taherian, Zhong-Qiu Wang, DeLiang Wang:
Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments. 4070-4074 - Joon-Young Yang, Joon-Hyuk Chang:
Joint Optimization of Neural Acoustic Beamforming and Dereverberation with x-Vectors for Robust Speaker Verification. 4075-4079 - Xiaoxiao Miao, Ian McLoughlin, Yonghong Yan:
A New Time-Frequency Attention Mechanism for TDNN and CNN-LSTM-TDNN, with Application to Language Identification. 4080-4084
Medical Applications and Visual ASR
- Jun Chen, Ji Zhu, Jieping Ye:
An Attention-Based Hybrid Network for Automatic Detection of Alzheimer's Disease from Narrative Speech. 4085-4089 - Pingchuan Ma, Stavros Petridis, Maja Pantic:
Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition. 4090-4094 - Jasper Ooster, Pia Nancy Porysek Moreta, Jörg-Hendrik Bach, Inga Holube, Bernd T. Meyer:
"Computer, Test My Hearing": Accurate Speech Audiometry with Smart Speakers. 4095-4099 - Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals:
Synchronising Audio and Ultrasound by Learning Cross-Modal Embeddings. 4100-4104 - Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen:
Automatic Hierarchical Attention Neural Network for Detecting AD. 4105-4109 - Venkata Srikanth Nallanthighal, Aki Härmä, Helmer Strik:
Deep Sensing of Breathing Signal During Conversational Speech. 4110-4114 - Fadi Biadsy, Ron J. Weiss, Pedro J. Moreno, Dimitri Kanevsky, Ye Jia:
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation. 4115-4119 - Shansong Liu, Shoukang Hu, Yi Wang, Jianwei Yu, Rongfeng Su, Xunying Liu, Helen Meng:
Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition. 4120-4124 - Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Maja Pantic:
Video-Driven Speech Reconstruction Using Generative Adversarial Networks. 4125-4129 - Shansong Liu, Shoukang Hu, Xunying Liu, Helen Meng:
On the Use of Pitch Features for Disordered Speech Recognition. 4130-4134 - Brendan Shillingford, Yannis M. Assael, Matthew W. Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Misha Denil, Ben Coppin, Ben Laurie, Andrew W. Senior, Nando de Freitas:
Large-Scale Visual Speech Recognition. 4135-4139
Turn Management in Dialogue
- Seyedeh Zahra Razavi, Benjamin Kane, Lenhart K. Schubert:
Investigating Linguistic and Semantic Features for Turn-Taking Prediction in Open-Domain Human-Computer Conversation. 4140-4144 - Frédéric Béchet, Christian Raymond:
Benchmarking Benchmarks: Introducing New Automatic Indicators for Benchmarking Spoken Language Understanding Corpora. 4145-4149 - Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:
A Neural Turn-Taking Model without RNN. 4150-4154 - Andrei Catalin Coman, Koichiro Yoshino, Yukitoshi Murase, Satoshi Nakamura, Giuseppe Riccardi:
An Incremental Turn-Taking Model for Task-Oriented Dialog Systems. 4155-4159 - Feng-Guang Su, Aliyah R. Hsu, Yi-Lin Tuan, Hung-yi Lee:
Personalized Dialogue Response Generation Learned from Monologues. 4160-4164 - Mattias Heldner, Marcin Wlodarczak, Stefan Benus, Agustín Gravano:
Voice Quality as a Turn-Taking Cue. 4165-4169 - Kohei Hara, Koji Inoue, Katsuya Takanashi, Tatsuya Kawahara:
Turn-Taking Prediction Based on Detection of Transition Relevance Place. 4170-4174 - Divesh Lala, Shizuka Nakamura, Tatsuya Kawahara:
Analysis of Effect and Timing of Fillers in Natural Turn-Taking. 4175-4179 - Shota Horiguchi, Naoyuki Kanda, Kenji Nagamatsu:
Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation. 4180-4184 - Ming-Hsiang Su, Chung-Hsien Wu, Yi Chang:
Follow-Up Question Generation Using Neural Tensor Network-Based Domain Ontology Population in an Interview Coaching System. 4185-4189
Corpus Annotation and Evaluation
- Trang Tran, Jiahong Yuan, Yang Liu, Mari Ostendorf:
On the Role of Style in Parsing Speech with Neural Models. 4190-4194 - Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu:
On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval. 4195-4199 - Xinhao Wang, Su-Youn Yoon, Keelan Evanini, Klaus Zechner, Yao Qian:
Automatic Detection of Off-Topic Spoken Responses Using Very Deep Convolutional Neural Networks. 4200-4204 - Anna Piunova, Eugen Beck, Ralf Schlüter, Hermann Ney:
Rescoring Keyword Search Confidence Estimates with Graph-Based Re-Ranking Using Acoustic Word Embeddings. 4205-4209 - Yael Segal, Tzeviya Sylvia Fuchs, Joseph Keshet:
SpeechYOLO: Detection and Localization of Speech Objects. 4210-4214 - Alp Öktem, Mireia Farrús, Antonio Bonafonte:
Prosodic Phrase Alignment for Machine Dubbing. 4215-4219 - Christina Tånnander, Per Fallgren, Jens Edlund, Joakim Gustafson:
Spot the Pleasant People! Navigating the Cocktail Party Buzz. 4220-4224 - Zhi Chen, Wu Guo, Li-Rong Dai, Zhen-Hua Ling, Jun Du:
Neural Text Clustering with Document-Level Attention Based on Dynamic Soft Labels. 4225-4229 - Nguyen Bach, Fei Huang:
Noisy BiLSTM-Based Models for Disfluency Detection. 4230-4234 - Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo:
Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search. 4235-4239 - Takashi Maekaku, Yusuke Kida, Akihiko Sugiyama:
Simultaneous Detection and Localization of a Wake-Up Word Using Multi-Task Learning of the Duration and Endpoint. 4240-4244
Speech Enhancement: Multi-Channel and Intelligibility
- Ching Hua Lee, Kuan-Lin Chen, Fredric J. Harris, Bhaskar D. Rao, Harinath Garudadri:
On Mitigating Acoustic Feedback in Hearing Aids with Frequency Warping by All-Pass Networks. 4245-4249 - Amin Fazel, Mostafa El-Khamy, Jungwon Lee:
Deep Multitask Acoustic Echo Cancellation. 4250-4254 - Hao Zhang, Ke Tan, DeLiang Wang:
Deep Learning for Joint Acoustic Echo and Noise Cancellation with Nonlinear Distortions. 4255-4259 - Charlotte Sørensen, Jesper Bünsow Boldt, Mads Græsbøll Christensen:
Harmonic Beamformers for Non-Intrusive Speech Intelligibility Prediction. 4260-4264 - Nursadul Mamun, Soheil Khorram, John H. L. Hansen:
Convolutional Neural Network-Based Speech Enhancement for Cochlear Implant Recipients. 4265-4269 - Charlotte Sørensen, Jesper Bünsow Boldt, Mads Græsbøll Christensen:
Validation of the Non-Intrusive Codebook-Based Short Time Objective Intelligibility Metric for Processed Speech. 4270-4274 - Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Katsuhiko Yamamoto, Toshio Irino:
Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-Based ASR System. 4275-4279 - Suliang Bu, Yunxin Zhao, Mei-Yuh Hwang:
A Novel Method to Correct Steering Vectors in MVDR Beamformer for Noise Robust ASR. 4280-4284 - Hyeon Seung Lee, Hyung Yong Kim, Woo Hyun Kang, Jeunghun Kim, Nam Soo Kim:
End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform. 4285-4289 - Rongzhi Gu, Lianwu Chen, Shi-Xiong Zhang, Jimeng Zheng, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu:
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information. 4290-4294 - Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman:
My Lips Are Concealed: Audio-Visual Speech Enhancement Through Obstructions. 4295-4299
Speaker Recognition 3
- Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Permutation-Free Objectives. 4300-4304 - Miquel India, Pooyan Safari, Javier Hernando:
Self Multi-Head Attention for Speaker Recognition. 4305-4309 - Ignacio Viñals, Dayana Ribas, Victoria Mingote, Jorge Llombart, Pablo Gimeno, Antonio Miguel, Alfonso Ortega Giménez, Eduardo Lleida:
Phonetically-Aware Embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation. 4310-4314 - Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien:
Variational Domain Adversarial Learning for Speaker Verification. 4315-4319 - Tianchi Liu, Maulik C. Madhavi, Rohan Kumar Das, Haizhou Li:
A Unified Framework for Speaker and Utterance Verification. 4320-4324 - Mahesh Kumar Nandwana, Luciana Ferrer, Mitchell McLaren, Diego Castán, Aaron Lawson:
Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems. 4325-4329 - Ondrej Novotný, Oldrich Plchot, Ondrej Glembek, Lukás Burget:
Factorization of Discriminatively Trained i-Vector Extractor for Speaker Recognition. 4330-4334 - Daniele Salvati, Carlo Drioli, Gian Luca Foresti:
End-to-End Speaker Identification in Noisy and Reverberant Environments Using Raw Waveform Convolutional Neural Networks. 4335-4339 - Abinay Reddy Naini, Achuth Rao M. V., Prasanta Kumar Ghosh:
Whisper to Neutral Mapping Using Cosine Similarity Maximization in i-Vector Space for Speaker Verification. 4340-4344 - Yingke Zhu, Tom Ko, Brian Mak:
Mixup Learning Strategies for Text-Independent Speaker Verification. 4345-4349 - Luciana Ferrer, Mitchell McLaren:
Optimizing a Speaker Embedding Extractor Through Backend-Driven Regularization. 4350-4354 - Kong Aik Lee, Hitoshi Yamamoto, Koji Okabe, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda:
The NEC-TT 2018 Speaker Verification System. 4355-4359 - Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei:
Autoencoder-Based Semi-Supervised Curriculum Learning for Out-of-Domain Speaker Verification. 4360-4364 - Danwei Cai, Xiaoyi Qin, Ming Li:
Multi-Channel Training for End-to-End Speaker Recognition Under Reverberant and Noisy Environment. 4365-4369 - Danwei Cai, Weicheng Cai, Ming Li:
The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation. 4370-4374
NN Architectures for ASR
- Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak, Sanjeev Khudanpur:
Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings. 4375-4379 - Suyoun Kim, Siddharth Dalmia, Florian Metze:
Cross-Attention End-to-End ASR for Two-Party Conversations. 4380-4384 - Jan Chorowski, Adrian Lancucki, Bartosz Kostka, Michal Zapotoczny:
Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees. 4385-4389 - Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu:
An Online Attention-Based Model for Speech Recognition. 4390-4394 - Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengqi Wen:
Self-Attention Transducers for End-to-End Speech Recognition. 4395-4399 - Sheng Li, Raj Dabre, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai:
Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation. 4400-4404 - Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon:
Extending an Acoustic Data-Driven Phone Set for Spontaneous Speech Recognition. 4405-4409 - Takafumi Moriya, Jian Wang, Tomohiro Tanaka, Ryo Masumura, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono:
Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition. 4410-4414 - Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori:
Real to H-Space Encoder for Speech Recognition. 4415-4419 - Cheng Yi, Feng Wang, Bo Xu:
Ectc-Docd: An End-to-End Structure with CTC Encoder and OCD Decoder for Speech Recognition. 4420-4424 - Pavel Denisov, Ngoc Thang Vu:
End-to-End Multi-Speaker Speech Recognition Using Speaker Embeddings and Transfer Learning. 4425-4429
Speech Synthesis: Text Processing, Prosody, and Emotion
- Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Shubham Toshniwal, Karen Livescu:
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis. 4430-4434 - Éva Székely, Gustav Eje Henter, Jonas Beskow, Joakim Gustafson:
Spontaneous Conversational Speech Synthesis from Found Data. 4435-4439 - Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman:
Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-To-Speech. 4440-4444 - Nusrah Hussain, Engin Erzin, T. Metin Sezgin, Yücel Yemez:
Speech Driven Backchannel Generation Using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction. 4445-4449 - Tomoki Koriyama, Takao Kobayashi:
Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model. 4450-4454 - Anna Björk Nikulásdóttir, Jón Guðnason:
Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case. 4455-4459 - Haohan Guo, Frank K. Soong, Lei He, Lei Xie:
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS. 4460-4464 - Jinfu Ni, Yoshinori Shiga, Hisashi Kawai:
Duration Modeling with Global Phoneme-Duration Vectors. 4465-4469 - Adèle Aubin, Alessandra Cervone, Oliver Watts, Simon King:
Improving Speech Synthesis with Discourse Relations. 4470-4474 - Noé Tits, Fengna Wang, Kevin El Haddad, Vincent Pagel, Thierry Dutoit:
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis Through Audio Analysis. 4475-4479 - Bing Yang, Jiaqi Zhong, Shan Liu:
Pre-Trained Text Representations for Improving Front-End Text Processing in Mandarin Text-to-Speech Synthesis. 4480-4484 - Huashan Pan, Xiulin Li, Zhiqiang Huang:
A Mandarin Prosodic Boundary Prediction Model Based on Multi-Task Learning. 4485-4488 - Ajda Gokcen, Hao Zhang, Richard Sproat:
Dual Encoder Classifier Models as Constraints in Neural Text Normalization. 4489-4493 - Jingbei Li, Zhiyong Wu, Runnan Li, Pengpeng Zhi, Song Yang, Helen Meng:
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis. 4494-4498 - Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman:
Automated Emotion Morphing in Speech Based on Diffeomorphic Curve Registration and Highway Networks. 4499-4503
Speech and Voice Disorders 2
- Kathryn P. Connaghan, Jordan R. Green, Sabrina Paganoni, James Chan, Harli Weber, Ella Collins, Brian Richburg, Marziye Eshghi, Jukka-Pekka Onnela, James D. Berry:
Use of Beiwe Smartphone App to Identify and Track Speech Decline in Amyotrophic Lateral Sclerosis (ALS). 4504-4508 - Hannah P. Rowe, Jordan R. Green:
Profiling Speech Motor Impairments in Persons with Amyotrophic Lateral Sclerosis: An Acoustic-Based Approach. 4509-4513 - Alex Mayle, Zhiwei Mou, Razvan C. Bunescu, Sadegh Mirshekarian, Li Xu, Chang Liu:
Diagnosing Dysarthria with Long Short-Term Memory Networks. 4514-4518 - Protima Nomo Sudro, S. R. Mahadeva Prasanna:
Modification of Devoicing Error in Cleft Lip and Palate Speech. 4519-4523 - Marziye Eshghi, Panying Rong, Antje S. Mefferd, Kaila L. Stipancic, Yana Yunusova, Jordan R. Green:
Reduced Task Adaptation in Alternating Motion Rate Tasks as an Early Marker of Bulbar Involvement in Amyotrophic Lateral Sclerosis. 4524-4528 - Tianqi Wang, Quanlei Yan, Jingshen Pan, Feiqi Zhu, Rongfeng Su, Yi Guo, Lan Wang, Nan Yan:
Towards the Speech Features of Early-Stage Dementia: Design and Application of the Mandarin Elderly Cognitive Speech Database. 4529-4533 - Wenjun Chen, Jeroen van de Weijer, Shuangshuang Zhu, Qian Qian, Manna Wang:
Acoustic Characteristics of Lexical Tone Disruption in Mandarin Speakers After Brain Damage. 4534-4538 - Anne Hermes, Doris Mücke, Tabea Thies, Michael T. Barbe:
Intragestural Variation in Natural Sentence Production: Essential Tremor Patients Treated with DBS. 4539-4543 - Sishir Kalita, Protima Nomo Sudro, S. R. Mahadeva Prasanna, Samarendra Dandapat:
Nasal Air Emission in Sibilant Fricatives of Cleft Lip and Palate Speech. 4544-4548 - Luis Serrano, Sneha Raman, David Tavarez, Eva Navas, Inma Hernáez:
Parallel vs. Non-Parallel Voice Conversion for Esophageal Speech. 4549-4553 - Akhilesh Kumar Dubey, S. R. Mahadeva Prasanna, Samarendra Dandapat:
Hypernasality Severity Detection Using Constant Q Cepstral Coefficients. 4554-4558 - Mingyue Niu, Jianhua Tao, Bin Liu, Cunhang Fan:
Automatic Depression Level Detection via ℓp-Norm Pooling. 4559-4563 - Suhas B. N., Deep Patel, Nithin Rao Koluguri, Yamini Belur, Pradeep Reddy, Atchayaram Nalini, Ravi Yadav, Dipanjan Gope, Prasanta Kumar Ghosh:
Comparison of Speech Tasks and Recording Devices for Voice Based Automatic Classification of Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis. 4564-4568
Speech and Audio Source Separation and Scene Analysis 3
- Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda:
A Modified Algorithm for Multiple Input Spectrogram Inversion. 4569-4573 - Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu:
A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation. 4574-4578 - Berkay Inan, Milos Cernak, Helmut Grabner, Helena Peic Tukuljac, Rodrigo C. G. Pena, Benjamin Ricaud:
Evaluating Audiovisual Source Separation in the Context of Video Conferencing. 4579-4583 - David Ditter, Timo Gerkmann:
Influence of Speaker-Specific Parameters on Speech Separation Systems. 4584-4588 - Jeroen Zegers, Hugo Van hamme:
CNN-LSTM Models for Multi-Speaker Source Separation Using Bayesian Hyper Parameter Optimization. 4589-4593 - Helen L. Bear, Inês Nolasco, Emmanouil Benetos:
Towards Joint Sound Scene and Polyphonic Sound Event Recognition. 4594-4598 - Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen:
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features. 4599-4603 - Midia Yousefi, Soheil Khorram, John H. L. Hansen:
Probabilistic Permutation Invariant Training for Speech Separation. 4604-4608 - Jing Shi, Jiaming Xu, Bo Xu:
Which Ones Are Speaking? Speaker-Inferred Model for Multi-Talker Speech Separation. 4609-4613 - Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu, Shoji Hayakawa, Shouji Harada, Jiqing Han:
End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network. 4614-4618 - Francesc Lluís, Jordi Pons, Xavier Serra:
End-to-End Music Source Separation: Is it Possible in the Waveform Domain? 4619-4623
Speech-to-Text and Speech Assessment
- Ben Foley, Alina Rakhi, Nicholas Lambourne, Nicholas Buckeridge, Janet Wiles:
Elpis, an Accessible Speech-to-Text Tool. 4624-4625 - Martin Gruber, Adam Chýlek, Jindrich Matousek:
Framework for Conducting Tasks Requiring Human Assessment. 4626-4627 - Shen Huang, Bojie Hu, Shan Huang, Pengfei Hu, Jian Kang, Zhiqiang Lv, Jinghao Yan, Qi Ju, Shiyin Kang, Deyi Tuo, Guangzhi Li, Nurmemet Yolwas:
Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin. 4628-4629 - Erinç Dikici, Gerhard Backfried, Jürgen Riedler:
The SAIL LABS Media Mining Indexer and the CAVA Framework. 4630-4631 - Nagendra Kumar Goel, Mousmita Sarma, Saikiran Valluri, Dharmeshkumar Agrawal, Steve Braich, Tejendra Singh Kuswah, Zikra Iqbal, Surbhi Chauhan, Raj Karbar:
CaptionAI: A Real-Time Multilingual Captioning Application. 4632-4633