24th Interspeech 2023: Dublin, Ireland
- Naomi Harte, Julie Carson-Berndsen, Gareth Jones:
24th Annual Conference of the International Speech Communication Association, Interspeech 2023, Dublin, Ireland, August 20-24, 2023. ISCA 2023
Keynote 1: ISCA Medallist
- Shrikanth Narayanan:
Bridging Speech Science and Technology - Now and Into the Future. 1
Speech Synthesis: Prosody and Emotion
- Jianrong Wang, Yaxin Zhao, Li Liu, Tianyi Xu, Qi Li, Sen Li:
Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks. 2-6 - Zhaoci Liu, Zhen-Hua Ling, Ya-Jun Hu, Jia Pan, Jin-Wei Wang, Yun-Di Wu:
Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations. 7-11 - Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao:
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis. 12-16 - Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari:
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus. 17-21 - Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li:
Explicit Intensity Control for Accented Text-to-speech. 22-26 - Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba:
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech. 27-31
Statistical Machine Translation
- Paul-Ambroise Duquenne, Holger Schwenk, Benoît Sagot:
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer. 32-36 - Proyag Pal, Brian Thompson, Yogesh Virkar, Prashant Mathur, Alexandra Chronopoulou, Marcello Federico:
Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters. 37-41 - Kun Song, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, Zejun Ma:
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation. 42-46 - Marco Gaido, Sara Papi, Matteo Negri, Marco Turchi:
Joint Speech Translation and Named Entity Recognition. 47-51 - Gerard Sant, Carlos Escolano:
Analysis of Acoustic information in End-to-End Spoken Language Translation. 52-56 - Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li:
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers. 57-61
Self-Supervised Learning in ASR
- Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. 62-66 - Salah Zaiem, Titouan Parcollet, Slim Essid:
Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations. 67-71 - Zhao Yang, Dianwen Ng, Chong Zhang, Xiao Fu, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma, Jizhong Zhao:
Dual Acoustic Linguistic Self-supervised Representation Learning for Cross-Domain Speech Recognition. 72-76 - Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi:
O-1: Self-training with Oracle and 1-best Hypothesis. 77-81 - Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen:
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. 82-86 - Léa-Marie Lam-Yee-Mui, Lucas Ondel Yang, Ondrej Klejch:
Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models. 87-91
Prosody
- Xinya Zhang, Ying Chen:
Chinese EFL Learners' Perception of English Prosodic Focus. 92-96 - Thomas Sostarics, Jennifer Cole:
Pitch Accent Variation and the Interpretation of Rising and Falling Intonation in American English. 97-101 - Jianjing Kuang, May Pik Yu Chan, Nari Rhee:
Tonal coarticulation as a cue for upcoming prosodic boundary. 102-106 - Sophie Repp, Lara Muhtz, Johannes Heim:
Alignment of Beat Gestures and Prosodic Prominence in German. 107-111 - Hannah White, Joshua Penney, Andy Gibson, Anita Szakay, Felicity Cox:
Creak Prevalence and Prosodic Context in Australian English. 112-116 - Kübra Bodur, Roxane Bertrand, James Sneed German, Stéphane Rauzy, Corinne Fredouille, Christine Meunier:
Speech reduction: position within French prosodic structure. 117-121
Speech Production
- Ziyu Zhu, Yujie Chi, Zhao Zhang, Kiyoshi Honda, Jianguo Wei:
Transvelar Nasal Coupling Contributing to Speaker Characteristics in Non-nasal Vowels. 122-126 - Yuto Otani, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada:
Speech Synthesis from Articulatory Movements Recorded by Real-time MRI. 127-131 - Zheng Yuan, Aldo Pastore, Dorina De Jong, Hao Xu, Luciano Fadiga, Alessandro D'Ausilio:
The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN. 132-136 - James J. Mahshie, Michael Larsen:
Did you see that? Exploring the role of vision in the development of consonant feature contrasts in children with cochlear implants. 137-140
Dysarthric Speech Assessment
- Loes van Bemmel, Chiara Pesenti, Xue Wei, Helmer Strik:
Automatic assessments of dysarthric speech: the usability of acoustic-phonetic features. 141-145 - Chowdam Venkata Thirumala Kumar, Tanuka Bhattacharjee, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh:
Classification of Multi-class Vowels and Fricatives From Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity. 146-150 - Jinzi Qi, Hugo Van hamme:
Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion and Householder Transformation. 151-155 - Enno Hermann, Mathew Magimai-Doss:
Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation. 156-160 - Dianna Yee, Colin Lea, Jaya Narain, Zifang Huang, Lauren Tooley, Jeffrey P. Bigham, Leah Findlater:
Latent Phrase Matching for Dysarthric Speech. 161-165 - Eun Jung Yeo, Kwanghee Choi, Sunhee Kim, Minhwa Chung:
Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification. 166-170
Speech Coding: Transmission and Enhancement
- Youqiang Zheng, Li Xiao, Weiping Tu, Yuhong Yang, Xinmeng Xu:
CQNV: A Combination of Coarsely Quantized Bitstream and Neural Vocoder for Low Rate Speech Coding. 171-175 - Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani:
Target Speech Extraction with Conditional Diffusion Model. 176-180 - Elad Cohen, Hai Victor Habi, Arnon Netzer:
Towards Fully Quantized Neural Networks For Speech Enhancement. 181-185 - Youshan Zhang, Jialu Li:
Complex Image Generation SwinTransformer Network for Audio Denoising. 186-190
Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation 1
- Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran:
Using Text Injection to Improve Recognition of Personal Identifiers in Speech. 191-195 - Tamás Grósz, Yaroslav Getman, Ragheb Al-Ghezi, Aku Rouhe, Mikko Kurimo:
Investigating wav2vec2 context representations and the effects of fine-tuning, a case-study of a Finnish model. 196-200 - Jan Lehecka, Jan Svec, Josef V. Psutka, Pavel Ircing:
Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech. 201-205 - Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe:
Iteratively Improving Speech Recognition and Voice Conversion. 206-210 - Kavan Fatehi, Ayse Küçükyilmaz:
LABERT: A Combination of Local Aggregation and Self-Supervised Speech Representation Learning for Detecting Informative Hidden Units in Low-Resource ASR Systems. 211-215 - Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie, Jie Liu:
TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition. 216-220 - Zelin Wu, Tsendsuren Munkhdalai, Pat Rondon, Golan Pundak, Khe Chai Sim, Christopher Li:
Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR. 221-225 - Hang Zhou, Xiaoxu Zheng, Yunhe Wang, Michael Bi Mi, Deyi Xiong, Kai Han:
GhostRNN: Reducing State Redundancy in RNN with Cheap Operations. 226-230 - Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Hongbin Suo, Yulong Wan:
Task-Agnostic Structured Pruning of Speech Representation Models. 231-235 - Naoyuki Kanda, Takuya Yoshioka, Yang Liu:
Factual Consistency Oriented Speech Recognition. 236-240 - Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales:
Multi-Head State Space Model for Speech Recognition. 241-245 - Yingying Gao, Shilei Zhang, Zihao Cui, Chao Deng, Junlan Feng:
Cascaded Multi-task Adaptive Learning Based on Neural Architecture Search. 246-250 - Kinan Martin, Jon Gauthier, Canaan Breiss, Roger Levy:
Probing Self-supervised Speech Models for Phonetic and Phonemic Information: A Case Study in Aspiration. 251-255 - Philip Harding, Sibo Tong, Simon Wiesler:
Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers. 256-260
Analysis of Speech and Audio Signals 1
- Xiao-Min Zeng, Yan Song, Ian McLoughlin, Lin Liu, Li-Rong Dai:
Robust Prototype Learning for Anomalous Sound Detection. 261-265 - Saksham Singh Kushwaha, Magdalena Fuentes:
A multimodal prototypical approach for unsupervised sound classification. 266-270 - Penghui Wen, Kun Hu, Wenxi Yue, Sen Zhang, Wanlei Zhou, Zhiyong Wang:
Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms. 271-275 - Jinhua Liang, Xubo Liu, Haohe Liu, Huy Phan, Emmanouil Benetos, Mark D. Plumbley, Wenwu Wang:
Adapting Language-Audio Models as Few-Shot Audio Learners. 276-280 - Mengwei Wang, Zhe Yang:
TFECN: Time-Frequency Enhanced ConvNet for Audio Classification. 281-285 - Won-Gook Choi, Joon-Hyuk Chang:
Resolution Consistency Training on Time-Frequency Domain for Semi-Supervised Sound Event Detection. 286-290 - Kang Li, Yan Song, Ian McLoughlin, Lin Liu, Jin Li, Li-Rong Dai:
Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection. 291-295 - Dianwen Ng, Yang Xiao, Jia Qi Yip, Zhao Yang, Biao Tian, Qiang Fu, Eng Siong Chng, Bin Ma:
Small Footprint Multi-channel Network for Keyword Spotting with Centroid Based Awareness. 296-300 - Wei Xie, Yanxiong Li, Qianhua He, Wenchang Cao, Tuomas Virtanen:
Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes. 301-305 - Mohammad Hassan Vali, Tom Bäckström:
Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion. 306-310 - Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey I. Nikolenko, Evgeny Burnaev:
Topological Data Analysis for Speech Processing. 311-315 - Kangwook Jang, Sungnyun Kim, Se-Young Yun, Hoirin Kim:
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation. 316-320 - Timm Koppelmann, Semih Agcaer, Rainer Martin:
Personalized Acoustic Scene Classification in Ultra-low Power Embedded Devices Using Privacy-preserving Data Augmentation. 321-325 - Wei-Cheng Lin, Luca Bondi, Shabnam Ghaffarzadegan:
Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection. 326-330 - Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren:
Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning. 331-335 - Hejing Zhang, Jian Guan, Qiaoxi Zhu, Feiyang Xiao, Youde Liu:
Anomalous Sound Detection Using Self-Attention-Based Frequency Pattern Analysis of Machine Sounds. 336-340 - Yifei Xin, Yuexian Zou:
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions. 341-345 - Suhas BN, Sarah Rajtmajer, Saeed Abdullah:
Differential Privacy enabled Dementia Classification: An Exploration of the Privacy-Accuracy Trade-off in Speech Signal Data. 346-350 - Shijun Wang, Jón Guðnason, Damian Borth:
Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech. 351-355 - Swarup Ranjan Behera, Pailla Balakrishna Reddy, Achyut Mani Tripathi, Megavath Bharadwaj Rathod, Tejesh Karavadi:
Towards Multi-Lingual Audio Question Answering. 356-360
Speech Recognition: Architecture, Search, and Linguistic Components 1
- Hanan Aldarmaki, Ahmad Ghannam:
Diacritic Recognition Performance in Arabic ASR. 361-365 - Jari Kolehmainen, Yile Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko:
Personalization for BERT-based Discriminative Speech Recognition Rescoring. 366-370 - Aravind Krishnan, Jesujoba O. Alabi, Dietrich Klakow:
On the N-gram Approximation of Pre-trained Language Models. 371-375 - Tianyu Huang, Chung Hoon Hong, Carl Wivagg, Kanna Shimizu:
Record Deduplication for Entity Distribution Modeling in ASR Transcripts. 376-380 - Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath Chennupati, Andreas Stolcke:
Learning When to Trust Which Teacher for Weakly Supervised ASR. 381-385 - Lu Huang, Boyu Li, Jun Zhang, Lu Lu, Zejun Ma:
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer. 386-390
Speech Recognition: Technologies and Systems for New Applications 1
- Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath:
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model. 391-395 - Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. 396-400 - Roger K. Moore, Ricard Marxer:
Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys. 401-405 - Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater:
Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling. 406-410 - Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai:
CASA-ASR: Context-Aware Speaker-Attributed ASR. 411-415 - Shun Takahashi, Sakriani Sakti:
Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams. 416-420 - Gaobin Yang, Jun Du, Maokui He, Shutong Niu, Baoxiang Li, Jiakui Li, Chin-Hui Lee:
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark. 421-425 - Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen:
Distilling knowledge from Gaussian process teacher to neural network student. 426-430 - Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak:
Segmental SpeechCLIP: Utilizing Pretrained Image-text Models for Audio-Visual Learning. 431-435 - Christiaan Jacobs, Nathanaël Carraz Rakotonirina, Everlyn Asiko Chimoto, Bruce A. Bassett, Herman Kamper:
Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili. 436-440 - Ruan van der Merwe, Herman Kamper:
Mitigating Catastrophic Forgetting for Few-Shot Spoken Word Classification Through Meta-Learning. 441-445 - Martin Polácek, Petr Cerva, Jindrich Zdánský, Lenka Weingartová:
Online Punctuation Restoration using ELECTRA Model for streaming ASR Systems. 446-450 - Szu-Jui Chen, Debjyoti Paul, Yutong Pang, Peng Su, Xuedong Zhang:
Language Agnostic Data-Driven Inverse Text Normalization. 451-455 - Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath:
How to Estimate Model Transferability of Pre-Trained Speech Models? 456-460 - Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura, Saki Mizuno, Nobukatsu Hojo:
Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model. 461-465
Lexical and Language Modeling for ASR
- Kamer Ali Yuksel, Thiago Castro Ferreira, Golara Javadi, Mohamed Al-Badrashiny, Ahmet Gunduz:
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning. 466-470 - Yile Gu, Prashanth Gurunath Shivakumar, Jari Kolehmainen, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko:
Scaling Laws for Discriminative Speech Recognition Rescoring Models. 471-475 - Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao:
Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition. 476-480 - Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang:
Memory Augmented Lookup Dictionary Based Language Modeling for Automatic Speech Recognition. 481-485 - Yu Iwamoto, Takahiro Shinozaki:
Memory Network-Based End-To-End Neural ES-KMeans for Improved Word Segmentation. 486-490 - Yui Sudo, Kazuya Hata, Kazuhiro Nakadai:
Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation. 491-495
Language Identification and Diarization
- Winstead Zhu, Md. Iftekhar Tanveer, Yang Janet Liu, Seye Ojumu, Rosie Jones:
Lightweight and Efficient Spoken Language Identification of Long-form Audio. 496-500 - Jagabandhu Mishra, Jayadev N. Patil, Amartya Chowdhury, S. R. Mahadeva Prasanna:
End to End Spoken Language Diarization with Wav2vec Embeddings. 501-505 - Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon:
Efficient Spoken Language Recognition via Multilabel Classification. 506-510 - Pavel Matejka, Anna Silnova, Josef Slavícek, Ladislav Mosner, Oldrich Plchot, Michal Klco, Junyi Peng, Themos Stafylakis, Lukás Burget:
Description and Analysis of ABC Submission to NIST LRE 2022. 511-515 - Tanel Alumäe, Kunnar Kukk, Viet Bac Le, Claude Barras, Abdel Messaoudi, Waad Ben Kheder:
Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation. 516-520 - Jesús Villalba, Jonas Borgstrom, Maliha Jahan, Saurabh Kataria, Leibny Paola García, Pedro A. Torres-Carrasquillo, Najim Dehak:
Advances in Language Recognition in Low Resource African Languages: The JHU-MIT Submission for NIST LRE22. 521-525
Speech Quality Assessment
- Xinyu Liang, Fredrik Cumlin, Christian Schüldt, Saikat Chatterjee:
DeePMOS: Deep Posterior Mean-Opinion-Score of Speech. 526-530 - Ashwini Dasare, Pradyoth Hegde, Supritha M. Shetty, Deepak K. T.:
The Role of Formant and Excitation Source Features in Perceived Naturalness of Low Resource Tribal Language TTS: An Empirical Study. 531-535 - Wuxuan Gong, Jing Wang, Yitong Liu, Hongwen Yang:
A no-reference speech quality assessment method based on neural network with densely connected convolutional architecture. 536-540 - Bao Thang Ta, Minh Tu Le, Nhat Minh Le, Van Hai Do:
Probing Speech Quality Information in ASR Systems. 541-545 - Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda:
Preference-based training framework for automatic speech quality assessment using deep neural network. 546-550 - Wannaphong Phatthiyaphaibun, Chompakorn Chaksangchaichot, Thanawin Rakthanmanon, Ekapol Chuangsuwanich, Sarana Nutanong:
Crowdsourced Data Validation for ASR Training. 551-555
Feature Modeling for ASR
- Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno Mengibar:
Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods. 556-560 - Gege Qi, Yuefeng Chen, Xiaofeng Mao, Xiaojun Jia, Ranjie Duan, Rong Zhang, Hui Xue:
Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training. 561-565 - Zhi-Hao Lai, Tian-Hao Zhang, Qi Liu, Xinyuan Qian, Li-Fang Wei, Feng Chen, Song-Lu Chen, Xu-Cheng Yin:
InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition. 566-570 - Yizhou Tan, Haojun Ai, Shengchen Li, Feng Zhang:
Transductive Feature Space Regularization for Few-shot Bioacoustic Event Detection. 571-575 - Jisung Wang, Haram Lee, Myungwoo Oh:
Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition. 576-580 - Titouan Parcollet, Shucong Zhang, Rogier van Dalen, Alberto Gil C. P. Ramos, Sourav Bhattacharya:
On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning. 581-585
Interfacing Speech Technology and Phonetics
- Louis ten Bosch, Martijn Bentum, Lou Boves:
Phonemic competition in end-to-end ASR models. 586-590 - Vincent Hughes, Jessica Wormald, Paul Foulkes, Philip Harrison, Finnian Kelly, David van der Vloed, Poppy Welch, Chenzi Xu:
Automatic speaker recognition with variation across vocal conditions: a controlled experiment with implications for forensics. 591-595 - Bernhard C. Geiger, Barbara Schuppler:
Exploring Graph Theory Methods For the Analysis of Pronunciation Variation in Spontaneous Speech. 596-600 - Bryony Nuttall, Philip Harrison, Vincent Hughes:
Automatic Speaker Recognition performance with matched and mismatched female bilingual speech data. 601-605
Speech Synthesis: Multilinguality
- Hongsun Yang, Ji-Hoon Kim, Yooncheol Ju, Ilhwan Kim, Byeong-Yeol Kim, Shukjae Choi, Hyung Yong Kim:
FACTSpeech: Speaking a Foreign Language Pronunciation Using Only Your Native Characters. 606-610 - Hoyeon Lee, Hyun-Wook Yoon, Jong-Hwan Kim, Jae-Min Kim:
Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model. 611-615 - Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu:
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech. 616-620 - Konstantinos Markopoulos, Georgia Maniati, Georgios Vamvoukakis, Nikolaos Ellinas, Georgios Vardaxoglou, Panos Kakoulidis, Junkwang Oh, Gunu Jho, Inchul Hwang, Aimilios Chalamandaris, Pirros Tsiakoulis, Spyros Raptis:
Generating Multilingual Gender-Ambiguous Text-to-Speech Voices. 621-625 - Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro:
RAD-MMM: Multilingual Multiaccented Multispeaker Text To Speech. 626-630 - Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba:
Multilingual context-based pronunciation learning for Text-to-Speech. 631-635
Speech Emotion Recognition 1
- Minh Tran, Yufeng Yin, Mohammad Soleymani:
Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition. 636-640 - Huang-Cheng Chou, Lucas Goncalves, Seong-Gyun Leem, Chi-Chun Lee, Carlos Busso:
The Importance of Calibration: Rethinking Confidence and Performance of Speech Multi-label Emotion Classifiers. 641-645 - Mohammad Ibrahim Malik, Siddique Latif, Raja Jurdak, Björn W. Schuller:
A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model. 646-650 - Basmah Alsenani, Tanaya Guha, Alessandro Vinciarelli:
Privacy Risks in Speech Emotion Recognition: A Systematic Study on Gender Inference Attack. 651-655 - James Tavernor, Matthew Perez, Emily Mower Provost:
Episodic Memory For Domain-Adaptable, Robust Speech Emotion Recognition. 656-660 - Chaoyue Ding, Jiakui Li, Daoming Zong, Baoxiang Li, Tian-Hao Zhang, Qunyan Zhou:
Stable Speech Emotion Recognition with Head-k-Pooling Loss. 661-665
Show and Tell: Health applications and emotion recognition
- Matthew Gibson, Ievgen Karaulov, Oleksii Zhelo, Filip Jurcícek:
A Personalised Speech Communication Application for Dysarthric Speakers. 666-667 - Sun-Kyung Lee, Jong-Hwan Kim:
Video Multimodal Emotion Recognition System for Real World Applications. 668-669 - Mahdin Rohmatillah, Bobbi Aditya, Li-Jen Yang, Bryan Gautama Ngo, Willianto Sulaiman, Jen-Tzung Chien:
Promoting Mental Self-Disclosure in a Spoken Dialogue System. 670-671 - Pawel Bujnowski, Bartlomiej Kuzma, Bartlomiej Paziewski, Jacek Rutkowski, Joanna Marhula, Zuzanna Bordzicka, Piotr Andruszkiewicz:
"Select language, modality or put on a mask!" Experiments with Multimodal Emotion Recognition. 672-673 - Hannah Valentine, Joel MacAuslan, Maria I. Grigos, Marisha Speights:
My Vowels Matter: Formant Automation Tools for Diverse Child Speech. 674-675 - Nicky Chong-White, Arun Sebastian, Jorge Mejia:
NEMA: An Ecologically Valid Tool for Assessing Hearing Devices, Advanced Algorithms, and Communication in Diverse Listening Environments. 676-677 - Vikram Ramanarayanan, David Pautler, Lakshmi Arbatti, Abhishek Hosamath, Michael Neumann, Hardik Kothare, Oliver Roesler, Jackson Liscombe, Andrew Cornish, Doug Habberstad, Vanessa Richter, David Fox, David Suendermann-Oeft, Ira Shoulson:
When Words Speak Just as Loudly as Actions: Virtual Agent Based Remote Health Assessment Integrating What Patients Say with What They Do. 678-679 - Kowshik Siva Sai Motepalli, Vamshiraghusimha Narasinga, Harsha Pathuri, Hina Khan, Sangeetha Mahesh, Ajish K. Abraham, Anil Kumar Vuppala:
Stuttering Detection Application. 680-681 - Mario Zusag, Laurin Wagner:
Providing Interpretable Insights for Neurological Speech and Cognitive Disorders from Interactive Serious Games. 682-683 - Jacob C. Solinsky, Raymond L. Finzel, Martin Michalowski, Serguei Pakhomov:
Automated Neural Nursing Assistant (ANNA): An Over-The-Phone System for Cognitive Monitoring. 684-685 - Ankit Gupta, Abhijeet Bishnu, Mandar Gogate, Kia Dashtipour, Tughrul Arslan, Ahsan Adeel, Amir Hussain, Tharmalingam Ratnarajah, Mathini Sellathurai:
5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids. 686-687 - Mohsin Raza, Adewale Adetomi, Khubaib Ahmed, Amir Hussain, Tughrul Arslan, Ahsan Adeel:
Towards Two-point Neuron-inspired Energy-efficient Multimodal Open Master Hearing Aid. 688-689
Spoken Dialog Systems and Conversational Analysis 1
- Xuxin Cheng, Wanshi Xu, Ziyu Yao, Zhihong Zhu, Yaowei Li, Hongxiang Li, Yuexian Zou:
FC-MTLF: A Fine- and Coarse-grained Multi-Task Learning Framework for Cross-Lingual Spoken Language Understanding. 690-694 - Xuxin Cheng, Ziyu Yao, Zhihong Zhu, Yaowei Li, Hongxiang Li, Yuexian Zou:
C²A-SLU: Cross and Contrastive Attention for Improving ASR Robustness in Spoken Language Understanding. 695-699 - Henry Weld, Sijia Hu, Siqu Long, Josiah Poon, Soyeon Caren Han:
Tri-level Joint Natural Language Understanding for Multi-turn Conversational Datasets. 700-704 - Gaëlle Laperrière, Ha Nguyen, Sahar Ghannay, Bassam Jabaian, Yannick Estève:
Semantic Enrichment Towards Efficient Speech Representations. 705-709 - Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
Tensor decomposition for minimization of E2E SLU model toward on-device processing. 710-714 - Tianjun Mao, Chenghong Zhang:
DiffSLU: Knowledge Distillation Based Diffusion Model for Cross-Lingual Spoken Language Understanding. 715-719 - Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding. 720-724 - Zhiyuan Zhu, Yusheng Liao, Yu Wang, Yunfeng Guan:
Contrastive Learning Based ASR Robust Knowledge Selection For Spoken Dialogue System. 725-729 - Seongmin Park, Jinkyu Seo, Jihwa Lee:
Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space. 730-734 - Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti:
An Investigation of the Combination of Rehearsal and Knowledge Distillation in Continual Learning for Spoken Language Understanding. 735-739 - Zhenhe Wu, Xiaoguang Yu, Meng Chen, Liangqing Wu, Jiahao Ji, Zhoujun Li:
Enhancing New Intent Discovery via Robust Neighbor-based Contrastive Learning. 740-744 - Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow:
Personalized Predictive ASR for Latency Reduction in Voice Assistants. 745-749 - Avik Ray, Yilin Shen, Hongxia Jin:
Compositional Generalization in Spoken Language Understanding. 750-754 - Zefei Li, Anil Ramakrishna, Anna Rumshisky, Andy Rosenbaum, Saleh Soltan, Rahul Gupta:
Sampling bias in NLU models: Impact and Mitigation. 755-759 - Jiarui Lu, Bo-Hsiang Tseng, Joel Ruben Antony Moniz, Site Li, Xueyun Zhu, Hong Yu, Murat Akbacak:
5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair. 760-764 - Xiaohan Shi, Xingfeng Li, Tomoki Toda:
Emotion Awareness in Multi-utterance Turn for Improving Emotion Prediction in Multi-Speaker Conversation. 765-769 - Minghan Wang, Yinglu Li, Jiaxin Guo, Xiaosong Qiao, Zongyao Li, Hengchao Shang, Daimeng Wei, Shimin Tao, Min Zhang, Hao Yang:
WhiSLU: End-to-End Spoken Language Understanding with Whisper. 770-774
Speech Coding and Enhancement 1
- Chuan Wen, Sarah Verhulst:
Biophysically-inspired single-channel speech enhancement in the time domain. 775-779 - Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Mete Ozay, Karthikeyan Saravanan, Myoungji Han, Jungin Lee, Seokyeong Jung:
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer. 780-784 - Hye-jin Shim, Rosa González Hautamäki, Md. Sahidullah, Tomi Kinnunen:
How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning. 785-789 - Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro:
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram. 790-794 - Zhuangqi Chen, Xianjun Xia, Cheng Chen, Xianke Wang, Yanhong Leng, Li Chen, Roberto Togneri, Yijian Xiao, Piao Ding, Shenyi Song, Pingjian Zhang:
A Two-stage Progressive Neural Network for Acoustic Echo Cancellation. 795-799 - Linping Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel:
An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec. 800-803 - Shucong Zhang, Malcolm Chadwick, Alberto Gil C. P. Ramos, Titouan Parcollet, Rogier van Dalen, Sourav Bhattacharya:
Real-Time Personalised Speech Enhancement Transformers with Dynamic Cross-attended Speaker Representations. 804-808 - Nursadul Mamun, John H. L. Hansen:
CFTNet: Complex-valued Frequency Transformation Network for Speech Enhancement. 809-813 - Hejung Yang, Hong-Goo Kang:
Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement. 814-818 - Wei Xiao, Wenzhe Liu, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu:
Multi-mode Neural Speech Coding Based on Deep Generative Networks. 819-823 - Soo Hyun Bae, Seok Wan Chae, Youngseok Kim, Keunsang Lee, Hyunjin Lim, Lae-Hoon Kim:
Streaming Dual-Path Transformer for Speech Enhancement. 824-828 - Mahsa Kadkhodaei Elyaderani, Shahram Shirani:
Sequence-to-Sequence Multi-Modal Speech In-Painting. 829-833 - Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu:
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression. 834-838 - Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi:
Differentially Private Adapters for Parameter Efficient Acoustic Modeling. 839-843 - Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling:
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation. 844-848 - Yasufumi Uezu, Sicheng Wang, Teruki Toya, Masashi Unoki:
Consonant-emphasis Method Incorporating Robust Consonant-section Detection to Improve Intelligibility of Bone-conducted speech. 849-853 - Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo:
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss. 854-858 - Joon Byun, Seungmin Shin, Jongmo Sung, Seungkwon Beack, Youngcheol Park:
Perceptual Improvement of Deep Neural Network (DNN) Speech Coder Using Parametric and Non-parametric Density Models. 859-863 - Dongheon Lee, Dayun Choi, Jung-Woo Choi:
DeFT-AN RT: Real-time Multichannel Speech Enhancement using Dense Frequency-Time Attentive Network and Non-overlapping Synthesis Window. 864-868
Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation 2
- Kyungmin Lee, Haeri Kim, Sichen Jin, Jinhwan Park, Youngho Han:
A More Accurate Internal Language Model Score Estimation for the Hybrid Autoregressive Transducer. 869-873 - Kyungmin Lee, Hyeontaek Lim, Munhwan Lee, Hong-Gee Kim:
Attention Gate Between Capsules in Fully Capsule-Network Speech Recognition. 874-878 - Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. 884-888 - Do-Hee Kim, Daeyeol Shim, Joon-Hyuk Chang:
General-purpose Adversarial Training for Enhanced Automatic Speech Recognition Model Generalization. 889-893 - Keke Zhao, Peng Song, Shaokai Li, Wenming Zheng:
Joint Instance Reconstruction and Feature Subspace Alignment for Cross-Domain Speech Emotion Recognition. 894-898 - Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami:
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data. 899-903 - Yist Y. Lin, Tao Han, Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma:
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition. 904-908 - Nithish Muthuchamy Selvaraj, Xiaobao Guo, Adams Wai-Kin Kong, Bingquan Shen, Alex C. Kot:
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers. 909-913 - Tian-Hao Zhang, Haibo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin:
Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding. 914-918 - Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation. 919-923 - Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola García, Daniel Povey, Sanjeev Khudanpur:
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts. 924-928 - Shubo Lv, Xiong Wang, Sining Sun, Long Ma, Lei Xie:
DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting. 929-933 - Li Fu, Siqi Li, Qingtao Li, Fangzhu Li, Liping Deng, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He:
OTF: Optimal Transport based Fusion of Supervised and Self-Supervised Learning Models for Automatic Speech Recognition. 934-938 - Maurits J. R. Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang:
Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition. 939-943 - Steven Vander Eeckt, Hugo Van hamme:
Rehearsal-Free Online Continual Learning for Automatic Speech Recognition. 944-948
Speech Recognition: Technologies and Systems for New Applications 2
- Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma:
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring. 949-953 - Shuju Shi, Kaiqi Fu, Yiwei Gu, Xiaohai Tian, Shaojun Gao, Wei Li, Zejun Ma:
Disentangling the Contribution of Non-native Speech in Automated Pronunciation Assessment. 954-958 - Hyungshin Ryu, Sunhee Kim, Minhwa Chung:
A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning. 959-963 - Xing Wei, Roeland van Hout, Catia Cucchiarini, Danielle Reuvekamp, Helmer Strik:
Assessing Intelligibility in Non-native Speech: Comparing Measures Obtained at Different Levels. 964-968 - Yukang Liang, Kaitao Song, Shaoguang Mao, Huiqiang Jiang, Luna Qiu, Yuqing Yang, Dongsheng Li, Linli Xu, Lili Qiu:
End-to-End Word-Level Pronunciation Assessment with MASK Pre-training. 969-973 - Fu-An Chao, Tien-Hong Lo, Tzu-I Wu, Yao-Ting Sung, Berlin Chen:
A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment. 974-978 - Yingxiang Gao, Jaehyun Choi, Nobuaki Minematsu, Noriko Nakanishi, Daisuke Saito:
Automatic Prediction of Language Learners' Listenability Using Speech and Text Features Extracted from Listening Drills. 979-983 - Ram C. M. C. Shekar, Mu Yang, Kevin Hirschi, Stephen D. Looney, Okim Kang, John H. L. Hansen:
Assessment of Non-Native Speech Intelligibility using Wav2vec2-based Mispronunciation Detection and Multi-level Goodness of Pronunciation Transformer. 984-988 - Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill:
Adapting an Unadaptable ASR System. 989-993 - Jungbae Park, Seungtaek Choi:
Addressing Cold Start Problem for End-to-end Automatic Speech Scoring. 994-998 - Manuel Sam Ribeiro, Giulia Comini, Jaime Lorenzo-Trueba:
Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings. 999-1003 - Caitlin Richter, Ragnar Pálsson, Luke O'Brien, Kolbrún Friðriksdóttir, Branislav Bédi, Eydís Huld Magnúsdóttir, Jón Guðnason:
Orthography-based Pronunciation Scoring for Better CAPT Feedback. 1004-1008 - Hongfu Liu, Mingqian Shi, Ye Wang:
Zero-Shot Automatic Pronunciation Assessment. 1009-1013 - Tuong Tu Huu, Viet-Thanh Pham, Thi Thu Trang Nguyen, Thai Lai Dao:
Mispronunciation detection and diagnosis model for tonal language, applied to Vietnamese. 1014-1018
Keynote 2
- Virginia Dignum:
Beyond the AI hype: Balancing Innovation and Social Responsibility. 1019
Paralinguistics 1
- Georg Stemmer, Paulo López-Meyer, Juan A. del Hoyo Ontiveros, Jose A. Lopez, Héctor A. Cordourier, Tobias Bocklet:
Detection of Emotional Hotspots in Meetings Using a Cross-Corpus Approach. 1020-1024 - Takuto Matsuda, Yoshiko Arimoto:
Detection of Laughter and Screaming Using the Attention and CTC Models. 1025-1029 - Debasmita Bhattacharya, Jie Chi, Julia Hirschberg, Peter Bell:
Capturing Formality in Speech Across Domains and Languages. 1030-1034 - Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain:
Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio. 1035-1039 - Kathrin Feindt, Martina Rossi, Ghazaleh Esfandiari-Baiat, Axel G. Ekström, Margaret Zellers:
Cues to next-speaker projection in conversational Swedish: Evidence from reaction times. 1040-1044 - Abeer A. N. Buker, Huda Alsofyani, Alessandro Vinciarelli:
Multiple Instance Learning for Inference of Child Attachment From Paralinguistic Aspects of Speech. 1045-1049
Speech Enhancement and Denoising
- Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Pärnamaa, Huaming Wang:
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation. 1050-1054 - Andong Li, Weixin Meng, Guochen Yu, Wenzhe Liu, Xiaodong Li, Chengshi Zheng:
TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective. 1055-1059 - Yulong Wang, Xueliang Zhang:
MFT-CRN: Multi-scale Fourier Transform for Monaural Speech Enhancement. 1060-1064 - Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang:
Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement. 1065-1069 - Hassan Taherian, Ashutosh Pandey, Daniel Wong, Buye Xu, DeLiang Wang:
Multi-input Multi-output Complex Spectral Mapping for Speaker Separation. 1070-1074 - Maurice Oberhag, Daniel Neudek, Rainer Martin, Tobias Rosenkranz, Henning Puder:
Short-term Extrapolation of Speech Signals Using Recursive Neural Networks in the STFT Domain. 1075-1079
Speech Synthesis: Evaluation
- Ayushi Pandey, Jens Edlund, Sébastien Le Maguer, Naomi Harte:
Listener sensitivity to deviating obstruents in WaveNet. 1080-1084 - Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, Hiroshi Saruwatari:
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics. 1085-1089 - Joshua Camp, Tom Kenter, Lev Finkelstein, Rob Clark:
MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors. 1090-1094 - Hui Wang, Shiwan Zhao, Xiguang Zheng, Yong Qin:
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting. 1095-1099 - Gerda Ana Melnik-Leroy, Gediminas Navickas:
Can Better Perception Become a Disadvantage? Synthetic Speech Perception in Congenitally Blind Users. 1100-1103 - Erica Cooper, Junichi Yamagishi:
Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech. 1104-1108
End-to-end Spoken Dialog Systems
- Mutian He, Philip N. Garner:
Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding. 1109-1113 - Shangeth Rajaa:
Improving End-to-End SLU performance with Prosodic Attention and Distillation. 1114-1118 - Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer:
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding. 1119-1123 - Lingyan Huang, Tao Li, Haodong Zhou, Qingyang Hong, Lin Li:
Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding. 1124-1128 - Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury:
ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding. 1129-1133 - Xuxin Cheng, Zhihong Zhu, Ziyu Yao, Hongxiang Li, Yaowei Li, Yuexian Zou:
GhostT5: Generate More Features with Cheap Operations to Improve Textless Spoken Question Answering. 1134-1138
Biosignal-enabled Spoken Communication
- Kaibo Zhang, Lili Cao, Yiming Ding, Yanru Li, Chao Zhang, Ji Wu, Demin Han:
Obstructive Sleep Apnea Detection using Pre-trained Speech Representations. 1139-1143 - Ruicong Wang, Siqi Cai, Haizhou Li:
EEG-based Auditory Attention Detection with Spatiotemporal Graph and Graph Convolutional Network. 1144-1148 - Rachel Beeson, Korin Richmond:
Silent Speech Recognition with Articulator Positions Estimated from Tongue Ultrasound and Lip Video. 1149-1153 - Kai Yang, Zhuang Xie, Di Zhou, Longbiao Wang, Gaoyan Zhang:
Auditory Attention Detection in Real-Life Scenarios Using Common Spatial Patterns from EEG. 1154-1158 - Soowon Kim, Young-Eun Lee, Seo-Hyun Lee, Seong-Whan Lee:
Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG. 1159-1163 - Tamás Gábor Csapó, Frigyes Viktor Arthur, Péter Nagy, Ádám Boncz:
Towards Ultrasound Tongue Image prediction from EEG during speech production. 1164-1168 - László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Tamás Gábor Csapó:
Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks. 1169-1173 - Kevin Scheck, Tanja Schultz:
STE-GAN: Speech-to-Electromyography Signal Conversion using Generative Adversarial Networks. 1174-1178 - Inge Salomons, Eder del Blanco, Eva Navas, Inma Hernáez:
Spanish Phone Confusion Analysis for EMG-Based Silent Speech Interfaces. 1179-1183 - Huiyan Li, Mingyi Wang, Han Gao, Shuo Zhao, Guang Li, You Wang:
Hybrid Silent Speech Interface Through Fusion of Electroencephalography and Electromyography. 1184-1188
Neural-based Speech and Acoustic Analysis
- Eklavya Sarkar, Mathew Magimai-Doss:
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers? 1189-1193 - Jinjin Cai, Sudip Vhaduri, Xiao Luo:
Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data Using Contrastive Learning with Varying Pre-Training Domains. 1194-1198 - Yifei Xin, Dongchao Yang, Yuexian Zou:
Background-aware Modeling for Weakly Supervised Sound Event Detection. 1199-1203 - Prerak Srivastava, Antoine Deleforge, Archontis Politis, Emmanuel Vincent:
How to (Virtually) Train Your Speaker Localizer. 1204-1208 - Sreyan Ghosh, Utkarsh Tyagi, S. Ramaneswaran, Harshvardhan Srivastava, Dinesh Manocha:
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition. 1209-1213 - Tanmay Khandelwal, Rohan Kumar Das:
A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds. 1214-1218
DiGo - Dialog for Good: Speech and Language Technology for Social Good
- Michael Neumann, Hardik Kothare, Doug Habberstad, Vikram Ramanarayanan:
A Multimodal Investigation of Speech, Text, Cognitive and Facial Video Features for Characterizing Depression With and Without Medication. 1219-1223 - Angus Addlesee, Marco Damonte:
Understanding Disrupted Sentences Using Underspecified Abstract Meaning Representation. 1224-1228 - Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky:
Developing Speech Processing Pipelines for Police Accountability. 1229-1233 - Éva Székely, Joakim Gustafson, Ilaria Torre:
Prosody-controllable Gender-ambiguous Speech Synthesis: A Tool for Investigating Implicit Bias in Speech Perception. 1234-1238 - Jean-Luc Rouas, Yaru Wu, Takaaki Shochi:
Affective attributes of French caregivers' professional speech. 1239-1243
Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation 3
- Edresson Casanova, Christopher Shulby, Alexander Korolev, Arnaldo Cândido Júnior, Anderson da Silva Soares, Sandra M. Aluísio, Moacir Antonelli Ponti:
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion. 1244-1248 - Yue Gu, Zhihao Du, Shiliang Zhang, Qian Chen, Jiqing Han:
Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition. 1249-1253 - Aoi Ito, Tatsuya Komatsu, Yusuke Fujita, Yusuke Kida:
Target Vocabulary Recognition Based on Multi-Task Learning with Decomposed Teacher Sequences. 1254-1258 - Gaofei Shen, Afra Alishahi, Arianna Bisazza, Grzegorz Chrupala:
Wave to Syntax: Probing spoken language models for syntax. 1259-1263 - Burin Naowarat, Philip Harding, Pasquale D'Alterio, Sibo Tong, Bashar Awwad Shiekh Hasan:
Effective Training of Attention-based Contextual Biasing Adapters with Synthetic Audio for Personalised ASR. 1264-1268 - Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen:
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation. 1269-1273 - Mirazul Haque, Rutvij Shah, Simin Chen, Berrak Sisman, Cong Liu, Wei Yang:
SlothSpeech: Denial-of-service Attack Against Speech Recognition Models. 1274-1278 - Zhihan Wang, Feng Hou, Ruili Wang:
CLRL-Tuning: A Novel Continual Learning Approach for Automatic Speech Recognition. 1279-1283 - Li-Fang Lai, Nicole R. Holliday:
Exploring Sources of Racial Bias in Automatic Speech Recognition through the Lens of Rhythmic Variation. 1284-1288 - Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland:
Can Contextual Biasing Remain Effective with Whisper and GPT-2? 1289-1293 - Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino:
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation. 1294-1298 - Xiaodong Cui, George Saon, Brian Kingsbury:
Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition. 1299-1303 - Jiamin Xie, John H. L. Hansen:
MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition. 1304-1308 - Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. 1314-1318 - Dianwen Ng, Chong Zhang, Ruixi Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Qian Chen, Wen Wang, Eng Siong Chng, Bin Ma:
Adapter-tuning with Effective Token-dependent Representation Shift for Automatic Speech Recognition. 1319-1323 - Yiting Lu, Philip Harding, Kanthashree Mysore Sathyendra, Sibo Tong, Xuandi Fu, Jing Liu, Feng-Ju Chang, Simon Wiesler, Grant P. Strimel:
Model-Internal Slot-triggered Biasing for Domain Expansion in Neural Transducer ASR Models. 1324-1328 - Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey:
Delay-penalized CTC Implemented Based on Finite State Transducer. 1329-1333
Speech Recognition: Architecture, Search, and Linguistic Components 2
- Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng:
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation. 1334-1338 - Zhipeng Chen, Haihua Xu, Yerbolat Khassanov, Yi He, Lu Lu, Zejun Ma, Ji Wu:
Knowledge Distillation Approach for Efficient Internal Language Model Estimation. 1339-1343 - Jiban Adhikary, Keith Vertanen:
Language Model Personalization for Improved Touchscreen Typing. 1344-1348 - Minkyu Jung, Ohhyeok Kwon, Seunghyun Seo, Soonshin Seo:
Blank Collapse: Compressing CTC Emission for the Faster Decoding. 1349-1353 - Cal Peyser, Zhong Meng, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, Ke Hu:
Improving Joint Speech-Text Representations Without Alignment. 1354-1358 - Robert Flynn, Anton Ragni:
Leveraging Cross-Utterance Context For ASR Decoding. 1359-1363 - Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu:
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation. 1364-1368 - Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition. 1369-1373 - Dongcheng Jiang, Chao Zhang, Philip C. Woodland:
A Neural Time Alignment Module for End-to-End Automatic Speech Recognition. 1374-1378 - Yuang Li, Yu Wu, Jinyu Li, Shujie Liu:
Accelerating Transducers through Adjacent Token Merging. 1379-1383 - Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang:
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition. 1384-1388 - Wenxuan Wang, Guodong Ma, Yuke Li, Binbin Du:
Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition. 1389-1393 - Jaeyoung Lee, Masato Mimura, Tatsuya Kawahara:
Embedding Articulatory Constraints for Low-resource Speech Recognition Based on Large Pre-trained Model. 1394-1398 - Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe:
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. 1399-1403 - Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg:
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings. 1404-1408 - Shaan Bijwadia, Shuo-Yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang:
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models. 1409-1413 - Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg:
Confidence-based Ensembles of End-to-End Speech Recognition Models. 1414-1418 - Jie Chi, Brian Lu, Jason Eisner, Peter Bell, Preethi Jyothi, Ahmed M. Ali:
Unsupervised Code-switched Text Generation from Parallel Text. 1419-1423 - Dingyi Wang, Mengjie Luo, Lin Li, Xiaoqin Wang, Shushan Qiao, Yumei Zhou:
A Binary Keyword Spotting System with Error-Diffusion Based Feature Binarization. 1424-1428 - Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang:
Language-universal Phonetic Encoder for Low-resource Speech Recognition. 1429-1433 - Chong-En Lin, Kuan-Yu Chen:
A Lexical-aware Non-autoregressive Transformer-based ASR Model. 1434-1438 - Joshua Jansen van Vüren, Thomas Niesler:
Improving Under-Resourced Code-Switched Speech Recognition: Large Pre-trained Models or Architectural Interventions. 1439-1443
Spoken Language Translation, Information Retrieval, Summarization, Resources, and Evaluation 1
- Jerome R. Bellegarda:
Pragmatic Pertinence: A Learnable Confidence Metric to Assess the Subjective Quality of LM-Generated Text. 1444-1448 - Yuanchao Li, Zeyu Zhao, Ondrej Klejch, Peter Bell, Catherine Lai:
ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition. 1449-1453 - Roshan Sharma, Siddhant Arora, Kenneth Zheng, Shinji Watanabe, Rita Singh, Bhiksha Raj:
BASS: Block-wise Adaptation for Speech Summarization. 1454-1458 - Meena M. Chandra Shekar, John H. L. Hansen:
Speaker Tracking using Graph Attention Networks with Varying Duration Utterances across Multi-Channel Naturalistic Data: Fearless Steps Apollo-11 Audio Corpus. 1459-1463 - Tianfang Yan, Kikuo Maekawa, Yukiko Nota, Masayuki Hirata:
Combining language corpora in a Japanese electromagnetic articulography database for acoustic-to-articulatory inversion. 1464-1467 - Xiaoheng Zhang, Yang Li:
A Dual Attention-based Modality-Collaborative Fusion Network for Emotion Recognition. 1468-1472 - Cristian Chivriga, Rinita Roy:
Large Dataset Generation of Synchronized Music Audio and Lyrics at Scale using Teacher-Student Paradigm. 1473-1477 - Adhiraj Banerjee, Vipul Arora:
Enc-Dec RNN Acoustic Word Embeddings learned via Pairwise Prediction. 1478-1482 - Samantha Kotey, Rozenn Dahyot, Naomi Harte:
Query Based Acoustic Summarization for Podcasts. 1483-1487 - Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin:
Spot Keywords From Very Noisy and Mixed Speech. 1488-1492 - Khandokar Md. Nayem, Ran Xue, Ching-Yun Chang, Akshaya Vishnu Kudlu Shanbhogue:
Knowledge Distillation on Joint Task End-to-End Speech Translation. 1493-1497 - Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi:
Investigating Pre-trained Audio Encoders in the Low-Resource Condition. 1498-1502 - Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee:
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target. 1503-1507
Speech, Voice, and Hearing Disorders 1
- Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee:
Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test. 1508-1512 - Katerina Papadimitriou, Gerasimos Potamianos:
Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition. 1513-1517 - Monica González Machorro, Pascal Hecker, Uwe D. Reichel, Helly N. Hammer, Robert Hoepner, Lisa Pedrotti, Alisha Zmutt, Hesam Sagha, Johan van Beek, Florian Eyben, Dagmar M. Schuller, Björn W. Schuller, Bert Arnrich:
Towards Supporting an Early Diagnosis of Multiple Sclerosis using Vocal Features. 1518-1522 - Siddharth Rathod, Monil Charola, Akshat Vora, Yash Jogi, Hemant A. Patil:
Whisper Features for Dysarthric Severity-Level Classification. 1523-1527 - Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney:
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. 1528-1532 - Zhengjun Yue, Erfan Loweimi, Zoran Cvetkovic:
Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra. 1533-1537 - Sebastian P. Bayerl, Dominik Wagner, Ilja Baumann, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer:
A Stutter Seldom Comes Alone - Cross-Corpus Stuttering Detection as a Multi-label Problem. 1538-1542 - Tanuka Bhattacharjee, Anjali Jayakumar, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh:
Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis. 1543-1547 - Helin Wang, Thomas Thebaud, Jesús Villalba, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velázquez:
DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model. 1548-1552 - Ramin Hedeshy, Raphael Menges, Steffen Staab:
CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice. 1553-1557 - Massa Baali, Ibrahim Almakky, Shady Shehata, Fakhri Karray:
Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation. 1558-1562 - Theodoros Kouzelis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros:
Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling. 1563-1567 - Michal Novotný, Tereza Tykalová, Michal Simek, Tomás Kouba, Jan Rusz:
Glottal source analysis of voice deficits in basal ganglia dysfunction: evidence from de novo Parkinson's disease and Huntington's disease. 1568-1572 - Jihyun Mun, Sunhee Kim, Myeong-Ju Kim, Jiwon Ryu, Sejoong Kim, Minhwa Chung:
An Analysis of Glottal Features of Chronic Kidney Disease Speech and Its Application to CKD Detection. 1573-1577 - Varun Belagali, M. V. Achuth Rao, Prasanta Kumar Ghosh:
Weakly supervised glottis segmentation in high-speed videoendoscopy using bounding box labels. 1578-1582
Speech Recognition: Technologies and Systems for New Applications 3
- Zhengyang Li, Chenwei Liang, Timo Lohrenz, Marvin Sach, Björn Möller, Tim Fingscheidt:
An Efficient and Noise-Robust Audiovisual Encoder for Audiovisual Speech Recognition. 1583-1587 - Satwinder Singh, Feng Hou, Ruili Wang:
A Novel Self-training Approach for Low-resource Speech Recognition. 1588-1592 - Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Shiliang Zhang:
FunASR: A Fundamental End-to-End Speech Recognition Toolkit. 1593-1597 - Pingchuan Ma, Niko Moritz, Stavros Petridis, Christian Fuegen, Maja Pantic:
Streaming Audio-Visual Speech Recognition with Alignment Regularization. 1598-1602 - Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Alexandros Haliassos, Stavros Petridis, Maja Pantic:
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition. 1603-1607 - Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, Jesse Thomason:
Multimodal Speech Recognition for Language-Guided Embodied Agents. 1608-1612
Spoken Term Detection and Voice Search
- Kumari Nishu, Minsik Cho, Devang Naik:
Matching Latent Encoding for Audio-Text based Keyword Spotting. 1613-1617 - P. Sudhakar, K. Sreenivasa Rao, Pabitra Mitra:
Self-Paced Pattern Augmentation for Spoken Term Detection in Zero-Resource. 1618-1622 - Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu:
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation. 1623-1627 - Umberto Michieli, Pablo Peso Parada, Mete Ozay:
Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics. 1628-1632 - Seunghan Yang, Byeonggeun Kim, Kyuhong Shim, Simyoung Chang:
Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data. 1633-1637 - Chouchang Yang, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Ching Hua Lee, Yilin Shen, Hongxia Jin:
Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability. 1638-1642
Models for Streaming ASR
- Yuting Yang, Yuke Li, Binbin Du:
Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning. 1643-1647 - Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu:
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs. 1648-1652 - Hanbyul Kim, Seunghyun Seo, Lukas Lee, Seolki Baek:
Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation. 1653-1657 - Goeric Huybrechts, Srikanth Ronanki, Xilai Li, Hadis Nosrati, Sravan Bodapati, Katrin Kirchhoff:
DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer. 1658-1662 - Kyuhong Shim, Jinkyu Lee, Simyoung Chang, Kyuwoong Hwang:
Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer. 1663-1667 - Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie:
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition. 1668-1672
Source Separation
- Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann:
Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model. 1673-1677 - Kohei Saijo, Tetsuji Ogawa:
Remixing-based Unsupervised Source Separation from Scratch. 1678-1682 - Yuki Okamoto, Kanta Shimonishi, Keisuke Imoto, Kota Dohi, Shota Horiguchi, Yohei Kawaguchi:
CAPTDURE: Captioned Sound Dataset of Single Sources. 1683-1687 - Hokuto Munakata, Ryu Takeda, Kazunori Komatani:
Recursive Sound Source Separation with Deep Learning-based Beamforming for Unknown Number of Sources. 1688-1692 - Ladislav Mosner, Oldrich Plchot, Junyi Peng, Lukás Burget, Jan Cernocký:
Multi-Channel Speech Separation with Cross-Attention and Beamforming. 1693-1697 - Deokjun Eom, Woo Hyun Nam, Kyung-Rae Kim:
Background-Sound Controllable Voice Source Separation. 1698-1702
Speech and Language in Health: From Remote Monitoring to Medical Conversations 1
- Daniel Escobar-Grisales, Tomás Arias-Vergara, Cristian David Ríos-Urrego, Elmar Nöth, Adolfo M. García, Juan Rafael Orozco-Arroyave:
An Automatic Multimodal Approach to Analyze Linguistic and Acoustic Cues on Parkinson's Disease Patients. 1703-1707 - Khanh-Tung Tran, Truong Hoang, Duy Khuong Nguyen, Hoang D. Nguyen, Xuan-Son Vu:
Personalization for Robust Voice Pathology Detection in Sound Waves. 1708-1712 - Helen Meng, Brian Mak, Man-Wai Mak, Helene H. Fung, Xianmin Gong, Timothy C. Y. Kwok, Xunying Liu, Vincent C. T. Mok, Patrick C. M. Wong, Jean Woo, Xixin Wu, Ka Ho Wong, Sean Shensheng Xu, Naijun Zheng, Ranzo Huang, Jiawen Kang, Xiaoquan Ke, Junan Li, Jinchao Li, Yi Wang:
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders. 1713-1717 - Minxue Niu, Amrit Romana, Mimansa Jaiswal, Melvin G. McInnis, Emily Mower Provost:
Capturing Mismatch between Textual and Acoustic Emotion Expressions for Mood Identification in Bipolar Disorder. 1718-1722 - Qifei Li, Dong Wang, Yiming Ren, Yingming Gao, Ya Li:
FTA-net: A Frequency and Time Attention Network for Speech Depression Detection. 1723-1727 - Salvatore Fara, Orlaith Hickey, Alexandra Livia Georgescu, Stefano Goria, Emilia Molimpakis, Nicholas Cummins:
Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data. 1728-1732 - Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi Wang, Helen Meng, Xunying Liu:
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition. 1733-1737 - Edward L. Campbell, Judith Dineley, Pauline Conde, Faith Matcham, Katie M. White, Carolin Oetzmann, Sara Simblett, Stuart Bruce, Amos A. Folarin, Til Wykes, Srinivasan Vairavan, Richard J. B. Dobson, Laura Docío Fernández, Carmen García-Mateo, Vaibhav A. Narayan, Matthew Hotopf, Nicholas Cummins:
Classifying depression symptom severity: Assessment of speech representations in personalized and generalized machine learning models. 1738-1742 - Shabnam Ghaffarzadegan, Luca Bondi, Ho-Hsiang Wu, Sirajum Munir, Kelly J. Shields, Samarjit Das, Joseph Aracri:
Active Learning for Abnormal Lung Sound Data Curation and Detection in Asthma. 1743-1747 - Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Franziska Braun, Florian Hönig, Carlos Andrés Tobón-Quintero, David Aguillón, Francisco Lopera, Liliana Hincapié-Henao, Maria Schuster, Korbinian Riedhammer, Andreas Maier, Elmar Nöth, Juan Rafael Orozco-Arroyave:
Automatic Assessment of Alzheimer's across Three Languages Using Speech and Language Features. 1748-1752 - Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu:
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition. 1753-1757 - Jan Svihlík, Vojtech Illner, Petr Krýze, Mário Sousa, Paul Krack, Elina Tripoliti, Robert Jech, Jan Rusz:
Relationship between LTAS-based spectral moments and acoustic parameters of hypokinetic dysarthria in Parkinson's disease. 1758-1762 - Eduardo Alvarado, Nicolás Grágeda, Alejandro Luzanto, Rodrigo Mahú, Jorge Wuth, Laura Mendoza, Richard M. Stern, Néstor Becerra Yoma:
Respiratory distress estimation in human-robot interaction scenario. 1763-1767 - Emma Reyner-Fuentes, Esther Rituerto-González, Isabel Trancoso, Carmen Peláez-Moreno:
Prediction of the Gender-based Violence Victim Condition using Speech: What do Machine Learning Models rely on? 1768-1772 - Monil Charola, Aastha Kachhi, Hemant A. Patil:
Whisper Encoder features for Infant Cry Classification. 1773-1777
Speech Perception
- Nika Jurov, William J. Idsardi, Naomi H. Feldman:
A neural architecture for selective attention to speech features. 1778-1782 - Mingyue Huo, Yinglun Sun, Daniel Fogerty, Yan Tang:
Quantifying Informational Masking due to Masker Intelligibility in Same-talker Speech-in-speech Perception. 1783-1787 - Santiago Cuervo, Ricard Marxer:
On the Benefits of Self-supervised Learned Speech Representations for Predicting Human Phonetic Misperceptions. 1788-1792 - Felicia Schulz, Mirella De Sisto, M. Paula M. P. Roncaglia-Denissen, Peter Hendrix:
Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks. 1793-1797 - Martin Cooke, María Luisa García Lecumberri:
Exploring the mutual intelligibility breakdown caused by sculpting speech from a competing speech signal. 1798-1802 - Mafuyu Kitahara, Naoya Watabe, Hiroto Noguchi, Chuyu Huang, Ayako Hashimoto, Ai Mizoguchi:
Perception of Incomplete Voicing Neutralization of Obstruents in Tohoku Japanese. 1803-1807
Phonetics and Phonology: Languages and Varieties
- Jasmin Pöhnlein, Felicitas Kleber:
The emergence of obstruent-intrinsic f0 and VOT as cues to the fortis/lenis contrast in West Central Bavarian. 1808-1812 - William N. Havard, Yaya Sy, Camila Scaff, Loann Peurey, Alejandrina Cristià:
〈'〉 in Tsimane': a Preliminary Investigation. 1813-1817 - Dennis Hoffmann, Maria O'Reilly:
Segmental features of Brazilian (Santa Catarina) Hunsrik. 1818-1822 - Louise Ratko, Joshua Penney, Felicity Cox:
Opening or Closing? An Electroglottographic Analysis of Voiceless Coda Consonants in Australian English. 1823-1827 - Franka Zebe:
Increasing aspiration of word-medial fortis plosives in Swiss Standard German. 1828-1832 - Bowei Shao, Philipp Buech, Anne Hermes, Maria Giavazzi:
Lexical Stress and Velar Palatalization in Italian: A spatio-temporal Interaction. 1833-1837
Paralinguistics 2
- Zihan Wu, Neil Scheidwasser-Clow, Karl El Hajal, Milos Cernak:
Speaker Embeddings as Individuality Proxy for Voice Stress Detection. 1838-1842 - Jingyao Wu, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
From Interval to Ordinal: A HMM based Approach for Emotion Label Conversion. 1843-1847 - Zhiyu Zhang, Da Liu, Shengqiang Liu, Anna Wang, Jie Gao, Yali Li:
Turbo your multi-modal classification with contrastive learning. 1848-1852 - Georgios Ioannides, Michael Owen, Andrew Fletcher, Viktor Rozgic, Chao Wang:
Towards Paralinguistic-Only Speech Representations for End-to-End Speech Emotion Recognition. 1853-1857 - Ruiteng Zhang, Jianguo Wei, Xugang Lu, Yongwei Li, Junhai Xu, Di Jin, Jianhua Tao:
SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition. 1858-1862 - Lokesh Bansal, S. Pavankumar Dubagunta, Malolan Chetlur, Pushpak Jagtap, Aravind Ganapathiraju:
On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition. 1863-1867 - Shao-Hao Lu, Yun-Shao Lin, Chi-Chun Lee:
Speaking State Decoder with Transition Detection for Next Speaker Prediction. 1868-1872 - Yuki Kitagishi, Naohiro Tawara, Atsunori Ogawa, Ryo Masumura, Taichi Asami:
What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation. 1873-1877 - Joop Arts, Khiet P. Truong:
Effects of perceived gender on the perceived social function of laughter. 1878-1882 - Tilak Purohit, Bogdan Vlasenko, Mathew Magimai-Doss:
Implicit phonetic information modeling for speech emotion recognition. 1883-1887 - Seong-Gyun Leem, Daniel Fulford, Jukka-Pekka Onnela, David Gard, Carlos Busso:
Computation and Memory Efficient Noise Adaptation of Wav2Vec2.0 for Noisy Speech Emotion Recognition with Skip Connection Adapters. 1888-1892 - Yang Liu, Haoqin Sun, Geng Chen, Qingyue Wang, Zhen Zhao, Xugang Lu, Longbiao Wang:
Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions. 1893-1897 - Abinay Reddy Naini, Ali N. Salman, Carlos Busso:
Preference Learning Labels by Anchoring on Consecutive Annotations. 1898-1902 - Orchid Chetia Phukan, Arun Balaji Buduru, Rajesh Sharma:
Transforming the Embeddings: A Lightweight Technique for Speech Emotion Recognition Tasks. 1903-1907 - Cheng Lu, Hailun Lian, Wenming Zheng, Yuan Zong, Yan Zhao, Sunan Li:
Learning Local to Global Feature Aggregation for Speech Emotion Recognition. 1908-1912 - Xuechen Wang, Shiwan Zhao, Yong Qin:
Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition. 1913-1917
Speaker and Language Identification 1
- Viet-Thanh Pham, Xuan Thai Hoa Nguyen, Vu Hoang, Thi Thu Trang Nguyen:
Vietnam-Celeb: a large-scale dataset for Vietnamese speaker recognition. 1918-1922 - Mu Yang, Ram C. M. C. Shekar, Okim Kang, John H. L. Hansen:
What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model. 1923-1927 - Yooyoung Lee, Craig S. Greenberg, Eliot Godard, Asad A. Butt, Elliot Singer, Trang Nguyen, Lisa P. Mason, Douglas A. Reynolds:
The 2022 NIST Language Recognition Evaluation. 1928-1932 - Salvatore Sarni, Sandro Cumani, Sabato Marco Siniscalchi, Andrea Bottino:
Description and analysis of the KPT system for NIST Language Recognition Evaluation 2022. 1933-1937 - Jia Qi Yip, Duc-Tuan Truong, Dianwen Ng, Chong Zhang, Yukun Ma, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, Eng Siong Chng, Bin Ma:
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention. 1938-1942 - Jiadi Yao, Chengdong Liang, Zhendong Peng, Binbin Zhang, Xiao-Lei Zhang:
Branch-ECAPA-TDNN: A Parallel Branch Architecture to Capture Local and Global Features for Speaker Verification. 1943-1947 - Vishwanath Pratap Singh, Md. Sahidullah, Tomi Kinnunen:
Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech. 1948-1952 - Spandan Dey, Premjeet Singh, Goutam Saha:
Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification. 1953-1957 - Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegnér:
A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model. 1958-1962 - Pablo Andrés Tamayo Flórez, Rubén Manrique, Bernardo Pereira Nunes:
HABLA: A Dataset of Latin American Spanish Accents for Voice Anti-spoofing. 1963-1967 - Rui Li, Zhiwei Xie, Haihua Xu, Yizhou Peng, Hexin Liu, Hao Huang, Eng Siong Chng:
Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory. 1968-1972 - Bei Liu, Haoyu Wang, Yanmin Qian:
Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory. 1973-1977 - Sourya Dipta Das, Yash Vadi, Abhishek Unnam, Kuldeep Yadav:
Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance. 1978-1982 - Hervé Bredin:
pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe. 1983-1987 - Jingyu Li, Wei Liu, Zhaoyang Zhang, Jiong Wang, Tan Lee:
Model Compression for DNN-based Speaker Verification Using Weight Quantization. 1988-1992 - Bhavik Vachhani, Dipesh K. Singh, Rustom Lawyer:
Multi-resolution Approach to Identification of Spoken Languages and To Improve Overall Language Diarization System Using Whisper Model. 1993-1997 - Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi:
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms. 1998-2002 - Zhida Song, Liang He, Baowei Zhao, Minqiang Xu, Yu Zheng:
Dynamic Fully-Connected Layer for Large-Scale Speaker Verification. 2003-2007
Show and Tell: Speech tools, speech enhancement, speech synthesis
- Hendrik Schröter, Alberto N. Escalante-B., Tobias Rosenkranz, Andreas Maier:
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement. 2008-2009 - Felix Burkhardt, Florian Eyben, Björn W. Schuller:
Nkululeko: Machine Learning Experiments on Speaker Characteristics Without Programming. 2010-2011 - Sébastien Le Maguer, Mark Anderson, Naomi Harte:
Sp1NY: A Quick and Flexible Speech Visualisation Tool in Python. 2012-2013 - Niamh Corkey, Johannah O'Mahony, Simon King:
Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0. 2014-2015 - Éva Székely, Siyang Wang, Joakim Gustafson:
So-to-Speak: An Exploratory Platform for Investigating the Interplay between Style and Prosody in TTS. 2016-2017 - Takayuki Arai, Tsukasa Yoshinaga, Akiyoshi Iida:
Comparing /b/ and /d/ with a Single Physical Model of the Human Vocal Tract to Visualize Droplets Produced while Speaking. 2018-2019 - Erik Ekstedt, Gabriel Skantze:
Show & Tell: Voice Activity Projection and Turn-taking. 2020-2021 - Héctor A. Cordourier, Georg Stemmer, Sinem Aslan, Tobias Bocklet, Himanshu Bhalla:
Real Time Detection of Soft Voice for Speech Enhancement. 2022-2023 - Avani Tanna, Michael Saxon, Amr El Abbadi, William Yang Wang:
Data Augmentation for Diverse Voice Conversion in Noisy Environments. 2024-2025 - Mandar Gogate, Kia Dashtipour, Amir Hussain:
Application for Real-time Audio-Visual Speech Enhancement. 2026-2027
Speech Synthesis and Voice Conversion
- Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John B. Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo:
Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction. 2028-2032 - Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal:
Streaming Parrotron for on-device speech-to-speech conversion. 2033-2037 - Zein Shaheen, Tasnima Sadekova, Yulia Matveeva, Alexandra Shirshova, Mikhail A. Kudinov:
Exploiting Emotion Information in Speaker Embeddings for Expressive Text-to-Speech. 2038-2042 - Takuma Okamoto, Tomoki Toda, Hisashi Kawai:
E2E-S2S-VC: End-To-End Sequence-To-Sequence Voice Conversion. 2043-2047 - Yerin Choi, Myoung-Wan Koo:
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer. 2048-2052 - Matthew Baas, Benjamin van Niekerk, Herman Kamper:
Voice Conversion With Just Nearest Neighbors. 2053-2057 - Kou Tanaka, Takuhiro Kaneko, Hirokazu Kameoka, Shogo Seki:
CFVC: Conditional Filtering for Controllable Voice Conversion. 2058-2062 - Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi:
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding. 2063-2067 - Yun Chen, Lingxiao Yang, Qi Chen, Jian-Huang Lai, Xiaohua Xie:
Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion. 2068-2072 - Bohan Wang, Damien Ronssin, Milos Cernak:
ALO-VC: Any-to-any Low-latency One-shot Voice Conversion. 2073-2077 - Christoph Minixhofer, Ondrej Klejch, Peter Bell:
Evaluating and reducing the distance between synthetic and real speech distributions. 2078-2082 - Waris Quamer, Anurag Das, Ricardo Gutierrez-Osuna:
Decoupling Segmental and Prosodic Cues of Non-native Speech through Vector Quantization. 2083-2087 - Hiroki Kanagawa, Takafumi Moriya, Yusuke Ijima:
VC-T: Streaming Voice Conversion Based on Neural Transducer. 2088-2092 - Suhita Ghosh, Arnab Das, Yamini Sinha, Ingo Siegert, Tim Polzehl, Sebastian Stober:
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion. 2093-2097 - Meiying Chen, Zhiyao Duan:
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed. 2098-2102 - Yeonjong Choi, Chao Xie, Tomoki Toda:
Reverberation-Controllable Voice Conversion Using Reverberation Time Estimator. 2103-2107 - Cheng Yu, Yang Li, Weiqin Zu, Fanglei Sun, Zheng Tian, Jun Wang:
Cross-utterance Conditioned Coherent Speech Editing. 2108-2112
Spoken Language Translation, Information Retrieval, Summarization, Resources, and Evaluation 2
- Jianrong Wang, Yuchen Huo, Li Liu, Tianyi Xu, Qi Li, Sen Li:
MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information. 2113-2117 - Lantian Li, Xiaolou Li, Haoyu Jiang, Chen Chen, Ruihai Hou, Dong Wang:
CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition. 2118-2122 - Yuhang Li, Xiao Wei, Yuke Si, Longbiao Wang, Xiaobao Wang, Jianwu Dang:
Improving Zero-shot Cross-domain Slot Filling via Transformer-based Slot Semantics Fusion. 2123-2127 - Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Dongwon Kim, Seungjin Lee, Sung Won Han:
Rethinking Transfer and Auxiliary Learning for Improving Audio Captioning Transformer. 2128-2132 - Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen:
Boosting Punctuation Restoration with Data Generation and Reinforcement Learning. 2133-2137 - Yi-Fen Liu, Xiang-Li Lu:
J-ToneNet: A Transformer-based Encoding Network for Improving Tone Classification in Continuous Speech via F0 Sequences. 2138-2142 - Jonathan E. Avila, Nigel G. Ward:
Towards Cross-Language Prosody Transfer for Dialog. 2143-2147 - Santosh Kesiraju, Marek Sarvas, Tomás Pavlícek, Cécile Macaire, Alejandro Ciuba:
Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. 2148-2152 - Alkis Koudounas, Moreno La Quatra, Lorenzo Vaiani, Luca Colomba, Giuseppe Attanasio, Eliana Pastor, Luca Cagliero, Elena Baralis:
ITALIC: An Italian Intent Classification Dataset. 2153-2157 - Janine Rugayan, Giampiero Salvi, Torbjørn Svendsen:
Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation. 2158-2162 - Guangpeng Li, Lu Chen, Kai Yu:
How ChatGPT is Robust for Spoken Language Understanding? 2163-2167 - Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao:
GigaST: A 10,000-hour Pseudo Speech Translation Corpus. 2168-2172 - Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng, Jing Xiao:
Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism. 2173-2177 - Per Fallgren, Jens Edlund:
Crowdsource-based Validation of the Audio Cocktail as a Sound Browsing Tool. 2178-2182 - Yunxiang Li, Pengfei Liu, Xixin Wu, Helen Meng:
PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts. 2183-2187 - Shuhei Kato, Taiichi Hashimoto:
Speech-to-Face Conversion Using Denoising Diffusion Probabilistic Models. 2188-2192 - Yuta Nishikawa, Satoshi Nakamura:
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation. 2193-2197
Novel Transformer Models for ASR
- Martin Radfar, Paulina Lyskawa, Brandon Trujillo, Yi Xie, Kai Zhen, Jahn Heymann, Denis Filimonov, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris:
Conmer: Streaming Conformer Without Self-attention for Interactive Voice Assistants. 2198-2202 - Do-Hee Kim, Ji-Eun Choi, Joon-Hyuk Chang:
Intra-ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition. 2203-2207 - Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. 2208-2212 - Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet, Petr Motlícek:
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition. 2213-2217 - Carlos Carvalho, Alberto Abad:
Memory-augmented conformer for improved end-to-end long-form ASR. 2218-2222 - Mingyu Cui, Jiawen Kang, Jiajun Deng, Xi Yin, Yutao Xie, Xie Chen, Xunying Liu:
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems. 2223-2227
Speaker Recognition 1
- Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Jiajun Qi:
An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification. 2228-2232 - Jian Zhang, Liang He, Xiaochen Guo, Jing Ma:
A Study on Visualization of Voiceprint Feature. 2233-2237 - Ivan Yakovlev, Anton Okhotnikov, Nikita Torgashov, Rostislav Makarov, Yuri Voevodin, Konstantin Simonchik:
VoxTube: a multilingual speaker recognition dataset. 2238-2242 - Pengqi Li, Lantian Li, Askar Hamdulla, Dong Wang:
Visualizing Data Augmentation in Deep Speaker Recognition. 2243-2247
Cross-lingual and Multilingual ASR
- Zhilong Zhang, Wei Wang, Yanmin Qian:
Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition. 2248-2252 - Wei Wang, Yanmin Qian:
UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR. 2253-2257 - Kevin Glocker, Aaricia Herygers, Munir Georges:
Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes. 2258-2262 - Li Li, Dongxing Xu, Haoran Wei, Yanhua Long:
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system. 2263-2267 - Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass:
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. 2268-2272 - Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Jinfeng Bai:
DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model. 2273-2277
Voice Conversion
- Hai Zhu, Huayi Zhan, Hong Cheng, Ying Wu:
Emotional Voice Conversion with Semi-Supervised Generative Modeling. 2278-2282 - Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee:
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation. 2283-2287 - Pengfei Wei, Xiang Yin, Chunfeng Wang, Zhonghao Li, Xinghua Qu, Zhiqiang Xu, Zejun Ma:
S2CD: Self-heuristic Speaker Content Disentanglement for Any-to-Any Voice Conversion. 2288-2292 - Le Xu, Rongxiu Zhong, Ying Liu, Huibao Yang, Shilei Zhang:
Flow-VAE VC: End-to-End Flow Framework with Contrastive Loss for Zero-shot Voice Conversion. 2293-2297 - Zhonghua Liu, Shijun Wang, Ning Chen:
Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation. 2298-2302 - Wonjune Kang, Mark Hasegawa-Johnson, Deb Roy:
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions. 2303-2307
Speech and Language in Health: From Remote Monitoring to Medical Conversations 2
- Franziska Braun, Sebastian P. Bayerl, Paula Andrea Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer:
Classifying Dementia in the Presence of Depression: A Cross-Corpus Study. 2308-2312 - Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Helen Meng, Xunying Liu:
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition. 2313-2317 - Dominik Wagner, Ilja Baumann, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet:
Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data? 2318-2322 - Hardik Kothare, Michael Neumann, Jackson Liscombe, Jordan R. Green, Vikram Ramanarayanan:
Responsiveness, Sensitivity and Clinical Utility of Timing-Related Speech Biomarkers for Remote Monitoring of ALS Disease Progression. 2323-2327 - Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu:
Use of Speech Impairment Severity for Dysarthric Speech Recognition. 2328-2332 - Mohammed Mosuily, Lindsay Welch, Jagmohan Chauhan:
MMLung: Moving Closer to Practical Lung Health Estimation using Smartphones. 2333-2337 - Siyuan Chen, Colin A. Grambow, Mojtaba Kadkhodaie Elyaderani, Alireza Sadeghi, Federico Fancellu, Thomas Schaaf:
Investigating the Utility of Synthetic Data for Doctor-Patient Conversation Summarization. 2338-2342 - Jinhan Wang, Vijay Ravi, Abeer Alwan:
Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals. 2343-2347 - Kubilay Can Demir, Tobias Weise, Matthias May, Axel Schmid, Andreas Maier, Seung Hee Yang:
PoCaPNet: A Novel Approach for Surgical Phase Recognition Using Speech and X-Ray Images. 2348-2352 - Michael Neumann, Hardik Kothare, Vikram Ramanarayanan:
Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis. 2353-2357 - Adria Mallol-Ragolta, Nils Urbach, Shuo Liu, Anton Batliner, Björn W. Schuller:
The MASCFLICHT Corpus: Face Mask Type and Coverage Area Recognition from Speech. 2358-2362 - Catarina Botelho, Alberto Abad, Tanja Schultz, Isabel Trancoso:
Towards Reference Speech Characterization for Health Applications. 2363-2367 - Cristian David Ríos-Urrego, Jan Rusz, Elmar Nöth, Juan Rafael Orozco-Arroyave:
Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors. 2368-2372 - Judith Dineley, Ewan Carr, Faith Matcham, Johnny Downs, Richard J. B. Dobson, Thomas F. Quatieri, Nicholas Cummins:
Towards robust paralinguistic assessment for real-world mobile health (mHealth) monitoring: an initial study of reverberation effects on speech. 2373-2377
Pathological Speech Analysis 1
- Leif E. R. Simmatis, Timothy Pommeé, Yana Yunusova:
Multimodal Assessment of Bulbar Amyotrophic Lateral Sclerosis (ALS) Using a Novel Remote Speech Assessment App. 2378-2382 - David Martínez, Dayana Ribas, Eduardo Lleida:
On the Use of High Frequency Information for Voice Pathology Classification. 2383-2387 - Anna Favaro, Tianyu Cao, Thomas Thebaud, Jesús Villalba, Ankur A. Butala, Najim Dehak, Laureano Moro-Velázquez:
Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora? 2388-2392 - Sudarsana Reddy Kadiri, Manila Kodali, Paavo Alku:
Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features. 2393-2397 - Michal Simek, Tomás Kouba, Michal Novotný, Tereza Tykalová, Jan Rusz:
Comparison of acoustic measures of dysphonia in Parkinson's disease and Huntington's disease: Effect of sex and speaking task. 2398-2402 - Lucía Gómez-Zaragozá, Simone Wills, Cristian Tejedor García, Javier Marín-Morales, Mariano Alcañiz, Helmer Strik:
Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses. 2403-2407
Multimodal Speech Emotion Recognition
- Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou:
LanSER: Language-Model Supported Speech Emotion Recognition. 2408-2412 - Jiachen Luo, Huy Phan, Joshua D. Reiss:
Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition. 2413-2417 - Eimear Stanley, Eric DeMattos, Anita Klementiev, Piotr Ozimek, Georgia Clarke, Michael Berger, Dimitri Palaz:
Emotion Label Encoding Using Word Embeddings for Speech Emotion Recognition. 2418-2422 - Zhongjie Li, Gaoyan Zhang, Longbiao Wang, Jianwu Dang:
Discrimination of the Different Intents Carried by the Same Text Through Integrating Multimodal Information. 2423-2427 - Zhi Li, Ryu Takeda, Takahiro Hara:
Meta-domain Adversarial Contrastive Learning for Alleviating Individual Bias in Self-sentiment Predictions. 2428-2432 - Ziping Zhao, Tian Gao, Haishuai Wang, Björn W. Schuller:
SWRR: Feature Map Classifier Based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition. 2433-2437
Speech Coding and Enhancement 2
- Xinmeng Xu, Weiping Tu, Yuhong Yang:
PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement. 2438-2442 - Chang Han, Xinmeng Xu, Weiping Tu, Yuhong Yang, Yajie Liu:
Exploring the Interactions Between Target Positive and Negative Information for Acoustic Echo Cancellation. 2443-2447 - Pavel Andreev, Nicholas Babaev, Azat Saginbaev, Ivan Shchekotov, Aibek Alanov:
Iterative autoregression: a novel trick to improve your low-latency speech enhancement model. 2448-2452 - Pin-Jui Ku, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee:
A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models. 2453-2457 - Lior Frenkel, Jacob Goldberger, Shlomo E. Chazan:
Domain Adaptation for Speech Enhancement in a Large Domain Gap. 2458-2462 - Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida:
SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks. 2463-2467 - Liang Liu, Haixin Guan, Jinlong Ma, Wei Dai, Guangyong Wang, Shaowei Ding:
A Mask Free Neural Network for Monaural Speech Enhancement. 2468-2472 - Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang:
A Training and Inference Strategy Using Noisy and Enhanced Speech as Target for Speech Enhancement without Clean Speech. 2473-2477 - Ashutosh Pandey, Ke Tan, Buye Xu:
A Simple RNN Model for Lightweight, Low-compute and Low-latency Multichannel Speech Enhancement in the Time Domain. 2478-2482 - Jianwei Yu, Hangting Chen, Yi Luo, Rongzhi Gu, Chao Weng:
High Fidelity Speech Enhancement with Band-split RNN. 2483-2487 - Jiuxin Lin, Peng Wang, Heinrich Dinkel, Jun Chen, Zhiyong Wu, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang:
Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information. 2488-2492 - Anton Kovalyov, Kashyap Patel, Issa M. S. Panahi:
DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement. 2493-2497 - Xuechen Liu, Md. Sahidullah, Kong Aik Lee, Tomi Kinnunen:
Speaker-Aware Anti-spoofing. 2498-2502 - Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai, Kenichi Arai, Atsunori Ogawa, Tomohiro Nakatani, Toshio Irino:
Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine. 2503-2507 - Marvin Sach, Jan Franzen, Bruno Defraene, Kristoff Fluyt, Maximilian Strake, Wouter Tirry, Tim Fingscheidt:
EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement. 2508-2512 - JungPhil Park, Jeong-Hwan Choi, Yungyeo Kim, Joon-Hyuk Chang:
HAD-ANC: A Hybrid System Comprising an Adaptive Filter and Deep Neural Networks for Active Noise Control. 2513-2517 - Minghang Chu, Jing Wang, Yaoyao Ma, Zhiwei Fan, Mengtao Yang, Chao Xu, Zhi Tao, Di Wu:
MSAF: A Multiple Self-Attention Field Method for Speech Enhancement. 2518-2522 - Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng:
Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression. 2523-2527 - Yixin Wan, Yuan Zhou, Xiulian Peng, Kai-Wei Chang, Yan Lu:
ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression. 2528-2532 - Lorenz Diener, Marju Purin, Sten Sootla, Ando Saabas, Robert Aichner, Ross Cutler:
PLCMOS - A Data-driven Non-intrusive Metric for The Evaluation of Packet Loss Concealment Algorithms. 2533-2537
Phonetics, Phonology, and Prosody 1
- Petra Wagner, Simon Betz:
Effects of Meter, Genre and Experience on Pausing, Lengthening and Prosodic Phrasing in German Poetry Reading. 2538-2542 - Tünde Szalay, John Holik, Duy Duong Nguyen, James Morandini, Catherine J. Madill:
Comparing first spectral moment of Australian English /s/ between straight and gay voices using three analysis window sizes. 2543-2547 - Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang:
Universal Automatic Phonetic Transcription into the International Phonetic Alphabet. 2548-2552 - Linda Gerlach, Kirsty McDougall, Finnian Kelly, Anil Alexander:
Voice Twins: Discovering Extremely Similar-sounding, Unrelated Speakers. 2553-2557 - Hannah Hedegard, Andrea Fröhlich, Fabian Tomaschek, Carina Steiner, Adrian Leemann:
Filling the population statistics gap: Swiss German reference data on F0 and speech tempo for forensic contexts. 2558-2562 - Mathilde Hutin, Liesbeth Degand, Marc Allassonnière-Tang:
Investigating the Syntax-Discourse Interface in the Phonetic Implementation of Discourse Markers. 2563-2567 - Robert Essery, Philip Harrison, Vincent Hughes:
Evaluation of a Forensic Automatic Speaker Recognition System with Emotional Speech Recordings. 2568-2572 - Emily Ahn, Gina-Anne Levow, Richard A. Wright, Eleanor Chodroff:
An Outlier Analysis of Vowel Formants from a Corpus Phonetics Pipeline. 2573-2577 - Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj:
The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features. 2578-2582 - Reed Blaylock, Shrikanth Narayanan:
Beatboxing Kick Drum Kinematics. 2583-2587 - Huali Zhou, Xianming Bei, Nengheng Zheng, Qinglin Meng:
Effects of hearing loss and amplification on Mandarin consonant perception. 2588-2592 - Roland Adams, Calbert Graham:
An Acoustic Analysis of Fricative Variation in Three Accents of English. 2593-2597 - Karolina Bros:
Acoustic cues to stress perception in Spanish - a mismatch negativity study. 2598-2602 - Mitko Sabev, Bistra Andreeva, Christoph Gabriel, Jonas Gruenke:
Bulgarian Unstressed Vowel Reduction: Received Views vs Corpus Findings. 2603-2607 - Shelly Jain, Priyanshi Pal, Anil Kumar Vuppala, Prasanta Kumar Ghosh, Chiranjeevi Yarra:
An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations. 2608-2612 - Hye-Sook Park, Sunhee Kim:
Identifying Stable Sections for Formant Frequency Extraction of French Nasal Vowels Based on Difference Thresholds. 2613-2617 - Nicolas Audibert, Francesca Carbone, Maud Champagne-Lavau, Aurélien Said Housseini, Caterina Petrone:
Evaluation of delexicalization methods for research on emotional speech. 2618-2622
Spoken Dialog Systems and Conversational Analysis 2
- Jay Kejriwal, Stefan Benus:
Relationship between auditory and semantic entrainment using Deep Neural Networks (DNN). 2623-2627 - Jay Kejriwal, Stefan Benus, Lina Maria Rojas-Barahona:
Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks. 2628-2632 - Elizabeth Nielsen, Mark Steedman, Sharon Goldwater:
Parsing dialog turns with prosodic features in English. 2633-2637 - Toshiki Muromachi, Yoshinobu Kano:
Estimation of Listening Response Timing by Generative Model and Parameter Control of Response Substantialness Using Dynamic-Prompt-Tune. 2638-2642 - Tahiya Chowdhury, Verónica Romero, Amanda Stent:
Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder. 2643-2647 - Sai Srujana Buddi, Utkarsh Oggy Sarawgi, Tashweena Heeramun, Karan Sawnhey, Ed Yanosik, Saravana Rathinam, Saurabh Adya:
Efficient Multimodal Neural Networks for Trigger-less Voice Assistants. 2648-2652 - Rachel Ostrand, Victor S. Ferreira, David Piorkowski:
Rapid Lexical Alignment to a Conversational Agent. 2653-2657 - Fuma Kurata, Mao Saeki, Shinya Fujie, Yoichi Matsuyama:
Multimodal Turn-Taking Model Using Visual Cues for End-of-Utterance Prediction in Spoken Dialogue Systems. 2658-2662 - Nobukatsu Hojo, Saki Mizuno, Satoshi Kobashikawa, Ryo Masumura, Mana Ihori, Hiroshi Sato, Tomohiro Tanaka:
Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer. 2663-2667 - Jin Sakuma, Shinya Fujie, Huaibo Zhao, Tetsunori Kobayashi:
Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay. 2668-2672 - Keulbit Kim, Namhyun Cho:
Focus-attention-enhanced Crossmodal Transformer with Metric Learning for Multimodal Speech Emotion Recognition. 2673-2677 - Haotian Wang, Jun Du, Hengshun Zhou, Chin-Hui Lee, Yuling Ren, Jiangjiang Zhao:
A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting. 2678-2682 - Anika A. Spiesberger, Andreas Triantafyllopoulos, Iosif Tsangko, Björn W. Schuller:
Abusive Speech Detection in Indic Languages Using Acoustic Features. 2683-2687 - Digvijay Ingle, Ayush Kumar, Jithendra Vepa:
Listening To Silences In Contact Center Conversations Using Textual Cues. 2688-2692 - Heuiyeen Yeen, Minju Kim, Myoung-Wan Koo:
I Learned Error, I Can Fix It! : A Detector-Corrector Structure for ASR Error Calibration. 2693-2697 - Maeva Garnier, Éric Le Ferrand, Fabien Ringeval:
Verbal and nonverbal feedback signals in response to increasing levels of miscommunication. 2698-2702 - Shahin Amiriparian, Lukas Christ, Regina Kushtanova, Maurice Gerczuk, Alexandra Teynor, Björn W. Schuller:
Speech-Based Classification of Defensive Communication: A Novel Dataset and Results. 2703-2707 - Sarenne Wallbridge, Peter Bell, Catherine Lai:
Quantifying the perceptual value of lexical and non-lexical channels in speech. 2708-2712 - Kazuya Tsubokura, Yurie Iribe, Norihide Kitaoka:
Relationships Between Gender, Personality Traits and Features of Multi-Modal Data to Responses to Spoken Dialog Systems Breakdown. 2713-2717 - Huan Zhao, Bo Li, Zixing Zhang:
Speaker-aware Cross-modal Fusion Architecture for Conversational Emotion Recognition. 2718-2722
Analysis of Speech and Audio Signals 2
- Zhiheng Liao, Feifei Xiong, Juan Luo, Minjie Cai, Eng Siong Chng, Jinwei Feng, Xionghu Zhong:
Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network. 2723-2727 - Xin Ren, Juan Luo, Xionghu Zhong, Minjie Cai:
Emotion-Aware Audio-Driven Face Animation via Contrastive Feature Disentanglement. 2728-2732 - Kanta Shimonishi, Kota Dohi, Yohei Kawaguchi:
Anomalous Sound Detection Based on Sound Separation. 2733-2737 - Vitória S. Fahed, Emer P. Doheny, Madeleine M. Lowery:
Random Forest Classification of Breathing Phases from Audio Signals Recorded using Mobile Devices. 2738-2742 - Youngdo Ahn, Chengyi Wang, Yu Wu, Jong Won Shin, Shujie Liu:
GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos. 2743-2747 - Wanyue Zhai, Mark Hasegawa-Johnson:
Wav2ToBI: a new approach to automatic ToBI transcription. 2748-2752 - Lijian Gao, Qirong Mao, Ming Dong:
Joint-Former: Jointly Regularized and Locally Down-sampled Conformer for Semi-supervised Sound Event Detection. 2753-2757 - Chirag Goel, Surya Koppisetti, Ben Colman, Ali Shahriyari, Gaurav Bharaj:
Towards Attention-based Contrastive Learning for Audio Spoof Detection. 2758-2762 - Yifei Xin, Xiulian Peng, Yan Lu:
Masked Audio Modeling with CLAP and Multi-Objective Learning. 2763-2767 - Manuele Rusci, Tinne Tuytelaars:
Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems. 2768-2772 - Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza:
Self-Supervised Dataset Pruning for Efficient Training in Audio Anti-spoofing. 2773-2777 - W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-Yiin Chang, Tara N. Sainath:
Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR. 2778-2782 - Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas:
Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features. 2783-2787 - Jing Li, Yanhua Long, Yijie Li, Dongxing Xu:
Advanced RawNet2 with Attention-based Channel Masking for Synthetic Speech Detection. 2788-2792 - Juan Carlos Martinez-Sevilla, María Alfaro-Contreras, Jose J. Valero-Mas, Jorge Calvo-Zaragoza:
Insights into end-to-end audio-to-score transcription with real recordings: A case study with saxophone works. 2793-2797 - Yuan Gong, Sameer Khurana, Leonid Karlinsky, James R. Glass:
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers. 2798-2802 - Jingran Gong, Ning Chen:
Synthetic Voice Spoofing Detection based on Feature Pyramid Conformer. 2803-2807 - Yuankun Xie, Haonan Cheng, Yutian Wang, Long Ye:
Learning A Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection. 2808-2812 - Mine Kerpicci, Van Nguyen, Shuhua Zhang, Erik Visser:
Application of Knowledge Distillation to Multi-Task Speech Representation Learning. 2813-2817 - Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani:
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes. 2818-2822 - Antonio Almudévar, Alfonso Ortega, Luis Vicente, Antonio Miguel, Eduardo Lleida:
Variational Classifier for Unsupervised Anomalous Sound Detection under Domain Generalization. 2823-2827 - Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak:
FlexiAST: Flexibility is What AST Needs. 2828-2832 - Ji Won Yoon, Seok Min Kim, Nam Soo Kim:
MCR-Data2vec 2.0: Improving Self-supervised Speech Pre-training via Model-level Consistency Regularization. 2833-2837 - Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, H. Lilian Tang, Mark D. Plumbley, Volkan Kiliç, Wenwu Wang:
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention. 2838-2842
Speech Coding: Privacy
- Apiwat Ditthapron, Emmanuel O. Agu, Adam C. Lammert:
Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health. 2843-2847 - Bajian Xiang, Hongkun Liu, Zedong Wu, Su Shen, Xiangdong Zhang:
eSTImate: A Real-time Speech Transmission Index Estimator With Speech Enhancement Auxiliary Task Using Self-Attention Feature Pyramid Network. 2848-2852 - Junyu Wang:
Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement. 2853-2857 - Minh Tran, Mohammad Soleymani:
Privacy-preserving Representation Learning for Speech Understanding. 2858-2862 - Michele Panariello, Massimiliano Todisco, Nicholas W. D. Evans:
Vocoder drift in x-vector-based speaker anonymization. 2863-2867 - Michele Panariello, Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas W. D. Evans:
Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems. 2868-2872
Analysis of Neural Speech Representations
- Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli:
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? 2873-2877 - Olivier Zhang, Olivier Le Blouch, Nicolas Gengembre, Damien Lolive:
An extension of disentanglement metrics and its application to voice. 2878-2882 - Badr M. Abdullah, Mohammed Maqsood Shaik, Bernd Möbius, Dietrich Klakow:
An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech. 2883-2887 - Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka, Yusuke Ijima, Taichi Asami, Marc Delcroix, Yukinori Honma:
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? 2888-2892 - Akira Sasou, Yang Chen:
Comparison of GIF- and SSL-based Features in Pathological-voice Detection. 2893-2897 - Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions. 2898-2902
End-to-end ASR
- Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando:
End-to-End Joint Target and Non-Target Speakers ASR. 2903-2907 - Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma:
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition. 2908-2912 - Naoki Makishima, Keita Suzuki, Satoshi Suzuki, Atsushi Ando, Ryo Masumura:
Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction. 2913-2917 - Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng:
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition. 2918-2922 - Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei:
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition. 2923-2927 - Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg:
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator. 2928-2932
Spoken Language Understanding, Summarization, and Information Retrieval
- He Huang, Jagadeesh Balam, Boris Ginsburg:
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling. 2933-2937 - Heerin Yang, Seung-won Hwang, Jungmin So:
Relation-based Counterfactual Data Augmentation and Contrastive Learning for Robustifying Natural Language Inference Models. 2938-2942 - Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Takatomo Kano, Atsunori Ogawa, Marc Delcroix:
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization. 2943-2947 - Soham Deshmukh, Benjamin Elizalde, Huaming Wang:
Audio Retrieval with WavText5K and CLAP Training. 2948-2952 - Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti:
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding. 2953-2957 - Jen-Tzung Chien, Shang-En Li:
Contrastive Disentangled Learning for Memory-Augmented Transformer. 2958-2962
Invariant and Robust Pre-trained Acoustic Models
- Maureen de Seyssel, Marvin Lavechin, Hadrien Titeux, Arthur Thomas, Gwendal Virlet, Andrea Santos Revilla, Guillaume Wisniewski, Bogdan Ludusan, Emmanuel Dupoux:
ProsAudit, a prosodic benchmark for self-supervised speech models. 2963-2967 - Oli Danyi Liu, Hao Tang, Sharon Goldwater:
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces. 2968-2972 - Mark Hallap, Emmanuel Dupoux, Ewan Dunbar:
Evaluating context-invariance in unsupervised speech representations. 2973-2977 - Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li:
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning. 2978-2982 - Heng-Jui Chang, Alexander H. Liu, James R. Glass:
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering. 2983-2987 - Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li:
Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder. 2988-2992
Pathological Speech Analysis 2
- Mary Paterson, James Moor, Luisa Cutillo:
A Pipeline to Evaluate the Effects of Noise on Machine Learning Detection of Laryngeal Cancer. 2993-2997 - Pingyue Zhang, Mengyue Wu, Kai Yu:
ReCLR: Reference-Enhanced Contrastive Learning of Audio Representation for Depression Detection. 2998-3002 - José Vicente Egas López, Veronika Svindt, Judit Bóna, Ildikó Hoffmann, Gábor Gosztolya:
Automated Multiple Sclerosis Screening Based on Encoded Speech Representations. 3003-3007 - Thomas Melistas, Lefteris Kapelonis, Nikolaos Antoniou, Petros Mitseas, Dimitris Sgouropoulos, Theodoros Giannakopoulos, Athanasios Katsamanis, Shrikanth Narayanan:
Cross-Lingual Features for Alzheimer's Dementia Detection from Speech. 3008-3012 - Mario Zusag, Laurin Wagner, Theresa Bloder:
Careful Whisper - leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification. 3013-3017 - Jenthe Thienpondt, Caroline M. Speksnijder, Kris Demuynck:
Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer. 3018-3022
Speech Synthesis: Representation Learning
- Hyungchan Yoon, Seyun Um, Changhwan Kim, Hong-Goo Kang:
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech. 3023-3027 - Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg:
Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers. 3028-3032 - Ramanan Sivaguru, Vasista Sai Lodagala, Srinivasan Umesh:
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis. 3033-3037 - Heeseung Kim, Sungwon Kim, Jiheum Yeom, Sungroh Yoon:
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data. 3038-3042 - Dinh Son Dang, Tung Lam Nguyen, Bao Thang Ta, Tien Thanh Nguyen, Thi Ngoc Anh Nguyen, Dang Linh Le, Nhat Minh Le, Van Hai Do:
LightVoc: An Upsampling-Free GAN Vocoder Based On Conformer And Inverse Short-time Fourier Transform. 3043-3047 - Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari:
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings. 3048-3052
Speech Perception, Production, and Acquisition 1
- Jian Gao, Hanbo Sun, Cheng Cao, Zheng Du:
Human Transcription Quality Improvement. 3053-3057 - Olympia Simantiraki, Yannis Pantazis, Martin Cooke:
The effect of masking noise on listeners' spectral tilt preferences. 3058-3062 - Anaïs Tran Ngoc, Fanny Meunier, Julien Meyer:
The Effect of Whistled Vowels on Whistled Word Categorization for Naive Listeners. 3063-3067 - Puja Bharati, Sabyasachi Chandra, Shyamal Kumar Das Mandal:
Automatic Deep Neural Network-Based Segmental Pronunciation Error Detection of L2 English Speech (L1 Bengali). 3068-3072 - Lixia Hao, Qi Gong, Jinsong Zhang:
The effect of stress on Mandarin tonal perception in continuous speech for Spanish-speaking learners. 3073-3077 - Amélie Elmerich, Jiayin Gao, Angélique Amelot, Lise Crevier-Buchman, Shinji Maeda:
Combining acoustic and aerodynamic data collection: A perceptual evaluation of acoustic distortions. 3078-3082 - Benjamin Elie, Alice Turk:
Estimating virtual targets for lingual stop consonants using general Tau theory. 3083-3087 - Mark Gibson:
Using Random Forests to classify language as a function of syllable timing in two groups: children with cochlear implants and with normal hearing. 3088-3092 - Sheng Yang, Zheng Gong, Jia Kang:
An Improved End-to-End Audio-Visual Speech Recognition Model. 3093-3097 - Sarah Wesolek, Piotr Gulgowski, Joanna Blaszczak, Marzena Zygis:
What influences the foreign accent strength? Phonological and grammatical errors in the perception of accentedness. 3098-3102 - Lena-Marie Huttner, Noël Nguyen, Martin J. Pickering:
Investigating the Perception Production Link through Perceptual Adaptation and Phonetic Convergence. 3103-3107 - Xingfa Zhou, Min Li, Lan Yang, Rui Sun, Xin Wang, Huayi Zhan:
Emotion Prompting for Speech Emotion Recognition. 3108-3112 - Jessica L. L. Chin, Elena Talevska, Mark Antoniou:
Speech-in-Speech Recognition is Modulated by Familiarity to Dialect. 3113-3116 - Jie Zhang, Qing-Tian Xu, Qiu-Shi Zhu, Zhen-Hua Ling:
BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions. 3117-3121 - Zuzanna Miodonska, Claartje Levelt, Natalia Mocko, Michal Krecichwost, Agata Sage, Pawel Badura:
Are retroflex-to-dental sibilant substitutions in Polish children's speech an example of a covert contrast? A preliminary acoustic study. 3122-3126
Speaker and Language Identification 2
- Bei Liu, Yanmin Qian:
Reversible Neural Networks for Memory-Efficient Speaker Verification. 3127-3131 - Bei Liu, Yanmin Qian:
ECAPA++: Fine-grained Deep Embedding Learning for TDNN Based Speaker Verification. 3132-3136 - Chenglong Wang, Jiangyan Yi, Jianhua Tao, Chu Yuan Zhang, Shuai Zhang, Ruibo Fu, Xun Chen:
TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection. 3137-3141 - Chu-Xiao Zuo, Jia-Yi Leng, Wu-Jun Li:
Fooling Speaker Identification Systems with Adversarial Background Music. 3142-3146 - Jianchen Li, Jiqing Han, Shiwen Deng, Tieran Zheng, Yongjun He, Guibin Zheng:
Mutual Information-based Embedding Decoupling for Generalizable Speaker Verification. 3147-3151 - Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li:
Target Active Speaker Detection with Audio-visual Cues. 3152-3156 - Samuel J. Broughton, Lahiru Samarakoon:
Improving End-to-End Neural Diarization Using Conversational Summary Representations. 3157-3161 - Yongyi Zang, You Zhang, Zhiyao Duan:
Phase perturbation improves channel robustness for speech spoofing countermeasures. 3162-3166 - Pierre-Michel Bousquet, Mickael Rouvier:
Improving training datasets for resource-constrained speaker recognition neural networks. 3167-3171 - Thanathai Lertpetchpun, Ekapol Chuangsuwanich:
Instance-based Temporal Normalization for Speaker Verification. 3172-3176 - Sergey Novoselov, Galina Lavrentyeva, Anastasia Avdeeva, Vladimir Volokhov, Nikita Khmelev, Artem Akulov, Polina Leonteva:
On the robustness of wav2vec 2.0 based speaker recognition systems. 3177-3181 - Xiyuan Wang, Fangyuan Wang, Bo Xu, Liang Xu, Jing Xiao:
P-vectors: A Parallel-coupled TDNN/Transformer Network for Speaker Verification. 3182-3186 - Zhenchun Lei, Yan Wen, Yingen Yang, Changhong Liu, Minglei Ma:
Group GMM-ResNet for Detection of Synthetic Speech Attacks. 3187-3191 - Zhihua Fang, Liang He, Hanhan Ma, Xiaochen Guo, Lin Li:
Robust Training for Speaker Verification against Noisy Labels. 3192-3196 - Ye-Rin Jeoung, Jeong-Hwan Choi, Ju-Seok Seong, Jehyun Kyung, Joon-Hyuk Chang:
Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization. 3197-3201 - Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian:
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022. 3202-3206 - Imen Ben Amor, Jean-François Bonastre, Benjamin O'Brien, Pierre-Michel Bousquet:
Describing the phonetics in the underlying speech attributes for deep and interpretable speaker recognition. 3207-3211 - Lin Zhang, Xin Wang, Erica Cooper, Nicholas W. D. Evans, Junichi Yamagishi:
Range-Based Equal Error Rate for Spoof Localization. 3212-3216 - Nowshin Tabassum, Tasfia Tabassum, Fardin Saad, Tahiya Sultana Safa, Hasan Mahmud, Md. Kamrul Hasan:
Exploring the English Accent-independent Features for Speech Emotion Recognition using Filter and Wrapper-based Methods for Feature Selection. 3217-3221 - Alexis Plaquet, Hervé Bredin:
Powerset multi-class cross entropy loss for neural speaker diarization. 3222-3226 - Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu:
A Method of Audio-Visual Person Verification by Mining Connections between Time Series. 3227-3231
Speech Recognition: Architecture, Search, and Linguistic Components 3
- Edward Fish, Umberto Michieli, Mete Ozay:
A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization. 3232-3236 - Zhe Liu, Fuchun Peng:
Modeling Dependent Structure for Utterances in ASR Evaluation. 3237-3241 - Tushar Verma, Atul Shree, Ashutosh Modi:
ASR for Low Resource and Multilingual Noisy Code-Mixed Speech. 3242-3246 - Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan:
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System. 3247-3251 - Lukás Mateju, Jan Nouza, Petr Cerva, Jindrich Zdánský, Frantisek Kynych:
Combining Multilingual Resources and Models to Develop State-of-the-Art E2E ASR for Swedish. 3252-3256 - Zhanheng Yang, Sining Sun, Xiong Wang, Yike Zhang, Long Ma, Lei Xie:
Two Stage Contextual Word Filtering for Context Bias in Unified Streaming and Non-streaming Transducer. 3257-3261 - Quan Ngoc Pham, Jan Niehues, Alex Waibel:
Towards continually learning new languages. 3262-3266 - Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian:
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space. 3267-3271 - Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng:
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge. 3272-3276 - Haris Gulzar, Monikka Roslianna Busto, Takeharu Eda, Katsutoshi Itoyama, Kazuhiro Nakadai:
miniStreamer: Enhancing Small Conformer with Chunked-Context Masking for Streaming ASR Applications on the Edge. 3277-3281 - Wei Liu, Zhiyuan Peng, Tan Lee:
CoMFLP: Correlation Measure Based Fast Search on ASR Layer Pruning. 3282-3286 - Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe:
Exploration on HuBERT with Multiple Resolution. 3287-3291 - Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang:
Quantization-aware and Tensor-compressed Training of Transformers for Natural Language Understanding. 3292-3296 - Burin Naowarat, Thananchai Kongthaworn, Ekapol Chuangsuwanich:
Word-level Confidence Estimation for CTC Models. 3297-3301 - Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati:
Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages. 3302-3306 - Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen:
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition. 3307-3311 - Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe:
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders. 3312-3316 - Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao:
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition. 3317-3321 - Zhiyun Fan, Linhao Dong, Chen Shen, Zhenlin Liang, Jun Zhang, Lu Lu, Zejun Ma:
Language-specific Boundary Learning for Improving Mandarin-English Code-switching Speech Recognition. 3322-3326 - Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Françoise Beaufays:
Mixture-of-Expert Conformer for Streaming Multilingual ASR. 3327-3331 - Zhaoqing Li, Tianzi Wang, Jiajun Deng, Junhao Xu, Shoukang Hu, Xunying Liu:
Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus. 3332-3336 - Yuping Yuan, Zhao You, Shulin Feng, Dan Su, Yanchun Liang, Xiaohu Shi, Dong Yu:
Compressed MoE ASR Model Based on Knowledge Distillation and Quantization. 3337-3341
Acoustic Model Adaptation for ASR
- Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu:
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems. 3342-3346 - Wei Wang, Xun Gong, Hang Shao, Dongning Yang, Yanmin Qian:
Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition. 3347-3351 - Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung:
Cross-Lingual Cross-Age Adaptation for Low-Resource Elderly Speech Emotion Recognition. 3352-3356 - Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro Moreno Mengibar:
Modular Domain Adaptation for Conformer-Based Streaming ASR. 3357-3361 - Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff:
Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters. 3362-3366 - Changhun Kim, Joonhyung Park, Hajin Shim, Eunho Yang:
SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization. 3367-3371
Speech Synthesis: Expressivity
- Hiroki Mori, Shunya Kimura:
A Generative Framework for Conversational Laughter: Its 'Language Model' and Laughter Sound Synthesis. 3372-3376 - Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng:
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis. 3377-3381 - Harm Lameris, Joakim Gustafson, Éva Székely:
Beyond Style: Synthesizing Speech with Pragmatic Functions. 3382-3386 - Ammar Abbas, Sri Karlapati, Bastian Schnell, Penny Karanasou, Marcel Granero Moya, Amith Nagaraj, Ayman Boustati, Nicole Peinelt, Alexis Moinet, Thomas Drugman:
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer. 3387-3391
Multi-modal Systems
- Ahana Deb, Sayan Nag, Ayan Mahapatra, Soumitri Chattopadhyay, Aritra Marik, Pijush Kanti Gayen, Shankha Sanyal, Archi Banerjee, Samir Karmakar:
BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion. 3392-3396 - Sara Kashiwagi, Keitaro Tanaka, Qi Feng, Shigeo Morishima:
Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning. 3397-3401 - Agata Jakubiak:
Whistle-to-text: Automatic recognition of the Silbo Gomero whistled language. 3402-3406 - Lufei Gao, Shan Huang, Li Liu:
A Novel Interpretable and Generalizable Re-synchronization Model for Cued Speech based on a Multi-Cuer Corpus. 3407-3411 - Leanne Nortje, Benjamin van Niekerk, Herman Kamper:
Visually grounded few-shot word acquisition with fewer shots. 3412-3416 - Li Zhou, Zhenyu Liu, Zixuan Shangguan, Xiaoyan Yuan, Yutong Li, Bin Hu:
JAMFN: Joint Attention Multi-Scale Fusion Network for Depression Detection. 3417-3421
Question Answering from Speech
- Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao, Ning Cheng, Fengying Yu, Jing Xiao:
Prompt Guided Copy Mechanism for Conversational Question Answering. 3422-3426 - Pedro Faustini, Besnik Fetahu, Giuseppe Castellucci, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi:
Composing Spoken Hints for Follow-on Question Suggestion in Voice Assistants. 3427-3431 - Sang-eun Han, Yeonseok Jeong, Seung-won Hwang, Kyungjae Lee:
On Monotonic Aggregation for Open-domain QA. 3432-3436 - Minh Van Nguyen, Kishan KC, Toan Nguyen, Thien Huu Nguyen, Ankit Chadha, Thuy Vu:
Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection. 3437-3441 - Guangyao Li, Yixin Xu, Di Hu:
Multi-Scale Attention for Audio Question Answering. 3442-3446 - Feilong Chen, Minglun Han, Jing Shi, Shuang Xu, Bo Xu:
Enhancing Visual Question Answering via Deconstructing Questions and Explicating Answers. 3447-3451
Multi-talker Methods in Speech Processing
- Bang Zeng, Hongbin Suo, Yulong Wan, Ming Li:
SEF-Net: Speaker Embedding Free Target Speaker Extraction Network. 3452-3456 - Richard Rose, Oscar Chang, Olivier Siohan:
Cascaded encoders for fine-tuning ASR models on overlapped speech. 3457-3461 - Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey:
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition. 3462-3466 - Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng:
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator. 3467-3471 - Vahid Ahmadi Kalkhorani, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang:
Time-domain Transformer-based Audiovisual Speaker Separation. 3472-3476 - Marc Delcroix, Naohiro Tawara, Mireia Díez, Federico Landini, Anna Silnova, Atsunori Ogawa, Tomohiro Nakatani, Lukás Burget, Shoko Araki:
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. 3477-3481 - Shutong Niu, Jun Du, Maokui He, Chin-Hui Lee, Baoxiang Li, Jiakui Li:
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization. 3482-3486 - Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie:
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR. 3487-3491 - Chenyang Gao, Yue Gu, Ivan Marsic:
Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation. 3492-3496 - Clément Gaultier, Tobias Goehring:
Joint compensation of multi-talker noise and reverberation for speech enhancement with cochlear implants using one or more microphones. 3497-3501 - Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen, Xiaofei Wang, Takuya Yoshioka:
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach. 3502-3506 - Desh Raj, Daniel Povey, Sanjeev Khudanpur:
GPU-accelerated Guided Source Separation for Meeting Transcription. 3507-3511 - Linfeng Yu, Wangyou Zhang, Chenda Li, Yanmin Qian:
Overlap Aware Continuous Speech Separation without Permutation Invariant Training. 3512-3516 - Wangyou Zhang, Yanmin Qian:
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition. 3517-3521 - Ju Lin, Niko Moritz, Ruiming Xie, Kaustubh Kalgaonkar, Christian Fuegen, Frank Seide:
Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression. 3522-3526 - Simon Berger, Peter Vieting, Christoph Böddeker, Ralf Schlüter, Reinhold Haeb-Umbach:
Mixture Encoder for Joint Speech Separation and Recognition. 3527-3531
Sociophonetics
- Mísa Hejná, Adèle Jatteau:
Aberystwyth English Pre-aspiration in Apparent Time. 3532-3536 - Yanting Sun, Hongwei Ding:
Speech Entrainment in Chinese Story-Style Talk Shows: The Interaction Between Gender and Role. 3537-3541 - Carina Steiner, Dieter Studer-Joho, Corinne Lanthemann, Andrin Büchler, Adrian Leemann:
Sociodemographic and Attitudinal Effects on Dialect Speakers' Articulation of the Standard Language: Evidence from German-Speaking Switzerland. 3542-3546 - James Burridge:
Vowel Normalisation in Latent Space for Sociolinguistics. 3547-3551
Speaker and Language Diarization
- Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian:
Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor. 3552-3556 - Rimita Lahiri, Tiantian Feng, Rajat Hebbar, Catherine Lord, So Hyun Kim, Shrikanth Narayanan:
Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism. 3557-3561 - Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy:
The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments. 3562-3566 - Rohit Paturi, Sundararajan Srinivasan, Xiang Li:
Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction. 3567-3571 - Gabriel Pirlogeanu, Dan Oneata, Alexandru-Lucian Georgescu, Horia Cucu:
The SpeeD-ZevoTech submission at DISPLACE 2023. 3572-3576 - Chao Wang, Jie Li, Xiang Fang, Jian Kang, Yongxiang Li:
End-to-End Neural Speaker Diarization with Absolute Speaker Loss. 3577-3581
Speech Emotion Recognition 2
- Ya-Tse Wu, Yuan-Ting Chang, Shao-Hao Lu, Jing-Yi Chuang, Chi-Chun Lee:
A Context-Constrained Sentence Modeling for Deception Detection in Real Interrogation. 3582-3586 - Ya-Tse Wu, Chi-Chun Lee:
MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer. 3587-3591 - Bogdan Ludusan, Marin Schröer, Martina Rossi, Petra Wagner:
The co-use of laughter and head gestures across speech styles. 3592-3596 - Haiyang Sun, Zheng Lian, Bin Liu, Ying Li, Jianhua Tao, Licai Sun, Cong Cai, Meng Wang, Yuan Cheng:
EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition. 3597-3601 - Maximillian Chen, Zhou Yu:
Pre-Finetuning for Few-Shot Emotional Speech Recognition. 3602-3606 - Wen Wu, Chao Zhang, Philip C. Woodland:
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations. 3607-3611 - Chandrashekhar Lavania, Sanjiv Das, Xin Huang, Kyu J. Han:
Utility-Preserving Privacy-Enabled Speech Embeddings for Emotion Detection. 3612-3616 - Sergio Burdisso, Esaú Villatoro-Tello, Srikanth R. Madikeri, Petr Motlícek:
Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews. 3617-3621 - Catarina Branco, Isabel Trancoso, Paulo Infante, Khiet P. Truong:
Laughter in task-based settings: whom we talk to affects how, when, and how often we laugh. 3622-3626 - Yuanbo Fang, Xiaofen Xing, Xiangmin Xu, Weibin Zhang:
Exploring Downstream Transfer of Self-Supervised Features for Speech Emotion Recognition. 3627-3631 - Danilo de Oliveira, Navin Raj Prabhu, Timo Gerkmann:
Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models. 3632-3636 - Yuan Gao, Chenhui Chu, Tatsuya Kawahara:
Two-stage Finetuning of Wav2vec 2.0 for Speech Emotion Recognition with ASR and Gender Pretraining. 3637-3641 - Yash Thakran, Vinayak Abrol:
Investigating Acoustic Cues for Multilingual Abuse Detection. 3642-3646 - Premjeet Singh, Goutam Saha:
A novel frequency warping scale for speech emotion recognition. 3647-3651 - Zhipeng Li, Xiaofen Xing, Yuanbo Fang, Weibin Zhang, Hengsheng Fan, Xiangmin Xu:
Multi-Scale Temporal Transformer For Speech Emotion Recognition. 3652-3656 - Nicolás Grágeda, Eduardo Alvarado, Rodrigo Mahú, Carlos Busso, Néstor Becerra Yoma:
Distant Speech Emotion Recognition in an Indoor Human-robot Interaction Scenario. 3657-3661 - Dehua Tao, Tan Lee, Harold Chui, Sarah Luk:
A Study on Prosodic Entrainment in Relation to Therapist Empathy in Counseling Conversation. 3662-3666
Show and Tell: Language learning and educational resources
- Nobuaki Minematsu, Noriko Nakanishi, Yingxiang Gao, Haitong Sun:
A Unified Framework to Improve Learners' Skills of Perception and Production Based on Speech Shadowing and Overlapping. 3667-3668 - Diane Nicholls, Kate M. Knill, Mark J. F. Gales, Anton Ragni, Paul Ricketts:
Speak & Improve: L2 English Speaking Practice Tool. 3669-3670 - Mauro Nicolao, Brenda McGuirk, Declan Moore, Niall Mullally, Lora Lynn O'Mahony, Emma O'Neill, Amelia C. Kelly:
Measuring prosody in child speech using SoapBox Fluency API. 3671-3672 - Shawn L. Nissen:
Teaching Non-native Sound Contrasts using Visual Biofeedback. 3673-3674 - Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer:
Large-Scale Automatic Audiobook Creation. 3675-3676 - Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury, Hamdy Mubarak, Shazia Afzal, Ahmed M. Ali:
QVoice: Arabic Speech Pronunciation Learning Application. 3677-3678 - Jan Svec, Martin Bulín, Adam Frémund, Filip Polák:
Asking Questions: an Innovative Way to Interact with Oral History Archives. 3679-3680 - Vineet Bhat, Preethi Jyothi, Pushpak Bhattacharyya:
DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction. 3681-3682 - Anusha Prakash, Arun Kumar A, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K. V. Vikram, Mano Ranjith Kumar M., Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda N. Sukhadia, Dipti Misra Sharma, Hema A. Murthy, Pushpak Bhattacharyya, Srinivasan Umesh, Rajeev Sangal:
Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages. 3683-3684 - Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury, Ahmed M. Ali:
MyVoice: Arabic Speech Resource Collaboration Platform. 3685-3686 - Daniel Devatman Hromada, Hyungjoong Kim:
Personal Primer Prototype 1: Invitation to Make Your Own Embooked Speech-Based Educational Artifact. 3687-3688
Analysis of Speech and Audio Signals 3
- Zhewen Deng, Yi Zhou, Hongqing Liu:
Time-frequency Domain Filter-and-sum Network for Multi-channel Speech Separation. 3689-3693 - Debang Liu, Tianqi Zhang, Mads Græsbøll Christensen, Ying Wei, Zeliang An:
Audio-Visual Fusion using Multiscale Temporal Convolutional Attention for Time-Domain Speech Separation. 3694-3698 - Junyu Wang:
An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention. 3699-3703 - Waradon Phokhinanan, Nicolas Obin, Sylvain Argentieri:
Binaural Sound Localization in Noisy Environments Using Frequency-Based Audio Vision Transformer (FAViT). 3704-3708 - Jihyun Kim, Hong-Goo Kang:
Contrastive Learning based Deep Latent Masking for Music Source Separation. 3709-3713 - Ke Zhang, Marvin Borsdorf, Zexu Pan, Haizhou Li, Yangjie Wei, Yi Wang:
Speaker Extraction with Detection of Presence and Absence of Target Speakers. 3714-3718 - Qinghua Liu, Meng Ge, Zhizheng Wu, Haizhou Li:
PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network. 3719-3723 - Miguel Sarabia, Elena Menyaylenko, Alessandro Toso, Skyler Seto, Zakaria Aldeneh, Shadi Pirhosseinloo, Luca Zappella, Barry-John Theobald, Nicholas Apostoloff, Jonathan Sheaffer:
Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning. 3724-3728 - Chenxing Li, Ye Bai, Yang Wang, Feng Deng, Yuanyuan Zhao, Zhuo Zhang, Xiaorui Wang:
Image-driven Audio-visual Universal Source Separation. 3729-3733 - Mieszko Fras, Marcin Witkowski, Konrad Kowalczyk:
Joint Blind Source Separation and Dereverberation for Automatic Speech Recognition using Delayed-Subsource MNMF with Localization Prior. 3734-3738 - Honglong Wang, Chengyun Deng, Yanjie Fu, Meng Ge, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Fei Wang:
SDNet: Stream-attention and Dual-feature Learning Network for Ad-hoc Array Speech Separation. 3739-3743 - Min-Sang Baek, Joon-Young Yang, Joon-Hyuk Chang:
Deeply Supervised Curriculum Learning for Deep Neural Network-based Sound Source Localization. 3744-3748 - Takuya Fujimura, Robin Scheibler:
Multi-channel separation of dynamic speech and sound events. 3749-3753 - Junjie Li, Meng Ge, Zexu Pan, Rui Cao, Longbiao Wang, Jianwu Dang, Shiliang Zhang:
Rethinking the Visual Cues in Audio-Visual Speaker Extraction. 3754-3758 - Shaoxiang Dang, Tetsuya Matsumoto, Yoshinori Takeuchi, Hiroaki Kudo:
Using Semi-supervised Learning for Monaural Time-domain Speech Separation with a Self-supervised Learning-based SI-SNR Estimator. 3759-3763 - Younggwan Kim, Hyungjun Lim, Kiho Yeom, Eunjoo Seo, Hoodong Lee, Stanley Jungkyu Choi, Honglak Lee:
Investigation of Training Mute-Expressive End-to-End Speech Separation Networks for an Unknown Number of Speakers. 3764-3768 - Jae-Heung Cho, Joon-Hyuk Chang:
SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking. 3769-3773 - Zhao Yang, Dianwen Ng, Xizhe Li, Chong Zhang, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Jizhong Zhao, Bin Ma, Eng Siong Chng:
Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement. 3774-3778 - Yabo Wang, Bing Yang, Xiaofei Li:
FN-SSL: Full-Band and Narrow-Band Fusion for Sound Source Localization. 3779-3783 - Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng:
A Neural State-Space Modeling Approach to Efficient Speech Separation. 3784-3788 - Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang:
Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation. 3789-3793 - Xue Yang, Changchun Bao, Xu Zhang, Xianhong Chen:
Monaural Speech Separation Method Based on Recurrent Attention with Parallel Branches. 3794-3798 - Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley:
Ontology-aware Learning and Evaluation for Audio Tagging. 3799-3803
Speech Coding and Enhancement 3
- Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen:
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing. 3804-3808 - Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann:
Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement. 3809-3813 - Nicolas M. Müller, Philip Sperl, Konstantin Böttinger:
Complex-valued neural networks for voice anti-spoofing. 3814-3818 - Nicolae-Catalin Ristea, Evgenii Indenbom, Ando Saabas, Tanel Pärnamaa, Jegor Guzvin, Ross Cutler:
DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation. 3819-3823 - Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji:
Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement. 3824-3828 - Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang:
HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders. 3829-3833 - Ye-Xin Lu, Yang Ai, Zhen-Hua Ling:
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra. 3834-3838 - Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo:
TridentSE: Guiding Speech Enhancement with 32 Global Tokens. 3839-3843 - Chenglong Wang, Jiangyan Yi, Jianhua Tao, Chu Yuan Zhang, Shuai Zhang, Xun Chen:
Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features. 3844-3848 - Sandipana Dowerah, Ajinkya Kulkarni, Romain Serizel, Denis Jouvet:
Self-supervised learning with Diffusion-based multichannel speech enhancement for speaker verification under noisy conditions. 3849-3853 - Francesco Nespoli, Daniel Barreda, Jörg Bitzer, Patrick A. Naylor:
Two-Stage Voice Anonymization for Enhanced Privacy. 3854-3858 - Ruilin Xu, Gurunandan Krishnan, Changxi Zheng, Shree K. Nayar:
Personalized Dereverberation of Speech. 3859-3863 - Nguyen Binh Thien, Yukoh Wakabayashi, Yuting Geng, Kenta Iwai, Takanobu Nishiura:
Weighted Von Mises Distribution-based Loss Function for Real-time STFT Phase Reconstruction Using DNN. 3864-3868 - Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Andreas Maier:
Deep Multi-Frame Filtering for Hearing Aids. 3869-3873 - Yan Xiong, Visar Berisha, Chaitali Chakrabarti:
Aligning Speech Enhancement for Improving Downstream Classification Performance. 3874-3878 - Minseung Kim, Sein Cheong, Jong Won Shin:
DNN-based Parameter Estimation for MVDR Beamforming and Post-filtering. 3879-3883 - Yi Luo, Jianwei Yu:
FRA-RIR: Fast Random Approximation of the Image-source Method. 3884-3888 - Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong:
Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement. 3889-3893 - Xiaohuai Le, Tong Lei, Li Chen, Yiqing Guo, Chao He, Cheng Chen, Xianjun Xia, Hua Gao, Yijian Xiao, Piao Ding, Shenyi Song, Jing Lu:
Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model. 3894-3898
Spoken Language Translation, Information Retrieval, Summarization, Resources, and Evaluation 3
- Zhihong Huang, Longyue Wang, Siyou Liu, Derek F. Wong:
How Does Pretraining Improve Discourse-Aware Translation? 3899-3903 - Ziji Zhang, Zhehui Wang, Rajesh Kamma, Sharanya Eswaran, Narayanan Sadagopan:
PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction. 3904-3908 - Shu-Chuan Tseng, Yi-Fen Liu, Xiang-Li Lu:
Model-assisted Lexical Tone Evaluation of three-year-old Chinese-speaking Children by also Considering Segment Production. 3909-3913 - Yi Xuan Tan, Navonil Majumder, Soujanya Poria:
Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding. 3914-3918 - Qiang Li, Beibei Hu:
Joint Time and Frequency Transformer for Chinese Opera Classification. 3919-3923 - Myunghun Jung, Hoirin Kim:
AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination. 3924-3928 - Mohammad Arvan, A. Seza Dogruöz, Natalie Parde:
Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective. 3929-3933 - Astik Biswas, Abdelmoumene Boumadane, Stéphane Peillon, Gildas Bleas:
An Efficient Approach for the Automated Segmentation and Transcription of the People's Speech Corpus. 3939-3943 - Shi-wook Lee:
Diverse Feature Mapping and Fusion via Multitask Learning for Multilingual Speech Emotion Recognition. 3944-3948 - Parnia Bahar, Mattia Di Gangi, Nick Rossenbach, Mohammad Zeineldeen:
Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text. 3949-3953 - Pin-Jie Lin, Muhammed Saeed, Ernie Chang, Merel C. J. Scholman:
Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin. 3954-3958 - Eesung Kim, Aditya Jajodia, Cindy Tseng, Divya Neelagiri, Taeyeon Ki, Vijendra Raj Apsingekar:
Efficient Adaptation of Spoken Language Understanding based on End-to-End Automatic Speech Recognition. 3959-3963 - Yong-Hyeok Lee, Namhyun Cho:
PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords. 3964-3968 - Zhihong Zhu, Xuxin Cheng, Dongsheng Chen, Zhiqi Huang, Hongxiang Li, Yuexian Zou:
Mix before Align: Towards Zero-shot Cross-lingual Sentiment Analysis via Soft-Mix and Multi-View Learning. 3969-3973 - Sara Papi, Marco Turchi, Matteo Negri:
AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation. 3974-3978 - Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondrej Bojar:
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. 3979-3983 - Claytone Sikasote, Kalinda Siaminwe, Stanly Mwape, Bangiwe Zulu, Mofya Phiri, Martin Phiri, David Zulu, Mayumbo Nyirenda, Antonios Anastasopoulos:
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages. 3984-3988
Anti-Spoofing for Speaker Verification
- Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md. Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas W. D. Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung:
Towards Single Integrated Spoofing-aware Speaker Verification Embeddings. 3989-3993 - Qing Wang, Jixun Yao, Ziqian Wang, Pengcheng Guo, Lei Xie:
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification. 3994-3998 - Rui Liu, Jinhua Zhang, Guanglai Gao, Haizhou Li:
Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion. 3999-4003 - Xingming Wang, Bang Zeng, Hongbin Suo, Yulong Wan, Ming Li:
Robust Audio Anti-spoofing Countermeasure with Joint Training of Front-end and Back-end Models. 4004-4008 - Piotr Kawa, Marcin Plata, Michal Czuba, Piotr Szymanski, Piotr Syga:
Improved DeepFake Detection Using Whisper Features. 4009-4013 - Mengao Zhang, Ke Xu, Hao Li, Lei Wang, Chengfang Fang, Jie Shi:
DoubleDeceiver: Deceiving the Speaker Verification System Protected by Spoofing Countermeasures. 4014-4018
Speech Coding: Intelligibility
- Sankaran Panchapagesan, Turaj Zakizadeh Shabestary, Arun Narayanan:
On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR. 4019-4023 - Jean-Marie Lemercier, Julian Tobergte, Timo Gerkmann:
Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation. 4024-4028 - Wenbin Jiang, Fei Wen, Yifan Zhang, Kai Yu:
UnSE: Unsupervised Speech Enhancement Using Optimal Transport. 4029-4033 - Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu:
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation. 4034-4038 - Julitta Bartolewska, Stanislaw Kacprzak, Konrad Kowalczyk:
Causal Signal-Based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement. 4039-4043 - Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu:
Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction. 4044-4048
Resources for Spoken Language Processing
- Elena Ryumina, Dmitry Ryumin, Maxim Markitantov, Heysem Kaya, Alexey Karpov:
Multimodal Personality Traits Assessment (MuPTA) Corpus: The Impact of Spontaneous and Read Speech. 4049-4053 - Mikolaj Pudo, Mateusz Wosik, Adam Cieslak, Justyna Krzywdziak, Bozena Lukasiak, Artur Janicki:
MOCKS 1.0: Multilingual Open Custom Keyword Spotting Testset. 4054-4058 - Jacob Eisenstein, Vinodkumar Prabhakaran, Clara Rivera, Dorottya Demszky, Devyani Sharma:
MD3: The Multi-Dialect Dataset of Dialogues. 4059-4063 - Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang:
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. 4064-4068 - Artit Suwanbandit, Burin Naowarat, Orathai Sangpetch, Ekapol Chuangsuwanich:
Thai Dialect Corpus and Transfer-based Curriculum Learning Investigation for Dialect Automatic Speech Recognition. 4069-4073 - Cihan Xiao, Henry Li Xinyuan, Jinyi Yang, Dongji Gao, Matthew Wiesner, Kevin Duh, Sanjeev Khudanpur:
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation. 4074-4078
New Computational Strategies for ASR Training and Inference
- Dhanush Bekal, Karthik Gopalakrishnan, Karel Mundnich, Srikanth Ronanki, Sravan Bodapati, Katrin Kirchhoff:
A Metric-Driven Approach to Conformer Layer Pruning for Efficient ASR Inference. 4079-4083 - Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko:
Distillation Strategies for Discriminative Speech Recognition Rescoring. 4084-4088 - Baptiste Pouthier, Laurent Pilati, Giacomo Valenti, Charles Bouveyron, Frédéric Precioso:
Another Point of View on Visual Speech Recognition. 4089-4093 - Wei Zhou, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney:
RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition. 4094-4098 - Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke:
Streaming Speech-to-Confusion Network Speech Recognition. 4099-4103 - Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu:
Accurate and Structured Pruning for Efficient Automatic Speech Recognition. 4104-4108
MERLIon CCS Challenge: Multilingual Everyday Recordings - Language Identification On Code-Switched Child-Directed Speech
- Yi Han Victoria Chua, Hexin Liu, Leibny Paola García, Fei Ting Woon, Jinyi Wong, Xiangyu Zhang, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles:
MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization. 4109-4113 - Shashi Kant Gupta, Sushant Hiray, Prashant Kukde:
Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech. 4114-4118 - Mostafa Shahin, Zheng Nan, Vidhyasaharan Sethu, Beena Ahmed:
Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features. 4119-4123 - Kiran Praveen, Balaji Radhakrishnan, Kamini Sabu, Abhishek Pandey, Mahaboob Ali Basha Shaik:
Language Identification Networks for Multilingual Everyday Recordings. 4124-4128 - Suzy J. Styles, Yi Han Victoria Chua, Fei Ting Woon, Hexin Liu, Leibny Paola García, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels:
Investigating model performance in language identification: beyond simple error statistics. 4129-4133
Health-Related Speech Analysis
- Manila Kodali, Sudarsana Reddy Kadiri, Paavo Alku:
Classification of Vocal Intensity Category from Speech using the Wav2vec2 and Whisper Embeddings. 4134-4138 - Alexander Kathan, Andreas Triantafyllopoulos, Shahin Amiriparian, Sabrina Milkus, Alexander Gebhard, Jonas Hohmann, Pauline Muderlak, Jürgen Schottdorf, Björn W. Schuller, Richard Musil:
The effect of clinical intervention on the speech of individuals with PTSD: features and recognition performances. 4139-4143 - Andreas Triantafyllopoulos, Alexander Gebhard, Alexander Kathan, Maurice Gerczuk, Shahin Amiriparian, Björn W. Schuller:
Analysis and automatic prediction of exertion from speech: Contrasting objective and subjective measures collected while running. 4144-4148 - Fuxiang Tao, Anna Esposito, Alessandro Vinciarelli:
The Androids Corpus: A New Publicly Available Benchmark for Speech Based Depression Detection. 4149-4153 - Marina Eni, Ilan Dinstein, Yaniv Zigel:
Comparing Hand-Crafted Features to Spectrograms for Autism Severity Estimation. 4154-4158 - Carmen Mijnders, Esther Janse, Paul Naarding, Khiet P. Truong:
Acoustic characteristics of depression in older adults' speech: the role of covariates. 4159-4163
Automatic Audio Classification and Audio Captioning
- Jianyuan Sun, Xubo Liu, Xinhao Mei, Volkan Kiliç, Mark D. Plumbley, Wenwu Wang:
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning. 4164-4168 - Thomas Pellegrini, Ismail Khalfaoui Hassani, Etienne Labbé, Timothée Masquelier:
Adapting a ConvNeXt Model to Audio Classification on AudioSet. 4169-4173 - Yanxiong Li, Wenchang Cao, Jialong Li, Wei Xie, Qianhua He:
Few-shot Class-incremental Audio Classification Using Stochastic Classifier. 4174-4178 - Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu:
Enhance Temporal Relations in Audio Captioning with Sound Event Detection. 4179-4183
Speech Perception, Production, and Acquisition 2
- Sijia Zhang:
First Language Effects on Second Language Perception: Evidence from English Low-vowel Nasal Sequences Perceived by L1 Mandarin Chinese Listeners. 4184-4188 - Ursa Maity, Fangxu Xing, Jerry L. Prince, Maureen Stone, Georges El Fakhri, Jonghye Woo, Sidney Fels:
Motor Control Similarity Between Speakers Saying "A Souk" Using Inverse Atlas Tongue Modeling. 4189-4193 - Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien:
Assessing Phrase Break of ESL Speech with Pre-trained Language Models and Large Language Models. 4194-4198 - Tsukasa Yoshinaga, Takayuki Arai, Akiyoshi Iida:
A Relationship Between Vocal Fold Vibration and Droplet Production. 4199-4203 - Maeva Garnier:
Audio, Visual and Audiovisual intelligibility of vowels produced in noise. 4204-4208 - Benjamin Elie, Juraj Simko, Alice Turk:
Optimal control of speech with context-dependent articulatory targets. 4209-4213 - Tzu-Han Zoe Cheng, Paul Calamia:
Computational modeling of auditory brainstem responses derived from modified speech. 4214-4218 - Peiying Wang, Sunlu Zeng, Junqing Chen, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He:
Leveraging Label Information for Multimodal Emotion Recognition. 4219-4223 - Fengyun Tan, Chaofeng Feng, Tao Wei, Shuai Gong, Jinqiang Leng, Wei Chu, Jun Ma, Shaojun Wang, Jing Xiao:
Improving End-to-End Modeling For Mandarin-English Code-Switching Using Lightweight Switch-Routing Mixture-of-Experts. 4224-4228 - Zhao Zhang, Ju Zhang, Ziyu Zhu, Yujie Chi, Kiyoshi Honda, Jianguo Wei:
Frequency Patterns of Individual Speaker Characteristics at Higher and Lower Spectral Ranges. 4229-4233 - Sabine Gosselke Berthelsen:
Adaptation to predictive prosodic cues in non-native standard dialect. 4234-4238 - Alan Archer-Boyd, Rainer Martin:
Head movements in two- and four-person interactive conversational tasks in noisy and moderately reverberant conditions. 4239-4243 - Juqiang Chen, Ailing Qin, Hui Chang, Hua Chen:
Second language identification of Vietnamese tones by native Mandarin learners. 4244-4248 - Sophie Fagniart, Véronique Delvaux, Brigitte Charlier, Bernard Harmegnies, Anne Huberlant, Myriam Piccaluga, Kathy Huet:
Nasal vowel production and grammatical processing in French-speaking children with cochlear implants and normal-hearing peers. 4249-4253 - Zechen Zhang, Xihong Wu, Jing Chen:
Emotion Classification with EEG Responses Evoked by Emotional Prosody of Speech. 4254-4258 - Yanping Li, Michael D. Tyler, Denis Burnham, Catherine T. Best:
L2-Mandarin regional accent variability during Mandarin tone-word training facilitates English listeners' subsequent tone categorizations. 4259-4263 - Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Hiroshi Saruwatari:
HumanDiffusion: diffusion model using perceptual gradients. 4264-4268 - Sven Kachel, Manuel Pöhlmann, Christine Nussbaum:
Queer Events, Relationships, and Sports: Does Topic Influence Speakers' Acoustic Expression of Sexual Orientation? 4269-4273
Speech Synthesis
- Jón Guðnason, Guolin Fang, Mike Brookes:
Epoch-Based Spectrum Estimation for Speech. 4274-4278 - Shivam Mehta, Ambika Kirkland, Harm Lameris, Jonas Beskow, Éva Székely, Gustav Eje Henter:
OverFlow: Putting flows on top of neural transducers for better TTS. 4279-4283 - Ambuj Mehrish, Abhinav Ramesh Kashyap, Yingting Li, Navonil Majumder, Soujanya Poria:
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation. 4284-4288 - Won-Gook Choi, So-Jeong Kim, Tae-Ho Kim, Joon-Hyuk Chang:
Prior-free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis. 4289-4293 - Anastasiia Iashchenko, Pavel Andreev, Ivan Shchekotov, Nicholas Babaev, Dmitry P. Vetrov:
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model. 4294-4298 - Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang:
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech. 4299-4303 - Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li:
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge. 4304-4308 - Fabian Kögel, Bac Nguyen, Fabien Cardinaux:
Towards Robust FastSpeech 2 by Modelling Residual Multimodality. 4309-4313 - Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy:
Real time spectrogram inversion on mobile phone. 4314-4318 - Seongyeon Park, Bohyung Kim, Tae-Hyun Oh:
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis. 4319-4323 - Dan Wells, Korin Richmond, William Lamb:
A Low-Resource Pipeline for Text-to-Speech from Found Data With Application to Scottish Gaelic. 4324-4328 - Paul Konstantin Krug, Peter Birkholz, Branislav Gerazov, Daniel R. van Niekerk, Anqi Xu, Yi Xu:
Self-Supervised Solution to the Control Problem of Articulatory Synthesis. 4329-4333 - Joun Yeop Lee, Jae-Sung Bae, Seongkyu Mun, Jihwan Lee, Ji-Hyun Lee, Hoon-Young Cho, Chanwoo Kim:
Hierarchical Timbre-Cadence Speaker Encoder for Zero-shot Speech Synthesis. 4334-4338 - Minki Kang, Wooseok Han, Sung Ju Hwang, Eunho Yang:
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models. 4339-4343 - Muyang Du, Chuan Liu, Jiaxing Qi, Junjie Lai:
Improving WaveRNN with Heuristic Dynamic Blending for Fast and High-Quality GPU Vocoding. 4344-4348 - Jeongsoo Choi, Minsu Kim, Yong Man Ro:
Intelligible Lip-to-Speech Synthesis with Speech Units. 4349-4353 - Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien:
Parameter-Efficient Learning for Text-to-Speech Accent Adaptation. 4354-4358 - Ziya Khan, Lovisa Wihlborg, Cassia Valentini-Botinhao, Oliver Watts:
Controlling formant frequencies with neural text-to-speech for the manipulation of perceived speaker age. 4359-4363 - Won Jang, Dan Lim, Heayoung Park:
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs. 4364-4368 - Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki:
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN. 4369-4373 - Jungil Kong, Jihoon Park, Beomjeong Kim, Jeongmin Kim, Dohee Kong, Sangjin Kim:
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design. 4374-4378 - Hieu-Thi Luong, Junichi Yamagishi:
Controlling Multi-Class Human Vocalization Generation via a Simple Segment-based Labeling Scheme. 4379-4383
Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation 4
- Kaushal Santosh Bhogale, Sai Sundaresan, Abhigyan Raman, Tahir Javed, Mitesh M. Khapra, Pratyush Kumar:
Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR. 4384-4388 - Cong-Thanh Do, Rama Doddipatla, Mohan Li, Thomas Hain:
Domain Adaptive Self-supervised Training of Automatic Speech Recognition. 4389-4393 - Raphaël Olivier, Bhiksha Raj:
There is more than one kind of robustness: Fooling Whisper with adversarial examples. 4394-4398 - Calum Heggan, Timothy M. Hospedales, Sam Budgett, Mehrdad Yaghoobi:
MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations. 4399-4403 - William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. 4404-4408 - Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey:
Blank-regularized CTC for Frame Skipping in Neural Transducer. 4409-4413 - Kaousheik Jayakumar, Vrunda N. Sukhadia, Arun Kumar A, Srinivasan Umesh:
The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR. 4414-4418 - Vinit S. Unni, Ashish R. Mittal, Preethi Jyothi, Sunita Sarawagi:
Improving RNN-Transducers with Acoustic LookAhead. 4419-4423 - Nina Markl, Catherine Lai:
Everyone has an accent. 4424-4427 - Lucas Maison, Yannick Estève:
Some Voices are Too Common: Building Fair Speech Recognition Systems Using the CommonVoice Dataset. 4428-4432 - Yuhao Zhang, Chenghao Gao, Kaiqi Kou, Chen Xu, Tong Xiao, Jingbo Zhu:
Information Magnitude Based Dynamic Sub-sampling for Speech-to-text. 4433-4437
Keynote 3
- Martine Grice:
What's in a Rise? The Relevance of Intonation for Attention Orienting. 4438
Speech Synthesis: Controllability and Adaptation
- Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh, Seong-Whan Lee:
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer. 4439-4443 - Yongmao Zhang, Heyang Xue, Hanzhao Li, Lei Xie, Tingwei Guo, Ruixiong Zhang, Caixia Gong:
VISinger2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer. 4444-4448 - Youneng Ma, Junyi He, Meimei Wu, Guangyue Hu, Haojun Fei:
EdenTTS: A Simple and Efficient Parallel Text-to-speech Architecture with Collaborative Duration-alignment Learning. 4449-4453 - Wenbin Wang, Yang Song, Sanjay Jha:
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations. 4454-4458 - Juan Felipe Montesinos, Daniel Michelsanti, Gloria Haro, Zheng-Hua Tan, Jesper Jensen:
Speech inpainting: Context-based speech synthesis guided by video. 4459-4463 - Chung Tran, Chi Mai Luong, Sakriani Sakti:
STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework. 4464-4468
Search Methods and Decoding Algorithms for ASR
- Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura:
Average Token Delay: A Latency Metric for Simultaneous Translation. 4469-4473 - Yukun Qian, Xuyi Zhuang, Mingjiang Wang:
Automatic Speech Recognition Transformer with Global Contextual Information Decoder. 4474-4478 - Yui Sudo, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. 4479-4483 - Kiran Praveen, Advait Vinay Dhopeshwarkar, Abhishek Pandey, Balaji Radhakrishnan:
Prefix Search Decoding for RNN Transducers. 4484-4488 - Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman:
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio. 4489-4493 - Iuliia Nigmatulina, Srikanth R. Madikeri, Esaú Villatoro-Tello, Petr Motlícek, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju:
Implementing Contextual Biasing in GPU Decoder for Online ASR. 4494-4498
Speech Signal Analysis
- Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang:
MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion. 4499-4503 - Ahmed Adel Attia, Mark Tiede, Carol Y. Espy-Wilson:
Enhancing Speech Articulation Analysis Using A Geometric Transformation of the X-ray Microbeam Dataset. 4504-4507 - Mélanie Jouaiti, Pippa Kirby, Ravi Vaidyanathan:
Matching Acoustic and Perceptual Measures of Phonation Assessment in Disordered Speech - A Case Study. 4508-4512 - Jiahong Yuan, Xingyu Cai, Kenneth Church:
Improved Contextualized Speech Representations for Tonal Analysis. 4513-4517 - Siddarth Chandrasekar, Arvind Ramesh, Tilak Purohit, Prasanta Kumar Ghosh:
A Study on the Importance of Formant Transitions for Stop-Consonant Classification in VCV Sequence. 4518-4522 - Eray Eren, Lee Ngee Tan, Abeer Alwan:
FusedF0: Improving DNN-based F0 Estimation by Fusion of Summary-Correlograms and Raw Waveform Representations of Speech Signals. 4523-4527
Speech Emotion Recognition 3
- Jehyun Kyung, Ju-Seok Seong, Jeong-Hwan Choi, Ye-Rin Jeoung, Joon-Hyuk Chang:
Improving Joint Speech and Emotion Recognition Using Global Style Tokens. 4528-4532 - Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita:
Speech Emotion Recognition by Estimating Emotional Label Sequences with Phoneme Class Attribute. 4533-4537 - Shenjie Jiang, Peng Song, Shaokai Li, Keke Zhao, Wenming Zheng:
Unsupervised Transfer Components Learning for Cross-Domain Speech Emotion Recognition. 4538-4542 - Darshana Prisayad, Tharindu Fernando, Sridha Sridharan, Simon Denman, Clinton Fookes:
Dual Memory Fusion for Multimodal Speech Emotion Recognition. 4543-4547 - Vladimir Kondratenko, Nikolay Karpov, Artem Sokolov, Nikita Savushkin, Oleg Kutuzov, Fyodor Minkin:
Hybrid Dataset for Speech Emotion Recognition in Russian Language. 4548-4552 - Jia-Hao Hsu, Chung-Hsien Wu, Yu-Hung Wei:
Speech Emotion Recognition using Decomposed Speech via Multi-task Learning. 4553-4557
Connecting Speech-science and Speech-technology for Children's Speech
- Nina R. Benway, Jonathan L. Preston:
Prospective Validation of Motor-Based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders. 4558-4562 - Nina R. Benway, Jonathan L. Preston, Asif Salekin, Yi Xiao, Harshit Sharma, Tara McAllister Byun:
Classifying Rhoticity of /ɹ/ in Speech Sound Disorder using Age-and-Sex Normalized Formants. 4563-4567 - Nina R. Benway, Yashish M. Siriwardena, Jonathan L. Preston, Elaine Hitchcock, Tara McAllister Byun, Carol Y. Espy-Wilson:
Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /ɹ/ in Child Speech Sound Disorders. 4568-4572 - Timothy Piton, Enno Hermann, Angela Pasqualotto, Marjolaine Cohen, Mathew Magimai-Doss, Daphne Bavelier:
Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report. 4573-4577 - Christopher Gebauer, Lars Rumberg, Hanna Ehlert, Ulrike Lüdtke, Jörn Ostermann:
Exploiting Diversity of Automatic Transcripts from Distinct Speech Recognition Techniques for Children's Speech. 4578-4582 - Lars Rumberg, Christopher Gebauer, Hanna Ehlert, Maren Wallbaum, Ulrike Lüdtke, Jörn Ostermann:
Uncertainty Estimation for Connectionist Temporal Classification Based Automatic Speech Recognition. 4583-4587 - Marvin Lavechin, Yaya Sy, Hadrien Titeux, María Andrea Cruz Blandón, Okko Räsänen, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristià:
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models. 4588-4592 - Shuyang Zhao, Mittul Singh, Abraham Woubie, Reima Karhila:
Data augmentation for children ASR and child-adult speaker classification using voice conversion methods. 4593-4597 - Vishwas M. Shetty, Steven M. Lulich, Abeer Alwan:
Developmental Articulatory and Acoustic Features for Six to Ten Year Old Children. 4598-4602 - Yahan Yang, Sunghye Cho, Maxine Covello, Azia Knox, Osbert Bastani, James Weimer, Edgar Dobriban, Robert T. Schultz, Insup Lee, Julia Parish-Morris:
Automatically Predicting Perceived Conversation Quality in a Pediatric Sample Enriched for Autism. 4603-4607 - Alexander Johnson, Hariram Veeramani, Natarajan Balaji Shankar, Abeer Alwan:
An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities. 4608-4612 - Xinwei Cao, Zijian Fan, Torbjørn Svendsen, Giampiero Salvi:
An Analysis of Goodness of Pronunciation for Child Speech. 4613-4617 - Yaya Sy, William N. Havard, Marvin Lavechin, Emmanuel Dupoux, Alejandrina Cristià:
Measuring Language Development From Child-centered Recordings. 4618-4622 - Hiuching Hung, Paula Andrea Pérez-Toro, Tomás Arias-Vergara, Andreas Maier, Elmar Nöth:
Speaking Clearly, Understanding Better: Predicting the L2 Narrative Comprehension of Chinese Bilingual Kindergarten Children Based on Speech Intelligibility Using a Machine Learning Approach. 4623-4627 - Delphine Charuau, Béatrice Vaxelaire, Rudolph Sock:
Speech Breathing Behavior During Pauses in Children. 4628-4632 - Anfeng Xu, Rajat Hebbar, Rimita Lahiri, Tiantian Feng, Lindsay Butler, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan:
Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings. 4633-4637 - Tomás Arias-Vergara, Elizabeth Londoño-Mora, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier:
Measuring Phonological Precision in Children with Cleft Lip and Palate. 4638-4642 - Si Ioi Ng, Cymie Wing-Yee Ng, Tan Lee:
A Study on Using Duration and Formant Features in Automatic Detection of Speech Sound Disorder in Children. 4643-4647 - Ilja Baumann, Dominik Wagner, Franziska Braun, Sebastian P. Bayerl, Elmar Nöth, Korbinian Riedhammer, Tobias Bocklet:
Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate. 4648-4652
Dialog Management
- Mingyu Derek Ma, Jiun-Yu Kao, Shuyang Gao, Arpit Gupta, Di Jin, Tagyoung Chung, Nanyun Peng:
Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning. 4653-4657 - Matthew McNeill, Rivka Levitan:
An Autoregressive Conversational Dynamics Model for Dialogue Systems. 4658-4662 - Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh K. Jha, Diego Romeres, Jonathan Le Roux:
Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos. 4663-4667 - Hagen Soltau, Izhak Shafran, Mingqiu Wang, Abhinav Rastogi, Jeffrey Zhao, Ye Jia, Wei Han, Yuan Cao, Aramys Miranda:
Speech Aware Dialog System Technology Challenge (DSTC11). 4668-4672 - Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng:
Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision. 4673-4677 - Jihyun Lee, Chaebin Lee, Yunsu Kim, Gary Geunbae Lee:
Tracking Must Go On : Dialogue State Tracking with Verified Self-Training. 4678-4682
Speaker Recognition 2
- Jiaying Wang, Xianglong Wang, Namin Wang, Lantian Li, Dong Wang:
Ordered and Binary Speaker Embedding. 4683-4687 - Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak:
Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition. 4688-4692 - Hee-Soo Heo, Jee-weon Jung, Jingu Kang, Youngki Kwon, Bong-Jin Lee, You Jin Kim, Joon Son Chung:
Curriculum Learning for Self-supervised Speaker Verification. 4693-4697 - Ziyang Zhang, Wu Guo, Bin Gu:
Introducing Self-Supervised Phonetic Information for Text-Independent Speaker Verification. 4698-4702 - Tobias Cord-Landwehr, Christoph Böddeker, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach:
A Teacher-Student Approach for Extracting Informative Speaker Embeddings From Speech Mixtures. 4703-4707 - Théo Lepage, Réda Dehak:
Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification. 4708-4712
Phonetics, Phonology, and Prosody 2
- Maxwell Hope, Charlotte Ward, Jason Lilley:
Nonbinary American English speakers encode gender in vowel acoustics. 4713-4717 - Jared Sharp, Matthew Faytak, Hasutai Fei Xiong Liu:
Coarticulation of Sibe Vowels and Dorsal Fricatives in Spontaneous Speech: An Acoustic Study. 4718-4722 - Georgina Brown, Christin Kirchhübel, Ramiz Cuthbert:
Using speech synthesis to explain automatic speaker recognition: a new application of synthetic speech. 4723-4727 - Yishan Huang:
Same F0, Different Tones: A Multidimensional Investigation of Zhangzhou Tones. 4728-4732 - Patrick Cormac English, John D. Kelleher, Julie Carson-Berndsen:
Discovering Phonetic Feature Event Patterns in Transformer Embeddings. 4733-4737 - Zihan Wang, Christer Gobl:
A System for Generating Voice Source Signals that Implements the Transformed LF-model Parameter Control. 4738-4742 - Yashish M. Siriwardena, Carol Y. Espy-Wilson, Suzanne Boyce, Mark Tiede, Liran Oren:
Speaker-independent Speech Inversion for Estimation of Nasalance. 4743-4747 - Yiying Hu, Hui Feng, Qinghua Zhao, Aijun Li:
Effects of Tonal Coarticulation and Prosodic Positions on Tonal Contours of Low Rising Tones: In the Case of Xiamen Dialect. 4748-4752 - Amel Issa:
Durational and Non-durational Correlates of Lexical and Derived Geminates in Arabic. 4753-4757 - Ashwin Rao:
Mapping Phonemes to Acoustic Symbols and Codes Using Synchrony in Speech Modulation Vectors Estimated by the Travellingwave Filter Bank. 4758-4762 - Lindun Ge, Min Xu, Hongwei Ding:
Rhythmic Characteristics of L2 German Speech by Advanced Chinese Learners. 4763-4767 - Anneliese Kelterer, Margaret Zellers, Barbara Schuppler:
(Dis)agreement and Preference Structure are Reflected in Matching Along Distinct Acoustic-prosodic Features. 4768-4772 - Polychronia Christodoulidou, Katerina Nicolaidis, Dimitrios Stamovlasis:
Vowel reduction by Greek-speaking children: The effect of stress and word length. 4773-4777 - Mietta Lennes, Minnaleena Toivola:
Pitch distributions in a very large corpus of spontaneous Finnish speech. 4778-4782 - Jacek Kudera, Katharina Zahner-Ritter, Jakob Engel, Nathalie Elsässer, Philipp Hutmacher, Carolin Worstbrock:
Speech Enhancement Patterns in Human-Robot Interaction: A Cross-Linguistic Perspective. 4783-4787
Speech Synthesis: Expressivity
- Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu:
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions. 4788-4792 - Ruishan Li, Yingming Gao, Yanlu Xie, Dengfeng Ke, Jinsong Zhang:
Dual Audio Encoders Based Mandarin Prosodic Boundary Prediction by Using Multi-Granularity Prosodic Representations. 4793-4797 - Dongchao Yang, Songxiang Liu, Helin Wang, Jianwei Yu, Chao Weng, Yuexian Zou:
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS. 4798-4802 - Ya-Jie Zhang, Wei Song, Yanghao Yue, Zhengchen Zhang, Youzheng Wu, Xiaodong He:
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy. 4803-4807 - Tankala Pavan Kalyan, Preeti Rao, Preethi Jyothi, Pushpak Bhattacharyya:
Narrator or Character: Voice Modulation in an Expressive Multi-speaker TTS. 4808-4812 - Yuhao Cui, Xiongwei Wang, Zhongzhou Zhao, Wei Zhou, Haiqing Chen:
CASEIN: Cascading Explicit and Implicit Control for Fine-grained Emotion Intensity Regulation. 4813-4817 - Yoori Oh, Juheon Lee, Yoseob Han, Kyogu Lee:
Semi-supervised Learning for Continuous Emotional Intensity Controllable Speech Synthesis with Disentangled Representations. 4818-4822 - Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux:
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. 4823-4827 - Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song:
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios. 4828-4832 - Marie Kunesová, Jindrich Matousek:
Neural Speech Synthesis with Enriched Phrase Boundaries. 4833-4837 - Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Patrick Lumban Tobing, Ravichander Vipperla, Vincent Pollet:
Cross-lingual Prosody Transfer for Expressive Machine Dubbing. 4838-4842 - Mikey Elmers, Johannah O'Mahony, Éva Székely:
Synthesis after a couple PINTs: Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception. 4843-4847 - Diana Geneva, Georgi Shopov, Kostadin Garov, Maria Todorova, Stefan Gerdjikov, Stoyan Mihov:
Accentor: An Explicit Lexical Stress Model for TTS Systems. 4848-4852 - Slava Shechtman, Raul Fernandez:
A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers. 4853-4857 - Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng:
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model. 4858-4862 - Zhihan Yang, Shansong Liu, Xu Li, Haozhe Wu, Zhiyong Wu, Ying Shan, Jia Jia:
Prosody Modeling with 3D Visual Information for Expressive Video Dubbing. 4863-4867 - Jie Wu, Jian Luan, Yujun Wang:
LightClone: Speaker-guided Parallel Subnet Selection for Few-shot Voice Cloning. 4868-4872 - Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun:
EE-TTS: Emphatic Expressive TTS with Linguistic Information. 4873-4877 - Sewade Ogun, Vincent Colotte, Emmanuel Vincent:
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS. 4878-4882 - Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee:
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading. 4883-4887 - Guanghou Liu, Yongmao Zhang, Yi Lei, Yunlin Chen, Rui Wang, Lei Xie, Zhifei Li:
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions. 4888-4892 - Yusheng Tian, Guangyan Zhang, Tan Lee:
Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models. 4893-4897
Speech Recognition: Signal Processing, Acoustic Modeling, Robustness, Adaptation 5
- Nik Vaessen, David A. van Leeuwen:
Towards Multi-task Learning of Speech and Speaker Recognition. 4898-4902 - Zeyu Zhao, Peter Bell:
Regarding Topology and Variant Frame Rates for Differentiable WFST-based End-to-End ASR. 4903-4907 - Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He:
2-bit Conformer quantization for automatic speech recognition. 4908-4912 - Yufeng Yang, Ashutosh Pandey, DeLiang Wang:
Time-Domain Speech Enhancement for Robust Automatic Speech Recognition. 4913-4917 - Yifan Guo, Yao Tian, Hongbin Suo, Yulong Wan:
Multi-channel multi-speaker transformer for speech recognition. 4918-4922 - Zhe Ye, Terui Mao, Li Dong, Diqun Yan:
Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion. 4923-4927 - Shogo Miwa, Atsuhiko Kai:
Dialect Speech Recognition Modeling using Corpus of Japanese Dialects and Self-Supervised Learning-based Model XLSR. 4928-4932 - Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie:
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network. 4933-4937 - Tina Raissi, Christoph Lüscher, Moritz Gunz, Ralf Schlüter, Hermann Ney:
Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think. 4938-4942 - Xiaohuan Zhou, Jiaming Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou:
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition. 4943-4947 - Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdel-rahman Mohamed, Philip C. Woodland:
Biased Self-supervised Learning for ASR. 4948-4952 - Zhao Yang, Dianwen Ng, Chong Zhang, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Jizhong Zhao, Bin Ma, Eng Siong Chng:
A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions. 4953-4957 - Ranzo Huang, Brian Mak:
wav2vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting. 4958-4962 - Keyu An, Xian Shi, Shiliang Zhang:
BAT: Boundary aware transducer for memory-efficient and low-latency ASR. 4963-4967 - Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. 4968-4972 - Belen Alastruey, Lukas Drude, Jahn Heymann, Simon Wiesler:
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition. 4973-4977
Speech, Voice, and Hearing Disorders 2
- Sanjana Sankar, Denis Beautemps, Frédéric Elisei, Olivier Perrotin, Thomas Hueber:
Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding. 4978-4982 - Yingyang Wang, Min Xu, Jing Shao, Lan Wang, Nan Yan:
Hearing Loss Affects Emotion Perception in Older Adults: Evidence from a Prosody-Semantics Stroop Task. 4983-4987 - Fanhui Kong, Nengheng Zheng, Xianren Wang, Hao He, Jan W. H. Schnupp, Qinglin Meng:
Cochlear-implant Listeners Listening to Cochlear-implant Simulated Speech. 4988-4992 - Olivia M. Murton, Abigail E. Haenssler, Marc F. Maffei, Kathryn P. Connaghan, Jordan R. Green:
Validation of a Task-Independent Cepstral Peak Prominence Measure with Voice Activity Detection. 4993-4997 - Heejin Do, Yunsu Kim, Gary Geunbae Lee:
Score-balanced Loss for Multi-aspect Pronunciation Assessment. 4998-5002 - Soroosh Tayebi Arasteh, Cristian David Ríos-Urrego, Elmar Nöth, Andreas Maier, Seung Hee Yang, Jan Rusz, Juan Rafael Orozco-Arroyave:
Federated Learning for Secure Development of AI Models for Parkinson's Disease Detection Using Speech from Different Languages. 5003-5007 - Huali Zhou, Fanhui Kong, Nengheng Zheng, Qinglin Meng:
F0inTFS: A lightweight periodicity enhancement strategy for cochlear implants. 5008-5012 - Benjamin O'Brien, Adrien Gresse, Jean-Baptiste Billaud, Guilhem Belda, Jean-François Bonastre:
Differentiating acoustic and physiological features in speech for hypoxia detection. 5013-5017 - Hsin-Hao Chen, Yung-Lun Chien, Ming-Chi Yen, Shu-Wei Tsai, Tai-Shih Chi, Hsin-Min Wang, Yu Tsao:
Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features. 5018-5022 - Yung-Lun Chien, Hsin-Hao Chen, Ming-Chi Yen, Shu-Wei Tsai, Hsin-Min Wang, Yu Tsao, Tai-Shih Chi:
Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion. 5023-5026 - Vojtech Illner, Petr Krýze, Jan Svihlík, Mário Sousa, Paul Krack, Elina Tripoliti, Robert Jech, Jan Rusz:
Which aspects of motor speech disorder are captured by Mel Frequency Cepstral Coefficients? Evidence from the change in STN-DBS conditions in Parkinson's disease. 5027-5031 - Vinod Subramanian, Namhee Kwon, Raymond Brueckner, Nate Blaylock, Henry O'Connell, Luis Sierra, Clementina Ullman, Karen Hildebrand, Simon E. Laganiere:
Detecting Manifest Huntington's Disease Using Vocal Data. 5032-5036 - Minchuan Chen, Chenfeng Miao, Jun Ma, Shaojun Wang, Jing Xiao:
Exploring multi-task learning and data augmentation in dementia detection with self-supervised pretrained models. 5037-5041
Speech Activity Detection and Modeling
- Kuncai Zhang, Wei Zhou, Pengcheng Zhu, Haiqing Chen:
GL-SSD: Global and Local Speech Style Disentanglement by vector quantization for robust sentence boundary detection in speech stream. 5042-5046 - Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai:
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction. 5047-5051 - Prithvi R. R. Gudepu, Jayesh M. Koroth, Kamini Sabu, Mahaboob Ali Basha Shaik:
Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions. 5052-5056 - Denise Moussa, Germans Hirsch, Sebastian Wankerl, Christian Riess:
Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets. 5057-5061 - Jingyuan Wang, Jie Zhang, Li-Rong Dai:
Real-Time Causal Spectro-Temporal Voice Activity Detection Based on Convolutional Encoding and Residual Decoding. 5062-5066 - Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao:
SVVAD: Personal Voice Activity Detection for Speaker Verification. 5067-5071
Multilingual Models for ASR
- Muhammad Umar Farooq, Thomas Hain:
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition. 5072-5076 - Tobi Olatunji, Tejumade Afonja, Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Chris Chinenye Emezue, Amina Mardiyyah Rufai, Sahib Singh:
AfriNames: Most ASR Models "Butcher" African Names. 5077-5081 - Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide:
Towards Dialect-inclusive Recognition in a Low-resource Language: Are Balanced Corpora the Answer? 5082-5086 - Tahir Javed, Sakshi Joshi, Vignesh Nagarajan, Sai Sundaresan, Janki Nawale, Abhigyan Raman, Kaushal Santosh Bhogale, Pratyush Kumar, Mitesh M. Khapra:
Svarah: Evaluating English ASR Systems on Indian Accents. 5087-5091 - Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed:
N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition. 5092-5096 - Michael Picheny, Qin Yang, Daiheng Zhang, Lining Zhang:
The MALACH Corpus: Results with End-to-End Architectures and Pretraining. 5097-5101
Speech Enhancement and Bandwidth Expansion
- Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda:
Unsupervised speech enhancement with deep dynamical generative speech and noise models. 5102-5106 - Yin-Tse Lin, Bo-Hao Su, Chi-Han Lin, Shih-Chan Kuo, Jyh-Shing Roger Jang, Chi-Chun Lee:
Noise-Robust Bandwidth Expansion for 8K Speech Recordings. 5107-5111 - Chenhao Shuai, Chaohua Shi, Lu Gan, Hongqing Liu:
mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra. 5112-5116 - Yong Xu, Vinay Kothapally, Meng Yu, Shixiong Zhang, Dong Yu:
Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation. 5117-5121 - Jiayi Xu, Jian Li, Weixin Meng, Xiaodong Li, Chengshi Zheng:
Low-complexity Broadband Beampattern Synthesis using Array Response Control. 5122-5126 - Haixin Zhao:
A GAN Speech Inpainting Model for Audio Editing Software. 5127-5131
Articulation
- Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W. Black, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from MRI-Based Articulatory Representations. 5132-5136 - Yashish M. Siriwardena, Carol Y. Espy-Wilson, Shihab A. Shamma:
Learning to Compute the Articulatory Representations of Speech with the MIRRORNET. 5137-5141 - Martin Strauch, Antoine Serrurier:
Generating high-resolution 3D real-time MRI of the vocal tract. 5142-5146 - Jesuraja Bandekar, Sathvik Udupa, Prasanta Kumar Ghosh:
Exploring a classification approach using quantised articulatory movements for acoustic to articulatory inversion. 5147-5151
Neural Processing of Speech and Language: Encoding and Decoding the Diverse Auditory Brain
- Subba Reddy Oota, Nathan Trouvain, Frédéric Alexandre, Xavier Hinaut:
MEG Encoding using Word Context Semantics in Listening Stories. 5152-5156 - Giorgia Cantisani, Amirhossein Chalehchaleh, Giovanni M. Di Liberto, Shihab A. Shamma:
Investigating the cortical tracking of speech and music with sung speech. 5157-5161 - Oskar Keding, Emina Alickovic, Martin A. Skoglund, Maria Sandsten:
Coherence Estimation Tracks Auditory Attention in Listeners with Hearing Impairment. 5162-5166 - Subba Reddy Oota, Veeral Agarwal, Mounika Marreddy, Manish Gupta, Raju S. Bapi:
Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity? 5167-5171 - Zelin Qiu, Jianjun Gu, Dingding Yao, Junfeng Li:
Exploring Auditory Attention Decoding using Speaker Features. 5172-5176 - Akshara Soman, Vidhi Sinha, Sriram Ganapathy:
Enhancing the EEG Speech Match Mismatch Tasks With Word Boundaries. 5177-5181 - Tzu-Han Zoe Cheng, Kuan-Lin Chen, Juliane Schubert, Ya-Ping Chen, Tim Brown, John Iversen:
Similar Hierarchical Representation of Speech and Other Complex Sounds In the Brain and Deep Residual Networks: An MEG Study. 5182-5186 - Alexis Deighton MacIntyre, Tobias Goehring:
Effects of spectral degradation on the cortical tracking of the speech envelope. 5187-5191 - Ignacio Calderon De Palma, Laura S. Lopez, Alejandro Lopez-Valdes:
Effects of spectral and temporal modulation degradation on intelligibility and cortical tracking of speech signals. 5192-5196
Perception of Paralinguistics
- Yuanchao Li, Peter Bell, Catherine Lai:
Transfer Learning for Personality Perception via Speech Emotion Recognition. 5197-5201 - Mizuki Nagano, Yusuke Ijima, Sadao Hiroya:
A stimulus-organism-response model of willingness to buy from advertising speech using voice quality. 5202-5206 - David Doukhan, Simon Devauchelle, Lucile Girard-Monneron, Mía Chávez Ruz, V. Chaddouk, Isabelle Wagner, Albert Rilliard:
Voice Passing: a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition. 5207-5211 - Hikaru Yanagida, Yusuke Ijima, Naohiro Tawara:
Influence of Personal Traits on Impressions of One's Own Voice. 5212-5216 - Ambika Kirkland, Joakim Gustafson, Éva Székely:
Pardon my disfluency: The impact of disfluency effects on the perception of speaker competence and confidence. 5217-5221 - Iona Gessinger, Michelle Cohn, Benjamin R. Cowan, Georgia Zellou, Bernd Möbius:
Cross-linguistic Emotion Perception in Human and TTS Voices. 5222-5226
Technologies for Child Speech Processing
- Richeng Duan:
Joint Learning Feature and Model Adaptation for Unsupervised Acoustic Modelling of Child Speech. 5227-5231 - Bo Molenaar, Cristian Tejedor García, Catia Cucchiarini, Helmer Strik:
Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics. 5232-5236 - Yu Bai, Ferdy Hubers, Catia Cucchiarini, Roeland van Hout, Helmer Strik:
An ASR-enabled Reading Tutor: Investigating Feedback to Optimize Interaction for Learning to Read. 5237-5241 - Rishabh Jain, Andrei Barcovschi, Mariam Yahayah Yiwere, Peter Corcoran, Horia Cucu:
Adaptation of Whisper models to child speech recognition. 5242-5246
Show and Tell: Media and commercial applications
- Michele Yin, Gabriel Roccabruna, Abhinav Azad, Giuseppe Riccardi:
Let's Give a Voice to Conversational Agents in Virtual Reality. 5247-5248 - Massa Baali, Ahmed M. Ali:
FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator. 5249-5250 - Hanchao Liu, Dapeng Chen, Rongjun Li, Wenyuan Xue, Wei Peng:
Video Summarization Leveraging Multimodal Information for Presentations. 5251-5252 - Varun Nathan, Devashish Deshpande, Ayush Kumar, Cijo George, Jithendra Vepa:
What questions are my customers asking?: Towards Actionable Insights from Customer Questions in Contact Center Calls. 5253-5254 - Rishabh Kumar Tripathi, Digvijay Ingle, Ayush Kumar, Cijo George, Jithendra Vepa:
COnVoy: A Contact Center Operated Pipeline for Voice of Customer Discovery. 5255-5256 - Elena Rastorgueva, Vitaly Lavrukhin, Boris Ginsburg:
NeMo Forced Aligner and its application to word alignment for subtitle generation. 5257-5258 - Anup Pattnaik, Tanay Narshana, Aashraya Sachdeva, Cijo George, Jithendra Vepa:
CauSE: Causal Search Engine for Understanding Contact-Center Conversations. 5259-5260 - Aashraya Sachdeva, Sai Nishanth Padala, Anup Pattnaik, Varun Nathan, Cijo George, Ayush Kumar, Jithendra Vepa:
Tailored Real-Time Call Summarization System for Contact Centers. 5261-5262 - Prathamesh Mandke, Rachel Oberst, Matthias Reisser, Avijit Chakraborty, Christos Louizos, Joseph Soriaga, Daniel Madrigal Diaz, Andre Manoel, Nalin Singal, Jeff Omhover, Robert Sim:
Federated Learning Toolkit with Voice-based User Verification Demo. 5263-5264 - Liam Dugan, Anshul Wadhawan, Kyle Spence, Chris Callison-Burch, Morgan McGuire, Victor B. Zordan:
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models. 5265-5266 - Namhyun Cho, Sunmin Kim, Yoseb Kang, Heeman Kim:
Fast Enrollable Streaming Keyword Spotting System: Training and Inference using a Web Browser. 5267-5268 - Suraj Agrawal, Aashraya Sachdeva, Soumya Jain, Cijo George, Jithendra Vepa:
Cross-lingual/Cross-channel Intent Detection in Contact-Center Conversations. 5269-5270
Speaker and Language Identification 3
- Jungwoo Heo, Chan-yeong Lim, Ju-ho Kim, Hyun-seo Shin, Ha-Jin Yu:
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification. 5271-5275 - Piotr Kawa, Marcin Plata, Piotr Syga:
Defense Against Adversarial Attacks on Audio DeepFake Detection. 5276-5280 - Eros Rosello, Alejandro Gómez Alanís, Angel M. Gomez, Antonio M. Peinado:
A conformer-based classifier for variable-length utterance processing in anti-spoofing. 5281-5285 - Feng Wang, Lingyan Huang, Tao Li, Qingyang Hong, Lin Li:
Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification. 5286-5290 - Juan Zuluaga-Gomez, Sara Ahmed, Danielius Visockas, Cem Subakan:
CommonAccent: Exploring Large Acoustic Pretrained Models for Accent Classification Based on Common Voice. 5291-5295 - Sandro Cumani, Salvatore Sarni:
From adaptive score normalization to adaptive data normalization for speaker verification systems. 5296-5300 - Hui Wang, Siqi Zheng, Yafeng Chen, Luyao Cheng, Qian Chen:
CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking. 5301-5305 - Sofoklis Kakouros, Katri Hiovain-Asikainen:
North Sámi Dialect Identification with Self-supervised Speech Models. 5306-5310 - Jee-weon Jung, Soonshin Seo, Hee-Soo Heo, Geonmin Kim, You Jin Kim, Youngki Kwon, Minjae Lee, Bong-Jin Lee:
Encoder-decoder Multimodal Speaker Change Detection. 5311-5315 - Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung:
Disentangled Representation Learning for Multilingual Speaker Recognition. 5316-5320 - Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg:
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification. 5321-5325 - Peter Sullivan, AbdelRahim A. Elmadany, Muhammad Abdul-Mageed:
On the Robustness of Arabic Speech Dialect Identification. 5326-5330 - Haoyu Wang, Bei Liu, Yifei Wu, Yanmin Qian:
Adaptive Neural Network Quantization For Lightweight Speaker Verification. 5331-5335 - Xinmei Su, Xiang Xie, Fengrun Zhang, Chenguang Hu:
Adversarial Diffusion Probability Model For Cross-domain Speaker Verification Integrating Contrastive Loss. 5336-5340 - Aoi Ito, Shota Horiguchi:
Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model. 5346-5350 - Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar:
Label Aware Speech Representation Learning For Language Identification. 5351-5355 - Qibao Luo, Ruohua Zhou:
Exploring the Impact of Back-End Network on Wav2vec 2.0 for Dialect Identification. 5356-5360 - Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukás Burget, Jan Cernocký:
Improving Speaker Verification with Self-Pretrained Transformer Models. 5361-5365 - Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun:
Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches. 5366-5370
Analysis of Speech and Audio Signals 4
- Julian Linke, Mate Kadar, Gergely Dosinszky, Péter Mihajlik, Gernot Kubin, Barbara Schuppler:
What do self-supervised speech representations encode? An analysis of languages, varieties, speaking styles and speakers. 5371-5375 - Jinghong Zhang, Xiaowei Yi, Xianfeng Zhao:
A Compressed Synthetic Speech Detection Method with Compression Feature Embedding. 5376-5380 - Yucong Zhang, Hongbin Suo, Yulong Wan, Ming Li:
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning. 5381-5385 - Zitong Li, Wei Li:
MOSLight: A Lightweight Data-Efficient System for Non-Intrusive Speech Quality Assessment. 5386-5390 - Xipin Wei, Junhui Chen, Zirui Zheng, Li Guo, Lantian Li, Dong Wang:
A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation. 5391-5395 - Yuan Gao, Ying Hu, Liusong Wang, Hao Huang, Liang He:
MTANet: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music. 5396-5400 - Chunhui Wang, Chang Zeng, Xing He:
Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network. 5401-5405 - Mohammad Shaique Solanki, Ashutosh Bharadwaj, Jeevan Kylash, Prasanta Kumar Ghosh:
Do Vocal Breath Sounds Encode Gender Cues for Automatic Gender Classification? 5406-5410 - Toki Sugiura, Hiromitsu Nishizaki:
Automatic Exploration of Optimal Data Processing Operations for Sound Data Augmentation Using Improved Differentiable Automatic Data Augmentation. 5411-5415 - Li Xiao, Xiuping Yang, Xinhong Li, Weiping Tu, Xiong Chen, Weiyan Yi, Jie Lin, Yuhong Yang, Yanzhen Ren:
A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis. 5416-5420 - Haojie Wei, Xueke Cao, Tangpeng Dan, Yueguo Chen:
RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music. 5421-5425 - Pranay Manocha, Israel Dejene Gebru, Anurag Kumar, Dejan Markovic, Alexander Richard:
Spatialization Quality Metric for Binaural Speech. 5426-5430 - Arka Roy, Udit Satija:
AsthmaSCELNet: A Lightweight Supervised Contrastive Embedding Learning Framework for Asthma Classification Using Lung Sounds. 5431-5435 - Sangmin Bae, June-Woo Kim, Won-Yang Cho, Hyerim Baek, Soyoun Son, Byungjo Lee, Changwan Ha, Kyongpil Tae, Sungnyun Kim, Se-Young Yun:
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification. 5436-5440 - Vanessa Richter, Michael Neumann, Jordan R. Green, Brian Richburg, Oliver Roesler, Hardik Kothare, Vikram Ramanarayanan:
Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance. 5441-5445 - Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz:
Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation. 5446-5450 - Hector E. Romero, Ning Ma, Guy J. Brown, Sam Johnson:
Obstructive sleep apnea screening with breathing sounds and respiratory effort: a multimodal deep learning approach. 5451-5455 - Yifu Sun, Xulong Zhang, Jianzong Wang, Ning Cheng, Kaiyu Hu, Jing Xiao:
Investigation of Music Emotion Recognition Based on Segmented Semi-Supervised Learning. 5456-5460
Speech Synthesis: Multilinguality; Evaluation
- Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers:
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech. 5461-5465 - Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers:
Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages. 5466-5470 - Yewei Gu, Xianfeng Zhao, Xiaowei Yi:
Robust Feature Decoupling in Voice Conversion by Using Locality-Based Instance Normalization. 5471-5475 - Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang:
Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network. 5476-5480 - Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze:
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis. 5481-5485 - Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, Xiang Yin, Zejun Ma:
GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech. 5486-5490 - Yusuke Yasuda, Tomoki Toda:
Analysis of Mean Opinion Scores in Subjective Evaluation of Synthetic Speech Based on Tail Probabilities. 5491-5495 - Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna:
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus. 5496-5500 - Kentaro Mitsui, Yukiya Hono, Kei Sawada:
UniFLG: Unified Facial Landmark Generator from Text or Speech. 5501-5505 - Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen:
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech. 5506-5510 - Ajinkya Kulkarni, Atharva Kulkarni, Sara Abedalmonem Mohammad Shatnawi, Hanan Aldarmaki:
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus. 5511-5515 - Kamil Deja, Georgi Tinchev, Marta Czarnowska, Marius Cotescu, Jasha Droppo:
Diffusion-based accent modelling in speech synthesis. 5516-5520 - Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov:
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration. 5521-5525 - Siheng Zhang, Xingjun Tan, Yanqiang Lei, Xianxiang Wang, Zhizhong Zhang, Yuan Xie:
CVTE-Poly: A New Benchmark for Chinese Polyphone Disambiguation. 5526-5530 - Fengyu Yang, Jian Luan, Meng Meng, Yujun Wang:
Improving Bilingual TTS Using Language And Phonology Embedding With Embedding Strength Modulator. 5531-5535 - Junchen Lu, Berrak Sisman, Mingyang Zhang, Haizhou Li:
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units. 5536-5540 - Yang Yu, Matthew Perez, Ankur Bapna, Fadi Haik, Siamak Tazari, Yu Zhang:
PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text. 5541-5545 - Jakub Swiatkowski, Duo Wang, Mikolaj Babianski, Giuseppe Coccia, Patrick Lumban Tobing, Ravichander Vipperla, Viacheslav Klimkov, Vincent Pollet:
Expressive Machine Dubbing Through Phrase-level Cross-lingual Prosody Transfer. 5546-5550 - Cheng-Han Chiang, Wei-Ping Huang, Hung-yi Lee:
Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously. 5551-5555 - Pablo Pérez Zarazaga, Zofia Malisz, Gustav Eje Henter, Lauri Juvela:
Speaker-independent neural formant synthesis. 5556-5560 - Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari:
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center. 5561-5565 - Orian Sharoni, Roee Shenberg, Erica Cooper:
SASPEECH: A Hebrew Single Speaker Dataset for Text To Speech and Voice Conversion. 5566-5570