SLT 2022: Doha, Qatar
- IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023. IEEE 2023, ISBN 979-8-3503-9690-4
- Vasista Sai Lodagala, Sreyan Ghosh, Srinivasan Umesh:
CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-Supervised Learning of Speech Representations. 1-8 - Hyung Yong Kim, Byeong-Yeol Kim, Seung Woo Yoo, Youshin Lim, Yunkyu Lim, Hanbin Lee:
ASBERT: ASR-Specific Self-Supervised Learning with Self-Training. 9-14 - Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris:
Sub-8-Bit Quantization for On-Device Speech Recognition: A Regularization-Free Approach. 15-22 - Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park:
G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR. 23-30 - David Qiu, Tsendsuren Munkhdalai, Yanzhang He, Khe Chai Sim:
Context-Aware Neural Confidence Estimation for Rare Word Speech Recognition. 31-37 - Antoine Bruguier, David Qiu, Trevor Strohman, Yanzhang He:
Flickering Reduction with Partial Hypothesis Reranking for Streaming ASR. 38-45 - Tatsuya Komatsu, Yusuke Fujita:
InterDecoder: Using Attention Decoders as Intermediate Regularization for CTC-Based Speech Recognition. 46-51 - Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman:
JOIST: A Joint Speech and Text Streaming Model for ASR. 52-59 - Ke-Han Lu, Kuan-Yu Chen:
A Context-Aware Knowledge Transferring Strategy for CTC-Based ASR. 60-67 - Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno, Nanxin Chen:
Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR. 68-75 - Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida:
Alternate Intermediate Conditioning with Syllable-Level and Character-Level Targets for Japanese ASR. 76-83 - Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. 84-91 - Jinhwan Park, Sichen Jin, Junmo Park, Sungsoo Kim, Dhairya Sandhyana, Changheon Lee, Myoungji Han, Jungin Lee, Seokyeong Jung, Changwoo Han, Chanwoo Kim:
Conformer-Based on-Device Streaming Speech Recognition with KD Compression and Two-Pass Architecture. 92-99 - Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen, Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P. Strimel, Ariya Rastrow:
Accelerator-Aware Training for Transducer-Based Speech Recognition. 100-107 - Lahiru Samarakoon, Ivan Fung:
Untied Positional Encodings for Efficient Transformer-Based Speech Recognition. 108-114 - Yan Gao, Javier Fernández-Marqués, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane:
Match to Win: Analysing Sequences Lengths for Efficient Self-Supervised Learning in Speech and Audio. 115-122 - Peng Shen, Xugang Lu, Hisashi Kawai:
Pronunciation-Aware Unique Character Encoding for RNN Transducer-Based Mandarin Speech Recognition. 123-129 - Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg:
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition. 130-135 - Vasista Sai Lodagala, Sreyan Ghosh, Srinivasan Umesh:
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations. 136-143 - Fan Yu, Shiliang Zhang, Pengcheng Guo, Yuhao Liang, Zhihao Du, Yuxiao Lin, Lei Xie:
MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario. 144-151 - Aleksandr Laptev, Boris Ginsburg:
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition. 152-159 - Sungjun Han, Deepak Baby, Valentin Mendelev:
Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System. 160-166 - Tian Li, Qingliang Meng, Yujian Sun:
Improved Noisy Iterative Pseudo-Labeling for Semi-Supervised Speech Recognition. 167-173 - Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas:
Guided Contrastive Self-Supervised Pre-Training for Automatic Speech Recognition. 174-181 - Jakob Poncelet, Hugo Van hamme:
Learning to Jointly Transcribe and Subtitle for End-To-End Spontaneous Speech Recognition. 182-189 - Tsendsuren Munkhdalai, Zelin Wu, Golan Pundak, Khe Chai Sim, Jiayang Li, Pat Rondon, Tara N. Sainath:
NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR. 190-196 - Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno:
Modular Hybrid Autoregressive Transducer. 197-204 - Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Seyyed Saeed Sarfjoo, Petr Motlícek, Matthias Kleinert, Hartmut Helmke, Oliver Ohneiser, Qingran Zhan:
How Does Pre-Trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications. 205-212 - Adam Stooke, Khe Chai Sim, Mason Chua, Tsendsuren Munkhdalai, Trevor Strohman:
Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features. 213-220 - Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski:
Towards End-to-End Unsupervised Speech Recognition. 221-228 - Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney:
Monotonic Segmental Attention for Automatic Speech Recognition. 229-236 - Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong:
Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition. 237-244 - Cal Peyser, W. Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho:
Dual Learning for Large Vocabulary On-Device ASR. 245-251 - Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh R. Mehta:
Streaming Bilingual End-to-End ASR Model Using Attention Over Multiple Softmax. 252-259 - Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono:
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. 260-265 - Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim:
Fully Unsupervised Training of Few-Shot Keyword Spotting. 266-272 - Chunxi Liu, Yuan Shangguan, Haichuan Yang, Yangyang Shi, Raghuraman Krishnamoorthi, Ozlem Kalinli:
Learning a Dual-Mode Speech Recognition Model via Self-Pruning. 273-279 - Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim:
Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition. 280-286 - Tina Raissi, Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney:
HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch. 287-294 - Vrunda N. Sukhadia, Srinivasan Umesh:
Domain Adaptation of Low-Resource Target-Domain Models Using Well-Trained ASR Conformer Models. 295-301 - Saket Dingliwal, Monica Sunkara, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff, Sravan Bodapati:
Personalization of CTC Speech Recognition Models. 302-309 - Shaan Bijwadia, Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Yanzhang He:
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems. 310-316 - Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi:
Learning Mask Scalars for Improved Robust Automatic Speech Recognition. 317-323 - Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen:
An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition. 324-330 - Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung:
Macro-Block Dropout for Improved Regularization in Training End-to-End Speech Recognition Models. 331-338 - Ragheb Al-Ghezi, Yaroslav Getman, Ekaterina Voskoboinik, Mittul Singh, Mikko Kurimo:
Automatic Rating of Spontaneous Speech for Low-Resource Languages. 339-345 - Benjamin Kleiner, Jack G. M. Fitzgerald, Haidar Khan, Gokhan Tur:
Mixture of Domain Experts for Language Understanding: An Analysis of Modularity, Task Performance, and Memory Tradeoffs. 346-352 - Anupama Chingacham, Vera Demberg, Dietrich Klakow:
A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification. 353-360 - Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève:
On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding. 361-368 - Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi:
Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction. 369-374 - Jinzi Qi, Hugo Van hamme:
Weak-Supervised Dysarthria-Invariant Features for Spoken Language Understanding Using an FHVAE and Adversarial Training. 375-381 - Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng:
Building Markovian Generative Architectures Over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems. 382-389 - Mohan Li, Rama Doddipatla:
Non-Autoregressive End-to-End Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding. 390-397 - Yasufumi Moriya, Gareth J. F. Jones:
Improving Noise Robustness for Spoken Content Retrieval Using Semi-Supervised ASR and N-Best Transcripts for BERT-Based Ranking Models. 398-405 - Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. 406-413 - Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee:
On the Efficiency of Integrating Self-Supervised Learning and Meta-Learning for User-Defined Few-Shot Keyword Spotting. 414-421 - Liang Wen, Lizhong Wang, Ying Zhang, Kwang Pyo Choi:
Multi-Stage Progressive Audio Bandwidth Extension. 422-427 - Sandipana Dowerah, Romain Serizel, Denis Jouvet, Mohammad MohammadAmini, Driss Matrouf:
Joint Optimization of Diffusion Probabilistic-Based Multichannel Speech Enhancement with Far-Field Speaker Verification. 428-435 - Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang:
Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. 436-443 - Martin Strauss, Matteo Torcoli, Bernd Edler:
Improved Normalizing Flow-Based Speech Enhancement Using an all-Pole Gammatone Filterbank for Conditional Input Representation. 444-450 - Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu:
Exploring WavLM on Speech Enhancement. 451-457 - Yu-sheng Tsao, Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen:
Adaptive-FSN: Integrating Full-Band Extraction and Adaptive Sub-Band Encoding for Monaural Speech Enhancement. 458-464 - Andrea Lorena Aldana Blanco, Cassia Valentini-Botinhao, Ondrej Klejch, Mandar Gogate, Kia Dashtipour, Amir Hussain, Peter Bell:
AVSE Challenge: Audio-Visual Speech Enhancement Challenge. 465-471 - Yukai Ju, Shimin Zhang, Wei Rao, Yannan Wang, Tao Yu, Lei Xie, Shidong Shang:
TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement. 472-479 - Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. 480-487 - Qinghua Liu, Yating Huang, Yunzhe Hao, Jiaming Xu, Bo Xu:
LiMuSE: Lightweight Multi-Modal Speaker Extraction. 488-495 - Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-Speaker ASR with Independent Vector Analysis. 496-501 - Wolfgang Mack, Emanuël A. P. Habets:
A Hybrid Acoustic Echo Reduction Approach Using Kalman Filtering and Informed Source Extraction with Improved Training. 502-508 - Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao:
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation. 509-516 - Tianyu Cao, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba, Najim Dehak:
Vsameter: Evaluation of a New Open-Source Tool to Measure Vowel Space Area and Related Metrics. 517-524 - Tyler Vuong, Nikhil Madaan, Rohan Panda, Richard M. Stern:
Investigating the Important Temporal Modulations for Deep-Learning-Based Speech Activity Detection. 525-531 - Anna Favaro, Chelsie Motley, Tianyu Cao, Miguel Iglesias, Ankur A. Butala, Esther S. Oh, Robert D. Stevens, Jesús Villalba, Najim Dehak, Laureano Moro-Velázquez:
A Multi-Modal Array of Interpretable Features to Evaluate Language and Speech Patterns in Different Neurological Disorders. 532-539 - Donghyeon Kim, Jeong-gi Kwak, Hanseok Ko:
Efficient Dynamic Filter For Robust and Low Computational Feature Extraction. 540-547 - Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim:
Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification. 548-554 - Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukás Burget, Jan Cernocký:
An Attention-Based Backend Allowing Efficient Fine-Tuning of Transformer Models for Speaker Verification. 555-562 - Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan:
Flow-ER: A Flow-Based Embedding Regularization Strategy for Robust Speech Representation Learning. 563-570 - Ismail Rasim Ülgen, Levent M. Arslan:
Unsupervised Domain Adaptation of Neural PLDA Using Segment Pairs for Speaker Verification. 571-576 - Bhusan Chettri:
The Clever Hans Effect in Voice Spoofing Detection. 577-584 - Xin Wang, Junichi Yamagishi:
Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure. 585-592 - Xinyue Ma, Shanshan Zhang, Shen Huang, Ji Gao, Ying Hu, Liang He:
How to Boost Anti-Spoofing with X-Vectors. 593-598 - Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng:
A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning. 599-604 - Jeremy Heng Meng Wong, Yifan Gong:
Joint Speaker Diarisation and Tracking in Switching State-Space Model. 605-612 - Jeremy Heng Meng Wong, Igor Abramovski, Xiong Xiao, Yifan Gong:
Diarisation Using Location Tracking with Agglomerative Clustering. 613-619 - Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola García:
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. 620-625 - Juan Manuel Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset:
Continual Self-Supervised Domain Adaptation for End-to-End Speaker Diarization. 626-632 - Juan Zuluaga-Gomez, Seyyed Saeed Sarfjoo, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlícek, Karel Ondrej, Oliver Ohneiser, Hartmut Helmke:
BERTraffic: BERT-Based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. 633-640 - Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini:
Low-Latency Speech Separation Guided Diarization for Telephone Conversations. 641-646 - Samantha Kotey, Rozenn Dahyot, Naomi Harte:
Fine Grained Spoken Document Summarization Through Text Segmentation. 647-654 - Jwala Dhamala, Varun Kumar, Rahul Gupta, Kai-Wei Chang, Aram Galstyan:
An Analysis of The Effects of Decoding Algorithms on Fairness in Open-Ended Language Generation. 655-662 - Lu Zeng, Sree Hari Krishnan Parthasarathi, Dilek Hakkani-Tur:
N-Best Hypotheses Reranking for Text-to-SQL Systems. 663-670 - Jia Cui, Heng Lu, Wenjie Wang, Shiyin Kang, Liqiang He, Guangzhi Li, Dong Yu:
Efficient Text Analysis with Pre-Trained Neural Network Models. 671-676 - Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang:
Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition. 677-684 - Hiroaki Sugiyama, Masahiro Mizukami, Tsunehiro Arimoto, Hiromi Narimatsu, Yuya Chiba, Hideharu Nakajima, Toyomi Meguro:
Empirical Analysis of Training Strategies of Transformer-Based Japanese Chit-Chat Systems. 685-691 - Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang:
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection. 692-699 - Leanne Nortje, Herman Kamper:
Towards Visually Prompted Keyword Localisation for Zero-Resource Spoken Languages. 700-707 - Binghuai Lin, Liyuan Wang:
Exploiting Information From Native Data for Non-Native Automatic Pronunciation Assessment. 708-714 - Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath:
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model. 715-722 - Zhengyang Li, Timo Lohrenz, Matthias Dunkelberg, Tim Fingscheidt:
Transformer-Based Lip-Reading with Regularized Dropout and Relaxed Attention. 723-730 - Kayode Olaleye, Dan Oneata, Herman Kamper:
YFACC: A Yorùbá Speech-Image Dataset for Cross-Lingual Keyword Localisation Through Visual Grounding. 731-738 - Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato:
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis. 739-746 - Muhammad Huzaifah, Ivan Kukanov:
An Analysis of Semantically-Aligned Speech-Text Embeddings. 747-754 - Brady Houston, Katrin Kirchhoff:
Exploration of Language-Specific Self-Attention Parameters for Multilingual End-to-End Speech Recognition. 755-762 - Shelly Jain, Aditya Yadavalli, Ganesh Mirishkar, Anil Kumar Vuppala:
How do Phonological Properties Affect Bilingual Automatic Speech Recognition? 763-770 - Ke Hu, Bo Li, Tara N. Sainath:
Scaling Up Deliberation For Multilingual ASR. 771-776 - Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur:
Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition. 777-784 - Joshua Jansen van Vüren, Thomas Niesler:
Code-Switched Language Modelling Using a Code Predictive LSTM in Under-Resourced South African Languages. 785-791 - Le Minh Nguyen, Shekhar Nayak, Matt Coler:
Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations. 792-797 - Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna:
FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech. 798-805 - Zihan Wang, Qi Meng, HaiFeng Lan, XinRui Zhang, KeHao Guo, Akshat Gupta:
Multilingual Speech Emotion Recognition with Multi-Gating Mechanism and Neural Architecture Search. 806-813 - Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng:
Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE. 814-821 - Chia-Yu Li, Ngoc Thang Vu:
Improving Semi-Supervised End-to-End Automatic Speech Recognition Using CycleGAN and Inter-Domain Losses. 822-829 - Chandran Savithri Anoop, A. G. Ramakrishnan:
Exploring a Unified ASR for Multiple South Indian Languages Leveraging Multilingual Acoustic and Language Models. 830-837 - Sepand Mavandadi, Bo Li, Chao Zhang, Brian Farris, Tara N. Sainath, Trevor Strohman:
A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System. 838-845 - Xiaoming Zhang, Fan Zhang, Xiaodong Cui, Wei Zhang:
Speech Emotion Recognition with Complementary Acoustic Representations. 846-852 - Amruta Saraf, Ganesh Sivaraman, Elie Khoury:
A Zero-Shot Approach to Identifying Children's Speech in Automatic Gender Classification. 853-859 - Wen Wu, Chao Zhang, Philip C. Woodland:
Distribution-Based Emotion Recognition in Conversation. 860-867 - Yuanchao Li, Yumnah Mohamied, Peter Bell, Catherine Lai:
Exploration of a Self-Supervised Speech Model: A Study on Emotional Corpora. 868-875 - Florian Lux, Ching-Yi Chen, Ngoc Thang Vu:
Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis. 876-883 - Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani:
WaveFit: An Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration. 884-891 - Mikolaj Babianski, Kamil Pokora, Raahil Shah, Rafal Sienkiewicz, Daniel Korzekwa, Viacheslav Klimkov:
On Granularity of Prosodic Representations in Expressive Text-to-Speech. 892-899 - Sewade Ogun, Vincent Colotte, Emmanuel Vincent:
Can We Use Common Voice to Train a Multi-Speaker TTS System? 900-905 - Matthew Baas, Herman Kamper:
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models. 906-911 - Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu:
Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy. 912-919 - Yinghao Aaron Li, Cong Han, Nima Mesgarani:
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer From Style-Based TTS Models. 920-927 - Jan Melechovský, Ambuj Mehrish, Dorien Herremans, Berrak Sisman:
Learning Accent Representation with Multi-Level VAE Towards Controllable Speech Synthesis. 928-935 - Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, Hiroshi Saruwatari:
VTTS: Visual-Text To Speech. 936-942 - Dominik Wagner, Sebastian P. Bayerl, Héctor A. Cordourier Maruri, Tobias Bocklet:
Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech. 943-948 - Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda:
Two-Stage Training Method for Japanese Electrolaryngeal Speech Enhancement Based on Sequence-to-Sequence Voice Conversion. 949-954 - Hiroki Kanagawa, Yusuke Ijima:
SIMD-Size Aware Weight Regularization for Fast Neural Vocoding on CPU. 955-961 - Florian Lux, Julia Koch, Ngoc Thang Vu:
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech. 962-969 - Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti:
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-Wise Distillation. 970-976 - Efthymios Georgiou, Kosmas Kritsis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos:
Regotron: Regularizing the Tacotron2 Architecture via Monotonic Alignment Loss. 977-983 - Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa:
Remap, Warp and Attend: Non-Parallel Many-to-Many Accent Conversion with Normalizing Flows. 984-990 - Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed:
STOP: A Dataset for Spoken Task-Oriented Semantic Parsing. 991-998 - Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Absar Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali:
Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition. 999-1005 - Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith A. Abandah, Adham Alsharkawi, Maha Dawas:
MASC: Massive Arabic Speech Corpus. 1006-1013 - Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki:
Distilling Sequence-to-Sequence Voice Conversion Models for Streaming Conversion Applications. 1022-1028 - Chuanbo Zhu, Takuya Kunihara, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi:
Automatic Prediction of Intelligibility of Words and Phonemes Produced Orally by Japanese Learners of English. 1029-1036 - Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao:
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning. 1037-1044 - Bi-Cheng Yan, Hsin-Wei Wang, Berlin Chen:
PeppaNet: Effective Mispronunciation Detection and Diagnosis Leveraging Phonetic, Phonological, and Acoustic Cues. 1045-1051 - Samuele Cornell, Thomas Balestri, Thibaud Sénéchal:
Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection. 1052-1058 - Suliang Bu, Tuo Zhao, Yunxin Zhao:
TDOA Estimation of Speech Source in Noisy Reverberant Environments. 1059-1066 - Luke Strgar, David Harwath:
Phoneme Segmentation Using Self-Supervised Speech Models. 1067-1073 - Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee:
An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition. 1074-1080 - Aghilas Sini, Antoine Perquin, Damien Lolive, Arnaud Delhay:
Phone-Level Pronunciation Scoring for L1 Using Weighted-Dynamic Time Warping. 1081-1087 - Stefano Bannò, Marco Matassoni:
Proficiency Assessment of L2 Spoken English Using Wav2Vec 2.0. 1088-1095 - Tzu-hsun Feng, Shuyan Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee:
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. 1096-1103 - Chaoyue Ding, Jiakui Li, Martin Zong, Baoxiang Li:
Speed-Robust Keyword Spotting Via Soft Self-Attention on Multi-Scale Features. 1104-1111 - Guan-Ting Lin, Chi-Luen Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Nigel G. Ward:
On the Utility of Self-Supervised Models for Prosody-Related Tasks. 1104-1111 - Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee:
Improving Generalizability of Distilled Self-Supervised Speech Processing Models Under Distorted Settings. 1112-1119 - Zih-Ching Chen, Chin-Lun Fu, Chih-Ying Liu, Shang-Wen (Daniel) Li, Hung-yi Lee:
Exploring Efficient-Tuning Methods in Self-Supervised Speech Models. 1120-1127 - Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang:
On Compressing Sequences for Self-Supervised Speech Models. 1128-1135 - Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukás Burget, Jan Cernocký:
Extracting Speaker and Emotion Information from Self-Supervised Speech Models via Channel-Wise Correlations. 1136-1143