default search action
Naoyuki Kanda
Person information
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [j7]Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3355-3364 (2024) - [c66]Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur:
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation. ICASSP 2024: 10381-10385 - [c65]Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka:
Diarist: Streaming Speech Translation with Speaker Diarization. ICASSP 2024: 10866-10870 - [c64]Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li:
T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability. ICASSP 2024: 11531-11535 - [c63]Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu:
Profile-Error-Tolerant Target-Speaker Voice Activity Detection. ICASSP 2024: 11906-11910 - [c62]Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Xuemei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data. NAACL-HLT (Findings) 2024: 1615-1627 - [i54]Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka:
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription. CoRR abs/2401.08887 (2024) - [i53]Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng:
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like. CoRR abs/2402.07383 (2024) - [i52]Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda:
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS. CoRR abs/2406.05699 (2024) - [i51]Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda:
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS. CoRR abs/2406.18009 (2024) - [i50]Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda:
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech. CoRR abs/2407.12229 (2024) - 2023
- [c61]Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code: An Integrative and Composable Multimodal Learning Framework. AAAI 2023: 10880-10890 - [c60]Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech Separation with Large-Scale Self-Supervised Learning. ICASSP 2023: 1-5 - [c59]Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition. ICASSP 2023: 1-5 - [c58]Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Vararray Meets T-Sot: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. ICASSP 2023: 1-5 - [c57]Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu:
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-To-End Neural Diarization. ICASSP 2023: 1-5 - [c56]Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR. ICASSP 2023: 1-5 - [c55]Naoyuki Kanda, Takuya Yoshioka, Yang Liu:
Factual Consistency Oriented Speech Recognition. INTERSPEECH 2023: 236-240 - [c54]Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. INTERSPEECH 2023: 1314-1318 - [c53]Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen, Xiaofei Wang, Takuya Yoshioka:
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach. INTERSPEECH 2023: 3502-3506 - [i49]Naoyuki Kanda, Takuya Yoshioka, Yang Liu:
Factual Consistency Oriented Speech Recognition. CoRR abs/2302.12369 (2023) - [i48]Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data. CoRR abs/2305.12311 (2023) - [i47]Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. CoRR abs/2305.18747 (2023) - [i46]Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. CoRR abs/2308.06873 (2023) - [i45]Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka:
DiariST: Streaming Speech Translation with Speaker Diarization. CoRR abs/2309.08007 (2023) - [i44]Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li:
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability. CoRR abs/2309.08131 (2023) - [i43]Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu:
Profile-Error-Tolerant Target-Speaker Voice Activity Detection. CoRR abs/2309.12521 (2023) - [i42]Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur:
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation. CoRR abs/2310.14806 (2023) - 2022
- [j6]Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A review of speaker diarization: Recent advances with deep learning. Comput. Speech Lang. 72: 101317 (2022) - [j5]Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1505-1518 (2022) - [c52]Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. ICASSP 2022: 6027-6031 - [c51]Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-Neural Beamformer for Continuous Speech Separation. ICASSP 2022: 6032-6036 - [c50]Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR. ICASSP 2022: 8082-8086 - [c49]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. INTERSPEECH 2022: 521-525 - [c48]Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong:
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition. INTERSPEECH 2022: 2608-2612 - [c47]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. INTERSPEECH 2022: 3774-3778 - [c46]Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka:
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. INTERSPEECH 2022: 3814-3818 - [c45]Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-wise Permutation Invariant Training. INTERSPEECH 2022: 5383-5387 - [i41]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. CoRR abs/2202.00842 (2022) - [i40]Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. CoRR abs/2203.16685 (2022) - [i39]Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka:
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. CoRR abs/2204.03232 (2022) - [i38]Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code: An Integrative and Composable Multimodal Learning Framework. CoRR abs/2205.01818 (2022) - [i37]Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu:
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization. CoRR abs/2208.13085 (2022) - [i36]Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. CoRR abs/2209.04974 (2022) - [i35]Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating realistic speech overlaps improves multi-talker ASR. CoRR abs/2210.15715 (2022) - [i34]Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech separation with large-scale self-supervised learning. CoRR abs/2211.05172 (2022) - [i33]Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition. CoRR abs/2211.05564 (2022) - [i32]Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka:
Breaking trade-offs in speech separation with sparsely-gated mixture of experts. CoRR abs/2211.06493 (2022) - 2021
- [j4]Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong:
Streaming End-to-End Multi-Talker Speech Recognition. IEEE Signal Process. Lett. 28: 803-807 (2021) - [c44]Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. ASRU 2021: 296-303 - [c43]Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong:
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020. ICASSP 2021: 5824-5828 - [c42]Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. ICASSP 2021: 6503-6507 - [c41]Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings. ICASSP 2021: 6763-6767 - [c40]Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong:
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition. ICASSP 2021: 7338-7342 - [c39]Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng:
Speech-Language Pre-Training for End-to-End Spoken Language Understanding. ICASSP 2021: 7458-7462 - [c38]Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong:
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification. Interspeech 2021: 1782-1786 - [c37]Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong:
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition. Interspeech 2021: 2596-2600 - [c36]Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li:
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. Interspeech 2021: 3066-3070 - [c35]Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. Interspeech 2021: 3430-3434 - [c34]Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong:
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer. Interspeech 2021: 3435-3439 - [c33]Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. Interspeech 2021: 4413-4417 - [c32]Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong:
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition. SLT 2021: 243-250 - [c31]Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. SLT 2021: 809-816 - [c30]Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription. SLT 2021: 833-840 - [c29]Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. SLT 2021: 897-904 - [i31]Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings. CoRR abs/2101.01853 (2021) - [i30]Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A Review of Speaker Diarization: Recent Advances with Deep Learning. CoRR abs/2101.09624 (2021) - [i29]Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong:
Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition. CoRR abs/2102.01380 (2021) - [i28]Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng:
Speech-language Pre-training for End-to-end Spoken Language Understanding. CoRR abs/2102.06283 (2021) - [i27]Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. CoRR abs/2103.16776 (2021) - [i26]Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong:
Streaming Multi-talker Speech Recognition with Joint Speaker Identification. CoRR abs/2104.02109 (2021) - [i25]Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. CoRR abs/2104.02128 (2021) - [i24]Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong:
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition. CoRR abs/2106.02302 (2021) - [i23]Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li:
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. CoRR abs/2107.01922 (2021) - [i22]Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. CoRR abs/2107.02852 (2021) - [i21]Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR. CoRR abs/2110.03151 (2021) - [i20]Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong:
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition. CoRR abs/2110.05354 (2021) - [i19]Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. CoRR abs/2110.05745 (2021) - [i18]Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-neural beamformer for continuous speech separation. CoRR abs/2110.06428 (2021) - [i17]Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. CoRR abs/2110.13900 (2021) - [i16]Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-Wise Permutation Invariant Training. CoRR abs/2110.14142 (2021) - 2020
- [c28]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers. INTERSPEECH 2020: 36-40 - [c27]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Serialized Output Training for End-to-End Overlapped Speech Recognition. INTERSPEECH 2020: 2797-2801 - [i15]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Serialized Output Training for End-to-End Overlapped Speech Recognition. CoRR abs/2003.12687 (2020) - [i14]Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers. CoRR abs/2006.10930 (2020) - [i13]Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. CoRR abs/2008.04546 (2020) - [i12]Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong:
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020. CoRR abs/2010.11458 (2020) - [i11]Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong:
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer. CoRR abs/2010.12673 (2020) - [i10]Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong:
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition. CoRR abs/2011.01991 (2020) - [i9]Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Mao-Kui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis. CoRR abs/2011.02014 (2020) - [i8]Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. CoRR abs/2011.02921 (2020) - [i7]Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription. CoRR abs/2011.03110 (2020) - [i6]Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong:
Streaming end-to-end multi-talker speech recognition. CoRR abs/2011.13148 (2020)
2010 – 2019
- 2019
- [c26]Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models. ASRU 2019: 31-38 - [c25]Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Self-Attention. ASRU 2019: 296-303 - [c24]Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, Shinji Watanabe:
Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches. ICASSP 2019: 6630-6634 - [c23]Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. INTERSPEECH 2019: 236-240 - [c22]Naoyuki Kanda, Christoph Böddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach:
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR. INTERSPEECH 2019: 1248-1252 - [c21]Shota Horiguchi, Naoyuki Kanda, Kenji Nagamatsu:
Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation. INTERSPEECH 2019: 4180-4184 - [c20]Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Permutation-Free Objectives. INTERSPEECH 2019: 4300-4304 - [i5]Naoyuki Kanda, Christoph Böddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach:
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR. CoRR abs/1905.12230 (2019) - [i4]Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. CoRR abs/1906.10876 (2019) - [i3]Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Permutation-Free Objectives. CoRR abs/1909.05952 (2019) - [i2]Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Self-attention. CoRR abs/1909.06247 (2019) - [i1]Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models. CoRR abs/1909.08103 (2019) - 2018
- [c19]Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu:
Sequence Distillation for Purely Sequence Trained Acoustic Models. ICASSP 2018: 5964-5968 - [c18]Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu:
Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models. INTERSPEECH 2018: 2923-2927 - [c17]Shota Horiguchi, Naoyuki Kanda, Kenji Nagamatsu:
Face-Voice Matching using Cross-modal Embeddings. ACM Multimedia 2018: 1011-1019 - 2017
- [j3]Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models. IEEE ACM Trans. Audio Speech Lang. Process. 25(5): 1023-1034 (2017) - [c16]Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu:
Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence. ASRU 2017: 69-76 - [c15]Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework. ICASSP 2017: 4855-4859 - 2016
- [j2]Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori, Hisashi Kawai:
Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription. Speech Commun. 82: 1-13 (2016) - [c14]Naoyuki Kanda, Shoji Harada, Xugang Lu, Hisashi Kawai:
Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks. INTERSPEECH 2016: 1325-1329 - [c13]Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Maximum a posteriori Based Decoding for CTC Acoustic Models. INTERSPEECH 2016: 1868-1872 - 2015
- [c12]Naoyuki Kanda, Mitsuyoshi Tachimori, Xugang Lu, Hisashi Kawai:
Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling. ASRU 2015: 15-21 - 2014
- [b1]Naoyuki Kanda:
Open-ended Spoken Language Technology: Studies on Spoken Dialogue Systems and Spoken Document Retrieval Systems. Kyoto University, Japan, 2014 - [c11]Ryu Takeda, Naoyuki Kanda, Nobuo Nukaga:
Boundary contraction training for acoustic models based on discrete deep neural networks. INTERSPEECH 2014: 1063-1067 - [c10]Peng Shen, Yugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori:
The NCT ASR system for IWSLT 2014. IWSLT (Evaluation Campaign) 2014 - 2013
- [c9]Naoyuki Kanda, Ryu Takeda, Yasunari Obuchi:
Elastic spectral distortion for low resource speech recognition with deep neural networks. ASRU 2013: 309-314 - [c8]Naoyuki Kanda, Katsutoshi Itoyama, Hiroshi G. Okuno:
Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier. ICASSP 2013: 8540-8544 - [c7]Naoyuki Kanda, Ryu Takeda, Yasunari Obuchi:
Noise robust speaker verification with delta cepstrum normalization. INTERSPEECH 2013: 3112-3116 - 2012
- [c6]Yasunari Obuchi, Ryu Takeda, Naoyuki Kanda:
Voice activity detection based on augmented statistical noise suppression. APSIPA 2012: 1-4 - [c5]Naoyuki Kanda, Ryu Takeda, Yasunari Obuchi:
Using rhythmic features for Japanese spoken term detection. SLT 2012: 170-175 - 2011
- [j1]Mikio Nakano, Yuji Hasegawa, Kotaro Funakoshi, Johane Takeuchi, Toyotaka Torii, Kazuhiro Nakadai, Naoyuki Kanda, Kazunori Komatani, Hiroshi G. Okuno, Hiroshi Tsujino:
A multi-expert model for dialogue and behavior control of conversational robots and agents. Knowl. Based Syst. 24(2): 248-256 (2011)
2000 – 2009
- 2008
- [c4]Naoyuki Kanda, Hirohiko Sagawa, Takashi Sumiyoshi, Yasunari Obuchi:
Open-vocabulary keyword detection from super-large scale speech database. MMSP 2008: 939-944 - 2006
- [c3]Kazunori Komatani, Naoyuki Kanda, Mikio Nakano, Kazuhiro Nakadai, Hiroshi Tsujino, Tetsuya Ogata, Hiroshi G. Okuno:
Multi-Domain Spoken Dialogue System with Extensibility and Robustness against Speech Recognition Errors. SIGDIAL Workshop 2006: 9-17 - 2005
- [c2]Kazunori Komatani, Naoyuki Kanda, Tetsuya Ogata, Hiroshi G. Okuno:
Contextual constraints based on dialogue models in database search task for spoken dialogue systems. INTERSPEECH 2005: 877-880 - [c1]Mikio Nakano, Yuji Hasegawa, Kazuhiro Nakadai, Takahiro Nakamura, Johane Takeuchi, Toyotaka Torii, Hiroshi Tsujino, Naoyuki Kanda, Hiroshi G. Okuno:
A two-layer model for behavior and dialogue planning in conversational service robots. IROS 2005: 3329-3335
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-12-15 01:26 CET by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint