Naoyuki Kanda

2020 – today

2024
[j7]
Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3355-3364 (2024)
[c66]
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur:
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation. ICASSP 2024: 10381-10385
[c65]
Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka:
DiariST: Streaming Speech Translation with Speaker Diarization. ICASSP 2024: 10866-10870
[c64]
Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li:
t-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability. ICASSP 2024: 11531-11535
[c63]
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu:
Profile-Error-Tolerant Target-Speaker Voice Activity Detection. ICASSP 2024: 11906-11910
[c62]
Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Xuemei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data. NAACL-HLT (Findings) 2024: 1615-1627
[i54]
Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka:
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription. CoRR abs/2401.08887 (2024)
[i53]
Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng:
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like. CoRR abs/2402.07383 (2024)
[i52]
Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda:
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS. CoRR abs/2406.05699 (2024)
[i51]
Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda:
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS. CoRR abs/2406.18009 (2024)
[i50]
Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda:
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech. CoRR abs/2407.12229 (2024)
2023
[c61]
Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code: An Integrative and Composable Multimodal Learning Framework. AAAI 2023: 10880-10890
[c60]
Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech Separation with Large-Scale Self-Supervised Learning. ICASSP 2023: 1-5
[c59]
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition. ICASSP 2023: 1-5
[c58]
Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. ICASSP 2023: 1-5
[c57]
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu:
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization. ICASSP 2023: 1-5
[c56]
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR. ICASSP 2023: 1-5
[c55]
Naoyuki Kanda, Takuya Yoshioka, Yang Liu:
Factual Consistency Oriented Speech Recognition. INTERSPEECH 2023: 236-240
[c54]
Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. INTERSPEECH 2023: 1314-1318
[c53]
Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen, Xiaofei Wang, Takuya Yoshioka:
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach. INTERSPEECH 2023: 3502-3506
[i49]
Naoyuki Kanda, Takuya Yoshioka, Yang Liu:
Factual Consistency Oriented Speech Recognition. CoRR abs/2302.12369 (2023)
[i48]
Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data. CoRR abs/2305.12311 (2023)
[i47]
Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. CoRR abs/2305.18747 (2023)
[i46]
Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. CoRR abs/2308.06873 (2023)
[i45]
Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka:
DiariST: Streaming Speech Translation with Speaker Diarization. CoRR abs/2309.08007 (2023)
[i44]
Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li:
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability. CoRR abs/2309.08131 (2023)
[i43]
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu:
Profile-Error-Tolerant Target-Speaker Voice Activity Detection. CoRR abs/2309.12521 (2023)
[i42]
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur:
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation. CoRR abs/2310.14806 (2023)
2022
[j6]
Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A review of speaker diarization: Recent advances with deep learning. Comput. Speech Lang. 72: 101317 (2022)
[j5]
Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1505-1518 (2022)
[c52]
Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. ICASSP 2022: 6027-6031
[c51]
Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-Neural Beamformer for Continuous Speech Separation. ICASSP 2022: 6032-6036
[c50]
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR. ICASSP 2022: 8082-8086
[c49]
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. INTERSPEECH 2022: 521-525
[c48]
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong:
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition. INTERSPEECH 2022: 2608-2612
[c47]
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. INTERSPEECH 2022: 3774-3778
[c46]
Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka:
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. INTERSPEECH 2022: 3814-3818
[c45]
Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-wise Permutation Invariant Training. INTERSPEECH 2022: 5383-5387
[i41]
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. CoRR abs/2202.00842 (2022)
[i40]
Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. CoRR abs/2203.16685 (2022)
[i39]
Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka:
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. CoRR abs/2204.03232 (2022)
[i38]
Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code: An Integrative and Composable Multimodal Learning Framework. CoRR abs/2205.01818 (2022)
[i37]
Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu:
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization. CoRR abs/2208.13085 (2022)
[i36]
Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. CoRR abs/2209.04974 (2022)
[i35]
Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating realistic speech overlaps improves multi-talker ASR. CoRR abs/2210.15715 (2022)
[i34]
Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech separation with large-scale self-supervised learning. CoRR abs/2211.05172 (2022)
[i33]
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition. CoRR abs/2211.05564 (2022)
[i32]
Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka:
Breaking trade-offs in speech separation with sparsely-gated mixture of experts. CoRR abs/2211.06493 (2022)
2021
[j4]
Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong:
Streaming End-to-End Multi-Talker Speech Recognition. IEEE Signal Process. Lett. 28: 803-807 (2021)
[c44]
Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. ASRU 2021: 296-303
[c43]
Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong:
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020. ICASSP 2021: 5824-5828
[c42]
Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. ICASSP 2021: 6503-6507
[c41]
Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings. ICASSP 2021: 6763-6767
[c40]
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong:
Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition. ICASSP 2021: 7338-7342
[c39]
Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng:
Speech-Language Pre-Training for End-to-End Spoken Language Understanding. ICASSP 2021: 7458-7462
[c38]
Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong:
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification. Interspeech 2021: 1782-1786
[c37]
Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong:
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition. Interspeech 2021: 2596-2600
[c36]
Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li:
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. Interspeech 2021: 3066-3070
[c35]
Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. Interspeech 2021: 3430-3434
[c34]
Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong:
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer. Interspeech 2021: 3435-3439
[c33]
Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. Interspeech 2021: 4413-4417
[c32]
Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong:
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition. SLT 2021: 243-250
[c31]
Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. SLT 2021: 809-816
[c30]
Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription. SLT 2021: 833-840
[c29]
Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. SLT 2021: 897-904
[i31]
Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings. CoRR abs/2101.01853 (2021)
[i30]
Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A Review of Speaker Diarization: Recent Advances with Deep Learning. CoRR abs/2101.09624 (2021)
[i29]
Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong:
Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition. CoRR abs/2102.01380 (2021)
[i28]
Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng:
Speech-language Pre-training for End-to-end Spoken Language Understanding. CoRR abs/2102.06283 (2021)
[i27]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2103-16776
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2103-16776
Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. CoRR abs/2103.16776 (2021)
[i26]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2104-02109
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2104-02109
Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong:
Streaming Multi-talker Speech Recognition with Joint Speaker Identification. CoRR abs/2104.02109 (2021)
[i25]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2104-02128
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2104-02128
Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. CoRR abs/2104.02128 (2021)
[i24]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2106-02302
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2106-02302
Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong:
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition. CoRR abs/2106.02302 (2021)
[i23]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2107-01922
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2107-01922
Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li:
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. CoRR abs/2107.01922 (2021)
[i22]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2107-02852
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2107-02852
Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. CoRR abs/2107.02852 (2021)
[i21]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2110-03151
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2110-03151
Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR. CoRR abs/2110.03151 (2021)
[i20]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2110-05354
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2110-05354
Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong:
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition. CoRR abs/2110.05354 (2021)
[i19]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2110-05745
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2110-05745
Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. CoRR abs/2110.05745 (2021)
[i18]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2110-06428
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2110-06428
Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-neural beamformer for continuous speech separation. CoRR abs/2110.06428 (2021)
[i17]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2110-13900
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2110-13900
Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. CoRR abs/2110.13900 (2021)
[i16]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2110-14142
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2110-14142
Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-Wise Permutation Invariant Training. CoRR abs/2110.14142 (2021)
2020
[c28]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KandaGWMCZY20
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KandaGWMCZY20
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers. INTERSPEECH 2020: 36-40
[c27]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KandaGWMY20
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KandaGWMY20
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Serialized Output Training for End-to-End Overlapped Speech Recognition. INTERSPEECH 2020: 2797-2801
[i15]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2003-12687
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2003-12687
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Serialized Output Training for End-to-End Overlapped Speech Recognition. CoRR abs/2003.12687 (2020)
[i14]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2006-10930
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2006-10930
Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers. CoRR abs/2006.10930 (2020)
[i13]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2008-04546
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2008-04546
Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. CoRR abs/2008.04546 (2020)
[i12]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2010-11458
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2010-11458
Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong:
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020. CoRR abs/2010.11458 (2020)
[i11]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2010-12673
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2010-12673
Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong:
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer. CoRR abs/2010.12673 (2020)
[i10]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2011-01991
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2011-01991
Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong:
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition. CoRR abs/2011.01991 (2020)
[i9]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2011-02014
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2011-02014
Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Mao-Kui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis. CoRR abs/2011.02014 (2020)
[i8]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2011-02921
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2011-02921
Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. CoRR abs/2011.02921 (2020)
[i7]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2011-03110
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2011-03110
Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription. CoRR abs/2011.03110 (2020)
[i6]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-2011-13148
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2011-13148
Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong:
Streaming end-to-end multi-talker speech recognition. CoRR abs/2011.13148 (2020)

2010 – 2019

2019
[c26]
- view
  authority control:
- export record
  dblp key:
  - conf/asru/KandaHFXNW19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/asru/KandaHFXNW19
Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models. ASRU 2019: 31-38
[c25]
- view
  authority control:
- export record
  dblp key:
  - conf/asru/FujitaKHXNW19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/asru/FujitaKHXNW19
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Self-Attention. ASRU 2019: 296-303
[c24]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/KandaFHINW19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/KandaFHINW19
Naoyuki Kanda, Yusuke Fujita, Shota Horiguchi, Rintaro Ikeshita, Kenji Nagamatsu, Shinji Watanabe:
Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches. ICASSP 2019: 6630-6634
[c23]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KandaHTFNW19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KandaHTFNW19
Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. INTERSPEECH 2019: 236-240
[c22]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KandaBHFHNH19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KandaBHFHNH19
Naoyuki Kanda, Christoph Böddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach:
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR. INTERSPEECH 2019: 1248-1252
[c21]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/HoriguchiKN19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/HoriguchiKN19
Shota Horiguchi, Naoyuki Kanda, Kenji Nagamatsu:
Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation. INTERSPEECH 2019: 4180-4184
[c20]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/FujitaKHNW19
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/FujitaKHNW19
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Permutation-Free Objectives. INTERSPEECH 2019: 4300-4304
[i5]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-1905-12230
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1905-12230
Naoyuki Kanda, Christoph Böddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach:
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR. CoRR abs/1905.12230 (2019)
[i4]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-1906-10876
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1906-10876
Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. CoRR abs/1906.10876 (2019)
[i3]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-1909-05952
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1909-05952
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Permutation-Free Objectives. CoRR abs/1909.05952 (2019)
[i2]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-1909-06247
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1909-06247
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
End-to-End Neural Speaker Diarization with Self-attention. CoRR abs/1909.06247 (2019)
[i1]
- view
  - electronic edition @ arxiv.org (open access)
  - references & citations
- export record
  dblp key:
  - journals/corr/abs-1909-08103
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1909-08103
Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe:
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models. CoRR abs/1909.08103 (2019)
2018
[c19]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/KandaFN18
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/KandaFN18
Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu:
Sequence Distillation for Purely Sequence Trained Acoustic Models. ICASSP 2018: 5964-5968
[c18]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KandaFN18
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KandaFN18
Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu:
Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models. INTERSPEECH 2018: 2923-2927
[c17]
- view
  authority control:
- export record
  dblp key:
  - conf/mm/HoriguchiKN18
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/mm/HoriguchiKN18
Shota Horiguchi, Naoyuki Kanda, Kenji Nagamatsu:
Face-Voice Matching using Cross-modal Embeddings. ACM Multimedia 2018: 1011-1019
2017
[j3]
- view
  authority control:
- export record
  dblp key:
  - journals/taslp/KandaLK17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/taslp/KandaLK17
Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models. IEEE ACM Trans. Audio Speech Lang. Process. 25(5): 1023-1034 (2017)
[c16]
- view
  authority control:
- export record
  dblp key:
  - conf/asru/KandaFN17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/asru/KandaFN17
Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu:
Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence. ASRU 2017: 69-76
[c15]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/KandaLK17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/KandaLK17
Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework. ICASSP 2017: 4855-4859
2016
[j2]
- view
  authority control:
- export record
  dblp key:
  - journals/speech/ShenLHKSHK16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/speech/ShenLHKSHK16
Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori, Hisashi Kawai:
Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription. Speech Commun. 82: 1-13 (2016)
[c14]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KandaHLK16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KandaHLK16
Naoyuki Kanda, Shoji Harada, Xugang Lu, Hisashi Kawai:
Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks. INTERSPEECH 2016: 1325-1329
[c13]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KandaLK16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KandaLK16
Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Maximum a posteriori Based Decoding for CTC Acoustic Models. INTERSPEECH 2016: 1868-1872
2015
[c12]
- view
  authority control:
- export record
  dblp key:
  - conf/asru/KandaTLK15
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/asru/KandaTLK15
Naoyuki Kanda, Mitsuyoshi Tachimori, Xugang Lu, Hisashi Kawai:
Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling. ASRU 2015: 15-21
2014
[b1]
- view
  authority control:
- export record
  dblp key:
  - phd/jp/Kanda14
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/phd/jp/Kanda14
Naoyuki Kanda:
Open-ended Spoken Language Technology: Studies on Spoken Dialogue Systems and Spoken Document Retrieval Systems. Kyoto University, Japan, 2014
[c11]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/TakedaKN14
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/TakedaKN14
Ryu Takeda, Naoyuki Kanda, Nobuo Nukaga:
Boundary contraction training for acoustic models based on discrete deep neural networks. INTERSPEECH 2014: 1063-1067
[c10]
- view
  - electronic edition @ aclanthology.org (open access)
  - no references & citations available
- export record
  dblp key:
  - conf/iwslt/ShenLHKSH14
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/iwslt/ShenLHKSH14
Peng Shen, Yugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori:
The NCT ASR system for IWSLT 2014. IWSLT (Evaluation Campaign) 2014
2013
[c9]
- view
  authority control:
- export record
  dblp key:
  - conf/asru/KandaTO13
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/asru/KandaTO13
Naoyuki Kanda, Ryu Takeda, Yasunari Obuchi:
Elastic spectral distortion for low resource speech recognition with deep neural networks. ASRU 2013: 309-314
[c8]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/KandaIO13
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/KandaIO13
Naoyuki Kanda, Katsutoshi Itoyama, Hiroshi G. Okuno:
Multiple index combination for Japanese spoken term detection with optimum index selection based on OOV-region classifier. ICASSP 2013: 8540-8544
[c7]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KandaTO13
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KandaTO13
Naoyuki Kanda, Ryu Takeda, Yasunari Obuchi:
Noise robust speaker verification with delta cepstrum normalization. INTERSPEECH 2013: 3112-3116
2012
[c6]
- view
  - electronic edition @ ieee.org
  - no references & citations available
- export record
  dblp key:
  - conf/apsipa/ObuchiTK12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/apsipa/ObuchiTK12
Yasunari Obuchi, Ryu Takeda, Naoyuki Kanda:
Voice activity detection based on augmented statistical noise suppression. APSIPA 2012: 1-4
[c5]
- view
  authority control:
- export record
  dblp key:
  - conf/slt/KandaTO12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/slt/KandaTO12
Naoyuki Kanda, Ryu Takeda, Yasunari Obuchi:
Using rhythmic features for Japanese spoken term detection. SLT 2012: 170-175
2011
[j1]
- view
  authority control:
- export record
  dblp key:
  - journals/kbs/NakanoHFTTNKKOT11
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/kbs/NakanoHFTTNKKOT11
Mikio Nakano, Yuji Hasegawa, Kotaro Funakoshi, Johane Takeuchi, Toyotaka Torii, Kazuhiro Nakadai, Naoyuki Kanda, Kazunori Komatani, Hiroshi G. Okuno, Hiroshi Tsujino:
A multi-expert model for dialogue and behavior control of conversational robots and agents. Knowl. Based Syst. 24(2): 248-256 (2011)

2000 – 2009

2008
[c4]
- view
  authority control:
- export record
  dblp key:
  - conf/mmsp/KandaSSO08
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/mmsp/KandaSSO08
Naoyuki Kanda, Hirohiko Sagawa, Takashi Sumiyoshi, Yasunari Obuchi:
Open-vocabulary keyword detection from super-large scale speech database. MMSP 2008: 939-944
2006
[c3]
- view
  - electronic edition @ aclanthology.org (open access)
  - no references & citations available
- export record
  dblp key:
  - conf/sigdial/KomataniKNNTOO06
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/sigdial/KomataniKNNTOO06
Kazunori Komatani, Naoyuki Kanda, Mikio Nakano, Kazuhiro Nakadai, Hiroshi Tsujino, Tetsuya Ogata, Hiroshi G. Okuno:
Multi-Domain Spoken Dialogue System with Extensibility and Robustness against Speech Recognition Errors. SIGDIAL Workshop 2006: 9-17
2005
[c2]
- view
  - electronic edition via DOI (open access)
  - references & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/KomataniKOO05
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/KomataniKOO05
Kazunori Komatani, Naoyuki Kanda, Tetsuya Ogata, Hiroshi G. Okuno:
Contextual constraints based on dialogue models in database search task for spoken dialogue systems. INTERSPEECH 2005: 877-880
[c1]
- view
  authority control:
- export record
  dblp key:
  - conf/iros/NakanoHNNTTTKO05
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/iros/NakanoHNNTTTKO05
Mikio Nakano, Yuji Hasegawa, Kazuhiro Nakadai, Takahiro Nakamura, Johane Takeuchi, Toyotaka Torii, Hiroshi Tsujino, Naoyuki Kanda, Hiroshi G. Okuno:
A two-layer model for behavior and dialogue planning in conversational service robots. IROS 2005: 3329-3335
