ASRU 2019: Singapore
- IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019. IEEE 2019, ISBN 978-1-7281-0306-8
- Zhehuai Chen, Mahsa Yarmohammadi, Hainan Xu, Hang Lv, Lei Xie, Daniel Povey, Sanjeev Khudanpur: Incremental Lattice Determinization for WFST Decoders. 1-7
- Albert Zeyer, Parnia Bahar, Kazuki Irie, Ralf Schlüter, Hermann Ney: A Comparison of Transformer and LSTM Encoder Decoder Models for ASR. 8-15
- Jiayi Fu, Kuang Ru: A Dropout-Based Single Model Committee Approach for Active Learning in ASR. 16-22
- Khe Chai Sim, Leif Johnson, Giovanni Motta, Lillian Zhou, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang: Personalization of End-to-End Speech Recognition on Mobile Devices for Named Entities. 23-30
- Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe: Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models. 31-38
- Qiujia Li, Chao Zhang, Philip C. Woodland: Integrating Source-Channel and Attention-Based Sequence-to-Sequence Models for Speech Recognition. 39-46
- Catalin Zorila, Christoph Böddeker, Rama Doddipatla, Reinhold Haeb-Umbach: An Investigation into the Effectiveness of Enhancement in ASR Training and Test for Chime-5 Dinner Party Transcription. 47-53
- Kyu Jeong Han, Ramon Prieto, Tao Ma: State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention with Dilated 1D Convolutions. 54-61
- Rao Ma, Qi Liu, Kai Yu: Highly Efficient Neural Network Language Model Compression Using Soft Binarization Training. 62-69
- Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim: Improved Multi-Stage Training of Online Attention-Based Encoder-Decoder Models. 70-77
- Adrien Dufraux, Emmanuel Vincent, Awni Y. Hannun, Armelle Brun, Matthijs Douze: Lead2Gold: Towards Exploiting the Full Potential of Noisy Transcriptions for Speech Recognition. 78-85
- Mingu Lee, Jinkyu Lee, Hye Jin Jang, Byeonggeun Kim, Wonil Chang, Kyuwoong Hwang: Orthogonality Constrained Multi-Head Attention for Keyword Spotting. 86-92
- Jeremy Heng Meng Wong, Mark J. F. Gales, Yu Wang: Learning Between Different Teacher and Student Models in ASR. 93-99
- Shuo-Yiin Chang, Bo Li, Gabor Simko: A Unified Endpointer Using Multitask and Multidomain Training. 100-106
- Shahram Ghorbani, Soheil Khorram, John H. L. Hansen: Domain Expansion in DNN-Based Acoustic Models for Robust Speech Recognition. 107-113
- Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong: Improving RNN Transducer Modeling for End-to-End Speech Recognition. 114-121
- Lukas Lee, Jinhwan Park, Wonyong Sung: Simple Gated Convnet for Small Footprint Acoustic Modeling. 122-128
- Peiyao Sheng, Zhuolin Yang, Yanmin Qian: GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition. 129-135
- Yiming Wang, Sanjeev Khudanpur, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe: Espresso: A Fast End-to-End Neural Speech Recognition Toolkit. 136-143
- Berrak Sisman, Mingyang Zhang, Minghui Dong, Haizhou Li: On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion. 144-151
- Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li: WaveNet Factorization with Singular Value Decomposition for Voice Conversion. 152-159
- Yi Zhou, Xiaohai Tian, Emre Yilmaz, Rohan Kumar Das, Haizhou Li: A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion. 160-167
- Hao Sun, Xu Tan, Jun-Wei Gan, Sheng Zhao, Dongxu Han, Hongzhi Liu, Tao Qin, Tie-Yan Liu: Knowledge Distillation from Bert in Pre-Training and Fine-Tuning for Polyphone Disambiguation. 168-175
- Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda: Investigation of Shallow Wavenet Vocoder with Laplacian Distribution Output. 176-183
- Xiaochun An, Yuxuan Wang, Shan Yang, Zejun Ma, Lei Xie: Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis. 184-191
- Xiaolian Zhu, Shan Yang, Geng Yang, Lei Xie: Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis. 192-199
- Hieu-Thi Luong, Junichi Yamagishi: Bootstrapping Non-Parallel Voice Conversion from Speaker-Adaptive Text-to-Speech. 200-207
- Fengyu Yang, Shan Yang, Pengcheng Zhu, Pengju Yan, Lei Xie: Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias. 208-213
- Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai: Tacotron-Based Acoustic Model Using Phoneme Alignment for Practical Neural Text-to-Speech Systems. 214-221
- Zhiyun Fan, Jie Li, Shiyu Zhou, Bo Xu: Speaker-Aware Speech-Transformer. 222-229
- Peidong Wang, Zhuo Chen, Xiong Xiao, Zhong Meng, Takuya Yoshioka, Tianyan Zhou, Liang Lu, Jinyu Li: Speech Separation Using Speaker Inventory. 230-236
- Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe: MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition. 237-244
- Mahesh K. Chelimilla, Shashi Kumar, Shakti P. Rath: Joint Distribution Learning in the Framework of Variational Autoencoders for Far-Field Speech Enhancement. 245-251
- Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John H. L. Hansen: Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition. 252-259
- Yi Luo, Cong Han, Nima Mesgarani, Enea Ceolini, Shih-Chii Liu: FaSNet: Low-Latency Adaptive Beamforming for Multi-Microphone Audio Processing. 260-267
- Zhong Meng, Jinyu Li, Yashesh Gaur, Yifan Gong: Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition. 268-275
- Takuya Yoshioka, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Igor Abramovski, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang: Advances in Online Audio-Visual Meeting Transcription. 276-283
- Zhiming Wang, Kaisheng Yao, Shuo Fang, Xiaolong Li: Joint Optimization of Classification and Clustering for Deep Speaker Embedding. 284-290
- Chien-Lin Huang: Exploring Effective Data Augmentation with TDNN-LSTM Neural Network Embedding for Speaker Recognition. 291-295
- Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe: End-to-End Neural Speaker Diarization with Self-Attention. 296-303
- Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain: A Cross-Corpus Study on Speech Emotion Recognition. 304-311
- Songxiang Liu, Haibin Wu, Hung-yi Lee, Helen Meng: Adversarial Attacks on Spoofing Countermeasures of Automatic Speaker Verification. 312-319
- H. Muralikrishna, Pulkit Sapra, Anuksha Jain, Dileep Aroor Dinesh: Spoken Language Identification Using Bidirectional LSTM Based LID Sequential Senones. 320-326
- Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li: Time-Domain Speaker Extraction Network. 327-334
- Jee-Weon Jung, Hee-Soo Heo, Hye-Jin Shim, Ha-Jin Yu: Short Utterance Compensation in Speaker Verification via Cosine-Based Teacher-Student Learning of Speaker Embeddings. 335-341
- Rajul Acharya, Hemant A. Patil, Harsh Kotta: Novel Enhanced Teager Energy Based Cepstral Coefficients for Replay Spoof Detection. 342-349
- Junyi Peng, Yuexian Zou, Na Li, Deyi Tuo, Dan Su, Meng Yu, Chunlei Zhang, Dong Yu: Syllable-Dependent Discriminative Learning for Small Footprint Text-Dependent Speaker Verification. 350-357
- Kin Wai Cheuk, Balamurali B. T., Gemma Roig, Dorien Herremans: Latent Space Representation for Multi-Target Speaker Detection and Identification with a Sparse Dataset Using Triplet Neural Networks. 358-364
- Youngmoon Jung, Yeunju Choi, Hoirin Kim: Self-Adaptive Soft Voice Activity Detection Using Deep Neural Networks for Robust Speaker Verification. 365-372
- Tuomas Kaseva, Aku Rouhe, Mikko Kurimo: Spherediar: An Effective Speaker Diarization System for Meeting Data. 373-380
- Jen-Tzung Chien, Chun Lin Kuo: Bayesian Adversarial Learning for Speaker Recognition. 381-388
- Tirusha Mandava, Ravi Kumar Vuddagiri, Hari Krishna Vydana, Anil Kumar Vuppala: An Investigation of LSTM-CTC based Joint Acoustic Model for Indian Language Identification. 389-396
- Hossein Zeinali, Lukás Burget, Jan Honza Cernocký: A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: The Deepmine Database. 397-402
- Rutuja Ubale, Vikram Ramanarayanan, Yao Qian, Keelan Evanini, Chee Wee Leong, Chong Min Lee: Native Language Identification from Raw Waveforms Using Deep Convolutional Neural Networks with Attentive Pooling. 403-410
- Ladislav Mosner, Oldrich Plchot, Johan Rohdin, Lukás Burget, Jan Cernocký: Speaker Verification with Application-Aware Beamforming. 411-418
- Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney: Training Language Models for Long-Span Cross-Sentence Evaluation. 419-426
- Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe: Transformer ASR with Contextual Block Processing. 427-433
- Erik McDermott, Hasim Sak, Ehsan Variani: A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition. 434-441
- Abhishek Niranjan, M. Ali Basha Shaik: Improving Grapheme-to-Phoneme Conversion by Investigating Copying Mechanism in Recurrent Architectures. 442-448
- Shigeki Karita, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto: A Comparative Study on Transformer vs RNN in Speech Applications. 449-456
- Duc Le, Xiaohui Zhang, Weiyi Zheng, Christian Fügen, Geoffrey Zweig, Michael L. Seltzer: From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition. 457-464
- Osamu Segawa, Tomoki Hayashi, Kazuya Takeda: Attention-Based Speech Recognition Using Gaze Information. 465-470
- Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura: Listening While Speaking and Visualizing: Improving ASR Through Multimodal Chain. 471-478
- Joanna Rownicka, Peter Bell, Steve Renals: Embeddings for DNN Speaker Adaptive Training. 479-486
- Surabhi Punjabi, Harish Arsikere, Sri Garimella: Language Model Bootstrapping Using Neural Machine Translation for Conversational Speech Recognition. 487-493
- Shubham Bansal, Karan Malhotra, Sriram Ganapathy: Speaker and Language Aware Training for End-to-End ASR. 494-501
- Tohru Nagano, Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata: Data Augmentation Based on Vowel Stretch for Improving Children's Speech Recognition. 502-508
- Takashi Fukuda, Samuel Thomas: Mixed Bandwidth Acoustic Modeling Leveraging Knowledge Distillation. 509-515
- Timo Lohrenz, Maximilian Strake, Tim Fingscheidt: On Temporal Context Information for Hybrid BLSTM-Based Phoneme Recognition. 516-523
- Mingkun Huang, Yizhou Lu, Lan Wang, Yanmin Qian, Kai Yu: Exploring Model Units and Training Strategies for End-to-End Speech Recognition. 524-531
- Byeonggeun Kim, Mingu Lee, Jinkyu Lee, Yeonseok Kim, Kyuwoong Hwang: Query-by-Example On-Device Keyword Spotting. 532-538
- Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei: Small-Footprint Keyword Spotting with Graph Convolutional Network. 539-546
- George Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, Michael Picheny, Samuel Thomas: Simplified LSTMS for Speech Recognition. 547-553
- Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Itsumi Saito, Kyosuke Nishida, Takanobu Oba: Generalized Large-Context Language Models Based on Forward-Backward Hierarchical Recurrent Encoder-Decoder Models. 554-561
- Chanwoo Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim: End-to-End Training of a Large Vocabulary End-to-End Speech Recognition System. 562-569
- Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe: Multilingual End-to-End Speech Translation. 570-577
- Takatomo Kano, Sakriani Sakti, Satoshi Nakamura: Neural Machine Translation with Acoustic Embedding. 578-584
- Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi: One-to-Many Multilingual End-to-End Speech Translation. 585-592
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura: Speech-to-Speech Translation Between Untranscribed Unknown Languages. 593-600
- Hsiao-Yun Lin, Tien-Hong Lo, Berlin Chen: Enhanced Bert-Based Ranking Models for Spoken Document Retrieval. 601-606
- Xiong Wang, Sining Sun, Lei Xie: Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting. 607-612
- Yougen Yuan, Zhiqiang Lv, Shen Huang, Lei Xie: Verifying Deep Keyword Spotting Detection with Acoustic Word Embeddings. 613-620
- Dhananjay Ram, Lesly Miculicich, Hervé Bourlard: Multilingual Bottleneck Features for Query by Example Spoken Term Detection. 621-628
- Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim: Additional Shared Decoder on Siamese Multi-View Encoders for Learning Acoustic Word Embeddings. 629-636
- Tomohiro Tanaka, Takahiro Shinozaki: Efficient Free Keyword Detection Based on CNN and End-to-End Continuous DP-Matching. 637-644
- Bo Wu, Meng Yu, Lianwu Chen, Mingjie Jin, Dan Su, Dong Yu: Improving Speech Enhancement with Phonetic Embedding Features. 645-651
- Daniel Kopev, Ahmed Ali, Ivan Koychev, Preslav Nakov: Detecting Deception in Political Debates Using Acoustic and Textual Features. 652-659
- Wangyou Zhang, Man Sun, Lan Wang, Yanmin Qian: End-to-End Overlapped Speech Detection and Speaker Counting with Raw Waveform. 660-666
- Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu: Time Domain Audio Visual Speech Separation. 667-673
- Jochen Weiner, Claudia Frankenberg, Johannes Schröder, Tanja Schultz: Speech Reveals Future Risk of Developing Dementia: Predictive Dementia Screening from Biographic Interviews. 674-681
- Lorenz Diener, Tejas Umesh, Tanja Schultz: Improving Fundamental Frequency Generation in EMG-to-Speech Conversion Using a Quantization Approach. 682-689
- Peter Plantinga, Eric Fosler-Lussier: Towards Real-Time Mispronunciation Detection in Kids' Speech. 690-696
- Tsun-Yat Leung, Lahiru Samarakoon, Albert Y. S. Lam: Incorporating Prior Knowledge into Speaker Diarization and Linking for Identifying Common Speaker. 697-703
- Junyi Peng, Rongzhi Gu, Yuexian Zou: Logistic Similarity Metric Learning via Affinity Matrix for Text-Independent Speaker Verification. 704-709
- Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak: Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-Gans. 710-717
- Tianyan Zhou, Yong Zhao, Jinyu Li, Yifan Gong, Jian Wu: CNN with Phonetic Attention for Text-Independent Speaker Verification. 718-725
- Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur: Probing the Information Encoded in X-Vectors. 726-733
- M. Joana Correia, Isabel Trancoso, Bhiksha Raj: In-the-Wild End-to-End Detection of Speech Affecting Diseases. 734-741
- Hira Dhamyal, Tianyan Zhou, Bhiksha Raj, Rita Singh: Optimizing Neural Network Embeddings Using a Pair-Wise Loss for Text-Independent Speaker Verification. 742-748
- Rosa González Hautamäki, Tomi H. Kinnunen: Towards Controlling False Alarm - Miss Trade-Off in Perceptual Speaker Comparison via Non-Neutral Listening Task Framing. 749-756
- Andreas Stolcke, Takuya Yoshioka: Dover: A Method for Combining Diarization Outputs. 757-763
- Xinhao Wang, Keelan Evanini, Yao Qian, Klaus Zechner: Using Very Deep Convolutional Neural Networks to Automatically Detect Plagiarized Spoken Responses. 764-771
- Shang-Bao Luo, Hung-Shin Lee, Kuan-Yu Chen, Hsin-Min Wang: Spoken Multiple-Choice Question Answering Using Multimodal Convolutional Neural Networks. 772-778
- Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu: Transfer Learning for Context-Aware Spoken Language Understanding. 779-786
- Chirag Singh, Abhay Kumar, Ajay Nagar, Suraj Tripathi, Promod Yenigalla: Emoception: An Inception Inspired Efficient Speech Emotion Recognition Network. 787-791
- Parnia Bahar, Tobias Bieschke, Hermann Ney: A Comparative Study on End-to-End Speech to Text Translation. 792-799
- Jiewen Wu, Luis Fernando D'Haro, Nancy F. Chen, Pavitra Krishnaswamy, Rafael E. Banchs: Joint Learning of Word and Label Embeddings for Sequence Labelling in Spoken Language Understanding. 800-806
- Jen-Tzung Chien, Che-Yu Kuo: Markov Recurrent Neural Network Language Model. 807-813
- Zhengyuan Liu, Angela Ng, Sheldon Lee Shao Guang, Ai Ti Aw, Nancy F. Chen: Topic-Aware Pointer-Generator Networks for Summarizing Spoken Conversations. 814-821
- Thierry Desot, François Portet, Michel Vacher: SLU for Voice Command in Smart Home: Comparison of Pipeline and End-to-End Approaches. 822-829
- Vevake Balaraman, Bernardo Magnini: Scalable Neural Dialogue State Tracking. 830-837
- Raghavendra Pappagari, Piotr Zelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak: Hierarchical Transformers for Long Document Classification. 838-844
- Chao-Wei Huang, Yun-Nung Chen: Adapting Pretrained Transformer to Lattices for Spoken Language Understanding. 845-852
- Md Asif Jalal, Roger K. Moore, Thomas Hain: Spatio-Temporal Context Modelling for Speech Emotion Classification. 853-859
- Lohith Ravuru, Hyungtak Choi, Siddarth K. M., Hojung Lee, Inchul Hwang: Paraphrase Generation Based on VAE and Pointer-Generator Networks. 860-866
- Yinghui Huang, Samuel Thomas, Masayuki Suzuki, Zoltán Tüske, Larry Sansone, Michael Picheny: Semi-Supervised Training and Data Augmentation for Adaptation of Automatic Broadcast News Captioning Systems. 867-874
- Franco Mana, Felix Weninger, Roberto Gemello, Puming Zhan: Online Batch Normalization Adaptation for Automatic Speech Recognition. 875-880
- Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals: Speaker Adaptive Training Using Model Agnostic Meta-Learning. 881-888
- Chung-Cheng Chiu, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara N. Sainath, Yonghui Wu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang: A Comparison of End-to-End Models for Long-Form Speech Recognition. 889-896
- Joachim Fainberg, Ondrej Klejch, Erfan Loweimi, Peter Bell, Steve Renals: Acoustic Model Adaptation from Raw Waveforms with Sincnet. 897-904
- Takaki Makino, Hank Liao, Yannis M. Assael, Brendan Shillingford, Basilio Garcia, Otavio Braga, Olivier Siohan: Recurrent Neural Network Transducer for Audio-Visual Speech Recognition. 905-912
- Jennifer Drexler, James R. Glass: Explicit Alignment of Text and Speech Encodings for Attention-Based End-to-End Speech Recognition. 913-919
- Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara N. Sainath, Trevor Strohman: Recognizing Long-Form Speech Using Streaming End-to-End Models. 920-927
- Austin Waters, Neeraj Gaur, Parisa Haghani, Pedro J. Moreno, Zhongdi Qu: Leveraging Language ID in Multilingual End-to-End Speech Recognition. 928-935
- Niko Moritz, Takaaki Hori, Jonathan Le Roux: Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models. 936-943
- Anshuman Tripathi, Han Lu, Hasim Sak, Hagen Soltau: Monotonic Recurrent Neural Network Transducer and Decoding Strategies. 944-948
- Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong: Character-Aware Attention-Based End-to-End Speech Recognition. 949-955
- Kwangyoun Kim, Seokyeong Jung, Jungin Lee, Myoungji Han, Chanwoo Kim, Kyungmin Lee, Dhananjaya Gowda, Junmo Park, Sungsoo Kim, Sichen Jin, Young-Yoon Lee, Jinsu Yeo, Daehyun Kim: Attention Based On-Device Streaming Speech Recognition with Large Speech Corpus. 956-963
- Sahoko Nakayama, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura: Zero-Shot Code-Switching ASR and TTS with Multilingual Machine Speech Chain. 964-971
- Xianghu Yue, Grandee Lee, Emre Yilmaz, Fang Deng, Haizhou Li: End-to-End Code-Switching ASR for Low-Resourced Language Pairs. 972-979
- Hardik B. Sailor, Salil Deena, Md Asif Jalal, Rasa Lileikyte, Thomas Hain: Unsupervised Adaptation of Acoustic Models for ASR Using Utterance-Level Embeddings from Squeeze and Excitation Networks. 980-987
- Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda: Power-Law Nonlinearity with Maximally Uniform Distribution Criterion for Improved Neural Network Training in Automatic Speech Recognition. 988-995
- Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro J. Moreno, Yonghui Wu, Zelin Wu: Speech Recognition with Augmented Synthesized Speech. 996-1002
- João Monteiro, Jahangir Alam: Development of Voice Spoofing Detection Systems for 2019 Edition of Automatic Speaker Verification and Countermeasures Challenge. 1003-1010
- Mari Ganesh Kumar, Suvidha Rupesh Kumar, M. S. Saranya, B. Bharathi, Hema A. Murthy: Spoof Detection Using Time-Delay Shallow Neural Network and Feature Switching. 1011-1017
- Rohan Kumar Das, Jichen Yang, Haizhou Li: Long Range Acoustic and Deep Features Perspective on ASVspoof 2019. 1018-1025
- Ahmed Ali, Suwon Shon, Younes Samih, Hamdy Mubarak, Ahmed Abdelali, James R. Glass, Steve Renals, Khalid Choukri: The MGB-5 Challenge: Recognition and Dialect Identification of Dialectal Arabic Speech. 1026-1033
- Kiran Praveen, Anshul Gupta, Akshara Soman, Sriram Ganapathy: Second Language Transfer Learning in Humans and Machines Using Image Supervision. 1040-1047
- Matthew Wiesner, Oliver Adams, David Yarowsky, Jan Trmal, Sanjeev Khudanpur: Zero-Shot Pronunciation Lexicons for Cross-Language Acoustic Model Transfer. 1048-1054
- Fotios Lygerakis, Vassilios Diakoloukas, Michail Lagoudakis, Margarita Kotti: Robust Belief State Space Representation for Statistical Dialogue Managers Using Deep Autoencoders. 1055-1061
- Ryo Masumura, Mana Ihori, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Takanobu Oba, Ryuichiro Higashinaka: Improving Speech-Based End-of-Turn Detection Via Cross-Modal Representation Learning with Punctuated Text Data. 1062-1069
- Yu-An Wang, Yun-Nung Chen: Dialogue Environments are Different from Games: Investigating Variants of Deep Q-Networks for Dialogue Policy. 1070-1076
- Eunah Cho, He Xie, John P. Lalor, Varun Kumar, William M. Campbell: Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity. 1077-1084