Yiming Wang 王一鸣

Research Scientist @ NVIDIA

NeMo Speech Team · Santa Clara, CA, USA

Speech Recognition · Multimodal LLMs · Self-Supervised Learning · Machine Learning

Biography

I am a Research Scientist at NVIDIA, working on Multimodal LLMs. Before joining NVIDIA, I worked at Microsoft CoreAI under Jinyu Li, after receiving my Ph.D. in Computer Science from Johns Hopkins University. At JHU I was affiliated with the Center for Language and Speech Processing (CLSP), advised by Prof. Sanjeev Khudanpur and former JHU Prof. Daniel Povey.

My work centers on speech recognition (ASR), with broad interests in machine learning and natural language processing. I am one of the major contributors to the Kaldi project and the owner of the open-source end-to-end ASR toolkit Espresso. I interned with Google's speech team in 2017 and with Amazon's Alexa ASR team in 2018, working on end-to-end ASR in both roles.

I received my B.S. and M.S. in Computer Science from Nanjing University in 2009 and 2012, respectively, advised by Prof. Tong Lu.

Education

Ph.D. in Computer Science

Sep 2012 – Sep 2020

Department of Computer Science, Johns Hopkins University · Baltimore, MD, USA

Advisors: Prof. Sanjeev Khudanpur & Dr. Daniel Povey
Thesis: Wake Word Detection and its Applications

M.S. in Computer Science

Sep 2009 – Jun 2012

Department of Computer Science and Technology, Nanjing University · Nanjing, China

Advisor: Prof. Tong Lu
Thesis: Scene Image Understanding Based on Topic Modeling (in Chinese)

B.S. in Computer Science

Sep 2005 – Jun 2009

Department of Computer Science and Technology, Nanjing University · Nanjing, China

Work Experience

Staff Research Scientist

Apr 2026 – Present

NeMo Speech Team, NVIDIA Corporation · Santa Clara, CA, USA

Supervisor: Dr. Boris Ginsburg

Principal Applied Scientist

Sep 2025 – Mar 2026

Senior Applied Scientist

Sep 2020 – Aug 2025

CoreAI, Microsoft Corporation · Redmond, WA, USA

Supervisor: Dr. Jinyu Li

Applied Scientist Intern

May 2018 – Aug 2018

Amazon.com, Inc. · Seattle, WA, USA

Worked with Dr. Xing Fan, Dr. I-Fan Chen and Dr. Yuzong Liu on improving Seq2Seq ASR with information extracted from anchored words for Amazon Alexa.

Research Intern

May 2017 – Aug 2017

Google LLC · Mountain View, CA, USA

Worked with Dr. Arun Narayanan, Dr. Rohit Prabhavalkar and Dr. Izhak Shafran on improving the LAS model with time-frequency attention for robust ASR.

Research Assistant

Sep 2015 – Aug 2020

Center for Language and Speech Processing, Johns Hopkins University · Baltimore, MD, USA

Worked with Dr. Daniel Povey and Prof. Sanjeev Khudanpur on speech recognition, contributing to the Kaldi project.

Research Assistant

Sep 2014 – Aug 2015

The Lieber Institute for Brain Development · Baltimore, MD, USA

Worked on multi-view learning for genomic and brain imaging data.

Teaching Experience

Teaching Assistant · Machine Learning

Fall 2016, Fall 2014, Spring 2014

Johns Hopkins University

Instructor: Mark Dredze

Teaching Assistant · Information Retrieval and Web Agents

Spring 2015

Johns Hopkins University

Instructor: David Yarowsky

Teaching Assistant · Machine Learning in Complex Domains

Fall 2013

Johns Hopkins University

Instructor: Suchi Saria

Teaching Assistant · Algorithms for Sensor-based Robotics

Spring 2013

Johns Hopkins University

Instructor: Gregory Hager

Teaching Assistant · Programming in Java

Spring 2010

Nanjing University

Instructor: Ning Li

Invited Talks

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

NVIDIA GPU Technology Conference (GTC) 2020

Publications

2025

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin, Atabak Ashfaq, Adam Atkinson, Hany Awadalla, Nguyen Bach, Jianmin Bao, Alon Benhaim, Martin Cai, Vishrav Chaudhary, Congcong Chen, et al.
Technical Report 2025

2023

Data2vec-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks
Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang, Yiming Wang, Shujie Liu, Zhuo Chen, DeLiang Wang, Michael Zeng
ICASSP 2023

Self-Supervised Learning with Bi-label Masked Speech Prediction for Streaming Multi-talker Speech Recognition
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang
ICASSP 2023

CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li
ICASSP 2023

2022

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Yiming Wang, Chengyi Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei
Interspeech 2022

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction
Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang
ICASSP 2022

Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition
Yiming Wang, Jinyu Li, Heming Wang, Yao Qian, Chengyi Wang, Yu Wu
ICASSP 2022

2021

LET-Decoder: A WFST-Based Lazy-Evaluation Token-Group Decoder with Exact Lattice Generation
Hang Lv, Daniel Povey, Mahsa Yarmohammadi, Ke Li, Yiming Wang, Lei Xie, Sanjeev Khudanpur
IEEE Signal Processing Letters 2021

Wake Word Detection with Streaming Transformers
Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
ICASSP 2021

2020

Wake Word Detection with Alignment-Free Lattice-Free MMI
Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
Interspeech 2020

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR
Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2020

2019

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
ASRU 2019

The JHU ASR System for VOiCES from a Distance Challenge 2019
Yiming Wang, David Snyder, Hainan Xu, Vimal Manohar, Phani Sankar Nidadavolu, Daniel Povey, Sanjeev Khudanpur
Interspeech 2019

Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings
Mahsa Yarmohammadi, Xutai Ma, Sorami Hisamoto, Muhammad Rahman, Yiming Wang, Hainan Xu, Daniel Povey, Philipp Koehn, Kevin Duh
Machine Translation Summit 2019

End-to-end Anchored Speech Recognition
Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister
ICASSP 2019

2018

A Pruned RNNLM Lattice-rescoring Algorithm for Automatic Speech Recognition
Hainan Xu, Tongfei Chen, Dongji Gao, Yiming Wang, Ke Li, Nagendra Goel, Yishay Carmiel, Daniel Povey, Sanjeev Khudanpur
ICASSP 2018

Neural Network Language Modeling with Letter-based Features and Importance Sampling
Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur
ICASSP 2018

Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Ke Li, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2018

A GPU-based WFST Decoder with Exact Lattice Generation
Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2018

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur
Interspeech 2018

Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs
Vijayaditya Peddinti, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
IEEE Signal Processing Letters 2018

2017

Backstitch: Counteracting Finite-sample Bias via Negative Steps
Yiming Wang, Vijayaditya Peddinti, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2017

The Kaldi OpenKWS System: Improving Low Resource Keyword Search
Jan Trmal, Matthew Wiesner, Vijayaditya Peddinti, Xiaohui Zhang, Pegah Ghahremani, Yiming Wang, Vimal Manohar, Hainan Xu, Daniel Povey, Sanjeev Khudanpur
Interspeech 2017

2016

Far-Field ASR Without Parallel Data
Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Interspeech 2016

Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur
Interspeech 2016

Weakly-supervised Region Annotation for Understanding Scene Images
Hao Wang, Tong Lu, Yiming Wang, Palaiahnakote Shivakumara, Chew Lim Tan
Multimedia Tools and Applications 2016

2014

Accelerated Mini-batch Randomized Block Coordinate Descent Method
Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu
NeurIPS 2014

Learning Polylingual Topic Models from Code-Switched Social Media Documents
Nanyun Peng, Yiming Wang, Mark Dredze
ACL 2014

2010

3D Model Comparison through Kernel Density Matching
Yiming Wang, Tong Lu, Rongjun Gao, Wenyin Liu
ICPR 2010

2009

QuickDiagram: A System for Online Sketching and Understanding of Diagrams
Wenyin Liu, Xiangfei Kong, Yiming Wang, Chester Wan, Cheuk-Yin Ho, Tong Lu, Zhengxing Sun
International Workshop on Graphics Recognition 2009

Patents

Speech Detection and Speech Recognition

Xing Fan, I-Fan Chen, Yuzong Liu, Björn Hoffmeister, Yiming Wang, Tongfei Chen

US Patent 10,923,111 (2021)

3D Model Comparison and Retrieval Method based on Kernel Density Estimation

Tong Lu, Yiming Wang

CN Patent 101,882,150 (2012)