Xiaodan Zhuang
2020 – today
- 2024
- [i7] Roger Hsiao, Liuhui Deng, Erik McDermott, Ruchir Travadi, Xiaodan Zhuang: Optimizing Byte-level Representation for End-to-end ASR. CoRR abs/2406.09676 (2024)
- [i6] Adnan Haider, Xingyu Na, Erik McDermott, Tim Ng, Zhen Huang, Xiaodan Zhuang: Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models. CoRR abs/2408.13008 (2024)
- 2023
- [c39] Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang: Variable Attention Masking for Configurable Transformer Transducer Speech Recognition. ICASSP 2023: 1-5
- [c38] Maurits J. R. Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang: Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition. INTERSPEECH 2023: 939-943
- [i5] Maurits J. R. Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang: Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition. CoRR abs/2304.08862 (2023)
- 2022
- [i4] Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang: Variable Attention Masking for Configurable Transformer Transducer Speech Recognition. CoRR abs/2211.01438 (2022)
- 2021
- [c37] Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu: Frame-Level Specaugment for Deep Convolutional Neural Networks in Hybrid ASR Systems. SLT 2021: 209-214
- [i3] Zhen Huang, Xiaodan Zhuang, Daben Liu, Xiaoqiang Xiao, Yuchen Zhang, Sabato Marco Siniscalchi: Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching. CoRR abs/2109.00921 (2021)
- 2020
- [c36] Zhen Huang, Tim Ng, Leo Liu, Henry Mason, Xiaodan Zhuang, Daben Liu: SNDCNN: Self-Normalizing Deep CNNs with Scaled Exponential Linear Units for Speech Recognition. ICASSP 2020: 6854-6858
- [i2] Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu: Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems. CoRR abs/2012.04094 (2020)
2010 – 2019
- 2019
- [c35] Zhen Huang, Xiaodan Zhuang, Daben Liu, Xiaoqiang Xiao, Yuchen Zhang, Sabato Marco Siniscalchi: Exploring Retraining-free Speech Recognition for Intra-sentential Code-switching. ICASSP 2019: 6066-6070
- [i1] Zhen Huang, Tim Ng, Leo Liu, Henry Mason, Xiaodan Zhuang, Daben Liu: SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition. CoRR abs/1910.01992 (2019)
- 2017
- [j4] Jianwei Liao, Xiaodan Zhuang, Renyi Fan, Xiaoning Peng: Toward a General Distributed Messaging Framework for Online Transaction Processing Applications. IEEE Access 5: 18166-18178 (2017)
- [c34] Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu: Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization. INTERSPEECH 2017: 2148-2152
- 2014
- [c33] Xiaodan Zhuang, Viktor Rozgic, Michael Crystal: Compact unsupervised EEG response representation for emotion recognition. BHI 2014: 736-739
- [c32] Shuang Wu, Sravanthi Bondugula, Florian Luisier, Xiaodan Zhuang, Pradeep Natarajan: Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts. CVPR 2014: 2665-2672
- [c31] Shengxin Zha, Xujun Peng, Huaigu Cao, Xiaodan Zhuang, Pradeep Natarajan, Prem Natarajan: Text Classification via iVector Based Feature Representation. Document Analysis Systems 2014: 151-155
- [c30] Arpit Jain, Xujun Peng, Xiaodan Zhuang, Pradeep Natarajan, Huaigu Cao: Text detection and recognition in natural scenes and consumer videos. ICASSP 2014: 1245-1249
- [c29] Shuang Wu, Xiaodan Zhuang, Pradeep Natarajan: Effective representations for leveraging language content in multimedia event detection. ICASSP 2014: 7123-7127
- [c28] Xiaodan Zhuang, Viktor Rozgic, Michael Crystal, Brian Marx: Improving speech-based PTSD detection via multi-view learning. SLT 2014: 260-265
- 2013
- [j3] Kai-Hsiang Lin, Xiaodan Zhuang, Camille Goudeseune, Sarah King, Mark Hasegawa-Johnson, Thomas S. Huang: Saliency-maximized audio visualization and efficient audio-visual browsing for faster-than-real-time human acoustic event detection. ACM Trans. Appl. Percept. 10(4): 26:1-26:16 (2013)
- [c27] Nina Zinovieva, Xiaodan Zhuang, Pat Peterson, Joe Alwan, Rohit Prasad: Probabilistic trainable segmenter for call center audio using multiple features. INTERSPEECH 2013: 2054-2058
- [c26] Xiaodan Zhuang, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan: Audio self organized units for high-level event detection. INTERSPEECH 2013: 2953-2957
- [c25] Xiaodan Zhuang, Shuang Wu, Pradeep Natarajan: Compact bag-of-words visual representation for effective linear classification. ACM Multimedia 2013: 521-524
- [c24] Pradeep Natarajan, Shuang Wu, Florian Luisier, Xiaodan Zhuang, Manasvi Tickoo, Guangnan Ye, Dong Liu, Shih-Fu Chang, Imran Saleemi, Mubarak Shah, Vlad I. Morariu, Larry Davis, Abhinav Gupta, Ismail Haritaoglu, Sadiye Guler, Ashutosh Morde: BBN VISER TRECVID 2013 Multimedia Event Detection and Multimedia Event Recounting Systems. TRECVID 2013
- [c23] Shiv Naga Prasad Vitaladevuni, Pradeep Natarajan, Shuang Wu, Xiaodan Zhuang, Rohit Prasad, Premkumar Natarajan: Scene image categorization and video event detection using Naive Bayes Nearest Neighbor. WACV 2013: 140-147
- 2012
- [c22] Pradeep Natarajan, Shuang Wu, Shiv Naga Prasad Vitaladevuni, Xiaodan Zhuang, Stavros Tsakalidis, Unsang Park, Rohit Prasad, Premkumar Natarajan: Multimodal feature fusion for robust event detection in web videos. CVPR 2012: 1298-1305
- [c21] Pradeep Natarajan, Shuang Wu, Shiv Naga Prasad Vitaladevuni, Xiaodan Zhuang, Unsang Park, Rohit Prasad, Premkumar Natarajan: Multi-channel Shape-Flow Kernel Descriptors for Robust Video Event Detection and Retrieval. ECCV (2) 2012: 301-314
- [c20] Kai-Hsiang Lin, Xiaodan Zhuang, Camille Goudeseune, Sarah King, Mark Hasegawa-Johnson, Thomas S. Huang: Improving faster-than-real-time human acoustic event detection by saliency-maximized audio visualization. ICASSP 2012: 2277-2280
- [c19] Xiaodan Zhuang, Stavros Tsakalidis, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan: Compact Audio Representation for Event Detection in Consumer Media. INTERSPEECH 2012: 2089-2092
- [c18] Stavros Tsakalidis, Xiaodan Zhuang, Roger Hsiao, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan: Robust Event Detection From Spoken Content In Consumer Domain Videos. INTERSPEECH 2012: 2101-2104
- [c17] Pradeep Natarajan, Prem Natarajan, Shuang Wu, Xiaodan Zhuang, Amelio Vázquez Reina, Shiv Vitaladevuni, Kleovoulos Tsourides, Carl Andersen, Rohit Prasad, Guangnan Ye, Dong Liu, Shih-Fu Chang, Imran Saleemi, Mubarak Shah, Yue Ng, Brandyn White, Larry Davis, Abhinav Gupta, Ismail Haritaoglu: BBNVISER : BBN VISER TRECVID 2012 Multimedia Event Detection and Multimedia Event Recounting Systems. TRECVID 2012
- 2011
- [b1] Xiaodan Zhuang: Modeling audio and visual cues for real-world event detection. University of Illinois Urbana-Champaign, USA, 2011
- [c16] Po-Sen Huang, Xiaodan Zhuang, Mark Hasegawa-Johnson: Improving acoustic event detection using generalizable visual features and multi-modality modeling. ICASSP 2011: 349-352
- [c15] Lijuan Wang, Yi-Jian Wu, Xiaodan Zhuang, Frank K. Soong: Synthesizing visual speech trajectory with minimum generation error. ICASSP 2011: 4580-4583
- [c14] Mark Hasegawa-Johnson, Jui-Ting Huang, Xiaodan Zhuang: Unlabeled data and other marginals. MLSLP 2011
- [c13] Pradeep Natarajan, Prem Natarajan, Vasant Manohar, Shuang Wu, Stavros Tsakalidis, Shiv Vitaladevuni, Xiaodan Zhuang, Rohit Prasad, Guangnan Ye, Dong Liu, I-Hong Jhuo, Shih-Fu Chang, Hamid Izadinia, Imran Saleemi, Mubarak Shah, Brandyn White, Tom Yeh, Larry Davis: BBN VISER TRECVID 2011 Multimedia Event Detection System. TRECVID 2011
- [p1] Xiaodan Zhuang, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang: Efficient Object Localization with Variation-Normalized Gaussianized Vectors. Intelligent Video Event Analysis and Understanding 2011: 93-109
- 2010
- [j2] Xi Zhou, Xiaodan Zhuang, Hao Tang, Mark Hasegawa-Johnson, Thomas S. Huang: Novel Gaussianized vector representation for improved natural scene categorization. Pattern Recognit. Lett. 31(8): 702-708 (2010)
- [j1] Xiaodan Zhuang, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang: Real-world acoustic event detection. Pattern Recognit. Lett. 31(12): 1543-1551 (2010)
- [c12] Xiaodan Zhuang, Lijuan Wang, Frank K. Soong, Mark Hasegawa-Johnson: A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion. INTERSPEECH 2010: 1736-1739
- [c11] Chi Hu, Xiaodan Zhuang, Mark Hasegawa-Johnson: FSM-based pronunciation modeling using articulatory phonological code. INTERSPEECH 2010: 2274-2277
2000 – 2009
- 2009
- [c10] Xiaodan Zhuang, Jing Huang, Gerasimos Potamianos, Mark Hasegawa-Johnson: Acoustic fall detection using Gaussian mixture models and GMM supervectors. ICASSP 2009: 69-72
- [c9] Jing Huang, Xiaodan Zhuang, Vit Libal, Gerasimos Potamianos: Long-time span acoustic activity analysis from far-field sensors in smart homes. ICASSP 2009: 4173-4176
- [c8] Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis Goldstein, Elliot Saltzman: Articulatory phonological code for word classification. INTERSPEECH 2009: 2763-2766
- 2008
- [c7] Xiaodan Zhuang, Xi Zhou, Thomas S. Huang, Mark Hasegawa-Johnson: Feature analysis and selection for acoustic event detection. ICASSP 2008: 17-20
- [c6] Xi Zhou, Xiaodan Zhuang, Hao Tang, Mark Hasegawa-Johnson, Thomas S. Huang: A novel Gaussianized vector representation for natural scene categorization. ICPR 2008: 1-4
- [c5] Xiaodan Zhuang, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang: Face age estimation using patch-based hidden Markov model supervectors. ICPR 2008: 1-4
- [c4] Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis M. Goldstein, Elliot Saltzman: The entropy of the articulatory phonological code: recognizing gestures from tract variables. INTERSPEECH 2008: 1489-1492
- [c3] Xi Zhou, Xiaodan Zhuang, Shuicheng Yan, Shih-Fu Chang, Mark Hasegawa-Johnson, Thomas S. Huang: SIFT-Bag kernel for video event analysis. ACM Multimedia 2008: 229-238
- 2007
- [c2] Ming Liu, Yanxiang Chen, Xi Zhou, Xiaodan Zhuang, Mark Hasegawa-Johnson, Thomas S. Huang: Multichannel and Multimodality Person Identification. CLEAR 2007: 248-255
- [c1] Xi Zhou, Xiaodan Zhuang, Ming Liu, Hao Tang, Mark Hasegawa-Johnson, Thomas S. Huang: HMM-Based Acoustic Event Detection with AdaBoost Feature Selection. CLEAR 2007: 345-353
last updated on 2024-09-30 00:58 CEST by the dblp team
all metadata released as open data under CC0 1.0 license