SLT 2016: San Diego, CA, USA
- 2016 IEEE Spoken Language Technology Workshop, SLT 2016, San Diego, CA, USA, December 13-16, 2016. IEEE 2016, ISBN 978-1-5090-4903-5
- Gueorgui Pironkov, Stéphane Dupont, Thierry Dutoit: I-Vector estimation as auxiliary task for Multi-Task Learning based acoustic modeling for automatic speech recognition. 1-7
- Scott Novotney, Damianos G. Karakos, Jan Silovský, Richard M. Schwartz: BBN technologies' OpenSAD system. 8-12
- Dayana Ribas, Emmanuel Vincent, José Ramón Calvo de Lara: A study of speech distortion conditions in real scenarios for speech processing applications. 13-20
- Mortaza Doulaty, Richard Rose, Olivier Siohan: Automatic optimization of data perturbation distributions for multi-style training in speech recognition. 21-27
- Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio: Batch-normalized joint training for DNN-based distant speech recognition. 28-34
- Sakriani Sakti, Seiji Kawanishi, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura: Deep bottleneck features and sound-dependent i-vectors for simultaneous recognition of speech and environmental sounds. 35-42
- Shawn Tan, Khe Chai Sim: Learning utterance-level normalisation using Variational Autoencoders for robust automatic speech recognition. 43-49
- Bernd T. Meyer, Sri Harish Reddy Mallidi, Angel Mario Castro Martinez, Guillermo Payá Vayá, Hendrik Kayser, Hynek Hermansky: Performance monitoring for automatic speech recognition in noisy multi-channel environments. 50-56
- Michael Heck, Sakriani Sakti, Satoshi Nakamura: Iterative training of a DPGMM-HMM acoustic unit recognizer in a zero resource scenario. 57-63
- Chris Bartels, Wen Wang, Vikramjit Mitra, Colleen Richey, Andreas Kathol, Dimitra Vergyri, Harry Bratt, Chiachi Hung: Toward human-assisted lexical unit discovery without text resources. 64-70
- Amir Hossein Harati Nejad Torbati, Joseph Picone: A nonparametric Bayesian approach for automatic discovery of a lexicon and acoustic units. 71-75
- Shubham Toshniwal, Karen Livescu: Jointly learning to align and convert graphemes to phonemes with neural attention models. 76-82
- Tiancheng Zhao, Kyusong Lee, Maxine Eskénazi: DialPort: Connecting the spoken dialog research community to real user data. 83-90
- Ming Sun, Aasish Pappu, Yun-Nung Chen, Alexander I. Rudnicky: Weakly supervised user intent detection for multi-domain dialogues. 91-97
- Merwan Barlier, Romain Laroche, Olivier Pietquin: Learning dialogue dynamics with the method of moments. 98-105
- Tatiana Ekeinhor-Komi, Jean Léon Bouraoui, Romain Laroche, Fabrice Lefèvre: Towards a virtual personal assistant based on a user-defined portfolio of multi-domain vocal applications. 106-113
- Maryam Najafian, John H. L. Hansen: Speaker independent diarization for child language environment analysis using deep neural networks. 114-120
- Xinhao Wang, Keelan Evanini, James V. Bruno, Matthew Mulholland: Automatic plagiarism detection for spoken responses in an assessment of English language proficiency. 121-128
- Fumiya Shiozawa, Daisuke Saito, Nobuaki Minematsu: Improved prediction of the accent gap between speakers of English for individual-based clustering of World Englishes. 129-135
- Michelle Renee Morales, Rivka Levitan: Speech vs. text: A comparative analysis of features for depression detection systems. 136-143
- Vincent Renkens, Vikrant Tomar, Hugo Van hamme: Incrementally learn the relevance of words in a dictionary for spoken language acquisition. 144-150
- Lang-Chi Yu, Hung-yi Lee, Lin-Shan Lee: Abstractive headline generation for spoken content by attentive recurrent neural networks with ASR error modeling. 151-157
- Chun-I Tsai, Hsiao-Tsung Hung, Kuan-Yu Chen, Berlin Chen: Extractive speech summarization leveraging convolutional neural network techniques. 158-164
- David Snyder, Pegah Ghahremani, Daniel Povey, Daniel Garcia-Romero, Yishay Carmiel, Sanjeev Khudanpur: Deep neural network-based speaker embeddings for end-to-end speaker verification. 165-170
- Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong: End-to-end attention based text-dependent speaker verification. 171-178
- Héctor Delgado, Massimiliano Todisco, Md. Sahidullah, Achintya Kumar Sarkar, Nicholas W. D. Evans, Tomi Kinnunen, Zheng-Hua Tan: Further optimisations of constant Q cepstral processing for integrated utterance and text-dependent speaker verification. 179-185
- Na Li, Man-Wai Mak, Jen-Tzung Chien: Deep neural network driven mixture of PLDA for robust i-vector speaker verification. 186-191
- Gautam Bhattacharya, Jahangir Alam, Patrick Kenny, Vishwa Gupta: Modelling speaker and channel variability using deep neural networks for robust speaker verification. 192-198
- Ondrej Novotný, Pavel Matejka, Ondrej Glembek, Oldrich Plchot, Frantisek Grézl, Lukás Burget, Jan Honza Cernocký: Analysis of the DNN-based SRE systems in multi-language conditions. 199-204
- Finnian Kelly, John H. L. Hansen: Evaluation and calibration of Lombard effects in speaker verification. 205-209
- Moez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn: Phonetic content impact on Forensic Voice Comparison. 210-217
- Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori: Parallel Long Short-Term Memory for multi-stream classification. 218-223
- Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linarès: Improving multi-stream classification by mapping sequence-embedding in a high dimensional space. 224-231
- Wei Fang, Juei-Yang Hsu, Hung-yi Lee, Lin-Shan Lee: Hierarchical attention model for improved machine comprehension of spoken content. 232-238
- Andrea Schnall, Martin Heckmann: Comparing speaker independent and speaker adapted classification for word prominence detection. 239-244
- Pierre Lison, Raveesh Meena: Automatic turn segmentation for Movie & TV subtitles. 245-252
- Justin Scheiner, Ian Williams, Petar S. Aleksic: Voice search language model adaptation using contextual information. 253-257
- Maria Joana Correia, Isabel Trancoso, Bhiksha Raj: Adaptation of SVM for MIL for inferring the polarity of movies and movie reviews. 258-264
- Meriem Beloucif, Dekai Wu: Semantically driven inversion transduction grammar induction for early stage training of spoken language translation. 265-272
- Xu-Kui Yang, Dan Qu, Wen-Lin Zhang, Wei-Qiang Zhang: The NDSC transcription system for the 2016 multi-genre broadcast challenge. 273-278
- Ahmed Ali, Peter Bell, James R. Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang: The MGB-2 challenge: Arabic multi-dialect broadcast media recognition. 279-284
- Natalia A. Tomashenko, Kévin Vythelingum, Anthony Rousseau, Yannick Estève: LIUM ASR systems for the 2016 Multi-Genre Broadcast Arabic challenge. 285-291
- Sameer Khurana, Ahmed M. Ali: QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge. 292-298
- Tuka Al Hanai, Wei-Ning Hsu, James R. Glass: Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge. 299-304
- Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen: Speech enhancement using Long Short-Term Memory based recurrent Neural Networks for noise robust Speaker Verification. 305-311
- Lea Schönherr, Dennis Orth, Martin Heckmann, Dorothea Kolossa: Environmentally robust audio-visual speaker identification. 312-318
- Harishchandra Dubey, Abhijeet Sangwan, John H. L. Hansen: A robust diarization system for measuring dominance in Peer-Led Team Learning groups. 319-323
- Qian Zhang, John H. L. Hansen: Unsupervised k-means clustering based out-of-set candidate selection for robust open-set language recognition. 324-329
- Luis Murphy Marcos, Frederick Richardson: Multi-lingual deep neural networks for language recognition. 330-334
- Shahan C. Nercessian, Pedro A. Torres-Carrasquillo, Gabriel Martinez-Montes: Approaches for language identification in mismatched environments. 335-340
- Mohamed Kamal Omar: A factor analysis model of sequences for language recognition. 341-347
- Yun-Nung Chen, Dilek Hakkani-Tür, Gökhan Tür, Asli Celikyilmaz, Jianfeng Gao, Li Deng: Syntax or semantics? Knowledge-guided joint semantic frame parsing. 348-355
- Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès: A log-linear weighting approach in the Word2vec space for spoken language understanding. 356-361
- Titouan Parcollet, Mohamed Morchid, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès, Renato De Mori: Quaternion Neural Networks for Spoken Language Understanding. 362-368
- Takeshi Homma, Kazuaki Shima, Takuya Matsumoto: Robust utterance classification using multiple classifiers in the presence of speech recognition errors. 369-375
- Gozde Cetinkaya, Batuhan Gündogdu, Murat Saraclar: Pre-filtered dynamic time warping for posteriorgram based keyword search. 376-382
- Dario Bertero, Pascale Fung: Multimodal deep neural nets for detecting humor in TV sitcoms. 383-390
- Ruhi Sarikaya, Paul A. Crook, Alex Marin, Minwoo Jeong, Jean-Philippe Robichaud, Asli Celikyilmaz, Young-Bum Kim, Alexandre Rochette, Omar Zia Khan, Xiaohu Liu, Daniel Boies, Tasos Anastasakos, Zhaleh Feizollahi, Nikhil Ramesh, Hisami Suzuki, Roman Holenstein, Elizabeth Krawczyk, Vasiliy Radostev: An overview of end-to-end language understanding and dialog management for personal digital assistants. 391-397
- Leonid Velikovich: Semantic model for fast tagging of word lattices. 398-405
- Franck Dernoncourt, Ji Young Lee: Optimizing neural network hyperparameters with Gaussian processes for dialog act classification. 406-413
- Joo-Kyung Kim, Gökhan Tür, Asli Celikyilmaz, Bin Cao, Ye-Yi Wang: Intent detection using semantically enriched word embeddings. 414-419
- Yike Zhang, Pengyuan Zhang, Ta Li, Yonghong Yan: An unsupervised vocabulary selection technique for Chinese automatic speech recognition. 420-425
- Anna Currey, Irina Illina, Dominique Fohr: Dynamic adjustment of language models for automatic speech recognition using word similarity. 426-432
- Ondrej Klejch, Peter Bell, Steve Renals: Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches. 433-440
- Lucy Vasserman, Ben Haynor, Petar S. Aleksic: Contextual language model adaptation using dynamic classes. 441-446
- Assaf Hurwitz Michaely, Mohammadreza Ghodsi, Zelin Wu, Justin Scheiner, Petar S. Aleksic: Unsupervised context learning for speech recognition. 447-453
- Akshay Chandrashekaran, Ian R. Lane: Automated optimization of decoder hyper-parameters for online LVCSR. 454-460
- Liang Lu: Sequence training and adaptation of highway deep neural networks. 461-466
- Wei-Ning Hsu, Yu Zhang, James R. Glass: A prioritized grid long short-term memory RNN for speech recognition. 467-473
- Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni: Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting. 474-480
- Yanmin Qian, Philip C. Woodland: Very deep convolutional neural networks for robust speech recognition. 481-488
- Ivan Kukanov, Ville Hautamäki, Sabato Marco Siniscalchi, Kehuang Li: Deep learning with maximal figure-of-merit cost to advance multi-label speech attribute detection. 489-495
- Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu: End-to-end training approaches for discriminative segmental models. 496-502
- Shane Settle, Karen Livescu: Discriminative acoustic word embeddings: Recurrent neural network-based approaches. 503-510
- Seokhwan Kim, Luis Fernando D'Haro, Rafael E. Banchs, Jason D. Williams, Matthew Henderson, Koichiro Yoshino: The fifth dialog state tracking challenge. 511-517
- Takashi Ushio, Hongjie Shi, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki Horii: Recurrent convolutional neural networks for structured speech act tagging. 518-524
- Ying Su, Miao Li, Ji Wu: The MSIIP system for dialog state tracking challenge 5. 525-530
- Youngsoo Jang, Jiyeon Ham, Byung-Jun Lee, Youngjae Chang, Kee-Eung Kim: Neural dialog state tracker for large ontologies by attention mechanism. 531-537
- Richard Dufour, Mohamed Morchid, Titouan Parcollet: Tracking dialog states using an Author-Topic based representation. 544-551
- Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, Bret Harsham, Jonathan Le Roux, John R. Hershey, Yusuke Koji, Yi Jing, Zhaocheng Zhu, Takeyuki Aikawa: Dialog state tracking with attention-based sequence-to-sequence learning. 552-558
- Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki Horii: A multichannel convolutional neural network for cross-language dialog state tracking. 559-564
- Leimin Tian, Johanna D. Moore, Catherine Lai: Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features. 565-572
- Felix Sun, David F. Harwath, James R. Glass: Look, listen, and decode: Multimodal speech recognition with images. 573-578
- Spyridon Thermos, Gerasimos Potamianos: Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view. 579-584
- Ian Beaver, Cynthia Freeman: Analysis of user behavior with multimodal virtual customer service agents. 585-591
- Felix de Chaumont Quitry, Asa Oines, Pedro J. Moreno, Eugene Weinstein: High quality agreement-based semi-supervised training data for acoustic modeling. 592-596
- Adriana Stan, Cassia Valentini-Botinhao, Bogdan Orza, Mircea Giurgiu: Blind speech segmentation using spectrogram image-based features and Mel cepstral coefficients. 597-602
- Ryu Takeda, Kazunori Komatani: Discriminative multiple sound source localization based on deep neural networks using independent location model. 603-609
- Emre Yilmaz, Henk van den Heuvel, David A. van Leeuwen: Code-switching detection using multilingual DNNs. 610-616
- Vipul Arora, Aditi Lahiri, Henning Reetz: Attribute based shared hidden layers for cross-language knowledge transfer. 617-623
- Mohamed Elfeky, Meysam Bastani, Xavier Velez, Pedro J. Moreno, Austin Waters: Towards acoustic model unification across dialects. 624-628
- Frantisek Grézl, Martin Karafiát: Boosting performance on low-resource languages by standard corpora: An analysis. 629-636
- Martin Karafiát, Murali Karthick Baskar, Pavel Matejka, Karel Veselý, Frantisek Grézl, Jan Cernocký: Multilingual BLSTM and speaker-specific vector adaptation in 2016 BUT Babel system. 637-643
- Marco Matassoni, Daniele Falavigna, Diego Giuliani: DNN adaptation for recognition of children speech through automatic utterance selection. 644-651
- Lahiru Samarakoon, Khe Chai Sim: Low-rank bases for factorized hidden layer adaptation of DNN acoustic models. 652-658
- Hoon Chung, Jeom Ja Kang, Kiyoung Park, Sung Joo Lee, Jeon Gue Park: Deep neural network based acoustic model parameter reduction using manifold regularized low rank matrix factorization. 659-664
- Tomohiro Tanaka, Takafumi Moriya, Takahiro Shinozaki, Shinji Watanabe, Takaaki Hori, Kevin Duh: Automated structure discovery and parameter tuning of neural network language model based on evolution strategy. 665-671
- Gautam Mantena, Khe Chai Sim: Entropy-based pruning of hidden units to reduce DNN parameters. 672-679
- Florian Hinterleitner, Benjamin Weiss, Sebastian Möller: Influence of corpus size and content on the perceptual quality of a unit selection MaryTTS voice. 680-685
- Srikanth Ronanki, Oliver Watts, Simon King, Gustav Eje Henter: Median-based generation of synthetic speech durations using a non-parametric approach. 686-692
- Kazuhiro Kobayashi, Tomoki Toda, Satoshi Nakamura: F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential. 693-700