INTERSPEECH 2015: Dresden, Germany
16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, Dresden, Germany, September 6-10, 2015. ISCA 2015
Keynotes
- Mary E. Beckman:
The emergence of compositional structure in language evolution and development. - Ruhi Sarikaya:
The technology powering personal digital assistants. - Katrin Amunts:
The HBP-atlas - concept, perspectives, and application for language and speech research. - Klaus R. Scherer:
Voices of power, passion, and personality.
Feature Extraction and Modeling with Neural Networks
- Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin W. Wilson, Oriol Vinyals:
Learning the speech front-end with raw waveform CLDNNs. 1-5 - Mayank Bhargava, Richard Rose:
Architectures for deep neural network based acoustic models defined over windowed speech waveforms. 6-10 - Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert:
Analysis of CNN-based speech recognition system using raw speech as input. 11-15 - Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta:
Bilinear map of filter-bank outputs for DNN-based speech recognition. 16-20 - Payton Lin, Dau-Cheng Lyu, Yun-Fan Chang, Yu Tsao:
Speech recognition with temporal neural networks. 21-25 - Pavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney:
Convolutional neural networks for acoustic modeling of raw time signal in LVCSR. 26-30
Prosody 1-3
- Ulrike Glavitsch, Lei He, Volker Dellwo:
Stable and unstable intervals as a basic segmentation procedure of the speech signal. 31-35 - Andreas Windmann, Juraj Simko, Petra Wagner:
Polysyllabic shortening and word-final lengthening in English. 36-40 - Anders Eriksson, Mattias Heldner:
The acoustics of word stress in English as a function of stress level and speaking style. 41-45 - Katharina Zahner, Muna Pohl, Bettina Braun:
Pitch accent distribution in German infant-directed speech. 46-50 - Hansjörg Mixdorff, Christian G. Cossio Mercado, Angelika Hönemann, Jorge A. Gurlekian, Diego A. Evin, Humberto M. Torres:
Acoustic correlates of perceived syllable prominence in German. 51-55 - Simone Simonetti, Jeesun Kim, Chris Davis:
Cross-modality matching of linguistic and emotional prosody. 56-59
Speech Intelligibility Enhancement
- Tudor-Catalin Zorila, Yannis Stylianou:
A fast algorithm for improved intelligibility of speech-in-noise based on frequency and time domain energy reallocation. 60-64 - Maria Koutsogiannaki, Petko Nikolov Petkov, Yannis Stylianou:
Intelligibility enhancement of casual speech for reverberant environments inspired by clear speech properties. 65-69 - Amira Ben Jemaa, N. Mechergui, G. Courtois, A. Mudry, Sonia Djaziri Larbi, Monia Turki, Hervé Lissek, Meriem Jaïdane:
Intelligibility enhancement of vocal announcements for public address systems: a design for all through a presbycusis pre-compensation filter. 70-74 - Henning F. Schepker, David Hülsmeier, Jan Rennies, Simon Doclo:
Model-based integration of reverberation for noise-adaptive near-end listening enhancement. 75-79 - Sebastian Rottschäfer, Hendrik Buschmeier, Herwin van Welbergen, Stefan Kopp:
Online Lombard adaptation in incremental speech synthesis. 80-84 - Emma Jokinen, Ulpu Remes, Paavo Alku:
Comparison of Gaussian process regression and Gaussian mixture models in spectral tilt modelling for intelligibility enhancement of telephone speech. 85-89
Detecting and Predicting Mental and Social Disorders
- Naveen Kumar, Shrikanth S. Narayanan:
A discriminative reliability-aware classification model with applications to intelligibility classification in pathological speech. 90-94 - Juan Rafael Orozco-Arroyave, Florian Hönig, Julián D. Arias-Londoño, Jesús Francisco Vargas-Bonilla, Sabine Skodda, Jan Rusz, Elmar Nöth:
Voiced/unvoiced transitions in speech as a potential bio-marker to detect Parkinson's disease. 95-99 - Tatiana Villa-Cañas, Julián D. Arias-Londoño, Juan Rafael Orozco-Arroyave, Jesús Francisco Vargas-Bonilla, Elmar Nöth:
Low-frequency components analysis in running speech for the automatic detection of Parkinson's disease. 100-104 - Juan Camilo Vásquez-Correa, Tomás Arias-Vergara, Juan Rafael Orozco-Arroyave, Jesús Francisco Vargas-Bonilla, Julián D. Arias-Londoño, Elmar Nöth:
Automatic detection of Parkinson's disease from continuous speech recorded in non-controlled noise conditions. 105-109 - Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski:
Relevance vector machine for depression prediction. 110-114 - Erik Marchi, Björn W. Schuller, Simon Baron-Cohen, Ofer Golan, Sven Bölte, Prerna Arora, Reinhold Häb-Umbach:
Typicality and emotion in the voice of children with autism spectrum condition: evidence across three languages. 115-119
Spoken Language Understanding 1-3
- Chunxi Liu, Puyang Xu, Ruhi Sarikaya:
Deep contextual language understanding in spoken dialogue systems. 120-124 - Yik-Cheung Tam, Yangyang Shi, Hunk Chen, Mei-Yuh Hwang:
RNN-based labeled data generation for spoken language understanding. 125-129 - Vedran Vukotic, Christian Raymond, Guillaume Gravier:
Is it time to switch to word embedding and recurrent neural networks for spoken language understanding? 130-134 - Suman V. Ravuri, Andreas Stolcke:
Recurrent neural network and LSTM models for lexical utterance classification. 135-139 - Hung-tsung Lu, Yuan-ming Liou, Hung-yi Lee, Lin-Shan Lee:
Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors. 140-144 - Mohamed Morchid, Richard Dufour, Driss Matrouf:
A comparison of normalization techniques applied to latent space representations for speech analytics. 145-149
Active Perception in Human and Machine Speech Communication (Special Session)
- Éva Székely, Mark T. Keane, Julie Carson-Berndsen:
The effect of soft, modal and loud voice levels on entrainment in noisy conditions. 150-154 - Benjamin R. Cowan, Holly P. Branigan:
Does voice anthropomorphism affect lexical alignment in speech-based human-computer dialogue? 155-159 - Ning Ma, Guy J. Brown, José A. González:
Exploiting top-down source models to improve binaural localisation of multiple sources in reverberant environments. 160-164 - Christopher Schymura, Fiete Winter, Dorothea Kolossa, Sascha Spors:
Binaural sound source localisation and tracking using a dynamic spherical head model. 165-169 - Tobias May, Thomas Bentsen, Torsten Dau:
The role of temporal resolution in modulation-based speech segregation. 170-174 - Hendrik Kayser, Constantin Spille, Daniel Marquardt, Bernd T. Meyer:
Improving automatic speech recognition in spatially-aware hearing aids. 175-179 - Randy Gomez, Levko Ivanchuk, Keisuke Nakamura, Takeshi Mizumoto, Kazuhiro Nakadai:
Dereverberation for active human-robot communication robust to speaker's face orientation. 180-184
Speaker Recognition and Diarization 1-3
- Nanxin Chen, Yanmin Qian, Kai Yu:
Multi-task learning for text-dependent speaker verification. 185-189 - Themos Stafylakis, Patrick Kenny, Md. Jahangir Alam, Marcel Kockmann:
JFA for speaker recognition with random digit strings. 190-194 - Elena Knyazeva, Guillaume Wisniewski, Hervé Bredin, François Yvon:
Structured prediction for speaker identification in TV series. 195-199 - Sandro Cumani, Pietro Laface, Farzana Kulsoom:
Speaker recognition by means of acoustic and phonetically informed GMMs. 200-204 - Ashish Panda:
A fast approach to psychoacoustic model compensation for robust speaker recognition in additive noise. 205-209 - Danila Doroshin, Nikolay Lubimov, Marina Nastasenko, Mikhail Kotov:
Blind score normalization method for PLDA based speaker recognition. 210-213 - Sergey Novoselov, Timur Pekhovsky, Oleg Kudashev, Valentin S. Mendelev, Alexey Prudnikov:
Non-linear PLDA for i-vector speaker verification. 214-218 - Carlos Vaquero, Patricia Rodríguez:
On the need of template protection for voice authentication. 219-223 - Finnian Kelly, John H. L. Hansen:
Evaluation and calibration of short-term aging effects in speaker verification. 224-228 - Liping Chen, Kong-Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai:
Phone-centric local variability vector for text-constrained speaker verification. 229-233 - Kuruvachan K. George, C. Santhosh Kumar, K. I. Ramachandran, Ashish Panda:
Cosine distance features for robust speaker verification. 234-238 - Sayaka Shiota, Fernando Villavicencio, Junichi Yamagishi, Nobutaka Ono, Isao Echizen, Tomoko Matsui:
Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification. 239-243 - Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen:
Noise robust speaker recognition with convolutive sparse coding. 244-248 - Md. Jahangir Alam, Patrick Kenny, Themos Stafylakis:
Combining amplitude and phase-based features for speaker verification with short duration utterances. 249-253
Speech Synthesis 1-3
- Tuomo Raitio, Lauri Juvela, Antti Suni, Martti Vainio, Paavo Alku:
Phase perception of the glottal excitation of vocoded speech. 254-258 - Sunayana Sitaram, Serena Jeblee, Alan W. Black:
Using acoustics to improve pronunciation for synthesis of low resource languages. 259-263 - Tadashi Inai, Sunao Hara, Masanobu Abe, Yusuke Ijima, Noboru Miyazaki, Hideyuki Mizuno:
Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum. 264-268 - Heng Lu, Wei Zhang, Xu Shao, Quan Zhou, Wenhui Lei, Hongbin Zhou, Andrew P. Breen:
Pruning redundant synthesis units based on static and delta unit appearance frequency. 269-273 - Yamato Ohtani, Yu Nasu, Masahiro Morita, Masami Akamine:
Emotional transplant in statistical speech synthesis based on emotion additive model. 274-278 - Xurong Xie, Xunying Liu, Lan Wang, Rongfeng Su:
Generalized variable parameter HMMs based acoustic-to-articulatory inversion. 279-283 - Seyed Hamidreza Mohammadi, Alexander Kain:
Semi-supervised training of a voice conversion mapping function using a joint-autoencoder. 284-288 - Stefan Huber, Axel Roebel:
On glottal source shape parameter transformation using a novel deterministic and stochastic speech analysis and synthesis system. 289-293 - Yi-Chin Huang, Chung-Hsien Wu, Ming-Ge Shie:
Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation. 294-298 - Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. 299-303 - Markus Toman, Michael Pucher:
Evaluation of state mapping based foreign accent conversion. 304-308 - Zhizheng Wu, Simon King:
Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features. 309-313
Mining and Annotation of Spoken and Multimodal Resources
- Jindrich Matousek, Daniel Tihelka:
Anomaly-based annotation errors detection in TTS corpora. 314-318 - Katrin Schweitzer, Markus Gärtner, Arndt Riester, Ina Rösiger, Kerstin Eckart, Jonas Kuhn, Grzegorz Dogil:
Analysing automatic descriptions of intonation with ICARUS. 319-323 - Nancy F. Chen, Rong Tong, Darren Wee, Pei Xuan Lee, Bin Ma, Haizhou Li:
iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent. 324-328 - Ka-Ho Wong, Yu Ting Yeung, Edwin H. Y. Chan, Patrick C. M. Wong, Gina-Anne Levow, Helen M. Meng:
Development of a Cantonese dysarthric speech corpus. 329-333 - Harish Arsikere, Sonal Patil, Ranjeet Kumar, Kundan Shrivastava, Om Deshmukh:
Stylex: a corpus of educational videos for research on speaking styles and their impact on engagement and learning. 334-338 - Dogan Can, David C. Atkins, Shrikanth S. Narayanan:
A dialog act tagging approach to behavioral coding: a case study of addiction counseling conversations. 339-343 - Valentina Vapnarsky, Claude Barras, Cédric Becquey, David Doukhan, Martine Adda-Decker, Lori Lamel:
Analysing rhythm in ritual discourse in Yucatec Maya using automatic speech alignment. 344-348 - Madina Hasan, Rama Doddipatla, Thomas Hain:
Noise-matched training of CRF based sentence end detection models. 349-353 - Jianjing Kuang, Mark Liberman:
The effect of spectral slope on pitch perception. 354-358
Speech Production Data and Models
- Honghao Bao, Wenhuan Lu, Kiyoshi Honda, Jianguo Wei, Qiang Fang, Jianwu Dang:
Combined cine- and tagged-MRI for tracking landmarks on the tongue surface. 359-363 - Guillaume Barbier, Louis-Jean Boë, Guillaume Captier, Rafael Laboissière:
Human vocal tract growth: a longitudinal study of the development of various anatomical structures. 364-368 - Ganesh Sivaraman, Vikramjit Mitra, Mark K. Tiede, Elliot Saltzman, Louis Goldstein, Carol Y. Espy-Wilson:
Analysis of coarticulated speech using estimated articulatory trajectories. 369-373 - Guillaume Barbier, Pascal Perrier, Lucie Ménard, Yohan Payan, Mark K. Tiede, Joseph S. Perkell:
Speech planning in 4-year-old children versus adults: acoustic and articulatory analyses. 374-378 - Tokihiko Kaburagi:
Morphological and acoustic analysis of the vocal tract using a multi-speaker volumetric MRI dataset. 379-383 - Zisis Iason Skordilis, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan:
Experimental assessment of the tongue incompressibility hypothesis during speech production. 384-388
Deep Neural Networks in Language and Accent Recognition
- Radek Fér, Pavel Matejka, Frantisek Grézl, Oldrich Plchot, Jan Cernocký:
Multilingual bottleneck features for language recognition. 389-393 - Alan McCree, Daniel Garcia-Romero:
DNN senone MAP multinomial i-vectors for phonotactic language recognition. 394-397 - Yan Song, Xinhai Hong, Bing Jiang, Ruilian Cui, Ian McLoughlin, Li-Rong Dai:
Deep bottleneck network based i-vector representation for language identification. 398-402 - Alicia Lozano-Diez, Rubén Zazo-Candil, Javier Gonzalez-Dominguez, Doroteo T. Toledano, Joaquín González-Rodríguez:
An end-to-end approach to language identification in short utterances using convolutional neural networks. 403-407 - Ville Hautamäki, Sabato Marco Siniscalchi, Hamid Behravan, Valerio Mario Salerno, Ivan Kukanov:
Boosting universal speech attributes classification with deep neural network for foreign accent characterization. 408-412 - Wang Geng, Jie Li, Shanshan Zhang, Xinyuan Cai, Bo Xu:
Multilingual tandem bottleneck feature for language identification. 413-417
Speech Transmission
- Afsaneh Asaei, Milos Cernak, Hervé Bourlard:
On compressibility of neural network phonological features for low bit rate speech coding. 418-422 - Michal Lenarczyk:
Robust and accurate LSF location with Laguerre method. 423-427 - Jochen Issing, Nikolaus Färber, Reinhard German:
Interactivity-aware playout adaptation. 428-432 - Jochen Issing, Nikolaus Färber, Reinhard German:
Advanced time shrinking using a drop classifier based on codec features. 433-437 - Andrew Hines, Eoin Gillen, Naomi Harte:
Measuring and monitoring speech quality for voice over IP with POLQA, ViSQOL and P.563. 438-442 - Laura Fernández Gallardo, Sebastian Möller:
Towards the prediction of human speaker identification performance from measured speech quality. 443-447
Language Modeling for Conversational Speech
- Michael Levit, Andreas Stolcke, R. Subba, Sarangarajan Parthasarathy, Shuangyu Chang, S. Xie, T. Anastasakos, Benoît Dumoulin:
Personalization of word-phrase-entity language models. 448-452 - Akio Kobayashi, Manon Ichiki, Takahiro Oku, Kazuo Onoe, Shoei Sato:
Discriminative bilinear language modeling for broadcast transcriptions. 453-457 - Xi Ma, Xiaoxi Wang, Dong Wang, Zhiyong Zhang:
Recognize foreign low-frequency words with similar pairs. 458-462 - Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito:
Combinations of various language model technologies including data expansion and adaptation in spontaneous speech recognition. 463-467 - Petar S. Aleksic, Mohammadreza Ghodsi, Assaf Hurwitz Michaely, Cyril Allauzen, Keith B. Hall, Brian Roark, David Rybach, Pedro J. Moreno:
Bringing contextual information to Google speech recognition. 468-472 - Lucy Vasserman, Vlad Schogol, Keith B. Hall:
Sequence-based class tagging for robust transcription in ASR. 473-477
Interspeech 2015 Computational Paralinguistics ChallengE (ComParE): Degree of Nativeness, Parkinson's & Eating Condition (Special Session)
- Florian Hönig:
The degree of nativeness sub-challenge: the data. - Juan Rafael Orozco-Arroyave:
The Parkinson's condition sub-challenge: the data. - Anton Batliner:
The eating condition sub-challenge: the data. - Stefan Steidl:
The INTERSPEECH 2015 computational paralinguistics challenge: a summary of results. - Björn W. Schuller, Stefan Steidl, Anton Batliner, Simone Hantke, Florian Hönig, Juan Rafael Orozco-Arroyave, Elmar Nöth, Yue Zhang, Felix Weninger:
The INTERSPEECH 2015 computational paralinguistics challenge: nativeness, Parkinson's & eating condition. 478-482 - Claude Montacié, Marie-José Caraty:
Phrase accentuation verification and phonetic variation measurement for the degree of nativeness sub-challenge. 483-487 - Eugénio Ribeiro, Jaime Ferreira, Julia Olcoz, Alberto Abad, Helena Moniz, Fernando Batista, Isabel Trancoso:
Combining multiple approaches to predict the degree of nativeness. 488-492 - Matthew P. Black, Daniel Bone, Zisis Iason Skordilis, Rahul Gupta, Wei Xia, Pavlos Papadopoulos, Sandeep Nallan Chakravarthula, Bo Xiao, Maarten Van Segbroeck, Jangwon Kim, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales. 493-497 - David Sztahó, Gábor Kiss, Klára Vicsi:
Estimating the severity of Parkinson's disease from speech using linear regression and database partitioning. 498-502 - Alexander Zlotnik, Juan Manuel Montero, Rubén San Segundo, Ascensión Gallardo-Antolín:
Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features. 503-507 - Guozhen An, David Guy Brizan, Min Ma, Michelle Morales, Ali Raza Syed, Andrew Rosenberg:
Automatic recognition of unified Parkinson's disease rating from speech with acoustic, i-vector and phonotactic features. 508-512 - Seongjun Hahm, Jun Wang:
Parkinson's condition estimation using speech acoustic and inversely mapped articulatory data. 513-517 - James R. Williamson, Thomas F. Quatieri, Brian S. Helfer, Joseph Perricone, Satrajit S. Ghosh, Gregory A. Ciccarelli, Daryush D. Mehta:
Segment-dependent dynamics in predicting Parkinson's disease. 518-522
Pronunciation, Prosody and Audiovisual Features and Models
- S. M. Houghton, Colin J. Champion, Philip Weber:
Recognition of voiced sounds with a continuous state HMM. 523-527 - Xiangyu Zeng, Shi Yin, Dong Wang:
Learning speech rate in speech recognition. 528-532 - Guoguo Chen, Hainan Xu, Minhua Wu, Daniel Povey, Sanjeev Khudanpur:
Pronunciation and silence probability modeling for ASR. 533-537 - Marelie H. Davel, Etienne Barnard, Charl Johannes van Heerden, William Hartmann, Damianos G. Karakos, Richard M. Schwartz, Stavros Tsakalidis:
Exploring minimal pronunciation modeling for low resource languages. 538-542 - Hao Zheng, Zhanlei Yang, Liwei Qiao, Jianping Li, Wenju Liu:
Attribute knowledge integration for speech recognition based on multi-task learning neural networks. 543-547 - Etienne Marcheret, Gerasimos Potamianos, Josef Vopicka, Vaibhava Goel:
Detecting audio-visual synchrony using deep neural networks. 548-552 - Shahram Kalantari, David Dean, Houman Ghaemmaghami, Sridha Sridharan, Clinton Fookes:
Cross database training of audio-visual hidden Markov models for phone recognition. 553-557 - Shahram Kalantari, David Dean, Sridha Sridharan:
Incorporating visual information for spoken term detection. 558-562 - Hiroshi Ninomiya, Norihide Kitaoka, Satoshi Tamura, Yurie Iribe, Kazuya Takeda:
Integration of deep bottleneck features for audio-visual speech recognition. 563-567 - Sofoklis Kakouros, Okko Räsänen:
Automatic detection of sentence prominence in speech using predictability of word-level acoustic features. 568-572 - Milos Cernak, Pierre-Edouard Honnet:
An empirical model of emphatic word detection. 573-577 - Yishuang Ning, Zhiyong Wu, Xiaoyan Lou, Helen M. Meng, Jia Jia, Lianhong Cai:
Using tilt for automatic emphasis detection with Bayesian networks. 578-582
Speech Analysis and Representation 1-3
- Linxue Bai, Peter Jancovic, Martin J. Russell, Philip Weber:
Analysis of a low-dimensional bottleneck neural network representation of speech for modelling speech dynamics. 583-587 - Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose:
Statistical acoustic-to-articulatory mapping unified with speaker normalization based on voice conversion. 588-592 - Raghavendra Reddy Pappagari, Karthika Vijayan, K. Sri Rama Murty:
Analysis of features from analytic representation of speech using MP-ABX measures. 593-597 - Erfan Loweimi, Jon Barker, Thomas Hain:
Source-filter separation of speech signal in the phase domain. 598-602 - Ranniery Maia, Yannis Stylianou, Masami Akamine:
A maximum likelihood approach to the detection of moments of maximum excitation and its application to high-quality speech parameterization. 603-607 - Christopher Liberatore, Sandesh Aryal, Zelun Wang, Seth Polsley, Ricardo Gutierrez-Osuna:
SABR: sparse, anchor-based representation of the speech signal. 608-612 - Tamás Gábor Csapó, Géza Németh:
Automatic transformation of irregular to regular voice by residual analysis and synthesis. 613-617 - Simon Preuß, Peter Birkholz:
Optical sensor calibration for electro-optical stomatography. 618-622 - Kálmán Abari, Tamás Gábor Csapó, Bálint Pál Tóth, Gábor Olaszy:
From text to formants - indirect model for trajectory prediction based on a multi-speaker parallel speech database. 623-627 - Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi:
Layered nonnegative matrix factorization for speech separation. 628-632 - Catherine Laporte, Lucie Ménard:
Robust tongue tracking in ultrasound images: a multi-hypothesis approach. 633-637 - Danny Websdale, Thomas Le Cornu, Ben Milner:
Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation. 638-642
Speech Recognition - Technologies and Systems for New Applications
- Ann Lee, James R. Glass:
Mispronunciation detection without nonnative training data. 643-647 - Ramya Rasipuram, Milos Cernak, Alexandre Nanchen, Mathew Magimai-Doss:
Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities. 648-652 - Min Ma, Keelan Evanini, Anastassia Loukina, Xinhao Wang, Klaus Zechner:
Using F0 contours to assess nativeness in a sentence repeat task. 653-657 - Rebecca Lunsford, Peter A. Heeman:
Using linguistic indicators of difficulty to identify mild cognitive impairment. 658-662 - Lionel Fontan, Jérôme Farinas, Isabelle Ferrané, Julien Pinquier, Xavier Aumont:
Automatic intelligibility measures applied to speech signals simulating age-related hearing loss. 663-667 - Sandeep Nallan Chakravarthula, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou:
Assessing empathy using static and dynamic behavior models based on therapist's language in addiction counseling. 668-672 - Yuzong Liu, Rishabh K. Iyer, Katrin Kirchhoff, Jeff A. Bilmes:
SVitchboard II and FiSVer I: high-quality limited-complexity corpora of conversational English speech. 673-677 - Herman Kamper, Aren Jansen, Sharon Goldwater:
Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model. 678-682 - Ottokar Tilk, Tanel Alumäe:
LSTM for punctuation restoration in speech transcripts. 683-687 - Emre Yilmaz, Deepak Baby, Hugo Van hamme:
Noise robust exemplar matching for speech enhancement: applications to automatic speech recognition. 688-692 - Yingming Gao, Yanlu Xie, Wen Cao, Jinsong Zhang:
A study on robust detection of pronunciation erroneous tendency based on deep neural network. 693-696 - Shrikant Joshi, Nachiket Deo, Preeti Rao:
Vowel mispronunciation detection using DNN acoustic models with cross-lingual training. 697-701 - Kshitiz Kumar, Ziad Al Bawab, Yong Zhao, Chaojun Liu, Benoît Dumoulin, Yifan Gong:
Confidence-features and confidence-scores for ASR applications in arbitration and DNN speaker adaptation. 702-706 - Pengfei Liu, Shoaib Jameel, Wai Lam, Bin Ma, Helen M. Meng:
Topic modeling for conference analytics. 707-711 - Pulkit Sharma, Vinayak Abrol, Aroor Dinesh Dileep, Anil Kumar Sao:
Sparse coding based features for speech units classification. 712-715
Show and Tell Session 1-4 (Special Session)
- Andreea I. Niculescu, Ngoc Thuy Huong Thai, Chongjia Ni, Boon Pang Lim, Kheng Hui Yeo, Rafael E. Banchs:
Smarter driving with IDA, the intelligent driving assistant for Singapore. 716-717 - Kheng Hui Yeo, Rafael E. Banchs:
Talk it out: adding speech interaction to support informational and transactional applications on public touch-screen kiosks. 718-719 - Luis Fernando D'Haro, Seokhwan Kim, Rafael E. Banchs:
Conversational agent and management tools for conference and tourism domain. 720-721 - Askars Salimbajevs, Jevgenijs Strigins:
Latvian speech-to-text transcription service. 722-723 - Jakub Galka, Joanna Grzybowska, Magdalena Igras, Pawel Jaciów, Kamil Wajda, Marcin Witkowski, Mariusz Ziólko:
System supporting speaker identification in emergency call center. 724-725 - Ahmed Abdelali, Ahmed M. Ali, Francisco Guzmán, Felix Stahlberg, Stephan Vogel, Yifan Zhang:
QAT2 - the QCRI advanced transcription and translation system. 726-727 - Michael Stadtschnitzer, Christoph Schmidt:
Implementation of a live dialectal media subtitling system. 728-729 - Peter Bell, Catherine Lai, Clare Llewellyn, Alexandra Birch, Mark Sinclair:
A system for automatic broadcast news summarisation, geolocation and translation. 730-731 - Arturs Znotins, Kaspars Polis, Roberts Dargis:
Media monitoring system for Latvian radio and TV broadcasts. 732-733 - Michel Assayag, Jonathan Huang, Jonathan Mamou, Oren Pereg, Saurav Sahay, Oren Shamir, Georg Stemmer, Moshe Wasserblat:
Meeting assistant application. 734-735
Distant and Reverberant Speech Recognition
- Kousuke Itakura, Izaya Nishimuta, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii:
Bayesian integration of sound source separation and speech recognition: a new approach to simultaneous speech recognition. 736-740 - Ivan Himawan, Petr Motlícek, Sridha Sridharan, David Dean, Dian Tjondronegoro:
Channel selection in the short-time modulation domain for distant speech recognition. 741-745 - Gert Dekkers, Toon van Waterschoot, Bart Vanrumste, Bert Van Den Broeck, Jort F. Gemmeke, Hugo Van hamme, Peter Karsmakers:
A multi-channel speech enhancement framework for robust NMF-based speech recognition for speech-impaired users. 746-750 - Chanwoo Kim, Kean K. Chin:
Sound source separation algorithm using phase difference and angle distribution modeling near the target. 751-755 - Mirco Ravanelli, Maurizio Omologo:
Contaminated speech training methods for robust DNN-HMM distant speech recognition. 756-760 - Yajie Miao, Florian Metze:
Distance-aware DNNs for robust speech recognition. 761-765
Speech Analysis and Representation 1-3
- Christophe Mertens, Francis Grenez, François Viallet, Alain Ghio, Sabine Skodda, Jean Schoentgen:
Vocal tremor analysis via AM-FM decomposition of empirical modes of the glottal cycle length time series. 766-770 - Elizabeth Godoy, Nicolas Malyska, Thomas F. Quatieri:
Estimating lower vocal tract features with closed-open phase spectral analyses. 771-775 - S. M. Houghton, Colin J. Champion:
Inductive implementation of segmental HMMs as CS-HMMs. 776-780 - G. Nisha Meenakshi, Prasanta Kumar Ghosh:
A discriminative analysis within and across voiced and unvoiced consonants in neutral and whispered speech in multiple Indian languages. 781-785 - T. J. Tsai, Andreas Stolcke:
Aligning meeting recordings via adaptive fingerprinting. 786-790 - Matthias Zöhrer, Robert Peharz, Franz Pernkopf:
On representation learning for artificial bandwidth extension. 791-795
L2 Speech Perception and Production
- Helena Levy:
Perception and production of vowel contrasts in German learners of English. 796-800 - Rong Tong, Nancy F. Chen, Bin Ma, Haizhou Li:
Goodness of tone (GOT) for non-native Mandarin tone recognition. 801-805 - Jeanin Jügler, Frank Zimmerer, Bernd Möbius, Christoph Draxler:
The effect of high-variability training on the perception and production of French stops by German native speakers. 806-810 - Wenfu Bao, Hui Feng, Jianwu Dang, Zhilei Liu, Yang Yu, Siyu Wang:
Perception of Mandarin tones by native Tibetan speakers. 811-814 - Shambhu Nath Saha, Shyamal Kr. Das Mandal:
Study of acoustic correlates of English lexical stress produced by native (L1) Bengali speakers compared to native (L1) English speakers. 815-819 - Yasuko Nagano-Madsen:
Prosodic phrasing unique to the acquisition of L2 intonation - an analysis of L2 Japanese intonation by L1 Swedish learners. 820-823
Information and Metadata Extraction from Speech
- Leda Sari, Batuhan Gündogdu, Murat Saraçlar:
Fusion of LVCSR and posteriorgram based keyword search. 824-828 - Gideon Mendels, Erica Cooper, Victor Soto, Julia Hirschberg, Mark J. F. Gales, Kate M. Knill, Anton Ragni, Haipeng Wang:
Improving speech recognition and keyword search for low resource languages using web data. 829-833 - Kentaro Domoto, Takehito Utsuro, Naoki Sawada, Hiromitsu Nishizaki:
Two-step spoken term detection using SVM classifier trained with pre-indexed keywords based on ASR result. 834-838 - Le Zhang, Damianos G. Karakos, William Hartmann, Roger Hsiao, Richard M. Schwartz, Stavros Tsakalidis:
Enhancing low resource keyword spotting with automatically retrieved web documents. 839-843 - Dario Bertero, Linlin Wang, Ho Yin Chan, Pascale Fung:
A comparison between a DNN and a CRF disfluency detection and reconstruction system. 844-848 - Julian Hough, David Schlangen:
Recurrent neural networks for incremental disfluency detection. 849-853
Deep Neural Networks for Speech Synthesis
- Qiong Hu, Zhizheng Wu, Korin Richmond, Junichi Yamagishi, Yannis Stylianou, Ranniery Maia:
Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning. 854-858 - Sivanand Achanta, Tejas Godambe, Suryakanth V. Gangashetty:
An investigation of recurrent neural network architectures for statistical parametric speech synthesis. 859-863 - Yuchen Fan, Yao Qian, Frank K. Soong, Lei He:
Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis. 864-868 - Cassia Valentini-Botinhao, Zhizheng Wu, Simon King:
Towards minimum perceptual error training for DNN-based speech synthesis. 869-873 - Eunwoo Song, Hong-Goo Kang:
Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model. 874-878 - Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King:
A study of speaker adaptation for DNN-based speech synthesis. 879-883
Interspeech 2015 Computational Paralinguistics ChallengE (ComParE): Degree of Nativeness, Parkinson's & Eating Condition (Special Session)
- Abhay Prasad, Prasanta Kumar Ghosh:
Automatic classification of eating conditions from speech using acoustic feature selection and a set of hierarchical support vector machine classifiers. 884-888 - Johannes Wagner, Andreas Seiderer, Florian Lingenfelser, Elisabeth André:
Combining hierarchical classification with frequency weighting for the recognition of eating conditions. 889-893 - Dara Pir, Theodore Brown:
Acoustic group feature selection using wrapper method for automatic eating condition recognition. 894-898 - Thomas Pellegrini:
Comparing SVM, softmax, and shallow neural networks for eating condition classification. 899-903 - Benjamin Milde, Chris Biemann:
Using representation learning and out-of-domain data for a paralinguistic speech task. 904-908 - Heysem Kaya, Alexey Karpov, Albert Ali Salah:
Fisher vectors with cascaded normalization for paralinguistic analysis. 909-913 - Jangwon Kim, Md. Nasir, Rahul Gupta, Maarten Van Segbroeck, Daniel Bone, Matthew P. Black, Zisis Iason Skordilis, Zhaojun Yang, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Automatic estimation of Parkinson's disease severity from diverse speech tasks. 914-918 - Tamás Grósz, Róbert Busa-Fekete, Gábor Gosztolya, László Tóth:
Assessing the degree of nativeness and Parkinson's condition using Gaussian processes and deep rectifier neural networks. 919-923
Prosody 1-3
- Jan Michalsky:
Pitch scaling as a perceptual cue for questions in German. 924-928 - Uwe D. Reichel, Katalin Mády, Stefan Benus:
Parameterization of prosodic headedness. 929-933 - Biswajit Dev Sarma, Priyankoo Sarmah, Wendy Lalhminghlui, S. R. Mahadeva Prasanna:
Detection of Mizo tones. 934-937 - Sophie Repp, Lena Rosin:
The intonation of echo wh-questions. 938-942 - Farhat Jabeen, Tina Bögel, Miriam Butt:
Immediately postverbal questions in Urdu. 943-947 - Katalin Mády:
Prosodic (non-)realisation of broad, narrow and contrastive focus in Hungarian: a production and a perception study. 948-952 - Stefan Benus, Uwe D. Reichel, Juraj Simko:
F0 discontinuity as a marker of prosodic boundary strength in Lombard speech. 953-957 - Cédric Gendrot, Martine Adda-Decker, Yaru Wu:
Comparing journalistic and spontaneous speech: prosodic and spectral analysis. 958-962 - Nadja Schauffler, Katrin Schweitzer:
Rhythm influences the tonal realisation of focus. 963-967 - Bistra Andreeva, Bernd Möbius, Grazyna Demenko, Frank Zimmerer, Jeanin Jügler:
Linguistic measures of pitch range in Slavic and Germanic languages. 968-972 - Chunan Qiu, Jie Liang:
The effect of stress on vowel space in Daxi Hakka Chinese. 973-977 - Maria O'Reilly, Ailbhe Ní Chasaide:
Declination, peak height and pitch level in declaratives and questions of South Connaught Irish. 978-982 - Priyankoo Sarmah, Leena Dihingia, Wendy Lalhminghlui:
Contextual variation of tones in Mizo. 983-986 - Daniela Wochner, Jana Schlegel, Nicole Dehé, Bettina Braun:
The prosodic marking of rhetorical questions in German. 987-991
Speaker and Language Recognition
- Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification. 992-996 - Saad Irtza, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah, Haizhou Li:
Phonemes frequency based PLLR dimensionality reduction for language recognition. 997-1001 - Sandro Cumani, Oldrich Plchot, Radek Fér:
Exploiting i-vector posterior covariances for short-duration language recognition. 1002-1006 - Athanasios Lykartsis, Stefan Weinzierl:
Using the beat histogram for speech rhythm description and language identification. 1007-1011 - Rahim Saeidi, Tuija Niemi, Hanna Karppelin, Jouni Pohjalainen, Tomi Kinnunen, Paavo Alku:
Speaker recognition for speech under face cover. 1012-1016 - Md. Hafizur Rahman, Ahilan Kanagasundaram, David Dean, Sridha Sridharan:
Dataset-invariant covariance normalization for out-domain PLDA speaker verification. 1017-1021 - Longting Xu, Kong-Aik Lee, Haizhou Li, Zhen Yang:
Sparse coding of total variability matrix. 1022-1026 - Weicheng Cai, Ming Li, Lin Li, Qingyang Hong:
Duration dependent covariance regularization in PLDA modeling for speaker verification. 1027-1031 - Hagai Aronowitz:
Exploiting supervector structure for speaker recognition trained on a small development set. 1032-1036 - Qingyang Hong, Lin Li, Ming Li, Ling Huang, Lihong Wan, Jun Zhang:
Modified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition system. 1037-1041 - Sarfaraz Jelil, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna:
Speaker verification using Gaussian posteriorgrams on fixed phrase short utterances. 1042-1046 - Laura Fernández Gallardo, Sebastian Möller, Michael Wagner:
Importance of intelligible phonemes for human speaker recognition in different channel bandwidths. 1047-1051 - Hitoshi Yamamoto, Takafumi Koshinaka:
Denoising autoencoder-based speaker feature restoration for utterances of short duration. 1052-1056 - Dayana Ribas, Emmanuel Vincent, José Ramón Calvo de Lara:
Full multicondition training for robust i-vector based speaker recognition. 1057-1061
Show and Tell Session 1-4 (Special Session)
- Bartosz Ziółko, Tomasz Jadczyk, Dawid Skurzok, Piotr Żelasko, Jakub Gałka, Tomasz Pędzimąż, Ireneusz Gawlik, Szymon Piotr Pałka:
SARMATA 2.0 automatic Polish language speech recognition system. 1062-1063 - Arlo Faria, Korbinian Riedhammer:
Remeeting - get more out of meetings. 1064-1065 - Ikuyo Masuda-Katsuse:
Web application system for pronunciation practice by children with disabilities and to support cooperation of teachers and medical workers. 1066-1067 - Caroline Kaufhold, Vadim Gamidov, Andreas Kießling, Klaus Reinhard, Elmar Nöth:
PATSY - it's all about pronunciation! 1068-1069 - Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander A. Petrovsky:
Real-time pitch modification system for speech and singing voice. 1070-1071 - Guillaume Dubuisson Duplessis, Lucile Bechade, Mohamed A. Sehili, Agnès Delaborde, Vincent Letard, Anne-Laure Ligozat, Paul Deléglise, Yannick Estève, Sophie Rosset, Laurence Devillers:
Nao is doing humour in the CHIST-ERA joker project. 1072-1073 - Lisa Lange, Bartholomäus Pfeiffer, Daniel Duran:
ABIMS - auditory bewildered interaction measurement system. 1074-1075
Neural Networks and Speaker Adaptation
- Zhen Huang, Sabato Marco Siniscalchi, I-Fan Chen, Jinyu Li, Jiadong Wu, Chin-Hui Lee:
Maximum a posteriori adaptation of network parameters in deep models. 1076-1080 - Yan Huang, Yifan Gong:
Regularized sequence-level deep neural network model adaptation. 1081-1085 - Xiangang Li, Xihong Wu:
Modeling speaker variability using long short-term memory networks for speech recognition. 1086-1090 - Kshitiz Kumar, Chaojun Liu, Kaisheng Yao, Yifan Gong:
Intermediate-layer DNN adaptation for offline and session-based iterative speaker adaptation. 1091-1095 - Murali Karthick B, Prateek Kolhar, Srinivasan Umesh:
Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM. 1096-1100 - Yajie Miao, Florian Metze:
On speaker adaptation of long short-term memory recurrent neural networks. 1101-1105
Brain- and Other Biosignal-based Spoken Communication
- Emilio Parisotto, Youness Aliyari Ghassabeh, Matt J. MacDonald, Adelina Cozma, Elizabeth W. Pang, Frank Rudzicz:
Automatic identification of received language in MEG. 1106-1110 - Laurens van der Werff, Jón Guðnason, Kamilla Rún Jóhannsdóttir:
Detection of cardiovascular reactivity in speech. 1111-1115 - Alex Francois-Nienaber, Jed A. Meltzer, Frank Rudzicz:
Lateralization in emotional speech perception following transcranial direct current stimulation. 1116-1120 - Minda Yang, Sameer A. Sheth, Catherine A. Schevon, Guy M. McKhann II, Nima Mesgarani:
Speech reconstruction from human auditory cortex with deep neural networks. 1121-1125 - Jonathan S. Brumberg, Nichol Castro, Akshatha Rao:
Temporal dynamics of the speech readiness potential, and its use in a neural decoder of speech-motor intention. 1126-1130 - Dominic Heger, Christian Herff, Adriana de Pesters, Dominic Telaar, Peter Brunner, Gerwin Schalk, Tanja Schultz:
Continuous speech recognition from ECoG. 1131-1135
Deep Neural Networks in Speaker Recognition
- Yu-hsin Chen, Ignacio López-Moreno, Tara N. Sainath, Mirkó Visontai, Raziel Alvarez, Carolina Parada:
Locally-connected and convolutional neural networks for small footprint speaker recognition. 1136-1140 - Daniel Garcia-Romero, Alan McCree:
Insights into deep neural networks for speaker recognition. 1141-1145 - Fred Richardson, Douglas A. Reynolds, Najim Dehak:
A unified deep neural network for speaker and language recognition. 1146-1150 - Yao Tian, Meng Cai, Liang He, Jia Liu:
Investigation of bottleneck features and multilingual deep neural networks for speaker verification. 1151-1155 - Hua Xing, Gang Liu, John H. L. Hansen:
Frequency offset correction in single sideband (SSB) speech by deep neural network for speaker verification. 1156-1160 - Hao Zheng, Shanshan Zhang, Wenju Liu:
Exploring robustness of DNN/RNN for extracting speaker Baum-Welch statistics in mismatched conditions. 1161-1165
Speech Analysis and Representation 1-3
- Dhananjaya N. Gowda, Rahim Saeidi, Paavo Alku:
AM-FM based filter bank analysis for estimation of spectro-temporal envelopes and its application for speaker recognition in noisy reverberant environments. 1166-1170 - Thomas Drugman, Yannis Stylianou:
Fast and accurate phase unwrapping. 1171-1175 - Xugang Lu, Peng Shen, Yu Tsao, Chiori Hori, Hisashi Kawai:
Sparse representation with temporal max-smoothing for acoustic event detection. 1176-1180 - Rachel G. Anushiya, P. Vijayalakshmi, T. Nagarajan:
Estimation of glottal closure instants from telephone speech using a group delay-based approach that considers speech signal as a spectrum. 1181-1185 - Raúl Montaño, Francesc Alías:
The role of prosody and voice quality in text-dependent categories of storytelling across languages. 1186-1190 - Alexandre Hyafil, Milos Cernak:
Neuromorphic based oscillatory device for incremental syllable boundary detection. 1191-1195
Statistical Parametric Speech Synthesis
- Takenori Yoshimura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
Simultaneous optimization of multiple tree structures for factor analyzed HMM-based speech synthesis. 1196-1200 - Maël Pouget, Thomas Hueber, Gérard Bailly, Timo Baumann:
HMM training strategy for incremental speech synthesis. 1201-1205 - Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura:
Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis. 1206-1210 - Alan W. Black, Prasanna Kumar Muthukumar:
Random forests for statistical speech synthesis. 1211-1215 - Doo Hwa Hong, Joun Yeop Lee, Se Young Jang, Nam Soo Kim:
Speaker adaptation using relevance vector regression for HMM-based expressive TTS. 1216-1220 - Vassilios Tsiaras, Ranniery Maia, Vassilios Diakoloukas, Yannis Stylianou, Vassilios Digalakis:
Towards a linear dynamical model based speech synthesizer. 1221-1225
Speech Science in End-user Applications (Special Session)
- Céline De Looze, Brian Vaughan, Finnian Kelly, Alison M. Kay:
Providing objective metrics of team communication skills via interpersonal coordination mechanisms. 1226-1230 - Donghyeon Lee, Jinsik Lee, Eun-Kyoung Kim, Jaewon Lee:
Dialog act modeling for virtual personal assistant applications using a small volume of labeled data and domain knowledge. 1231-1235 - Csaba Zainkó, Mátyás Bartalis, Géza Németh, Gábor Olaszy:
A polyglot domain optimised text-to-speech system for railway station announcements. 1236-1240 - Partho Mandal, Shalini Jain, Gaurav Ojha, Anupam Shukla:
Development of Hindi speech recognition system of agricultural commodities using deep neural network. 1241-1245 - Thomas Fehér, Michael Freitag, Christian Gruber:
Real-time audio signal enhancement for hands-free speech applications. 1246-1250 - Daniel Erro, Inma Hernáez, Agustín Alonso, D. García-Lorenzo, Eva Navas, Jianpei Ye, Haritz Arzelus, Igor Jauk, Nguyen Quy Hy, Carmen Magariños, R. Pérez-Ramón, M. Sulír, Xiaohai Tian, X. Wang:
Personalized synthetic voices for speaking impaired: website and app. 1251-1254
Speech Recognition: Evaluation and Low-resource Languages
- Reza Sahraeian, Dirk Van Compernolle, Febe de Wet:
Under-resourced speech recognition based on the speech manifold. 1255-1259 - Pavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney:
Multilingual features based keyword search for very low-resource languages. 1260-1264 - Xiaoyun Wang, Seiichi Yamamoto:
Second language speech recognition using multiple-pass decoding with lexicon represented by multiple reduced phoneme sets. 1265-1269 - Sarah Flora Samson Juan, Laurent Besacier, Benjamin Lecouteux, Mohamed Dyab:
Using resources from a closely-related language to develop ASR for a very under-resourced language: a case study for Iban. 1270-1274 - Maxim L. Korenevsky, Andrey B. Smirnov, Valentin S. Mendelev:
Prediction of speech recognition accuracy for utterance classification. 1275-1279 - Eugen Beck, Ralf Schlüter, Hermann Ney:
Error bounds for context reduction and feature omission. 1280-1284 - Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana, Masafumi Nishimura:
A metric for evaluating speech recognizer output based on human-perception model. 1285-1288 - Mohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset:
How to evaluate ASR output for named entity recognition? 1289-1293
Emotion 1, 2
- Hansjörg Mixdorff, Angelika Hönemann, Albert Rilliard:
Acoustic-prosodic analysis of attitudinal expressions in German. 1294-1298 - Hossein Khaki, Engin Erzin:
Continuous emotion tracking using total variability space. 1299-1303 - Chi-Chun Lee, Daniel Bone, Shrikanth S. Narayanan:
An analysis of the relationship between signal-derived vocal arousal score and human emotion production and perception. 1304-1308 - Hiroki Mori:
Morphology of vocal affect bursts: exploring expressive interjections in Japanese conversation. 1309-1313 - Mahnoosh Mehrabani, Ozlem Kalinli, Ruxin Chen:
Emotion clustering based on probabilistic linear discriminant analysis. 1314-1318 - Aaron Albin, Elliot Moore:
Objective study of the performance degradation in emotion recognition through the AMR-WB+ codec. 1319-1323 - Sudarsana Reddy Kadiri, P. Gangamohan, Suryakanth V. Gangashetty, Bayya Yegnanarayana:
Analysis of excitation source features of speech for emotion recognition. 1324-1328 - Zhaocheng Huang, Julien Epps, Eliathamby Ambikairajah:
An investigation of emotion change detection from speech. 1329-1333 - Wentao Gu, Ping Tang, Keikichi Hirose, Véronique Aubergé:
Crosslinguistic comparison on the perception of Mandarin attitudinal speech. 1334-1338 - Gábor Gosztolya:
Conflict intensity estimation from speech using greedy forward-backward feature selection. 1339-1343
Spoken Language Understanding 1-3
- Imran A. Sheikh, Irina Illina, Dominique Fohr:
Study of entity-topic models for OOV proper name retrieval. 1344-1348 - Simon Boutin, Réal Tremblay, Patrick Cardinal, Doug Peters, Pierre Dumouchel:
Audio quotation marks for natural language understanding. 1349-1352 - Xiaohao Yang, Jia Liu:
Using word confusion networks for slot filling in spoken language understanding. 1353-1357 - Justin T. Chiu, Yajie Miao, Alan W. Black, Alexander I. Rudnicky:
Distributed representation-based spoken word sense induction. 1358-1362 - Sheng-syun Shen, Hung-yi Lee, Shang-wen Li, Victor Zue, Lin-Shan Lee:
Structuring lectures in massive open online courses (MOOCs) for efficient learning by linking similar sections and predicting prerequisites. 1363-1367 - Delphine Charlet, Géraldine Damnati, Jérémy Trione:
News talk-show chaptering with journalistic genres. 1368-1372 - Vikram Ramanarayanan, Lei Chen, Chee Wee Leong, Gary Feng, David Suendermann-Oeft:
An analysis of time-aggregated and time-series features for scoring different aspects of multimodal presentation data. 1373-1377 - David Nicolas Racca, Gareth J. F. Jones:
Incorporating prosodic prominence evidence into term weights for spoken content retrieval. 1378-1382 - Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen:
Leveraging word embeddings for spoken document summarization. 1383-1387 - Vincent Renkens, Hugo Van hamme:
Mutually exclusive grounding for weakly supervised non-negative matrix factorisation. 1388-1392 - Emanuele Bastianelli, Danilo Croce, Roberto Basili, Daniele Nardi:
Using semantic maps for robust natural language interaction with robots. 1393-1397 - Yi Luan, Shinji Watanabe, Bret Harsham:
Efficient learning for spoken language understanding tasks with word embedding based pre-training. 1398-1402 - Emmanuel Ferreira, Bassam Jabaian, Fabrice Lefèvre:
Zero-shot semantic parser for spoken language understanding. 1403-1407 - Jérémie Tafforeau, Thierry Artières, Benoît Favre, Frédéric Béchet:
Adapting lexical representation and OOV handling from written to spoken language with word embedding. 1408-1412
Language Modeling for Speech Recognition
- Ebru Arisoy, Murat Saraçlar:
Multi-stream long short-term memory neural network language model. 1413-1417 - Keith B. Hall, Eunjoon Cho, Cyril Allauzen, Françoise Beaufays, Noah Coccaro, Kaisuke Nakajima, Michael Riley, Brian Roark, David Rybach, Linda Zhang:
Composition-based on-the-fly rescoring for salient n-gram biasing. 1418-1422 - Alex Marin, Mari Ostendorf, Ji He:
Learning phrase patterns for ASR name error detection using semantic similarity. 1423-1427 - Noam Shazeer, Joris Pelemans, Ciprian Chelba:
Sparse non-negative matrix language modeling for skip-grams. 1428-1432 - Joris Pelemans, Noam Shazeer, Ciprian Chelba:
Pruning sparse non-negative matrix n-gram language models. 1433-1437 - Ciprian Chelba, Xuedong Zhang, Keith B. Hall:
Geo-location for voice search language modeling. 1438-1442 - Rami Botros, Kazuki Irie, Martin Sundermeyer, Hermann Ney:
On efficient training of word classes and their application to recurrent neural network language models. 1443-1447 - Ali Orkan Bayer, Giuseppe Riccardi:
Deep semantic encodings for language modeling. 1448-1452 - Ming Sun, Yun-Nung Chen, Alexander I. Rudnicky:
Learning OOV through semantic relatedness in spoken dialog systems. 1453-1457 - Tze Yuang Chong, Rafael E. Banchs, Engsiong Chng, Haizhou Li:
TDTO language modeling with feedforward neural networks. 1458-1462
Fast, Efficient and Scalable Computing for Neural Nets
- Matthias Paulik:
Improvements to the pruning behavior of DNN acoustic models. 1463-1467 - Hasim Sak, Andrew W. Senior, Kanishka Rao, Françoise Beaufays:
Fast and accurate recurrent neural network acoustic models for speech recognition. 1468-1472 - Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada:
Compressing deep neural networks using a rank-constrained topology. 1473-1477 - Tara N. Sainath, Carolina Parada:
Convolutional neural networks for small-footprint keyword spotting. 1478-1482 - Ewout van den Berg, Daniel Brand, Rajesh Bordawekar, Leonid Rachevsky, Bhuvana Ramabhadran:
Efficient GPU implementation of convolutional neural networks for speech recognition. 1483-1487 - Nikko Strom:
Scalable distributed DNN training using commodity GPU cloud computing. 1488-1492
Source Separation and Computational Auditory Scene Analysis
- Sachin N. Kalkur, Sandeep Reddy C, Rajesh M. Hegde:
Joint source localization and separation in spherical harmonic domain using a sparsity based method. 1493-1497 - Shaofei Zhang, Dong-Yan Huang, Lei Xie, Engsiong Chng, Haizhou Li, Minghui Dong:
Regularized non-negative matrix factorization using alternating direction method of multipliers and its application to source separation. 1498-1502 - Shuai Nie, Shan Liang, Wei Xue, Xueliang Zhang, Wenju Liu, Like Dong, Hong Yang:
Two-stage multi-target joint learning for monaural speech separation. 1503-1507 - Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee:
Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement. 1508-1512 - Kisoo Kwon, Jong Won Shin, Hyung Yong Kim, Nam Soo Kim:
Discriminative nonnegative matrix factorization using cross-reconstruction error for source separation. 1513-1516 - Faheem Khan, Ben Milner:
Using audio and visual information for single channel speaker separation. 1517-1521
Emotion 1, 2
- Chee Seng Chong, Jeesun Kim, Chris Davis:
Exploring acoustic differences between Cantonese (tonal) and English (non-tonal) spoken expressions of emotions. 1522-1526 - Elisavet Palogiannidi, Elias Iosif, Polychronis Koutsakis, Alexandros Potamianos:
Valence, arousal and dominance estimation for English, German, Greek, Portuguese and Spanish lexica using semantic models. 1527-1531 - Xinzhou Xu, Jun Deng, Wenming Zheng, Li Zhao, Björn W. Schuller:
Dimensionality reduction for speech emotion features by multiscale kernels. 1532-1536 - Jinkyu Lee, Ivan Tashev:
High-level feature representation using recurrent neural network for speech emotion recognition. 1537-1540 - Myung Jong Kim, Joohong Yoo, Younggwan Kim, Hoirin Kim:
Speech emotion classification using tree-structured sparse logistic regression. 1541-1545 - Bogdan Vlasenko, Andreas Wendemuth:
Annotators' agreement and spontaneous emotion classification performance. 1546-1550
Computational Models of Human Speech Perception
- Harald Höge:
On the nature of the features generated in the human auditory pathway for phone recognition. 1551-1555 - Kodai Yamamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara, Roy D. Patterson:
How the slope of the speech spectrum affects the perception of speaker size. 1556-1560 - Heikki Rasilo, Okko Räsänen:
Weakly-supervised word learning is improved by an active online algorithm. 1561-1565 - Lin Lin, Jon Barker, Guy J. Brown:
The effect of cochlear implant processing on speaker intelligibility: a perceptual study and computer model. 1566-1570 - Mengxue Cao, Aijun Li, Qiang Fang, Bernd J. Kröger:
Phonetic-phonological feature emerges by associating phonetic with semantic information - a GSOM-based modeling study. 1571-1575 - Louis ten Bosch, Lou Boves, Benjamin V. Tucker, Mirjam Ernestus:
DIANA: towards computational modeling reaction times in lexical decision in North American English. 1576-1580
Prosody Modeling for Speech Synthesis
- Qian Chen, Zhen-Hua Ling, Chen-Yu Yang, Li-Rong Dai:
Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions. 1581-1585 - Manuel Sam Ribeiro, Junichi Yamagishi, Robert A. J. Clark:
A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis. 1586-1590 - Decha Moungsri, Tomoki Koriyama, Takao Kobayashi:
Duration prediction using multi-level model for GPR-based speech synthesis. 1591-1595 - Mahsa Sadat Elyasi Langarani, Jan P. H. van Santen, Seyed Hamidreza Mohammadi, Alexander Kain:
Data-driven foot-based intonation generator for text-to-speech synthesis. 1596-1600 - Branislav Gerazov, Pierre-Edouard Honnet, Aleksandar Gjoreski, Philip N. Garner:
Weighted correlation based atom decomposition intonation modelling. 1601-1605 - Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory:
Using deep bidirectional recurrent neural networks for prosodic-target prediction in a unit-selection text-to-speech system. 1606-1610
Speech and Language Processing of Children's Speech (Special Session)
- Hank Liao, Golan Pundak, Olivier Siohan, Melissa K. Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew W. Senior, Françoise Beaufays, Michiel Bacchiani:
Large vocabulary automatic speech recognition for children. 1611-1615 - Daniel Bone, Matthew P. Black, Anil Ramakrishna, Ruth B. Grossman, Shrikanth S. Narayanan:
Acoustic-prosodic correlates of 'awkward' prosody in story retellings from adolescents with autism. 1616-1620 - Eva Fringi, Jill Fain Lehman, Martin J. Russell:
Evidence of phonological processes in automatic recognition of children's speech. 1621-1624 - Michael Pucher, Markus Toman, Dietmar Schabus, Cassia Valentini-Botinhao, Junichi Yamagishi, Bettina Zillinger, Erich Schmid:
Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games. 1625-1629 - Syed Shahnawazuddin, Rohit Sinha:
Low-memory fast on-line adaptation for acoustically mismatched children's speech recognition. 1630-1634 - Diego Giuliani, Bagher BabaAli:
Large vocabulary children's speech recognition with DNN-HMM and SGMM acoustic modeling. 1635-1639 - Avashna Govender, Febe de Wet, Jules-Raymond Tapamo:
HMM adaptation for child speech synthesis. 1640-1644 - Jaebok Kim, Khiet P. Truong, Vicky Charisi, Cristina Zaga, Manja Lohse, Dirk Heylen, Vanessa Evers:
Vocal turn-taking patterns in groups of children performing collaborative tasks: an exploratory study. 1645-1649 - Roozbeh Sadeghian, Stephen A. Zahorian:
Towards an automated screening tool for pediatric speech delay. 1650-1654 - Jorge Proença, Dirce Celorico, Sara Candeias, Carla Lopes, Fernando Perdigão:
Children's reading aloud performance: a database and automatic detection of disfluencies. 1655-1659 - Sundar Harshavardhan, Jill Fain Lehman, Rita Singh:
Keyword spotting in multi-player voice driven games for children. 1660-1664 - Jinxi Guo, Rohit Paturi, Gary Yeung, Steven M. Lulich, Harish Arsikere, Abeer Alwan:
Age-dependent height estimation and speaker normalization for children's speech using the first three subglottal resonances. 1665-1669
Syllables and Segments 1, 2
- Adrian Leemann, Camilla Bernardasci, Francis Nolan:
The effect of speakers' regional varieties on listeners' decision-making. 1670-1674 - Robert Fuchs:
Word-initial glottal stop insertion, hiatus resolution and linking in British English. 1675-1679 - Shanpeng Li, Wentao Gu:
Acoustic analysis of Mandarin affricates. 1680-1684 - Hannah Leykum, Sylvia Moosmüller, Wolfgang U. Dressler:
Homophonous phonotactic and morphonotactic consonant clusters in word-final position. 1685-1689 - Mark Gibson, Ana María Fernández Planas, Adamantios I. Gafos, Emily Remirez:
Consonant duration and VOT as a function of syllable complexity and voicing in a sub-set of Spanish clusters. 1690-1694 - Takayuki Arai:
Hands-on tool producing front vowels for phonetic education: aiming for pronunciation training with tactile sensation. 1695-1699 - Indranil Dutta, Ayushi Pandey:
Acoustics of articulatory constraints: vowel classification and nasalization. 1700-1704 - Janina Kraus:
Voice-conditioned allophones of MOUTH and PRICE in Bahamian Creole. 1705-1709 - Marie-José Kolly, Adrian Leemann, Florian Matter:
Analysis of spatial variation with app-based crowdsourced audio data. 1710-1714 - Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik:
Confusability in L2 vowels: analyzing the role of different features. 1715-1719 - Frank Zimmerer, Jürgen Trouvain:
Perception of French speakers' German vowels. 1720-1724 - Jagoda Bruni, Daniel Duran, Grzegorz Dogil:
Unintuitive phonetic behavior in Tswana post-nasal stops. 1725-1729
Speech Enhancement
- Chun Hoy Wong, Tan Lee, Yu Ting Yeung, Pak-Chung Ching:
Modeling temporal dependency for robust estimation of LP model parameters in speech enhancement. 1730-1734 - Colin Vaz, Shrikanth S. Narayanan:
Learning a speech manifold for signal subspace speech denoising. 1735-1739 - Samy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt:
An iterative speech model-based a priori SNR estimator. 1740-1744 - Xiao-Lei Zhang, DeLiang Wang:
Multi-resolution stacking for speech separation based on boosted DNN. 1745-1749 - Sidsel Marie Nørholm, Martin Krawczyk-Becker, Timo Gerkmann, Steven van de Par, Jesper Rindom Jensen, Mads Græsbøll Christensen:
Least squares estimate of the initial phases in STFT based speech enhancement. 1750-1754 - Sidsel Marie Nørholm, Jesper Rindom Jensen, Mads Græsbøll Christensen:
Enhancement of non-stationary speech using harmonic chirp filters. 1755-1759 - Keisuke Kinoshita, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani:
Text-informed speech enhancement with deep neural networks. 1760-1764 - Shogo Masaya, Masashi Unoki:
Complex tensor factorization in modulation frequency domain for single-channel speech enhancement. 1765-1769 - Hyeonjoo Kang, JeeSok Lee, Soonho Baek, Hong-Goo Kang:
Systematic integration of acoustic echo canceller and noise reduction modules for voice communication systems. 1770-1774 - Chul Min Lee, Jong Won Shin, Nam Soo Kim:
DNN-based residual echo suppression. 1775-1779 - Qi He, Changchun Bao, Feng Bao:
Codebook-based speech enhancement using Markov process and speech-presence probability. 1780-1784 - Aleksej Chinaev, Reinhold Haeb-Umbach:
On optimal smoothing in minimum statistics based noise tracking. 1785-1789 - Yue Hao, Changchun Bao, Feng Bao, Feng Deng:
A data-driven speech enhancement method based on modeled long-range temporal dynamics. 1790-1794 - Florian Mayer, Pejman Mowlaee:
Improved phase reconstruction in single-channel speech separation. 1795-1799
Spoken Language Understanding 1-3
- Xiaohao Yang, Jia Liu:
Dialog state tracking using long short-term memory neural networks. 1800-1804 - José Lopes, Giampiero Salvi, Gabriel Skantze, Alberto Abad, Joakim Gustafson, Fernando Batista, Raveesh Meena, Isabel Trancoso:
Detecting repetitions in spoken dialogue systems using phonetic distances. 1805-1809 - Paul A. Crook, Jean-Philippe Robichaud, Ruhi Sarikaya:
Multi-language hypotheses ranking and domain tracking for open domain dialogue systems. 1810-1814 - Vijay Solanki, Alessandro Vinciarelli, Jane Stuart-Smith, Rachel Smith:
Measuring mimicry in task-oriented conversations: degree of mimicry is related to task difficulty. 1815-1819 - Kornel Laskowski:
Auto-imputing radial basis functions for neural-network turn-taking models. 1820-1824 - Quim Llimona, Jordi Luque, Xavier Anguera, Zoraida Hidalgo, Souneil Park, Nuria Oliver:
Effect of gender and call duration on customer satisfaction in call center big data. 1825-1829 - Zoraida Callejas, David Griol:
Using profile similarity to measure agreement in personality perception. 1830-1834 - Shizuka Nakamura, Miki Watanabe, Yuichiro Yoshikawa, Kohei Ogawa, Hiroshi Ishiguro:
Relieving mental stress of speakers using a tele-operated robot in foreign language speech education. 1835-1838 - Agustín Gravano, Stefan Benus, Rivka Levitan, Julia Hirschberg:
Backward mimicry and forward influence in prosodic contour choice in standard American English. 1839-1843 - Shammur Absar Chowdhury, Morena Danieli, Giuseppe Riccardi:
The role of speakers and context in classifying competition in overlapping speech. 1844-1848 - George Christodoulides, Mathieu Avanzi:
Automatic detection and annotation of disfluencies in spoken French corpora. 1849-1853 - Dilek Hakkani-Tür, Yun-Cheng Ju, Geoffrey Zweig, Gökhan Tür:
Clustering novel intents in a conversational interaction system with semantic parsing. 1854-1858 - Vladimir Despotovic, Oliver Walter, Reinhold Haeb-Umbach:
Semantic analysis of spoken input using Markov logic networks. 1859-1863 - Jan Svec, Adam Chýlek, Lubos Smídl:
Hierarchical discriminative model for spoken language understanding based on convolutional neural network. 1864-1868 - Yun-Nung Chen, William Yang Wang, Alexander I. Rudnicky:
Learning semantic hierarchy with distributed representations for unsupervised spoken language understanding. 1869-1873
Show and Tell Session 1-4 (Special Session)
- Kay Berkling, Nadine Pflaumer, Alexei Coyplove:
Phontasia - a game for training German orthography. 1874-1875 - Ka-Ho Wong, Wai-Kim Leung, Helen M. Meng:
E-commu-book: an assistive technology for users with speech impairments. 1876-1877 - Martina Röthlisberger, Iliana I. Karipidis, Georgette Pleisch, Volker Dellwo, Ulla Richardson, Silvia Brem:
Swiss graphogame: concept and design presentation of a computerised reading intervention for children with high risk for poor reading outcomes. 1878-1879 - Jakob Pfab, Hanna Jakob, Mona Späth, Christoph Draxler:
Neolexon - a therapy app for patients with aphasia. 1880-1881 - Sonal Patil, Harish Arsikere, Om Deshmukh:
Acoustic stress detection for improved navigation of educational videos. 1882-1883 - Xavier Anguera:
Multimodal read-aloud ebooks for language learning. 1884-1885 - Laurent Besacier, Elodie Gauthier, Mathieu Mangeot, Philippe Bretier, Paul C. Bagshaw, Olivier Rosec, Thierry Moudenc, François Pellegrino, Sylvie Voisin, Egidio Marsico, Pascal Nocera:
Speech technologies for African languages: example of a multilingual calculator for education. 1886-1887
Phonetic Recognition: Novel Approaches and Understanding
- Tuo Zhao, Yunxin Zhao, Xin Chen:
Time-frequency kernel-based CNN for speech recognition. 1888-1892 - Philip Weber, Colin J. Champion, S. M. Houghton, Peter Jancovic, Martin J. Russell:
Consonant recognition with continuous-state hidden Markov models and perceptually-motivated features. 1893-1897 - Sriram Ganapathy, Samuel Thomas, Dimitrios Dimitriadis, Steven J. Rennie:
Investigating factor analysis features for deep neural networks in noisy speech recognition. 1898-1902 - Ruchir Travadi, Shrikanth S. Narayanan:
Ensemble of Gaussian mixture localized neural networks with application to phone recognition. 1903-1907 - Jan Pesán, Lukás Burget, Hynek Hermansky, Karel Veselý:
DNN derived filters for processing of modulation spectrum of speech. 1908-1911 - Tasha Nagamine, Michael L. Seltzer, Nima Mesgarani:
Exploring how deep neural networks form phonemic categories. 1912-1916
Varieties of Speech
- Anastassia Loukina, Melissa Lopez, Keelan Evanini, David Suendermann-Oeft, Alexei V. Ivanov, Klaus Zechner:
Pronunciation accuracy and intelligibility of non-native speech. 1917-1921 - Frank Zimmerer, Jürgen Trouvain:
Productions of /h/ in German: French vs. German speakers. 1922-1926 - Anne Bonneau, Martine Cadot:
German non-native realizations of French voiced fricatives in final position of a group of words. 1927-1931 - Catherine T. Best, Jason A. Shaw, Gerard Docherty, Bronwen G. Evans, Paul Foulkes, Jennifer Hay, Jalal Al-Tamimi, Katharine Mair, Karen E. Mulak, Sophie Wood:
From Newcastle MOUTH to Aussie ears: Australians' perceptual assimilation and adaptation for Newcastle UK vowels. 1932-1936 - Rikke Louise Bundgaard-Nielsen, Brett Baker, Olga Maxwell, Janet Fletcher:
Wubuy coronal stop perception by speakers of three dialects of Bangla. 1937-1941 - Daniel Hirst, Hongwei Ding:
Using melody metrics to compare English speech read by native speakers and by L2 Chinese speakers from Shanghai. 1942-1946
Conversational Interaction
- James Gibson, Nikolaos Malandrakis, Francisco Romero, David C. Atkins, Shrikanth S. Narayanan:
Predicting therapist empathy in motivational interviews using language features inspired by psycholinguistic norms. 1947-1951 - Nikolaos Malandrakis, Shrikanth S. Narayanan:
Therapy language analysis using automatically generated psycholinguistic norms. 1952-1956 - Wei Xia, James Gibson, Bo Xiao, Brian R. Baucom, Panayiotis G. Georgiou:
A dynamic model for behavioral analysis of couple interactions using acoustic features. 1957-1961 - Rahul Gupta, Theodora Chaspari, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:
Analysis and modeling of the role of laughter in motivational interviewing based psychotherapy conversations. 1962-1966 - Francesca Bonin, Nick Campbell, Carl Vogel:
The discourse value of social signals at topic change moments. 1967-1971 - Tobias Schrank, Barbara Schuppler:
Automatic detection of uncertainty in spontaneous German dialogue. 1972-1976
Speech and Audio Segmentation and Classification; Voice Activity Detection 1-3
- Fabien Ringeval, Erik Marchi, Marc Mehu, Klaus R. Scherer, Björn W. Schuller:
Face reading from speech - predicting facial action units from audio cues. 1977-1981 - Mahesh Kumar Nandwana, Hynek Boril, John H. L. Hansen:
A new front-end for classification of non-speech sounds: a study on human whistle. 1982-1986 - Sri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, Bayya Yegnanarayana:
Robust features for sonorant segmentation in continuous speech. 1987-1991 - Sebastian Gergen, Anil M. Nagathil, Rainer Martin:
Reduction of reverberation effects in the MFCC modulation spectrum for improved classification of acoustic signals. 1992-1996 - Jonathan William Dennis, Tran Huy Dat, Haizhou Li:
Spiking neural networks and the generalised Hough transform for speech pattern detection. 1997-2001 - Woohyun Choi, Sangwook Park, David K. Han, Hanseok Ko:
Acoustic event recognition using dominant spectral basis vectors. 2002-2006
Spoken Dialogue Systems
- Pei-hao Su, David Vandyke, Milica Gasic, Dongho Kim, Nikola Mrksic, Tsung-Hsien Wen, Steve J. Young:
Learning from real users: rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems. 2007-2011 - David Griol, Zoraida Callejas, Ramón López-Cózar:
A framework to develop context-aware adaptive dialogue system. 2012-2016 - David Griol, Zoraida Callejas:
A proposal to develop domain and subtask-adaptive dialog management models. 2017-2021 - Omar Zia Khan, Jean-Philippe Robichaud, Paul A. Crook, Ruhi Sarikaya:
Hypotheses ranking and state tracking for a multi-domain dialog system using multiple ASR alternates. 2022-2026 - Ji Wu, Miao Li, Chin-Hui Lee:
An entropy minimization framework for goal-driven dialogue management. 2027-2031 - Ingrid Zukerman, Andisheh Partovi, Su Nam Kim:
Context-dependent error correction of spoken referring expressions. 2032-2036
Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof 2015) (Special Session)
- Zhizheng Wu, Tomi Kinnunen:
Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): introductory talk by the organizers. - Junichi Yamagishi, Nicholas W. D. Evans:
Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): open discussion and future plans. - Zhizheng Wu, Tomi Kinnunen, Nicholas W. D. Evans, Junichi Yamagishi, Cemal Hanilçi, Md. Sahidullah, Aleksandr Sizov:
ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. 2037-2041 - Jon Sánchez, Ibon Saratxaga, Inma Hernáez, Eva Navas, Daniel Erro:
The AHOLAB RPS SSD spoofing challenge 2015 submission. 2042-2046 - Mirjam Wester, Zhizheng Wu, Junichi Yamagishi:
Human vs machine spoofing detection on wideband and narrowband data. 2047-2051 - Xiong Xiao, Xiaohai Tian, Steven Du, Haihua Xu, Engsiong Chng, Haizhou Li:
Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge. 2052-2056 - Cemal Hanilçi, Tomi Kinnunen, Md. Sahidullah, Aleksandr Sizov:
Classifiers for synthetic speech detection: a comparison. 2057-2061 - Tanvina B. Patel, Hemant A. Patil:
Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech. 2062-2066 - Jesús Antonio Villalba López, Antonio Miguel, Alfonso Ortega, Eduardo Lleida:
Spoofing detection with DNN and one-class SVM for the ASVspoof 2015 challenge. 2067-2071 - Md. Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Themos Stafylakis:
Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015. 2072-2076 - Artur Janicki:
Spoofing countermeasure based on analysis of linear prediction error. 2077-2081 - Yi Liu, Yao Tian, Liang He, Jia Liu, Michael T. Johnson:
Simultaneous utilization of spectral magnitude and phase information to extract supervectors for speaker verification anti-spoofing. 2082-2086 - Md. Sahidullah, Tomi Kinnunen, Cemal Hanilçi:
A comparison of features for synthetic speech detection. 2087-2091 - Longbiao Wang, Yohei Yoshida, Yuta Kawakami, Seiichi Nakagawa:
Relative phase information for detecting human speech and spoofed speech. 2092-2096 - Nanxin Chen, Yanmin Qian, Heinrich Dinkel, Bo Chen, Kai Yu:
Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 challenge. 2097-2101
Acoustic Modeling and Decoding Methods for Speech Recognition
- Kyungmin Lee, Chiyoun Park, Ilhwan Kim, Namhoon Kim, Jaewon Lee:
Applying GPGPU to recurrent neural network language model based fast network search in the real-time LVCSR. 2102-2106 - Youssef Oualil, Marc Schulder, Hartmut Helmke, Anna Schmidt, Dietrich Klakow:
Real-time integration of dynamic context information for improving automatic speech recognition. 2107-2111 - Cyril Allauzen, Michael Riley:
Rapid vocabulary addition to context-dependent decoder graphs. 2112-2116 - Hainan Xu, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur:
Modeling phonetic context with non-random forests for speech recognition. 2117-2121 - Benjamin Lecouteux, Didier Schwab:
Ant colony algorithm applied to automatic speech recognition graph decoding. 2122-2126 - Christophe Van Gysel, Leonid Velikovich, Ian McGraw, Françoise Beaufays:
Garbage modeling for on-device speech recognition. 2127-2131 - Haihua Xu, Van Hai Do, Xiong Xiao, Engsiong Chng:
A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognition. 2132-2136 - Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf:
Neural higher-order factors in conditional random fields for phoneme classification. 2137-2141 - Shahab Jalalvand, Daniele Falavigna:
Stacked auto-encoder for ASR error detection and word error rate prediction. 2142-2146
Speech Production Measurements and Analyses
- Satyabrata Parida, Ashok Kumar Pattem, Prasanta Kumar Ghosh:
Estimation of the air-tissue boundaries of the vocal tract in the mid-sagittal plane from electromagnetic articulograph data. 2147-2151 - Claudia Canevari, Leonardo Badino, Luciano Fadiga:
A new Italian dataset of parallel acoustic and articulatory data. 2152-2156 - Tamás Gábor Csapó, Steven M. Lulich:
Error analysis of extracted tongue contours from 2D ultrasound images. 2157-2161 - Andrea Bandini, Slim Ouni, Piero Cosi, Silvia Orlandi, Claudia Manfredi:
Accuracy of a markerless acquisition technique for studying speech articulators. 2162-2166 - Yujie Chi, Kiyoshi Honda, Jianguo Wei, Hui Feng, Jianwu Dang:
Measuring oral and nasal airflow in production of Chinese plosive. 2167-2171 - Carlo Drioli, Gian Luca Foresti:
Enhanced videokymographic data analysis based on vocal folds dynamics modeling. 2172-2176 - Andrew J. Kolb, Michael T. Johnson, Jeffrey Berry:
Interpolation of tongue fleshpoint kinematics from combined EMA position and orientation data. 2177-2181 - Gustavo Andrade-Miranda, Nathalie Henrich Bernardoni, Juan Ignacio Godino-Llorente:
A new technique for assessing glottal dynamics in speech and singing by means of optical-flow computation. 2182-2186 - Alexei Kochetov, Phil Howson:
On the incompatibility of trilling and palatalization: a single-subject study of sustained apical and uvular trills. 2187-2191 - Pengcheng Zhu, Lei Xie, Yunlin Chen:
Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings. 2192-2196
Speech Synthesis 1-3
- Yang Wang, Minghao Yang, Zhengqi Wen, Jianhua Tao:
Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis. 2197-2201 - Duy Khanh Ninh, Yoichi Yamashita:
F0 parameterization of glottalized tones for HMM-based Vietnamese TTS. 2202-2206 - Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, Simon King:
Deep neural network context embeddings for model selection in rich-context HMM synthesis. 2207-2211 - Bo Chen, Zhehuai Chen, Jiachen Xu, Kai Yu:
An investigation of context clustering for statistical speech synthesis with deep neural network. 2212-2216 - Oliver Watts, Zhizheng Wu, Simon King:
Sentence-level control vectors for deep neural network speech synthesis. 2217-2221 - Simon Betz, Petra Wagner, David Schlangen:
Micro-structure of disfluencies: basics for conversational speech synthesis. 2222-2226 - György Szaszák, András Beke, Gábor Olaszy, Bálint Pál Tóth:
Using automatic stress extraction from audio for improved prosody modelling in speech synthesis. 2227-2231 - Pierre Lanchantin, Christophe Veaux, Mark J. F. Gales, Simon King, Junichi Yamagishi:
Reconstructing voices within the multiple-average-voice-model framework. 2232-2236 - Ye Kyaw Thu, Win Pa Pa, Jinfu Ni, Yoshinori Shiga, Andrew M. Finch, Chiori Hori, Hisashi Kawai, Eiichiro Sumita:
HMM based Myanmar text to speech system. 2237-2241 - Shinji Takaki, Sangjin Kim, Junichi Yamagishi, JongJin Kim:
Multiple feed-forward deep neural networks for statistical parametric speech synthesis. 2242-2246
Spoken Translation & Speech-to-speech
- Nicholas Ruiz, Qin Gao, William Lewis, Marcello Federico:
Adapting machine translation models toward misrecognized speech with text-to-speech pronunciation rules and acoustic confusability. 2247-2251 - Frédéric Béchet, Benoît Favre, Mickael Rouvier:
"speech is silver, but silence is golden": improving speech-to-speech translation performance by slashing users input. 2252-2256 - Raymond W. M. Ng, Kashif Shah, Lucia Specia, Thomas Hain:
A study on the stability and effectiveness of features in quality estimation for spoken language translation. 2257-2261 - Joris Pelemans, Tom Vanallemeersch, Kris Demuynck, Hugo Van hamme, Patrick Wambacq:
Efficient language model adaptation for automatic speech recognition of spoken translations. 2262-2266 - Takashi Mieno, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
Speed or accuracy? a study in evaluation of simultaneous speech translation. 2267-2271 - Marcin Junczys-Dowmunt, Pawel Przybysz, Arleta Staszuk, Eun-Kyoung Kim, Jaewon Lee:
Large scale speech-to-text translation with out-of-domain corpora using better context-based models and domain adaptation. 2272-2276
Speech and Audio Segmentation and Classification; Voice Activity Detection 1-3
- Inyoung Hwang, Jaeseong Sim, Sang-Hyeon Kim, Kwang-Sub Song, Joon-Hyuk Chang:
A statistical model-based voice activity detection using multiple DNNs and noise awareness. 2277-2281 - Qing Wang, Jun Du, Xiao Bao, Zi-Rui Wang, Li-Rong Dai, Chin-Hui Lee:
A universal VAD based on jointly trained deep neural networks. 2282-2286 - Ge Zhan, Zhaoqiong Huang, Dongwen Ying, Jielin Pan, Yonghong Yan:
Spectrographic speech mask estimation using the time-frequency correlation of speech presence. 2287-2291 - Houman Ghaemmaghami, David Dean, Shahram Kalantari, Sridha Sridharan, Clinton Fookes:
Complete-linkage clustering for voice activity detection in audio and visual speech. 2292-2296 - Kaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah:
A model based voice activity detector for noisy environments. 2297-2301 - Fei Tao, John H. L. Hansen, Carlos Busso:
An unsupervised visual-only voice activity detection approach using temporal orofacial features. 2302-2306
Advances in iVector-based Speaker Verification
- Patrick Kenny, Themos Stafylakis, Md. Jahangir Alam, Marcel Kockmann:
An i-vector backend for speaker verification. 2307-2311 - Maria Joana Correia, Alessio Brutti, Alberto Abad:
Multi-channel speaker verification based on total variability modelling. 2312-2316 - Na Li, Man-Wai Mak:
SNR-invariant PLDA modeling for robust speaker verification. 2317-2321 - Md. Hafizur Rahman, David Dean, Ahilan Kanagasundaram, Sridha Sridharan:
Investigating in-domain data requirements for PLDA training. 2322-2326 - Ondrej Glembek, Pavel Matejka, Oldrich Plchot, Jan Pesán, Lukás Burget, Petr Schwarz:
Migrating i-vectors between speaker recognition systems using regression neural networks. 2327-2331 - Ahilan Kanagasundaram, David Dean, Sridha Sridharan:
Improving PLDA speaker verification using WMFD and linear-weighted approaches in limited microphone data conditions. 2332-2336
Voice Quality
- Christer Gobl, Irena Yanushevskaya, Ailbhe Ní Chasaide:
The relationship between voice source parameters and the maxima dispersion quotient (MDQ). 2337-2341 - Manu Airaksinen, Tom Bäckström, Paavo Alku:
Glottal inverse filtering based on quadratic programming. 2342-2346 - N. P. Narendra, K. Sreenivasa Rao:
Automatic detection of creaky voice using epoch parameters. 2347-2351 - Rikke Louise Bundgaard-Nielsen, Brett Baker:
Perception of voicing in the absence of native voicing experience. 2352-2356 - Jody Kreiman, Soo Jin Park, Patricia A. Keating, Abeer Alwan:
The relationship between acoustic and perceived intraspeaker variability in voice quality. 2357-2360 - Li Jiao, Qiuwu Ma, Ting Wang, Yi Xu:
Perceptual cues of whispered tones: are they really special? 2361-2365
Neural Networks for Language Modeling
- Tsuyoshi Morioka, Tomoharu Iwata, Takaaki Hori, Tetsunori Kobayashi:
Multiscale recurrent neural network based language model. 2366-2370 - Kazuki Irie, Ralf Schlüter, Hermann Ney:
Bag-of-words input for long history representation in neural network-based language models for speech recognition. 2371-2375 - Ahmad Emami:
Efficient machine translation decoding with slow language models. 2376-2379 - Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito:
Latent words recurrent neural network language models. 2380-2384 - Vataya Chunwijitra, Ananlada Chotimongkol, Chai Wutiwiwatchai:
Combining multiple-type input units using recurrent neural network for LVCSR language modeling. 2385-2389 - Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, Akinobu Lee:
Prosodically-enhanced recurrent neural network language models. 2390-2394
Biosignal-based Spoken Communication (Special Session)
- Matthias Janke, Michael Wand:
Biosignal-based spoken communication: welcome and introduction. - Matthias Janke, Michael Wand:
Biosignal-based spoken communication: panel and discussion. - Peter Anderson, Negar M. Harandi, Scott Moisik, Ian Stavness, Sidney S. Fels:
A comprehensive 3D biomechanically-driven vocal tract model including inverse dynamics for speech research. 2395-2399 - Ian McLoughlin, Yan Song:
Low frequency ultrasonic voice activity detection using convolutional neural networks. 2400-2404 - Florent Bocquelet, Thomas Hueber, Laurent Girin, Christophe Savariaux, Blaise Yvert:
Real-time control of a DNN-based articulatory synthesizer for silent speech conversion: a pilot study. 2405-2409 - Diandra Fabre, Thomas Hueber, Florent Bocquelet, Pierre Badin:
Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks. 2410-2414 - Jun Wang, Seongjun Hahm:
Speaker-independent silent speech recognition with across-speaker articulatory normalization and speaker adaptive training. 2415-2419 - Lorenz Diener, Matthias Janke, Tanja Schultz:
Codebook clustering for unit selection based EMG-to-speech conversion. 2420-2424 - Majid Mirbagheri, Bradley Ekin, Les Atlas, Adrian K. C. Lee:
Flexible tracking of auditory attention. 2425-2429
Robust Speech Recognition: Features, Far-field and Reverberation
- Seyedmahdad Mirsamadi, John H. L. Hansen:
A study on deep neural network acoustic model adaptation for robust far-field speech recognition. 2430-2434 - Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara:
Speech dereverberation using long short-term memory. 2435-2439 - Vijayaditya Peddinti, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur:
Reverberation robust acoustic modeling using i-vectors with time delay neural networks. 2440-2444 - Kshitiz Kumar, Chaojun Liu, Yifan Gong:
Delta-melspectra features for noise robustness to DNN-based ASR systems. 2445-2448 - Vikramjit Mitra, Julien van Hout, Mitchell McLaren, Wen Wang, Martin Graciarena, Dimitra Vergyri, Horacio Franco:
Combating reverberation in large vocabulary continuous speech recognition. 2449-2453 - Martin Karafiát, Frantisek Grézl, Lukás Burget, Igor Szöke, Jan Cernocký:
Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challenge. 2454-2458 - Mark J. Harvilla, Richard M. Stern:
Robust parameter estimation for audio declipping in noise. 2459-2463 - Bin Huang, Dengfeng Ke, Hao Zheng, Bo Xu, Yanyan Xu, Kaile Su:
Multi-task learning deep neural networks for speech feature denoising. 2464-2468 - Yuxuan Wang, Ananya Misra, Kean K. Chin:
Time-frequency masking for large scale robust speech recognition. 2469-2473 - Rongfeng Su, Xurong Xie, Xunying Liu, Lan Wang:
Efficient use of DNN bottleneck features in generalized variable parameter HMMs for noise robust speech recognition. 2474-2478 - Deepak Baby, Hugo Van hamme:
Investigating modulation spectrogram features for deep neural network-based automatic speech recognition. 2479-2483 - Kun Han, Yanzhang He, Deblin Bagchi, Eric Fosler-Lussier, DeLiang Wang:
Deep neural network based spectral feature mapping for robust speech recognition. 2484-2488
Social Signals, Assessment and Paralinguistics
- Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Analyzing speech rate entrainment and its relation to therapist empathy in drug addiction counseling. 2489-2493 - Atsushi Ando, Taichi Asami, Manabu Okamoto, Hirokazu Masataki, Sumitaka Sakauchi:
Agreement and disagreement utterance detection in conversational speech by extracting and integrating local features. 2494-2498 - Md. Nasir, Wei Xia, Bo Xiao, Brian R. Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou:
Still together?: the role of acoustic features in predicting marital outcome. 2499-2503 - Gábor Gosztolya:
On evaluation metrics for social signal detection. 2504-2508 - Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen:
Laughter and filler detection in naturalistic audio. 2509-2513 - Aasish Pappu, Amanda Stent:
Automatic formatted transcripts for videos. 2514-2518 - Lucas Azaïs, Adrien Payan, Tianjiao Sun, Guillaume Vidal, Tina Zhang, Eduardo Coutinho, Florian Eyben, Björn W. Schuller:
Does my speech rock? automatic assessment of public speaking skills. 2519-2523 - Roman B. Sergienko, Alexander Schmitt:
Verbal intelligence identification based on text classification. 2524-2528 - Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Hsin-Chih Lin, Chi-Chun Lee:
A multimodal approach for automatic assessment of school principals' oral presentation during pre-service training program. 2529-2533 - T. J. Tsai:
Are you TED talk material? comparing prosody in professors and TED speakers. 2534-2538 - Hayakawa Akira, Fasih Haider, Loredana Cerrato, Nick Campbell, Saturnino Luz:
Detection of cognitive states and their correlation to speech recognition performance in speech-to-speech machine translation systems. 2539-2543
Bandwidth Extension, Quality and Intelligibility Measures
- Friedemann Köster, Sebastian Möller:
Perceptual speech quality dimensions in a conversational situation. 2544-2548 - Jens Berger, Anna Llagostera:
Multidimensional evaluation and predicting overall speech quality. 2549-2552 - Andreas Gaich, Pejman Mowlaee:
On speech intelligibility estimation of phase-aware single-channel speech enhancement. 2553-2557 - Ricard Marxer, Martin Cooke, Jon Barker:
A framework for the evaluation of microscopic intelligibility models. 2558-2562 - Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen:
A binaural short time objective intelligibility measure for noisy and enhanced speech. 2563-2567 - Yan Tang, Martin Cooke, Bruno M. Fazenda, Trevor J. Cox:
A glimpse-based approach for predicting binaural intelligibility with single and multiple maskers in anechoic conditions. 2568-2572 - Fei Chen:
Improving the prediction power of the speech transmission index to account for non-linear distortions introduced by noise-reduction algorithms. 2573-2577 - Kehuang Li, Zhen Huang, Yong Xu, Chin-Hui Lee:
DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech. 2578-2582 - Hannu Pulakka, Ville Myllylä, Anssi Rämö, Paavo Alku:
Speech quality evaluation of artificial bandwidth extension: comparing subjective judgments and instrumental predictions. 2583-2587 - M. A. Tugtekin Turan, Engin Erzin:
Synchronous overlap and add of spectra for enhancement of excitation in artificial bandwidth extension of speech. 2588-2592 - Yingxue Wang, Shenghui Zhao, Wenbo Liu, Ming Li, Jingming Kuang:
Speech bandwidth expansion based on deep neural networks. 2593-2597 - Bin Liu, Jianhua Tao, Zhengqi Wen, Ya Li, Danish Bukhari:
A novel method of artificial bandwidth extension using deep architecture. 2598-2602
Show and Tell Session 1-4 (Special Session)
- Kong-Aik Lee, Guangsen Wang, Kam Pheng Ng, Hanwu Sun, Trung Hieu Nguyen, Ngoc Thuy Huong Thai, Bin Ma, Haizhou Li:
The RedDots platform for mobile crowd-sourcing of speech data. 2603-2604 - Takayuki Arai:
Two extensions of Umeda and Teranishi's physical models of the human vocal tract. 2605-2606 - Mateusz Budnik, Laurent Besacier, Johann Poignant, Hervé Bredin, Claude Barras, Mickaël Stefas, Pierrick Bruneau, Thomas Tamisier:
Collaborative annotation for person identification in TV shows. 2607-2608 - Thomas Kisler, Florian Schiel, Uwe D. Reichel, Christoph Draxler:
Phonetic/linguistic web services at BAS. 2609-2610 - Raphael Winkelmann:
Managing speech databases with emuR and the EMU-webApp. 2611-2612 - Sebastian Wankerl, Florian Hönig, Anton Batliner, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Visual comparison of speaker groups. 2613-2614 - Rohit Kumar, Matthew E. Roy, Sanjika Hewavitharana, Dennis N. Mehay, Nina Zinovieva:
Tools for rapid customization of S2S systems for emergent domains. 2615-2616 - Florian Metze, Eric Riebling, Eric Fosler-Lussier, Andrew R. Plummer, Rebecca Bates:
The speech recognition virtual kitchen turns one. 2617-2618 - Jan Rennies, Andreas Volgenandt, Henning F. Schepker, Simon Doclo:
Model-based adaptive pre-processing of speech for enhanced intelligibility in noise and reverberation. 2619-2620 - Sebastian Möller, Tilo Westermann:
Experiences with and new application ideas for the Interspeech app. 2621-2622 - Dmitry Sityaev, Praphul Kumar, Rajesh Ramchander:
Traditional IVR and visual IVR - killing two birds with one stone. 2623-2624
Discriminative Acoustic Training Methods for ASR
- Rogier C. van Dalen, Mark J. F. Gales:
Annotating large lattices with the exact word error. 2625-2629 - Vimal Manohar, Daniel Povey, Sanjeev Khudanpur:
Semi-supervised maximum mutual information training of deep neural network acoustic models. 2630-2634 - Shiliang Zhang, Hui Jiang, Si Wei, Li-Rong Dai:
Rectified linear neural networks with tied-scalar regularization for LVCSR. 2635-2639 - Yanzhang He, Eric Fosler-Lussier:
Segmental conditional random fields with deep neural networks as acoustic models for first-pass word recognition. 2640-2644 - Dongpeng Chen, Brian Mak:
Distinct triphone acoustic modeling using deep neural networks. 2645-2649 - Gregory Gelly, Jean-Luc Gauvain:
Minimum word error training of RNN-based voice activity detection. 2650-2654
Syllables and Segments 1, 2
- A. P. Prathosh, A. G. Ramakrishnan, T. V. Ananthapadmanabha:
Classification of place-of-articulation of stop consonants using temporal analysis. 2655-2659 - Marissa S. Barlaz, Maojing Fu, Zhi-Pei Liang, Ryan Shosted, Bradley P. Sutton:
The emergence of nasal velar codas in Brazilian Portuguese: an rt-MRI study. 2660-2664 - Elise Michon, Emmanuel Dupoux, Alejandrina Cristià:
Salient dimensions in implicit phonotactic learning. 2665-2669 - Phil Howson:
An acoustic examination of the three-way sibilant contrast in Lower Sorbian. 2670-2674 - Jiahong Yuan, Mark Liberman:
Investigating consonant reduction in Mandarin Chinese with improved forced alignment. 2675-2678 - Marianne Pouplier, Stefania Marin, Alexei Kochetov:
Durational characteristics and timing patterns of Russian onset clusters at two speaking rates. 2679-2683
Topics in Paralinguistics
- Thomas F. Quatieri, James R. Williamson, Christopher J. Smalt, Tejash Patel, Joseph Perricone, Daryush D. Mehta, Brian S. Helfer, Gregory A. Ciccarelli, Darrell O. Ricke, Nicolas Malyska, Jeff Palmer, Kristin Heaton, Marianna Eddy, Joseph Moran:
Vocal biomarkers to discriminate cognitive load in a working memory task. 2684-2688 - Chunlei Zhang, Gang Liu, Chengzhu Yu, John H. L. Hansen:
I-vector based physical task stress detection with different fusion strategies. 2689-2693 - László Tóth, Gábor Gosztolya, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Edit Biró, Fruzsina Zsura, Magdolna Pákáski, János Kálmán:
Automatic detection of mild cognitive impairment from spontaneous speech using ASR. 2694-2698 - Maxim Sidorov, Christina Brester, Alexander Schmitt:
Contemporary stochastic feature selection algorithms for speech-based emotion recognition. 2699-2703 - Carlos A. Ferrer-Riesgo, Diana Torres, Eduardo González-Moreira, José Ramón Calvo de Lara, Eduardo Castillo:
Effect of different jitter-induced glottal pulse shape changes in periodicity perturbation measures. 2704-2708 - Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen:
Automatic audio sentiment extraction using keyword spotting. 2709-2713
Spoken Language Processing
- Panupong Pasupat, Dilek Hakkani-Tür:
Unsupervised relation detection using automatic alignment of query patterns extracted from knowledge graphs and query click logs. 2714-2718 - The Tung Nguyen, Graham Neubig, Hiroyuki Shindo, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura:
A latent variable model for joint pause prediction and dependency parsing. 2719-2723 - Mohammad Hadi Bokaei, Hossein Sameti, Yang Liu:
Extractive meeting summarization through speaker zone detection. 2724-2728 - Shih-Hung Liu, Kuan-Yu Chen, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu:
Positional language modeling for extractive broadcast news speech summarization. 2729-2733 - Saeid Mokaram, Roger K. Moore:
Speech-based location estimation of first responders in a simulated search and rescue scenario. 2734-2738 - Tahir Sousa, Lucie Flekova, Margot Mieskes, Iryna Gurevych:
Constructive feedback, thinking process and cooperation: assessing the quality of classroom interaction. 2739-2743
Voice Conversion
- Dong-Yan Huang, Minghui Dong, Haizhou Li:
A real-time variable-Q non-stationary Gabor transform for pitch shifting. 2744-2748 - Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki:
Many-to-many voice conversion based on multiple non-negative matrix factorization. 2749-2753 - Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Statistical singing voice conversion based on direct waveform modification with global variance. 2754-2758 - Xiaohai Tian, Zhizheng Wu, Siu Wa Lee, Nguyen Quy Hy, Minghui Dong, Engsiong Chng:
System fusion for high-performance voice conversion. 2759-2763 - Agustín Alonso, Daniel Erro, Eva Navas, Inma Hernáez:
Speaker adaptation using only vocalic segments via frequency warping. 2764-2768 - Yusuke Tajiri, Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments. 2769-2773
Advanced Crowdsourcing for Speech and Beyond (Special Session)
- Tim Polzehl, Gina-Anne Levow:
Advanced crowdsourcing for speech and beyond: introduction by the organizers. - Preethi Jyothi, Mark Hasegawa-Johnson:
Transcribing continuous speech using mismatched crowdsourcing. 2774-2778 - Shammur Absar Chowdhury, Marcos Calvo, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Fernando García, Emilio Sanchis Arnal:
Selection and aggregation techniques for crowdsourced semantic annotation task. 2779-2783 - Spencer Rothwell, Ahmad Elshenawy, Steele Carter, Daniela Braga, Faraz Romani, Michael Kennewick, Bob Kennewick:
Controlling quality and handling fraud in large scale crowdsourcing speech data collections. 2784-2788 - Spencer Rothwell, Steele Carter, Ahmad Elshenawy, Vladislavs Dovgalecs, Safiyyah Saleem, Daniela Braga, Bob Kennewick:
Data collection and annotation for state-of-the-art NER using unmanaged crowds. 2789-2793 - Tim Polzehl, Babak Naderi, Friedemann Köster, Sebastian Möller:
Robustness in speech quality assessment and temporal training expiry in mobile crowdsourcing environments. 2794-2798 - Babak Naderi, Tim Polzehl, Ina Wechsung, Friedemann Köster, Sebastian Möller:
Effect of trapping questions on the reliability of speech quality judgments in a crowdsourcing paradigm. 2799-2803 - Adrian Leemann, Marie-José Kolly, Jean-Philippe Goldman, Volker Dellwo, Ingrid Hove, Ibrahim Almajai, Sarah Grimm, Sylvain Robert, Daniel Wanitsch:
Voice Äpp: a mobile app for crowdsourcing Swiss German dialect data. 2804-2808 - Anastassia Loukina, Melissa Lopez, Keelan Evanini, David Suendermann-Oeft, Klaus Zechner:
Expert and crowdsourced annotation of pronunciation errors for automatic scoring systems. 2809-2813 - Hernisa Kacorri, Kaoru Shinkawa, Shin Saito:
Capcap: an output-agreement game for video captioning. 2814-2818 - Pepi Burgos, Eric Sanders, Catia Cucchiarini, Roeland van Hout, Helmer Strik:
Auris populi: crowdsourced native transcriptions of Dutch vowels spoken by adult Spanish learners. 2819-2823 - Samantha Wray, Ahmed Ali:
Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic. 2824-2828 - Yashesh Gaur, Florian Metze, Yajie Miao, Jeffrey P. Bigham:
Using keyword spotting to help humans correct captioning faster. 2829-2833 - Tara McAllister Byun, Elaine Hitchcock, Daphna Harel:
Validating and optimizing a crowdsourced method for gradient measures of child speech. 2834-2838
Robust Speech Recognition: Adaptation
- Zhong-Qiu Wang, DeLiang Wang:
Joint training of speech separation, filterbank and acoustic model for robust automatic speech recognition. 2839-2843 - Shakti Rath, Sunil Sivadas, Bin Ma:
Joint environment and speaker normalization using factored front-end CMLLR. 2844-2848 - Akihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa:
Robust speech recognition using DNN-HMM acoustic model combining noise-aware training with spectral subtraction. 2849-2853 - Chengzhu Yu, Atsunori Ogawa, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, John H. L. Hansen:
Robust i-vector extraction for neural network adaptation in noisy environment. 2854-2857 - Michal Borsky, Petr Mizera, Petr Pollák:
Spectrally selective dithering for distorted speech recognition. 2858-2861 - Liang Lu, Steve Renals:
Feature-space speaker adaptation for probabilistic linear discriminant analysis acoustic models. 2862-2866 - Patrick Cardinal, Najim Dehak, Yu Zhang, James R. Glass:
Speaker adaptation using the i-vector technique for bottleneck features. 2867-2871 - Penny Karanasou, Mark J. F. Gales, Philip C. Woodland:
I-vector estimation using informative priors for adaptation of deep neural networks. 2872-2876 - Sri Garimella, Arindam Mandal, Nikko Strom, Björn Hoffmeister, Spyros Matsoukas, Sree Hari Krishnan Parthasarathi:
Robust i-vector based adaptation of DNN acoustic model for speech recognition. 2877-2881 - Natalia A. Tomashenko, Yuri Y. Khokhlov:
GMM-derived features for effective unsupervised adaptation of deep neural network acoustic models. 2882-2886 - Roger Hsiao, Tim Ng, Stavros Tsakalidis, Long Nguyen, Richard M. Schwartz:
Unsupervised adaptation for deep neural network using linear least square method. 2887-2891 - Sheng Li, Xugang Lu, Yuya Akita, Tatsuya Kawahara:
Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation. 2892-2896 - Mortaza Doulaty, Oscar Saz, Thomas Hain:
Data-selective transfer learning for multi-domain speech recognition. 2897-2901
Speech and Audio Segmentation and Classification; Voice Activity Detection 1-3
- Ganna Raboshchuk, Peter Jancovic, Climent Nadeu, Alex Peiró Lilja, Münevver Köküer, Blanca Muñoz Mahamud, Ana Riverola de Veciana:
Automatic detection of equipment alarms in a neonatal intensive care unit environment: a knowledge-based approach. 2902-2906 - Jia Dai, Wenju Liu, Chongjia Ni, Like Dong, Hong Yang:
"Multilingual" deep neural network for music genre classification. 2907-2911 - Baiyang Liu, Björn Hoffmeister, Ariya Rastrow:
Accurate endpointing with expected pause duration. 2912-2916 - Wenbo Liu, Zhiding Yu, Bhiksha Raj, Ming Li:
Locality constrained transitive distance clustering on speech data. 2917-2921 - Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita, Tomohiro Nakatani:
Feature extraction strategies in deep learning based acoustic event detection. 2922-2926 - Peter Transfeld, Simon Receveur, Tim Fingscheidt:
An acoustic event detection framework and evaluation metric for surveillance in cars. 2927-2931 - Abdessalam Bouchekif, Géraldine Damnati, Yannick Estève, Delphine Charlet, Nathalie Camelin:
Diachronic semantic cohesion for topic segmentation of TV broadcast news. 2932-2936 - Ivan Kraljevski, Zheng-Hua Tan, Maria Paola Bissiri:
Comparison of forced-alignment speech recognition and humans for generating reference VAD. 2937-2941 - Bernhard Lehner, Gerhard Widmer, Reinhard Sonnleitner:
Improving voice activity detection in movies. 2942-2946
Speech and Hearing Disorders
- Tomas Lustyk, Petr Bergl, Tino Haderlein, Elmar Nöth, Roman Cmejla:
Language-independent method for analysis of German stuttering recordings. 2947-2951 - Ahmed Y. Al-nasheri, Zulfiqar Ali, Ghulam Muhammad, Mansour Alsulaiman:
An investigation of MDVP parameters for voice pathology detection on three different databases. 2952-2956 - Jiantao Wu, Ping Yu, Nan Yan, Lan Wang, Xiaohui Yang, Manwa L. Ng:
Energy distribution analysis and nonlinear dynamical analysis of adductor spasmodic dysphonia. 2957-2961 - Benjawan Kasisopa, Nittayapa Klangpornkun, Denis Burnham:
Auditory-visual tone perception in hearing impaired Thai listeners. 2962-2966 - Panying Rong, Yana Yunusova, Jordan R. Green:
Speech intelligibility decline in individuals with fast and slow rates of ALS progression. 2967-2971 - Rong Na A, Koichi Mori, Naomi Sakai:
Latency analysis of speech shadowing reveals processing differences in Japanese adults who do and do not stutter. 2972-2976 - Brigitte Bigi, Katarzyna Klessa, Laurianne Georgeton, Christine Meunier:
A syllable-based analysis of speech temporal organization: a comparison between speaking styles in dysarthric and healthy populations. 2977-2981 - Bernd T. Meyer, Birger Kollmeier, Jasper Ooster:
Autonomous measurement of speech intelligibility utilizing automatic speech recognition. 2982-2986 - Monja Angelika Knoll, Melissa Johnstone, Charlene Blakely:
Can you hear me? acoustic modifications in speech directed to foreigners and hearing-impaired people. 2987-2990 - Yu Ting Yeung, Ka-Ho Wong, Helen M. Meng:
Improving automatic forced alignment for dysarthric speech transcription. 2991-2995
Speaker Recognition and Diarization 1-3
- Kong-Aik Lee, Anthony Larcher, Guangsen Wang, Patrick Kenny, Niko Brümmer, David A. van Leeuwen, Hagai Aronowitz, Marcel Kockmann, Carlos Vaquero, Bin Ma, Haizhou Li, Themos Stafylakis, Md. Jahangir Alam, Albert Swart, Javier Perez:
The reddots data collection for speaker recognition. 2996-3000 - Yongjun He, Chen Chen, Jiqing Han:
Noise-robust speaker recognition based on morphological component analysis. 3001-3005 - Andreas Nautsch, Rahim Saeidi, Christian Rathgeb, Christoph Busch:
Analysis of mutual duration and noise effects in speaker recognition: benefits of condition-matched cohort selection in score normalization. 3006-3010 - Josué Fredes, José Novoa, Víctor Poblete, Simon King, Richard M. Stern, Néstor Becerra Yoma:
Robustness to additive noise of locally-normalized cepstral coefficients in speaker verification. 3011-3015 - Navid Shokouhi, John H. L. Hansen:
Probabilistic linear discriminant analysis for robust speaker identification in co-channel speech. 3016-3020 - Hongcui Wang, Di Jin, Lantian Li, Jianwu Dang:
Community detection with manifold learning on speaker i-vector space for Chinese. 3021-3025 - Sree Harsha Yella, Andreas Stolcke:
A comparison of neural network feature transforms for speaker diarization. 3026-3030 - Ilya Shapiro, Neta Rabin, Irit Opher, Itshak Lapidot:
Clustering short push-to-talk segments. 3031-3035 - Anna Fedorova, Ondrej Glembek, Tomi Kinnunen, Pavel Matejka:
Exploring ANN back-ends for i-vector based speaker age estimation. 3036-3040 - Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Jaime Hernandez-Cordero, John M. Howard, Alvin F. Martin, Lisa P. Mason, Alan McCree, Douglas A. Reynolds:
Analysis of the second phase of the 2013-2014 i-vector machine learning challenge. 3041-3045 - Alvin F. Martin, Craig S. Greenberg, John M. Howard, Désiré Bansé, George R. Doddington, Jaime Hernandez-Cordero, Lisa P. Mason:
NIST language recognition evaluation - plans for 2015. 3046-3050
Dialogue and Discourse
- Marcin Wlodarczak, Mattias Heldner, Jens Edlund:
Communicative needs and respiratory constraints. 3051-3055 - Uwe D. Reichel, Nina Pörner, Dianne Nowack, Jennifer Cole:
Analysis and classification of cooperative and competitive dialogs. 3056-3060 - Alessandra Cervone, Catherine Lai, Silvia Pareti, Peter Bell:
Towards automatic detection of reported speech in dialogue using prosodic cues. 3061-3065 - Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran:
Modeling phrasing and prominence using deep recurrent learning. 3066-3070 - Céline De Looze, Irena Yanushevskaya, Andy Murphy, Eoghan O'Connor, Christer Gobl:
Pitch declination and reset as a function of utterance duration in conversational speech data. 3071-3075 - Valerie Freeman, Gina-Anne Levow, Richard A. Wright, Mari Ostendorf:
Investigating the role of 'yeah' in stance-dense conversation. 3076-3080
Speaker Recognition and Diarization 1-3
- Brecht Desplanques, Kris Demuynck, Jean-Pierre Martens:
Factor analysis for speaker segmentation and improved speaker diarization. 3081-3085 - Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Katsuya Takanashi, Tatsuya Kawahara:
Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations. 3086-3090 - Héctor Delgado, Xavier Anguera, Corinne Fredouille, Javier Serrano:
Novel clustering selection criterion for fast binary key speaker diarization. 3091-3095 - Gregory Sell, Daniel Garcia-Romero, Alan McCree:
Speaker diarization with i-vectors from DNN senone posteriors. 3096-3099 - Abraham Woubie, Jordi Luque, Javier Hernando:
Using voice-quality measurements with prosodic and spectral features for speaker diarization. 3100-3104 - Srikanth R. Madikeri, Ivan Himawan, Petr Motlícek, Marc Ferras:
Integrating online i-vector extractor with information bottleneck based speaker diarization system. 3105-3109
L1/L2 Speech Perception and Acquisition
- Jiyoun Choi, Mirjam Broersma, Anne Cutler:
Enhanced processing of a lost language: linguistic knowledge or linguistic skill? 3110-3114 - Ann-Kathrin Grohe, Gregory J. Poarch, Adriana Hanulíková, Andrea Weber:
Production inconsistencies delay adaptation to foreign accents. 3115-3119 - Mikhail Ordin, Leona Polyanskaya:
Acquisition of English speech rhythm by monolingual children. 3120-3124 - Odette Scharenborg:
Durational information in word-initial lexical embeddings in spoken Dutch. 3125-3129 - Fei Chen, Nan Yan, Lan Wang, Tao Yang, Jiantao Wu, Han Zhao, Gang Peng:
The development of categorical perception of lexical tones in Mandarin-speaking preschoolers. 3130-3134 - Tomohiko Ooigawa:
Perception of Italian liquids by Japanese listeners: comparisons to Spanish liquids. 3135-3139
LVCSR Systems and Applications
- George Saon, Hong-Kwang Jeff Kuo, Steven J. Rennie, Michael Picheny:
The IBM 2015 English conversational telephone speech recognition system. 3140-3144 - Xunying Liu, Federico Flego, Linlin Wang, Chao Zhang, Mark J. F. Gales, Philip C. Woodland:
The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation. 3145-3149 - Samuel Thomas, George Saon, Hong-Kwang Jeff Kuo, Lidia Mangu:
The IBM BOLT speech transcription system. 3150-3153 - M. Ali Basha Shaik, Zoltán Tüske, Muhammad Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney:
Improvements in RWTH LVCSR evaluation systems for Polish, Portuguese, English, Urdu, and Arabic. 3154-3158 - Thiago Fraga-Silva, Jean-Luc Gauvain, Lori Lamel, Antoine Laurent, Viet Bac Le, Abdelkhalek Messaoudi:
Active learning based data selection for limited resource STT and KWS. 3159-3163 - Preethi Jyothi, Mark Hasegawa-Johnson:
Improved Hindi broadcast ASR by adapting the language model and pronunciation model using a priori syntactic and morphophonemic knowledge. 3164-3168
Zero Resource Speech Technologies: Unsupervised Discovery of Linguistic Units (Special Session)
- Maarten Versteegh, Roland Thiollière, Thomas Schatz, Xuan-Nga Cao, Xavier Anguera, Aren Jansen, Emmanuel Dupoux:
The zero resource speech challenge 2015. 3169-3173 - Leonardo Badino, Alessio Mereta, Lorenzo Rosasco:
Discovering discrete subword units with binarized autoencoders and hidden-Markov-model encoders. 3174-3178 - Roland Thiollière, Ewan Dunbar, Gabriel Synnaeve, Maarten Versteegh, Emmanuel Dupoux:
A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling. 3179-3183 - Wiehan Agenbag, Thomas Niesler:
Automatic segmentation and clustering of speech using sparse coding and metaheuristic search. 3184-3188 - Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study. 3189-3193 - Pallavi Baljekar, Sunayana Sitaram, Prasanna Kumar Muthukumar, Alan W. Black:
Using articulatory features and inferred phonological segments in zero resource speech processing. 3194-3198 - Daniel Renshaw, Herman Kamper, Aren Jansen, Sharon Goldwater:
A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge. 3199-3203 - Okko Räsänen, Gabriel Doyle, Michael C. Frank:
Unsupervised word discovery from speech using automatic segmentation into syllable-like units. 3204-3208 - Vince Lyzinski, Gregory Sell, Aren Jansen:
An evaluation of graph clustering methods for unsupervised term discovery. 3209-3213
Neural Networks: Novel Architectures for LVCSR
- Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur:
A time delay neural network architecture for efficient modeling of long temporal contexts. 3214-3218 - Xiangang Li, Xihong Wu:
Long short-term memory based convolutional recurrent neural networks for large vocabulary speech recognition. 3219-3223 - Chao Zhang, Philip C. Woodland:
Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modelling. 3224-3228 - Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso A. Poggio:
Discriminative template learning in group-convolutional networks for invariant speech representations. 3229-3233 - Sunil Sivadas, Zhenzhou Wu, Bin Ma:
Investigation of parametric rectified linear units for noise robust speech recognition. 3234-3238 - Hang Su, Haihua Xu:
Multi-softmax deep neural network for semi-supervised training. 3239-3243 - Jia Cui, George Saon, Bhuvana Ramabhadran, Brian Kingsbury:
A multi-region deep neural network model in speech recognition. 3244-3248 - Liang Lu, Xingxing Zhang, Kyunghyun Cho, Steve Renals:
A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition. 3249-3253 - Linchen Zhu, Kevin Kilgour, Sebastian Stüker, Alex Waibel:
Gaussian free cluster tree construction using deep neural network. 3254-3258 - Mengxiao Bi, Yanmin Qian, Kai Yu:
Very deep convolutional neural networks for LVCSR. 3259-3263 - William Chan, Nan Rosemary Ke, Ian R. Lane:
Transferring knowledge from a RNN to a DNN. 3264-3268 - Changliang Liu, Jinyu Li, Yifan Gong:
SVD-based universal DNN modeling for multiple scenarios. 3269-3273 - Zhuo Chen, Shinji Watanabe, Hakan Erdogan, John R. Hershey:
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks. 3274-3278
Speech and Music Analysis
- Yuzhou Liu, DeLiang Wang:
Speaker-dependent multipitch tracking using deep neural networks. 3279-3283 - P. Sujith, A. P. Prathosh, A. G. Ramakrishnan, Prasanta Kumar Ghosh:
An error correction scheme for GCI detection algorithms using pitch smoothness criterion. 3284-3288 - RaviShankar Prasad, Bayya Yegnanarayana:
Robust pitch estimation in noisy speech using ZTW and group delay function. 3289-3292 - Zhaoqiong Huang, Ge Zhan, Dongwen Ying, Yonghong Yan:
Robust localization of single sound source based on phase difference regression. 3293-3297 - Daniele Salvati, Carlo Drioli, Gian Luca Foresti:
Frequency map selection using a RBFN-based classifier in the MVDR beamformer for speaker localization in reverberant rooms. 3298-3301 - Ning Ma, Guy J. Brown, Tobias May:
Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions. 3302-3306 - Shuai Nie, Wei Xue, Shan Liang, Xueliang Zhang, Wenju Liu, Liwei Qiao, Jianping Li:
Joint optimization of recurrent networks exploiting source auto-regression for source separation. 3307-3311 - Rong Gong, Philippe Cuvillier, Nicolas Obin, Arshia Cont:
Real-time audio-to-score alignment of singing voice based on melody and lyric information. 3312-3316 - Jun-Yong Lee, Hye-Seung Cho, Hyoung-Gook Kim:
Vocal separation from monaural music using adaptive auditory filtering based on kernel back-fitting. 3317-3320 - Frederick Z. Yen, Mao-Chang Huang, Tai-Shih Chi:
A two-stage singing voice separation algorithm using spectro-temporal modulation features. 3321-3324 - Hyungjun Lim, Myung Jong Kim, Hoirin Kim:
Robust sound event classification using LBP-HOG based bag-of-audio-words feature representation. 3325-3329
Speech Synthesis 1-3
- Kaisheng Yao, Geoffrey Zweig:
Sequence-to-sequence neural net models for grapheme-to-phoneme conversion. 3330-3334 - Rosie Kay, Oliver Watts, Roberto Barra-Chicote, Cassie Mayo:
Knowledge versus data in TTS: evaluation of a continuum of synthesis systems. 3335-3339 - Steffen Eger:
Improving G2P from Wiktionary and other (web) resources. 3340-3344 - Chuang Ding, Pengcheng Zhu, Lei Xie:
BLSTM neural networks for speech driven head motion synthesis. 3345-3349 - Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential. 3350-3354 - Thomas Le Cornu, Ben Milner:
Reconstructing intelligible audio speech from visual speech features. 3355-3359 - Sunayana Sitaram, Alok Parlikar, Gopala Krishna Anumanchipalli, Alan W. Black:
Universal grapheme-based speech synthesis. 3360-3364 - Mirjam Wester, Matthew P. Aylett, Marcus Tomalin, Rasmus Dall:
Artificial personality and disfluency. 3365-3369 - Marc Evrard, Samuel Delalez, Christophe d'Alessandro, Albert Rilliard:
Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis. 3370-3374 - Luc Ardaillon, Gilles Degottex, Axel Roebel:
A multi-layer F0 model for singing voice synthesis using a b-spline representation with intuitive controls. 3375-3379 - Igor Jauk, Antonio Bonafonte, Paula Lopez-Otero, Laura Docío Fernández:
Creating expressive synthetic voices by unsupervised clustering of audiobooks. 3380-3384 - Sandesh Aryal, Ricardo Gutierrez-Osuna:
Articulatory-based conversion of foreign accents with deep neural networks. 3385-3389
Speech and Cognition in Adverse Conditions
- Mikko Tiainen, Lari Vainio, Kaisa Tiippana, Naeem Komeilipoor, Martti Vainio:
Action planning and congruency effect between articulation and grasping. 3390-3393 - Ron M. Hecht, Aharon Bar-Hillel, Stas Tiomkin, Hadar Levi, Omer Tsimhoni, Naftali Tishby:
Cognitive workload and vocabulary sparseness: theory and practice. 3394-3398 - Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu:
Counting competing speakers in a timeframe - human versus computer. 3399-3403 - Fei Chen, Alexander Siu Tai Kwok:
Segmental contribution to the intelligibility of ideal binary-masked sentences. 3404-3407 - Mako Ishida, Takayuki Arai:
Perception of an existing and non-existing L2 English phoneme behind noise by Japanese native speakers. 3408-3411 - Chitralekha Bhat, Sunil Kumar Kopparapu:
Viseme comparison based on phonetic cues for varying speech accents. 3412-3416
Audio Signal Analysis and Representation
- Colm O'Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte:
Quantifying difference in vocalizations of bird populations. 3417-3421 - Jae Choi, Jeunghun Kim, Shin Jae Kang, Nam Soo Kim:
Reverberation-robust acoustic indoor localization. 3422-3425 - Huaiping Ming, Dong-Yan Huang, Lei Xie, Haizhou Li, Minghui Dong:
An alternating optimization approach for phase retrieval. 3426-3430 - Xiong Xiao, Shengkui Zhao, Xionghu Zhong, Douglas L. Jones, Engsiong Chng, Haizhou Li:
Learning to estimate reverberation time in noisy and reverberant rooms. 3431-3435 - Cheng Pang, Jie Zhang, Hong Liu:
Direction of arrival estimation based on reverberation weighting and noise error estimator. 3436-3440 - Huy Phan, Lars Hertel, Marco Maaß, Radoslaw Mazur, Alfred Mertins:
Representing nonspeech audio signals through speech classification models. 3441-3445
Robustness in Speaker Recognition
- Luciana Ferrer, Mitchell McLaren, Aaron Lawson, Martin Graciarena:
Mitigating the effects of non-stationary unseen noises on language recognition performance. 3446-3450 - Moez Ajili, Jean-François Bonastre, Solange Rossato, Juliette Kahn, Itshak Lapidot:
An information theory based data-homogeneity measure for voice comparison. 3451-3455 - David Dean, Ahilan Kanagasundaram, Houman Ghaemmaghami, Md. Hafizur Rahman, Sridha Sridharan:
The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition. 3456-3460 - Hagai Aronowitz:
Score stabilization for speaker recognition trained on a small development set. 3461-3465 - Abhinav Misra, Shivesh Ranjan, Chunlei Zhang, John H. L. Hansen:
Anti-spoofing system: an investigation of measures to detect synthetic and human speech. 3466-3470 - Michael J. Carne:
A likelihood ratio-based forensic voice comparison in microphone vs. mobile mismatched conditions using Japanese /ai/. 3471-3475
Evaluation of Speech Synthesis
- Mirjam Wester, Cassia Valentini-Botinhao, Gustav Eje Henter:
Are we using enough listeners? No! - an empirically-supported critique of Interspeech 2014 TTS evaluations. 3476-3480 - Jonathan Chevelu, Damien Lolive, Sébastien Le Maguer, David Guennec:
How to compare TTS systems: a new subjective evaluation methodology focused on differences. 3481-3485 - Lukas Latacz, Werner Verhelst:
Double-ended prediction of the naturalness ratings of the Blizzard Challenge 2008-2013. 3486-3490 - Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito:
Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts. 3491-3495 - Tomoki Koriyama, Takao Kobayashi:
A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data. 3496-3500 - Raphael Ullmann, Ramya Rasipuram, Mathew Magimai-Doss, Hervé Bourlard:
Objective intelligibility assessment of text-to-speech systems through utterance verification. 3501-3505
Adaptive Methods for LVCSR
- Dominique Fohr, Irina Illina:
Continuous word representation using neural networks for proper name retrieval from diachronic documents. 3506-3510 - Xie Chen, T. Tan, Xunying Liu, Pierre Lanchantin, M. Wan, Mark J. F. Gales, Philip C. Woodland:
Recurrent neural network language model adaptation for multi-genre broadcast speech recognition. 3511-3515 - Wengong Jin, Tianxing He, Yanmin Qian, Kai Yu:
Paragraph vector based topic model for language model adaptation. 3516-3520 - Ching-feng Yeh, Yuan-ming Liou, Hung-yi Lee, Lin-Shan Lee:
Personalized speech recognizer with keyword-based personalized lexicon and language model using word vector representations. 3521-3525 - Sheng Li, Yuya Akita, Tatsuya Kawahara:
Discriminative data selection for lightly supervised training of acoustic model using closed caption texts. 3526-3530 - Amit Das, Mark Hasegawa-Johnson:
Cross-lingual transfer learning during supervised training in low resource scenarios. 3531-3535
Robust Speech Processing Using Observation Uncertainty and Uncertainty Propagation (Special Session)
- Ramón Fernandez Astudillo, Shinji Watanabe, Ahmed Hussen Abdelaziz, Dorothea Kolossa:
Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview. - Dayana Ribas González, Emmanuel Vincent, José Ramón Calvo de Lara:
Uncertainty propagation for noise robust speaker recognition: the case of NIST-SRE. 3536-3540 - Yuuki Tachioka, Shinji Watanabe:
Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features. 3541-3545 - Rahim Saeidi, Paavo Alku:
Accounting for uncertainty of i-vectors in speaker recognition using uncertainty propagation and modified imputation. 3546-3550 - Sri Harish Reddy Mallidi, Tetsuji Ogawa, Karel Veselý, Phani S. Nidadavolu, Hynek Hermansky:
Autoencoder based multi-stream combination for noise robust speech recognition. 3551-3555 - Christian Huemmer, Roland Maas, Andreas Schwarz, Ramón Fernandez Astudillo, Walter Kellermann:
Uncertainty decoding for DNN-HMM hybrid systems based on numerical sampling. 3556-3560 - Ahmed Hussen Abdelaziz, Shinji Watanabe, John R. Hershey, Emmanuel Vincent, Dorothea Kolossa:
Uncertainty propagation through deep neural networks. 3561-3565 - Marco Kühne:
Handling derivative filterbank features in bounded-marginalization-based missing data automatic speech recognition. 3566-3570 - Arun Narayanan, Ananya Misra, Kean K. Chin:
Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR. 3571-3575 - Ramón Fernandez Astudillo, Maria Joana Correia, Isabel Trancoso:
Integration of DNN based speech enhancement and ASR. 3576-3580
Acoustic Model Adaptation and Training
- Chao Zhang, Philip C. Woodland:
A general artificial neural network extension for HTK. 3581-3585 - Tom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur:
Audio augmentation for speech recognition. 3586-3589 - Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur:
A diversity-penalizing ensemble training method for deep learning. 3590-3594 - Gakuto Kurata, Daniel Willett:
Deep neural network training emphasizing central frames. 3595-3599 - Kai Chen, Zhi-Jie Yan, Qiang Huo:
Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach. 3600-3604 - Pawel Swietojanski, Peter Bell, Steve Renals:
Structured output layer with auxiliary targets for context-dependent acoustic modelling. 3605-3609 - Peter Bell, Steve Renals:
Complementary tasks for context-dependent deep neural network acoustic models. 3610-3614 - Jie Li, Heng Zhang, Xinyuan Cai, Bo Xu:
Towards end-to-end speech recognition for Chinese Mandarin using long short-term memory recurrent neural networks. 3615-3619 - Mingming Chen, Zhanlei Yang, Jizhong Liang, Yanpeng Li, Wenju Liu:
Improving deep neural networks based multi-accent Mandarin speech recognition using i-vectors and accent-specific top layer. 3620-3624 - Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Ji Wu, Chin-Hui Lee:
Rapid adaptation for deep neural networks through multi-task learning. 3625-3629 - Sree Hari Krishnan Parthasarathi, Björn Hoffmeister, Spyros Matsoukas, Arindam Mandal, Nikko Strom, Sri Garimella:
fMLLR based feature-space speaker adaptation of DNN acoustic models. 3630-3634 - Xiangang Li, Xihong Wu:
I-vector dependent feature space transformations for adaptive speech recognition. 3635-3639 - Mortaza Doulaty, Oscar Saz, Thomas Hain:
Unsupervised domain discovery using latent Dirichlet allocation for acoustic modelling in speech recognition. 3640-3644 - Taichi Asami, Ryo Masumura, Hirokazu Masataki, Manabu Okamoto, Sumitaka Sakauchi:
Training data selection for acoustic modeling via submodular optimization of joint Kullback-Leibler divergence. 3645-3649
Spoken Term Detection, Spoken MT & Transliteration
- Eunah Cho, Kevin Kilgour, Jan Niehues, Alex Waibel:
Combination of NN and CRF models for joint detection of punctuation and disfluencies. 3650-3654 - Tze Siong Lau, I-Fan Chen, Chin-Hui Lee:
Tunable keyword-aware language modeling and context dependent fillers for LVCSR-based spoken keyword search. 3655-3659 - Haipeng Wang, Anton Ragni, Mark J. F. Gales, Kate M. Knill, Philip C. Woodland, Chao Zhang:
Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages. 3660-3664 - Quoc Truong Do, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura:
Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs. 3665-3669 - Hoang Gia Ngo, Nancy F. Chen, Binh Minh Nguyen, Bin Ma, Haizhou Li:
Phonology-augmented statistical transliteration for low-resource languages. 3670-3674 - Kazuki Oouchi, Ryota Kon'no, Takahiro Akyu, Kazuma Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh:
Evaluation of re-ranking by prioritizing highly ranked documents in spoken term detection. 3675-3679 - Abhijeet Saxena, B. Yegnanarayana:
Distinctive feature based representation of speech for query-by-example spoken term detection. 3680-3684 - Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh:
Combination of diverse subword units in spoken term detection. 3685-3689 - Dhananjay Ram, Afsaneh Asaei, Pranay Dighe, Hervé Bourlard:
Sparse modeling of posterior exemplars for keyword detection. 3690-3694
Stress, Load, and Pathologies
- Tin Lay Nwe, Qianli Xu, Cuntai Guan, Bin Ma:
Stress level detection using double-layer subband filter. 3695-3699 - Jürgen Trouvain, Khiet P. Truong:
Prosodic characteristics of read speech before and after treadmill running. 3700-3704 - Khiet P. Truong, Arne Nieuwenhuys, Peter Beek, Vanessa Evers:
A database for analysis of speech under physical stress: detection of exercise intensity while running and talking. 3705-3709 - Will Paul, Cecilia Ovesdotter Alm, Reynold J. Bailey, Joe Geigel, Linwei Wang:
Stressed out: what speech tells us about stress. 3710-3714 - Andreas Tsiartas, Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti, Adrian Willoughby:
Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system. 3715-3719 - Mary Pietrowicz, Mark Hasegawa-Johnson, Karrie Karahalios:
Acoustic correlates for perceived effort levels in expressive speech. 3720-3724 - Khalid Daoudi, Ashwini Jaya Kumar:
Pitch-based speech perturbation measures using a novel GCI detection algorithm: application to pathological voice classification. 3725-3728 - Dimitra Vergyri, Bruce Knoth, Elizabeth Shriberg, Vikramjit Mitra, Mitchell McLaren, Luciana Ferrer, Pablo Garcia, Charles Marmar:
Speech-based assessment of PTSD in a military population using diverse feature classes. 3729-3733 - Bea Yu, Thomas F. Quatieri, James R. Williamson, James C. Mundt:
Cognitive impairment prediction in the elderly based on vocal biomarkers. 3734-3738 - Jorge Andrés Gómez García, Laureano Moro-Velázquez, Juan Ignacio Godino-Llorente, Germán Castellanos-Domínguez:
Automatic age detection in normal and pathological voice. 3739-3743
Interspeech 2015 Computational Paralinguistics ChallengE (ComParE): Degree of Nativeness, Parkinson's & Eating Condition (Special Session)
- Anton Batliner:
Wrapping up: the story of the ComParE challenges, what we learned and where to go.