IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 25
Volume 25, Number 1, January 2017
- Jin Chu Wu, Alvin F. Martin, Craig S. Greenberg, Raghu N. Kacker:
The Impact of Data Dependence on Speaker Recognition Evaluation. 1-14
- Hélène Papadopoulos, George Tzanetakis:
Models for Music Analysis From a Markov Logic Networks Perspective. 15-30
- Ahmed Al-Tmeme, Wai Lok Woo, Satnam Singh Dlay, Bin Gao:
Underdetermined Convolutive Source Separation Using GEM-MU With Variational Approximated Optimum Model Order NMF2D. 31-45
- Mark A. Hasegawa-Johnson, Preethi Jyothi, Daniel McCloy, Majid Mirbagheri, Giovanni M. Di Liberto, Amit Das, Bradley Ekin, Chunxi Liu, Vimal Manohar, Hao Tang, Edmund C. Lalor, Nancy F. Chen, Paul Hager, Tyler Kekona, Rose Sloan, Adrian K. C. Lee:
ASR for Under-Resourced Languages From Probabilistic Transcription. 46-59
- Zhen Huang, Sabato Marco Siniscalchi, Chin-Hui Lee:
Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition. 60-71
- Simon Durand, Juan Pablo Bello, Bertrand David, Gaël Richard:
Robust Downbeat Tracking Using an Ensemble of Convolutional Networks. 72-85
- Zhehuai Chen, Yimeng Zhuang, Yanmin Qian, Kai Yu:
Phone Synchronous Speech Recognition With CTC Lattices. 86-97
- Bo Wu, Kehuang Li, Minglei Yang, Chin-Hui Lee:
A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks. 98-107
- Hongjie Chen, Lei Xie, Cheung-Chi Leung, Xiaoming Lu, Bin Ma, Haizhou Li:
Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News. 108-119
- Hua Xing, John H. L. Hansen:
Single Sideband Frequency Offset Estimation and Correction for Quality Enhancement and Speaker Recognition. 120-132
- Andreas I. Koutrouvelis, Richard Christian Hendriks, Richard Heusdens, Jesper Jensen:
Relaxed Binaural LCMV Beamforming. 133-148
- Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen:
Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems. 149-163
- Jakob Abeßer, Klaus Frieler, Estefanía Cano, Martin Pfleiderer, Wolf-Georg Zaddach:
Score-Informed Analysis of Tuning, Intonation, Pitch Modulation, and Dynamics in Jazz Solos. 168-177
- Alastair H. Moore, Christine Evers, Patrick A. Naylor:
Direction of Arrival Estimation in the Spherical Harmonic Domain Using Subspace Pseudointensity Vectors. 178-192
- Kun Li, Xiaojun Qian, Helen M. Meng:
Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks. 193-207
- Yoonchang Han, Jae-Hun Kim, Kyogu Lee:
Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music. 208-221
Volume 25, Number 2, February 2017
- Hanchi Chen, Thushara Dheemantha Abhayapala, Prasanga N. Samarasinghe, Wen Zhang:
Direct-to-Reverberant Energy Ratio Estimation Using a First-Order Microphone. 226-237
- Peter Bell, Pawel Swietojanski, Steve Renals:
Multitask Learning of Context-Dependent Targets in Deep Neural Network Acoustic Models. 238-247
- Rui Zhao, Kezhi Mao:
Topic-Aware Deep Compositional Models for Sentence Classification. 248-260
- Dalia El Badawy, Ngoc Q. K. Duong, Alexey Ozerov:
On-the-Fly Audio Source Separation - A Novel User-Friendly Framework. 261-272
- Filip Elvander, Johan Sward, Andreas Jakobsson:
Online Estimation of Multiple Harmonic Signals. 273-284
- Vincent Renkens, Hugo Van hamme:
Weakly Supervised Learning of Hidden Markov Models for Spoken Language Acquisition. 285-295
- Luca Remaggi, Philip J. B. Jackson, Philip Coleman, Wenwu Wang:
Acoustic Reflector Localization: Novel Image Source Reversion and Direct Localization Methods. 296-309
- Prasanga N. Samarasinghe, Thushara D. Abhayapala, Hanchi Chen:
Estimating the Direct-to-Reverberant Energy Ratio Using a Spherical Harmonics-Based Spatial Correlation Model. 310-319
- Shmulik Markovich Golan, Sharon Gannot, Walter Kellermann:
Combined LCMV-TRINICON Beamforming for Separating Multiple Speech Sources in Noisy and Reverberant Environments. 320-332
- Shakeel Ahmed, Muhammad Tahir Akhtar:
Gain Scheduling of Auxiliary Noise and Variable Step-Size for Online Acoustic Feedback Cancellation in Narrow-Band Active Noise Control Systems. 333-343
- Gabriel Sargent, Frédéric Bimbot, Emmanuel Vincent:
Estimating the Structural Segmentation of Popular Music Pieces Under Regularity Constraints. 344-358
- Jordan Cheer, Stephen Daley:
An Investigation of Delayless Subband Adaptive Filtering for Multi-Input Multi-Output Active Noise Control Applications. 359-373
- Sebastian J. Schlecht, Emanuël A. P. Habets:
Feedback Delay Networks: Echo Density and Mixing Time. 374-383
- Johannes Abel, Magdalena Kaniewska, Cyril Guillaume, Wouter Tirry, Tim Fingscheidt:
An Instrumental Quality Measure for Artificially Bandwidth-Extended Speech Signals. 384-396
- Robert Rehr, Timo Gerkmann:
An Analysis of Adaptive Recursive Smoothing with Applications to Noise PSD Estimation. 397-408
- Emilio Granell, Carlos D. Martínez-Hinarejos:
Multimodal Crowdsourcing for Transcribing Handwritten Documents. 409-419
- Yaping Ma, Yegui Xiao:
A New Strategy for Online Secondary-Path Modeling of Narrowband Active Noise Control. 420-434
- Jose A. Belloch, Alberto González, Enrique S. Quintana-Ortí, Miguel Ferrer, Vesa Välimäki:
GPU-Based Dynamic Wave Field Synthesis Using Fractional Delay Filters and Room Compensation. 435-447
Volume 25, Number 3, March 2017
- Qi He, Feng Bao, Changchun Bao:
Multiplicative Update of Auto-Regressive Gains for Codebook-Based Speech Enhancement. 457-468
- Zhongqing Wang, Sophia Yat Mei Lee, Shoushan Li, Guodong Zhou:
Emotion Analysis in Code-Switching Text With Joint Factor Graph Model. 469-480
- Ashwin Bellur, Mounya Elhilali:
Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection. 481-492
- Zhiyuan Tang, Lantian Li, Dong Wang, Ravichander Vipperla:
Collaborative Joint Training With Multitask Recurrent Model for Speech and Speaker Recognition. 493-504
- Bidisha Sharma, S. R. Mahadeva Prasanna:
Sonority Measurement Using System, Source, and Suprasegmental Information. 505-518
- Hung-yi Lee, Bo-Hsiang Tseng, Tsung-Hsien Wen, Yu Tsao:
Personalizing Recurrent-Neural-Network-Based Language Model by Social Network. 519-530
- Ji Ming, Danny Crookes:
Speech Enhancement Based on Full-Sentence Correlation and Clean Speech Recognition. 531-543
- Quoc Truong Do, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Preserving Word-Level Emphasis in Speech-to-Speech Translation. 544-556
- Zhenghua Li, Jiayuan Chao, Min Zhang, Wenliang Chen, Meishan Zhang, Guohong Fu:
Coupled POS Tagging on Heterogeneous Annotations. 557-571
- Clement S. J. Doire, Mike Brookes, Patrick A. Naylor, Christopher M. Hicks, Dave Betts, Mohammad A. Dmour, Søren Holdt Jensen:
Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise. 572-587
- Aleksandr Sizov, Kong-Aik Lee, Tomi Kinnunen:
Direct Optimization of the Detection Cost for I-Vector-Based Spoken Language Recognition. 588-597
- Imran A. Sheikh, Dominique Fohr, Irina Illina, Georges Linarès:
Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition. 598-610
- Mojtaba Farmani, Michael Syskind Pedersen, Zheng-Hua Tan, Jesper Jensen:
Informed Sound Source Localization Using Relative Transfer Functions for Hearing Aid Applications. 611-623
- Vikram C. M., S. R. Mahadeva Prasanna:
Epoch Extraction From Telephone Quality Speech Using Single Pole Filter. 624-636
- Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi:
Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation. 637-650
- Dani Cherkassky, Sharon Gannot:
Blind Synchronization in Wireless Acoustic Sensor Networks. 651-661
- Laurent Girin, Thomas Hueber, Xavier Alameda-Pineda:
Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping. 662-673
- Mohamad Hasan Bahari, Alexander Bertrand, Marc Moonen:
Blind Sampling Rate Offset Estimation for Wireless Acoustic Sensor Networks Through Weighted Least-Squares Coherence Drift Estimation. 674-686
- Adam Kuklasinski, Simon Doclo, Søren Holdt Jensen, Jesper Rindom Jensen:
Correction to "Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise". 687
Volume 25, Number 4, April 2017
- Sharon Gannot, Emmanuel Vincent, Shmulik Markovich Golan, Alexey Ozerov:
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation. 692-730
- Dongwen Ying, Ruohua Zhou, Junfeng Li, Yonghong Yan:
Window-Dominant Signal Subspace Methods for Multiple Short-Term Speech Source Localization. 731-744
- Sean U. N. Wood, Jean Rouat, Stéphane Dupont, Gueorgui Pironkov:
Blind Speech Separation and Enhancement With GCC-NMF. 745-755
- Constantin Spille, Birger Kollmeier, Bernd T. Meyer:
Combining Binaural and Cortical Features for Robust Speech Recognition. 756-767
- Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Hitoshi Ohmuro:
Informative Acoustic Feature Selection to Maximize Mutual Information for Collecting Target Sources. 768-779
- Takuya Higuchi, Nobutaka Ito, Shoko Araki, Takuya Yoshioka, Marc Delcroix, Tomohiro Nakatani:
Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR. 780-793
- Eita Nakamura, Kazuyoshi Yoshii, Shigeki Sagayama:
Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices. 794-806
- Omid Ghahabi, Javier Hernando:
Deep Learning Backend for Single and Multisession i-Vector Speaker Recognition. 807-817
- Penny Karanasou, Chunyang Wu, Mark J. F. Gales, Philip C. Woodland:
I-Vectors and Structured Neural Networks for Rapid Adaptation of Acoustic Models. 818-828
- G. Aneeja, B. Yegnanarayana:
Extraction of Fundamental Frequency From Degraded Speech Using Temporal Envelopes at High SNR Frequencies. 829-838
- Seyyed Saeed Sarfjoo, Cenk Demiroglu, Simon King:
Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation With Limited Data. 839-851
- Yung-Yue Chen, Jia-Hao Zhang:
Background Noise Reduction Design for Dual Microphone Cellular Phones: Robust Approach. 852-862
- Liner Yang, Xinxiong Chen, Zhiyuan Liu, Maosong Sun:
Improving Word Representations with Document Labels. 863-870
- Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Li-Rong Dai, Yu Hu:
Nonrecurrent Neural Structure for Long-Term Dependence. 871-884
- Xuefeng Yang, Kezhi Mao:
Task Independent Fine Tuning for Word Embeddings. 885-894
- Huawei Chen:
Design of Robust Broadband Beamformers Using Worst-Case Performance Optimization: A Semidefinite Programming Approach. 895-907
- Sandro Cumani, Pietro Laface:
Nonlinear I-Vector Transformations for PLDA-Based Speaker Recognition. 908-919
Volume 25, Number 5, May 2017
- Manu Airaksinen, Tom Bäckström, Paavo Alku:
Quadratic Programming Approach to Glottal Inverse Filtering by Joint Norm-1 and Norm-2 Optimization. 929-939
- Ofer Schwartz, Sharon Gannot, Emanuël A. P. Habets:
Multispeaker LCMV Beamformer and Postfilter for Source Separation and Noise Reduction. 940-951
- Dongmei Wang, Chengzhu Yu, John H. L. Hansen:
Robust Harmonic Features for Classification-Based Pitch Estimation. 952-964
- Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Bo Li, Arun Narayanan, Ehsan Variani, Michiel Bacchiani, Izhak Shafran, Andrew W. Senior, Kean K. Chin, Ananya Misra, Chanwoo Kim:
Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition. 965-979
- Hanieh Khalilian, Ivan V. Bajic, Rodney G. Vaughan:
A Simulation Study of a Three-Dimensional Sound Field Reproduction System for Immersive Communication. 980-995
- Andreas Franck, Wenwu Wang, Filippo Maria Fazi:
Sparse ℓ1-Optimal Multiloudspeaker Panning and Its Relation to Vector Base Amplitude Panning. 996-1010
- Songbin Li, Yizhen Jia, C.-C. Jay Kuo:
Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals. 1011-1022
- Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models. 1023-1034
- Navid Shokouhi, John H. L. Hansen:
Teager-Kaiser Energy Operators for Overlapped Speech Detection. 1035-1047
- Yi-Chin Huang, Chung-Hsien Wu, Yan-You Chen, Ming-Ge Shie, Jhing-Fa Wang:
Personalized Spontaneous Speech Synthesis Using a Small-Sized Unsegmented Semispontaneous Speech. 1048-1060
- Jeongsoo Park, Jaeyoung Shin, Kyogu Lee:
Exploiting Continuity/Discontinuity of Basis Vectors in Spectrogram Decomposition for Harmonic-Percussive Sound Separation. 1061-1074
- Xueliang Zhang, DeLiang Wang:
Deep Learning Based Binaural Speech Separation in Reverberant Environments. 1075-1084
- Masood Delfarah, DeLiang Wang:
Features for Masking-Based Monaural Speech Separation in Reverberant Conditions. 1085-1094
- Feiran Yang, Gerald Enzner, Jun Yang:
Statistical Convergence Analysis for Optimal Control of DFT-Domain Adaptive Echo Canceler. 1095-1106
- Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga:
Sentence Selection Based on Extended Entropy Using Phonetic and Prosodic Contexts for Statistical Parametric Speech Synthesis. 1107-1116
- Gergely Firtha, Péter Fiala, Frank Schultz, Sascha Spors:
Improved Referencing Schemes for 2.5D Wave Field Synthesis Driving Functions. 1117-1127
- Esteban Maestre, Gary P. Scavone, Julius O. Smith III:
Joint Modeling of Bridge Admittance and Body Radiativity for Efficient Synthesis of String Instrument Sound by Digital Waveguides. 1128-1139
- Gongping Huang, Jacob Benesty, Jingdong Chen:
On the Design of Frequency-Invariant Beampatterns With Uniform Circular Microphone Arrays. 1140-1153
- Zdenek Prusa, Péter Balázs, Peter L. Søndergaard:
A Noniterative Method for Reconstruction of Phase From STFT Magnitude. 1154-1164
Volume 25, Number 6, June 2017
- Gaël Richard, Tuomas Virtanen, Juan Pablo Bello, Nobutaka Ono, Hervé Glotin:
Introduction to the Special Section on Sound Scene and Event Analysis. 1169-1171
- Héctor A. Sánchez-Hevia, David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera:
Maximum Likelihood Decision Fusion for Weapon Classification in Wireless Acoustic Sensor Networks. 1172-1182
- Nithin Rao Koluguri, G. Nisha Meenakshi, Prasanta Kumar Ghosh:
Spectrogram Enhancement Using Multiple Window Savitzky-Golay (MWSG) Filter for Robust Bird Sound Detection. 1183-1192
- Dan Stowell, Emmanouil Benetos, Lisa F. Gill:
On-Bird Sound Recordings: Automatic Acoustic Recognition of Activities and Contexts. 1193-1206
- Brandon T. Carroll, Bradley M. Whitaker, Wayne Daley, David V. Anderson:
Outlier Learning via Augmented Frozen Dictionaries. 1207-1215
- Victor Bisot, Romain Serizel, Slim Essid, Gaël Richard:
Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification. 1216-1229
- Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley:
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging. 1230-1241
- Rene Grzeszick, Axel Plinge, Gernot A. Fink:
Bag-of-Features Methods for Acoustic Event Detection and Classification. 1242-1252
- Alain Rakotomamonjy:
Supervised Representation Learning for Audio Scene Classification. 1253-1265
- Emmanouil Benetos, Grégoire Lafay, Mathieu Lagrange, Mark D. Plumbley:
Polyphonic Sound Event Tracking Using Linear Dynamical Systems. 1266-1277
- Huy Phan, Lars Hertel, Marco Maaß, Philipp Koch, Radoslaw Mazur, Alfred Mertins:
Improved Audio Scene Classification Based on Label-Tree Embeddings and Convolutional Neural Networks. 1278-1290
- Emre Çakir, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen:
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. 1291-1303
- Jens Schröder, Niko Moritz, Jörn Anemüller, Stefan Goetze, Birger Kollmeier:
Classifier Architectures for Acoustic Scenes and Events: Implications for DNNs, TDNNs, and Perceptual Features from DCASE 2016. 1304-1314
- Wenjun Yang, Sridhar Krishnan:
Combining Temporal Features by Local Binary Pattern for Acoustic Scene Classification. 1315-1321
- David Dov, Ronen Talmon, Israel Cohen:
Multimodal Kernel Method for Activity Detection of Sound Sources. 1322-1334
- Keisuke Imoto, Nobutaka Ono:
Spatial Cepstrum as a Spatial Feature Using a Distributed Microphone Array for Acoustic Scene Analysis. 1335-1343
- Ivo Trowitzsch, Johannes Mohr, Youssef Kashef, Klaus Obermayer:
Robust Detection of Environmental Sounds in Binaural Auditory Scenes. 1344-1356
- Abu Shafin Mohammad Mahdee Jameel, Shaikh Anowarul Fattah, Rajib Goswami, Wei-Ping Zhu, M. Omair Ahmad:
Noise Robust Formant Frequency Estimation Method Based on Spectral Model of Repeated Autocorrelation of Speech. 1357-1370
- Na Li, Man-Wai Mak, Jen-Tzung Chien:
DNN-Driven Mixture of PLDA for Robust Speaker Verification. 1371-1383
- Kai Wu, Vaninirappuputhenpurayil Gopalan Reju, Andy W. H. Khong, Shu Ting Goh:
Swarm Intelligence Based Particle Filter for Alternating Talker Localization and Tracking Using Microphone Arrays. 1384-1397
Volume 25, Number 7, July 2017
- Yu-An Chen, Ju-Chiang Wang, Yi-Hsuan Yang, Homer H. Chen:
Component Tying for Mixture Model Adaptation in Personalization of Music Emotion Recognition. 1409-1420
- Hossein Zeinali, Hossein Sameti, Lukás Burget:
HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification. 1421-1435
- Xinzhou Xu, Jun Deng, Nicholas Cummins, Zixing Zhang, Chen Wu, Li Zhao, Björn W. Schuller:
A Two-Dimensional Framework of Multiple Kernel Subspace Learning for Recognizing Emotion in Speech. 1436-1449
- Mandy Korpusik, James R. Glass:
Spoken Language Understanding for a Nutrition Dialogue System. 1450-1461
- Mahmoud Fakhry, Piergiorgio Svaizer, Maurizio Omologo:
Audio Source Separation in Reverberant Environments Using β-Divergence-Based Nonnegative Factorization. 1462-1476
- Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot:
Semi-Supervised Source Localization on Multiple Manifolds With Distributed Microphones. 1477-1491
- Donald S. Williamson, DeLiang Wang:
Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising. 1492-1501
- Liang Lu, Steve Renals:
Small-Footprint Highway Deep Neural Networks for Speech Recognition. 1502-1511
- Ina Kodrasi, Simon Doclo:
Signal-Dependent Penalty Functions for Robust Acoustic Multi-Channel Equalization. 1512-1525
- Jung-Hee Kim, Jin Kim, Jae Hyeon Jeon, Sang Won Nam:
Delayless Individual-Weighting-Factors Sign Subband Adaptive Filter With Band-Dependent Variable Step-Sizes. 1526-1534
- Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks. 1535-1546
- Giacomo Vairetti, Enzo De Sena, Michael Catrysse, Søren Holdt Jensen, Marc Moonen, Toon van Waterschoot:
A Scalable Algorithm for Physically Motivated and Sparse Approximation of Room Impulse Responses With Orthonormal Basis Functions. 1547-1561
Volume 25, Number 8, August 2017
- Francis Stevens, Damian T. Murphy, Lauri Savioja, Vesa Välimäki:
Modeling Sparsely Reflecting Outdoor Acoustic Scenes Using the Waveguide Web. 1566-1578
- Ferdinando Olivieri, Filippo Maria Fazi, Simone Fontana, Dylan Menzies, Philip Arthur Nelson:
Generation of Private Sound With a Circular Loudspeaker Array and the Weighted Pressure Matching Method. 1579-1591
- Samy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt:
Instantaneous A Priori SNR Estimation by Cepstral Excitation Manipulation. 1592-1605
- Paavo Alku, Rahim Saeidi:
The Linear Predictive Modeling of Speech From Higher-Lag Autocorrelation Coefficients Applied to Noise-Robust Speaker Recognition. 1606-1617
- Cheng Pang, Hong Liu, Jie Zhang, Xiaofei Li:
Binaural Sound Localization Based on Reverberation Weighting and Generalized Parametric Mapping. 1618-1632
- Somanath Pradhan, Vinal Patel, Dipen Somani, Nithin V. George:
An Improved Proportionate Delayless Multiband-Structured Subband Adaptive Feedback Canceller for Digital Hearing Aids. 1633-1643
- Szymon Drgas, Tuomas Virtanen, Jörg Lücke, Antti Hurmalainen:
Binary Non-Negative Matrix Deconvolution for Audio Dictionary Learning. 1644-1656
- Fatemeh Saki, Nasser Kehtarnavaz:
Real-Time Unsupervised Classification of Environmental Noise Signals. 1657-1667
- Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen:
Automatic Sentiment Detection in Naturalistic Audio. 1668-1679
- Ofer Schwartz, Sharon Gannot, Emanuël A. P. Habets:
Cramér-Rao Bound Analysis of Reverberation Level Estimators for Dereverberation and Noise Reduction. 1680-1693
- Seyran Khademi, Richard C. Hendriks, W. Bastiaan Kleijn:
Intelligibility Enhancement Based on Mutual Information. 1694-1708
- Yuta Hatano, Chuang Shi, Yoshinobu Kajikawa:
Compensation for Nonlinear Distortion of the Frequency Modulation-Based Parametric Array Loudspeaker. 1709-1717
- Yu-Ren Chien, Daryush D. Mehta, Jón Guðnason, Matías Zanartu, Thomas F. Quatieri:
Evaluation of Glottal Inverse Filtering Algorithms Using a Physiologically Based Articulatory Speech Synthesizer. 1718-1730
Volume 25, Number 9, September 2017
- Jakob Abeßer, Gerald Schuller:
Instrument-Centered Music Transcription of Solo Bass Guitar Recordings. 1741-1750
- Thomas Le Cornu, Ben Milner:
Generating Intelligible Audio Speech From Visual Speech. 1751-1761
- Lemao Liu, Atsushi Fujita, Masao Utiyama, Andrew M. Finch, Eiichiro Sumita:
Translation Quality Estimation Using Only Bilingual Corpora. 1762-1772
- Emad M. Grais, Gerard Roma, Andrew J. R. Simpson, Mark D. Plumbley:
Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks. 1773-1783
- Giuliano Bernardi, Toon van Waterschoot, Jan Wouters, Marc Moonen:
Adaptive Feedback Cancellation Using a Partitioned-Block Frequency-Domain Kalman Filter Approach With PEM-Based Signal Prewhitening. 1784-1798
- Vinal Patel, Jordan Cheer, Nithin V. George:
Modified Phase-Scheduled-Command FxLMS Algorithm for Active Sound Profiling. 1799-1808
- Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori:
Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis. 1809-1820
- Nikolaos Stefanakis, Despoina Pavlidi, Athanasios Mouchtaris:
Perpendicular Cross-Spectra Fusion for Sound Source Localization With a Planar Microphone Array. 1821-1835
- Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Simultaneous Optimization of Multiple Tree-Based Factor Analyzed HMM for Speech Synthesis. 1836-1845
- Eita Nakamura, Kazuyoshi Yoshii, Simon Dixon:
Note Value Recognition for Piano Transcription Using Markov Random Fields. 1846-1858
Volume 25, Number 10, October 2017
- Xiaohai Tian, Siu Wa Lee, Zhizheng Wu, Eng Siong Chng, Haizhou Li:
An Exemplar-Based Approach to Frequency Warping for Voice Conversion. 1863-1876
- Siying Wang, Sebastian Ewert, Simon Dixon:
Identifying Missing and Extra Notes in Piano Recordings Using Score-Informed Dictionary Learning. 1877-1889
- Sandro Cumani, Pietro Laface:
Joint Estimation of PLDA and Nonlinear Transformations of Speaker Vectors. 1890-1900
- Morten Kolbaek, Dong Yu, Zheng-Hua Tan, Jesper Jensen:
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks. 1901-1913
- Cheng-Tao Chung, Cheng-Yu Tsai, Chia-Hsiang Liu, Lin-Shan Lee:
Unsupervised Iterative Deep Learning of Speech Features and Acoustic Tokens with Applications to Spoken Term Detection. 1914-1928
- Niccolò Antonello, Enzo De Sena, Marc Moonen, Patrick A. Naylor, Toon van Waterschoot:
Room Impulse Response Interpolation Using a Sparse Spatio-Temporal Representation of the Sound Field. 1929-1941
- Yanmin Qian, Nanxin Chen, Heinrich Dinkel, Zhizheng Wu:
Deep Feature Engineering for Noise Robust Spoofing Detection. 1942-1955
- Sina Hafezi, Alastair H. Moore, Patrick A. Naylor:
Augmented Intensity Vectors for Direction of Arrival Estimation in the Spherical Harmonic Domain. 1956-1968
- Byeongho Jo, Jung-Woo Choi:
Spherical Harmonic Smoothing for Localizing Coherent Sound Sources. 1969-1984
- Emma Jokinen, Ulpu Remes, Paavo Alku:
Intelligibility Enhancement of Telephone Speech Using Gaussian Process Regression for Normal-to-Lombard Spectral Tilt Conversion. 1985-1996
- Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot:
Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization With Spatial Sparsity Regularization. 1997-2012
- Marc Arnela, Oriol Guasch:
Finite Element Synthesis of Diphthongs Using Tuned Two-Dimensional Vocal Tracts. 2013-2023
- Deepak Baby, Hugo Van hamme:
Joint Denoising and Dereverberation Using Exemplar-Based Sparse Representations and Decaying Norm Constraint. 2024-2035
Volume 25, Number 11, November 2017
- Qinghua Huang, Lin Zhang, Yong Fang:
Two-Stage Decoupled DOA Estimation Based on Real Spherical Harmonics for Spherical Arrays. 2045-2058
- Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, Kazuya Takeda:
Duration-Controlled LSTM for Polyphonic Sound Event Detection. 2059-2070
- Monisankha Pal, Goutam Saha:
Spectral Mapping Using Prior Re-Estimation of i-Vectors and System Fusion for Voice Conversion. 2071-2084
- Seppo Enarvi, Peter Smit, Sami Virpioja, Mikko Kurimo:
Automatic Speech Recognition With Very Large Conversational Finnish and Estonian Vocabularies. 2085-2097
- Hannah Muckenhirn, Pavel Korshunov, Mathew Magimai-Doss, Sébastien Marcel:
Long-Term Spectral Statistics for Voice Presentation Attack Detection. 2098-2111
- Brian Hamilton, Stefan Bilbao:
FDTD Methods for 3-D Room Acoustics Simulation With High-Order Accuracy in Space and Time. 2112-2124
- Pejman Mowlaee, Martin Blass, W. Bastiaan Kleijn:
New Results in Modulation-Domain Single-Channel Speech Enhancement. 2125-2137
- Dylan Menzies, Filippo Maria Fazi:
Decoding and Compression of Channel and Scene Objects for Spatial Audio. 2138-2151
- Eunwoo Song, Frank K. Soong, Hong-Goo Kang:
Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems. 2152-2161
- Pulkit Sharma, Vinayak Abrol, Anil Kumar Sao:
Deep-Sparse-Representation-Based Features for Speech Recognition. 2162-2175
- Iynkaran Natgunanathan, Yong Xiang, Guang Hua, Gleb Beliakov, John Yearwood:
Patchwork-Based Multilayer Audio Watermarking. 2176-2187
- Chengzhu Yu, John H. L. Hansen:
Active Learning Based Constrained Clustering For Speaker Diarization. 2188-2198
- Emil Solsbæk Ottosen, Monika Dörfler:
A Phase Vocoder Based on Nonstationary Gabor Frames. 2199-2208
- Boaz Schwartz, Sharon Gannot, Emanuël A. P. Habets:
Two Model-Based EM Algorithms for Blind Source Separation in Noisy Environments. 2209-2222
- Maja Taseska, Emanuël A. P. Habets:
Nonstationary Noise PSD Matrix Estimation for Multichannel Blind Speech Extraction. 2223-2236
- Bruno Di Giorgi, Simon Dixon, Massimiliano Zanoni, Augusto Sarti:
A Data-Driven Model of Tonal Chord Sequence Complexity. 2237-2250
- Nikolaos Stefanakis, Despoina Pavlidi, Athanasios Mouchtaris:
Corrections to "Perpendicular Cross-Spectra Fusion for Sound Source Localization With a Planar Microphone Array". 2251
Volume 25, Number 12, December 2017
- Tanja Schultz, Thomas Hueber, Dean J. Krusienski, Jonathan S. Brumberg:
Introduction to the Special Issue on Biosignal-Based Spoken Communication. 2254-2256
- Tanja Schultz, Michael Wand, Thomas Hueber, Dean J. Krusienski, Christian Herff, Jonathan S. Brumberg:
Biosignal-Based Spoken Communication: A Survey. 2257-2271
- Christopher Dromey, Katherine M. Black:
Effects of Laryngeal Activity on Articulation. 2272-2280
- Michal Borsky, Daryush D. Mehta, Jarrad H. Van Stan, Jón Guðnason:
Modal and Nonmodal Voice Quality Classification Using Acoustic and Electroglottographic Features. 2281-2291
- Alborz Rezazadeh Sereshkeh, Robert E. Trott, Aurélien Bricout, Tom Chau:
EEG Classification of Covert Speech Using Regularized Neural Networks. 2292-2300
- Reza Sahraeian, Dirk Van Compernolle:
Crosslingual and Multilingual Speech Recognition Based on the Speech Manifold. 2301-2312
- Dorde T. Grozdic, Slobodan T. Jovicic:
Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering. 2313-2322
- Myung Jong Kim, Beiming Cao, Ted Mau, Jun Wang:
Speaker-Independent Silent Speech Recognition From Flesh-Point Articulatory Movements Using an LSTM Neural Network. 2323-2336
- Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Articulatory Controllable Speech Modification Based on Statistical Inversion and Production Mappings. 2337-2350
- Ingmar Steiner, Sébastien Le Maguer, Alexander Hewer:
Synthesis of Tongue Motion and Acoustics From Text Using a Multimodal Articulatory Database. 2351-2361
- José A. González, Lam Aun Cheah, Angel M. Gomez, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger K. Moore, Ed Holdsworth:
Direct Speech Reconstruction From Articulatory Sensor Data by Machine Learning. 2362-2374
- Matthias Janke, Lorenz Diener:
EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals. 2375-2385
- Geoffrey S. Meltzner, James T. Heaton, Yunbin Deng, Gianluca De Luca, Serge H. Roy, Joshua C. Kline:
Silent Speech Recognition as an Alternative Communication Device for Persons With Laryngectomy. 2386-2398
- Fei Chen, Lan Wang, Hui Chen, Gang Peng:
Investigations on Mandarin Aspiratory Animations Using an Airflow Model. 2399-2409
- Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Michael L. Seltzer, Andreas Stolcke, Dong Yu, Geoffrey Zweig:
Toward Human Parity in Conversational Speech Recognition. 2410-2423
- Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan:
A Context-Aware Recurrent Encoder for Neural Machine Translation. 2424-2432
- Afsaneh Asaei, Milos Cernak, Hervé Bourlard:
Perceptual Information Loss due to Impaired Speech Production. 2433-2443
- Ning Ma, Tobias May, Guy J. Brown:
Exploiting Deep Neural Networks and Head Movements for Robust Binaural Localization of Multiple Sources in Reverberant Environments. 2444-2453