10th SSW 2019: Vienna, Austria
- Michael Pucher (ed.):
10th ISCA Speech Synthesis Workshop, SSW 2019, Vienna, Austria, September 20-22, 2019. ISCA 2019
Keynote 1
- Aäron van den Oord:
Deep learning for speech synthesis.
Oral 1: Neural vocoder
- Xin Wang, Junichi Yamagishi:
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis. 1-6
- Prachi Govalkar, Johannes Fischer, Frank Zalkow, Christian Dittmar:
A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction. 7-12
- Keiichiro Oura, Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
Deep neural network based real-time speech vocoder with periodic and aperiodic inputs. 13-18
- Qiao Tian, Xucheng Wan, Shan Liu:
Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder. 19-23
Oral 2: Adaptation
- Qiong Hu, Erik Marchi, David Winarsky, Yannis Stylianou, Devang Naik, Sachin Kajarekar:
Neural Text-to-Speech Adaptation from Low Quality Public Recordings. 24-28
- Bastian Schnell, Philip N. Garner:
Neural VTLN for Speaker Adaptation in TTS. 29-34
- David Álvarez, Santiago Pascual, Antonio Bonafonte:
Problem-Agnostic Speech Embeddings for Multi-Speaker Text-to-Speech with SampleRNN. 35-39
Poster 1: Voice conversion and multi-speaker TTS
- Hiroki Kanagawa, Yusuke Ijima:
Multi-Speaker Modeling for DNN-based Speech Synthesis Incorporating Generative Adversarial Networks. 40-44
- Ivan Himawan, Sandesh Aryal, Iris Ouyang, Shukhan Ng, Pierre Lanchantin:
Speaker Adaptation of Acoustic Model using a Few Utterances in DNN-based Speech Synthesis Systems. 45-50
- Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis. 51-56
- Wen-Chin Huang, Yi-Chiao Wu, Kazuhiro Kobayashi, Yu-Huai Peng, Hsin-Te Hwang, Patrick Lumban Tobing, Yu Tsao, Hsin-Min Wang, Tomoki Toda:
Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion. 57-62
- Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda:
Statistical Voice Conversion with Quasi-periodic WaveNet Vocoder. 63-68
- Hitoshi Suda, Daisuke Saito, Nobuaki Minematsu:
Voice Conversion without Explicit Separation of Source and Filter Components Based on Non-negative Matrix Factorization. 69-74
- Gaku Kotani, Daisuke Saito:
Voice conversion based on full-covariance mixture density networks for time-variant linear transformations. 75-80
- Tobias Gburrek, Thomas Glarner, Janek Ebbers, Reinhold Haeb-Umbach, Petra Wagner:
Unsupervised Learning of a Disentangled Speech Representation for Voice Conversion. 81-86
- Maitreya Patel, Mihir Parmar, Savan Doshi, Nirmesh Shah, Hemant A. Patil:
Novel Inception-GAN for Whispered-to-Normal Speech Conversion. 87-92
- Riku Arakawa, Shinnosuke Takamichi, Hiroshi Saruwatari:
Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device. 93-98
Keynote 2
- W. Tecumseh Fitch, Bart de Boer:
Synthesizing animal vocalizations and modelling animal speech.
Oral 3: Evaluation and performance
- Rob Clark, Hanna Silén, Tom Kenter, Ralph Leith:
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs. 99-104
- Petra Wagner, Jonas Beskow, Simon Betz, Jens Edlund, Joakim Gustafson, Gustav Eje Henter, Sébastien Le Maguer, Zofia Malisz, Éva Székely, Christina Tånnander, Jana Voße:
Speech Synthesis Evaluation - State-of-the-Art Assessment and Suggestion for a Novel Research Program. 105-110
- Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi:
Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences. 111-116
- Matthew P. Aylett, David A. Braude, Christopher J. Pidcock, Blaise Potard:
Voice Puppetry: Exploring Dramatic Performance to Develop Speech Synthesis. 117-120
Oral 4: Speech science
- Avashna Govender, Cassia Valentini-Botinhao, Simon King:
Measuring the contribution to cognitive load of each predicted vocoder speech parameter in DNN-based speech synthesis. 121-126
- Lorenz Gutscher, Michael Pucher, Carina Lozo, Marisa Hoeschele, Daniel C. Mann:
Statistical parametric synthesis of budgerigar songs. 127-131
- Marc Freixes, Marc Arnela, Francesc Alías, Joan Claudi Socoró:
GlottDNN-based spectral tilt analysis of tense voice emotional styles for the expressive 3D numerical synthesis of vowel [a]. 132-136
Poster 2: Applications and practical issues
- Christina Tånnander, Jens Edlund:
Preliminary guidelines for the efficient management of OOV words for spoken text. 137-142
- Noriyuki Matsunaga, Yamato Ohtani, Tatsuya Hirahara:
Loss Function Considering Temporal Sequence for Feed-Forward Neural Network-Fundamental Frequency Case. 143-148
- Tomoki Koriyama, Shinnosuke Takamichi, Takao Kobayashi:
Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis. 149-154
- Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas W. D. Evans, Jean-François Bonastre:
Speaker Anonymization Using X-vector and Neural Waveform Models. 155-160
- Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari:
V2S attack: building DNN-based voice conversion from automatic speaker verification. 161-165
- Takato Fujimoto, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Impacts of input linguistic feature representation on Japanese end-to-end speech synthesis. 166-171
- Nobuyuki Nishizawa, Tomohiro Obara, Gen Hattori:
Evaluation of Block-Wise Parameter Generation for Statistical Parametric Speech Synthesis. 172-176
- Motoki Shimada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Low computational cost speech synthesis based on deep neural networks using hidden semi-Markov model structures. 177-182
- Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura:
Neural iTTS: Toward Synthesizing Speech in Real-time with End-to-end Neural Text-to-Speech Framework. 183-188
Keynote 3
- Claire Gardent:
Natural Language Generation: Creating Text.
Oral 5: Language and dialect varieties
- Aye Mya Hlaing, Win Pa Pa, Ye Kyaw Thu:
Enhancing Myanmar Speech Synthesis with Linguistic Information and LSTM-RNN. 189-193
- Anusha Prakash, Anju Leela Thomas, Srinivasan Umesh, Hema A. Murthy:
Building Multilingual End-to-End Speech Synthesisers for Indian Languages. 194-199
- Michael Pucher, Carina Lozo, Philip Vergeiner, Dominik Wallner:
Diphthong interpolation, phone mapping, and prosody transfer for speech synthesis of similar dialect pairs. 200-204
- Elshadai Tesfaye Biru, Yishak Tofik Mohammed, David Tofu, Erica Cooper, Julia Hirschberg:
Subset Selection, Adaptation, Gemination and Prosody Prediction for Amharic Text-to-Speech Synthesis. 205-210
Oral 6: Sequence to sequence model
- Yusuke Yasuda, Xin Wang, Junichi Yamagishi:
Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments. 211-216
- Oliver Watts, Gustav Eje Henter, Jason Fong, Cassia Valentini-Botinhao:
Where do the improvements come from in sequence-to-sequence neural TTS? 217-222
- Jason Fong, Jason Taylor, Korin Richmond, Simon King:
A Comparison of Letters and Phones as Input to Sequence-to-Sequence Models for Speech Synthesis. 223-227
Poster 3: Prosody
- Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu:
Generative Modeling of F0 Contours Leveraged by Phrase Structure and Its Application to Statistical Focus Control. 228-233
- Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari:
Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation. 234-238
- Zack Hodari, Oliver Watts, Simon King:
Using generative modelling to produce varied intonation for speech synthesis. 239-244
- Éva Székely, Gustav Eje Henter, Jonas Beskow, Joakim Gustafson:
How to train your fillers: uh and um in spontaneous speech synthesis. 245-250
- Mohammad Eshghi, Kou Tanaka, Kazuhiro Kobayashi, Hirokazu Kameoka, Tomoki Toda:
An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement. 251-256
- Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson:
PROMIS: a statistical-parametric speech synthesis system with prominence control via a prominence network. 257-262
- Raul Fernandez:
Deep Mixture-of-Experts Models for Synthetic Prosodic-Contour Generation. 263-268
- Rose Sloan, Syed Sarfaraz Akhtar, Bryan Li, Ritvik Shrivastava, Agustín Gravano, Julia Hirschberg:
Prosody Prediction from Syntactic, Lexical, and Word Embedding Features. 269-274
- Slava Shechtman, Alexander Sorin:
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities. 275-280