Awesome-TTS
Official implementation of the source-filter HiFiGAN vocoder
A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA Transformers with supervised and unsupervised duration modeling. This project grows with the research community, …
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Audio samples accompanying publications related to Tacotron, an end-to-end speech synthesis model.
Feature extraction from speech signals.
Official implementation for the paper: A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units.
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
Code for the paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022).
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a ca…
Deezer source separation library including pretrained models.
Keeps track of big models in the audio domain, including speech, singing, music, etc.
Controllable and fast Text-to-Speech for over 7000 languages!
An implementation of SpecAugment, introduced by Google Brain, in TensorFlow & PyTorch.
A book about Text-to-Speech (TTS) in Chinese.
Unofficial vits2-TTS implementation in PyTorch.
PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
Implementation of Natural Speech 2, a zero-shot speech and singing synthesizer, in PyTorch.
Audio generation using diffusion models, in PyTorch.
A collection of useful audio datasets and transforms for PyTorch.
A timeline of the latest AI models for audio generation, starting in 2023!
A non-autoregressive end-to-end text-to-speech system (text-to-wav), supporting a family of SOTA unsupervised duration modeling methods. This project grows with the research community, aiming to achieve the ulti…
A Survey on Neural Speech Synthesis https://arxiv.org/pdf/2106.15561.pdf
Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E
Multilingual G2P in 100 languages
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/