Stars
PyTorch implementation of the CREPE pitch tracker
UniSpeech - Large Scale Self-Supervised Learning for Speech
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
This is the official source for our ICCV 2023 paper "EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation"
PyTorch Implementation for Paper "Emotionally Enhanced Talking Face Generation" (ICCVW'23 and ACM-MMW'23)
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
A Python module for continuous wavelet spectral analysis. It includes a collection of routines for wavelet transform and statistical analysis via FFT algorithm. In addition, the module also include…
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Official PyTorch implementation of Contrastive Learning of Musical Representations
PyTorch implementation of the wavelet analysis from Torrence & Compo (1998)
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Official implementation of SawSing (ISMIR'22)
Pitch Estimating Neural Networks (PENN)
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Official implementation of the source-filter HiFiGAN vocoder
🐤 Nix-TTS: Lightweight and End-to-end Text-to-Speech via Module-wise Distillation
Official implementation of Meta-StyleSpeech and StyleSpeech
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
Audio transformations library for PyTorch
Unofficial implementation of the NVIDIA P-Flow TTS paper
An implementation of SoftDTW for PyTorch.
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Korean Sentence Embedding Repository
The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022
Streaming and Fine-tuning for Chatterbox TTS