Stars
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
The official repo of Qwen-Audio (通义千问-Audio, Tongyi Qianwen-Audio), the chat and pretrained large audio language model proposed by Alibaba Cloud.
Simple text-to-phones converter for multiple languages
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
Boosting your Web Services of Deep Learning Applications.
Artificial Neural Engine Machine Learning Library
Speech emotion recognition implemented in Keras (LSTM, CNN, SVM, MLP)
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
An implementation of Performer, a linear attention-based transformer, in PyTorch
Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
In defence of metric learning for speaker recognition
Official PyTorch implementation of BigVGAN (ICLR 2023)
Audio processing using a PyTorch 1D convolutional network
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
PyTorch implementation of normalizing flow models
Collection of audio-focused loss functions in PyTorch
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER = 0.86 on Vox1_O when trained only on Vox2)
Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm
Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch
Chinese text normalization for speech processing
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
This is the code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on