Stars
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
Simple text to phones converter for multiple languages
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Implementation of NaturalSpeech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
Boosting your Web Services of Deep Learning Applications.
Speech emotion recognition implemented in Keras (LSTM, CNN, SVM, MLP)
Artificial Neural Engine Machine Learning Library
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
An implementation of Performer, a linear attention-based transformer, in PyTorch
In defence of metric learning for speaker recognition
Official PyTorch implementation of BigVGAN (ICLR 2023)
Audio processing using 1D convolutional networks in PyTorch
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
PyTorch implementation of normalizing flow models
Collection of audio-focused loss functions in PyTorch
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)
Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm
Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch
Chinese text normalization for speech processing
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Unofficial VITS2 TTS implementation in PyTorch