Stars
A deep-learning-based Chinese speech recognition system
A Python package to analyze and compare voices with deep learning
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Speech emotion recognition implemented in Keras (LSTM, CNN, SVM, MLP)
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Unsupervised Speech Decomposition Via Triple Information Bottleneck
Speech emotion recognition using a convolutional recurrent network trained on IEMOCAP
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Implementation of "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" in PyTorch
End-to-End Automatic Speech Recognition on PyTorch
Implementation of non-parallel sequence-to-sequence voice conversion (VC)
Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention
This repository contains code to replicate results from the ICASSP 2020 paper "StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition".
This is the implementation of the Speaker Odyssey 2020 paper "Transforming spectrum and prosody for emotional voice conversion with non-parallel training data".
This is the official implementation of the paper "AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization".
A PyTorch-based end-to-end speech recognition system.
This is the implementation of our Interspeech 2020 paper "Converting anyone's emotion: towards speaker-independent emotional voice conversion".
This is the implementation of our Interspeech 2021 paper "Limited data emotional voice conversion leveraging text-to-speech: two-stage sequence-to-sequence training".
A system for singing voice synthesis
Implementation of the paper "Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning" From INTERSPEECH 2019
End-to-end Speech Emotion Recognition using BLSTMs with self-attention and Multi-domain training
This is the code for a controllable emotional voice conversion (EVC) framework for seen and unseen emotion generation.
3-D Convolutional Recurrent Neural Networks With Attention Model for Speech Emotion Recognition.
ERISHA is a multilingual, multi-speaker expressive speech synthesis framework. It can transfer expressivity to a speaker's voice for which no expressive speech corpus is available.