Stars
Foundational model for human-like, expressive TTS
Programmer's guide about how to cook at home.
This repository contains the code used to run experiments on the multi-swap K-means++ algorithm from https://arxiv.org/pdf/2309.16384.
PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
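Interview questions for large-model algorithm roles (with answers): analysis of common questions and concepts. "Large-model interview questions", "algorithm role interviews", "common interview questions", "large-model algorithm interviews", "large-model application fundamentals"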
SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations
Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
PyTorch Implementation of StyleSinger (AAAI 2024): Style Transfer for Out-of-Domain Singing Voice Synthesis
[INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
[TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
This is the implementation of the paper "Emotion Intensity and its Control for Emotional Voice Conversion".
CUDA and Triton implementations of Flash Attention with SoftmaxN.
This is the implementation of our Interspeech 2022 paper "Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion".
[CVPR2024] Official implementation of the paper "Z∗: Zero-shot Style Transfer via Attention Rearrangement" a.k.a. "Z∗: Zero-shot Style Transfer via Attention Reweighting"
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
The official repository of "IPMix: Label-Preserving Data Augmentation Method for Training Robust Classifiers"
The implementation of the paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody"
Jointly training a speaker encoder with a consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions