Stars
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Multilingual large voice generation model, providing full-stack capabilities for inference, training, and deployment.
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Collection of pretrained models for the Montreal Forced Aligner
Command line utility for forced alignment using Kaldi
Extract phoneme-level timestamps from speech audio.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
Collection of diffusion model papers categorized by their subareas
Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models