Stars
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Command line utility for forced alignment using Kaldi
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Collection of pretrained models for the Montreal Forced Aligner
Extract phoneme-level timestamps from speech audio.
Official implementation of the paper "Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis"