[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Jupyter Notebook 1,197 174 Updated Dec 8, 2025

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 18,144 2,016 Updated Dec 17, 2025

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 476 64 Updated Dec 18, 2025

Collection of pretrained models for the Montreal Forced Aligner

Python 179 26 Updated Oct 6, 2025

Command line utility for forced alignment using Kaldi

Python 1,698 273 Updated Nov 15, 2025

Extract phoneme-level timestamps from speech audio.

Python 103 9 Updated Oct 30, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,554 772 Updated May 27, 2025

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Python 1,631 208 Updated Jun 23, 2025

Collection of diffusion model papers categorized by their subareas

2,087 95 Updated Dec 19, 2025

Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"

Python 1,184 97 Updated Sep 13, 2024

Text-to-Audio/Music Generation

Python 2,541 202 Updated Sep 29, 2024

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 32,154 6,626 Updated Dec 19, 2025

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.

Jupyter Notebook 783 77 Updated Sep 25, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 6,096 644 Updated Aug 10, 2024