Starred repositories
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
Versatile audio super resolution (any -> 48kHz) with AudioSR.
Breeze ASR 25 is an advanced automatic speech recognition (ASR) model fine-tuned from Whisper-large-v2, specifically optimized for Taiwanese Mandarin and for mixed Mandarin-English (code-switching) scenarios.
This repository focuses on leveraging OpenAI's Whisper model for speech recognition in Chinese (Mandarin) and Taiwanese Hokkien languages. It includes tools and scripts for data preprocessing, mode…
SpeechGPT Series: Speech Large Language Models
Controllable and fast Text-to-Speech for over 7000 languages!
[IJCAI'23] Learning to Speak from Text for Low-Resource TTS
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
🔊 Text-Prompted Generative Audio Model
Easy-to-use and powerful LLM and SLM library with an awesome model zoo.
Official repository of SepReformer for speech separation
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
A simple implementation for improving CosyVoice2 with the GRPO method
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
Text-audio foundation model from Boson AI
Automatically update Text-to-Speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
Inference and training library for high-quality TTS models.
Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"
Tracking the progress in end-to-end speech translation
Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (ICASSP 2026)
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs