Stars
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…
Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!
CjangCjengh / vits
Forked from jaywalnut310/vits. VITS implementation for Japanese, Chinese, Korean, Sanskrit, and Thai.
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
Japanese converter: Kanji to Hiragana, Katakana, and Romaji.
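One piece of such a converter is easy to show without a dictionary: hiragana and katakana occupy parallel Unicode blocks offset by 0x60, so kana-to-kana conversion is a plain codepoint shift. This is a minimal sketch of that step only; the Kanji and Romaji stages require a morphological analyzer or lookup table and are not the repo's code.

```python
def hira_to_kata(text: str) -> str:
    """Shift hiragana codepoints (U+3041..U+3096) into the katakana block."""
    return ''.join(
        chr(ord(ch) + 0x60) if '\u3041' <= ch <= '\u3096' else ch
        for ch in text
    )

print(hira_to_kata('ひらがな'))  # ヒラガナ
```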
Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"
Implementing DeepSeek R1's GRPO algorithm from scratch
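The core idea a from-scratch GRPO implementation reproduces is the group-relative advantage: sample several responses per prompt, score them with a reward model, and normalize each reward against its group's mean and standard deviation instead of a learned value baseline. A minimal sketch of that normalization (stdlib only, not the repo's code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward relative to its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt, scored by a reward model.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Advantages sum to ~0; the best completion gets a positive advantage.
```

These advantages then weight the policy-gradient loss per token, which is what lets GRPO drop the critic network used by PPO.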
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
Chinese polyphone disambiguation for Text-to-Speech application
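To illustrate the problem this repo addresses: many Chinese characters (polyphones) take different pinyin depending on context, e.g. 行 reads "háng" in 银行 (bank) but "xíng" in 行走 (to walk). The cue words and rule table below are hand-picked for the demo; the actual system learns disambiguation from data rather than using fixed rules.

```python
# Hypothetical rule table for the demo, keyed by polyphonic character.
POLYPHONE_RULES = {
    '行': [('银行', 'hang2'), ('行走', 'xing2')],
}

def disambiguate(char: str, context: str, default: str) -> str:
    """Pick a reading for `char` by matching cue words in the context."""
    for cue, reading in POLYPHONE_RULES.get(char, []):
        if cue in context:
            return reading
    return default

print(disambiguate('行', '我去银行取钱', 'xing2'))  # hang2 (bank sense)
```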
High-quality speech synthesis with LoRA fine-tuning on index-tts, enhancing prosody and naturalness for single and multi-speaker voices.
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Production First and Production Ready End-to-End Text-to-Speech Toolkit
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (ICASSP 2026)
Official implementation of the TTS model Lina-Speech
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and performing real-time speech generation.
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning