QinHsiu
Stars

Awesome-TTS

Some amazing TTS projects
122 repositories

Smarter data pipelines for audio.

Python · 866 stars · 52 forks · Updated Jan 10, 2024

This is a PyTorch implementation of the paper: StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

Python · 523 stars · 92 forks · Updated Oct 11, 2019

VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Python · 321 stars · 59 forks · Updated Jul 25, 2024

A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, …

TypeScript · 2,840 stars · 295 forks · Updated Nov 23, 2025

Site for sharing MusicGen + AudioGen Prompts and Creations

TypeScript · 48 stars · 5 forks · Updated Mar 25, 2025

How to use our public wav2vec2 dimensional emotion model

Jupyter Notebook · 532 stars · 50 forks · Updated May 22, 2023

Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition

Python · 153 stars · 34 forks · Updated Oct 26, 2021

Deep learning for audio denoising

Python · 742 stars · 130 forks · Updated Oct 15, 2023

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook · 14,744 stars · 2,048 forks · Updated Nov 19, 2024

Noise suppression using deep filtering

Python · 3,654 stars · 373 forks · Updated Oct 17, 2024

🔦 A PyTorch implementation of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Jupyter Notebook · 499 stars · 61 forks · Updated Jun 11, 2021

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python · 16,357 stars · 3,243 forks · Updated Dec 24, 2025

A self-supervised framework for Text-to-Speech

Python · 1 star · Updated Nov 5, 2023

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Python · 194 stars · 26 forks · Updated Nov 9, 2022

A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.

Python · 805 stars · 196 forks · Updated Apr 6, 2023

Code and dataset for photorealistic Codec Avatars driven from audio

Python · 2,846 stars · 280 forks · Updated Sep 15, 2024

A sound cloning tool with a web interface that records audio using your own voice or any other sound

Python · 8,863 stars · 976 forks · Updated Aug 29, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python · 9,561 stars · 772 forks · Updated May 27, 2025

Faster Whisper transcription with CTranslate2

Python · 19,619 stars · 1,638 forks · Updated Nov 19, 2025

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Python · 20,949 stars · 4,906 forks · Updated Dec 18, 2025

Just 1 minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)

Python · 53,476 stars · 5,855 forks · Updated Dec 25, 2025

A generative speech model for daily dialogue.

Python · 38,397 stars · 4,170 forks · Updated Dec 3, 2025

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time, end-to-end speech input and streaming audio output for conversation.

Python · 3,502 stars · 302 forks · Updated Nov 5, 2024

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

3,105 stars · 513 forks · Updated Oct 19, 2023

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

Python · 2,841 stars · 534 forks · Updated Mar 24, 2023

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python · 4,971 stars · 1,170 forks · Updated Dec 19, 2025

Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model, with CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching

Python · 4,158 stars · 689 forks · Updated Dec 13, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python · 2,016 stars · 155 forks · Updated Apr 21, 2025

SOTA Open Source TTS

Python · 24,406 stars · 2,006 forks · Updated Dec 1, 2025

A simple local web interface that uses ChatTTS to synthesize text into speech and also exposes an API for external use.

Python · 7,467 stars · 914 forks · Updated Dec 5, 2025