167 stars written in Python

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 9,412 1,321 Updated Apr 24, 2024

so-vits-svc fork with realtime support, improved interface and more features.

Python 9,175 1,221 Updated Nov 6, 2025

vits2 backbone with multilingual-bert

Python 8,602 1,245 Updated Nov 4, 2025

Simultaneous speech-to-text model

Python 8,282 774 Updated Nov 6, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 7,993 716 Updated May 31, 2024

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 7,725 1,382 Updated Dec 6, 2023

🔥 2D and 3D Face alignment library built using PyTorch

Python 7,414 1,380 Updated Aug 30, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,110 391 Updated Jul 11, 2024

Official repo for consistency models.

Python 6,432 434 Updated Mar 22, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 6,036 629 Updated Aug 10, 2024

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

Python 4,641 779 Updated Mar 19, 2025

On-device TTS model by Neuphonic

Python 3,879 386 Updated Nov 4, 2025

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,823 344 Updated Jan 4, 2024

Vector (and Scalar) Quantization, in Pytorch

Python 3,665 297 Updated Nov 5, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 3,663 247 Updated Sep 25, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,600 291 Updated Aug 14, 2025

zero-shot voice conversion & singing voice conversion, with real-time support

Python 3,389 396 Updated Apr 20, 2025

Have a natural, spoken conversation with AI!

Python 3,313 364 Updated Jul 11, 2025

Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"

Python 3,174 523 Updated Jul 23, 2024

A Python package to analyze and compare voices with deep learning

Python 3,142 468 Updated Oct 12, 2023

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Python 2,760 248 Updated Jun 25, 2025

A simple, high-quality voice conversion tool focused on ease of use and performance.

Python 2,678 450 Updated Nov 5, 2025

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

Python 2,601 278 Updated Jan 12, 2025

Text-to-Audio/Music Generation

Python 2,515 202 Updated Sep 29, 2024

Offline Text To Speech synthesis for python

Python 2,434 354 Updated Nov 6, 2025

WaveNet vocoder

Python 2,367 496 Updated Jul 29, 2023

Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)

Python 2,319 277 Updated Aug 16, 2025

Longformer: The Long-Document Transformer

Python 2,172 288 Updated Feb 8, 2023

Audio generation using diffusion models, in PyTorch.

Python 2,080 178 Updated Jun 12, 2023

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 2,031 216 Updated Oct 9, 2025