choiHkk

172 stars written in Python

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python · 9,445 stars · 1,326 forks · Updated Apr 24, 2024

so-vits-svc fork with realtime support, improved interface and more features.

Python · 9,179 stars · 1,221 forks · Updated Nov 13, 2025

vits2 backbone with multilingual-bert

Python · 8,609 stars · 1,247 forks · Updated Nov 10, 2025

Simultaneous speech-to-text model

Python · 8,430 stars · 795 forks · Updated Nov 10, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python · 8,021 stars · 719 forks · Updated May 31, 2024

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python · 7,733 stars · 1,382 forks · Updated Dec 6, 2023

🔥 2D and 3D face alignment library built using PyTorch

Python · 7,426 stars · 1,380 forks · Updated Aug 30, 2024

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python · 7,346 stars · 666 forks · Updated Nov 10, 2025

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python · 7,121 stars · 392 forks · Updated Jul 11, 2024

Official repo for consistency models.

Python · 6,436 stars · 434 forks · Updated Mar 22, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python · 6,052 stars · 631 forks · Updated Aug 10, 2024

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

Python · 4,648 stars · 780 forks · Updated Mar 19, 2025

On-device TTS model by Neuphonic

Python · 3,953 stars · 393 forks · Updated Nov 4, 2025

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python · 3,831 stars · 342 forks · Updated Jan 4, 2024

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python · 3,707 stars · 251 forks · Updated Sep 25, 2025

Vector (and Scalar) Quantization, in Pytorch

Python · 3,680 stars · 300 forks · Updated Nov 12, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python · 3,624 stars · 293 forks · Updated Aug 14, 2025

zero-shot voice conversion & singing voice conversion, with real-time support

Python · 3,406 stars · 398 forks · Updated Apr 20, 2025

Have a natural, spoken conversation with AI!

Python · 3,334 stars · 368 forks · Updated Jul 11, 2025

Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"

Python · 3,175 stars · 523 forks · Updated Jul 23, 2024

A python package to analyze and compare voices with deep learning

Python · 3,147 stars · 468 forks · Updated Oct 12, 2023

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Python · 2,768 stars · 248 forks · Updated Jun 25, 2025

A simple, high-quality voice conversion tool focused on ease of use and performance.

Python · 2,682 stars · 455 forks · Updated Nov 5, 2025

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

Python · 2,604 stars · 279 forks · Updated Jan 12, 2025

Text-to-Audio/Music Generation

Python · 2,519 stars · 202 forks · Updated Sep 29, 2024

Offline text-to-speech synthesis for Python

Python · 2,438 stars · 354 forks · Updated Nov 6, 2025

WaveNet vocoder

Python · 2,367 stars · 496 forks · Updated Jul 29, 2023

Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)

Python · 2,325 stars · 278 forks · Updated Aug 16, 2025

Longformer: The Long-Document Transformer

Python · 2,174 stars · 288 forks · Updated Feb 8, 2023

Audio generation using diffusion models, in PyTorch.

Python · 2,079 stars · 178 forks · Updated Jun 12, 2023