167 stars written in Python

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,817 135 Updated Jul 5, 2024

Simple text to phones converter for multiple languages

Python 1,475 192 Updated Sep 26, 2024
Python 1,454 185 Updated Feb 11, 2024

Kakao Hangul Analyzer III

Python 1,449 297 Updated Sep 1, 2025

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Python 1,403 131 Updated Apr 24, 2024

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch

Python 1,332 105 Updated Sep 24, 2023

CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)

Python 1,304 171 Updated Aug 19, 2024

Boosting your Web Services of Deep Learning Applications.

Python 1,244 189 Updated May 13, 2021

Speech emotion recognition implemented in Keras (LSTM, CNN, SVM, MLP)

Python 1,239 227 Updated Mar 25, 2023

Artificial Neural Engine Machine Learning Library

Python 1,231 42 Updated Sep 2, 2025

Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase

Python 1,206 176 Updated Dec 22, 2023

Neighborhood Attention Transformer, arxiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arxiv 2022

Python 1,156 89 Updated May 15, 2024

An implementation of Performer, a linear attention-based transformer, in Pytorch

Python 1,156 148 Updated Feb 2, 2022

In defence of metric learning for speaker recognition

Python 1,143 286 Updated Mar 26, 2024

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,132 141 Updated Sep 5, 2024

Audio processing by using pytorch 1D convolution network

Python 1,093 96 Updated May 16, 2025

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Python 1,083 186 Updated Dec 22, 2023

PyTorch implementation of normalizing flow models

Python 900 129 Updated Aug 25, 2024

Collection of audio-focused loss functions in PyTorch

Python 819 72 Updated Jul 30, 2024

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when train only in Vox2)

Python 746 127 Updated Apr 11, 2024

Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm

Python 729 57 Updated Jul 2, 2024

Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch

Python 716 65 Updated Apr 3, 2024

Chinese text normalization for speech processing

Python 711 149 Updated Mar 18, 2023

A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Python 698 155 Updated Jul 12, 2022

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

Python 687 118 Updated Jan 19, 2025

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.

Python 633 194 Updated May 27, 2023

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 616 61 Updated Jun 9, 2024

PyTorch implementation of Glow

Python 543 99 Updated Nov 20, 2021

unofficial vits2-TTS implementation in pytorch

Python 539 98 Updated Mar 28, 2024