289 stars written in Python
Python 4,541 362 Updated Jun 12, 2025

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Python 4,498 402 Updated Aug 1, 2024

LLM training code for Databricks foundation models

Python 4,350 575 Updated Oct 27, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,324 308 Updated Jun 21, 2025

Speech To Speech: an effort for an open-sourced and modular GPT-4o

Python 4,226 485 Updated Apr 15, 2025

Official TensorFlow implementation for the CVPR 2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”

Python 3,996 742 Updated Oct 9, 2022

On-device TTS model by Neuphonic

Python 3,882 386 Updated Nov 4, 2025

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,823 344 Updated Jan 4, 2024

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 3,665 248 Updated Sep 25, 2025

Vector (and Scalar) Quantization, in PyTorch

Python 3,665 297 Updated Nov 5, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,600 291 Updated Aug 14, 2025

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,426 294 Updated Nov 5, 2024

A Python package to analyze and compare voices with deep learning

Python 3,143 468 Updated Oct 12, 2023

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,089 215 Updated May 19, 2025

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 3,070 264 Updated Dec 5, 2024

Implementation for MatMul-free LM.

Python 3,034 196 Updated Jul 21, 2025

An unofficial PyTorch implementation of the audio LM VALL-E

Python 2,990 412 Updated May 10, 2023

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

Python 2,876 728 Updated Jul 28, 2022

POT: Python Optimal Transport

Python 2,676 538 Updated Nov 5, 2025

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch

Python 2,601 278 Updated Jan 12, 2025

This library provides common speech features for ASR including MFCCs and filterbank energies.

Python 2,420 620 Updated Oct 20, 2021

label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. May be useful.

Python 2,253 373 Updated Oct 17, 2024

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Python 2,252 539 Updated Jul 27, 2024

PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,181 333 Updated Sep 10, 2025

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

Python 2,166 205 Updated Sep 26, 2025

TorchCFM: a Conditional Flow Matching library

Python 2,084 166 Updated Sep 9, 2025

Audio generation using diffusion models, in PyTorch.

Python 2,080 178 Updated Jun 12, 2023

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 2,061 220 Updated Oct 29, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 2,034 216 Updated Oct 9, 2025

AI powered speech denoising and enhancement

Python 2,033 244 Updated Dec 3, 2024