289 stars written in Python

Code for BLT research paper

Python 2,006 180 Updated Nov 3, 2025

Inference script for Oasis 500M

Python 1,974 166 Updated Nov 8, 2024

Contrastive Language-Audio Pretraining

Python 1,885 191 Updated May 15, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,782 107 Updated Sep 27, 2024

First base model for full-duplex conversational audio

Python 1,768 112 Updated Jan 5, 2025

Vocal Remover using Deep Neural Networks

Python 1,726 251 Updated Jul 23, 2024

Underthesea - Vietnamese NLP Toolkit

Python 1,622 288 Updated Nov 7, 2025

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,611 160 Updated Nov 4, 2025

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 1,571 138 Updated Sep 22, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,553 84 Updated Nov 4, 2025

Implementation of Hinton's forward-forward (FF) algorithm - an alternative to back-propagation

Python 1,487 143 Updated Sep 6, 2023

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 1,402 64 Updated Mar 16, 2025

PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)

Python 1,384 305 Updated Sep 7, 2023

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch

Python 1,332 105 Updated Sep 24, 2023

CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)

Python 1,304 171 Updated Aug 19, 2024

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,224 104 Updated Mar 2, 2025

A web UI for various audio-related neural networks

Python 1,214 107 Updated May 19, 2025

This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with PyTorch, the well-known deep learning toolkit.

Python 1,212 316 Updated Dec 19, 2020

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,201 85 Updated Sep 22, 2025

Long-form streaming TTS system for multi-speaker dialogue generation

Python 1,198 106 Updated Oct 26, 2025

Unofficial PyTorch implementation of Google AI's VoiceFilter system

Python 1,166 239 Updated Jul 25, 2024

Audio processing using PyTorch 1D convolutional networks

Python 1,093 96 Updated May 16, 2025

Tools for handling multimodal data in machine learning projects.

Python 1,079 255 Updated Oct 31, 2025

Code release for DynamicTanh (DyT)

Python 1,020 85 Updated Mar 30, 2025

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 982 76 Updated Dec 23, 2024

Unofficial PyTorch implementation of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.

Python 925 54 Updated Oct 16, 2025

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 914 97 Updated Oct 24, 2025

g2p: English Grapheme To Phoneme Conversion

Python 891 134 Updated Jan 5, 2023