huutuongtu

😀

Huh?

Huu Tuong Tu huutuongtu

Strygwyr

16 followers · 62 following

Vietnam

Achievements

Lists (16)

Aligner

Aligner for TTS, ASR, ...

Audio Enhancement

DATASET

improve_model_architecture

15 repositories

Interactive AI

MDD

MLOPS

SE

Singing Voice

Speaker Diarization

Speech LLM

14 repositories

Speech quality assessment

Speech Separation

Speech Tokenizer

10 repositories

Tool

trader

Stars

shaochenze / calm

Official implementation of "Continuous Autoregressive Language Models"

Python · 201 stars · 30 forks · Updated Nov 3, 2025

zjunlp / LightThinker

[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression

Python · 120 stars · 5 forks · Updated Apr 12, 2025

whyNLP / PCCoT

Parallel Continuous Chain-of-Thought with Jacobi Iteration. Accepted to EMNLP 2025.

Python · 11 stars · 2 forks · Updated Oct 3, 2025

meituan-longcat / LongCat-Flash-Omni

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python · 349 stars · 16 forks · Updated Nov 4, 2025

bigai-nlco / UltraVoice

Official Repository of UltraVoice

JavaScript · 44 stars · 1 fork · Updated Oct 28, 2025

GeeeekExplorer / nano-vllm

Nano vLLM

Python · 8,319 stars · 1,016 forks · Updated Nov 3, 2025

ZhikangNiu / Semantic-VAE

Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"

Python · 91 stars · 4 forks · Updated Oct 26, 2025

XiaomiMiMo / MiMo-Audio-Tokenizer

A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.

Python · 110 stars · 8 forks · Updated Sep 19, 2025

knottwill / sesame-finetune

Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune

Python · 90 stars · 9 forks · Updated Sep 27, 2025

NVIDIA / audio-intelligence

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python · 83 stars · 5 forks · Updated Oct 15, 2025

FireRedTeam / FireRedTTS2

Long-form streaming TTS system for multi-speaker dialogue generation

Python · 1,191 stars · 106 forks · Updated Oct 26, 2025

OpenBMB / VoxCPM

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python · 2,021 stars · 215 forks · Updated Oct 9, 2025

neuphonic / neutts-air

On-device TTS model by Neuphonic

Python · 3,876 stars · 385 forks · Updated Nov 4, 2025

herimor / voxtream

VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency

Python · 162 stars · 21 forks · Updated Oct 26, 2025

QwenLM / Qwen3-Omni

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook · 2,821 stars · 159 forks · Updated Oct 9, 2025

ZHZisZZ / dllm

dLLM: Simple Diffusion Language Modeling

Python · 194 stars · 13 forks · Updated Nov 5, 2025

AmphionTeam / TaDiCodec

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python · 55 stars · 1 fork · Updated Sep 21, 2025

modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!

Python · 10,582 stars · 987 forks · Updated Nov 5, 2025

xingchensong / FlashCosyVoice

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python · 201 stars · 18 forks · Updated Aug 14, 2025

schmiph2 / pysepm

Python implementation of performance metrics in Loizou's Speech Enhancement book

Python · 441 stars · 92 forks · Updated Feb 15, 2025

HeCheng0625 / Diffusion-Speech-Tokenizer

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python · 190 stars · 12 forks · Updated Sep 21, 2025

gradio-app / fastrtc

The Python library for real-time communication

JavaScript · 4,383 stars · 409 forks · Updated Sep 19, 2025

yifan123 / flow_grpo

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python · 1,544 stars · 83 forks · Updated Nov 4, 2025

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python · 1,195 stars · 85 forks · Updated Sep 22, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python · 7,570 stars · 559 forks · Updated Sep 15, 2025

FrontierLabs / F5R-TTS

Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

Python · 123 stars · 15 forks · Updated Jun 3, 2025

0xTaoDev / jupiter-python-sdk

Jupiter Python SDK is a Python library that allows you to use most of Jupiter's features.

Python · 244 stars · 61 forks · Updated Apr 8, 2024

the-bird-F / GLM-Voice-RAG

A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E2E Retrieval.

Python · 23 stars · 2 forks · Updated Jul 11, 2025

resemble-ai / chatterbox

SoTA open-source TTS

Python · 14,427 stars · 1,940 forks · Updated Sep 25, 2025

lucidrains / h-net-dynamic-chunking

Implementation of the dynamic chunking mechanism in H-net by Hwang et al. of Carnegie Mellon

Python · 65 stars · 1 fork · Updated Aug 15, 2025