huutuongtu

Official implementation of "Continuous Autoregressive Language Models"

Python · 201 stars · 30 forks · Updated Nov 3, 2025

[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression

Python · 120 stars · 5 forks · Updated Apr 12, 2025

Parallel Continuous Chain-of-Thought with Jacobi Iteration. Accepted to EMNLP 2025.

Python · 11 stars · 2 forks · Updated Oct 3, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python · 349 stars · 16 forks · Updated Nov 4, 2025

Official Repository of UltraVoice

JavaScript · 44 stars · 1 fork · Updated Oct 28, 2025

Nano vLLM

Python · 8,319 stars · 1,016 forks · Updated Nov 3, 2025

Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"

Python · 91 stars · 4 forks · Updated Oct 26, 2025

A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.

Python · 110 stars · 8 forks · Updated Sep 19, 2025

Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune

Python · 90 stars · 9 forks · Updated Sep 27, 2025

Elucidated Text-To-Audio (ETTA) is a state-of-the-art (SOTA) text-to-audio model, built on a holistic understanding of the design space and trained with synthetic captions.

Python · 83 stars · 5 forks · Updated Oct 15, 2025

Long-form streaming TTS system for multi-speaker dialogue generation

Python · 1,191 stars · 106 forks · Updated Oct 26, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python · 2,021 stars · 215 forks · Updated Oct 9, 2025

On-device TTS model by Neuphonic

Python · 3,876 stars · 385 forks · Updated Nov 4, 2025

VoXtream is a full-stream, zero-shot TTS model with extremely low latency.

Python · 162 stars · 21 forks · Updated Oct 26, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook · 2,821 stars · 159 forks · Updated Oct 9, 2025

dLLM: Simple Diffusion Language Modeling

Python · 194 stars · 13 forks · Updated Nov 5, 2025

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python · 55 stars · 1 fork · Updated Sep 21, 2025

Enjoy the magic of Diffusion models!

Python · 10,582 stars · 987 forks · Updated Nov 5, 2025

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python · 201 stars · 18 forks · Updated Aug 14, 2025

Python implementation of performance metrics in Loizou's Speech Enhancement book

Python · 441 stars · 92 forks · Updated Feb 15, 2025

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python · 190 stars · 12 forks · Updated Sep 21, 2025

The Python library for real-time communication

JavaScript · 4,383 stars · 409 forks · Updated Sep 19, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python · 1,544 stars · 83 forks · Updated Nov 4, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python · 1,195 stars · 85 forks · Updated Sep 22, 2025

Text-audio foundation model from Boson AI

Python · 7,570 stars · 559 forks · Updated Sep 15, 2025

Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

Python · 123 stars · 15 forks · Updated Jun 3, 2025

Jupiter Python SDK is a Python library that allows you to use most of Jupiter's features.

Python · 244 stars · 61 forks · Updated Apr 8, 2024

A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E2E Retrieval.

Python · 23 stars · 2 forks · Updated Jul 11, 2025

SoTA open-source TTS

Python · 14,427 stars · 1,940 forks · Updated Sep 25, 2025

Implementation of the dynamic chunking mechanism in H-Net by Hwang et al. of Carnegie Mellon

Python · 65 stars · 1 fork · Updated Aug 15, 2025