This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python 197 14 Updated Jan 25, 2026

lavendery / UUG

Python 22 2 Updated Sep 14, 2025

xiquan-li / MeanAudio

[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Python 131 15 Updated Sep 2, 2025

Vance0124 / Token-level-Direct-Preference-Optimization

Reference implementation for Token-level Direct Preference Optimization (TDPO)

Python 153 14 Updated Feb 14, 2025

tencent-ailab / SongBloom

The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Python 765 86 Updated Dec 4, 2025

kyutai-labs / delayed-streams-modeling

Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.

Python 2,898 301 Updated Jan 26, 2026

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,383 103 Updated Mar 16, 2026

yl4579 / DMOSpeech2

Python 301 39 Updated Jul 22, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 8,013 614 Updated Jan 18, 2026

XueZeyue / DanceGRPO

An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation

Python 1,566 76 Updated Oct 16, 2025

limefax / rope-nd

N-dimensional Rotary Position Embeddings for PyTorch

Python 84 3 Updated Feb 14, 2024

gyt1145028706 / XY-Tokenizer

This is the code for the paper "XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs"

Python 91 5 Updated Sep 19, 2025

ali-vilab / alitok

[ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Python 53 2 Updated Oct 12, 2025

zruiii / Chinese-Mimi

Chinese-Mimi adapts the vocoder of the Moshi model to Chinese-language corpora.

Python 34 4 Updated Mar 13, 2025

Shy-98 / MELLE

Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"

Python 41 8 Updated Jun 28, 2025

01Zhangbw / Speech-and-audio-papers-Top-Conference

135 6 Updated Jan 24, 2026

yujxx / PodAgent

PodAgent: A Comprehensive Framework for Podcast Generation

Python 122 12 Updated May 16, 2025

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,232 119 Updated Mar 23, 2026

k2-fsa / ZipVoice

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 939 132 Updated Dec 2, 2025

Yiming Li Ming-er

Stars

liangsusan-git / AV-NeRF

BASHLab / OWL

penn-waves-lab / SmartDJ

KdaiP / DC-Speech-VAE

Chen-GX / ReForm

LuckyBian / Math5470

jaeyeonkim99 / visage

Chaos96 / NTPP

XZWY / SpatialCodec

OpenBMB / UltraEval-Audio

DanielLin94144 / Full-Duplex-Bench

HeCheng0625 / Diffusion-Speech-Tokenizer