[NeurIPS 2023] AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Python 36 1 Updated Feb 15, 2024
Python 9 2 Updated Feb 25, 2026

[ICLR 2026] SmartDJ: Declarative audio editing with an audio language model.

Python 55 1 Updated Apr 2, 2026

5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs

Python 57 9 Updated Nov 19, 2025
Python 21 Updated Jan 31, 2026
Jupyter Notebook 1 1 Updated Oct 18, 2025

Official implementation of "ViSAGe: Video-to-Spatial Audio Generation" (ICLR 2025)

Python 42 4 Updated Sep 10, 2025

Official code of ICML 2025 paper "NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction"

Python 134 22 Updated Oct 27, 2025

Implementation of SpatialCodec.

Python 69 6 Updated Sep 23, 2023

Your faithful, impartial partner for audio evaluation: know yourself, know your rivals. Truthful evaluation; know yourself and know your rivals.

Python 285 21 Updated Mar 19, 2026

A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models

Python 149 12 Updated Feb 23, 2026

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python 197 14 Updated Jan 25, 2026
Python 22 2 Updated Sep 14, 2025

[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Python 131 15 Updated Sep 2, 2025

Reference implementation for Token-level Direct Preference Optimization (TDPO)

Python 153 14 Updated Feb 14, 2025

The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Python 765 86 Updated Dec 4, 2025

Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.

Python 2,898 301 Updated Jan 26, 2026

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,383 103 Updated Mar 16, 2026
Python 301 39 Updated Jul 22, 2025

Text-audio foundation model from Boson AI

Python 8,013 614 Updated Jan 18, 2026

An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation

Python 1,566 76 Updated Oct 16, 2025

N-dimensional Rotary Position Embeddings for PyTorch

Python 84 3 Updated Feb 14, 2024

This is the code for paper: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

Python 91 5 Updated Sep 19, 2025

[ICLR 2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Python 53 2 Updated Oct 12, 2025

Chinese-Mimi adapts the vocoder of the Moshi model to Chinese-language corpora.

Python 34 4 Updated Mar 13, 2025

Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"

Python 41 8 Updated Jun 28, 2025

PodAgent: A Comprehensive Framework for Podcast Generation

Python 122 12 Updated May 16, 2025

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,232 119 Updated Mar 23, 2026

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 939 132 Updated Dec 2, 2025