lmxue

Organizations

@SparkAudio

Python 30 2 Updated Jun 15, 2026

foundation model plugin for Julius decoder

Python 64 17 Updated Jan 12, 2026

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Python 201 6 Updated Jun 6, 2026

MMAE: A Massive Multitask Audio Editing Benchmark

Python 95 4 Updated Jun 8, 2026

An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.

Python 263 14 Updated Jun 22, 2026

Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR

Python 89 2 Updated May 13, 2026

Demo page of MINT-Bench

13 Updated May 26, 2026
Python 47 2 Updated May 2, 2026

[SIGGRAPH 2026] Repository of Audio-Omni

Python 392 33 Updated Jun 10, 2026

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis

67 3 Updated Jun 12, 2026

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,932 1,251 Updated Jul 3, 2026

"CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub: https://clianything.cc/

Python 44,599 4,168 Updated Jun 25, 2026

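Xmart Youth Forum repository: video replays and lecture notes from past student forums and frontier lectures. For the latest Xmart announcements, follow the WeChat official account 【XLANCE Lab】.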

54 Updated Apr 7, 2026

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 12,261 1,588 Updated Mar 17, 2026

MiniMax M2.1, a SOTA model for real-world dev & agents.

542 46 Updated Jan 28, 2026

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 1,056 104 Updated Jun 17, 2026

[ICLR 2026] SoFlow: Solution Flow Models for One-Step Generative Modeling

Python 160 7 Updated Apr 8, 2026

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,554 321 Updated May 26, 2026

[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 138 Updated Apr 7, 2026

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python 291 21 Updated Jan 8, 2026

Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching

Python 166 12 Updated Nov 9, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,470 450 Updated Dec 11, 2025

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Python 32,370 3,687 Updated Jul 1, 2026

A ComfyUI custom node integration for local multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterb…

Python 1,080 125 Updated Jun 24, 2026

[ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling

Python 105 4 Updated Sep 28, 2025

[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Python 141 18 Updated Sep 2, 2025