- Nanjing University
- Suzhou
- https://lmxue.github.io/
- https://scholar.google.com/citations?user=KNqxVT0AAAAJ&hl=en
- in/liumeng-xue-01b7b9251
Stars
foundation model plugin for Julius decoder
WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
ASLP-lab / MINT-Bench-Demo
Forked from LongWaytoG0/MINT-Bench
Demo page of MINT-Bench
AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
High-Quality Voice Cloning TTS for 600+ Languages
"CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub: https://clianything.cc/
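Repository for the Xmart Youth Forum, hosting video replays and lecture notes from past student forums and frontier lectures. For the latest Xmart announcements, follow the WeChat official account [XLANCE Lab].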
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
MiniMax M2.1, a SOTA model for real-world dev & agents.
MiMo-Audio: Audio Language Models are Few-Shot Learners
[ICLR 2026] SoFlow: Solution Flow Models for One-Step Generative Modeling
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
ASLP-lab / DiffRhythm2
Forked from xiaomi-research/diffrhythm2
Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
A ComfyUI custom node integration for local multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Echo-TTS, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterb…
[ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows