ASLP-lab

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,444 440 Updated Dec 11, 2025

Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Python 37 1 Updated Jun 9, 2026
Python 16 1 Updated Jun 12, 2026

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

Python 25 1 Updated May 21, 2026
Python 47 2 Updated May 2, 2026

Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR

Python 88 2 Updated May 13, 2026

SOME: Singing-Oriented MIDI Extractor.

Python 695 54 Updated Mar 7, 2026

A playbook that systematically teaches you how to maximize the performance of deep learning models.

3,205 287 Updated May 27, 2023

Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024 GPUs)

Python 24 Updated Mar 29, 2026

YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance

Python 66 4 Updated Apr 12, 2026
HTML 13 Updated Mar 25, 2026

OmniCodec: Low Frame Rate Universal Audio Codec with Semantic–Acoustic Disentanglement

Python 40 1 Updated Apr 17, 2026

M7-TTS: A Mini-Scale Multilingual and Multi-Dialect Text-to-Speech Language Model with Mimi codec and Multi Token Prediction

20 1 Updated Mar 19, 2026

This challenge focuses on evaluating speech recognition and semantic understanding capabilities of AI glasses in complex real-world environments.

18 Updated Jun 14, 2026

An Open-Source Multidimension Speech Understanding Foundation Model Built upon OpenPangu on Ascend NPUs

Python 32 Updated Mar 15, 2026

A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations

Python 152 4 Updated Feb 6, 2026

An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.

Python 250 12 Updated Feb 26, 2026

Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching

Python 165 12 Updated Nov 9, 2025

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python 284 20 Updated Jan 8, 2026
Python 149 23 Updated May 14, 2026

Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems

Python 112 8 Updated Jan 25, 2026

Official repository for the WenetSpeech-Chuan dataset.

Python 201 6 Updated Feb 5, 2026

A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Python 330 20 Updated Jun 6, 2026

Open repository of "MSU-Bench: Towards Understanding the Conversational Multi-Speaker Scenarios"

11 Updated Aug 11, 2025

A Massive Contextual Speech Recognition Benchmark.

Python 107 3 Updated Aug 6, 2025

A song aesthetic evaluation toolkit trained on SongEval.

Python 308 26 Updated Apr 8, 2026

Llasa Speed Up

Python 63 6 Updated Jan 18, 2026