Starred repositories

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 142 16 Updated Jan 1, 2025

Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.

Python 324 20 Updated Dec 25, 2025
Python 431 28 Updated Nov 27, 2025

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python 187 11 Updated Dec 17, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 2,818 352 Updated Dec 11, 2025

Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching

Python 133 5 Updated Nov 9, 2025

Open-Source Frontier Voice AI

Python 19,014 2,100 Updated Dec 17, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,162 193 Updated Oct 9, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 3,098 340 Updated Dec 20, 2025

Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLM).

Python 450 45 Updated Dec 25, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,457 124 Updated Dec 25, 2025

Text-audio foundation model from Boson AI

Python 7,771 578 Updated Sep 15, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,841 1,086 Updated Dec 25, 2025

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 1,062 95 Updated Dec 8, 2025

Train transformer language models with reinforcement learning.

Python 16,777 2,375 Updated Dec 24, 2025

An elegant PyTorch deep reinforcement learning library.

Python 9,012 1,200 Updated Dec 1, 2025

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)

Jupyter Notebook 529 31 Updated Sep 8, 2025

A PyTorch native platform for training generative AI models

Python 4,873 651 Updated Dec 24, 2025

A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.

Python 221 31 Updated Aug 6, 2025

A feature-rich command-line audio/video downloader

Python 139,418 11,267 Updated Dec 25, 2025

UTokyo-SaruLab MOS Prediction System

Python 274 28 Updated Dec 18, 2025

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement

Python 94 20 Updated Apr 1, 2025

Spark-TTS Inference Code

Python 10,853 1,162 Updated Apr 9, 2025

Lets developers call the v1 through v3 inference models via an API, with support for streaming transmission and file transfer of a specified type.

Python 44 4 Updated Mar 4, 2025

OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.

Python 460 30 Updated Nov 23, 2025
Python 4,576 371 Updated Dec 19, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 653 48 Updated Jun 5, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python 3,899 282 Updated Sep 25, 2025

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,757 304 Updated Aug 14, 2025