Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud. It understands text, audio, images, and video, and generates speech in real time.

Jupyter Notebook 3,162 193 Updated Oct 9, 2025

OpenBMB / VoxCPM

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 3,098 340 Updated Dec 20, 2025

modelscope / Trinity-RFT

Trinity-RFT is a general-purpose, flexible, and scalable framework for reinforcement fine-tuning (RFT) of large language models (LLMs).

Python 450 45 Updated Dec 25, 2025

ByteDance-Seed / VeOmni

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,457 124 Updated Dec 25, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 7,771 578 Updated Sep 15, 2025

Omni-Avatar / OmniAvatar

Python 1,749 158 Updated Aug 6, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 11,841 1,086 Updated Dec 25, 2025

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 1,062 95 Updated Dec 8, 2025

huggingface / trl

Train transformer language models with reinforcement learning.

Python 16,777 2,375 Updated Dec 24, 2025

thu-ml / tianshou

An elegant PyTorch deep reinforcement learning library.

Python 9,012 1,200 Updated Dec 1, 2025

magic-research / piecewise-rectified-flow

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)

Jupyter Notebook 529 31 Updated Sep 8, 2025

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python 4,873 651 Updated Dec 24, 2025

xingchensong / TouchNet

A native-PyTorch library for large-scale M-LLM (text/audio) training with tp/cp/dp.

Python 221 31 Updated Aug 6, 2025

yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Python 139,418 11,267 Updated Dec 25, 2025

sarulab-speech / UTMOSv2

UTokyo-SaruLab MOS Prediction System

Python 274 28 Updated Dec 18, 2025

Kevin-naticl / LLaSE-G1

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement

Python 94 20 Updated Apr 1, 2025

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 10,853 1,162 Updated Apr 9, 2025

CNFlyCat / GPT-SoVITS-V3-Infer-API

Forked from RVC-Boss/GPT-SoVITS

Lets developers call inference models from v1 through v3 via an API, with support for streaming transmission and transfer of files of a specified type.

Python 44 4 Updated Mar 4, 2025

mtxing mtxing

Starred repositories

WangHelin1997 / SSR-Speech

FunAudioLLM / Fun-Audio-Chat

stepfun-ai / Step-Audio-R1

ASLP-lab / MeanVC

Soul-AILab / SoulX-Podcast

ASLP-lab / DiffRhythm2

microsoft / VibeVoice

QwenLM / Qwen3-Omni