Computer Science and Technology, Beijing
Stars
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning
The official VeRL repository for Autoregressive TTS.
High-Resolution Image Synthesis with Latent Diffusion Models
verl: Volcano Engine Reinforcement Learning for LLMs
[NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressive, and linguistic challenges.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Compute WER and SER for speech recognition evaluation (a minimal sketch of both metrics follows this list).
MiMo-Audio: Audio Language Models are Few-Shot Learners
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
[TMLR 2025🔥] A survey of autoregressive models in vision.
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
Text-audio foundation model from Boson AI
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A native-PyTorch library for large-scale multimodal LLM (text/audio) training with tensor, context, and data parallelism (TP/CP/DP).
The Bert-VITS2 project has many bugs and an unfriendly tutorial. This project fixes as many of Bert-VITS2's bugs as possible and supports one-click training. With only 50 utterances from the target speaker, you can obtain a stable, fast TTS model.
Your faithful, impartial partner for audio evaluation: honest evaluation, know yourself and know your rivals.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
SGLang is a high-performance serving framework for large language models and multimodal models.
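As referenced in the WER/SER entry above, the two metrics are standard for speech recognition evaluation: WER is the word-level Levenshtein distance divided by the number of reference words, and SER is the fraction of sentences containing at least one error. The following is a minimal, self-contained Python sketch of both; the function names wer and ser are illustrative and not taken from that repository.

# Minimal sketch of WER and SER (assumed helper names, plain Python).
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def ser(references: list[str], hypotheses: list[str]) -> float:
    """SER = fraction of sentences with at least one word error."""
    wrong = sum(r.split() != h.split() for r, h in zip(references, hypotheses))
    return wrong / max(len(references), 1)

# Example: one substitution in a four-word reference gives WER 0.25, SER 1.0.
print(wer("the cat sat down", "the cat sat dawn"))      # 0.25
print(ser(["the cat sat down"], ["the cat sat dawn"]))  # 1.0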