Stars
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and performing real-time speech generation.
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Official repository for the WenetSpeech-Chuan dataset.
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-cache compatibility, achieving high eff…
verl: Volcano Engine Reinforcement Learning for LLMs
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
SpeechIO Leaderboard: a large, robust, comprehensive benchmarking platform for Automatic Speech Recognition.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…
Code for DeSTA2.5-Audio, a general-purpose large audio-language model (LALM)
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Text-audio foundation model from Boson AI
Foundational Models for State-of-the-Art Speech and Text Translation
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis