  • SCUT
  • Guangzhou


Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.

Jupyter Notebook 3,974 323 Updated Jun 12, 2025

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 462 27 Updated Mar 24, 2026

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)

Python 70 4 Updated Dec 23, 2025

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,363 232 Updated Jan 30, 2026

Audio text align viewer

JavaScript 5 Updated Feb 4, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 354,424 71,644 Updated Apr 11, 2026

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 10,533 1,367 Updated Mar 17, 2026

Official repository for the WenetSpeech-Chuan dataset.

Python 170 4 Updated Feb 5, 2026

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,280 426 Updated Dec 11, 2025
Python 345 44 Updated Apr 11, 2025

DPO training for Tongyi Qianwen (Qwen)

Jupyter Notebook 64 7 Updated Sep 21, 2024

Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-cache compatibility, achieving high eff…

Python 96 4 Updated Dec 27, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 20,586 3,624 Updated Apr 10, 2026

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Python 9,185 1,086 Updated Apr 11, 2026

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,387 103 Updated Mar 16, 2026

SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.

Python 543 71 Updated Mar 29, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 45,017 6,034 Updated Aug 16, 2024

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Python 929 150 Updated Dec 1, 2024

UTokyo-SaruLab MOS Prediction System

Python 308 30 Updated Apr 2, 2026

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,257 122 Updated Mar 23, 2026

Code for DeSTA2.5-Audio, general-purpose LALM

Python 136 7 Updated Feb 4, 2026

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,559 343 Updated Jun 21, 2025

Text-audio foundation model from Boson AI

Python 8,020 619 Updated Jan 18, 2026

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,769 1,171 Updated Apr 8, 2026

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 946 133 Updated Dec 2, 2025

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.

Python 60,973 5,260 Updated Apr 11, 2026

🤗 R1-AQA Model: mispeech/r1-aqa

Python 320 29 Updated Mar 28, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,262 106 Updated Oct 29, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 659 53 Updated Jan 21, 2026