zhaoyx239
  • Shanghai Jiao Tong University
  • Shanghai

Python 15 Updated Jun 15, 2026

Zonos2 is a leading open-weight text-to-speech MoE.

Python 196 24 Updated Jun 16, 2026
Python 19 Updated Jun 8, 2026

FastContext: Training Efficient Repository Explorer for Coding Agents

Python 427 18 Updated Jun 17, 2026

Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios

Python 276 27 Updated Apr 23, 2026

DFlash: Block Diffusion for Flash Speculative Decoding

Python 5,154 372 Updated May 10, 2026

SOTA Open Source TTS

Python 30,840 2,639 Updated Jun 9, 2026

Academic Research Skills for Claude Code: research → write → review → revise → finalize

Python 32,136 2,644 Updated Jun 17, 2026

Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Python 100 4 Updated Nov 20, 2024

Task management CLI for the 启智 (Qizhi) platform: resource queries, task submission, log viewing, and MCP/agent workflows

Python 2 Updated Jun 9, 2026

Audio-Oscar is a multi-agent framework for generating long-form, controllable audio from complex audio scene descriptions.

Python 41 4 Updated Jun 8, 2026

JoyAI-Echo: Pushing the Frontier of Long Audio-Visual Generation

Python 1,591 138 Updated Jun 16, 2026

MMAE: A Massive Multitask Audio Editing Benchmark

Python 94 3 Updated Jun 8, 2026

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Python 181 6 Updated Jun 6, 2026

Official inference code for UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice.

Python 28 5 Updated May 30, 2026

An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.

Python 248 11 Updated Jun 4, 2026
Swift 14 Updated Jun 3, 2026

End-to-end text-to-audio scene generation model

39 1 Updated Jun 16, 2026

X-ASR is a series of automatic speech recognition models based on the icefall framework, focusing on streaming ASR and low-latency deployment.

Swift 120 11 Updated Jun 16, 2026

Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine

Python 169 17 Updated Jun 16, 2026

First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come …

Python 1,017 65 Updated Jun 2, 2026

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 17,390 3,436 Updated Jun 16, 2026

The most comprehensive Chinese-language Claude Code tutorial - from zero background to enterprise-level applications

Python 479 98 Updated Apr 5, 2026

Implementation for the paper "StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction".

Python 22 4 Updated May 8, 2026

[ICASSP 2026] Official code for "Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration"

Python 16 Updated Apr 16, 2026

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,521 1,178 Updated Jun 11, 2026

Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"

Python 62 7 Updated May 13, 2026
Python 781 71 Updated Jun 1, 2026