Yuxiang Zhao zhaoyx239

M.S. student in CS @ Shanghai Jiao Tong University · Intern @ AISpeech

37 followers · 75 following

Shanghai Jiao Tong University
Shanghai

Lists (8)

ASR & Understanding

11 repositories

Duplex & Turn Detection

Generation

24 repositories

Lab Work

29 repositories

Machine Translation

My Contributions

Speech Translation

Tools

Stars

SpeechColab / GigaSpeechBench

Python 15 Updated Jun 15, 2026

Zyphra / ZONOS2

Zonos2 is a leading open-weight text-to-speech MoE.

Python 196 24 Updated Jun 16, 2026

Qiushao-E / SWE-Explore-Bench

Python 19 Updated Jun 8, 2026

microsoft / fastcontext

FastContext: Training Efficient Repository Explorer for Coding Agents

Python 427 18 Updated Jun 17, 2026

XiaomiMiMo / MiMo-V2.5-ASR

Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios

Python 276 27 Updated Apr 23, 2026

z-lab / dflash

DFlash: Block Diffusion for Flash Speculative Decoding

Python 5,154 372 Updated May 10, 2026

fishaudio / fish-speech

SOTA Open Source TTS

Python 30,840 2,639 Updated Jun 9, 2026

Imbad0202 / academic-research-skills

Academic Research Skills for Claude Code: research → write → review → revise → finalize

Python 32,136 2,644 Updated Jun 17, 2026

yanghaha0908 / FastHuBERT

Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Python 100 4 Updated Nov 20, 2024

zty624 / qzcli_tool

Forked from tianyilt/qzcli_tool

Task management CLI for the Qizhi (启智) platform: resource queries, task submission, log viewing, and MCP/agent workflow

Python 2 Updated Jun 9, 2026

ziye26 / Audio-Oscar

Audio-Oscar is a multi-agent framework for generating long-form, controllable audio from complex audio scene descriptions.

Python 41 4 Updated Jun 8, 2026

jd-opensource / JoyAI-Echo

JoyAI-Echo: Pushing the Frontier of Long Audio-Visual Generation

Python 1,591 138 Updated Jun 16, 2026

ddlBoJack / MMAE

MMAE: A Massive Multitask Audio Editing Benchmark

Python 94 3 Updated Jun 8, 2026

Bairong-Xdynamics / TurnSense

Python 139 12 Updated Jun 12, 2026

cwx-worst-one / WavTTS

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Python 181 6 Updated Jun 6, 2026

cmots / UniSS

Official inference code for UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice.

Python 28 5 Updated May 30, 2026

Soul-AILab / SoulX-Transcriber

An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.

Python 248 11 Updated Jun 4, 2026

liutaocode / Vibe_XASR

Swift 14 Updated Jun 3, 2026

xiaomi-research / dasheng-audiogen

End-to-end text-to-audio scene generation model

39 1 Updated Jun 16, 2026

Gilgamesh-J / X-ASR

X-ASR is a series of automatic speech recognition models based on the icefall framework, focusing on streaming ASR and low-latency deployment.

Swift 120 11 Updated Jun 16, 2026

netease-youdao / Confucius4-TTS

Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine

Python 169 17 Updated Jun 16, 2026

Stability-AI / stable-audio-3

Python 496 59 Updated Jun 16, 2026

xzf-thu / Mega-ASR

First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come …

Python 1,017 65 Updated Jun 2, 2026

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 17,390 3,436 Updated Jun 16, 2026

xianyu110 / awesome-claudcode-tutorial

The most comprehensive Chinese-language Claude Code tutorial - from zero background to enterprise-level applications

Python 479 98 Updated Apr 5, 2026

xxyQwQ / StraTA

Implementation for the paper "StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction".

Python 22 4 Updated May 8, 2026

yfyeung / DS-WED

[ICASSP 2026] Official code for "Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration"

Python 16 Updated Apr 16, 2026

k2-fsa / OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,521 1,178 Updated Jun 11, 2026

yanghaha0908 / WavCube

Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"

Python 62 7 Updated May 13, 2026

Tencent-Hunyuan / Hy-MT

Python 781 71 Updated Jun 1, 2026