Shanghai Jiao Tong University
- Shanghai
Stars
Zonos2 is a leading open-weight text-to-speech MoE.
FastContext: Training Efficient Repository Explorer for Coding Agents
Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios
DFlash: Block Diffusion for Flash Speculative Decoding
Academic Research Skills for Claude Code: research → write → review → revise → finalize
Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
zty624 / qzcli_tool
Forked from tianyilt/qzcli_tool
Task management CLI for the Qizhi (启智) platform: resource queries, task submission, log viewing, and MCP/agent workflow
Audio-Oscar is a multi-agent framework for generating long-form, controllable audio from complex audio scene descriptions.
JoyAI-Echo: Pushing the Frontier of Long Audio-Visual Generation
WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling
Official inference code for UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice.
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
End-to-end text-to-audio scene generation model
X-ASR is a series of automatic speech recognition models based on the icefall framework, focusing on streaming ASR and low-latency deployment.
Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine
First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come …
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
The most comprehensive Chinese-language Claude Code tutorial - from absolute beginner to enterprise-level applications
Implementation for the paper "StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction".
[ICASSP 2026] Official code for "Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration"
High-Quality Voice Cloning TTS for 600+ Languages
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"