Lists (31)
agent
asr
chatai
codec
crawl (web crawlers)
dataset
diffusion
flow-tts
interesting
large language model
learn
llm
llm-multimodal
llm-tts
music
paper list
prompt
RL
speaker
speech understand
ssl
super_resolution (super-resolution algorithms that improve audio quality)
tools
train & inference
tts
tts-dadapipe
tts-eval
tts-postprocess
video
vocoder
评判 (evaluation)
Stars
🚀 Universal AI IDE account manager for Antigravity / Codex / GitHub Copilot / Windsurf / Kiro / Cursor / Gemini-cli / CodeBuddy, with multi-account switching, quota monitoring, automatic wake-up, and multi-instance management.
[INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset
WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
Real-Time Streamable Generative Speech Restoration with Flow Matching
Official implementation of paper "Vocoder is not all you need".
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
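A powerful copy-trading system for the Polymarket prediction market, supporting automated copy trading, multi-account management, real-time order push notifications, and statistical analysis.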
X-ASR is a series of automatic speech recognition models based on the icefall framework, focusing on streaming ASR and low-latency deployment.
End-to-end text-to-audio scene generation model
An enhancement tool for CodexApp that aims to make Codex easier and more comfortable to use.
A self-hosted ML coding practice platform. 68 problems from ReLU to flow matching — attention, training, RLHF, diffusion, and more. Instant feedback in the browser.
Multi-modal Emotion detection from IEMOCAP on Speech, Text, Motion-Capture Data using Neural Nets.
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling
First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come …
MultiModal Audio Generation in Raw Waveform Space.
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
A curated list of practical Codex skills for automating workflows across the Codex CLI and API.
A suite of plugins for legal workflows
ICLR 2026 Oral: WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
MCore-Bridge: Providing Megatron-Core model definitions for state-of-the-art large models and making Megatron training as simple as Transformers — with support for 300+ large language models (Qwen3…
A geometry-aware audio codec leveraging two-dimensional quantization
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
Academic Research Skills for Claude Code: research → write → review → revise → finalize
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model