Lists (26)
AcousticFrontend
AcousticModel
ASR
ASR-pretrain
ASV
AudioQuality
AwesomeList
Paper list, awesome list and so on.
BandwidthExtension
Classification
Codec
Data
Develop
Evaluation
FrontEnd
FrontEnd for Text-to-Speech
How-to
LLM
Music
Performance
Quant
SingingVoiceSynthesis
SpeechEditing
SpeechSeperation
Tools
Universal Method
Vocoder
VoiceConversion
Starred repositories
🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
Text-to-text alignment algorithm for speech recognition error analysis.
Source-code walkthrough of the lightweight large language model MiniMind, covering the full pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more
A Python module to repair invalid JSON from LLMs
Simultaneous speech-to-text model
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…
Text-audio foundation model from Boson AI
Chinese voice corpus with clearer, more natural speech, comprising 8 open-source datasets, 3,200 speakers, 900 hours of speech, and 13 million characters.
A Collection of Papers on Diffusion Language Models
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.
[EMNLP 2025] MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.
Data processing for code LLMs (pretraining, fine-tuning, and DPO): a state-of-the-art industry processing pipeline
Heuristic filtering framework for RefineCode
A quick guide (especially) for trending instruction finetuning datasets