A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD

Python 351 26 Updated Apr 4, 2026

pengzhendong / compute-wer

Compute WER and SER for speech recognition evaluation

Python 27 3 Updated Mar 18, 2026

QwenLM / Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 10,639 1,380 Updated Mar 17, 2026

jeremy110 / Finetune_Nemo_ASR

Fine-tune the NeMo Parakeet ASR model on a new language (supports an 8-bit optimizer). Experimental birwkv-fastconformer TDT for long-form ASR (8.5 hours in a single pass).

Python 20 4 Updated Nov 27, 2025

narcotic-sh / senko

Very fast, accurate speaker diarization

Python 251 27 Updated Mar 25, 2026

GeeeekExplorer / nano-vllm

Nano vLLM

Python 12,864 1,919 Updated Apr 13, 2026

LONGXUANX / nano-whisper

A demo-level low-latency, high-throughput inference engine for Whisper

Python 19 4 Updated Nov 9, 2025

analyticsinmotion / werpy

🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.

Python 25 6 Updated Mar 30, 2026

inclusionAI / Ming-UniAudio

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 441 29 Updated Nov 27, 2025

corticph / error-align

Text-to-text alignment algorithm for speech recognition error analysis.

Python 28 4 Updated Apr 6, 2026

hans0809 / MiniMind-in-Depth

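An in-depth walkthrough of the source code of the lightweight large language model MiniMind, covering the full pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more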

920 79 Updated Jun 16, 2025

mangiucugna / json_repair

A Python module to repair invalid JSON from LLMs

Python 4,647 182 Updated Apr 13, 2026

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python 39,323 4,557 Updated Apr 10, 2026

ufal / SimulStreaming

Python 561 82 Updated Mar 10, 2026

QuentinFuxa / WhisperLiveKit

Simultaneous speech-to-text models

Python 10,088 1,040 Updated Mar 31, 2026

mingyin0312 / RLFromScratch

Python 570 54 Updated Aug 28, 2025

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,268 123 Updated Mar 23, 2026