soong2

0 followers · 2 following

Stars

modelscope / FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Python 18,144 1,857 Updated Jun 16, 2026

MooreThreads / MooER

MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not …

Python 221 17 Updated Jan 8, 2025

ictnlp / LLaMA-Omni2

Python 273 28 Updated May 19, 2025

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,141 223 Updated May 19, 2025

kyutai-labs / hibiki

Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits for the end of the source utterance to start translating--- H…

Rust 1,470 118 Updated Apr 15, 2025

yxlu-0102 / MP-SENet

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Python 490 74 Updated May 19, 2025

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 9,342 785 Updated Mar 26, 2026

sp-uhh / storm

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Python 254 33 Updated Sep 13, 2024

hitz-zentroa / whisper-lm-transformers

Add n-gram and LLM language model support to HF Transformers Whisper models.

Python 14 2 Updated May 6, 2025

kpu / kenlm

KenLM: Faster and Smaller Language Model Queries

C++ 2,779 545 Updated Mar 30, 2025

hitz-zentroa / whisper-lm

Add n-gram and large language model (LLM) support to Whisper models.

Jupyter Notebook 43 4 Updated May 6, 2025

Rikorose / DeepFilterNet

Noise suppression using deep filtering

Python 4,335 472 Updated Oct 17, 2024

sp-uhh / sgmse_crp

Python 31 3 Updated Jan 9, 2024

sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation

Python 757 106 Updated May 12, 2026

LiangXu123 / ROSE-CD

Robust One-step Speech Enhancement via Consistency Distillation (ROSE-CD) (IEEE WASPAA Oral)

Python 10 Updated Jun 7, 2026

Xiaobin-Rong / gtcrn

The official implementation of GTCRN, an ultra-lightweight SE model.

Python 670 111 Updated Jan 18, 2026

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 10,407 969 Updated May 16, 2026

ufal / SimulStreaming

Python 620 91 Updated Jun 8, 2026

moonshine-ai / moonshine

Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces

C 8,482 459 Updated Jun 2, 2026

soham97 / awesome-sound_event_detection

Reading list for research topics in Sound AI

198 9 Updated Aug 8, 2024

rsxdalv / VibeVoice

Forked from microsoft/VibeVoice

Frontier Open-Source Text-to-Speech

Python 132 129 Updated Sep 9, 2025

QuentinFuxa / WhisperLiveKit

Simultaneous speech-to-text models

Python 10,452 1,082 Updated Jun 12, 2026

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,209 259 Updated Feb 23, 2026

NVIDIA / audio-flamingo

PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models

1,144 96 Updated Dec 15, 2025

fluxions-ai / vui

Real-time voice assistant — WebRTC streaming, faster-whisper ASR, local LLM, Vui Nano (300M) TTS. OpenAI Realtime API compatible. Voice cloning, barge-in, ~9× realtime on a 4090. Apache 2.0.

Python 701 72 Updated Jun 12, 2026

BinWang28 / audio-ai-hub

The hub for audio AI research: papers, open models, benchmarks & datasets across audio LLMs, speech recognition, TTS, music & audio generation.

Python 933 48 Updated Jun 15, 2026

halsay / ASR-TTS-paper-daily

Updates ASR papers every day

Python 515 24 Updated May 16, 2026

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,081 165 Updated Apr 21, 2025

QwenLM / Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,903 145 Updated Jul 5, 2024

bytedance / SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

1,450 115 Updated May 26, 2026