First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come …

Python 983 63 Updated Jun 2, 2026

hyzhang24 / DuplexSLA

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

75 Updated May 20, 2026

dograh-hq / dograh

Open-source voice AI platform. Self-hosted alternative to Vapi and Retell. On-prem, BYOK across speech-to-speech or LLM/STT/TTS, with a visual workflow builder, native MCP, and telephony support.

Python 4,375 920 Updated Jun 12, 2026

NVlabs / DiffusionNFT

[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Python 901 37 Updated Feb 10, 2026

tmux / tmux

tmux source code

C 46,523 2,690 Updated Jun 13, 2026

NVIDIA / audio-intelligence

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 127 11 Updated Mar 3, 2026

facebookresearch / WavFlow

MultiModal Audio Generation in Raw Waveform Space.

Python 152 10 Updated May 26, 2026

inclusionAI / LLaDA2.X

LLaDA2.0 is the diffusion language model series developed by the InclusionAI team at Ant Group.

432 23 Updated Feb 12, 2026

jingyaogong / minimind-o

🎙️ [Large Model] A 0.1B Omni model trained from scratch, capable of listening, speaking, and seeing!

Python 1,847 219 Updated Jun 8, 2026

innovator-zero / FASTER

FASTER: Rethinking Real-Time Flow VLAs

Python 129 8 Updated May 14, 2026

Tencent / SongBench

Python 46 1 Updated Apr 30, 2026

kyutai-labs / moshi-rag

MoshiRAG is a compact full-duplex speech language model augmented with asynchronous knowledge retrieval to improve factuality without sacrificing real-time interactivity.

Rust 105 8 Updated Apr 28, 2026

OpenMOSS / MOSS-Music

MOSS-Music is an open-source music understanding model targeting music captioning, lyrics ASR, structural analysis, chord / key / tempo reasoning, and long-form musical question answering.

Python 92 6 Updated May 9, 2026