The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,533 319 Updated May 26, 2026

GLJS / AudSemThinker

GitHub Repository for the AudSemThinker Model and the AudSem Dataset

Python 14 2 Updated Jun 4, 2025

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python 5,195 1,134 Updated Jun 18, 2026

microsoft / call-center-ai

Place a phone call from an AI agent with a single API call. Or call the bot directly from the configured phone number!

Python 6,507 777 Updated Jun 17, 2026

TEN-framework / ten-framework

Open-source framework for conversational voice AI agents

Python 10,682 1,295 Updated Jun 16, 2026

ozspeech / OZSpeech

[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching

Jupyter Notebook 45 6 Updated Feb 9, 2025

SirryChen / SpeechMedAssist

The first medical SpeechLM, open-sourced with weights, data, and code for training, inference, and evaluation.

Python 9 Updated Apr 23, 2026

openclaw / openclaw

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,390 79,417 Updated Jun 18, 2026

a710128 / nanovllm-voxcpm

Python 254 51 Updated Jun 3, 2026

QwenLM / Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 12,019 1,558 Updated Mar 17, 2026

NVIDIA / personaplex

PersonaPlex code.

Python 10,053 1,400 Updated Mar 2, 2026

FireRedTeam / FireRedChat

A Fully Self-Hosted Solution for Full-Duplex Voice Interaction

Python 544 45 Updated Sep 28, 2025

OpenMOSS / SpeechGPT-2.0-preview

GPT-4o-level, real-time spoken dialogue system.

Python 377 33 Updated Jan 27, 2025

amsehili / auditok

An audio/acoustic activity detection and audio segmentation tool

Python 852 101 Updated May 14, 2026

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 9,363 786 Updated Mar 26, 2026

jingzhunxue / FlowMirror_HydraVox

FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…

Python 49 4 Updated Feb 17, 2026

Ksuriuri / index-tts-vllm

Added vLLM support to IndexTTS for faster inference.

Python 1,180 167 Updated Apr 13, 2026

Aratako / T5Gemma-TTS

Multilingual TTS model with voice cloning and duration control, based on T5Gemma encoder-decoder LLM

Python 308 31 Updated Apr 3, 2026

resemble-ai / chatterbox

SoTA open-source TTS

Python 25,118 3,330 Updated Jun 10, 2026

moeru-ai / airi

💖🧸 Self-hosted, user-owned Grok Companion, a container for the souls of waifus and cyber beings that brings them into our world, aiming to reach Neuro-sama's altitude. Capable of realtime voice chat, Minec…

TypeScript 41,096 4,137 Updated Jun 18, 2026