ABexit
  • University of Chinese Academy of Sciences
  • Beijing


High-Quality Voice Cloning TTS for 600+ Languages

Python 7,577 1,187 Updated Jun 11, 2026

[ACL 2026 Main] FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pre-training

Python 35 1 Updated Apr 20, 2026

VITA-QINYU: Expressive Spoken Language Model for Role-Playing and Singing

Python 121 7 Updated Apr 3, 2026
Python 432 34 Updated Mar 25, 2026

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,533 319 Updated May 26, 2026

GitHub Repository for the AudSemThinker Model and the AudSem Dataset

Python 14 2 Updated Jun 4, 2025

A framework for efficient model inference with omni-modality models

Python 5,195 1,134 Updated Jun 18, 2026

Place a phone call from an AI agent with a single API call, or call the bot directly from the configured phone number.

Python 6,507 777 Updated Jun 17, 2026

Open-source framework for conversational voice AI agents

Python 10,682 1,295 Updated Jun 16, 2026

[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching

Jupyter Notebook 45 6 Updated Feb 9, 2025

The first medical SpeechLM, open-sourced with weights, data, and code for training, inference, and evaluation.

Python 9 Updated Apr 23, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,390 79,417 Updated Jun 18, 2026
Python 254 51 Updated Jun 3, 2026

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 12,019 1,558 Updated Mar 17, 2026

PersonaPlex code.

Python 10,053 1,400 Updated Mar 2, 2026

A Fully Self-Hosted Solution for Full-Duplex Voice Interaction

Python 544 45 Updated Sep 28, 2025

GPT-4o-level, real-time spoken dialogue system.

Python 377 33 Updated Jan 27, 2025

An audio/acoustic activity detection and audio segmentation tool

Python 852 101 Updated May 14, 2026

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 9,363 786 Updated Mar 26, 2026

FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…

Python 49 4 Updated Feb 17, 2026

Added vLLM support to IndexTTS for faster inference.

Python 1,180 167 Updated Apr 13, 2026

Multilingual TTS model with voice cloning and duration control, based on T5Gemma encoder-decoder LLM

Python 308 31 Updated Apr 3, 2026

SoTA open-source TTS

Python 25,118 3,330 Updated Jun 10, 2026

💖🧸 Self hosted, you-owned Grok Companion, a container of souls of waifu, cyber livings to bring them into our worlds, wishing to achieve Neuro-sama's altitude. Capable of realtime voice chat, Minec…

TypeScript 41,096 4,137 Updated Jun 18, 2026

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,340 173 Updated May 16, 2026

The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"

Python 640 43 Updated Mar 4, 2026

Controllable and fast Text-to-Speech for over 7000 languages!

Python 2,203 319 Updated Jan 25, 2026

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 6,291 691 Updated Aug 10, 2024

A generative speech model for daily dialogue.

Python 39,474 4,248 Updated Apr 10, 2026

GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters

Python 814 77 Updated Mar 6, 2026