GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python 738 87 Updated Dec 17, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,134 190 Updated Oct 9, 2025
Python 7 1 Updated Nov 15, 2025

Official repository for the WenetSpeech-Chuan dataset.

Python 129 1 Updated Nov 27, 2025

A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Python 247 10 Updated Nov 30, 2025
Python 1,162 65 Updated Dec 3, 2025

Audience engagement platform

TypeScript 33 11 Updated Dec 16, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 6 5 Updated Dec 15, 2025

We Speech Toolkit, LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 167 11 Updated Dec 16, 2025

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 19,518 1,915 Updated Dec 19, 2025

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python 179 10 Updated Dec 17, 2025

A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec.

Python 135 17 Updated Oct 7, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 443 24 Updated Dec 15, 2025

A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.

Python 127 10 Updated Sep 19, 2025

LongCat Audio Tokenizer and Detokenizer

Python 261 18 Updated Dec 15, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 903 87 Updated Sep 20, 2025
Python 4,573 370 Updated Dec 19, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,487 213 Updated Dec 16, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,271 92 Updated Sep 22, 2025

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python 232 25 Updated Nov 11, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 2,734 337 Updated Dec 11, 2025

On-device TTS model by Neuphonic

Python 4,274 448 Updated Dec 15, 2025

Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"

Python 205 18 Updated Nov 28, 2025

Bert-vits2-V2.3 training and inference

Python 49 20 Updated Mar 13, 2024

Bert-vits2-V2.2 training and inference

Python 9 2 Updated Dec 19, 2023

vits2 backbone with multilingual-bert

Python 8,642 1,255 Updated Dec 15, 2025

A barebones WebSocket client and server implementation written in 100% Java.

Java 10,779 2,595 Updated Nov 2, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 2,983 320 Updated Dec 15, 2025

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Python 291 38 Updated May 16, 2025