Anjos whaozl

Love 王晓, love programming, love languages; happy coding, happy playing, be the best boy. AM coding, PM reading, one one up!

28 followers · 40 following

Shanghai,China
blog.csdn.net/zhulinniao

Stars

zai-org / GLM-TTS

GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning

Python 738 87 Updated Dec 17, 2025

QwenLM / Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,134 190 Updated Oct 9, 2025

ASLP-lab / Llasa-1B-Yue-Updated

Python 7 1 Updated Nov 15, 2025

ASLP-lab / WenetSpeech-Chuan

Official repository for the WenetSpeech-Chuan dataset.

Python 129 1 Updated Nov 27, 2025

ASLP-lab / WenetSpeech-Yue

A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Python 247 10 Updated Nov 30, 2025

pipecat-ai / smart-turn

Python 1,162 65 Updated Dec 3, 2025

dtinth / auden

Audience engagement platform

TypeScript 33 11 Updated Dec 16, 2025

cchen1436 / NeMo

Forked from NVIDIA-NeMo/NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 6 5 Updated Dec 15, 2025

wenet-e2e / west

We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 167 11 Updated Dec 16, 2025

langfuse / langfuse

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 19,518 1,915 Updated Dec 19, 2025

ASLP-lab / MeanVC

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python 179 10 Updated Dec 17, 2025

neuphonic / neucodec

A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec.

Python 135 17 Updated Oct 7, 2025

meituan-longcat / LongCat-Flash-Omni

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 443 24 Updated Dec 15, 2025

XiaomiMiMo / MiMo-Audio-Tokenizer

A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.

Python 127 10 Updated Sep 19, 2025

meituan-longcat / LongCat-Audio-Codec

LongCat Audio Tokenizer and Detokenizer

Python 261 18 Updated Dec 15, 2025

XiaomiMiMo / MiMo-Audio

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 903 87 Updated Sep 20, 2025

stepfun-ai / Step-Audio

Python 4,573 370 Updated Dec 19, 2025

facebookresearch / omnilingual-asr

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,487 213 Updated Dec 16, 2025

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,271 92 Updated Sep 22, 2025

xingchensong / FlashCosyVoice

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python 232 25 Updated Nov 11, 2025

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 2,734 337 Updated Dec 11, 2025

neuphonic / neutts-air

On-device TTS model by Neuphonic

Python 4,274 448 Updated Dec 15, 2025

GiantAILab / DiaMoE-TTS

Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"

Python 205 18 Updated Nov 28, 2025

v3ucn / Bert-vits2-V2.3

Bert-vits2-V2.3 training and inference

Python 49 20 Updated Mar 13, 2024

v3ucn / Bert-vits2-V2.2

Bert-vits2-V2.2 training and inference

Python 9 2 Updated Dec 19, 2023

fishaudio / Bert-VITS2

vits2 backbone with multilingual-bert

Python 8,642 1,255 Updated Dec 15, 2025

TooTallNate / Java-WebSocket

A barebones WebSocket client and server implementation written in 100% Java.

Java 10,779 2,595 Updated Nov 2, 2025

OpenBMB / VoxCPM

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 2,983 320 Updated Dec 15, 2025

meituan-longcat / LongCat-Flash-Chat

1,245 61 Updated Dec 15, 2025

mbzuai-oryx / LLMVoX

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Python 291 38 Updated May 16, 2025