WendongGan
🎯 Focusing
  • UESTC
  • Chengdu, China

Starred repositories

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Python 1,001 251 Updated Apr 10, 2026

Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control

Python 220 16 Updated Feb 26, 2026

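https://adongwanai.github.io/AgentGuide | AI Agent Development Guide | LangGraph in Practice | Advanced RAG | Career Switch to LLMs | LLM Interviews | Algorithm Engineer | Interview Question Bank | Reinforcement Learning | Data Synthesis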

HTML 3,549 343 Updated Apr 2, 2026

OmniCodec: Low Frame Rate Universal Audio Codec with Semantic–Acoustic Disentanglement

Python 32 1 Updated Apr 6, 2026

Bash is all you need - A nano claude code–like 「agent harness」, built from 0 to 1

TypeScript 52,268 8,546 Updated Apr 7, 2026

Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.

Python 178 16 Updated Mar 20, 2026

🤗 R1-AQA Model: mispeech/r1-aqa

Python 320 29 Updated Mar 28, 2025

Easy fine-tuning for Qwen3-TTS: Fast voice cloning and high-quality multilingual speech synthesis.

Python 69 11 Updated Apr 8, 2026

Audio skills for Claw

Shell 16 Updated Mar 13, 2026

Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation

Python 45 6 Updated Mar 13, 2026

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

Python 119 3 Updated Mar 12, 2026

ChatTTS 2K Speaker Stability Score 🥇 & Categorized by Gender and Age 👧 & Audio Preview 🔈

Python 720 39 Updated Jul 2, 2024

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 70 5 Updated Mar 18, 2026

This challenge focuses on evaluating speech recognition and semantic understanding capabilities of AI glasses in complex real-world environments.

12 Updated Apr 10, 2026

Code for Latent Speech-Text Transformer (LST)

Python 14 2 Updated Mar 12, 2026

The Claws will eventually take over the world; PUAClaw is All You Need

HTML 2,585 238 Updated Mar 9, 2026

A SOTA industrial-grade Voice Activity Detection & Audio Event Detection model, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD

Python 349 26 Updated Apr 4, 2026

An Open-Source Multidimension Speech Understanding Foundation Model Built upon OpenPangu on Ascend NPUs

Python 29 Updated Mar 15, 2026

A curated list of full-duplex spoken dialogue models & benchmarks

39 1 Updated Apr 9, 2026

FLM-Audio is an audio-language subversion of RoboEgo/FLM-Ego -- an omnimodal model with native full duplexity.

Python 66 9 Updated Dec 9, 2025

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 10,597 1,378 Updated Mar 17, 2026

WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models

Python 30 Updated Feb 13, 2026

This is the official implementation of a reverberant-speech-to-room-impulse-response estimator

Python 42 5 Updated Aug 7, 2024

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 355,751 72,017 Updated Apr 13, 2026

Unofficial Implementation of MiniMax-Speech

Python 6 1 Updated Feb 23, 2026

Local-first Suno-style music studio powered by ACE-Step 1.5.

TypeScript 128 18 Updated Feb 7, 2026

Official repository for the paper "Audio ControlNet for Fine-Grained Audio Generation and Editing".

Python 71 3 Updated Feb 7, 2026

Reinforcement Learning via Self-Distillation (SDPO)

Python 761 82 Updated Feb 18, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,376 232 Updated Jan 30, 2026

UniAudio 2.0: An audio foundation model for text, speech, sound, and music

Python 274 7 Updated Feb 14, 2026