WendongGan
  • Ant Group
  • Chengdu, China

Starred repositories

YingMusic-Singer-Plus: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance

Python 65 4 Updated Apr 12, 2026
14 Updated Jun 9, 2026

Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching

Python 165 12 Updated Nov 9, 2025
Python 1 Updated Feb 24, 2026

MMAE: A Massive Multitask Audio Editing Benchmark

Python 91 3 Updated Jun 8, 2026

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

Python 174 6 Updated Jun 6, 2026
Python 16 1 Updated Jun 12, 2026

First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come …

Python 983 63 Updated Jun 2, 2026

DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action

75 Updated May 20, 2026

Open source voice AI platform. Self-hosted alternative to Vapi and Retell. On Prem, BYOK across Speech to Speech or LLM/STT/TTS, with a visual workflow builder, MCP native and telephony support.

Python 4,375 920 Updated Jun 12, 2026

[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Python 901 37 Updated Feb 10, 2026

tmux source code

C 46,523 2,690 Updated Jun 13, 2026

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 127 11 Updated Mar 3, 2026

MultiModal Audio Generation in Raw Waveform Space.

Python 152 10 Updated May 26, 2026

LLaDA2.0 is the diffusion language model series developed by InclusionAI team, Ant Group.

432 23 Updated Feb 12, 2026

🎙️ A 0.1B Omni model trained from scratch, capable of listening, speaking, and seeing!

Python 1,847 219 Updated Jun 8, 2026

FASTER: Rethinking Real-Time Flow VLAs

Python 129 8 Updated May 14, 2026
Python 46 1 Updated Apr 30, 2026

MoshiRAG is a compact full-duplex speech language model augmented with asynchronous knowledge retrieval to improve factuality without sacrificing real-time interactivity.

Rust 105 8 Updated Apr 28, 2026

MOSS-Music is an open-source music understanding model for targeting musical captioning, lyrics ASR, structural analysis, chord / key / tempo reasoning, and long-form musical question answering.

Python 92 6 Updated May 9, 2026
Python 47 4 Updated Apr 27, 2026
Python 1 Updated Oct 4, 2025
Python 88 9 Updated Feb 24, 2026
Python 20 3 Updated Feb 28, 2026

τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Python 1,343 349 Updated Jun 11, 2026

Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control

Python 240 17 Updated Feb 26, 2026

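https://adongwanai.github.io/AgentGuide | AI Agent Development Guide | LangGraph in Practice | Advanced RAG | Career Switch to LLMs | LLM Interviews | Algorithm Engineer | Interview Question Bank | Reinforcement Learning | Data Synthesis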

HTML 5,874 581 Updated Jun 9, 2026

OmniCodec: Low Frame Rate Universal Audio Codec with Semantic–Acoustic Disentanglement

Python 40 1 Updated Apr 17, 2026

Bash is all you need - A nano claude code–like 「agent harness」, built from 0 to 1

Python 66,348 10,813 Updated Jun 7, 2026