echocatzh

🤗

willing to share

Shimin Zhang echocatzh

Audio Engineer

189 followers · 121 following

Tongyi Lab
Hangzhou, China
17:06 (UTC +08:00)
https://scholar.google.com/citations?user=hreTTqwAAAAJ&hl=en

Lists (4)

audio deepfake

DSP

Digital signal processing methods

Interesting

Some interesting methods

NN-FE

Neural-network-based speech enhancement methods

Stars

Yeachan-Heo / oh-my-codex

OmX - Oh My codeX: Your codex is not alone. Add hooks, agent teams, HUDs, and so much more.

TypeScript 28,949 2,307 Updated May 18, 2026

openai / codex

Lightweight coding agent that runs in your terminal

Rust 83,442 12,093 Updated May 18, 2026

FunAudioLLM / Fun-ASR

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 1,154 111 Updated Feb 25, 2026

X-PLUG / MobileAgent

Mobile-Agent: The Powerful GUI Agent Family

Python 8,681 875 Updated May 14, 2026

ErikEkstedt / VoiceActivityProjection

Voice Activity Projection Models: Self-supervised learning of Turn-taking Events

Python 100 21 Updated May 29, 2024

pipecat-ai / smart-turn

Python 1,385 82 Updated Jan 29, 2026

TEN-framework / ten-turn-detection

Turn detection for full-duplex dialogue communication

Python 561 39 Updated Dec 26, 2025

ASLP-lab / Easy-Turn

Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems

Python 110 8 Updated Jan 25, 2026

wenet-e2e / west

We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 206 17 Updated Apr 7, 2026

ASLP-lab / OSUM

OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.

Python 492 32 Updated Nov 23, 2025

opendilab / HH-Codec

[ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling

Python 97 4 Updated Sep 28, 2025

ModelCloud / GPTQModel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

Python 1,151 184 Updated May 18, 2026

AutoGPTQ / AutoGPTQ

An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

Python 5,060 543 Updated Apr 11, 2025

JusperLee / TIGER

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Python 420 62 Updated Apr 20, 2026

OpenMOSS / SpeechGPT-2.0-preview

GPT-4o-level, real-time spoken dialogue system.

Python 378 33 Updated Jan 27, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,729 2,234 Updated Feb 1, 2025

deepseek-ai / DeepSeek-V3

Python 103,561 16,748 Updated Aug 28, 2025

LetterLiGo / SafeEar

[ACM CCS'24] SafeEar: Content Privacy-Preserving Audio Deepfake Detection

Python 185 22 Updated Mar 24, 2025

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 4,144 341 Updated Aug 14, 2025

Stability-AI / stable-codec

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 430 29 Updated Feb 12, 2026

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 21,090 2,431 Updated May 3, 2026

jishengpeng / WavChat

A Survey of Spoken Dialogue Models (60 pages)

318 18 Updated Nov 28, 2024

hhguo / SoCodec

Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications

Python 91 8 Updated Dec 20, 2024

VITA-MLLM / VITA

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,512 181 Updated Mar 28, 2025

open-webui / open-webui

User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

Python 137,563 19,640 Updated May 15, 2026

exo-explore / exo

Run frontier AI locally.

Python 44,759 3,168 Updated May 15, 2026

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 39,277 4,258 Updated Apr 10, 2026

X-LANCE / SLAM-LLM

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 1,032 114 Updated Jan 15, 2026

forever208 / ADM-ES

[ICLR 2024] Official code for the paper 'Elucidating the Exposure Bias in Diffusion Models'

Python 49 2 Updated Jun 2, 2025

metame-ai / awesome-audio-plaza

Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation

408 20 Updated Nov 2, 2025