DonkeyHang's starred repositories

A modern GUI client based on Tauri, designed to run on Windows, macOS, and Linux for a tailored proxy experience

TypeScript 109,579 7,964 Updated Apr 10, 2026

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python 248 18 Updated Jan 8, 2026

A 10000+ hours dataset for Chinese speech recognition

Shell 597 53 Updated Jan 9, 2026

https://hf.co/hexgrad/Kokoro-82M

JavaScript 6,473 719 Updated Aug 6, 2025

Speaker anonymization pipeline for hiding the identity of the speaker of a recording by changing the voice in it.

Shell 98 10 Updated Jul 4, 2025

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,877 261 Updated Dec 8, 2025

[CVPR 2025] "DiC: Rethinking Conv3x3 Designs in Diffusion Models", a performant & speedy Conv3x3 diffusion model.

Python 250 20 Updated Jun 12, 2025

[Interspeech 2025] DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec

Jupyter Notebook 65 8 Updated Mar 11, 2026

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,283 110 Updated Mar 2, 2025

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,763 178 Updated Jan 26, 2026

Android NDK samples with Android Studio

C++ 10,500 4,257 Updated Feb 20, 2026

Official repository for FlowSE (Interspeech 2025)

JavaScript 89 11 Updated Jul 9, 2025

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Python 872 134 Updated Mar 3, 2026

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Python 1,111 190 Updated Jan 5, 2026

TTS with kokoro and onnx runtime

Python 2,464 262 Updated Jan 30, 2026

[AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Python 4,211 463 Updated Apr 7, 2026

Audio-FLAN

Jupyter Notebook 159 5 Updated Sep 23, 2025

This project uses a variety of advanced voiceprint recognition models such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++. More models may be supported in the future. At the …

Python 1,260 167 Updated Dec 17, 2025

Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"

Python 3,224 287 Updated Jan 8, 2026

dog-can-sing-song

Python 54 6 Updated Jan 9, 2026

Noise suppression using deep filtering

Python 43 8 Updated Aug 20, 2025

Limiter, compressor, convolver, equalizer and auto volume and many other plugins for PipeWire applications

HTML 9,261 338 Updated Apr 10, 2026

MNN ASR demo.

C++ 26 2 Updated Mar 24, 2025

Unofficial PyTorch implementation of SoundStream with training code and a 16kHz pretrained checkpoint

Python 78 12 Updated Feb 9, 2026

LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation with Spoken Language Models" (arXiv 2024).

93 4 Updated Dec 28, 2024

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python 881 100 Updated Oct 10, 2025

🚀 "Large model": train a 67M-parameter visual multimodal VLM from scratch in just 1 hour! 🌏

Python 7,382 811 Updated Apr 4, 2026

🚀🚀 "Large model": train a small 64M-parameter GPT entirely from scratch in just 2 hours! 🌏

Python 46,445 5,721 Updated Apr 10, 2026

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 4,033 332 Updated Aug 14, 2025

PyTorch implementation of TCSinger (EMNLP 2024): Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Python 376 46 Updated Oct 7, 2025