AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

858 81 Updated Jul 8, 2025

Neutone / neutone_sdk

Join the community on Discord for more discussions around Neutone! https://discord.gg/VHSMzb8Wqp

Python 562 29 Updated Nov 2, 2025

deezer / spleeter

Deezer source separation library including pretrained models.

Python 27,705 3,045 Updated Apr 2, 2025

roudimit / whisper-flamingo

Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 192 14 Updated Jul 29, 2025

espressif / esp-sr

Speech recognition

C 1,161 171 Updated Oct 29, 2025

bbruceyuan / Hands-On-Large-Language-Models-CN

Chinese translation of Hands-On-Large-Language-Models (hands-on-llms): learn large language models hands-on

Jupyter Notebook 1,625 173 Updated Oct 19, 2025

Andong-Li-speech / RTNet

Implementation of Monaural Speech Enhancement with Recursive Learning in the Time Domain

Python 48 7 Updated Nov 4, 2020

wenet-e2e / west

We Speech Toolkit, LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 139 8 Updated Nov 5, 2025

rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 77,950 11,503 Updated Nov 3, 2025

MLNLP-World / LLMs-from-scratch-CN

Chinese translation of the LLMs-from-scratch project

Jupyter Notebook 1,906 310 Updated Oct 15, 2025

Mitchell-GH / code-analysis-easycom

MATLAB 1 Updated Jul 23, 2025

twin-tigon / thesis-matlab

Collection of MATLAB scripts and toolboxes for my master's thesis on psychoacoustics

MATLAB 10 1 Updated Dec 16, 2017

ARM-software / CMSIS-DSP

CMSIS-DSP embedded compute library for Cortex-M and Cortex-A

C 848 196 Updated Oct 27, 2025

GuitarML / mldsp-papers

Collection of papers related to neural nets/machine learning for audio DSP.

144 4 Updated Apr 29, 2025

EfficientDL / book

PDFs and Codelabs for the Efficient Deep Learning book.

Jupyter Notebook 202 25 Updated May 29, 2023

qiuqiangkong / panns_inference

Python 247 37 Updated Mar 5, 2024

AsahiLinux / speakersafetyd

Rust speaker safety daemon for Asahi Linux

Rust 182 14 Updated Mar 29, 2025

nahue-passano / loudspeaker-tmatrix

Loudspeaker simulation

Jupyter Notebook 5 1 Updated Aug 22, 2025

fcampelo / Loudspeaker-model

FEMM loudspeaker model

Lua 4 3 Updated Sep 27, 2021

qiuqiangkong / audioset_tagging_cnn

Python 1,603 288 Updated Jul 25, 2024

PaddlePaddle / PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 62,711 9,240 Updated Nov 5, 2025

DCASE-REPO / DESED_task

Domestic environment sound event detection task

Python 149 69 Updated Jun 11, 2024

yehav / SR_for_InRoom_Comm

Speech Reinforcement for In-Room Communications

MATLAB 7 4 Updated Mar 9, 2025

marc1701 / FACT

Feedback Analysis and Cancellation Toolkit

MATLAB 24 7 Updated Nov 26, 2018

QwenLM / Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

evan2jiang

Lists (22)

🔮 Future ideas

TTS

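Active noise cancellation

Signal fundamentals

Echo

Large models

Embedded

Room acoustics

Speaker protection

Productivity tools

Finite element

Robot audio

Collections

Resource collections

Beamforming

Testing

Blind source separation

Glasses

Spatial audio

Speech enhancement

Sound effects

Wind noise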

Starred repositories

diffusion-models