Starred repositories
Open-Source Chrome extension for AI-powered web automation. Run multi-agent workflows using your own LLM API key. Alternative to OpenAI Operator.
Examples of my Claude Code infrastructure with skill auto-activation, hooks, and agents
12 Lessons to Get Started Building AI Agents
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
Join the community on Discord for more discussions around Neutone! https://discord.gg/VHSMzb8Wqp
Deezer source separation library including pretrained models.
kyutai-labs / nanoGPTaudio
Forked from karpathy/nanoGPT
Code for the blog "Neural audio codecs: how to get audio into LLMs"
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
Chinese translation of Hands-On-Large-Language-Models (hands-on-llms): hands-on learning of large language models
Implementation of Monaural Speech Enhancement with Recursive Learning in the Time Domain
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Chinese translation of the LLMs-from-scratch project
Collection of MATLAB scripts and toolboxes regarding my Master Thesis on psychoacoustics
CMSIS-DSP embedded compute library for Cortex-M and Cortex-A
Collection of papers related to neural nets/machine learning for audio DSP.
PDFs and Codelabs for the Efficient Deep Learning book.
Rust speaker safety daemon for Asahi Linux
Loudspeaker simulation
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Domestic environment sound event detection task
zuowanbushiwo / awesome-speech-recognition-speech-synthesis-papers
Forked from zzw922cn/awesome-speech-recognition-speech-synthesis-papers
Automatic speech recognition paper roadmap, including HMM, DNN, RNN, CNN, Seq2Seq, Attention
Speech Reinforcement for In-Room Communications