Stars
MOSS-Audio is an open-source foundation model for unified audio understanding, enabling speech, sound, music, captioning, QA, and reasoning in real-world scenarios.
GGML-based pure C++ inference for BS Roformer/Mel-Band-Roformer vocal separation
ShoufaChen / WavFlow-Dev
Forked from facebookresearch/WavFlow
MultiModal Audio Generation in Raw Waveform Space.
The official implementation of WaveNet-VNNs for Active Noise Control (ANC), a fully causal solution.
A curated list of models, benchmarks, tools and guides for audio editing
[ICLR 2026] SmartDJ: declarative audio editing with audio language model.
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
The source code for CineSRD and the SubtitleSD benchmark is provided in this repository.
An Open-Source Project to Unify Audio Processing and Generation
Rebuild of GTCRN using Grouped TCNs, amidst other changes. Initially an attempt to target MCU deployment.
The awesome collection of OpenClaw skills. 5,400+ skills filtered and categorized from the official OpenClaw Skills Registry.🦞
This is the PyTorch implementation of the Universal Source Separation with Weakly labelled Data.
Speed-optimized streaming neural speech enhancement network
This is the official implementation of LiSenNet.
The official repo of UL-UNAS, an ultra-lightweight SE model.
PyTorch-based room impulse response (RIR) simulation toolkit with dynamic scenes, GPU acceleration.
The official implementation of GTCRN, an ultra-lightweight SE model.
A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation
Codebase for the paper "Visually Informed Binaural Audio Generation without Binaural Audios" (CVPR 2021)
A Python Library for Full Reference Binaural Fidelity Testing, Visualization & Feature Generation
Official page of "DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis"