Stars
AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images …
Python implementation of OMLSA+IMCRA algorithm for speech enhancement.
Yaxin9Luo / FigMirror
Forked from VILA-Lab/FigMirror
An automated AI agent tool for plotting your data in any paper's figure style.
A repository for Automatic Speech Recognition (ASR) that ensembles multiple open-source models to achieve SOTA quality of recognition. Useful if you need to get the maximum quality of recognition d…
Operator-level compressed GTCRN with ERB-CRM pipeline preserved and DPGRNN intact, ready for edge deployment.
A training code template for DNN-based speech enhancement.
This project focuses on audio processing and filter simulation research. It uses Python for simulation experiments and C++ for engineering implementation, covering extensive machine learning practi…
Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini C…
Traditional Speech Enhancement Methods
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…
PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
🧠 [Large Model] Train a 64M-parameter LLM completely from scratch in just 2 hours!
🎙️ [Large Model] Train a 0.1B omni-modal model from scratch that can listen, speak, and see!
Single Channel Speech Enhancement Methods and Toolbox
Speech enhancement, speech separation, and sound source localization
A Survey of Continual Learning for Speech and Audio Models
Audio Coding Notebooks and Tutorials
Python implementation of performance metrics in Loizou's Speech Enhancement book
Models for DCASE 2026 Semantic Acoustic Imaging for Sound Event Localization and Detection from Spatial Audio and Audiovisual Scenes
This is the public repository for SALSA-Lite features for polyphonic sound event localization and detection using microphone arrays.
You can find the speech algorithms you want here
[CVPR 2025] PyTorch implementation of the paper "Hearing Anywhere in Any Environment"
[TASLP] Open-Vocabulary Sound Event Localization and Detection with Joint Learning of CLAP Embedding and Activity-Coupled Cartesian DOA Vector
SOTA industrial-grade voice activity detection and audio event detection, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD