Code to train a custom time-domain autoencoder for audio dereverberation
Updated Nov 30, 2023 - Python
Dual-model speech AI toolkit for speaker verification and speaker-aware diarization, with streaming inference, meeting analysis, long-audio monitoring, and speaker-bank integration.
Unified dual-teacher distillation (ReDimNet + AASIST) into Wav2Vec2 for speaker verification and deepfake detection
Local MCP server + CLI turning YouTube & local audio into rich sonic signatures. Extracts BPM, section-by-section key, vocal presence, transient punch, and 512-dim CLAP vibe embeddings. Powered by Demucs stem separation & librosa. 100% private, offline-first, and GPU-accelerated with graceful CPU/HPSS degradation.
Real-time speech enhancement pipeline — custom-trained U-Net denoising model, ONNX inference, Overlap-Add synthesis, and virtual audio routing for Teams, Zoom, and DAW use. CPU-only, no cloud dependency.
Engine identification using acoustic signal analysis and machine learning to classify 8 vehicle types. Audio signals are processed using FFT and feature extraction, and a multi-class model predicts vehicle categories based on their unique sound patterns.
Audio analysis in JavaScript/TypeScript
Edge-deployable keyword spotter: INT8-quantized DS-CNN on Google Speech Commands, exported to ONNX, with fp32 vs INT8 benchmarks, a live mic demo, and a C++ inference harness.
Machine learning system for music genre classification using feature engineering, stratified evaluation, SVC/XGBoost modeling, and reproducible prediction export.
Automated audio/video ML pipeline for detecting and transcribing jazz solos from live recordings. Runs nightly against Smalls Jazz Club archives: uses CLAP (instrument detection), Demucs (source separation), CLIP (performer identification), and basic-pitch (MIDI transcription). Results served via REST API.
Neural TTS and voice-cloning application using XTTS/VITS. Supports 3–30 s reference audio for speaker adaptation, real-time pitch/speed control, and WAV/MP3 export.
AI-generated audio summarisation pipeline — Whisper transcription, LLM key-insight extraction, and structured spoken summaries with TTS playback and Streamlit interface.
Audio file processing pipeline with GPT-4-powered error diagnosis — detects codec issues, sample rate mismatches, and corruption artefacts with automated remediation suggestions.
Key features: simple VAE architecture with encoder/decoder; synthetic music data generation for training; interactive training with progress tracking; music generation from latent space sampling; audio conversion and playback; downloadable audio files.
Convert Meta's HTDemucs (Hybrid Transformer Demucs) to Apple Core ML. Real-valued STFT/ISTFT wrapper, manual MHA decomposition, pre-computed overlap-add. Includes Swift example.
Music harmony AI — chord progression analysis with Roman numeral labelling, voice leading checker, style-conditioned progression generation (Baroque/Jazz/Pop), and MIDI export via music21.