Speech & Voice ML
Robust Speech Recognition via Large-Scale Weak Supervision
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
Faster Whisper transcription with CTranslate2
We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers, in which we present a new method for separating a mixed audio sequence, in which multi…
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Manipulate audio with a simple and easy high level interface
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Official PyTorch Implementation of CleanUNet (ICASSP 2022)
A Deep-Learning-Based Chinese Speech Recognition System
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Recurrent neural network for audio noise reduction
Noise reduction in Python using spectral gating (speech, bioacoustics, audio, time-domain signals)
Trained neural networks and requisite information and data for rnnoise-nu
Recurrent neural network for audio noise reduction, slightly improved for general use
🔊 Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
A lightweight library to compute Diarization Error Rate (DER).
The collection of pre-trained, state-of-the-art AI models for ailia SDK