Stars
VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
KAN-TTS is a speech-synthesis training framework. Demos are available at https://modelscope.cn/models?page=1&tasks=text-to-speech
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Implementation of the paper "Spoken Language Recognition using X-vectors" in PyTorch
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
mansr / sox
Forked from cbagwell/sox
SoX, Swiss Army knife of sound processing
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Triton backend that enables pre-processing, post-processing and other logic to be implemented in Python.
An easy to use PyTorch to TensorRT converter
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis”
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Command line utility for forced alignment using Kaldi
Includes Basis-MelGAN, MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future.
Library for Textless Spoken Language Processing