- South Africa
- https://orcid.org/0000-0002-8168-7857
Stars
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
Command-line tools for speech and intent recognition on Linux
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
💬 SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box.
An implementation of FastSpeech based on PyTorch.
Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-Lite, ONNX, and real-time audio processing support.
AcademiCodec: An Open Source Audio Codec Model for Academic Research
PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
spring-media / ForwardTacotron
Forked from fatchord/WaveRNN
⏩ Generating speech in a single forward pass without any attention!
Voice Conversion by CycleGAN (语音克隆/语音转换): CycleGAN-VC2
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Grapheme to phoneme conversion with deep learning.
Official repository for RawNet, RawNet2, and RawNet3
TensorFlow implementation of "Generalized End-to-End Loss for Speaker Verification"
My-Voice Analysis is a Python library for the analysis of voice (simultaneous speech, high entropy) without the need of a transcription. It breaks utterances and detects syllable boundaries, fundam…
A tokenizer, text cleaner, and phonemizer for many human languages.
A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modelings. This project grows with the research community, …
Deep neural networks for extracting text-independent speaker embeddings, written in TensorFlow
Code for paper "SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows"
begeekmyfriend / tacotron
Forked from keithito/tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model
Deye/Sunsynk Inverter Python library and Home Assistant OS Addon
A Python library for measuring the acoustic features of speech (simultaneous speech, high entropy) compared to those of native speech.