Stars
Simultaneous speech-to-text model
(Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307.02486)
Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
Sequence alignment methods with helpers for PyTorch.
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
A timeline of the latest AI models for audio generation, starting in 2023!
Trainer for audio-diffusion-pytorch
AudioLDM: Generate speech, sound effects, music and beyond, with text.
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
Audio generation using diffusion models, in PyTorch.
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
PyTorch implementation of normalizing flow models