Stars
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
🔊 Text-Prompted Generative Audio Model
Singing voice conversion based on Whisper, with LoRA for singing voice cloning
Tensors and Dynamic neural networks in Python with strong GPU acceleration
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
An unofficial implementation of the autoregressive vocoder Multiband-WaveRNN. Audio samples at https://rongjiehuang.github.io/Multiband-WaveRNN/
A No-Recurrence Sequence-to-Sequence Model for Speech Recognition
FastSpeech2 with cross-lingual support
Neural network-based forced alignment with bidirectional attention mechanism
[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
SyntaSpeech: Syntax-aware Generative Adversarial Text-to-Speech; IJCAI 2022; Official code
Chinese speech synthesis implemented with tacotronV2 + wavernn (TensorFlow + PyTorch)
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
A system for singing voice synthesis
Quasi-Periodic Parallel WaveGAN PyTorch implementation
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Implementation of Gaussian Mixture Variational Autoencoder (GMVAE) for Unsupervised Clustering
Official TensorFlow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)
Speech Enhancement Generative Adversarial Network in TensorFlow
PyTorch implementation of few-shot photorealistic video-to-video translation.
Utilities for resampling and filtering audio data