Stars
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Audio waveform visualisation: converts any audio into a nice video
Unified automatic quality assessment for speech, music, and sound.
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Code for the paper "Training Diffusion Models with Reinforcement Learning"
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
Port of OpenAI's Whisper model in C/C++
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
Python implementation of performance metrics in Loizou's Speech Enhancement book
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Hydra is a framework for elegantly configuring complex applications
A self-supervised learning framework for audio-visual speech
[ECCV 2018] Official code for "Graph R-CNN for Scene Graph Generation"
A faster PyTorch implementation of Faster R-CNN
WaveNet autoencoder for unsupervised speech representation learning (after Chorowski, Jan 2019)
Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
Learning diverse image-to-image translation from unpaired data
A collection of generative models implemented in PyTorch.
This repository provides state of the art (SoTA) results for all machine learning problems. We do our best to keep this repository up to date. If you do find a problem's SoTA result is out of date …
Implementations of various VAE-based semi-supervised and generative models in PyTorch
TensorFlow implementation of the speech model described in Neural Discrete Representation Learning (a.k.a. VQ-VAE)
Python library for audio and music analysis