Stars
A latent text-to-image diffusion model
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
🔊 Text-Prompted Generative Audio Model
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
⛔️ DEPRECATED – See https://github.com/ageron/handson-ml3 instead.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A multi-voice TTS system trained with an emphasis on quality
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
This repository contains implementations and illustrative code to accompany DeepMind publications
High-Resolution Image Synthesis with Latent Diffusion Models
Code release for NeRF (Neural Radiance Fields)
Convert AI papers to GUI. Make it easy and convenient for everyone to use cutting-edge artificial intelligence technology.
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Taming Transformers for High-Resolution Image Synthesis
Silero Models: pre-trained text-to-speech models made embarrassingly simple
Tacotron 2 - PyTorch implementation with faster-than-realtime inference
CoTracker is a model for tracking any point (pixel) on a video.
Probabilistic reasoning and statistical analysis in TensorFlow
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., LoRA, P-tuning) together for easy use. We welcome open-source enthusiasts…
Singing Voice Conversion via diffusion model
PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)
Noise reduction in Python using spectral gating (speech, bioacoustics, audio, time-domain signals)
Official code for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
functorch is JAX-like composable function transforms for PyTorch.
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Demonstrations of Magenta Models
Learning to Learn using One-Shot Learning, MAML, Reptile, Meta-SGD and more with TensorFlow