Stars
A latent text-to-image diffusion model
🔊 Text-Prompted Generative Audio Model
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Instruct-tune LLaMA on consumer hardware
Neural Networks: Zero to Hero
A multi-voice TTS system trained with an emphasis on quality
This open-source curriculum introduces the fundamentals of Model Context Protocol (MCP) through real-world, cross-language examples in .NET, Java, TypeScript, JavaScript, Rust and Python. Designed …
[WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025)
QLoRA: Efficient Finetuning of Quantized LLMs
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Using Low-rank adaptation to quickly fine-tune diffusion models.
Overview and tutorial of the LangChain Library
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Code and slides of my YouTube series called "Audio Signal Processing for Machine Learning"
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Codebase for Time-series Generative Adversarial Networks (TimeGAN) - NeurIPS 2019
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
A series of tutorial notebooks on denoising diffusion probabilistic models in PyTorch
Soft speech units for voice conversion
Multilingual G2P in 100 languages
An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"
Educational implementation of the Discrete Flow Matching paper
rotten-work / vits-mandarin-windows
Forked from jaywalnut310/vits. VITS for Mandarin. Supports Windows and Linux, on both low-end and high-end hardware
Torch implementation of NANSY, Neural Analysis and Synthesis, arXiv:2110.14513
Deep Speech Distances PyTorch