Starred repositories
A latent text-to-image diffusion model
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
🔊 Text-Prompted Generative Audio Model
CLIP (Contrastive Language-Image Pre-Training): predict the most relevant text snippet given an image
Learn OpenCV: C++ and Python examples
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
A multi-voice TTS system trained with an emphasis on quality
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
High-Resolution Image Synthesis with Latent Diffusion Models
PyTorch tutorials and fun projects including neural talk, neural style, poem writing, and anime generation (companion code for the book 《深度学习框架PyTorch:入门与实战》, "The PyTorch Deep Learning Framework: Introduction and Practice")
LAVIS - A One-stop Library for Language-Vision Intelligence
A collection of pre-trained, state-of-the-art models in the ONNX format
Image restoration with neural networks but without learning.
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
Using low-rank adaptation (LoRA) to quickly fine-tune diffusion models.
Evidently is an open-source observability framework for ML and LLM systems. Evaluate, test, and monitor any AI-powered system or data pipeline, from tabular data to generative AI, with 100+ built-in metrics.
A unified framework for 3D content generation.
Taming Transformers for High-Resolution Image Synthesis
A course in reinforcement learning in the wild
Easily train or fine-tune SOTA computer vision models with one open-source training library. The home of YOLO-NAS.
The 3rd edition of course.fast.ai
Materials for the Hugging Face Diffusion Models Course
Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Tutorials for creating and using ONNX models
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
Kandinsky 2 — multilingual text2image latent diffusion model
Easily compute CLIP embeddings and build a CLIP retrieval system with them