Stars
A latent text-to-image diffusion model
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
🔊 Text-Prompted Generative Audio Model
This repository is maintained by Omar Santos (@santosomar) and includes thousands of resources related to ethical hacking, bug bounties, digital forensics and incident response (DFIR), AI security,…
A simple screen-parsing tool towards a pure vision-based GUI agent
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Instruct-tune LLaMA on consumer hardware
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
llama3 implementation one matrix multiplication at a time
This repository contains the source code for the paper First Order Motion Model for Image Animation
A multi-voice TTS system trained with an emphasis on quality
PyTorch code and models for the DINOv2 self-supervised learning method.
LAVIS - A One-stop Library for Language-Vision Intelligence
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Zero-Shot Speech Editing and Text-to-Speech in the Wild
A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body
COCO API - Dataset @ http://cocodataset.org/
[ICCV 2019] Monocular depth estimation from a single image
Fault-tolerant, highly scalable GPU orchestration, and a machine learning framework designed for training models with billions to trillions of parameters
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
The hub for EleutherAI's work on interpretability and learning dynamics
DeepFashion2 Dataset https://arxiv.org/pdf/1901.07973.pdf
This is the repo for our new project Highly Accurate Dichotomous Image Segmentation
An extensive node suite for ComfyUI with over 210 new nodes
[ICLR24] Official PyTorch Implementation of Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
An out-of-the-box human parsing representation extractor.
Utility functions for handling MIDI data in a nice/intuitive way.