Stars
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
PyTorch code and models for the DINOv2 self-supervised learning method.
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
A unified framework for 3D content generation.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Align Anything: Training All-modality Model with Feedback
Metric depth estimation from a single image
Paper reading notes on Deep Learning and Machine Learning
Code for "GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates", SIGGRAPH Asia 2024
From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)
Cosmos-Transfer1-DiffusionRenderer: High-quality video de-lighting and re-lighting based on Cosmos video diffusion framework
A Scalable Pipeline for Making Steerable Multi-Task Mid-Level Vision Datasets from 3D Scans [ICCV 2021]
Unofficial implementation of RealFill
Official Implementation of paper "A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence"
ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs).
[CVPR 2023] DynamicStereo: Consistent Dynamic Depth from Stereo Videos.
Official implementation of SwiftSketch
Official implementation of ICCV 2025 paper - CharaConsist: Fine-Grained Consistent Character Generation
Textual Inversion for DeepFloyd IF
Code for paper Background Prompting for Improved Object Depth