Stars
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
StableLM: Stability AI Language Models
PyTorch code and models for the DINOv2 self-supervised learning method.
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
DeepFashion2 Dataset https://arxiv.org/pdf/1901.07973.pdf
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
A suite of image and video neural tokenizers
Demo of running NNs across different frameworks
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
iBOT 🤖: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
Official JAX Implementation of MaskGIT
(CVPR 2022) PyTorch implementation of "Self-supervised transformers for unsupervised object discovery using normalized cut"
Official implementation and data release of the paper "Visual Prompting via Image Inpainting".
Distributed Robot Interaction Dataset.
Streaming over lightweight data transformations
TensorFlow implementation of DETR: Object Detection with Transformers
Implementations of some popular Saliency Maps in Keras
Official PyTorch Implementation for "Vision-Language Models Create Cross-Modal Task Representations" (ICML 2025)
Automagically curate test sets based on user-given constraints