Stars
A simple pip-installable Python tool to generate your HTML citation world map from your Google Scholar ID.
⚡Batch Face Processing for Fast Modern Research, including face detection, face alignment, face reconstruction, head pose estimation, face parsing
[ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages"
Enriching MS-COCO with Chinese sentences and tags for cross-lingual multimedia tasks
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
CLAIR: A (surprisingly) simple semantic text metric with large language models.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
[ACL 2023 Findings] The FACTUAL dataset and the textual scene graph parser trained on FACTUAL.
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-source, cloud, and enterprise products, as well as labeling serv…
f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source; self-host for your organization with complete privacy.
BARTScore: Evaluating Generated Text as Text Generation
Source code for the paper "Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training"
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
[EMNLP 2018] Training for Diversity in Image Paragraph Captioning
Code for the paper "Towards Diverse Paragraph Captioning for Untrimmed Videos" (CVPR 2021)
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV 2024)
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
Efficient, check-pointed data loading for deep learning with massive data sets.