View e-bug's full-sized avatar

Data repository for the VALSE benchmark.

Python 40 5 Updated Feb 15, 2024

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

Python 506 52 Updated Nov 25, 2022

Code for ALBEF: a new vision-language pre-training method

Python 1,757 219 Updated Sep 20, 2022

Paper List for Contrastive Learning for Natural Language Processing

574 60 Updated Apr 27, 2023

PyTorch implementation of Set Transformer

Jupyter Notebook 677 116 Updated Feb 11, 2020

Code and data for ImageCoDe, a contextual vision-and-language benchmark

Python 41 6 Updated Mar 1, 2024

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook 5,711 766 Updated Mar 3, 2026

[ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-…

Jupyter Notebook 911 116 Updated Aug 24, 2023

Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

HTML 270 60 Updated Aug 18, 2022

[EMNLP'21] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.

Python 77 8 Updated Aug 14, 2022

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

Jupyter Notebook 2,003 260 Updated Jan 24, 2024

Pre-trained V+L Data Preparation

Python 47 3 Updated Jun 2, 2020

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Python 1,535 230 Updated Apr 3, 2024

[NAACL'21 & ACL'21] SapBERT: Self-alignment pretraining for BERT & XL-BEL: Cross-Lingual Biomedical Entity Linking.

Python 231 39 Updated Apr 28, 2023

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

1,157 104 Updated Aug 19, 2022

BERT-related papers

2,036 279 Updated Aug 12, 2023

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 36,944 5,171 Updated Jul 2, 2026

Awesome Transformers (self-attention) in Computer Vision

270 36 Updated Jul 31, 2021

Paper bank for Self-Supervised Learning

587 57 Updated Mar 14, 2023

PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"

Python 523 126 Updated Dec 8, 2021
Jupyter Notebook 1,222 543 Updated May 13, 2024

Meshed-Memory Transformer for Image Captioning. CVPR 2020

Python 545 135 Updated Dec 21, 2022

Code for the CoNLL 2019 paper "Compositional Generalization in Image Captioning" by Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte and Desmond Elliott

Python 26 5 Updated Jun 14, 2020

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Jupyter Notebook 1,470 371 Updated Feb 3, 2023

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 162,173 33,731 Updated Jul 3, 2026

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

Python 800 111 Updated Jun 30, 2021

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

Python 543 101 Updated May 1, 2023

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".

Jupyter Notebook 745 110 Updated May 22, 2023