[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Evaluate robustness of adaptation methods on large vision-language models
🛡️ Explore visual prompt injection risks in AI browsers by uncovering hidden commands in images and layouts, enhancing security against multimodal threats.
🌋 A flexible framework for training and configuring Vision-Language Models
📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)
Official repository of "GenView++: Unifying Adaptive View Generation and Quality-Driven Supervision for Contrastive Representation Learning".
This project generates behavioral descriptions from images by combining computer vision and natural language processing. It goes beyond basic scene descriptions to infer human behaviors, intentions, and social contexts.
VTC: Improving Video-Text Retrieval with User Comments
Unofficial implementation of Sigmoid Loss for Language Image Pre-Training (SigLIP); see the sigmoid-loss sketch after this list.
[MICCAI'25 Early Accept] MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
This repository contains the implementation of the AlignVLM paper, which proposes a novel method for vision-language alignment.
Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
A codebase for flexible and efficient Image Text Representation Alignment
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Multi-Aspect Vision Language Pretraining - CVPR 2024
MICCAI 2024 Oral: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images
Easy wrapper for inserting LoRA layers in CLIP; a minimal LoRA sketch follows at the end of this list.
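
For the SigLIP repository referenced above, the key idea is replacing the softmax-based contrastive objective with independent pairwise sigmoid terms. Below is a minimal PyTorch sketch of that loss, assuming L2-normalised image/text embeddings; the function name, temperature/bias values, and batch shapes are illustrative assumptions, not taken from the repository.

```python
# Minimal sketch of the sigmoid contrastive loss from
# "Sigmoid Loss for Language Image Pre-Training" (SigLIP).
import torch
import torch.nn.functional as F

def siglip_loss(image_emb, text_emb, log_temperature, bias):
    """image_emb, text_emb: (N, D) L2-normalised embeddings of N matched pairs."""
    # Pairwise similarities scaled by a learnable temperature and shifted by a learnable bias
    logits = image_emb @ text_emb.t() * log_temperature.exp() + bias  # (N, N)
    # +1 on the diagonal (matching pairs), -1 everywhere else (non-matching pairs)
    labels = 2.0 * torch.eye(logits.size(0), device=logits.device) - 1.0
    # Independent binary classification per pair, averaged over the batch
    return -F.logsigmoid(labels * logits).sum() / logits.size(0)

# Usage with random embeddings (temperature init ~log(10), bias init -10 as in the paper):
img = F.normalize(torch.randn(8, 512), dim=-1)
txt = F.normalize(torch.randn(8, 512), dim=-1)
loss = siglip_loss(img, txt, log_temperature=torch.tensor(2.3), bias=torch.tensor(-10.0))
```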
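For the LoRA-in-CLIP wrapper mentioned above, the general technique is to freeze a pretrained linear projection and add a trainable low-rank update W + (alpha/r)·BA. The sketch below is a generic illustration assuming a standard nn.Linear projection inside CLIP; the class name and hyperparameters are hypothetical and do not reflect the wrapper's actual API.

```python
# Minimal sketch of wrapping a pretrained linear layer with a LoRA adapter.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                          # frozen pretrained projection
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank update W + (alpha / r) * B @ A; B starts at zero so training
        # begins from the unchanged pretrained behaviour.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling

# Example: adapt a 512-d projection such as a CLIP attention or text projection.
proj = nn.Linear(512, 512)
proj_with_lora = LoRALinear(proj, rank=4)
out = proj_with_lora(torch.randn(2, 512))
```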