🛡️ Explore visual prompt injection risks in AI browsers by uncovering hidden commands in images and layouts, enhancing security against multimodal threats.
Updated Nov 14, 2025
🌋 A flexible framework for training and configuring Vision-Language Models
This project generates behavioral descriptions from images by combining computer vision and natural language processing. It goes beyond basic scene descriptions to infer human behaviors, intentions, and social contexts.
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Official code for CVPR 2025 "Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection"
MICCAI 2024 Oral: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images
Official repository of "GenView++: Unifying Adaptive View Generation and Quality-Driven Supervision for Contrastive Representation Learning".
Unofficial implementation for Sigmoid Loss for Language Image Pre-Training
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
This repository contains the implementation of the AlignVLM paper, which proposes a novel method for vision-language alignment
VTC: Improving Video-Text Retrieval with User Comments
[MICCAI'25 Early Accept] MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
Evaluate robustness of adaptation methods on large vision-language models
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
A codebase for flexible and efficient Image Text Representation Alignment
[Science Advances] Demographic Bias of Vision-Language Foundation Models in Medical Imaging
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation