[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Evaluate robustness of adaptation methods on large vision-language models
🛡️ Explore visual prompt injection risks in AI browsers by uncovering hidden commands in images and layouts, enhancing security against multimodal threats.
🌋 A flexible framework for training and configuring Vision-Language Models
📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)
Official repository of "GenView++: Unifying Adaptive View Generation and Quality-Driven Supervision for Contrastive Representation Learning".
This project generates behavioral descriptions from images by combining computer vision and natural language processing. It goes beyond basic scene descriptions to infer human behaviors, intentions, and social contexts.
VTC: Improving Video-Text Retrieval with User Comments
Unofficial implementation of Sigmoid Loss for Language Image Pre-Training (SigLIP); see the sigmoid-loss sketch after this list.
[MICCAI'25 Early Accept] MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
This repository contains the implementation of the AlignVLM paper, which proposes a novel method for vision-language alignment.
Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
A codebase for flexible and efficient Image Text Representation Alignment
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Multi-Aspect Vision Language Pretraining - CVPR 2024
MICCAI 2024 Oral: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images
Easy wrapper for inserting LoRA layers in CLIP; a minimal LoRA sketch follows at the end of this list.
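
For the SigLIP repository referenced above, the key idea is replacing the softmax-based contrastive objective with independent pairwise sigmoid terms. Below is a minimal PyTorch sketch of that loss, assuming L2-normalised image/text embeddings; the function name, temperature/bias values, and batch shapes are illustrative assumptions, not taken from the repository.

```python
# Minimal sketch of the sigmoid contrastive loss from
# "Sigmoid Loss for Language Image Pre-Training" (SigLIP).
import torch
import torch.nn.functional as F

def siglip_loss(image_emb, text_emb, log_temperature, bias):
    """image_emb, text_emb: (N, D) L2-normalised embeddings of N matched pairs."""
    # Pairwise similarities scaled by a learnable temperature and shifted by a learnable bias
    logits = image_emb @ text_emb.t() * log_temperature.exp() + bias  # (N, N)
    # +1 on the diagonal (matching pairs), -1 everywhere else (non-matching pairs)
    labels = 2.0 * torch.eye(logits.size(0), device=logits.device) - 1.0
    # Independent binary classification per pair, averaged over the batch
    return -F.logsigmoid(labels * logits).sum() / logits.size(0)

# Usage with random embeddings (temperature init ~log(10), bias init -10 as in the paper):
img = F.normalize(torch.randn(8, 512), dim=-1)
txt = F.normalize(torch.randn(8, 512), dim=-1)
loss = siglip_loss(img, txt, log_temperature=torch.tensor(2.3), bias=torch.tensor(-10.0))
```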
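For the LoRA-in-CLIP wrapper mentioned above, the general technique is to freeze a pretrained linear projection and add a trainable low-rank update W + (alpha/r)·BA. The sketch below is a generic illustration assuming a standard nn.Linear projection inside CLIP; the class name and hyperparameters are hypothetical and do not reflect the wrapper's actual API.

```python
# Minimal sketch of wrapping a pretrained linear layer with a LoRA adapter.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                          # frozen pretrained projection
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank update W + (alpha / r) * B @ A; B starts at zero so training
        # begins from the unchanged pretrained behaviour.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling

# Example: adapt a 512-d projection such as a CLIP attention or text projection.
proj = nn.Linear(512, 512)
proj_with_lora = LoRALinear(proj, rank=4)
out = proj_with_lora(torch.randn(2, 512))
```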