Domain Aligned CLIP for Few-shot Classification

MW Gondal, J Gast, IA Ruiz, R Droste… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large vision-language representation learning models like CLIP have demonstrated
impressive performance for zero-shot transfer to downstream tasks while largely benefiting
from inter-modal (image-text) alignment via contrastive objectives. This downstream
performance can be further enhanced by full-scale fine-tuning, which is often compute
intensive, requires large amounts of labelled data, and can reduce out-of-distribution (OOD) robustness.
Furthermore, sole reliance on inter-modal alignment might overlook the rich information …
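
For context, the inter-modal (image-text) contrastive objective referred to in the abstract is commonly realized as a symmetric InfoNCE loss over a batch of matched image-text pairs. The sketch below illustrates that general form only; it is not the paper's Domain Aligned CLIP method, and the batch size, embedding dimension, temperature, and random tensors standing in for encoder outputs are illustrative assumptions.

```python
# Minimal sketch of a CLIP-style symmetric image-text contrastive (InfoNCE) loss.
# Random tensors stand in for image/text encoder outputs; all sizes are illustrative.
import torch
import torch.nn.functional as F

batch_size, embed_dim = 8, 512   # assumed values, not from the paper
temperature = 0.07               # commonly used CLIP-style temperature

# Stand-ins for L2-normalized image and text embeddings of matched pairs.
image_features = F.normalize(torch.randn(batch_size, embed_dim), dim=-1)
text_features = F.normalize(torch.randn(batch_size, embed_dim), dim=-1)

# Pairwise cosine similarities, scaled by the temperature.
logits_per_image = image_features @ text_features.t() / temperature
logits_per_text = logits_per_image.t()

# Matched pairs lie on the diagonal, so the targets are simply 0..N-1.
targets = torch.arange(batch_size)
loss = (F.cross_entropy(logits_per_image, targets) +
        F.cross_entropy(logits_per_text, targets)) / 2
print(f"contrastive loss: {loss.item():.4f}")
```

In practice the two feature matrices would come from the model's image and text encoders, and the temperature is typically a learned parameter rather than a fixed constant.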
