✨ We propose a novel metric, Weighted Aggregated Robustness, to address the limitations of existing metrics and better evaluate model performance across resolutions and datasets.
✨ Our key findings show that: (i) model size positively correlates with robustness to resolution degradation, (ii) pre-training dataset quality is more important than its size, and (iii) fine-tuned and higher resolution models are less robust against LR.
✨ Our analysis further reveals that the model makes semantically reasonable predictions at LR, and the lack of fine-grained details in input adversely impacts the model’s initial layers more than the deeper layers.
✨ Our proposed LR-TK0 enhances model robustness to low resolution without altering pre-trained weights, demonstrating effectiveness across several datasets and generalization across backbones and other approaches.
✨ This work will also appear in ICCV'25 (Non-Proceedings Tracks)
The x-axis represents relative robustness for each dataset, for all models. The last column indicates SAR (using relative robustness) and WAR (using improved relative robustness).
Setup has instructions for setting up the conda environment to run and train models.
| Model | Backbones | Validation Code | Results |
|---|---|---|---|
| CLIP | CLIP-ViT-B/32, CLIP-ViT-B/16, CLIP-ViT-L/14, CLIP-ViT-L/14@336px, CLIP-RN50, CLIP-RN101, CLIP-RN50x4, CLIP-RN50x16, CLIP-RN50x64 | Code, Script | results_csv, results_pdf |
| BLIP | BLIP-ViT-B/16 (14M), BLIP-ViT-B/16 (129M), BLIP-ViT-B/16 & CapFilt-L (129M), BLIP-ViT-L/16 (129M), BLIP-ViT-B/16 (129M + COCO), BLIP-ViT-B/16 (129M + Flickr), BLIP-ViT-L/16 (129M + COCO), BLIP-ViT-L/16 (129M + Flickr) | code | results_csv, results_pdf |
| MetaCLIP | MetaCLIP-ViT-B/32 (400M), MetaCLIP-ViT-B/32 (2.5B), MetaCLIP-ViT-B/16 (400M), MetaCLIP-ViT-B/16 (2.5B), MetaCLIP-ViT-L/14 (400M), MetaCLIP-ViT-L/14 (2.5B), MetaCLIP-ViT-H/14 (2.5B), MetaCLIP-ViT-G/14 (2.5B) | Code, Script | results_csv, results_pdf |
| EVA-CLIP | EVA-01-CLIP-g/14, EVA-01-CLIP-g/14+, EVA-02-CLIP-B/16, EVA-02-CLIP-E/14, EVA-02-CLIP-E/14+, EVA-02-CLIP-L/14, EVA-02-CLIP-L/14+ | Code, Script | results_csv, results_pdf |
| EVA-CLIP-18B | EVA-CLIP-8B | Code | Last column above |
| CLIPA-v2 | CLIPA(v2)-ViT-G/14, CLIPA(v2)-ViT-G/14@336px, CLIPA(v2)-ViT-H/14, CLIPA(v2)-ViT-H/14@336px (DataComp-1B), CLIPA(v2)-ViT-H/14@336px (LAION-2B), CLIPA(v2)-ViT-L/14, CLIPA(v2)-ViT-L/14@336px | Code | results_csv, results_pdf |
| $M^2$-Encoder | | code | results_csv, results_pdf |
| CoCa | CoCa-ViT-B/32, CoCa-ViT-L/14 (laion2b_s13b_b90k), CoCa-ViT-L/14(laion2b_s13b_b90k + mscoco) | Code | results_csv, results_pdf |
| SigLIP | SigLIP-ViT-B/16, SigLIP-ViT-B/16@256px, SigLIP-ViT-B/16@384px, SigLIP-ViT-B/16@512px, SigLIP-ViT-L/16@256px, SigLIP-ViT-L/16@384px, SigLIP-ViT-SO400M, SigLIP-ViT-SO400M@384px | Code | results_csv, results_pdf |
| OpenCLIP | OpenCLIP-ViT-B/16, OpenCLIP-ViT-B/32@256px, OpenCLIP-ViT-L/14 (laion2b_s32b_b82k), OpenCLIP-ViT-L/14 (datacomp_xl_s13b_b90k), OpenCLIP-ViT-H/14, OpenCLIP-ViT-H/14-quickgelu, OpenCLIP-ViT-H/14-quickgelu@378px, OpenCLIP-ViT-G/14 | Code, Script | results_csv, results_pdf |
| ALBEF | ALBEF (4M), ALBEF (14M), ALBEF (14M + coco_finetuned), ALBEF (14M + flickr_finetuned) | code | results_csv, results_pdf |
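All of the validation scripts above evaluate zero-shot classification on resolution-degraded inputs. As a rough illustration only (not the repo's actual evaluation code), the sketch below uses OpenAI's `clip` package with an assumed 16 px degradation and placeholder image path and class names: the image is bicubically downsampled, and the model's own preprocessing resizes it back up before zero-shot classification.

```python
import clip                      # OpenAI CLIP package (pip install git+https://github.com/openai/CLIP.git)
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)   # any CLIP backbone from the table

# Assumed 16 px degradation: bicubically downsample, then let the model's own
# preprocessing resize the image back up to its expected input size.
img = Image.open("example.jpg").convert("RGB")              # placeholder image path
lr_img = img.resize((16, 16), Image.BICUBIC)
image = preprocess(lr_img).unsqueeze(0).to(device)

class_names = ["dog", "cat", "car"]                         # placeholder class names
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("predicted class:", class_names[probs.argmax().item()])
```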
Dataset weights.
| Dataset | Weight |
|---|---|
| ImageNet | 0.15556157429688613 |
| ImageNet-A | 0.970498446080589 |
| ImageNet-V2 | 0.2854574367981364 |
| ImageNet-R | 0.01 |
| ImageNet-Sketch | 0.021456095637452655 |
| Caltech101 (300x200) | 0.01 |
| DTD split-1 (300x300 - 640x640) | 0.505922498560715 |
| Food101 (512x512) | 0.01 |
| SUN397 | 0.407563119725743 |
| Stanford Cars (360x240) | 0.13583821249199218 |
| FGVC Aircraft | 0.8229545014750042 |
| Oxford Pets | 0.08995285864599148 |
| Flowers102 | 0.08972060770047119 |
| EuroSAT | 1.0 |
| UCF101 | 0.01 |
Code to compute WAR & Improved Robustness (Eq. 1 in the paper) is shown here. Run `python generate_SAR_WAR.py 16` to generate SAR & WAR scores for all models. Results are dumped inside `MetaData/WAR_SAR_Ranking/`.
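For intuition only, here is a minimal sketch of how per-dataset robustness scores could be aggregated with the weights listed above. It is not `generate_SAR_WAR.py`, the exact definition of improved robustness is Eq. 1 in the paper, and the robustness values below are made-up placeholders.

```python
# Illustrative aggregation sketch (not the repo's generate_SAR_WAR.py).
# dataset_weights come from the table above; the robustness scores are placeholders.
dataset_weights = {
    "ImageNet": 0.15556157429688613,
    "ImageNet-A": 0.970498446080589,
    "EuroSAT": 1.0,
    # ... remaining datasets and weights from the table above
}

robustness = {  # hypothetical per-dataset (improved) relative robustness at 16 px
    "ImageNet": 0.62,
    "ImageNet-A": 0.35,
    "EuroSAT": 0.48,
}

# Unweighted aggregate (SAR-like): plain mean across datasets.
sar_like = sum(robustness.values()) / len(robustness)

# Weighted aggregate (WAR-like): weight-normalized average using the dataset weights.
total_w = sum(dataset_weights[d] for d in robustness)
war_like = sum(dataset_weights[d] * r for d, r in robustness.items()) / total_w

print(f"SAR-like: {sar_like:.3f}  WAR-like: {war_like:.3f}")
```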
A total of 7,000 captions were used to generate images. These captions were randomly sampled from the Google caption dataset and are placed in https://github.com/shyammarjit/LR0.FM/tree/main/MetaData/Captions.
The captions are fed to the diffusion model as follows:
import os
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

ROOT = "synthetic_hr"  # output root directory (placeholder path)
for i, line in enumerate(open("captions.txt")):  # placeholder path to the sampled captions (see MetaData/Captions)
    line = line.strip()  # caption line
    offset = 0
    for fold in range(5):  # 5 folds x 10 images = 50 images per caption
        images = pipe(line, num_images_per_prompt=10).images
        for k, img in enumerate(images):
            os.makedirs(f"{ROOT}/{k + 1 + offset}", exist_ok=True)
            img.save(f"{ROOT}/{k + 1 + offset}/{i}.png")
        offset += 10
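The generated high-resolution images are presumably paired with low-resolution counterparts for training, which can be produced by downsampling. The sketch below shows one such degradation pass; it is illustrative only, not the repo's pipeline, and the paths (`synthetic_hr`, `synthetic_lr`) and the 16 px target are assumptions.

```python
# Hedged sketch (not the repo's code): build low-resolution counterparts of the
# generated images by bicubic downsampling. Paths and the 16 px target are assumptions.
import os
from PIL import Image

HR_ROOT, LR_ROOT, LR_RES = "synthetic_hr", "synthetic_lr", 16

for dirpath, _, filenames in os.walk(HR_ROOT):
    for name in filenames:
        if not name.endswith(".png"):
            continue
        hr = Image.open(os.path.join(dirpath, name)).convert("RGB")
        lr = hr.resize((LR_RES, LR_RES), Image.BICUBIC)   # degrade resolution
        out_dir = dirpath.replace(HR_ROOT, LR_ROOT, 1)    # mirror the HR folder layout
        os.makedirs(out_dir, exist_ok=True)
        lr.save(os.path.join(out_dir, name))
```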
Training code is provided for EVA, MetaCLIP, and OpenCLIP.
Setup has instructions for setting up the conda environment to run and train models.
If you find this project helpful, please feel free to leave a star ⭐️ and cite our paper:
@inproceedings{pathak2025lrfm,
  title={{LR0.FM: Low-Res Benchmark and Improving robustness for Zero-Shot Classification in Foundation Models}},
  author={Priyank Pathak and Shyam Marjit and Shruti Vyas and Yogesh S Rawat},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=AsFxRSLtqR}
}