Paper (PDF) | arXiv | WACV 2026 Open Access
Ankit Yadav, Ta Duc Huy, Lingqiao Liu
The University of Adelaide
We present the first systematic evaluation of six prominent pretrained backbones (CLIP, SigLIP2, DINOv2, DINOv3, Perception, and ResNet) for No-Reference Image Quality Assessment (NR-IQA). Our study uncovers that (1) SigLIP2 consistently achieves strong performance, and (2) the choice of activation function plays a surprisingly crucial role. We introduce a learnable activation selection mechanism that adaptively determines the nonlinearity for each channel, achieving new state-of-the-art SRCC on CLIVE, KADID10K, and AGIQA3K.
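To make the mechanism concrete, here is a minimal sketch of per-channel activation selection: a learnable gate mixes a set of candidate nonlinearities for each channel. The candidate set and the names (`GatedActivation`, `g`) are illustrative, not the exact `MLP3_Gated` implementation shipped in this repo.

```python
import torch
import torch.nn as nn

class GatedActivation(nn.Module):
    """Per-channel activation selection: a softmax over learnable gate
    logits mixes candidate nonlinearities for every channel."""
    def __init__(self, channels, candidates=(nn.ReLU(), nn.GELU(), nn.Sigmoid())):
        super().__init__()
        self.candidates = nn.ModuleList(candidates)
        self.g = nn.Parameter(torch.zeros(channels, len(candidates)))  # gate logits

    def forward(self, x):                      # x: (batch, channels)
        w = torch.softmax(self.g, dim=-1)      # (channels, n_candidates)
        stacked = torch.stack([a(x) for a in self.candidates], dim=-1)
        return (stacked * w).sum(dim=-1)       # broadcast over the batch
```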
git clone https://github.com/drkkgy/NR_IQA_AGM.git && cd NR_IQA_AGM
pip install -r requirements.txt
# Reproduce every paper number in one command (skips runs whose dataset is missing)
python eval_all.py
# Discover the available pretrained runs, then evaluate one by name
python eval_checkpoint.py --list
python eval_checkpoint.py --run Gating_CLIVE
# Evaluate the B_Gated (gating) checkpoint with GradCAM heatmaps
python eval.py --dataset CLIVE

Training loss: MSE + pair-wise margin ranking loss.
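A minimal sketch of that combined objective; the margin and the weighting between the two terms are hypothetical here (check `train.py` for the exact values):

```python
import torch
import torch.nn.functional as F

def iqa_loss(pred, mos, margin=0.1, rank_weight=1.0):
    """MSE plus a pair-wise margin ranking term over all batch pairs."""
    mse = F.mse_loss(pred, mos)
    i, j = torch.triu_indices(len(pred), len(pred), offset=1)  # all pairs i < j
    target = torch.sign(mos[i] - mos[j])   # +1 if image i should outrank image j
    rank = F.margin_ranking_loss(pred[i], pred[j], target, margin=margin)
    return mse + rank_weight * rank
```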
Create a Dataset/ folder in the project root and organise each benchmark as shown below.
The exact sub-directory names and annotation files must match what dataset.py expects.
Dataset/
├── KonIQ_10K/
│ ├── koniq10k_512x384/
│ │ └── 512x384/ # 10,073 images
│ └── koniq10k_scores_and_distributions/
│ └── koniq10k_scores_and_distributions.csv
│
├── CLIVE/
│ └── ChallengeDB_release/
│ ├── Data/
│ │ ├── AllImages_release.mat
│ │ ├── AllMOS_release.mat
│ │ └── AllStdDev_release.mat
│ └── Images/ # 1,162 images
│
├── SPAQ/
│ ├── SPAQ_dataset/
│ │ └── Annotations/
│ │ └── MOS_and_Image_attribute_scores.xlsx
│ └── TestImage/ # 11,125 images
│
├── KADID-10K/
│ └── kadid10k/
│ ├── dmos.csv
│ └── images/ # 10,125 images
│
├── FLIVE/
│ ├── labels_image.csv
│ └── database/ # ~40,000 images (sub-folders inside)
│
├── AGIQA-3k/
│ ├── data.csv
│ └── images/ # 2,982 images
│
└── AGIQA-1k/
├── AIGC_MOS_Zscore.xlsx
└── images/ # 1,000 images
Tip: You can symlink existing dataset directories instead of copying:
ln -s /path/to/your/KonIQ_10K Dataset/KonIQ_10K
pip install -r requirements.txt

Note: Install PyTorch with the CUDA version matching your GPU driver first. See pytorch.org/get-started.
conda env create -f environment.yml
conda activate nr_iqa_agm

Edit `pytorch-cuda=12.1` in `environment.yml` if you need a different CUDA version (e.g. `11.8`).
# Train on KonIQ-10K with default hyperparameters
python train.py --dataset KonIQ_10K
# Train on CLIVE with LoRA rank 8, 20 epochs, batch size 4
python train.py --dataset CLIVE --peft_method LoRA --lora_r 8 --epochs 20 --batch_size 4
# Cross-dataset: train on KonIQ-10K, evaluate on CLIVE
python train.py --dataset KonIQ_10K_CLIVE
# Full fine-tuning (no PEFT adapter)
python train.py --dataset SPAQ --peft_method NA
# Deep Prompt Tuning instead of LoRA
python train.py --dataset KADID10K --peft_method DPT
# Resume a previous run
python train.py --dataset KonIQ_10K --resume
# Dry-run for quick debugging (100 train batches, 32 eval batches)
python train.py --dataset CLIVE --dry_run
# Disable WandB logging
python train.py --dataset CLIVE --no_wandb

| Flag | Default | Description |
|---|---|---|
| `--dataset` | required | Dataset to train on (see list above) |
| `--data_dir` | `./Dataset` | Root directory of all datasets |
| `--model_id` | `google/siglip2-so400m-patch16-512` | HuggingFace backbone |
| `--peft_method` | `LoRA` | `LoRA`, `DPT`, or `NA` |
| `--epochs` | 15 | Number of training epochs |
| `--batch_size` | 2 | Per-device batch size |
| `--lr` | 1e-4 | Learning rate |
| `--grad_accum` | 6 | Gradient accumulation steps (effective batch = batch_size * grad_accum) |
| `--lr_milestones` | 30,35 | Comma-separated epoch milestones for MultiStepLR |
| `--checkpoint_steps` | 5000 | Save a checkpoint every N steps |
| `--stage_name` | `AGM_seed8` | Prefix for checkpoint directories |
| `--resume` | off | Resume from the latest `resume_state/` file |
| `--dry_run` | off | Fast debugging mode |
| `--no_wandb` | off | Disable Weights & Biases logging |
| `--no_eval` | off | Skip evaluation during training |
| `--eval_every` | 1 | Evaluate every N epochs |
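For reference, the effective batch relation from the table above works the standard way: gradients are accumulated over `grad_accum` micro-batches before one optimizer step. A self-contained toy illustration (the `nn.Linear` stands in for the real backbone and head):

```python
import torch
import torch.nn as nn

model = torch.nn.Linear(16, 1)              # stand-in for backbone + MLP head
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
grad_accum, batch_size = 6, 2               # effective batch = 12

optimizer.zero_grad()
for step in range(grad_accum):
    x, y = torch.randn(batch_size, 16), torch.randn(batch_size, 1)
    loss = nn.functional.mse_loss(model(x), y) / grad_accum  # scale so sums average
    loss.backward()                         # gradients accumulate across micro-batches
optimizer.step()                            # one update per grad_accum micro-batches
```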
The repo ships with seven pretrained runs in pretrained_checkpoints/,
covering two architectures (B_Gated with MLP3_Gated activation gating,
and B_Sig with mlp_3_layer_sigmoid_siglip), each as a LoRA adapter on
SigLIP-2. Reference them by their short run name:
| Run name | Arch | Train set | Test set |
|---|---|---|---|
| `Gating_CLIVE` | B_Gated | CLIVE | CLIVE |
| `Gating_KonIQ` | B_Gated | KonIQ-10K | KonIQ-10K |
| `Gating_KonIQ_to_CLIVE` | B_Gated | KonIQ-10K | CLIVE |
| `Gating_CLIVE_to_KonIQ` | B_Gated | CLIVE | KonIQ-10K |
| `Sigmoid_CLIVE` | B_Sig | CLIVE | CLIVE |
| `Sigmoid_KonIQ` | B_Sig | KonIQ-10K | KonIQ-10K |
| `Sigmoid_KonIQ_to_CLIVE` | B_Sig | KonIQ-10K | CLIVE |
Each registered run carries the same train/val split and seed used during
training, so the reported SRCC/PLCC on the held-out partition match the
paper. Pass --run <name> to eval_checkpoint.py and the right split is
applied for you — no extra flags needed.
Run python eval_checkpoint.py --list to print the same table with the
underlying checkpoint paths. eval_all.py iterates every entry and prints
an SRCC/PLCC summary — see the Evaluation section below.
Performance comparison with state-of-the-art methods on seven benchmark datasets. Values represent SRCC and PLCC averaged over three runs (seeds: 8, 19, 25). B: Baseline, B_Sig: Baseline_Sigmoid, B_Gated: Baseline_Gated (Ours).
| Method | CLIVE SRCC | CLIVE PLCC | KonIQ10K SRCC | KonIQ10K PLCC | FLIVE SRCC | FLIVE PLCC | SPAQ SRCC | SPAQ PLCC | AGIQA3K SRCC | AGIQA3K PLCC | AGIQA1K SRCC | AGIQA1K PLCC | KADID10K SRCC | KADID10K PLCC | Average SRCC | Average PLCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ILNIQE | .508 | .508 | .523 | .537 | - | - | .713 | .712 | - | - | - | - | .534 | .558 | .570 | .579 |
| BRISQUE | .629 | .629 | .681 | .685 | .303 | .341 | .809 | .817 | - | - | - | - | .528 | .567 | .590 | .608 |
| WaDIQaM | .682 | .671 | .804 | .807 | .455 | .467 | - | - | - | - | - | - | .739 | .752 | .670 | .674 |
| DBCNN | .851 | .869 | .875 | .884 | .545 | .551 | .911 | .915 | - | - | - | - | .851 | .856 | .807 | .815 |
| TIQA | .845 | .861 | .892 | .903 | .541 | .581 | - | - | - | - | - | - | .850 | .855 | .782 | .800 |
| MetaIQA | .835 | .802 | .887 | .856 | .540 | .507 | - | - | - | - | - | - | .762 | .775 | .756 | .735 |
| P2P-BM | .844 | .842 | .872 | .885 | .526 | .598 | - | - | - | - | - | - | .840 | .849 | .770 | .793 |
| HyperIQA | .859 | .882 | .906 | .917 | .544 | .602 | .911 | .915 | - | - | - | - | .852 | .845 | .814 | .832 |
| TReS | .846 | .877 | .915 | .928 | .554 | .625 | - | - | - | - | - | - | .859 | .859 | .794 | .822 |
| MUSIQ | .702 | .746 | .916 | .928 | .566 | .661 | .918 | .921 | - | - | - | - | .875 | .872 | .795 | .826 |
| CONTRIQUE | - | - | - | - | - | - | - | - | .804 | .868 | .670 | .708 | - | - | .737 | .788 |
| RE-IQA | .840 | .854 | .914 | .923 | .645 | .733 | .918 | .925 | .785 | .845 | .614 | .670 | .872 | .885 | .798 | .834 |
| GenZIQA | - | - | - | - | - | - | - | - | .832 | .892 | .840 | .861 | - | - | .836 | .877 |
| LoDA | .876 | .899 | .932 | .944 | .578 | .679 | .925 | .928 | - | - | - | - | .931 | .936 | .848 | .877 |
| QCN | .875 | .893 | .934 | .945 | .644 | .741 | .923 | .928 | - | - | - | - | - | - | - | - |
| B | .875 | .905 | .932 | .943 | .533 | .641 | .927 | .931 | .865 | .917 | .857 | .889 | .961 | .964 | .850 | .884 |
| B_Sig (Ours) | .909 | .930 | .938 | .947 | .521 | .608 | .921 | .926 | .878 | .923 | .872 | .897 | .939 | .943 | .854 | .882 |
| B_Gated (Ours) | .887 | .912 | .953 | .962 | .556 | .647 | .928 | .932 | .867 | .919 | .873 | .892 | .970 | .973 | .862 | .891 |
There are two evaluation scripts:
- `eval.py` — evaluates B_Gated (MLP3_Gated) checkpoints with optional GradCAM visualisation.
- `eval_checkpoint.py` — unified script that supports all three architectures (Baseline, B_Sig, B_Gated) with auto-detection.
# Evaluate using the pretrained CLIVE->CLIVE checkpoint (auto-detected)
python eval.py --dataset CLIVE
# Cross-dataset: pretrained KonIQ_10K->CLIVE checkpoint
python eval.py --dataset KonIQ_10K_CLIVE
# Evaluate a specific (user-trained) checkpoint
python eval.py --dataset KonIQ_10K \
--checkpoint_dir best_checkpoints/AGM_seed8_train_KonIQ_10K_test_KonIQ_10K
# Skip GradCAM visualisation
python eval.py --dataset SPAQ --no_gradcam
# Custom output path
python eval.py --dataset AGIQA3K --output my_results.json

| Flag | Default | Description |
|---|---|---|
| `--dataset` | required | Dataset to evaluate on |
| `--data_dir` | `./Dataset` | Root directory of all datasets |
| `--checkpoint_dir` | auto | Path to checkpoint dir (auto-detected from `best_checkpoints/`) |
| `--batch_size` | 4 | Evaluation batch size |
| `--no_gradcam` | off | Skip GradCAM heatmap generation |
| `--output` | auto | Path to save results JSON |
Unified script supporting Baseline, B_Sig, and B_Gated
architectures. Architecture is auto-detected from the checkpoint's mlp.pt
state-dict keys (and the directory name) — manual override is available via
--arch.
| Architecture | MLP Head | Detection signal |
|---|---|---|
| Baseline (B) | `mlp_3_layer` (ReLU) | `adapter.*` keys, no "sigmoid" in path |
| B_Sig | `mlp_3_layer_sigmoid_siglip` (Sigmoid + LeakyReLU) | `adapter.*` keys, "sigmoid" or "b_sig" in path |
| B_Gated | `MLP3_Gated` (learnable gated activations) | `act1.g*` keys |
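A sketch of that detection logic, assuming `mlp.pt` stores a flat state dict; the function name and exact rules are illustrative, the authoritative version lives in `eval_checkpoint.py`:

```python
import torch

def detect_arch(mlp_path, ckpt_dir=""):
    """Pick the architecture from mlp.pt state-dict keys and the dir name."""
    keys = torch.load(mlp_path, map_location="cpu").keys()
    if any(k.startswith("act1.g") for k in keys):
        return "gating"                                    # B_Gated
    if "sigmoid" in ckpt_dir.lower() or "b_sig" in ckpt_dir.lower():
        return "baseline_sig"                              # B_Sig
    return "baseline"                                      # Baseline (B)
```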
All shipped checkpoints are PEFT/LoRA adapters on top of
google/siglip2-so400m-patch16-512, loaded with
peft.PeftModel.from_pretrained() and merged for clean inference.
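Loading and merging follows the standard PEFT pattern; the adapter sub-path below is an assumption, not the repo's exact layout:

```python
from transformers import AutoModel
from peft import PeftModel

base = AutoModel.from_pretrained("google/siglip2-so400m-patch16-512")
# "<run>/adapter" is a placeholder path; see pretrained_checkpoints/ for the real one.
model = PeftModel.from_pretrained(base, "pretrained_checkpoints/<run>/adapter")
model = model.merge_and_unload()   # fold LoRA weights into the base for clean inference
```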
# List all pretrained runs
python eval_checkpoint.py --list
# Evaluate one by name (resolves checkpoint + arch + dataset from the registry)
python eval_checkpoint.py --run Gating_CLIVE
python eval_checkpoint.py --run Sigmoid_KonIQ_to_CLIVE
# Override the registry's test dataset (cross-dataset experiment with the same checkpoint)
python eval_checkpoint.py --run Gating_KonIQ --dataset AGIQA3K
# Pick a specific GPU and save to a custom location
python eval_checkpoint.py --run Gating_CLIVE --device cuda:1 --output clive.json

For checkpoints not in the registry, evaluation uses the same 20% held-out
split as training — pass --seed so the partition can be reconstructed
deterministically (use the same seed you trained with).
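The partition depends only on the seed, so the same seed reproduces the held-out set used in training. A sketch, with a `TensorDataset` standing in for the real dataset object:

```python
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.arange(1162))   # stand-in for e.g. the CLIVE dataset
g = torch.Generator().manual_seed(8)          # the training seed (here: 8)
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train], generator=g)
```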
# Evaluate a user-trained checkpoint on its held-out 20% partition
python eval_checkpoint.py \
--checkpoint best_checkpoints/AGM_seed8_train_CLIVE_test_CLIVE \
--dataset CLIVE \
--seed 8
# Manually override the architecture if auto-detection picks wrong
python eval_checkpoint.py \
--checkpoint /path/to/baseline_sigmoid_ckpt \
--dataset KonIQ_10K \
--seed 8 \
  --arch baseline_sig

| Flag | Default | Description |
|---|---|---|
| `--list` | off | Print the run registry and exit |
| `--run` | none | Named run from the registry (see `--list`) |
| `--checkpoint` | none | Path to checkpoint dir (required when `--run` is not given) |
| `--dataset` | none | Dataset to evaluate on; overrides the registry's default when used with `--run` |
| `--arch` | auto | Override architecture: `baseline`, `baseline_sig`, or `gating` |
| `--seed` | required when `--run` is not set | Seed for the 80/20 `random_split`; must match the training seed |
| `--data_dir` | `./Dataset` | Root directory of all datasets |
| `--batch_size` | 4 | Evaluation batch size |
| `--device` | `cuda` | Device string (`cuda`, `cuda:0`, `cuda:1`, `cpu`) |
| `--output` | auto | Path to save results JSON |
# Run every pretrained checkpoint on its native test set
python eval_all.py
# Run only the gating-architecture checkpoints
python eval_all.py --filter Gating
# Larger batch size, alternate dataset root, save to a custom location
python eval_all.py --data_dir /data/IQA --batch_size 8 --output my_sweep.json
# Fail loudly on the first missing dataset (default: skip-with-warning)
python eval_all.py --strict

Runs whose dataset is not present under --data_dir are skipped with a
warning by default — useful when you only have a subset of benchmarks
downloaded. After the sweep, an SRCC/PLCC summary table is printed to
stdout and combined results are saved to results/eval_all.json.
- SRCC (Spearman Rank-order Correlation Coefficient): measures monotonic association between predicted and ground-truth scores.
- PLCC (Pearson Linear Correlation Coefficient): measures linear correlation after fitting.
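Both metrics are standard SciPy one-liners; the toy scores below are made up, and note that PLCC is conventionally computed after fitting a monotonic (e.g. logistic) mapping, as the definition above implies:

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

pred = np.array([0.81, 0.42, 0.65, 0.93])   # model predictions (toy values)
mos  = np.array([0.78, 0.40, 0.70, 0.90])   # ground-truth MOS (toy values)

srcc, _ = spearmanr(pred, mos)              # rank-order agreement
plcc, _ = pearsonr(pred, mos)               # linear correlation
print(f"SRCC {srcc:.3f}  PLCC {plcc:.3f}")
```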
| Dataset | Type | # Images | Score Range |
|---|---|---|---|
| KonIQ-10K | Authentic distortions | 10,073 | MOS / 100 |
| CLIVE | Authentic distortions | 1,162 | MOS / 100 |
| SPAQ | Smartphone photos | 11,125 | MOS / 100 |
| KADID-10K | Synthetic distortions | 10,125 | (DMOS - 1) / 4 |
| FLIVE | Authentic (in-the-wild) | ~40,000 | MOS / 100 |
| AGIQA-3K | AI-generated | 2,982 | MOS_quality / 5 |
| AGIQA-1K | AI-generated | 1,000 | MOS / 5 |
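A hypothetical mapping mirroring the Score Range column; the keys follow the `--dataset` IDs used above, but the actual transforms live in `dataset.py`:

```python
# Each loader maps raw labels into roughly [0, 1] before training/evaluation.
NORMALISE = {
    "KonIQ_10K": lambda mos: mos / 100,
    "CLIVE":     lambda mos: mos / 100,
    "SPAQ":      lambda mos: mos / 100,
    "KADID10K":  lambda dmos: (dmos - 1) / 4,
    "FLIVE":     lambda mos: mos / 100,
    "AGIQA3K":   lambda mos_quality: mos_quality / 5,
    "AGIQA1K":   lambda mos: mos / 5,
}
```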
Pass a combined dataset ID to train.py:
- `KonIQ_10K_CLIVE` — train on KonIQ-10K, evaluate on CLIVE
- `CLIVE_KonIQ_10K` — train on CLIVE, evaluate on KonIQ-10K
See LICENSE.
- Add pretrained checkpoints for KonIQ-10K (B_Gated, B_Sig)
- Add cross-dataset pretrained checkpoints (KonIQ-10K -> CLIVE, CLIVE -> KonIQ-10K)
- Add `Sigmoid_CLIVE_to_KonIQ` checkpoint (B_Sig trained on CLIVE, tested on KonIQ-10K)
- Add pretrained checkpoints for SPAQ, KADID-10K, FLIVE, AGIQA-3K, AGIQA-1K
If you use this codebase in your research, please cite:
@InProceedings{Yadav_2026_WACV,
author = {Yadav, Ankit and Huy, Ta Duc and Liu, Lingqiao},
title = {Revisiting Vision-Language Foundations for No-Reference Image Quality Assessment},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2026},
pages = {5416-5425}
}