CG-IAA: Towards Explainable Image Aesthetics Assessment with Attribute-Oriented Critiques Generation
Official PyTorch implementation of "Towards Explainable Image Aesthetics Assessment with Attribute-Oriented Critiques Generation" (IEEE TCSVT 2025).
- We release the multi-attribute aesthetic critiques generation model with pre-trained weights and training data!
- Our CG-IAA paper was accepted by IEEE TCSVT!
- [Coming Soon] The complete aesthetic assessment model will be released.
CG-IAA addresses a critical challenge in image aesthetics assessment: How can we leverage the power of multimodal learning when aesthetic critiques are unavailable? Our solution generates high-quality aesthetic critiques from multiple attribute perspectives, enabling both accurate aesthetic prediction and enhanced model explainability.
- Multi-Attribute Aesthetic Critiques Generation: We propose a CLIP-based model that generates diverse aesthetic critiques from four different perspectives (see the sketch after this list):
  - Color and Light: color harmony, saturation, lighting quality
  - Composition: layout, balance, structural elements
  - Depth and Focus: depth of field, focus, blur effects
  - General Feelings: overall aesthetic impression and quality
- Enhanced Explainability: Generated critiques provide human-readable explanations for aesthetic judgments, making the model more transparent and interpretable.
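To make the attribute-oriented generation concrete, here is a minimal, hypothetical sketch in the spirit of ClipCap-style prefix tuning (which this repo builds on, per the acknowledgments): a CLIP image embedding is projected to a GPT-2 prefix, and one mapper ("expert") is used per attribute. All class and variable names below are illustrative, not the repository's actual API.

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not the repo's API): a ClipCap-style prefix mapper.
# A CLIP image embedding is projected to a short sequence of GPT-2 token
# embeddings that conditions the language model; one mapper per attribute.
class PrefixMapper(nn.Module):
    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim
        self.proj = nn.Linear(clip_dim, gpt_dim * prefix_len)

    def forward(self, clip_embed):                      # (B, clip_dim)
        prefix = self.proj(clip_embed)                  # (B, gpt_dim * prefix_len)
        return prefix.view(-1, self.prefix_len, self.gpt_dim)

@torch.no_grad()
def generate_critique(clip_embed, mapper, gpt, tokenizer, max_len=30):
    """Greedy decoding from the image-conditioned prefix (illustration only)."""
    embeds = mapper(clip_embed)                         # (1, prefix_len, gpt_dim)
    out = []
    for _ in range(max_len):
        logits = gpt(inputs_embeds=embeds).logits[:, -1, :]
        tok = logits.argmax(dim=-1)                     # greedy next token, (1,)
        if tok.item() == tokenizer.eos_token_id:
            break
        out.append(tok.item())
        # Append the new token's embedding and continue decoding
        embeds = torch.cat([embeds, gpt.transformer.wte(tok).unsqueeze(1)], dim=1)
    return tokenizer.decode(out)
```

With `gpt = GPT2LMHeadModel.from_pretrained("gpt2")`, a Hugging Face GPT-2 tokenizer, and a CLIP encoder supplying `clip_embed`, one such mapper would be loaded per attribute checkpoint and run four times per image.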
The CG-IAA framework consists of three main components (a toy fusion sketch follows the list):
- VLAP (Vision-Language Aesthetic Pretraining): fine-tunes CLIP on aesthetic data
- MAEL (Multi-Attribute Experts Learning): trains attribute-specific expert models
- MAP (Multimodal Aesthetics Prediction): fuses visual and textual features for the final prediction
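The paper's exact MAP architecture is not reproduced here, but "fuse visual and textual features for a final prediction" could look roughly like the following; the dimensions, mean-pooling over critiques, and regression head are all assumptions.

```python
import torch
import torch.nn as nn

# Toy fusion head (illustrating the MAP idea, not the paper's architecture):
# concatenate a visual feature with a text feature pooled over the generated
# critiques, then regress a scalar aesthetic score.
class MultimodalAestheticsHead(nn.Module):
    def __init__(self, vis_dim=512, txt_dim=512, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(vis_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, vis_feat, txt_feats):
        # vis_feat: (B, vis_dim); txt_feats: (B, num_critiques, txt_dim)
        txt_feat = txt_feats.mean(dim=1)  # pool the per-attribute critique features
        return self.fuse(torch.cat([vis_feat, txt_feat], dim=-1)).squeeze(-1)

# Smoke test with random features: a batch of 2 images, 4 critiques each
head = MultimodalAestheticsHead()
print(head(torch.randn(2, 512), torch.randn(2, 4, 512)).shape)  # torch.Size([2])
```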
- Aesthetic Critiques Generation Model: multi-attribute aesthetic critiques generation
  - Pre-trained model weights
  - Inference code for single-image processing
- Training Data: large-scale multi-attribute aesthetic critique dataset
  - ~150K critiques for Color and Light
  - ~100K critiques for Composition
  - ~120K critiques for Depth and Focus
  - ~570K critiques for General Feelings
  - Total: ~940K aesthetic critiques with attribute annotations
- [Coming Soon] Complete aesthetic assessment model
```bash
# Clone the repository
git clone https://github.com/your-username/CG-IAA.git
cd CG-IAA

# Create and activate conda environment
conda env create -f environment.yml
conda activate cg-iaa
```

Download the pre-trained model weights from Google Drive and place them in the `checkpoints/` directory:
Download Model Weights (Google Drive). The `checkpoints/` directory should contain:

```
checkpoints/
├── base_model.pt     # Base model
├── color.pt          # Color expert model
├── composition.pt    # Composition expert model
├── dof.pt            # Depth of Field expert model
└── general.pt        # General expert model
```
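If it helps, here is a small hypothetical snippet for gathering the expert weights at inference time; the paths follow the tree above, but how each state dict is consumed depends on the actual model classes.

```python
import torch

# Map each aesthetic attribute to its expert checkpoint (paths from the tree above).
EXPERT_CKPTS = {
    "color": "checkpoints/color.pt",
    "composition": "checkpoints/composition.pt",
    "dof": "checkpoints/dof.pt",
    "general": "checkpoints/general.pt",
}
experts = {name: torch.load(path, map_location="cpu")
           for name, path in EXPERT_CKPTS.items()}
```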
If you want to train your own models, download our multi-attribute aesthetic critique dataset (its layout is described below).
Generate aesthetic critiques for a single image:

```bash
python caption_inference.py --image_path samples/1.jpg
```

Output:
```
================================================================================
Multi-Attribute Aesthetic Captions for: samples/1.jpg
================================================================================
[Color]
[Composition]
[Depth of Field]
[General]
================================================================================
```
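To caption a whole folder, a plain shell loop over the documented single-image entry point is enough (nothing repo-specific beyond the `--image_path` flag shown above):

```bash
# Run the single-image inference script over every JPEG in samples/
for img in samples/*.jpg; do
    python caption_inference.py --image_path "$img"
done
```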
Our generated aesthetic critiques achieve competitive performance even when used alone for the IAA task:
| Method | PLCC ↑ | SRCC ↑ | ACC (%) ↑ |
|---|---|---|---|
| ARIC (AAAI 2023) | 0.591 | 0.550 | 74.3 |
| VILA (CVPR 2023) | 0.534 | 0.505 | 75.2 |
| AesCritique (Ours) | 0.720 | 0.712 | 80.8 |
Tested on the AVA database using text-only input.
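For reference, these metrics are standard and easy to reproduce with SciPy; the helper below is a generic sketch, not the repository's evaluation script (the 5-point cutoff for accuracy is the usual AVA convention for the binary task).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def iaa_metrics(preds, labels, threshold=5.0):
    """PLCC, SRCC, and binary accuracy (AVA convention: score >= 5 is 'good')."""
    plcc = pearsonr(preds, labels)[0]
    srcc = spearmanr(preds, labels)[0]
    acc = np.mean((preds >= threshold) == (labels >= threshold)) * 100
    return plcc, srcc, acc

# Toy usage with synthetic scores
rng = np.random.default_rng(0)
labels = rng.uniform(1, 10, size=100)
preds = labels + rng.normal(0, 1, size=100)
print(iaa_metrics(preds, labels))
```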
Our released multi-attribute aesthetic critique dataset is organized as follows:
```
data/
├── color.json        # Color and Light critiques
├── composition.json  # Composition critiques
├── dof.json          # Depth and Focus critiques
└── general.json      # General Feelings critiques
```
Each JSON file contains entries in the following format:
```json
[
  {
    "id": 0,
    "img_id": "773931",
    "caption": "Image feels a tad dark, which I dont think helps this image for me."
  },
  ...
]
```
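For a quick sanity check, the files load with the standard library (a generic snippet; the path follows the tree above):

```python
import json

# Load one attribute's critiques and peek at the first entry
with open("data/color.json") as f:
    critiques = json.load(f)

print(len(critiques), "color/light critiques")
print(critiques[0]["img_id"], "->", critiques[0]["caption"])
```

CG-IAA is built upon the following excellent open-source projects: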
- CLIP - Contrastive Language-Image Pre-training
- ClipCap - CLIP Prefix for Image Captioning
- timm - PyTorch Image Models
If you find our work useful, please consider citing our paper:
```bibtex
@article{li2025cgiaa,
  author={Li, Leida and Sheng, Xiangfei and Chen, Pengfei and Wu, Jinjian and Dong, Weisheng},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  title={Towards Explainable Image Aesthetics Assessment With Attribute-Oriented Critiques Generation},
  year={2025},
  volume={35},
  number={2},
  pages={1464-1477}
}
```