[TCSVT 2025] Official code release of our paper "Towards Explainable Image Aesthetics Assessment With Attribute-Oriented Critiques Generation"

sxfly99/CG-IAA

CG-IAA: Towards Explainable Image Aesthetics Assessment with Attribute-Oriented Critiques Generation


Official PyTorch implementation of "Towards Explainable Image Aesthetics Assessment with Attribute-oriented Critiques Generation" (IEEE TCSVT 2025).


📰 News

  • 🎉 We release the multi-attribute aesthetic critiques generation model with pre-trained weights and training data!
  • 🎉 Our CG-IAA paper was accepted by IEEE TCSVT!
  • [Coming Soon] The complete aesthetic assessment model will be released soon.

💡 Overview

CG-IAA addresses a critical challenge in image aesthetics assessment: How can we leverage the power of multimodal learning when aesthetic critiques are unavailable? Our solution generates high-quality aesthetic critiques from multiple attribute perspectives, enabling both accurate aesthetic prediction and enhanced model explainability.

Key Contributions

  • Multi-Attribute Aesthetic Critiques Generation: We propose a CLIP-based model that generates diverse aesthetic critiques from four different perspectives:

    • 🎨 Color and Light: Color harmony, saturation, lighting quality
    • 📐 Composition: Layout, balance, structural elements
    • 🔍 Depth and Focus: Depth of field, focus, blur effects
    • ⭐ General Feelings: Overall aesthetic impression and quality
  • Enhanced Explainability: Generated critiques provide human-readable explanations for aesthetic judgments, making the model more transparent and interpretable.

Framework Architecture

[Figure: CG-IAA pipeline]

The CG-IAA framework consists of three main components:

  1. VLAP (Vision-Language Aesthetic Pretraining): Fine-tune CLIP on aesthetic data
  2. MAEL (Multi-Attribute Experts Learning): Train attribute-specific expert models
  3. MAP (Multimodal Aesthetics Prediction): Fuse visual and textual features for final prediction

🚀 What's Released

✅ Currently Available

  1. Aesthetic Critiques Generation Model - Multi-attribute aesthetic critiques generation

    • Pre-trained model weights
    • Inference code for single image processing
  2. Training Data - Large-scale multi-attribute aesthetic critique dataset

    • ~150K critiques for Color and Light
    • ~100K critiques for Composition
    • ~120K critiques for Depth and Focus
    • ~570K critiques for General Feelings
    • Total: ~940K aesthetic critiques with attribute annotations
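The counts above are uneven across attributes, which may matter when sampling training batches. The short sketch below recomputes the total and each attribute's share from the approximate figures listed; the dictionary name `COUNTS_K` is illustrative and not part of the repository.

```python
# Approximate counts as listed above, in thousands of critiques.
COUNTS_K = {
    "Color and Light": 150,
    "Composition": 100,
    "Depth and Focus": 120,
    "General Feelings": 570,
}

total_k = sum(COUNTS_K.values())  # 940, matching the stated total
shares = {name: count / total_k for name, count in COUNTS_K.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

General Feelings accounts for roughly three fifths of all critiques, so an attribute-balanced sampler could be worth considering when training on the combined data.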

🔜 Coming Soon

  • Complete aesthetic assessment model

📦 Installation

Requirements

# Clone the repository
git clone https://github.com/sxfly99/CG-IAA.git
cd CG-IAA

# Create and activate conda environment
conda env create -f environment.yml
conda activate cg-iaa

Download Pre-trained Weights

Download the pre-trained model weights from Google Drive and place them in the checkpoints/ directory:

📥 Download Model Weights

The checkpoints directory should contain:

checkpoints/
├── base_model.pt      # Base model
├── color.pt           # Color expert model
├── composition.pt     # Composition expert model
├── dof.pt             # Depth of Field expert model
└── general.pt         # General expert model
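Before running inference, it can help to confirm that all five files are in place. The following is a minimal sketch using only the standard library; the function name `missing_checkpoints` is hypothetical and not part of the repository, and the file names are taken from the listing above.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_CHECKPOINTS = (
    "base_model.pt",
    "color.pt",
    "composition.pt",
    "dof.pt",
    "general.pt",
)


def missing_checkpoints(checkpoint_dir="checkpoints"):
    """Return the expected weight files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:", ", ".join(missing))
    else:
        print("All expected checkpoint files are present.")
```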

Download Data (Optional)

If you want to train your own models, download our multi-attribute aesthetic critique dataset:

📥 Download Training Data


🎯 Quick Start

Single Image Inference

Generate aesthetic critiques for a single image:

python caption_inference.py --image_path samples/1.jpg

Output:

================================================================================
Multi-Attribute Aesthetic Captions for: samples/1.jpg
================================================================================

[Color]

[Composition]

[Depth of Field]

[General]

================================================================================
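To process a whole folder, one option is to call the documented single-image command once per file. The sketch below assumes it is run from the repository root and relies only on the `--image_path` flag shown above; the helper names and the set of accepted file extensions are assumptions, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: these are the image formats the script accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_command(image_path):
    """Build the documented single-image command for one file."""
    return [sys.executable, "caption_inference.py", "--image_path", str(image_path)]


def list_images(folder):
    """Return the image files directly inside folder, sorted by path."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def run_folder(folder):
    """Run the inference script once per image and collect its printed output."""
    outputs = {}
    for image in list_images(folder):
        result = subprocess.run(
            build_command(image), capture_output=True, text=True, check=True
        )
        outputs[image.name] = result.stdout
    return outputs
```

Calling `run_folder("samples")` would return a dictionary mapping each file name to the text the script printed. Each call starts a new process and reloads the weights, so this is convenient for small folders but slow for large ones.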

📊 Model Performance

Our generated aesthetic critiques achieve competitive performance when used alone for the IAA task:

| Method | PLCC ↑ | SRCC ↑ | ACC (%) ↑ |
|---|---|---|---|
| ARIC (AAAI 2023) | 0.591 | 0.550 | 74.3 |
| VILA (CVPR 2023) | 0.534 | 0.505 | 75.2 |
| AesCritique (Ours) | 0.720 | 0.712 | 80.8 |

Tested on the AVA database using text-only input.
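For readers unfamiliar with the three metrics, the sketch below shows how they are commonly computed from predicted and ground-truth mean scores. It is not the repository's evaluation code. PLCC and SRCC follow their standard definitions; the accuracy threshold of 5.0 is the usual convention for binary classification on AVA and is an assumption here, since this README does not state it.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def binary_accuracy(pred, target, threshold=5.0):
    """Fraction of images placed on the same side of threshold as the ground truth.

    Assumption: scores above `threshold` count as high aesthetic quality.
    """
    hits = sum((p > threshold) == (t > threshold) for p, t in zip(pred, target))
    return hits / len(pred)
```

`binary_accuracy` returns a fraction between 0 and 1, so multiply by 100 to compare with the percentages in the table.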


📁 Dataset Structure

Our released multi-attribute aesthetic critique dataset is organized as follows:

data/
├── color.json           # Color and Light critiques
├── composition.json     # Composition critiques
├── dof.json             # Depth and Focus critiques
└── general.json         # General Feelings critiques

Each JSON file contains entries in the following format:

[
  {
    "id": 0,
    "img_id": "773931",
    "caption": "Image feels a tad dark, which I dont think helps this image for me."
  },
  ...
]
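A minimal sketch of reading these files and grouping captions by image follows. It assumes each file is a valid JSON array with the three fields shown (the `...` above only marks omitted entries), and the function name `load_critiques` is hypothetical rather than part of the repository.

```python
import json
from collections import defaultdict
from pathlib import Path

# Attribute key -> file name, following the data/ layout described in this section.
ATTRIBUTE_FILES = {
    "color": "color.json",
    "composition": "composition.json",
    "dof": "dof.json",
    "general": "general.json",
}


def load_critiques(data_dir="data"):
    """Map img_id -> attribute -> list of captions, skipping absent files."""
    by_image = defaultdict(lambda: defaultdict(list))
    for attribute, filename in ATTRIBUTE_FILES.items():
        path = Path(data_dir) / filename
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as handle:
            for entry in json.load(handle):
                by_image[entry["img_id"]][attribute].append(entry["caption"])
    return by_image
```

With the data downloaded, `load_critiques()["773931"]["color"]` would list every Color and Light critique recorded for that image.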

📊 Visualization

[Figure: visualization]


🙏 Acknowledgement

CG-IAA is built upon the following excellent open-source projects:

  • CLIP - Contrastive Language-Image Pre-training
  • ClipCap - CLIP Prefix for Image Captioning
  • timm - PyTorch Image Models

📖 Citation

If you find our work useful, please consider citing our paper:

@article{li2025cgiaa,
  author={Li, Leida and Sheng, Xiangfei and Chen, Pengfei and Wu, Jinjian and Dong, Weisheng},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Towards Explainable Image Aesthetics Assessment With Attribute-Oriented Critiques Generation}, 
  year={2025},
  volume={35},
  number={2},
  pages={1464-1477}
}
