gmum/InfoDisent

InfoDisent

InfoDisent learns semantically disentangled visual representations by augmenting a frozen pretrained backbone with three lightweight, jointly-trained components:

| Component | Role |
| --- | --- |
| Orthogonal map (UnitaryMatrixMultiplication) | Rotates the feature space so each channel can carry an independent concept |
| Non-negative head (NonNegativeLinear) | Ensures classification scores are a non-negative combination of concepts |
| Sparse pooling (Gumbel-Softmax max-pool) | Encourages each channel to fire at a single spatial location, producing prototype-like activations |

Only the three head layers are trained; the backbone is kept frozen.
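
To make the sparse-pooling idea concrete, here is a minimal, framework-free sketch of Gumbel-Softmax pooling over one channel's spatial activations. It is not the repository's implementation (which lives in models/utils.py and uses PyTorch); the function name `gumbel_softmax_pool` and its arguments are hypothetical and chosen only for illustration. The point is that a low temperature pushes the pooling weights toward a single location, so the pooled value approaches the channel maximum.

```python
import math
import random


def gumbel_softmax_pool(activations, tau=1.0, rng=None):
    """Pool one channel's spatial activations into a single value.

    activations: flat list of floats, one per spatial location.
    tau: temperature; smaller values concentrate the weights on one location.
    rng: optional random.Random used to draw Gumbel noise; None skips the
         noise, which makes the result deterministic (handy for inspection).
    Returns (pooled_value, weights).
    """
    if rng is None:
        noisy = list(activations)
    else:
        # Gumbel(0, 1) sample: -log(-log(U)) with U ~ Uniform(0, 1).
        noisy = [
            a - math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
            for a in activations
        ]

    # Numerically stable softmax over spatial locations at temperature tau.
    peak = max(noisy)
    exps = [math.exp((v - peak) / tau) for v in noisy]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the original (noise-free) activations.
    pooled = sum(w * a for w, a in zip(weights, activations))
    return pooled, weights


channel = [0.1, 0.3, 2.5, 0.2]  # e.g. a flattened 2x2 feature map
soft, _ = gumbel_softmax_pool(channel, tau=1.0)
sharp, sharp_weights = gumbel_softmax_pool(channel, tau=0.05)
print(soft)   # a blend of all four locations
print(sharp)  # very close to 2.5, the channel maximum
```

At tau=1.0 every location contributes to the pooled value; at tau=0.05 almost all of the weight sits on the third location. This is the behaviour the temperature annealing arguments listed under "Key training arguments" are meant to control.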


Repository layout

infodisent/
├── train.py                        Main training entry point
├── requirements.txt
│
├── models/                         Model definitions
│   ├── utils.py                    UnitaryMatrixMultiplication, NonNegativeLinear,
│   │                               GumbelScheduler, train_phase
│   ├── resnet.py                   own_resnet{18,34,50},  base_resnet{18,34,50}
│   ├── densenet.py                 own_densenet121,       base_densenet121
│   ├── convnext.py                 own_convnext_{tiny,large}, base_convnext_{tiny,large}
│   ├── vision_transformer.py       own_vit_b_16,          base_vit_b_16
│   ├── swin_transformer.py         own_swin_v2_s,         base_swin_v2_s
│   ├── maxvit.py                   own_maxvit_t,          base_maxvit_t
│   └── simple_cnn.py               own_simple_cnn (MNIST / toy experiments)
│
├── utils/                          Training utilities
│   ├── presets.py                  Data-augmentation presets
│   ├── transforms.py               MixUp / CutMix helpers
│   ├── sampler.py                  Repeated-Augmentation sampler
│   ├── tensor2image.py             Image-grid helper
│   ├── utils.py                    MetricLogger, accuracy helpers
│   └── using_wandb.py              W&B initialisation helpers
│
├── scripts/                        Ready-to-run training scripts
│   ├── train_resnet_cub.sh         InfoDisent ResNet     → CUB-200-2011
│   ├── train_resnet_imagenet.sh    InfoDisent ResNet     → ImageNet
│   ├── train_densenet_cub.sh       InfoDisent DenseNet   → CUB-200-2011
│   ├── train_convnext_cub.sh       InfoDisent ConvNeXt   → CUB-200-2011
│   ├── train_swin_imagenet.sh      InfoDisent Swin-V2-S  → ImageNet
│   ├── train_maxvit_imagenet.sh    InfoDisent MaxViT-T   → ImageNet
│   ├── train_vit_imagenet.sh       InfoDisent ViT-B/16   → ImageNet
│   ├── train_baselines_cub.sh      All baselines         → CUB-200-2011
│   ├── train_baselines_imagenet.sh All baselines         → ImageNet
│   └── train_baselines_cars.sh     All baselines         → Stanford Cars
│
└── analysis/                       Post-training analysis notebooks
    ├── README.md
    ├── utils_analysis.py           Shared helpers (load_data, generate_heatmap, …)
    ├── evaluation.ipynb            Accuracy tables
    ├── disentanglement_scores.ipynb  RV coefficient / diversity scores
    ├── sparsity.ipynb              Channel sparsity + prototype extraction
    └── heatmaps_and_visualisation.ipynb  Heatmap overlays, Grad-CAM comparison

Installation

git clone https://github.com/gmum/InfoDisent.git
cd InfoDisent
pip install -r requirements.txt

Tested with Python 3.10, PyTorch 2.2, torchvision 0.17.


Datasets

| Dataset | Layout | Notes |
| --- | --- | --- |
| CUB-200-2011 | <root>/train/<class>/, <root>/val/<class>/ | data_type=cropped or full; part annotations in <cub_root>/parts/ needed for semantic purity |
| Stanford Cars | <root>/train/<class>/, <root>/val/<class>/ | data_type=cropped |
| Stanford Dogs | <root>/train/<class>/, <root>/val/<class>/ | data_type=other |
| ImageNet | Standard ILSVRC layout | data_type=other |
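
Before launching a long run it can help to confirm that a dataset root matches the train/val class-folder layout listed above. The following is a small standard-library sketch; `check_image_folder_layout` is a hypothetical helper, not part of this repository, and it checks directory structure only (it does not validate images or the CUB part annotations).

```python
from pathlib import Path


def check_image_folder_layout(root):
    """Return a list of problems found under <root>/train and <root>/val."""
    root = Path(root)
    problems = []
    classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        # Every sub-directory of a split is treated as one class folder.
        names = {p.name for p in split_dir.iterdir() if p.is_dir()}
        if not names:
            problems.append(f"no class folders in: {split_dir}")
        classes[split] = names
    # Only compare the two splits when both directories exist.
    if len(classes) == 2 and classes["train"] != classes["val"]:
        diff = sorted(classes["train"] ^ classes["val"])
        problems.append(f"class folders differ between train and val: {diff}")
    return problems


if __name__ == "__main__":
    # Path taken from the train.py example in this README; adjust as needed.
    for problem in check_image_folder_layout("/data/CUB_200_2011/cub200_cropped"):
        print(problem)
```

An empty return value means both splits exist and contain the same set of class folders.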

Quick start

Train baselines (frozen backbone, avg-pool head)

# CUB-200-2011 (edit DATA_PATH inside the script first):
bash scripts/train_baselines_cub.sh 0

# ImageNet:
bash scripts/train_baselines_imagenet.sh 0

Fine-tune the InfoDisent head

# Edit DATA_PATH and RESUME in the script, then:
bash scripts/train_resnet_cub.sh 0

Or call train.py directly:

python train.py \
    --data-path      /data/CUB_200_2011/cub200_cropped \
    --dataset-name   CUB_200_2011 \
    --data_type      cropped \
    --model          own_resnet34 \
    --resume         /checkpoints/base_resnet34_cub.pth \
    --output-dir     ./results/$(date +%Y-%m-%d) \
    --epochs         25 \
    --batch-size     16 \
    --opt            adamw \
    --lr             1e-5 \
    --lr-scheduler   reducelronplateau \
    --gumbel-dim     -1 \
    --gumbel_tau     1.0 0.2 \
    --gumbel_range   5 25 \
    --finetuning

Post-training analysis

cd analysis
jupyter lab
# Open and run the four notebooks in order (as listed under analysis/ in the repository layout)

Pretrained baseline weights

Baseline models (no InfoDisent head) are initialised from torchvision ImageNet pretrained weights, downloaded automatically on first use.

Full list of available weights:

https://docs.pytorch.org/vision/main/models.html


Key training arguments

| Argument | Default | Description |
| --- | --- | --- |
| --model | resnet18 | Model name (see models/__init__.py) |
| --finetuning | False | Freeze backbone; train only changed_layers |
| --gumbel-dim | 1 | Gumbel axis: -1 (flatten spatial) or 1 (channel) |
| --gumbel_tau | 1.0 0.2 | Start and end temperatures |
| --gumbel_range | 20 90 | Epoch range for τ annealing |
| --gumbel_annealing_strategy | cosine | linear, exponential, cosine, constant |
| --amp | False | Enable automatic mixed precision |
| --wandb-project | None | W&B project (omit to disable) |
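
The three Gumbel arguments together define a temperature schedule. The sketch below shows one plausible reading of them, using the defaults from the table; it is not the repository's GumbelScheduler. The function name `gumbel_tau`, the exact formulas, and the choice to hold the start value before the range and the end value after it are all assumptions made for illustration.

```python
import math


def gumbel_tau(epoch, tau_start=1.0, tau_end=0.2,
               first_epoch=20, last_epoch=90, strategy="cosine"):
    """Temperature for a given epoch under an assumed annealing schedule."""
    if strategy not in ("linear", "exponential", "cosine", "constant"):
        raise ValueError(f"unknown strategy: {strategy}")
    # Assumption: tau stays at tau_start until the annealing range begins.
    if strategy == "constant" or epoch <= first_epoch:
        return tau_start
    # Assumption: tau stays at tau_end once the annealing range is over.
    if epoch >= last_epoch:
        return tau_end

    # Progress through the annealing range, strictly between 0 and 1 here.
    t = (epoch - first_epoch) / (last_epoch - first_epoch)
    if strategy == "linear":
        return tau_start + t * (tau_end - tau_start)
    if strategy == "exponential":
        # Geometric interpolation between the two temperatures.
        return tau_start * (tau_end / tau_start) ** t
    # Cosine: half a cosine period from tau_start down to tau_end.
    return tau_end + 0.5 * (tau_start - tau_end) * (1.0 + math.cos(math.pi * t))


for epoch in (0, 20, 55, 90, 100):
    print(epoch, round(gumbel_tau(epoch), 4))
```

Under these assumptions the linear and cosine schedules agree at the midpoint of the range, while the exponential schedule drops faster early on.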

Citation

@misc{infodisent2026,
  title         = {InfoDisent: Explainability of Image Classification Models by Information Disentanglement},
  author        = {Łukasz Struski and Dawid Rymarczyk and Jacek Tabor},
  year          = {2026},
  eprint        = {2409.10329},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2409.10329},
}

License

MIT License.
