InfoDisent learns semantically disentangled visual representations by augmenting a frozen pretrained backbone with three lightweight, jointly-trained components:
| Component | Role |
|---|---|
| Orthogonal map (UnitaryMatrixMultiplication) | Rotates the feature space so each channel can carry an independent concept |
| Non-negative head (NonNegativeLinear) | Ensures classification scores are a non-negative combination of concepts |
| Sparse pooling (Gumbel-Softmax max-pool) | Encourages each channel to fire at a single spatial location, producing prototype-like activations |
Only the three head layers are trained; the backbone is kept frozen.
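
As a rough illustration of how the three components fit together, here is a minimal pure-Python sketch on a toy two-channel feature map. It is not the repository code: the function names (`rotate`, `gumbel_softmax_pool`, `non_negative_linear`), the fixed 2×2 rotation standing in for a learned orthogonal matrix, and the softplus reparameterisation of the head weights are all assumptions made for this example.

```python
import math
import random


def rotate(features, theta):
    """Apply the same 2x2 rotation (an orthogonal map) at every spatial location."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y] for x, y in features]


def gumbel_softmax_pool(values, tau, rng):
    """Pool one channel over its spatial locations with Gumbel-Softmax weights.

    Lower tau pushes the weights towards one-hot, so the channel is
    effectively read out at a single location.
    """
    # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
    noisy = [v - math.log(-math.log(rng.uniform(1e-9, 1.0 - 1e-9))) for v in values]
    top = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp((z - top) / tau) for z in noisy]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = sum(w * v for w, v in zip(weights, values))
    return pooled, weights


def non_negative_linear(concepts, raw_weights):
    """Class scores as a non-negative combination of concept activations."""
    # Assumption: softplus maps each raw weight to a positive effective weight.
    return [
        sum(math.log1p(math.exp(w)) * c for w, c in zip(row, concepts))
        for row in raw_weights
    ]


rng = random.Random(0)
feature_map = [[0.2, 1.0], [3.0, -0.5], [0.1, 0.4], [0.5, 0.2]]  # 4 locations x 2 channels
rotated = rotate(feature_map, math.pi / 6)
channels = [list(ch) for ch in zip(*rotated)]  # one list of spatial values per channel
concepts = [gumbel_softmax_pool(ch, tau=0.2, rng=rng)[0] for ch in channels]
scores = non_negative_linear(concepts, raw_weights=[[0.5, -1.0], [-2.0, 1.5]])
print(scores)
```

The rotation leaves the norm of each feature vector unchanged, and for the same noise sample a lower `tau` never makes the largest pooling weight smaller, which is the sense in which the pooling becomes sparser as the temperature is annealed.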
infodisent/
├── train.py Main training entry point
├── requirements.txt
│
├── models/ Model definitions
│ ├── utils.py UnitaryMatrixMultiplication, NonNegativeLinear,
│ │ GumbelScheduler, train_phase
│ ├── resnet.py own_resnet{18,34,50}, base_resnet{18,34,50}
│ ├── densenet.py own_densenet121, base_densenet121
│ ├── convnext.py own_convnext_{tiny,large}, base_convnext_{tiny,large}
│ ├── vision_transformer.py own_vit_b_16, base_vit_b_16
│ ├── swin_transformer.py own_swin_v2_s, base_swin_v2_s
│ ├── maxvit.py own_maxvit_t, base_maxvit_t
│ └── simple_cnn.py own_simple_cnn (MNIST / toy experiments)
│
├── utils/ Training utilities
│ ├── presets.py Data-augmentation presets
│ ├── transforms.py MixUp / CutMix helpers
│ ├── sampler.py Repeated-Augmentation sampler
│ ├── tensor2image.py Image-grid helper
│ ├── utils.py MetricLogger, accuracy helpers
│ └── using_wandb.py W&B initialisation helpers
│
├── scripts/ Ready-to-run training scripts
│ ├── train_resnet_cub.sh InfoDisent ResNet → CUB-200-2011
│ ├── train_resnet_imagenet.sh InfoDisent ResNet → ImageNet
│ ├── train_densenet_cub.sh InfoDisent DenseNet → CUB-200-2011
│ ├── train_convnext_cub.sh InfoDisent ConvNeXt → CUB-200-2011
│ ├── train_swin_imagenet.sh InfoDisent Swin-V2-S → ImageNet
│ ├── train_maxvit_imagenet.sh InfoDisent MaxViT-T → ImageNet
│ ├── train_vit_imagenet.sh InfoDisent ViT-B/16 → ImageNet
│ ├── train_baselines_cub.sh All baselines → CUB-200-2011
│ ├── train_baselines_imagenet.sh All baselines → ImageNet
│ └── train_baselines_cars.sh All baselines → Stanford Cars
│
└── analysis/ Post-training analysis notebooks
    ├── README.md
    ├── utils_analysis.py Shared helpers (load_data, generate_heatmap, …)
    ├── evaluation.ipynb Accuracy tables
    ├── disentanglement_scores.ipynb RV coefficient / diversity scores
    ├── sparsity.ipynb Channel sparsity + prototype extraction
    └── heatmaps_and_visualisation.ipynb Heatmap overlays, Grad-CAM comparison
git clone https://github.com/your-org/infodisent.git
cd infodisent
pip install -r requirements.txt

Tested with Python 3.10, PyTorch 2.2, torchvision 0.17.
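
To compare a local environment against the tested versions, a small standard-library check along the following lines may help. The helper name `environment_report` and the prefix-based comparison are assumptions for this sketch, not part of the repository.

```python
import sys
from importlib import metadata

# Versions the README reports as tested.
TESTED = {"python": "3.10", "torch": "2.2", "torchvision": "0.17"}


def environment_report():
    """Map each component to its installed version, or None if it is not installed."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for package in ("torch", "torchvision"):
        try:
            report[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = None
    return report


for name, installed in environment_report().items():
    # Rough prefix match only; it does not parse version numbers properly.
    matches = bool(installed) and installed.startswith(TESTED[name])
    status = "matches tested version" if matches else "differs from tested version"
    print(f"{name}: installed={installed}, tested={TESTED[name]} ({status})")
```

A differing version is not necessarily a problem; the check only reports where the environment departs from the tested setup.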
| Dataset | Layout | Notes |
|---|---|---|
| CUB-200-2011 | <root>/train/<class>/, <root>/val/<class>/ | data_type=cropped or full; part annotations in <cub_root>/parts/ needed for semantic purity |
| Stanford Cars | <root>/train/<class>/, <root>/val/<class>/ | data_type=cropped |
| Stanford Dogs | <root>/train/<class>/, <root>/val/<class>/ | data_type=other |
| ImageNet | Standard ILSVRC layout | data_type=other |
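
Before launching a run, the folder-per-class layout in the table can be checked with a short script. This is a sketch only: `check_layout` is a hypothetical helper, and the requirement that `train/` and `val/` contain the same class folders is an assumption rather than something the training code is known to enforce.

```python
from pathlib import Path


def check_layout(root):
    """Return the sorted class names shared by <root>/train and <root>/val."""
    root = Path(root)
    classes = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Every sub-directory of a split is treated as one class.
        classes[split] = {p.name for p in split_dir.iterdir() if p.is_dir()}
        if not classes[split]:
            raise ValueError(f"no class folders found in {split_dir}")
    mismatch = classes["train"] ^ classes["val"]
    if mismatch:
        raise ValueError(f"classes present in only one split: {sorted(mismatch)}")
    return sorted(classes["train"])
```

For example, `len(check_layout("/data/CUB_200_2011/cub200_cropped"))` gives the number of class folders found under that root.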
# CUB-200-2011 (edit DATA_PATH inside the script first):
bash scripts/train_baselines_cub.sh 0
# ImageNet:
bash scripts/train_baselines_imagenet.sh 0

# Edit DATA_PATH and RESUME in the script, then:
bash scripts/train_resnet_cub.sh 0

Or call train.py directly:
python train.py \
--data-path /data/CUB_200_2011/cub200_cropped \
--dataset-name CUB_200_2011 \
--data_type cropped \
--model own_resnet34 \
--resume /checkpoints/base_resnet34_cub.pth \
--output-dir ./results/$(date +%Y-%m-%d) \
--epochs 25 \
--batch-size 16 \
--opt adamw \
--lr 1e-5 \
--lr-scheduler reducelronplateau \
--gumbel-dim -1 \
--gumbel_tau 1.0 0.2 \
--gumbel_range 5 25 \
--finetuning

cd analysis
jupyter lab
# Open and run notebooks 1 → 4 in order

Baseline models (no InfoDisent head) are initialised from torchvision ImageNet pretrained weights, downloaded automatically on first use.
Full list of available weights:
| Argument | Default | Description |
|---|---|---|
| --model | resnet18 | Model name (see models/__init__.py) |
| --finetuning | False | Freeze backbone; train only changed_layers |
| --gumbel-dim | 1 | Gumbel axis: -1 (flatten spatial) or 1 (channel) |
| --gumbel_tau | 1.0 0.2 | Start and end temperatures |
| --gumbel_range | 20 90 | Epoch range for τ annealing |
| --gumbel_annealing_strategy | cosine | linear, exponential, cosine, constant |
| --amp | False | Enable automatic mixed precision |
| --wandb-project | None | W&B project (omit to disable) |
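
To make the interaction of --gumbel_tau, --gumbel_range and --gumbel_annealing_strategy concrete, the sketch below computes a temperature per epoch using the defaults from the table. The formulas are common textbook forms and are an assumption here: the actual GumbelScheduler in models/utils.py may differ, and treating "constant" as "stay at the start temperature" is a guess.

```python
import math


def gumbel_tau(epoch, tau=(1.0, 0.2), epoch_range=(20, 90), strategy="cosine"):
    """Temperature at a given epoch (hypothetical helper, not the repository API)."""
    tau_start, tau_end = tau
    first, last = epoch_range
    if strategy == "constant":
        return tau_start  # assumption: no annealing at all
    # Progress through the annealing window, clipped to [0, 1].
    if last <= first:
        t = 1.0 if epoch >= last else 0.0
    else:
        t = min(max((epoch - first) / (last - first), 0.0), 1.0)
    if strategy == "linear":
        return tau_start + (tau_end - tau_start) * t
    if strategy == "exponential":
        return tau_start * (tau_end / tau_start) ** t
    if strategy == "cosine":
        return tau_end + 0.5 * (tau_start - tau_end) * (1.0 + math.cos(math.pi * t))
    raise ValueError(f"unknown strategy: {strategy}")


for epoch in (0, 20, 55, 90, 100):
    print(epoch, round(gumbel_tau(epoch), 3))
```

Under these assumptions the temperature stays at the start value until the first epoch of the range, decays inside it, and stays at the end value afterwards.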
@inproceedings{infodisent2026,
title = {InfoDisent: Explainability of Image Classification Models by Information Disentanglement},
author = {Łukasz Struski and Dawid Rymarczyk and Jacek Tabor},
booktitle = {https://arxiv.org/abs/2409.10329},
year = {2026},
}

MIT License.