InfoDisent

InfoDisent learns semantically disentangled visual representations by augmenting a frozen pretrained backbone with three lightweight, jointly trained components:

| Component | Role |
| --- | --- |
| Orthogonal map (`UnitaryMatrixMultiplication`) | Rotates the feature space so each channel can carry an independent concept |
| Non-negative head (`NonNegativeLinear`) | Ensures classification scores are a non-negative combination of concepts |
| Sparse pooling (Gumbel-Softmax max-pool) | Encourages each channel to fire at a single spatial location, producing prototype-like activations |

Only the three head layers are trained; the backbone is kept frozen.
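
To make the three components concrete, here is a minimal PyTorch sketch of such a head. It is not the repository's implementation: the class names `SketchHead` and `SketchNonNegativeLinear` are hypothetical, the softplus reparameterisation of the weights is an assumption, and the orthogonal map relies on `torch.nn.utils.parametrizations.orthogonal` rather than the repository's `UnitaryMatrixMultiplication`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.parametrizations import orthogonal


class SketchNonNegativeLinear(nn.Module):
    """Bias-free linear layer whose effective weights are >= 0 (softplus is an assumption)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.raw_weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    def forward(self, x):
        # softplus maps any real value to a positive one, so class scores are
        # non-negative combinations of the pooled channel activations.
        return F.linear(x, F.softplus(self.raw_weight))


class SketchHead(nn.Module):
    """Hypothetical head: orthogonal channel map -> sparse spatial pooling -> non-negative classifier."""

    def __init__(self, channels, num_classes, tau=1.0):
        super().__init__()
        # Square linear map constrained to be orthogonal (a rotation of channel space).
        self.rotate = orthogonal(nn.Linear(channels, channels, bias=False))
        self.classify = SketchNonNegativeLinear(channels, num_classes)
        self.tau = tau  # Gumbel-Softmax temperature, annealed externally

    def forward(self, feats):
        # feats: (B, C, H, W) feature map produced by the frozen backbone.
        b, c, h, w = feats.shape
        x = feats.flatten(2).transpose(1, 2)   # (B, H*W, C)
        x = self.rotate(x).transpose(1, 2)     # (B, C, H*W)
        if self.training:
            # Straight-through Gumbel-Softmax: one sampled location per channel.
            sel = F.gumbel_softmax(x, tau=self.tau, hard=True, dim=-1)
        else:
            # At evaluation time this reduces to a plain spatial max-pool.
            sel = F.one_hot(x.argmax(dim=-1), num_classes=h * w).to(x.dtype)
        pooled = (sel * x).sum(dim=-1)         # (B, C)
        return self.classify(pooled)


def freeze(module):
    """Disable gradients so that only the head receives updates."""
    for p in module.parameters():
        p.requires_grad_(False)
    return module.eval()
```

In this sketch a backbone would be passed through `freeze(...)` once, and only `SketchHead.parameters()` would be handed to the optimiser, mirroring the "frozen backbone, trained head" split described above.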

Repository layout

infodisent/
├── train.py                        Main training entry point
├── requirements.txt
│
├── models/                         Model definitions
│   ├── utils.py                    UnitaryMatrixMultiplication, NonNegativeLinear,
│   │                               GumbelScheduler, train_phase
│   ├── resnet.py                   own_resnet{18,34,50},  base_resnet{18,34,50}
│   ├── densenet.py                 own_densenet121,       base_densenet121
│   ├── convnext.py                 own_convnext_{tiny,large}, base_convnext_{tiny,large}
│   ├── vision_transformer.py       own_vit_b_16,          base_vit_b_16
│   ├── swin_transformer.py         own_swin_v2_s,         base_swin_v2_s
│   ├── maxvit.py                   own_maxvit_t,          base_maxvit_t
│   └── simple_cnn.py               own_simple_cnn (MNIST / toy experiments)
│
├── utils/                          Training utilities
│   ├── presets.py                  Data-augmentation presets
│   ├── transforms.py               MixUp / CutMix helpers
│   ├── sampler.py                  Repeated-Augmentation sampler
│   ├── tensor2image.py             Image-grid helper
│   ├── utils.py                    MetricLogger, accuracy helpers
│   └── using_wandb.py              W&B initialisation helpers
│
├── scripts/                        Ready-to-run training scripts
│   ├── train_resnet_cub.sh         InfoDisent ResNet     → CUB-200-2011
│   ├── train_resnet_imagenet.sh    InfoDisent ResNet     → ImageNet
│   ├── train_densenet_cub.sh       InfoDisent DenseNet   → CUB-200-2011
│   ├── train_convnext_cub.sh       InfoDisent ConvNeXt   → CUB-200-2011
│   ├── train_swin_imagenet.sh      InfoDisent Swin-V2-S  → ImageNet
│   ├── train_maxvit_imagenet.sh    InfoDisent MaxViT-T   → ImageNet
│   ├── train_vit_imagenet.sh       InfoDisent ViT-B/16   → ImageNet
│   ├── train_baselines_cub.sh      All baselines         → CUB-200-2011
│   ├── train_baselines_imagenet.sh All baselines         → ImageNet
│   └── train_baselines_cars.sh     All baselines         → Stanford Cars
│
└── analysis/                       Post-training analysis notebooks
    ├── README.md
    ├── utils_analysis.py           Shared helpers (load_data, generate_heatmap, …)
    ├── evaluation.ipynb            Accuracy tables
    ├── disentanglement_scores.ipynb  RV coefficient / diversity scores
    ├── sparsity.ipynb              Channel sparsity + prototype extraction
    └── heatmaps_and_visualisation.ipynb  Heatmap overlays, Grad-CAM comparison

Installation

git clone https://github.com/your-org/infodisent.git
cd infodisent
pip install -r requirements.txt

Tested with Python 3.10, PyTorch 2.2, torchvision 0.17.

Datasets

| Dataset | Layout | Notes |
| --- | --- | --- |
| CUB-200-2011 | `<root>/train/<class>/`, `<root>/val/<class>/` | `data_type=cropped` or `full`; part annotations in `<cub_root>/parts/` needed for semantic purity |
| Stanford Cars | `<root>/train/<class>/`, `<root>/val/<class>/` | `data_type=cropped` |
| Stanford Dogs | `<root>/train/<class>/`, `<root>/val/<class>/` | `data_type=other` |
| ImageNet | Standard ILSVRC layout | `data_type=other` |
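
Before launching a long run it can help to confirm that a dataset root really follows the `<root>/train/<class>/` and `<root>/val/<class>/` layout. The helper below is a sketch using only the standard library; the function names and the accepted image suffixes are assumptions, and nothing here is part of the repository.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your copy of the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarise_split(root, split):
    """Return {class_name: image_count} for <root>/<split>/<class>/."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split directory: {split_dir}")
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_layout(root):
    """Check that train/ and val/ exist and contain the same class folders."""
    train = summarise_split(root, "train")
    val = summarise_split(root, "val")
    mismatched = sorted(set(train) ^ set(val))
    if mismatched:
        raise ValueError(f"classes present in only one split: {mismatched}")
    return train, val
```

Calling `check_layout("/data/CUB_200_2011/cub200_cropped")` would return the per-class image counts for both splits, or raise if a split or class folder is missing.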

Quick start

Train baselines (frozen backbone, avg-pool head)

# CUB-200-2011 (edit DATA_PATH inside the script first):
bash scripts/train_baselines_cub.sh 0

# ImageNet:
bash scripts/train_baselines_imagenet.sh 0

Fine-tune the InfoDisent head

# Edit DATA_PATH and RESUME in the script, then:
bash scripts/train_resnet_cub.sh 0

Or call train.py directly:

python train.py \
    --data-path      /data/CUB_200_2011/cub200_cropped \
    --dataset-name   CUB_200_2011 \
    --data_type      cropped \
    --model          own_resnet34 \
    --resume         /checkpoints/base_resnet34_cub.pth \
    --output-dir     ./results/$(date +%Y-%m-%d) \
    --epochs         25 \
    --batch-size     16 \
    --opt            adamw \
    --lr             1e-5 \
    --lr-scheduler   reducelronplateau \
    --gumbel-dim     -1 \
    --gumbel_tau     1.0 0.2 \
    --gumbel_range   5 25 \
    --finetuning

Post-training analysis

cd analysis
jupyter lab
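# Open and run the four notebooks in the order listed under analysis/:
# evaluation → disentanglement_scores → sparsity → heatmaps_and_visualisation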

Pretrained baseline weights

Baseline models (no InfoDisent head) are initialised from torchvision ImageNet pretrained weights, downloaded automatically on first use.

Full list of available weights:

https://docs.pytorch.org/vision/main/models.html

Key training arguments

| Argument | Default | Description |
| --- | --- | --- |
| `--model` | `resnet18` | Model name (see `models/__init__.py`) |
| `--finetuning` | `False` | Freeze backbone; train only `changed_layers` |
| `--gumbel-dim` | `1` | Gumbel axis: `-1` (flatten spatial) or `1` (channel) |
| `--gumbel_tau` | `1.0 0.2` | Start and end temperatures |
| `--gumbel_range` | `20 90` | Epoch range for τ annealing |
| `--gumbel_annealing_strategy` | `cosine` | `linear`, `exponential`, `cosine`, `constant` |
| `--amp` | `False` | Enable automatic mixed precision |
| `--wandb-project` | `None` | W&B project (omit to disable) |
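
To show how `--gumbel_tau`, `--gumbel_range` and `--gumbel_annealing_strategy` might interact, the sketch below computes a temperature per epoch. The exact formulas used by the repository's `GumbelScheduler` are not documented here, so the function `gumbel_tau`, its clamping behaviour outside the epoch range, and the individual curves are all assumptions.

```python
import math


def gumbel_tau(epoch, tau=(1.0, 0.2), epoch_range=(20, 90), strategy="cosine"):
    """Return the Gumbel-Softmax temperature for `epoch` (hypothetical schedule).

    tau          -- (start, end) temperatures, as in --gumbel_tau
    epoch_range  -- (first, last) annealing epochs, as in --gumbel_range
    strategy     -- one of: linear, exponential, cosine, constant
    """
    tau_start, tau_end = tau
    first, last = epoch_range
    if strategy == "constant":
        return tau_start
    if last <= first:
        # Degenerate range: switch from start to end temperature at `first`.
        return tau_end if epoch >= first else tau_start
    # Annealing progress, clamped to [0, 1] outside the epoch range.
    t = min(max((epoch - first) / (last - first), 0.0), 1.0)
    if strategy == "linear":
        return tau_start + (tau_end - tau_start) * t
    if strategy == "exponential":
        return tau_start * (tau_end / tau_start) ** t
    if strategy == "cosine":
        return tau_end + 0.5 * (tau_start - tau_end) * (1.0 + math.cos(math.pi * t))
    raise ValueError(f"unknown annealing strategy: {strategy!r}")
```

With the quick-start settings (`--gumbel_tau 1.0 0.2 --gumbel_range 5 25`), this sketch would hold τ at the start value through epoch 5 and reach the end value at epoch 25.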

Citation

@misc{infodisent2026,
  title         = {InfoDisent: Explainability of Image Classification Models by Information Disentanglement},
  author        = {Struski, Łukasz and Rymarczyk, Dawid and Tabor, Jacek},
  year          = {2026},
  eprint        = {2409.10329},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2409.10329},
}

License

MIT License.
