Original Authors of the project:
https://github.com/dynamical-inference/patchsae
https://github.com/Prisma-Multimodal/ViT-Prisma
Set up your environment with these simple steps:
# Create and activate environment
conda create --name patchsae python=3.12
conda activate patchsae
# Install dependencies
pip install -r requirements.txt
# Always set PYTHONPATH before running any scripts
cd patchsae
# Run the demo (from the repository root)
PYTHONPATH=./ python src/demo/app.py

First, download the necessary files:
You can download the files using gdown as follows:
# Activate environment first (see Getting Started)
# Download necessary files (35MB + 513MB)
gdown --id 1NJzF8PriKz_mopBY4l8_44R0FVi2uw2g # out.zip
gdown --id 1reuDjXsiMkntf1JJPLC5a3CcWuJ6Ji3Z # data.zip
# Extract files
unzip data.zip
unzip out.zip

💡 Need gdown? Install it with: `conda install conda-forge::gdown`
Your folder structure should look like:
patchsae/
├── configs/
├── data/              # From data.zip
├── out/               # From out.zip
├── src/
│   └── demo/
│       └── app.py
├── tasks/
├── requirements.txt
└── ... (other files)
- The first run downloads datasets from Hugging Face automatically (about 30 GB in total)
- Demo runs on CPU by default
- Access the interface at http://127.0.0.1:7860 (or the URL shown in terminal)
- Training Instructions: See tasks/README.md
- Analysis Notebooks:
This section implements test-time adaptation using Sparse Autoencoders (SAEs) for neuron amplification on Vision Transformers. A minimal sketch of the amplification step follows the list below.

- Run `run_tta.py` to evaluate Neuron Amplification
- Implementation wrapper at `vit_tta.py` using SAE-Tester
- Simple evaluation logic at `evalate.py`
- Additional experiments coming up...
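The core edit is: encode a layer's residual-stream activations with the SAE, boost the top-K most active latent features, decode, and add the scaled difference back to the residual stream. The snippet below is a hedged sketch of that step, not the repository's actual code: `amplify_top_k` is a hypothetical helper, and an SAE object exposing `encode`/`decode` is assumed.

```python
import torch

def amplify_top_k(sae, resid, k=1, gamma=2.0, eta=1.0):
    """Sketch of SAE neuron amplification (hypothetical helper, not the repo API).

    resid: ViT residual-stream activations, shape (batch, tokens, d_model).
    sae:   a sparse autoencoder exposing encode()/decode() (assumed interface).
    """
    latents = sae.encode(resid)                        # (batch, tokens, d_sae)
    topk = torch.topk(latents, k=k, dim=-1)            # top-K most active features
    boosted = latents.scatter(-1, topk.indices,        # replace selected activations
                              topk.values * gamma)     # with gamma-amplified values
    delta = sae.decode(boosted) - sae.decode(latents)  # induced change in model space
    return resid + eta * delta                         # eta-scaled residual-stream edit
```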
The implementation uses ViT-Prisma for SAE-based interventions.
# Navigate to ViT-Prisma directory
cd ViT-Prisma
# Install according to their documentation
pip install -e .
# Or see: ViT-Prisma/docs for detailed installation instructions

# Download ImageNet-Sketch (sketch domain for evaluation)
# Place it in ./data/imagenet_sketch/.
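For reference, the sketch-domain data can be loaded with standard `torchvision` tooling once it sits under `./data/imagenet_sketch/` in the usual class-per-subfolder layout. This is a hedged example with CLIP-style preprocessing, not the repository's `tools/data.py`:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# CLIP-style preprocessing (224x224 crops with CLIP normalization constants)
preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])

# Assumes ImageFolder layout: ./data/imagenet_sketch/<class_name>/*.jpg
dataset = datasets.ImageFolder("./data/imagenet_sketch", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, shuffle=False, num_workers=4)
```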
The project is organized as follows:

├── prisma_tta.py          # Main evaluation script
├── tools/                 # Core implementation modules
│   ├── config.py          # Configuration settings
│   ├── models.py          # Model and SAE loading
│   ├── data.py            # Dataset handling
│   ├── hooks.py           # Feature amplification hooks
│   ├── evaluation.py      # Evaluation logic
│   └── utils.py           # Utility functions
└── ViT-Prisma/            # SAE implementation (submodule)
    └── src/
        └── vit_prisma/
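`tools/hooks.py` provides the feature-amplification hooks. The general pattern is to register a forward hook on each target transformer block and rewrite that block's output. The sketch below illustrates the pattern only; `model.blocks` as the module path and the `edit_fn` callback are assumptions, not the repository's actual interfaces.

```python
import torch
from typing import Callable, Iterable, List

def register_amplification_hooks(model: torch.nn.Module,
                                 layer_indices: Iterable[int],
                                 edit_fn: Callable[[torch.Tensor], torch.Tensor]) -> List:
    """Attach forward hooks that rewrite the chosen blocks' outputs.

    `edit_fn` is whatever latent-space edit should be applied (e.g. the top-K
    amplification sketched earlier); `model.blocks` is an assumed layout.
    """
    handles = []
    for i in layer_indices:
        def hook(module, inputs, output, _edit=edit_fn):
            if isinstance(output, tuple):          # some blocks return tuples
                return (_edit(output[0]), *output[1:])
            return _edit(output)                   # returning a value replaces the output
        handles.append(model.blocks[i].register_forward_hook(hook))
    return handles

# Usage: keep the hooks active for the evaluation pass, then detach them.
# handles = register_amplification_hooks(model, [9, 10, 11], edit_fn)
# ... evaluate ...
# for h in handles:
#     h.remove()
```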
# Basic run
python prisma_tta.py --data_path ./data/imagenet_sketch

# Full set of options
python prisma_tta.py --data_path ./data/imagenet_sketch \
    --layers 9 10 11 \
    --k 1 --gamma 2.0 --eta 1.0 \
    --batch_size 64 \
    --save_results

For systems with limited GPU memory, use separate passes:

python prisma_tta.py --data_path ./data/imagenet_sketch \
    --separate_passes \
    --batch_size 128

# Quick test on a smaller subset
python prisma_tta.py --data_path ./data/imagenet_sketch \
    --subset_size 1000 \
    --batch_size 32

- `--layers`: Transformer layers to apply amplification (default: `[9, 10, 11]`)
- `--k`: Number of top-K features to amplify (default: `1`)
- `--gamma`: Amplification coefficient (default: `1.5`)
- `--eta`: Delta scaling coefficient (default: `1.0`)
- `--selection_method`: Patch selection method (`topk`, `threshold`, `adaptive`)
- `--top_k_percent`: Percentage of patches to select (default: `0.4`)
- `--separate_passes`: Enable memory-efficient evaluation mode
- `--save_results`: Save results to the `./results/` directory
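For reference, the documented CLI surface could be declared with `argparse` roughly as below. Defaults for flags in the list above follow that list; the others (`--batch_size`, `--subset_size`, the `--selection_method` default) are guesses taken from the example commands and are not necessarily what `prisma_tta.py` uses.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the documented flags; see the option list above for defaults.
    p = argparse.ArgumentParser(description="SAE-based test-time amplification (sketch)")
    p.add_argument("--data_path", type=str, required=True, help="Path to ImageNet-Sketch")
    p.add_argument("--layers", type=int, nargs="+", default=[9, 10, 11],
                   help="Transformer layers to apply amplification")
    p.add_argument("--k", type=int, default=1, help="Number of top-K features to amplify")
    p.add_argument("--gamma", type=float, default=1.5, help="Amplification coefficient")
    p.add_argument("--eta", type=float, default=1.0, help="Delta scaling coefficient")
    p.add_argument("--selection_method", choices=["topk", "threshold", "adaptive"],
                   default="topk", help="Patch selection method (default is a guess)")
    p.add_argument("--top_k_percent", type=float, default=0.4,
                   help="Percentage of patches to select")
    p.add_argument("--batch_size", type=int, default=64)
    p.add_argument("--subset_size", type=int, default=None,
                   help="Evaluate on a subset of the dataset")
    p.add_argument("--separate_passes", action="store_true",
                   help="Memory-efficient evaluation mode")
    p.add_argument("--save_results", action="store_true",
                   help="Save results to ./results/")
    return p
```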
- SAE Integration: Uses pre-trained SAEs from Prisma-Multimodal
- Neuron Amplification: Selectively amplifies top-K activated features in SAE latent space
- Spatial Selection: Optional spatial masking for patch-wise feature control
- CLIP-based Evaluation: Zero-shot classification using CLIP text embeddings (a minimal sketch appears below)
- Colab notebook: Prisma-TTA.ipynb (if available)
- Additional experiments coming up...
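The CLIP-based evaluation reduces to comparing image embeddings from the (hooked) vision tower against per-class text embeddings. Below is a minimal, hedged sketch that assumes both sets of embeddings have already been computed; the actual prompt templates and model loading live in `tools/models.py` and `tools/evaluation.py`.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_accuracy(image_features: torch.Tensor,
                       text_features: torch.Tensor,
                       labels: torch.Tensor) -> float:
    """Zero-shot classification by cosine similarity to class text embeddings.

    image_features: (N, d) image embeddings from the (hooked) CLIP vision tower.
    text_features:  (C, d) one embedding per class (e.g. averaged over prompts).
    labels:         (N,) ground-truth class indices.
    """
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.T   # (N, C) cosine similarities
    preds = logits.argmax(dim=-1)               # predicted class = closest text embedding
    return (preds == labels).float().mean().item()
```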
- SAE for ViT
- SAELens
- Differentiable and Fast Geometric Median in NumPy and PyTorch
- Self-regulating Prompts: Foundational Model Adaptation without Forgetting [ICCV 2023]
  - Used in: `configs/` and `src/models/`
- MaPLe: Multi-modal Prompt Learning [CVPR 2023]
  - Used in: `configs/models/maple/...yaml` and `data/clip/maple/imagenet/model.pth.tar-2`
Our code is distributed under the MIT license; please see the LICENSE file for details. The NOTICE file lists the licenses for all third-party code included in this repository. Please include the contents of the LICENSE and NOTICE files in all redistributions of this code.
If you find our code or models useful in your work, please cite our paper:
@inproceedings{
lim2025patchsae,
title={Sparse autoencoders reveal selective remapping of visual concepts during adaptation},
author={Hyesu Lim and Jinho Choi and Jaegul Choo and Steffen Schneider},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=imT03YXlG2}
}