This is the official PyTorch implementation of:
VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer
CVPR 2026
- Language-Free: VisualAD removes the text encoder entirely and learns anomaly/normal prototypes purely in the visual feature space.
- Two Learnable Tokens: An anomaly token and a normal token are inserted into a frozen ViT, interacting with patch tokens through multi-layer self-attention to encode normality and abnormality.
- SCA & SAF Modules: Spatial-Aware Cross-Attention (SCA) injects fine-grained spatial evidence into the tokens; Self-Alignment Function (SAF) recalibrates patch features before anomaly scoring.
- 13 Benchmarks: State-of-the-art performance across 6 industrial and 7 medical zero-shot anomaly detection benchmarks.
- Backbone Agnostic: Works with both CLIP (ViT-L/14@336px) and DINOv2 (ViT-L/14) backbones.
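To make the language-free idea concrete, the following is a minimal sketch of how patch features could be scored against a normal and an anomaly prototype living in the same visual feature space. It is an illustration only, not the repository's implementation: the function names, the cosine-similarity plus two-way softmax scoring, the `temperature` value, and the max-over-patches image score are all assumptions, and the SCA and SAF modules are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def patch_anomaly_scores(patches, normal_token, anomaly_token, temperature=0.07):
    """Return, per patch, the softmax probability of the anomaly prototype.

    Hypothetical scoring rule: the temperature and the softmax over the two
    prototypes are assumptions made for this sketch.
    """
    scores = []
    for patch in patches:
        s_normal = cosine(patch, normal_token) / temperature
        s_anomaly = cosine(patch, anomaly_token) / temperature
        # Two-way softmax written as a sigmoid of the logit difference.
        scores.append(1.0 / (1.0 + math.exp(s_normal - s_anomaly)))
    return scores


# Toy 2-D features: the first patch points towards the normal prototype,
# the second towards the anomaly prototype.
normal_token = [1.0, 0.0]
anomaly_token = [0.0, 1.0]
patches = [[0.9, 0.1], [0.2, 0.8]]

scores = patch_anomaly_scores(patches, normal_token, anomaly_token)
image_score = max(scores)  # assumed image-level aggregation: max over patches
```

No text encoder or prompt appears anywhere in this sketch: both prototypes are plain vectors with the same dimensionality as the patch features.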
| Dataset | WinCLIP | APRIL-GAN | AnomalyCLIP | AdaCLIP | VisualAD (CLIP) | VisualAD (DINOv2) |
|---|---|---|---|---|---|---|
| MVTec-AD | 90.4 / 92.7 / 95.6 | 86.1 / 90.4 / 93.6 | 91.6 / 92.7 / 96.2 | 92.0 / 92.7 / 96.4 | 92.2 / 93.2 / 96.7 | 90.1 / 92.4 / 94.8 |
| VisA | 75.6 / 78.2 / 78.8 | 77.4 / 78.6 / 80.9 | 81.0 / 80.3 / 84.4 | 79.7 / 79.6 / 83.2 | 84.7 / 82.5 / 87.6 | 83.1 / 81.4 / 86.8 |
| BTAD | 68.2 / 67.8 / 70.9 | 73.7 / 68.7 / 69.9 | 88.7 / 86.0 / 90.6 | 90.0 / 87.2 / 91.5 | 94.9 / 93.9 / 97.0 | 88.2 / 84.7 / 89.7 |
| KSDD2 | 93.5 / 86.4 / 94.2 | 90.4 / 82.9 / 92.0 | 91.9 / 84.5 / 93.4 | 94.9 / 90.3 / 96.2 | 98.0 / 93.9 / 98.3 | 97.7 / 93.1 / 98.1 |
| DAGM | 91.8 / 75.8 / 79.5 | 94.4 / 80.3 / 83.9 | 98.0 / 90.6 / 92.4 | 98.3 / 91.5 / 94.2 | 99.5 / 95.0 / 97.8 | 93.2 / 83.9 / 86.1 |
| DTD-Synthetic | 95.1 / 94.1 / 97.7 | 85.5 / 89.1 / 94.0 | 93.7 / 94.3 / 97.4 | 92.1 / 92.4 / 96.3 | 97.5 / 96.6 / 99.1 | 91.0 / 94.4 / 97.4 |
| Dataset | WinCLIP | APRIL-GAN | AnomalyCLIP | AdaCLIP | VisualAD (CLIP) | VisualAD (DINOv2) |
|---|---|---|---|---|---|---|
| MVTec-AD | 82.3 / 24.8 / 18.2 / 62.0 | 87.5 / 42.3 / 39.1 / 43.7 | 91.0 / 38.9 / 34.4 / 81.7 | 88.5 / 43.9 / 41.0 / 47.6 | 90.8 / 43.9 / 41.2 / 87.5 | 91.3 / 47.4 / 45.4 / 88.6 |
| VisA | 73.2 / 9.0 / 5.4 / 51.1 | 93.8 / 32.6 / 26.2 / 86.5 | 95.4 / 27.6 / 20.7 / 86.4 | 95.1 / 33.8 / 29.2 / 71.3 | 95.8 / 34.6 / 28.4 / 91.0 | 95.3 / 35.2 / 29.9 / 88.2 |
| BTAD | 72.7 / 18.5 / 12.9 / 27.3 | 91.3 / 40.1 / 37.7 / 21.0 | 93.0 / 47.1 / 41.5 / 71.0 | 87.7 / 42.3 / 36.6 / 17.1 | 91.1 / 49.8 / 43.1 / 80.4 | 93.4 / 42.6 / 38.7 / 76.7 |
| DTD-Synthetic | 79.5 / 16.1 / 9.8 / 51.5 | 94.9 / 60.4 / 61.0 / 33.8 | 97.5 / 55.8 / 52.5 / 87.9 | 95.1 / 58.4 / 56.1 / 34.3 | 98.1 / 64.3 / 65.5 / 94.8 | 96.7 / 65.8 / 67.7 / 92.4 |
All backbones use ViT-L/14@336px (CLIP) or ViT-L/14 (DINOv2). Full results including medical benchmarks (OCT17, BrainMRI, Brain_AD, HIS, CVC-ClinicDB, Endo, Kvasir) are available in our paper.
pip install -r requirements.txt
Main dependencies: PyTorch >= 2.0, torchvision, timm, scikit-learn, scipy, tqdm.
VisualAD is evaluated on 13 zero-shot anomaly detection benchmarks across industrial inspection and medical imaging. The dataset packages used by this project are available below:
| Dataset | Domain | Description | Download |
|---|---|---|---|
| MVTec-AD | Industrial | Object and texture anomaly detection benchmark with image-level labels and pixel-level masks. | Google Drive |
| VisA | Industrial | Visual anomaly benchmark for industrial object inspection across multiple product categories. | Google Drive |
| BTAD | Industrial | BTech industrial anomaly detection benchmark with three product categories. | Google Drive |
| KSDD2 | Industrial | Surface defect detection benchmark for production-line visual inspection. | Google Drive |
| DAGM | Industrial | Synthetic textured-surface defect benchmark with annotated defects. | Google Drive |
| DTD-Synthetic | Industrial | Synthetic anomaly localization benchmark built from texture images. | Google Drive |
| OCT17 | Medical | Retinal OCT scans for retinal disease anomaly/classification evaluation. | Google Drive |
| BrainMRI | Medical | Brain MRI images for brain tumor or lesion anomaly classification. | Google Drive |
| Brain_AD | Medical | Brain MRI benchmark for brain tumor or lesion anomaly classification. | Google Drive |
| HIS | Medical | Histopathology images for abnormal tissue analysis. | Google Drive |
| CVC-ClinicDB | Medical | Colonoscopy polyp segmentation benchmark with pixel-level annotations. | Google Drive |
| Endo | Medical | Endoscopic anomaly/lesion benchmark with segmentation annotations. | Google Drive |
| Kvasir | Medical | Gastrointestinal endoscopy benchmark for polyp/anomaly segmentation. | Google Drive |
We adopt the same dataset structure and JSON format as AnomalyCLIP. After downloading a dataset, place it under your local dataset directory and make sure meta.json exists in the dataset root. We also provide scripts to generate the required JSON metadata files:
python generate_dataset_json/mvtec.py
python generate_dataset_json/visa.py
We provide pre-trained checkpoints with the CLIP (ViT-L/14@336px) backbone:
| Training Set | Checkpoint | Evaluation Set |
|---|---|---|
| VisA | weight/train_on_visa/CLIP.pth | MVTec-AD and other cross-dataset benchmarks |
| MVTec-AD | weight/train_on_mvtec/CLIP.pth | VisA |
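The cross-dataset rule in the checkpoint table (never evaluate on the dataset used for training) can be written as a small lookup. This is a sketch for illustration: the helper `checkpoint_for` and the dataset name strings are hypothetical and not part of the repository; only the two checkpoint paths come from the table.

```python
# Mapping taken from the checkpoint table: training set -> checkpoint path.
CHECKPOINTS = {
    "VisA": "weight/train_on_visa/CLIP.pth",
    "MVTec-AD": "weight/train_on_mvtec/CLIP.pth",
}


def checkpoint_for(eval_dataset):
    """Pick the checkpoint trained on a dataset other than the evaluation set."""
    # VisA is evaluated with the MVTec-AD-trained weights; MVTec-AD and the
    # other benchmarks are evaluated with the VisA-trained weights.
    train_set = "MVTec-AD" if eval_dataset == "VisA" else "VisA"
    return CHECKPOINTS[train_set]
```

For example, `checkpoint_for("BTAD")` returns the VisA-trained checkpoint path.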
Run the full cross-dataset training and evaluation pipeline (MVTec-AD <-> VisA) with a single command:
bash scripts/CLIP.sh
Please modify the dataset paths in scripts/CLIP.sh before running.
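Before launching the pipeline, it can help to confirm that each dataset root really contains a readable meta.json, as required above. The check below is a minimal sketch using only the standard library; the function name `check_dataset_root` is hypothetical, and it deliberately makes no assumption about the schema inside meta.json beyond it being a JSON object.

```python
import json
from pathlib import Path


def check_dataset_root(root):
    """Return the sorted top-level keys of <root>/meta.json.

    Raises FileNotFoundError if meta.json is missing and ValueError if the
    file does not contain a JSON object.
    """
    meta_path = Path(root) / "meta.json"
    if not meta_path.is_file():
        raise FileNotFoundError(f"meta.json not found in {root}")
    with meta_path.open("r", encoding="utf-8") as f:
        meta = json.load(f)
    if not isinstance(meta, dict):
        raise ValueError(f"{meta_path} does not contain a JSON object")
    return sorted(meta.keys())
```

Calling `check_dataset_root` on each path configured in scripts/CLIP.sh surfaces a missing or malformed metadata file before any training time is spent.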
If you find this work useful, please consider citing:
@InProceedings{Hou_2026_CVPR,
author = {Hou, Yanning and Li, Peiyuan and Liu, Zirui and Wang, Yitong and Ruan, Yanran and Qiu, Jianfeng and Xu, Ke},
title = {VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2026},
pages = {21346-21356}
}
This project builds upon CLIP and DINOv2. We thank the authors of AnomalyCLIP and AdaCLIP for their open-source implementations.
If you have any questions, feel free to reach out:
- Email: yanning_hou@nudt.edu.cn
- WeChat: HYNing777