Official implementation of the CVPR 2026 paper "SFR-Net: Steering-Fusion-Refining Network in Multi-label Zero-Shot Sewer Defect Detection".
Addressing the prohibitive costs of data annotation and the scarcity of sewer defect samples, we propose SFR-Net, a novel Multi-Label Zero-Shot Learning (ML-ZSL) framework. To mitigate "Alignment Ambiguity" in complex pipe environments, SFR-Net employs a three-stage paradigm: Representation Steering (RS) for scene adaptation, Multi-Granularity Evidence Fusion (MEF) for decoupled feature aggregation, and Generalized Relational Score Refining (GR) for transferring relational logic to unseen defects. Experiments on the Sewer-ML and WZ-Pipe datasets demonstrate that SFR-Net achieves state-of-the-art (SOTA) performance and significantly boosts zero-shot generalization.
Sewer-ML is a large-scale, multi-label benchmark dataset specifically designed for sewer pipe defect classification. It contains over 1.3 million images with 17 distinct defect categories. We select the five least-frequent categories as the unseen defects, ensuring no corresponding samples exist in the training set.
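The seen/unseen split described above can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing: the function name `split_seen_unseen`, the multi-hot label layout, and the choice to drop every training image that carries an unseen label are all assumptions made for the example.

```python
import numpy as np


def split_seen_unseen(labels: np.ndarray, num_unseen: int = 5):
    """Pick the least-frequent classes as unseen and mask out their samples.

    labels: (num_samples, num_classes) binary multi-hot matrix (assumed layout).
    Returns (seen_class_ids, unseen_class_ids, train_mask).
    """
    class_counts = labels.sum(axis=0)
    # Stable sort so that ties are resolved by class index, keeping the split reproducible.
    order = np.argsort(class_counts, kind="stable")
    unseen = np.sort(order[:num_unseen])
    seen = np.sort(order[num_unseen:])
    # Assumption: a training image is kept only if it has no unseen label at all.
    train_mask = labels[:, unseen].sum(axis=1) == 0
    return seen, unseen, train_mask


# Toy example with 4 classes, of which the 2 rarest become unseen.
toy_labels = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
])
seen, unseen, train_mask = split_seen_unseen(toy_labels, num_unseen=2)
print("seen:", seen.tolist(), "unseen:", unseen.tolist())
print("kept for training:", train_mask.tolist())
```

With the real annotations, `num_unseen=5` would correspond to the five least-frequent categories mentioned above.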
WZ-Pipe is a distinct dataset that follows different inspection standards from Sewer-ML, with more detailed and well-defined defect categories. It comprises approximately 60,000 samples across 17 categories, and its unseen-class partition is consistent with that of Sewer-ML to keep the experiments comparable.
Download this dataset by submitting the form at the following link:
https://forms.gle/AJVLWeQDv1uGVnG87
Note on Dataset Versioning
The current release is a refined version of the dataset reported in the paper. We have performed a second round of re-annotation and removed low-quality or privacy-sensitive samples for better usability and safety.
Results on the refined WZ-Pipe dataset
| Setting | P@1 | R@1 | F1@1 | P@3 | R@3 | F1@3 | mAP |
|---|---|---|---|---|---|---|---|
| ZSL | 5.13 | 13.68 | 7.46 | 6.55 | 52.42 | 11.65 | 8.25 |

| Setting | P@3 | R@3 | F1@3 | P@5 | R@5 | F1@5 | mAP |
|---|---|---|---|---|---|---|---|
| GZSL | 26.72 | 54.23 | 35.80 | 19.74 | 66.76 | 30.47 | 26.06 |
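For readers unfamiliar with the P@K, R@K and F1@K columns, the sketch below shows one common way to compute them in multi-label zero-shot work: take the top-K scored classes per image and pool true positives over all images. Whether this repository uses exactly this pooling is an assumption, and the function names `topk_metrics` and `f1_from_pr` are hypothetical. The F1 values in the tables above are consistent with the harmonic mean of the reported precision and recall, which the last lines check.

```python
import numpy as np


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def topk_metrics(scores: np.ndarray, targets: np.ndarray, k: int):
    """Overall precision, recall and F1 (in percent) for top-k predictions.

    scores:  (num_samples, num_classes) real-valued class scores.
    targets: (num_samples, num_classes) binary ground-truth matrix.
    """
    # Indices of the k highest-scoring classes for each image.
    topk_idx = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    pred = np.zeros_like(targets, dtype=bool)
    np.put_along_axis(pred, topk_idx, True, axis=1)
    # Pool true positives over the whole dataset (assumed "overall" protocol).
    tp = np.logical_and(pred, targets.astype(bool)).sum()
    precision = 100.0 * tp / pred.sum()
    recall = 100.0 * tp / targets.sum()
    return precision, recall, f1_from_pr(precision, recall)


# Toy scores for 3 images and 4 classes.
toy_scores = np.array([
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.8, 0.1, 0.7],
    [0.1, 0.2, 0.6, 0.3],
])
toy_targets = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
])
p1, r1, f1 = topk_metrics(toy_scores, toy_targets, k=1)
print(f"P@1={p1:.2f} R@1={r1:.2f} F1@1={f1:.2f}")

# Consistency check against the ZSL and GZSL rows reported above.
zsl_f1_at_1 = round(f1_from_pr(5.13, 13.68), 2)
gzsl_f1_at_3 = round(f1_from_pr(26.72, 54.23), 2)
print(zsl_f1_at_1, gzsl_f1_at_3)
```

The mAP column is not covered by this sketch.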
Install the environment through conda:
conda env create -f environment.yml

| Backbone | Dataset | Resolution | mAP (ZSL/GZSL) | Download |
|---|---|---|---|---|
| ViT-B/16 | Sewer-ML | 224x224 | 12.58/43.28 | [Google Drive] [Baidu Netdisk] |

Train on Sewer-ML with four GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_addr=localhost --master_port=12355 \
main_mlzsl.py --config_file configs/sewerml.yml

Evaluate a trained checkpoint with four GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_addr=localhost --master_port=12355 \
main_mlzsl.py --config_file configs/sewerml.yml MODEL.LOAD True TEST.EVAL True TEST.WEIGHT sewerml_best.pth

Evaluate without distributed launch:
python main_mlzsl.py --config_file configs/sewerml.yml MODEL.DIST_TRAIN False MODEL.LOAD True TEST.EVAL True TEST.WEIGHT sewerml_best.pth

If you find our work useful, please cite our paper:
@InProceedings{Chen_2026_CVPR,
author = {Chen, Zhao-Min and Huang, Xinjian and Ge, Yisu and Li, Yu},
title = {SFR-Net: Steering-Fusion-Refining Network in Multi-label Zero-Shot Sewer Defect Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2026},
pages = {41636-41645}
}
This repo benefits from RAM, CLIP and CLIP-Adapter. Thanks to the authors for their wonderful work.