Contrastive Transformer Masked Image Hashing for Degraded Image Retrieval
Xiaobo Shen, Haoyu Cai, Xiuwen Gong, Yuhui Zheng
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1218-1226.
https://doi.org/10.24963/ijcai.2024/135
Hashing uses hash codes as compact image representations, offering excellent performance in large-scale image retrieval due to its computational and storage advantages. However, the prevalence of degraded images on social media platforms, resulting from imperfections in the image capture process, poses new challenges for conventional image retrieval methods. To address this issue, we propose Contrastive Transformer Masked Image Hashing (CTMIH), a novel deep unsupervised hashing method specifically designed for degraded image retrieval, a challenging yet relatively understudied problem. CTMIH trains on transformed and masked images, aiming to learn transform-invariant hash codes in an unsupervised manner and thereby mitigate the performance drop caused by image deterioration. CTMIH applies a Vision Transformer (ViT) architecture to image patches to capture long-range semantic relevance. It introduces a cross-view debiased contrastive loss to align hash tokens from augmented views of the same image, and a patch-level semantic mask reconstruction loss to recover masked patch tokens. Extensive empirical studies on three benchmark datasets demonstrate the superiority of the proposed CTMIH over state-of-the-art methods in both degraded and normal image retrieval.
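The two training objectives named in the abstract can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the function names, the hash-token tensors h1/h2, the temperature, and the class-prior weight tau_plus are assumptions. The contrastive term follows the debiased InfoNCE estimator of Chuang et al. (2020), and the reconstruction term assumes an MAE-style mean-squared error restricted to masked patch tokens; the paper's exact formulations may differ.

```python
import torch
import torch.nn.functional as F

def debiased_contrastive_loss(h1, h2, temperature=0.5, tau_plus=0.1):
    """Debiased InfoNCE between hash tokens of two augmented views.

    h1, h2: (B, D) continuous hash tokens from the two views of each image.
    tau_plus: assumed positive-class prior from Chuang et al. (2020);
    the paper's exact debiasing weight is not specified in the abstract.
    """
    B = h1.size(0)
    z = F.normalize(torch.cat([h1, h2], dim=0), dim=1)   # (2B, D)
    sim = torch.exp(z @ z.t() / temperature)             # (2B, 2B)

    # Drop self-similarities, then sum similarities to all other samples.
    off_diag = ~torch.eye(2 * B, dtype=torch.bool, device=z.device)
    neg = sim.masked_select(off_diag).view(2 * B, -1).sum(dim=1)

    # Positive pairs: view i in the first half matches view i in the second.
    pos = torch.exp((z[:B] * z[B:]).sum(dim=-1) / temperature)
    pos = torch.cat([pos, pos], dim=0)
    neg = neg - pos  # remove the positive from the negative pool

    # Debiased negative estimate, clamped as in Chuang et al. (2020).
    N = 2 * B - 2
    Ng = torch.clamp((-N * tau_plus * pos + neg) / (1.0 - tau_plus),
                     min=N * torch.e ** (-1.0 / temperature))

    return -torch.log(pos / (pos + Ng)).mean()

def masked_reconstruction_loss(pred_tokens, target_tokens, mask):
    """MSE over masked patch tokens only (MAE-style assumption).

    pred_tokens, target_tokens: (B, L, D) predicted and target patch tokens.
    mask: (B, L) with 1 marking patches that were masked out.
    """
    per_patch = (pred_tokens - target_tokens).pow(2).mean(dim=-1)  # (B, L)
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)
```

In such a setup the total loss would typically be a weighted sum of the two terms, with the contrastive term encouraging transform-invariant hash tokens across augmented views and the reconstruction term preserving patch-level semantics; the actual weighting in CTMIH is described in the paper itself.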
Keywords:
Computer Vision: CV: Image and video retrieval
Machine Learning: ML: Unsupervised learning