
Intermediary-Generated Bridge Network for RGB-D Cross-Modal Re-Identification

Published: 19 November 2024

Abstract

RGB-D cross-modal person re-identification (re-id) aims to retrieve the person of interest across RGB and depth image modalities. To cope with the modal discrepancy, some existing methods generate an auxiliary modality, either from inherent properties of the input modalities or with extra deep networks. However, these approaches often overlook the useful intermediary role of the generated modality, leading to insufficient exploitation of crucial bridging knowledge. By contrast, in this article, we propose a novel approach that constructs an intermediary modality under the constraints of self-supervised intermediary learning, which is free from modal prior knowledge and additional module parameters. We then design a bridge network to fully mine the intermediary role of the generated modality by carrying out multi-modal integration and decomposition. On the one hand, this network leverages a multi-modal transformer to integrate the information of the three modalities, fully exploiting their heterogeneous relations with the intermediary modality as the bridge, and enforces an identification consistency constraint to promote cross-modal associations. On the other hand, it employs circle contrastive learning to decompose the cross-modal constraint process into several subprocedures, providing intermediate relays while pulling the two original modalities closer. Experiments on two public datasets demonstrate that the proposed method outperforms state-of-the-art methods. The effectiveness of each component is verified through extensive ablation studies, and additional experiments demonstrate the generalization ability of the proposed method.
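To make the described pipeline concrete, below is a minimal, hypothetical sketch of how an intermediary-bridged integration step could look. It is not the authors' implementation: the ResNet-50 backbones, the learned blending weight used to form the intermediary image, the two-layer TransformerEncoder integrator, the shared identity classifier, and the cosine-pull stand-in for circle contrastive learning are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' released code): an intermediary mode is
# formed as a learned blend of the RGB and depth inputs, the three modality
# features are integrated by a small transformer, and the cross-modal pull is
# decomposed into RGB<->intermediary and intermediary<->depth sub-terms.
import torch
import torch.nn as nn
import torchvision.models as models


def resnet_backbone() -> nn.Module:
    # ResNet-50 without the final fc layer; outputs (B, 2048, 1, 1) after pooling.
    return nn.Sequential(*list(models.resnet50(weights=None).children())[:-1])


class BridgeSketch(nn.Module):
    def __init__(self, num_ids: int, dim: int = 2048):
        super().__init__()
        self.rgb_enc = resnet_backbone()
        self.depth_enc = resnet_backbone()   # depth maps replicated to 3 channels
        self.inter_enc = resnet_backbone()
        # Learnable blending weight that forms the intermediary image; the paper
        # instead constrains this mode with self-supervised intermediary
        # learning, which is omitted here.
        self.alpha = nn.Parameter(torch.tensor(0.0))
        # Multi-modal transformer: the three modality features form a 3-token
        # sequence so that RGB and depth can attend to each other through the
        # intermediary token acting as the bridge.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.integrator = nn.TransformerEncoder(layer, num_layers=2)
        # Shared classifier used for an identification consistency constraint.
        self.classifier = nn.Linear(dim, num_ids)

    def forward(self, rgb: torch.Tensor, depth3: torch.Tensor):
        a = torch.sigmoid(self.alpha)
        inter = a * rgb + (1.0 - a) * depth3        # intermediary mode (assumption)

        f_rgb = self.rgb_enc(rgb).flatten(1)        # (B, dim)
        f_dep = self.depth_enc(depth3).flatten(1)   # (B, dim)
        f_int = self.inter_enc(inter).flatten(1)    # (B, dim)

        tokens = self.integrator(torch.stack([f_rgb, f_int, f_dep], dim=1))
        f_rgb, f_int, f_dep = tokens[:, 0], tokens[:, 1], tokens[:, 2]

        # Identification consistency: all three modalities share one ID head.
        logits = [self.classifier(f) for f in (f_rgb, f_int, f_dep)]
        return (f_rgb, f_int, f_dep), logits


def decomposed_pull(f_rgb, f_int, f_dep):
    # Stand-in for circle contrastive learning: assuming same-index samples share
    # an identity, pull RGB<->intermediary and intermediary<->depth instead of
    # forcing RGB and depth together directly.
    sim = nn.functional.cosine_similarity
    return (1 - sim(f_rgb, f_int)).mean() + (1 - sim(f_int, f_dep)).mean()
```

In the paper itself, the intermediary construction is driven by self-supervised constraints rather than a single blending weight, and the decomposition uses circle contrastive learning rather than the cosine pull shown here; the sketch only illustrates where the generated modality sits as a bridge in the forward pass and in the loss.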


    Published In

    ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 6
    December 2024
    727 pages
    EISSN: 2157-6912
    DOI: 10.1145/3613712
    Editor: Huan Liu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 November 2024
    Online AM: 29 July 2024
    Accepted: 18 July 2024
    Revised: 22 March 2024
    Received: 10 November 2023
    Published in TIST Volume 15, Issue 6


    Author Tags

    1. RGB-D cross-modal person re-identification
    2. Auxiliary modal generation
    3. Self-supervised intermediary learning
    4. Heterogeneous relation integration
    5. Cross-modal contrastive learning decomposition

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • National Key Research and Development Programs of China
    • Fundamental Research Funds for the Central Universities
