
Intermediary-Generated Bridge Network for RGB-D Cross-Modal Re-Identification

Published: 19 November 2024

Abstract

RGB-D cross-modal person re-identification (re-id) aims to retrieve the person of interest across RGB and depth image modalities. To cope with the modal discrepancy, some existing methods generate an auxiliary modality, either from inherent properties of the input modalities or with extra deep networks. However, these approaches often overlook the useful intermediary role of the generated modality, leading to insufficient exploitation of crucial bridging knowledge. By contrast, in this article, we propose a novel approach that constructs an intermediary modality under the constraints of self-supervised intermediary learning, which is free from modal prior knowledge and additional module parameters. We then design a bridge network to fully mine the intermediary role of the generated modality by carrying out multi-modal integration and decomposition. On the one hand, this network leverages a multi-modal transformer to integrate the information of the three modalities, fully exploiting their heterogeneous relations with the intermediary modality as the bridge, and enforces an identification consistency constraint to promote cross-modal associations. On the other hand, it employs circle contrastive learning to decompose the cross-modal constraint process into several subprocedures, providing intermediate relays while pulling the two original modalities closer. Experiments on two public datasets demonstrate that the proposed method outperforms state-of-the-art methods. The effectiveness of each component is verified through extensive ablation studies, and additional experiments demonstrate the generalization ability of the proposed method.
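To make the described pipeline concrete, below is a minimal, hypothetical sketch of how an intermediary-bridged integration step could look. It is not the authors' implementation: the ResNet-50 backbones, the learned blending weight used to form the intermediary image, the two-layer TransformerEncoder integrator, the shared identity classifier, and the cosine-pull stand-in for circle contrastive learning are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' released code): an intermediary mode is
# formed as a learned blend of the RGB and depth inputs, the three modality
# features are integrated by a small transformer, and the cross-modal pull is
# decomposed into RGB<->intermediary and intermediary<->depth sub-terms.
import torch
import torch.nn as nn
import torchvision.models as models


def resnet_backbone() -> nn.Module:
    # ResNet-50 without the final fc layer; outputs (B, 2048, 1, 1) after pooling.
    return nn.Sequential(*list(models.resnet50(weights=None).children())[:-1])


class BridgeSketch(nn.Module):
    def __init__(self, num_ids: int, dim: int = 2048):
        super().__init__()
        self.rgb_enc = resnet_backbone()
        self.depth_enc = resnet_backbone()   # depth maps replicated to 3 channels
        self.inter_enc = resnet_backbone()
        # Learnable blending weight that forms the intermediary image; the paper
        # instead constrains this mode with self-supervised intermediary
        # learning, which is omitted here.
        self.alpha = nn.Parameter(torch.tensor(0.0))
        # Multi-modal transformer: the three modality features form a 3-token
        # sequence so that RGB and depth can attend to each other through the
        # intermediary token acting as the bridge.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.integrator = nn.TransformerEncoder(layer, num_layers=2)
        # Shared classifier used for an identification consistency constraint.
        self.classifier = nn.Linear(dim, num_ids)

    def forward(self, rgb: torch.Tensor, depth3: torch.Tensor):
        a = torch.sigmoid(self.alpha)
        inter = a * rgb + (1.0 - a) * depth3        # intermediary mode (assumption)

        f_rgb = self.rgb_enc(rgb).flatten(1)        # (B, dim)
        f_dep = self.depth_enc(depth3).flatten(1)   # (B, dim)
        f_int = self.inter_enc(inter).flatten(1)    # (B, dim)

        tokens = self.integrator(torch.stack([f_rgb, f_int, f_dep], dim=1))
        f_rgb, f_int, f_dep = tokens[:, 0], tokens[:, 1], tokens[:, 2]

        # Identification consistency: all three modalities share one ID head.
        logits = [self.classifier(f) for f in (f_rgb, f_int, f_dep)]
        return (f_rgb, f_int, f_dep), logits


def decomposed_pull(f_rgb, f_int, f_dep):
    # Stand-in for circle contrastive learning: assuming same-index samples share
    # an identity, pull RGB<->intermediary and intermediary<->depth instead of
    # forcing RGB and depth together directly.
    sim = nn.functional.cosine_similarity
    return (1 - sim(f_rgb, f_int)).mean() + (1 - sim(f_int, f_dep)).mean()
```

In the paper itself, the intermediary construction is driven by self-supervised constraints rather than a single blending weight, and the decomposition uses circle contrastive learning rather than the cosine pull shown here; the sketch only illustrates where the generated modality sits as a bridge in the forward pass and in the loss.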


    Published In

    ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 6
    December 2024
    727 pages
    EISSN: 2157-6912
    DOI: 10.1145/3613712
    Editor: Huan Liu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 November 2024
    Online AM: 29 July 2024
    Accepted: 18 July 2024
    Revised: 22 March 2024
    Received: 10 November 2023
    Published in TIST Volume 15, Issue 6


    Author Tags

    1. RGB-D cross-modal person re-identification
    2. Auxiliary modal generation
    3. Self-supervised intermediary learning
    4. Heterogeneous relation integration
    5. Cross-modal contrastive learning decomposition

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • National Key Research and Development Programs of China
    • Fundamental Research Funds for the Central Universities
