{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:17:08Z","timestamp":1750220228294,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":68,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,27]],"date-time":"2022-06-27T00:00:00Z","timestamp":1656288000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"the National Key Research and Development Program of China","award":["2018YFB1402600"],"award-info":[{"award-number":["2018YFB1402600"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,27]]},"DOI":"10.1145\/3512527.3531359","type":"proceedings-article","created":{"date-parts":[[2022,6,23]],"date-time":"2022-06-23T22:23:32Z","timestamp":1656023012000},"page":"508-517","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation"],"prefix":"10.1145","author":[{"given":"Zhidan","family":"Liu","sequence":"first","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}]},{"given":"Zhen","family":"Xing","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}]},{"given":"Xiangdong","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}]},{"given":"Yijiang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}]},{"given":"Guichun","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2022,6,27]]},"reference":[{"key":"e_1_3_2_2_1_1","first-page":"7","article-title":"Self-Supervised MultiModal Versatile Networks","volume":"2","author":"Alayrac Jean-Baptiste","year":"2020","unstructured":"Jean-Baptiste Alayrac , Adria Recasens , Rosalia Schneider , Relja Arandjelovic , Jason Ramapuram , Jeffrey De Fauw , Lucas Smaira , Sander Dieleman , and Andrew Zisserman . 2020 . Self-Supervised MultiModal Versatile Networks . NeurIPS , Vol. 2 , 6 (2020), 7 . Jean-Baptiste Alayrac, Adria Recasens, Rosalia Schneider, Relja Arandjelovic, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, and Andrew Zisserman. 2020. Self-Supervised MultiModal Versatile Networks. NeurIPS, Vol. 2, 6 (2020), 7.","journal-title":"NeurIPS"},{"key":"e_1_3_2_2_2_1","volume-title":"International Conference on Machine Learning (ICML). PMLR, 1597--1607","author":"Chen Ting","year":"2020","unstructured":"Ting Chen , Simon Kornblith , Mohammad Norouzi , and Geoffrey Hinton . 2020 b. A simple framework for contrastive learning of visual representations . In International Conference on Machine Learning (ICML). PMLR, 1597--1607 . Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020 b. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning (ICML). PMLR, 1597--1607."},{"key":"e_1_3_2_2_3_1","volume-title":"2020 c. Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029","author":"Chen Ting","year":"2020","unstructured":"Ting Chen , Simon Kornblith , Kevin Swersky , Mohammad Norouzi , and Geoffrey Hinton . 2020 c. Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029 ( 2020 ). Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey Hinton. 2020 c. Big self-supervised models are strong semi-supervised learners. arXiv preprint arXiv:2006.10029 (2020)."},{"key":"e_1_3_2_2_4_1","volume-title":"2020 a. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297","author":"Chen Xinlei","year":"2020","unstructured":"Xinlei Chen , Haoqi Fan , Ross Girshick , and Kaiming He . 2020 a. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297 ( 2020 ). Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He. 2020 a. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297 (2020)."},{"key":"e_1_3_2_2_5_1","volume-title":"The MOPED framework: Object recognition and pose estimation for manipulation. The international journal of robotics research","author":"Collet Alvaro","year":"2011","unstructured":"Alvaro Collet , Manuel Martinez , and Siddhartha S Srinivasa . 2011. The MOPED framework: Object recognition and pose estimation for manipulation. The international journal of robotics research , Vol. 30 , 10 ( 2011 ), 1284--1306. Alvaro Collet, Manuel Martinez, and Siddhartha S Srinivasa. 2011. The MOPED framework: Object recognition and pose estimation for manipulation. The international journal of robotics research, Vol. 30, 10 (2011), 1284--1306."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01281"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00192"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196714"},{"key":"e_1_3_2_2_10_1","volume-title":"Christopher KI Williams, John Winn, and Andrew Zisserman.","author":"Everingham Mark","year":"2010","unstructured":"Mark Everingham , Luc Van Gool , Christopher KI Williams, John Winn, and Andrew Zisserman. 2010 . The pascal visual object classes (voc) challenge. International journal of computer vision (IJCV) , Vol. 88 , 2 (2010), 303--338. Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. 2010. The pascal visual object classes (voc) challenge. International journal of computer vision (IJCV), Vol. 88, 2 (2010), 303--338."},{"key":"e_1_3_2_2_11_1","volume-title":"International Conference on Machine Learning (ICML). PMLR, 1126--1135","author":"Finn Chelsea","year":"2017","unstructured":"Chelsea Finn , Pieter Abbeel , and Sergey Levine . 2017 . Model-agnostic meta-learning for fast adaptation of deep networks . In International Conference on Machine Learning (ICML). PMLR, 1126--1135 . Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (ICML). PMLR, 1126--1135."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460426.3463609"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00319"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00231"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939754"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.309"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.100"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00201"},{"key":"e_1_3_2_2_20_1","volume-title":"Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531","author":"Hinton Geoffrey","year":"2015","unstructured":"Geoffrey Hinton , Oriol Vinyals , and Jeff Dean . 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 ( 2015 ). Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)."},{"key":"e_1_3_2_2_21_1","volume-title":"Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670","author":"Hjelm R Devon","year":"2018","unstructured":"R Devon Hjelm , Alex Fedorov , Samuel Lavoie-Marchildon , Karan Grewal , Phil Bachman , Adam Trischler , and Yoshua Bengio . 2018. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 ( 2018 ). R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. 2018. Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018)."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.336"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01336"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00375"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_2_2_26_1","unstructured":"David Lopez-Paz L\u00e9on Bottou Bernhard Sch\u00f6lkopf and Vladimir Vapnik. 2016. Unifying distillation and privileged information. (2016).  David Lopez-Paz L\u00e9on Bottou Bernhard Sch\u00f6lkopf and Vladimir Vapnik. 2016. Unifying distillation and privileged information. (2016)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00217"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2015.2513408"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.597"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00403"},{"key":"e_1_3_2_2_31_1","volume-title":"Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748","author":"van den Oord Aaron","year":"2018","unstructured":"Aaron van den Oord , Yazhe Li , and Oriol Vinyals . 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 ( 2018 ). Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.01006"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01072"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00409"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00944"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989233"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00342"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) .","author":"Qi Charles R.","key":"e_1_3_2_2_38_1","unstructured":"Charles R. Qi , Hao Su , Kaichun Mo , and Leonidas J. Guibas . 2017. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ."},{"key":"e_1_3_2_2_39_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021 . Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020 (2021). Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020 (2021)."},{"key":"e_1_3_2_2_40_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.308"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.36"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00314"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2950449"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00947"},{"key":"e_1_3_2_2_46_1","volume-title":"Contrastive Representation Distillation. In International Conference on Learning Representations (ICLR) .","author":"Tian Yonglong","year":"2019","unstructured":"Yonglong Tian , Dilip Krishnan , and Phillip Isola . 2019 . Contrastive Representation Distillation. In International Conference on Learning Representations (ICLR) . Yonglong Tian, Dilip Krishnan, and Phillip Isola. 2019. Contrastive Representation Distillation. In International Conference on Learning Representations (ICLR) ."},{"key":"e_1_3_2_2_47_1","volume-title":"Proceedings, Part XI 16","author":"Tian Yonglong","year":"2020","unstructured":"Yonglong Tian , Dilip Krishnan , and Phillip Isola . 2020 . Contrastive multiview coding. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020 , Proceedings, Part XI 16 . Springer, 776--794. Yonglong Tian, Dilip Krishnan, and Phillip Isola. 2020. Contrastive multiview coding. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XI 16. Springer, 776--794."},{"key":"e_1_3_2_2_48_1","volume-title":"Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects. In Conference on Robot Learning. PMLR, 306--316","author":"Tremblay Jonathan","year":"2018","unstructured":"Jonathan Tremblay , Thang To , Balakumar Sundaralingam , Yu Xiang , Dieter Fox , and Stan Birchfield . 2018 . Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects. In Conference on Robot Learning. PMLR, 306--316 . Jonathan Tremblay, Thang To, Balakumar Sundaralingam, Yu Xiang, Dieter Fox, and Stan Birchfield. 2018. Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects. In Conference on Robot Learning. PMLR, 306--316."},{"key":"e_1_3_2_2_49_1","volume-title":"British Machine Vision Conference (BMVC) .","author":"Tseng Hung-Yu","year":"2019","unstructured":"Hung-Yu Tseng , Shalini De Mello , Jonathan Tremblay , Sifei Liu , Stan Birchfield , Ming-Hsuan Yang , and Jan Kautz . 2019 . Few-shot viewpoint estimation . In British Machine Vision Conference (BMVC) . Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Stan Birchfield, Ming-Hsuan Yang, and Jan Kautz. 2019. Few-shot viewpoint estimation. In British Machine Vision Conference (BMVC) ."},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298758"},{"key":"e_1_3_2_2_51_1","volume-title":"A new learning paradigm: Learning using privileged information. Neural networks","author":"Vapnik Vladimir","year":"2009","unstructured":"Vladimir Vapnik and Akshay Vashist . 2009. A new learning paradigm: Learning using privileged information. Neural networks , Vol. 22 , 5--6 ( 2009 ), 544--557. Vladimir Vapnik and Akshay Vashist. 2009. A new learning paradigm: Learning using privileged information. Neural networks, Vol. 22, 5--6 (2009), 544--557."},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_7"},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00275"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.320"},{"key":"e_1_3_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00163"},{"key":"e_1_3_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00393"},{"key":"e_1_3_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_10"},{"key":"e_1_3_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2014.6836101"},{"key":"e_1_3_2_2_59_1","volume-title":"PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild with Pose-Aware Contrastive Learning. In International Conference on 3D Vision (3DV) .","author":"Xiao Yang","year":"2021","unstructured":"Yang Xiao , Yuming Du , and Renaud Marlet . 2021 . PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild with Pose-Aware Contrastive Learning. In International Conference on 3D Vision (3DV) . Yang Xiao, Yuming Du, and Renaud Marlet. 2021. PoseContrast: Class-Agnostic Object Viewpoint Estimation in the Wild with Pose-Aware Contrastive Learning. In International Conference on 3D Vision (3DV) ."},{"key":"e_1_3_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_12"},{"key":"e_1_3_2_2_61_1","volume-title":"British Machine Vision Conference (BMVC) .","author":"Xiao Yang","year":"2019","unstructured":"Yang Xiao , Xuchong Qiu , Pierre-Alain Langlois , Mathieu Aubry , and Renaud Marlet . 2019 . Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects . In British Machine Vision Conference (BMVC) . Yang Xiao, Xuchong Qiu, Pierre-Alain Langlois, Mathieu Aubry, and Renaud Marlet. 2019. Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects. In British Machine Vision Conference (BMVC) ."},{"key":"e_1_3_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00692"},{"key":"e_1_3_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460426.3463668"},{"key":"e_1_3_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00381"},{"key":"e_1_3_2_2_65_1","first-page":"18123","article-title":"Counterfactual contrastive learning for weakly-supervised vision-language grounding","volume":"33","author":"Zhang Zhu","year":"2020","unstructured":"Zhu Zhang , Zhou Zhao , Zhijie Lin , Xiuqiang He , 2020 . Counterfactual contrastive learning for weakly-supervised vision-language grounding . Advances in Neural Information Processing Systems , Vol. 33 (2020), 18123 -- 18134 . Zhu Zhang, Zhou Zhao, Zhijie Lin, Xiuqiang He, et al. 2020. Counterfactual contrastive learning for weakly-supervised vision-language grounding. Advances in Neural Information Processing Systems, Vol. 33 (2020), 18123--18134.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_20"},{"key":"e_1_3_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2014.6907430"},{"key":"e_1_3_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460426.3463648"}],"event":{"name":"ICMR '22: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Newark NJ USA","acronym":"ICMR '22"},"container-title":["Proceedings of the 2022 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3512527.3531359","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3512527.3531359","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:11Z","timestamp":1750188611000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3512527.3531359"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,27]]},"references-count":68,"alternative-id":["10.1145\/3512527.3531359","10.1145\/3512527"],"URL":"https:\/\/doi.org\/10.1145\/3512527.3531359","relation":{},"subject":[],"published":{"date-parts":[[2022,6,27]]},"assertion":[{"value":"2022-06-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}