{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:55:21Z","timestamp":1750308921240,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":22,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,2,25]],"date-time":"2023-02-25T00:00:00Z","timestamp":1677283200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Exascale Computing Project","award":["17-SC-20-SC"],"award-info":[{"award-number":["17-SC-20-SC"]}]},{"name":"National Science Foundation, USA","award":["2113996"],"award-info":[{"award-number":["2113996"]}]},{"name":"Argonne Leadership Computing Facility, DOE Office of Science User Facility","award":["DE-AC02-06CH11357"],"award-info":[{"award-number":["DE-AC02-06CH11357"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,2,25]]},"DOI":"10.1145\/3587278.3595644","type":"proceedings-article","created":{"date-parts":[[2023,7,5]],"date-time":"2023-07-05T14:53:46Z","timestamp":1688568826000},"page":"1-6","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8820-6016","authenticated-orcid":false,"given":"Gaurav","family":"Verma","sequence":"first","affiliation":[{"name":"Stony Brook University, Stony Brook, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4832-0834","authenticated-orcid":false,"given":"Siddhisanket","family":"Raskar","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Illinois, Chicago, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6390-0255","authenticated-orcid":false,"given":"Zhen","family":"Xie","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Illinois, Chicago, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0706-2824","authenticated-orcid":false,"given":"Abid M","family":"Malik","sequence":"additional","affiliation":[{"name":"Brookhaven National Laboratory, Upton, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6279-0007","authenticated-orcid":false,"given":"Murali","family":"Emani","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Illinois, Chicago, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8449-8579","authenticated-orcid":false,"given":"Barbara","family":"Chapman","sequence":"additional","affiliation":[{"name":"Stony Brook University, Stony Brook, New York, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,7,5]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Chameleon: Adaptive code optimization for expedited deep neural network compilation. arXiv preprint arXiv:2001.08743","author":"Ahn Byung Hoon","year":"2020","unstructured":"Byung Hoon Ahn , Prannoy Pilligundla , Amir Yazdanbakhsh , and Hadi Esmaeilzadeh . 2020 . Chameleon: Adaptive code optimization for expedited deep neural network compilation. arXiv preprint arXiv:2001.08743 (2020). Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, and Hadi Esmaeilzadeh. 2020. Chameleon: Adaptive code optimization for expedited deep neural network compilation. arXiv preprint arXiv:2001.08743 (2020)."},{"key":"e_1_3_2_1_2_1","volume-title":"Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang.","author":"Bradbury James","year":"2018","unstructured":"James Bradbury , Roy Frostig , Peter Hawkins , Matthew James Johnson , Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018 . JAX: composable transformations of Python +NumPy programs. http:\/\/github.com\/google\/jax James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018. JAX: composable transformations of Python+NumPy programs. http:\/\/github.com\/google\/jax"},{"key":"e_1_3_2_1_3_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen , Thierry Moreau , Ziheng Jiang , Lianmin Zheng , Eddie Yan , Haichen Shen , Meghan Cowan , Leyuan Wang , Yuwei Hu , Luis Ceze , 2018 . {TVM}: An automated {End-to-End} optimizing compiler for deep learning . In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) . 578--594. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. {TVM}: An automated {End-to-End} optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 578--594."},{"key":"e_1_3_2_1_4_1","volume-title":"Learning to optimize tensor programs. Advances in Neural Information Processing Systems 31","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen , Lianmin Zheng , Eddie Yan , Ziheng Jiang , Thierry Moreau , Luis Ceze , Carlos Guestrin , and Arvind Krishnamurthy . 2018. Learning to optimize tensor programs. Advances in Neural Information Processing Systems 31 ( 2018 ). Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. Learning to optimize tensor programs. Advances in Neural Information Processing Systems 31 (2018)."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.24"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3559009.3569682"},{"key":"e_1_3_2_1_7_1","first-page":"387","article-title":"A learned performance model for tensor processing units","volume":"3","author":"Kaufman Sam","year":"2021","unstructured":"Sam Kaufman , Phitchaya Phothilimthana , Yanqi Zhou , Charith Mendis , Sudip Roy , Amit Sabne , and Mike Burrows . 2021 . A learned performance model for tensor processing units . Proceedings of Machine Learning and Systems 3 (2021), 387 -- 400 . Sam Kaufman, Phitchaya Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, and Mike Burrows. 2021. A learned performance model for tensor processing units. Proceedings of Machine Learning and Systems 3 (2021), 387--400.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_8_1","volume-title":"International Conference on machine learning. PMLR, 4505--4515","author":"Mendis Charith","year":"2019","unstructured":"Charith Mendis , Alex Renda , Saman Amarasinghe , and Michael Carbin . 2019 . Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks . In International Conference on machine learning. PMLR, 4505--4515 . Charith Mendis, Alex Renda, Saman Amarasinghe, and Michael Carbin. 2019. Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks. In International Conference on machine learning. PMLR, 4505--4515."},{"key":"e_1_3_2_1_9_1","unstructured":"Charith Mendis Cambridge Yang Yewen Pu Dr Amarasinghe Michael Carbin etal 2019. Compiler auto-vectorization with imitation learning. Advances in Neural Information Processing Systems 32 (2019).  Charith Mendis Cambridge Yang Yewen Pu Dr Amarasinghe Michael Carbin et al. 2019. Compiler auto-vectorization with imitation learning. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-95953-1_7"},{"key":"e_1_3_2_1_11_1","volume-title":"Glow: Graph lowering compiler techniques for neural networks. arXiv preprint arXiv:1805.00907","author":"Rotem Nadav","year":"2018","unstructured":"Nadav Rotem , Jordan Fix , Saleem Abdulrasool , Garret Catron , Summer Deng , Roman Dzhabarov , Nick Gibson , James Hegeman , Meghan Lele , Roman Levenstein , 2018 . Glow: Graph lowering compiler techniques for neural networks. arXiv preprint arXiv:1805.00907 (2018). Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Garret Catron, Summer Deng, Roman Dzhabarov, Nick Gibson, James Hegeman, Meghan Lele, Roman Levenstein, et al. 2018. Glow: Graph lowering compiler techniques for neural networks. arXiv preprint arXiv:1805.00907 (2018)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3497776.3517774"},{"key":"e_1_3_2_1_13_1","volume-title":"XLA : Compiling Machine Learning for Peak Performance.","author":"Sabne Amit","year":"2020","unstructured":"Amit Sabne . 2020 . XLA : Compiling Machine Learning for Peak Performance. Amit Sabne. 2020. XLA : Compiling Machine Learning for Peak Performance."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380584"},{"key":"e_1_3_2_1_15_1","first-page":"323","article-title":"Value learning for throughput optimization of deep learning workloads","volume":"3","author":"Steiner Benoit","year":"2021","unstructured":"Benoit Steiner , Chris Cummins , Horace He , and Hugh Leather . 2021 . Value learning for throughput optimization of deep learning workloads . Proceedings of Machine Learning and Systems 3 (2021), 323 -- 334 . Benoit Steiner, Chris Cummins, Horace He, and Hugh Leather. 2021. Value learning for throughput optimization of deep learning workloads. Proceedings of Machine Learning and Systems 3 (2021), 323--334.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2010.04.018"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528416.3530251"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW52791.2021.00128"},{"key":"e_1_3_2_1_19_1","volume-title":"SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38","author":"Clinton Whaley R","year":"1998","unstructured":"R Clinton Whaley and Jack J Dongarra . 1998 . Automatically tuned linear algebra software . In SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38 . R Clinton Whaley and Jack J Dongarra. 1998. Automatically tuned linear algebra software. In SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38."},{"key":"e_1_3_2_1_20_1","first-page":"906","article-title":"Autosync: Learning to synchronize for data-parallel distributed deep learning","volume":"33","author":"Zhang Hao","year":"2020","unstructured":"Hao Zhang , Yuan Li , Zhijie Deng , Xiaodan Liang , Lawrence Carin , and Eric Xing . 2020 . Autosync: Learning to synchronize for data-parallel distributed deep learning . Advances in Neural Information Processing Systems 33 (2020), 906 -- 917 . Hao Zhang, Yuan Li, Zhijie Deng, Xiaodan Liang, Lawrence Carin, and Eric Xing. 2020. Autosync: Learning to synchronize for data-parallel distributed deep learning. Advances in Neural Information Processing Systems 33 (2020), 906--917.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_21_1","volume-title":"14th USENIX symposium on operating systems design and implementation (OSDI20)","author":"Zheng Lianmin","year":"2020","unstructured":"Lianmin Zheng , Chengfan Jia , Minmin Sun , Zhao Wu , Cody Hao Yu , Ameer Haj-Ali , Yida Wang , Jun Yang , Danyang Zhuo , Koushik Sen , 2020 . Ansor: Generating {High-Performance} Tensor Programs for Deep Learning . In 14th USENIX symposium on operating systems design and implementation (OSDI20) . 863--879. Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, et al. 2020. Ansor: Generating {High-Performance} Tensor Programs for Deep Learning. In 14th USENIX symposium on operating systems design and implementation (OSDI20). 863--879."},{"key":"e_1_3_2_1_22_1","volume-title":"Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1).","author":"Zheng Lianmin","year":"2021","unstructured":"Lianmin Zheng , Ruochen Liu , Junru Shao , Tianqi Chen , Joseph E Gonzalez , Ion Stoica , and Ameer Haj Ali . 2021 . Tenset: A large-scale program performance dataset for learned tensor compilers . In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1). Lianmin Zheng, Ruochen Liu, Junru Shao, Tianqi Chen, Joseph E Gonzalez, Ion Stoica, and Ameer Haj Ali. 2021. Tenset: A large-scale program performance dataset for learned tensor compilers. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)."}],"event":{"name":"ExHET 23:: 2nd International Workshop on Extreme Heterogeneity Solutions","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing","SIGPLAN ACM Special Interest Group on Programming Languages"],"location":"Montreal QC Canada","acronym":"ExHET 23:"},"container-title":["Proceedings of the 2nd International Workshop on Extreme Heterogeneity Solutions"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3587278.3595644","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3587278.3595644","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:36:38Z","timestamp":1750282598000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3587278.3595644"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,25]]},"references-count":22,"alternative-id":["10.1145\/3587278.3595644","10.1145\/3587278"],"URL":"https:\/\/doi.org\/10.1145\/3587278.3595644","relation":{},"subject":[],"published":{"date-parts":[[2023,2,25]]},"assertion":[{"value":"2023-07-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}