{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T03:38:38Z","timestamp":1776310718306,"version":"3.50.1"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,9,30]],"date-time":"2023-09-30T00:00:00Z","timestamp":1696032000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62192733, 62192731, 61751210, 62072007, 61832009, 62192730"],"award-info":[{"award-number":["62192733, 62192731, 61751210, 62072007, 61832009, 62192730"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2023,11,30]]},"abstract":"<jats:p>Developers often perform repetitive code editing activities (up to 70%) for various reasons (e.g., code refactoring) during software development. Many deep learning (DL) models have been proposed to automate code editing by learning from the code editing history. Among DL-based models, pre-trained code editing models have achieved the state-of-the-art (SOTA) results. Pre-trained models are first pre-trained with pre-training tasks and fine-tuned with the code editing task. Existing pre-training tasks mainly are code infilling tasks (e.g., masked language modeling), which are derived from the natural language processing field and are not designed for automatic code editing.<\/jats:p>\n          <jats:p>\n            In this article, we propose a novel pre-training task specialized in code editing and present an effective pre-trained code editing model named\n            <jats:sc>CodeEditor<\/jats:sc>\n            . Compared to previous code infilling tasks, our pre-training task further improves the performance and generalization ability of code editing models. Specifically, we collect lots of real-world code snippets as the ground truth and use a powerful generator to rewrite them into mutated versions. Then, we pre-train our\n            <jats:sc>CodeEditor<\/jats:sc>\n            to edit mutated versions into the corresponding ground truth, to learn edit patterns. We conduct experiments on four code editing datasets and evaluate the pre-trained\n            <jats:sc>CodeEditor<\/jats:sc>\n            in three settings (i.e., fine-tuning, few-shot, and zero-shot). (1) In the fine-tuning setting, we train the pre-trained\n            <jats:sc>CodeEditor<\/jats:sc>\n            with four datasets and evaluate it on the test data.\n            <jats:sc>CodeEditor<\/jats:sc>\n            outperforms the SOTA baselines by 15%, 25.5%, 9.4%, and 26.6% on four datasets. (2) In the few-shot setting, we train the pre-trained\n            <jats:sc>CodeEditor<\/jats:sc>\n            with limited data and evaluate it on the test data.\n            <jats:sc>CodeEditor<\/jats:sc>\n            substantially performs better than all baselines, even outperforming baselines that are fine-tuned with all data. 
(3) In the zero-shot setting, we evaluate the pre-trained <jats:sc>CodeEditor<\/jats:sc> on the test data without training. <jats:sc>CodeEditor<\/jats:sc> correctly edits 1,113 programs, while the SOTA baselines do not work in this setting. The results demonstrate the superiority of our pre-training task and show that the pre-trained <jats:sc>CodeEditor<\/jats:sc> is more effective in automatic code editing.<\/jats:p>","DOI":"10.1145\/3597207","type":"journal-article","created":{"date-parts":[[2023,5,22]],"date-time":"2023-05-22T12:02:37Z","timestamp":1684756957000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["<scp>CodeEditor<\/scp>: Learning to Edit Source Code with Pre-trained Models"],"prefix":"10.1145","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5579-8852","authenticated-orcid":false,"given":"Jia","family":"Li","sequence":"first","affiliation":[{"name":"Key Lab of High Confidence Software Technology, MoE, School of Computer Science, Peking University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5828-0186","authenticated-orcid":false,"given":"Ge","family":"Li","sequence":"additional","affiliation":[{"name":"Key Lab of High Confidence Software Technology, MoE, School of Computer Science, Peking University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0198-2304","authenticated-orcid":false,"given":"Zhuo","family":"Li","sequence":"additional","affiliation":[{"name":"Key Lab of High Confidence Software Technology, MoE, School of Computer Science, Peking University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1087-226X","authenticated-orcid":false,"given":"Zhi","family":"Jin","sequence":"additional","affiliation":[{"name":"Key Lab of High Confidence Software Technology, MoE, School of Computer Science, Peking University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0093-3292","authenticated-orcid":false,"given":"Xing","family":"Hu","sequence":"additional","affiliation":[{"name":"Zhejiang University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3290-0244","authenticated-orcid":false,"given":"Kechi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Key Lab of High Confidence Software Technology, MoE, School of Computer Science, Peking University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0260-6404","authenticated-orcid":false,"given":"Zhiyi","family":"Fu","sequence":"additional","affiliation":[{"name":"Key Lab of High Confidence Software Technology, MoE, School of Computer Science, Peking University, China"}]}],"member":"320","published-online":{"date-parts":[[2023,9,30]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"2655","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Ahmad Wasi","year":"2021","unstructured":"Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2021. Unified Pre-training for program understanding and generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
2655\u20132668."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ESEM.2013.23"},{"issue":"4","key":"e_1_3_2_4_2","doi-asserted-by":"crossref","first-page":"1385","DOI":"10.1109\/TSE.2020.3020502","article-title":"Codit: Code editing with tree-based neural models","volume":"48","author":"Chakraborty Saikat","year":"2020","unstructured":"Saikat Chakraborty, Yangruibo Ding, Miltiadis Allamanis, and Baishakhi Ray. 2020. Codit: Code editing with tree-based neural models. IEEE Transactions on Software Engineering 48, 4 (2020), 1385\u20131399.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_5_2","first-page":"443","volume-title":"Proceedings of the 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201921)","author":"Chakraborty Saikat","year":"2021","unstructured":"Saikat Chakraborty and Baishakhi Ray. 2021. On multi-modal learning of editing source code. In Proceedings of the 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201921). IEEE, 443\u2013455."},{"key":"e_1_3_2_6_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Clark Kevin","year":"2019","unstructured":"Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2019. ELECTRA: Pre-training text encoders as discriminators rather than generators. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_7_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171\u20134186."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556903"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"1536","DOI":"10.18653\/v1\/2020.findings-emnlp.139","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Feng Zhangyin","year":"2020","unstructured":"Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et\u00a0al. 2020. CodeBERT: A Pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. 1536\u20131547."},{"key":"e_1_3_2_10_2","unstructured":"GitHub. 2022. Real-world code changes. https:\/\/github.com\/apache\/hadoop\/pull\/4670\/files#diffdac9de4dd225110eff2f29a44000bf32705f02df2b3fcf17b5d89bc236c12f01."},{"key":"e_1_3_2_11_2","volume-title":"Proceeding os the International Conference on Learning Representations","author":"Guo Daya","year":"2020","unstructured":"Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, LIU Shujie, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, et\u00a0al. 2020. GraphCodeBERT: Pre-training code representations with data flow. In Proceeding os the International Conference on Learning Representations."},{"key":"e_1_3_2_12_2","unstructured":"Hamel Husain Ho-Hsiang Wu Tiferet Gazit Miltiadis Allamanis and Marc Brockschmidt. 2019. 
Codesearchnet challenge: Evaluating the state of semantic code search. CoRR abs\/1909.09436 (2019). arXiv:1909.09436 http:\/\/arxiv.org\/abs\/1909.09436."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_2_14_2","first-page":"155","volume-title":"Proceedings of the 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201921)","author":"Li Jia","year":"2021","unstructured":"Jia Li, Yongmin Li, Ge Li, Xing Hu, Xin Xia, and Zhi Jin. 2021. Editsum: A retrieve-and-edit framework for source code summarization. In Proceedings of the 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201921). IEEE, 155\u2013166."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.06144"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2210.17029"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.17780"},{"key":"e_1_3_2_18_2","unstructured":"Xiaonan Li Yeyun Gong Yelong Shen Xipeng Qiu Hang Zhang Bolun Yao Weizhen Qi Daxin Jiang Weizhu Chen and Nan Duan. 2022. CodeRetriever: Unimodal and bimodal contrastive learning. CoRR abs\/2201.10866 (2022). arXiv:2201.10866. https:\/\/arxiv.org\/abs\/2201.10866."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416591"},{"key":"e_1_3_2_20_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs\/1907.11692 (2019). arXiv:1907.11692. http:\/\/arxiv.org\/abs\/1907.11692."},{"key":"e_1_3_2_21_2","volume-title":"Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)","author":"Lu Shuai","year":"2021","unstructured":"Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, et\u00a0al. 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. In Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)."},{"key":"e_1_3_2_22_2","first-page":"336","volume-title":"Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE\u201921)","author":"Mastropaolo Antonio","year":"2021","unstructured":"Antonio Mastropaolo, Simone Scalabrino, Nathan Cooper, David Nader Palacio, Denys Poshyvanyk, Rocco Oliveto, and Gabriele Bavota. 2021. Studying the usage of text-to-text transfer transformer to support code-related tasks. In Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE\u201921). IEEE, 336\u2013347."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/2950290.2950333"},{"key":"e_1_3_2_24_2","first-page":"180","volume-title":"Proceedings of the 2013 28th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201913)","author":"Nguyen Hoan Anh","year":"2013","unstructured":"Hoan Anh Nguyen, Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, and Hridesh Rajan. 2013. A study of repetitiveness of code changes in software evolution. In Proceedings of the 2013 28th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201913). 
IEEE, 180\u2013190."},{"key":"e_1_3_2_25_2","volume-title":"Proceedings of the 2022 IEEE\/ACM 44th International Conference on Software Engineering (ICSE\u201922)","author":"Niu Changan","year":"2022","unstructured":"Changan Niu, Chuanyi Li, Vincent Ng, Jidong Ge, Liguo Huang, and Bin Luo. 2022. SPT-Code: Sequence-to-sequence Pre-Training for learning the representation of source code. In Proceedings of the 2022 IEEE\/ACM 44th International Conference on Software Engineering (ICSE\u201922). IEEE."},{"key":"e_1_3_2_26_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311\u2013318."},{"key":"e_1_3_2_27_2","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog 1 8 (2019) 9. https:\/\/cdn.openai.com\/better-language-models\/language_models_are_unsupervised_multitask_learners.pdf."},{"key":"e_1_3_2_28_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21 (2020), 1\u201367.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_29_2","first-page":"367","volume-title":"Proceedings of the 2013 28th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201913)","author":"Ray Baishakhi","year":"2013","unstructured":"Baishakhi Ray, Miryung Kim, Suzette Person, and Neha Rungta. 2013. Detecting and characterizing semantic inconsistencies in ported code. In Proceedings of the 2013 28th IEEE\/ACM International Conference on Automated Software Engineering (ASE\u201913). IEEE, 367\u2013377."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1355"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6430"},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the 2022 IEEE\/ACM 44st International Conference on Software Engineering (ICSE\u201922)","author":"Thongtanunam Patanamon","year":"2022","unstructured":"Patanamon Thongtanunam, Chanathip Pornprasit, and Chakkrit Tantithamthavorn. 2022. AutoTransform: Automated code transformation to support modern code review process. In Proceedings of the 2022 IEEE\/ACM 44st International Conference on Software Engineering (ICSE\u201922). IEEE."},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1109\/ICSE.2019.00021","volume-title":"Proceedings of the 2019 IEEE\/ACM 41st International Conference on Software Engineering (ICSE\u201919)","author":"Tufano Michele","year":"2019","unstructured":"Michele Tufano, Jevgenija Pantiuchina, Cody Watson, Gabriele Bavota, and Denys Poshyvanyk. 2019. On learning meaningful code changes via neural machine translation. 
In Proceedings of the 2019 IEEE\/ACM 41st International Conference on Software Engineering (ICSE\u201919). IEEE, 25\u201336."},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Rosalia Tufano Simone Masiero Antonio Mastropaolo Luca Pascarella Denys Poshyvanyk and Gabriele Bavota. 2022. Using Pre-trained models to boost code review automation. In Proceedings of the 44th International Conference on Software Engineering. 2291\u20132302.","DOI":"10.1145\/3510003.3510621"},{"key":"e_1_3_2_36_2","first-page":"163","volume-title":"Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE\u201921)","author":"Tufano Rosalia","year":"2021","unstructured":"Rosalia Tufano, Luca Pascarella, Michele Tufano, Denys Poshyvanyk, and Gabriele Bavota. 2021. Towards automating code review activities. In Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE\u201921). IEEE, 163\u2013174."},{"key":"e_1_3_2_37_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 December 4-9 2017 Long Beach CA USA Isabelle Guyon Ulrike von Luxburg Samy Bengio Hanna M. Wallach Rob Fergus S. V. N. Vishwanathan and Roman Garnett (Eds.). 5998\u20136008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html."},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"8696","DOI":"10.18653\/v1\/2021.emnlp-main.685","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Wang Yue","year":"2021","unstructured":"Yue Wang, Weishi Wang, Shafiq Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware unified Pre-trained Encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 8696\u20138708."},{"key":"e_1_3_2_39_2","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (Demo Track)","author":"Yin Pengcheng","year":"2018","unstructured":"Pengcheng Yin and Graham Neubig. 2018. TRANX: A transition-based neural abstract syntax parser for semantic parsing and code generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (Demo Track)."},{"key":"e_1_3_2_40_2","doi-asserted-by":"crossref","first-page":"571","DOI":"10.18653\/v1\/2021.emnlp-main.45","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Zhou Wangchunshu","year":"2021","unstructured":"Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, and Furu Wei. 2021. Improving Sequence-to-sequence Pre-training via sequence span rewriting. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 
571\u2013582."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3597207","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3597207","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:06Z","timestamp":1750182546000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3597207"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,30]]},"references-count":39,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,11,30]]}},"alternative-id":["10.1145\/3597207"],"URL":"https:\/\/doi.org\/10.1145\/3597207","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,30]]},"assertion":[{"value":"2022-09-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-04-07","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}