{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T19:24:54Z","timestamp":1769023494012,"version":"3.49.0"},"reference-count":61,"publisher":"Institution of Engineering and Technology (IET)","issue":"8","license":[{"start":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T00:00:00Z","timestamp":1730246400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Computer Vision"],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Object detection in remote sensing images aims to interpret images to obtain information on the category and location of potential targets, which is of great importance in traffic detection, marine supervision, and space reconnaissance. However, the complex backgrounds and large scale variations in remote sensing images present significant challenges. Traditional methods relied mainly on image filtering or feature descriptors to extract features, resulting in poor performance. Deep learning methods, especially one\u2010stage detectors such as the Real\u2010Time Object Detector (RTMDet), offer advanced solutions with efficient network architectures. Nevertheless, difficulty in extracting features from complex backgrounds and in localising targets in images with large scale variations limits detection accuracy. In this paper, an improved detector based on RTMDet, called the Multi\u2010Scale Feature Extraction\u2010assist RTMDet (MRTMDet), is proposed, which addresses these limitations through enhanced feature extraction and fusion networks. 
At the core of MRTMDet are a new backbone network, MobileViT++, and a feature fusion network, SFC\u2010FPN, which enhance the model's ability to capture global and multi\u2010scale features through carefully designed hybrid CNN and transformer feature processing units based on the vision transformer (ViT) and poly\u2010scale convolution (PSConv), respectively. Experiments on DIOR\u2010R demonstrate that MRTMDet achieves a competitive 62.2%\u00a0mAP, balancing precision with a lightweight design.<\/jats:p>","DOI":"10.1049\/cvi2.12317","type":"journal-article","created":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T12:00:40Z","timestamp":1730289640000},"page":"1223-1234","update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Multi\u2010scale feature extraction for energy\u2010efficient object detection in remote sensing images"],"prefix":"10.1049","volume":"18","author":[{"given":"Di","family":"Wu","sequence":"first","affiliation":[{"name":"Hebei Key Laboratory of Optical Fiber Biosensing and Communication Devices Institute of Information Technology Handan University  Handan China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5077-9300","authenticated-orcid":false,"given":"Hongning","family":"Liu","sequence":"additional","affiliation":[{"name":"College of Electrical and Mechanical Engineering Dalian Minzu University  Dalian China"}]},{"given":"Jiawei","family":"Xu","sequence":"additional","affiliation":[{"name":"Space Star Technology Co., Ltd  Beijing China"}]},{"given":"Fei","family":"Xie","sequence":"additional","affiliation":[{"name":"Hebei Key Laboratory of Optical Fiber Biosensing and Communication Devices Institute of Information Technology Handan University  Handan 
China"}]}],"member":"265","published-online":{"date-parts":[[2024,10,30]]},"reference":[{"key":"e_1_2_11_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2016.03.014"},{"key":"e_1_2_11_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2019.11.023"},{"key":"e_1_2_11_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00418"},{"key":"e_1_2_11_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2018.04.003"},{"key":"e_1_2_11_6_1","doi-asserted-by":"publisher","DOI":"10.1080\/01431161.2014.999881"},{"key":"e_1_2_11_7_1","first-page":"433","volume-title":"Proceedings of the 2016 4th International Workshop on Earth Observation and Remote Sensing Applications (EORSA)","author":"Cheng G.","year":"2016"},{"key":"e_1_2_11_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/lgrs.2013.2246538"},{"key":"e_1_2_11_9_1","doi-asserted-by":"publisher","DOI":"10.5194\/isprsarchives\u2010xl\u20103\u2010107\u20102014"},{"key":"e_1_2_11_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/tgrs.2014.2374218"},{"key":"e_1_2_11_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_2_11_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_2_11_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00972"},{"key":"e_1_2_11_14_1","article-title":"Faster r\u2010cnn: towards real\u2010time object detection with region proposal networks","volume":"28","author":"Ren S.","year":"2015","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_2_11_15_1","first-page":"6154","volume-title":"Proceedings of the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Cai Z.","year":"2018"},{"key":"e_1_2_11_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_2_11_17_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i4.16426"},{"key":"e_1_2_11_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/tgrs.2021.3062048"},{"key":"e_1_2_11_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2020.2974745"},{"key":"e_1_2_11_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00296"},{"key":"e_1_2_11_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/tgrs.2022.3149780"},{"key":"e_1_2_11_22_1","first-page":"923","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Hou L.","year":"2022"},{"key":"e_1_2_11_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/tgrs.2022.3183022"},{"key":"e_1_2_11_24_1","article-title":"RTMDet: an empirical study of designing real\u2010time object detectors","author":"Lyu C.","year":"2022","journal-title":"arXiv preprint arXiv:2212.07784"},{"key":"e_1_2_11_25_1","article-title":"An image is worth 16x16 words: transformers for image recognition at scale","author":"Dosovitskiy A.","year":"2020","journal-title":"arXiv preprint 
arXiv:2010.11929"},{"key":"e_1_2_11_26_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs13234779"},{"key":"e_1_2_11_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2022.3222818"},{"key":"e_1_2_11_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2022.3144894"},{"key":"e_1_2_11_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2022.3157671"},{"key":"e_1_2_11_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/tgrs.2023.3286826"},{"key":"e_1_2_11_31_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs15051196"},{"key":"e_1_2_11_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTARS.2023.3285259"},{"key":"e_1_2_11_33_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs15030842"},{"key":"e_1_2_11_34_1","first-page":"1440","volume-title":"Proceedings of the Proceedings of the IEEE International Conference on Computer Vision","author":"Girshick R.","year":"2015"},{"key":"e_1_2_11_35_1","first-page":"3520","volume-title":"Proceedings of the Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Xie X.","year":"2021"},{"key":"e_1_2_11_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3594315.3594325"},{"key":"e_1_2_11_37_1","article-title":"Spatial transform decoupling for oriented object detection","author":"Yu H.","year":"2023","journal-title":"arXiv preprint arXiv:2308.10561"},{"key":"e_1_2_11_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00281"},{"key":"e_1_2_11_39_1","article-title":"Interacting embranchment one stage anchor free detector for orientation aerial object detection","author":"Lin Y.","year":"2019","journal-title":"arXiv preprint arXiv:1912.00969"},{"key":"e_1_2_11_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2023.3311416"},{"key":"e_1_2_11_41_1","article-title":"Yolox: exceeding yolo series in 2021","author":"Ge Z.","year":"2021","journal-title":"arXiv preprint arXiv:2107.08430"},{"key":"e_1_2_11_42_1","article-title":"Deep and light\u2010weight 
transformer","author":"Mehta S.","year":"2020","journal-title":"arXiv preprint arXiv:2008.00623"},{"key":"e_1_2_11_43_1","first-page":"12259","volume-title":"Proceedings of the Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Graham B.","year":"2021"},{"key":"e_1_2_11_44_1","article-title":"Separable self\u2010attention for mobile vision transformers","author":"Mehta S.","year":"2022","journal-title":"arXiv preprint arXiv:2206.02680"},{"key":"e_1_2_11_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58589-1_37"},{"key":"e_1_2_11_46_1","article-title":"Mobilevit: light\u2010weight, general\u2010purpose, and mobile\u2010friendly vision transformer","author":"Mehta S.","year":"2021","journal-title":"arXiv preprint arXiv:2110.02178"},{"key":"e_1_2_11_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_2_11_48_1","article-title":"Mobilevitv3: mobile\u2010friendly vision transformer with simple and effective fusion of local, global and input features","author":"Wadekar S.N.","year":"2022","journal-title":"arXiv preprint arXiv:2209.15159"},{"key":"e_1_2_11_49_1","first-page":"12021","volume-title":"Proceedings of the Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Chen J.","year":"2023"},{"key":"e_1_2_11_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"e_1_2_11_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_11_52_1","first-page":"1","volume-title":"Proceedings of the Computer Vision and Pattern Recognition","author":"Farhadi A.","year":"2018"},{"key":"e_1_2_11_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"e_1_2_11_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548541"},{"key":"e_1_2_11_55_1","article-title":"Pytorch: an imperative style, high\u2010performance deep learning library","volume":"32","author":"Paszke 
A.","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_2_11_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_2_11_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00764"},{"key":"e_1_2_11_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00716"},{"key":"e_1_2_11_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"e_1_2_11_60_1","first-page":"6105","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tan M.","year":"2019"},{"key":"e_1_2_11_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2023.3242323"},{"key":"e_1_2_11_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2023.3292418"}],"container-title":["IET Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/cvi2.12317","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T11:37:21Z","timestamp":1761565041000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/cvi2.12317"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,30]]},"references-count":61,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["10.1049\/cvi2.12317"],"URL":"https:\/\/doi.org\/10.1049\/cvi2.12317","archive":["Portico"],"relation":{},"ISSN":["1751-9632","1751-9640"],"issn-type":[{"value":"1751-9632","type":"print"},{"value":"1751-9640","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,30]]},"assertion":[{"value":"2024-02-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-10-07","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}