{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:20:38Z","timestamp":1757618438710,"version":"3.44.0"},"reference-count":70,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T00:00:00Z","timestamp":1751414400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T00:00:00Z","timestamp":1751414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFC3340900"],"award-info":[{"award-number":["2022YFC3340900"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>To address the growing size of AI model training data and the lack of a universal data selection methodology\u2013factors that significantly drive up training costs\u2013this paper presents the General Information Metrics Evaluation (GIME) method. GIME leverages general information metrics from Objective Information Theory (OIT), including <jats:italic>volume<\/jats:italic>, <jats:italic>delay<\/jats:italic>, <jats:italic>scope<\/jats:italic>, <jats:italic>granularity<\/jats:italic>, <jats:italic>variety<\/jats:italic>, <jats:italic>duration<\/jats:italic>, <jats:italic>sampling rate<\/jats:italic>, <jats:italic>aggregation<\/jats:italic>, <jats:italic>coverage<\/jats:italic>, <jats:italic>distortion<\/jats:italic>, and <jats:italic>mismatch<\/jats:italic> to optimize dataset selection for training purposes. Comprehensive experiments conducted across diverse domains, such as CTR Prediction, Civil Case Prediction, and Weather Forecasting, demonstrate that GIME effectively preserves model performance while substantially reducing both training time and costs. Additionally, applying GIME within the Judicial AI Program led to a remarkable 39.56% reduction in total model training expenses, underscoring its potential to support efficient and sustainable AI development.<\/jats:p>","DOI":"10.1007\/s10462-025-11281-z","type":"journal-article","created":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T23:24:49Z","timestamp":1751412289000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["General information metrics for improving AI model training efficiency"],"prefix":"10.1007","volume":"58","author":[{"given":"Jianfeng","family":"Xu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Congcong","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoying","family":"Tan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaojie","family":"Zhu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anpeng","family":"Wu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huan","family":"Wan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weijun","family":"Kong","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chun","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hu","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kun","family":"Kuang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fei","family":"Wu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,7,2]]},"reference":[{"key":"11281_CR1","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1016\/j.protcy.2012.05.047","volume":"4","author":"K Abhishek","year":"2012","unstructured":"Abhishek K, Singh MP, Ghosh S, Anand Abhishek (2012) Weather forecasting model using artificial neural network. Procedia Technol 4:311\u2013318. https:\/\/doi.org\/10.1016\/j.protcy.2012.05.047","journal-title":"Procedia Technol"},{"key":"11281_CR2","doi-asserted-by":"publisher","unstructured":"Achiam J, Steven A, Sandhini A, Lama A, Ilge A, Florencia LA, Diogo A, Janko A, Sam A, Shyamal A, et al. (2023). \u201cGpt-4 technical report\u201d. In: arXiv preprint arXiv:2303.08774. https:\/\/doi.org\/10.48550\/arXiv.2303.08774","DOI":"10.48550\/arXiv.2303.08774"},{"key":"11281_CR3","doi-asserted-by":"publisher","unstructured":"Alzubaidi, L, Mohammed AF, Freek H, Asma S, Jose S, Ye D, Ashish G, Kenneth C, Amin A, Yuantong G (2024). \u201cSSP: self-supervised pertaining technique for classification of shoulder implants in x-ray medical images: a broad experimental study\u201d. In: Artificial Intelligence Review 57.10, p. 261. https:\/\/doi.org\/10.1007\/s10462-024-10878-0","DOI":"10.1007\/s10462-024-10878-0"},{"key":"11281_CR4","unstructured":"Ash RB (2012). Information theory. Courier Corporation"},{"key":"11281_CR5","doi-asserted-by":"publisher","unstructured":"Bandi A, Pydi VSRA, Yudu EVPKK (2023). \u201cThe power of generative ai: A review of requirements, models, input\u2013output formats, evaluation metrics, and challenges\u201d. In: Future Internet 15.8, p. 260. https:\/\/doi.org\/10.3390\/fi15080260","DOI":"10.3390\/fi15080260"},{"issue":"3","key":"11281_CR6","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1162\/neco.1989.1.3.295","volume":"1","author":"HB Barlow","year":"1989","unstructured":"Barlow HB (1989) Unsupervised learning. Neural Comput 1(3):295\u2013311","journal-title":"Neural Comput"},{"issue":"1","key":"11281_CR7","first-page":"281","volume":"13","author":"J Bergstra","year":"2012","unstructured":"Bergstra J, Yoshua B (2012) Random search for hyper-parameter optimization. The J Mach Learn Res 13(1):281\u2013305","journal-title":"The J Mach Learn Res"},{"key":"11281_CR8","doi-asserted-by":"publisher","unstructured":"Bialkova, S (2024). \u201cAI transforming business and everyday life\u201d. In: The rise of AI user applications: Chatbots integration foundations and trends. Springer, pp. 143\u2013165. https:\/\/doi.org\/10.1007\/978-3-031-56471-0_9","DOI":"10.1007\/978-3-031-56471-0_9"},{"key":"11281_CR9","doi-asserted-by":"publisher","unstructured":"Bi\u00e7ici E, Serdarcan D (2023). \u201cEfficiently Sampling in Neural Network Training for Click-Through Rate Prediction\u201d. In: 2023 8th International Conference on Computer Science and Engineering (UBMK). IEEE, pp. 469\u2013472. https:\/\/doi.org\/10.1109\/UBMK59864.2023.10286811","DOI":"10.1109\/UBMK59864.2023.10286811"},{"key":"11281_CR10","unstructured":"Borovykh A, Sander B, Cornelis WO (2017). \u201cConditional time series forecasting with convolutional neural networks\u201d. In: arXiv preprint arXiv:1703.04691"},{"key":"11281_CR11","doi-asserted-by":"crossref","unstructured":"Cai W, Muhan Z, Ya Z (2016). \u201cBatch mode active learning for regression with expected model change\u201d. In: IEEE transactions on neural networks and learning systems 28.7, pp. 1668\u20131681","DOI":"10.1109\/TNNLS.2016.2542184"},{"issue":"240","key":"11281_CR12","first-page":"1","volume":"24","author":"A Chowdhery","year":"2023","unstructured":"Chowdhery A, Sharan N, Jacob D, Maarten B, Gaurav M, Adam R, Paul B, Hyung WC, Charles S, Sebastian G et al (2023) Palm: scaling language modeling with pathways. J Mach Learn Res 24(240):1\u2013113","journal-title":"J Mach Learn Res"},{"key":"11281_CR13","doi-asserted-by":"publisher","unstructured":"Covington, P, Jay A, Emre S (2016). \u201cDeep neural networks for youtube recommendations\u201d. In: Proceedings of the 10th ACM conference on recommender systems, pp. 191\u2013198. https:\/\/doi.org\/10.1145\/2959100.2959190","DOI":"10.1145\/2959100.2959190"},{"key":"11281_CR14","doi-asserted-by":"crossref","unstructured":"Cubuk, ED, Barret Z, Dandelion Mane, VV, Quoc VL (2019). \u201cAutoaugment: Learning augmentation strategies from data\u201d. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 113\u2013123","DOI":"10.1109\/CVPR.2019.00020"},{"key":"11281_CR15","doi-asserted-by":"crossref","unstructured":"Cui C, Yunsheng M, Xu C, Wenqian Y, Yang Z, Kaizhao L, Jintai C, Juanwu L, Zichong Y, Kuei-Da L, et al. (2024). \u201cA survey on multimodal large language models for autonomous driving\u201d. In: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, pp. 958\u2013979. https:\/\/doi.org\/https:\/\/doi.ieeecomputersociety.org\/10.1109\/WACVW60836.2024.00106","DOI":"10.1109\/WACVW60836.2024.00106"},{"key":"11281_CR16","doi-asserted-by":"publisher","unstructured":"Cui, J, Xiaoyu S, Shaochun W (2023). \u201cA survey on legal judgment prediction: Datasets, metrics, models and challenges\u201d. In: IEEE Access. https:\/\/doi.org\/10.1109\/ACCESS.2023.3317083","DOI":"10.1109\/ACCESS.2023.3317083"},{"key":"11281_CR17","doi-asserted-by":"publisher","unstructured":"Danka T, Peter H (2018). \u201cmodAL: A modular active learning framework for Python\u201d. In: arXiv preprint arXiv:1805.00979. https:\/\/doi.org\/10.48550\/arXiv.1805.00979","DOI":"10.48550\/arXiv.1805.00979"},{"key":"11281_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2021.100379","volume":"40","author":"S Dong","year":"2021","unstructured":"Dong S, Ping W, Khushnood A (2021) A survey on deep learning and its applications. Comput Sci Rev 40:100379. https:\/\/doi.org\/10.1016\/j.cosrev.2021.100379","journal-title":"Comput Sci Rev"},{"issue":"1","key":"11281_CR19","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1007\/s12599-023-00834-7","volume":"66","author":"S Feuerriegel","year":"2024","unstructured":"Feuerriegel S, Jochen H, Christian J, Patrick Z (2024) Generative ai. Business Inform Syst Eng 66(1):111\u2013126. https:\/\/doi.org\/10.1007\/s12599-023-00834-7","journal-title":"Business Inform Syst Eng"},{"key":"11281_CR20","unstructured":"Finn C, Pieter A, Sergey L (2017). \u201cModel-agnostic meta-learning for fast adaptation of deep networks\u201d. In: International conference on machine learning. PMLR, pp. 1126\u20131135"},{"issue":"12","key":"11281_CR21","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1145\/3458723","volume":"64","author":"T Gebru","year":"2021","unstructured":"Gebru T, Jamie M, Briana V, Jennifer WV, Hanna W, Hal Daum\u00e9 I, Kate C (2021) Datasheets for datasets. Commun ACM 64(12):86\u201392. https:\/\/doi.org\/10.1145\/3458723","journal-title":"Commun ACM"},{"issue":"11","key":"11281_CR22","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1145\/3422622","volume":"63","author":"I Goodfellow","year":"2020","unstructured":"Goodfellow I, Jean P-A, Mehdi M, Bing X, David W-F, Sherjil O, Aaron C, Yoshua B (2020) Generative adversarial networks. Commun ACM 63(11):139\u2013144","journal-title":"Commun ACM"},{"key":"11281_CR23","doi-asserted-by":"publisher","unstructured":"Gunasekar S, Yi Z, Jyoti A, Caio CTM, Allie DG, Sivakanth G, Mojan J, Piero K, Gustavo dR, Olli S, et al. (2023). \u201cTextbooks Are All You Need\u201d. In: arXiv preprint arXiv:2306.11644. https:\/\/doi.org\/10.48550\/arXiv.2306.11644","DOI":"10.48550\/arXiv.2306.11644"},{"key":"11281_CR24","doi-asserted-by":"publisher","unstructured":"Guo, H, Ruiming T, Yunming Y, Zhenguo L, Xiuqiang H (2017). \u201cDeepFM: a factorization-machine based neural network for CTR prediction\u201d. In: arXiv preprint arXiv:1703.04247. https:\/\/doi.org\/10.48550\/arXiv.1703.04247","DOI":"10.48550\/arXiv.1703.04247"},{"issue":"2","key":"11281_CR25","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1109\/MIS.2009.36","volume":"24","author":"A Halevy","year":"2009","unstructured":"Halevy A, Peter N, Fernando P (2009) The unreasonable effectiveness of data. IEEE Intel Syst 24(2):8\u201312","journal-title":"IEEE Intel Syst"},{"issue":"5","key":"11281_CR26","doi-asserted-by":"publisher","first-page":"2753","DOI":"10.3390\/app13052753","volume":"13","author":"OH Hamid","year":"2023","unstructured":"Hamid OH (2023) Data-centric and model-centric AI: Twin drivers of compact and robust industry 4.0 solutions. Appl Sci 13(5):2753. https:\/\/doi.org\/10.3390\/app13052753","journal-title":"Appl Sci"},{"key":"11281_CR27","first-page":"6840","volume":"33","author":"J Ho","year":"2020","unstructured":"Ho J, Jain A, Abbeel Pieter (2020) Denoising diffusion probabilistic models. Adv Neural Inform Proc Syst 33:6840\u20136851","journal-title":"Adv Neural Inform Proc Syst"},{"key":"11281_CR28","doi-asserted-by":"publisher","unstructured":"Hochreiter, S (1997). \u201cLong Short-term Memory\u201d. In: Neural Computation MIT-Press. https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"11281_CR29","doi-asserted-by":"publisher","DOI":"10.1109\/MNET.2024.3353377","author":"X Huang","year":"2024","unstructured":"Huang X, Peichun L, Hongyang D, Jiawen K, Dusit N, Dong IK, Yuan W (2024) Federated learning-empowered AI-generated content in wireless networks. IEEE Network. https:\/\/doi.org\/10.1109\/MNET.2024.3353377","journal-title":"IEEE Network"},{"key":"11281_CR30","doi-asserted-by":"publisher","DOI":"10.1007\/s12599-024-00857-8","author":"J Jakubik","year":"2024","unstructured":"Jakubik J, Michael V, Niklas K, Jannis W, Gerhard S (2024) Data-centric artificial intelligence. Business Inform Syst Eng. https:\/\/doi.org\/10.1007\/s12599-024-00857-8","journal-title":"Business Inform Syst Eng"},{"key":"11281_CR31","doi-asserted-by":"crossref","unstructured":"Jiao B, Xin L, Jingbo X, Brij BG, Lei B, Qingshan Z (2022). \u201cHierarchical sampling for the visualization of large scale-free graphs\u201d. In: IEEE Transactions on Visualization and Computer Graphics 29.12, pp. 5111\u20135123","DOI":"10.1109\/TVCG.2022.3201567"},{"key":"11281_CR32","doi-asserted-by":"publisher","unstructured":"Kaplan, J, Sam M, Tom H, Tom BB, Benjamin C, Rewon C, Scott G, Alec R, JW, Dario A (2020). \u201cScaling laws for neural language models\u201d. In: arXiv preprint arXiv:2001.08361. https:\/\/doi.org\/10.48550\/arXiv.2001.08361","DOI":"10.48550\/arXiv.2001.08361"},{"key":"11281_CR33","unstructured":"Kingma DP, Max W (2022). Auto-Encoding Variational Bayes. arXiv: https:\/\/arxiv.org\/abs\/1312.6114"},{"key":"11281_CR34","doi-asserted-by":"publisher","unstructured":"Li Y, S\u00e9bastien B, Ronen E, AD G, Suriya G, Yin TL (2023). \u201cTextbooks Are All You Need II: phi-1.5 technical report\u201d. In: arXiv preprint arXiv:2309.05463 5. https:\/\/doi.org\/10.48550\/arXiv.2309.05463","DOI":"10.48550\/arXiv.2309.05463"},{"key":"11281_CR35","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2024.3357138","author":"S Liu","year":"2024","unstructured":"Liu S, Linlin Y, Rui Z, Bing L, Rui L, Han Y, Chau Y (2024) Afm3d: an asynchronous federated meta-learning framework for driver distraction detection. IEEE Trans Intel Trans Syst. https:\/\/doi.org\/10.1109\/TITS.2024.3357138","journal-title":"IEEE Trans Intel Trans Syst"},{"key":"11281_CR36","doi-asserted-by":"publisher","unstructured":"Ma L, Yating Z, Tianyi W, Xiaozhong L, Wei Y, Changlong S, Shikun Z (2021). \u201cLegal judgment prediction with multi-stage case representation learning in the real court setting\u201d. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 993\u20131002. https:\/\/doi.org\/10.1145\/3404835.3462945","DOI":"10.1145\/3404835.3462945"},{"key":"11281_CR37","doi-asserted-by":"publisher","unstructured":"Minaee S, Tomas M, Narjes N, Meysam C, Richard S, Xavier A, Jianfeng G (2024). \u201cLarge language models: A survey\u201d. In: arXiv preprint arXiv:2402.06196. https:\/\/doi.org\/10.48550\/arXiv.2402.06196","DOI":"10.48550\/arXiv.2402.06196"},{"key":"11281_CR38","unstructured":"Mnih V, Koray K, David S, Alex G, Ioannis A, Daan W, Martin R (2013). \u201cPlaying atari with deep reinforcement learning\u201d. In: arXiv preprint arXiv:1312.5602"},{"key":"11281_CR39","doi-asserted-by":"publisher","unstructured":"Motamedi, M, Nikolay S, Tim K (2021). \u201cA data-centric approach for training deep neural networks with less data\u201d. In: arXiv preprint arXiv:2110.03613. https:\/\/doi.org\/10.48550\/arXiv.2110.03613","DOI":"10.48550\/arXiv.2110.03613"},{"key":"11281_CR40","unstructured":"Okanovic P, Roger W, Vasilis M, Konstantinos EN, Amin K, Dionysis K, Nezihe MG, Theodoros R (2023). \u201cRepeated random sampling for minimizing the time-to-accuracy of learning\u201d. In: arXiv preprint arXiv:2305.18424"},{"key":"11281_CR41","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1007\/s10489-019-01519-z","volume":"50","author":"SH Park","year":"2020","unstructured":"Park SH, Kim Seoung Bum (2020) Robust expected model change for active learning in regression. Appl Intel 50:296\u2013313","journal-title":"Appl Intel"},{"key":"11281_CR42","doi-asserted-by":"crossref","unstructured":"Pawluk M, Pawe\u0142, T, Jan M (2019). \u201cInformation-theoretic feature selection using high-order interactions\u201d. In: Machine Learning, Optimization, and Data Science: 4th International Conference, LOD 2018, Volterra, Italy, September 13-16, 2018, Revised Selected Papers 4. Springer, pp. 51\u201363","DOI":"10.1007\/978-3-030-13709-0_5"},{"key":"11281_CR43","doi-asserted-by":"publisher","unstructured":"Pouyanfar S, Saad S, Yilin Y, Haiman T, Yudong T, Maria PR, Mei-Ling S, Shu-Ching C, Sundaraja Sr (2018). \u201cA survey on deep learning: Algorithms, techniques, and applications\u201d. In: ACM computing surveys (CSUR) 51.5, pp. 1\u201336. https:\/\/doi.org\/10.1145\/3234150","DOI":"10.1145\/3234150"},{"key":"11281_CR44","unstructured":"Program on Information Resources Policy (1976). Program projects, annual report volume 2 (1975\u20131976). Technical Report R-76-2. Cambridge, MA, USA: Computation Laboratory, Harvard University"},{"key":"11281_CR45","unstructured":"R\u00e9nyi A (1961). \u201cOn measures of entropy and information\u201d. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, volume 1: contributions to the theory of statistics. Vol. 4. University of California Press, pp. 547\u2013562"},{"issue":"3","key":"11281_CR46","doi-asserted-by":"publisher","first-page":"620","DOI":"10.1162\/dint_a_00155","volume":"4","author":"B Sekeroglu","year":"2022","unstructured":"Sekeroglu B, Yoney KE, Kamil D, Fadi A-T (2022) Comparative evaluation and comprehensive analysis of machine learning models for regression problems. Data Intel 4(3):620\u2013652. https:\/\/doi.org\/10.1162\/dint_a_00155","journal-title":"Data Intel"},{"key":"11281_CR47","volume-title":"Information rules: a strategic guide to the network economy","author":"C Shapiro","year":"1999","unstructured":"Shapiro C (1999) Information rules: a strategic guide to the network economy. Harvard Business School Press, Cambridge"},{"key":"11281_CR48","doi-asserted-by":"publisher","unstructured":"Shi C, Tania S, Bin L (2021). \u201cThe Smart Court-A New Pathway to Justice in China?\u201d In: IJCA. Vol. 12. HeinOnline, p. 1. https:\/\/doi.org\/10.36745\/ijca.367","DOI":"10.36745\/ijca.367"},{"issue":"1","key":"11281_CR49","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0197-0","volume":"6","author":"C Shorten","year":"2019","unstructured":"Shorten C, Taghi MK (2019) A survey on image data augmentation for deep learning. J Big Data 6(1):1\u201348. https:\/\/doi.org\/10.1186\/s40537-019-0197-0","journal-title":"J Big Data"},{"issue":"3","key":"11281_CR50","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1016\/j.dsm.2023.06.001","volume":"6","author":"P Singh","year":"2023","unstructured":"Singh P (2023) Systematic review of data-centric approaches in artificial intelligence and machine learning. Data Sci Manag 6(3):144\u2013157. https:\/\/doi.org\/10.1016\/j.dsm.2023.06.001","journal-title":"Data Sci Manag"},{"key":"11281_CR51","doi-asserted-by":"publisher","unstructured":"Song Q, Dehua C, Hanning Z, Jiyan Y, Yuandong T, Xia H (2020). \u201cTowards automated neural interaction discovery for click-through rate prediction\u201d. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 945\u2013955. https:\/\/doi.org\/10.1145\/3394486.3403137","DOI":"10.1145\/3394486.3403137"},{"issue":"1","key":"11281_CR52","doi-asserted-by":"publisher","first-page":"1108","DOI":"10.21037\/qims-23-892","volume":"14","author":"D Tian","year":"2024","unstructured":"Tian D, Shitao J, Lei Z, Xin L, Yiyao X (2024) The role of large language models in medical image processing: a narrative review. Quant Imaging Med Surg 14(1):1108. https:\/\/doi.org\/10.21037\/qims-23-892","journal-title":"Quant Imaging Med Surg"},{"key":"11281_CR53","doi-asserted-by":"publisher","unstructured":"Touvron H, Thibaut L, Gautier I, Xavier M, Marie-Anne L, Timoth\u00e9e L, Baptiste R, Naman G, Eric H, Faisal A, et al. (2023). \u201cLlama: Open and efficient foundation language models\u201d. In: arXiv preprint arXiv:2302.13971. https:\/\/doi.org\/10.48550\/arXiv.2302.13971","DOI":"10.48550\/arXiv.2302.13971"},{"key":"11281_CR54","doi-asserted-by":"publisher","unstructured":"Touvron H, Louis M, Kevin S, Peter A, Amjad A, Yasmine B, Nikolay B, Soumya B, Prajjwal B, Shruti B, et al. (2023). \u201cLlama 2: Open foundation and fine-tuned chat models\u201d. In: arXiv preprint arXiv:2307.09288. https:\/\/doi.org\/10.48550\/arXiv.2307.09288","DOI":"10.48550\/arXiv.2307.09288"},{"key":"11281_CR55","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.108603","volume":"127","author":"J Wan","year":"2022","unstructured":"Wan J, Hongmei C, Tianrui L, Wei H, Min L, Chuan L (2022) R2CI: information theoretic-guided feature selection with multiple correlations. Pattern Recognit 127:108603","journal-title":"Pattern Recognit"},{"key":"11281_CR56","doi-asserted-by":"publisher","first-page":"56","DOI":"10.5334\/dsj-2019-056","volume":"18","author":"J Wang","year":"2019","unstructured":"Wang J, Zhao Y, Chen J, Zhang S, Zhao X, He Y (2019) Efficient stratified sampling graphing method for mass data. Data Sci J 18:56\u201356","journal-title":"Data Sci J"},{"key":"11281_CR57","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/11810.001.0001","volume-title":"Cybernetics or control and communication in the animal and the machine","author":"N Wiener","year":"2019","unstructured":"Wiener N (2019) Cybernetics or control and communication in the animal and the machine. MIT press, Cambridge"},{"key":"11281_CR58","doi-asserted-by":"crossref","unstructured":"Wongvorachan T, Surina H, Okan B (2023). \u201cA comparison of undersampling, oversampling, and SMOTE methods for dealing with imbalanced classification in educational data mining\u201d. In: Information 14.1, p. 54","DOI":"10.3390\/info14010054"},{"key":"11281_CR59","doi-asserted-by":"crossref","unstructured":"Xu J (2024). \u201cResearch and Application of General Information Measures Based on a Unified Model\u201d. In: IEEE Transactions on Computers. https:\/\/doi.ieeecomputersociety.org\/10.1109\/TC.2024.3349650","DOI":"10.1109\/TC.2024.3349650"},{"key":"11281_CR60","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2022.04.018","author":"J Xu","year":"2022","unstructured":"Xu J, Zhenyu L, Shuliang W, Tao Z, Yashi W, Yingfei W, Yingxu D (2022) Foundations and applications of information systems dynamics. Engineering. https:\/\/doi.org\/10.1016\/j.eng.2022.04.018","journal-title":"Engineering"},{"key":"11281_CR61","doi-asserted-by":"publisher","unstructured":"Xu, J, Xuefeng M, Yanli S, Jun T, Bin X, Yongjie Q (2014). \u201cObjective information theory: A Sextuple model and 9 kinds of metrics\u201d. In: 2014 Science and information conference. IEEE, pp. 793\u2013802. https:\/\/doi.org\/10.1109\/SAI.2014.6918277","DOI":"10.1109\/SAI.2014.6918277"},{"key":"11281_CR62","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-19-9929-1","volume-title":"Objective information theory","author":"J Xu","year":"2023","unstructured":"Xu J, Shuliang W, Zhenyu L, Yashi W, Yingfei W, Yingxu D (2023) Objective information theory. Springer Nature, Cham"},{"issue":"2","key":"11281_CR63","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2021.102853","volume":"59","author":"Y Yang","year":"2022","unstructured":"Yang Y, Panyu Z (2022) Click-through rate prediction in online advertising: a literature review. Information Proc Manag 59(2):102853. https:\/\/doi.org\/10.1016\/j.ipm.2021.102853","journal-title":"Information Proc Manag"},{"key":"11281_CR64","doi-asserted-by":"crossref","unstructured":"Yoo D, In SK (2019). \u201cLearning loss for active learning\u201d. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 93\u2013102","DOI":"10.1109\/CVPR.2019.00018"},{"key":"11281_CR65","first-page":"108735","volume":"37","author":"Z Yu","year":"2024","unstructured":"Yu Z, Das S, Xiong C (2024) Mates: model-aware data selection for efficient pretraining with data influence models. Adv Neural Inform Proc Syst 37:108735\u2013108759","journal-title":"Adv Neural Inform Proc Syst"},{"key":"11281_CR66","doi-asserted-by":"publisher","unstructured":"Zha D, Zaid PB, Kwei-Herng L, Fan Y, Zhimeng J, Shaochen Z, Xia H (2023). \u201cData-centric artificial intelligence: A survey\u201d. In: arXiv preprint arXiv:2303.10158. https:\/\/doi.org\/10.48550\/arXiv.2303.10158","DOI":"10.48550\/arXiv.2303.10158"},{"key":"11281_CR67","doi-asserted-by":"publisher","unstructured":"Zhan X, Qingzhong W, Kuan-hao H, Haoyi X, Dejing D, Antoni BC (2022). \u201cA comparative survey of deep active learning\u201d. In: arXiv preprint arXiv:2203.13450. https:\/\/doi.org\/10.48550\/arXiv.2203.13450","DOI":"10.48550\/arXiv.2203.13450"},{"key":"11281_CR68","unstructured":"Zhang C, Huaping Z, Kuan Z, Chengliang C, Rui W, Xinlin Z, Tianyi B, Qiu J, Lei C, Ju F, Ye Y, Guoren W, Conghui H (2025). \u201cHarnessing Diversity for Important Data Selection in Pretraining Large Language Models\u201d. In: The Thirteenth International Conference on Learning Representations"},{"key":"11281_CR69","doi-asserted-by":"publisher","unstructured":"Zhang Z, Xu H, Zhiyuan L, Xin J, Maosong S, Qun L (2019). \u201cERNIE: Enhanced Language Representation with Informative Entities\u201d. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1441\u20131451. https:\/\/doi.org\/10.18653\/v1\/P19-1139","DOI":"10.18653\/v1\/P19-1139"},{"key":"11281_CR70","doi-asserted-by":"publisher","unstructured":"Zhao WX, Kun Z, Junyi L, Tianyi T, Xiaolei W, Yupeng H, Yingqian M, Beichen Z, Junjie Z, Zican D, et al. (2023). \u201cA survey of large language models\u201d. In: arXiv preprint arXiv:2303.18223. https:\/\/doi.org\/10.48550\/arXiv.2303.18223","DOI":"10.48550\/arXiv.2303.18223"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-025-11281-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-025-11281-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-025-11281-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T00:24:08Z","timestamp":1757204648000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-025-11281-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,2]]},"references-count":70,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["11281"],"URL":"https:\/\/doi.org\/10.1007\/s10462-025-11281-z","relation":{},"ISSN":["1573-7462"],"issn-type":[{"type":"electronic","value":"1573-7462"}],"subject":[],"published":{"date-parts":[[2025,7,2]]},"assertion":[{"value":"30 May 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 July 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"289"}}