{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T19:10:58Z","timestamp":1773515458729,"version":"3.50.1"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T00:00:00Z","timestamp":1744070400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T00:00:00Z","timestamp":1744070400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"CLARIFY - European Union\u2019s Horizon research and innovation program","award":["860627"],"award-info":[{"award-number":["860627"]}]},{"name":"ARTI-CONF","award":["825134"],"award-info":[{"award-number":["825134"]}]},{"name":"ENVRI-FAIR","award":["824068"],"award-info":[{"award-number":["824068"]}]},{"name":"LifeWatch ERIC"},{"name":"BLUECLOUD 2026","award":["101094227"],"award-info":[{"award-number":["101094227"]}]},{"name":"NWO LTER-LIFE"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Data Sci Anal"],"published-print":{"date-parts":[[2025,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Oceanic research initiatives like Argo, GLOSS, and EMSO aim to enhance our understanding of the oceans and climate through extensive data collection. Maintaining the quality of collected data is essential for effective data analysis and real-world applications. 
While automated and semi-automated tests can provide real-time or near-real-time validation, thorough quality control still depends on operator review. Consequently, current Quality Control (QC) processes remain labor-intensive. Machine Learning (ML) methods, which can analyze vast amounts of data and learn complex patterns autonomously, offer significant potential for improving QC processes. However, challenges such as severe class imbalance persist for ML approaches. This article proposes using active learning (AL) to assist QC experts, reducing their workload by proactively selecting informative data points for labeling. To address the imbalance, AL is coupled with imbalance-resilient classifiers, which improves model performance in recognizing erroneous data points. To mitigate the cold-start problem in AL, we propose using outlier detection to initialize the classifiers, which significantly reduces annotation costs. Our approach is tested on data generated by 5 Argo floats, demonstrating that it can reduce the labeling workload for experts and cope with significant data imbalance. 
Although the experiments are limited in scale, the findings indicate a promising outlook for using active learning in ocean data quality assessment, facilitating an effective semi-automated quality control framework.<\/jats:p>","DOI":"10.1007\/s41060-025-00751-w","type":"journal-article","created":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T16:15:14Z","timestamp":1744128914000},"page":"4777-4798","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Leveraging active learning for ocean data quality assessment: reducing labeling workload and addressing severe data imbalance challenges"],"prefix":"10.1007","volume":"20","author":[{"given":"Na","family":"Li","sequence":"first","affiliation":[]},{"given":"Yiyang","family":"Qi","sequence":"additional","affiliation":[]},{"given":"Ruyue","family":"Xin","sequence":"additional","affiliation":[]},{"given":"Peide","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Zhiming","family":"Zhao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,8]]},"reference":[{"key":"751_CR1","unstructured":"Team, A.S.: Argo: the global array of profiling floats. Observing the Oceans in the 21st Century. Publisher: GODAE Project Office, Bureau of Meteorology (2001)"},{"key":"751_CR2","unstructured":"Merrifield, M., Aarup, T., Allen, A., Aman, A., Caldwell, P., Bradshaw, E., Fernandes, R., Hayashibara, H., Hernandez, F., Kilonsky, B., et al.: The global sea level observing system (GLOSS). Proceedings of the OceanObs 9 (2009)"},{"issue":"1","key":"751_CR3","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1016\/j.nima.2008.12.214","volume":"602","author":"P Favali","year":"2009","unstructured":"Favali, P., Beranzoli, L.: EMSO: European multidisciplinary seafloor observatory. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrom. Detect. Assoc. Equip. 602(1), 21\u201327 (2009). 
https:\/\/doi.org\/10.1016\/j.nima.2008.12.214","journal-title":"Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrom. Detect. Assoc. Equip."},{"key":"751_CR4","doi-asserted-by":"publisher","unstructured":"Wong, A., Keeley, R., Carval, T.: Argo quality control manual for CTD and trajectory data (2024). https:\/\/doi.org\/10.13155\/33951","DOI":"10.13155\/33951"},{"key":"751_CR5","doi-asserted-by":"publisher","unstructured":"Abeysirigunawardena, D., Jeffries, M., Morley, M.G., Bui, A.O.V., Hoeberechts, M.: Data quality control and quality assurance practices for Ocean Networks Canada observatories. In: OCEANS 2015\u2014MTS\/IEEE Washington, pp. 1\u20138 (2015). https:\/\/doi.org\/10.23919\/OCEANS.2015.7404600","DOI":"10.23919\/OCEANS.2015.7404600"},{"issue":"21","key":"751_CR6","doi-asserted-by":"publisher","first-page":"3470","DOI":"10.3390\/rs12213470","volume":"12","author":"R Diamant","year":"2020","unstructured":"Diamant, R., Shachar, I., Makovsky, Y., Ferreira, B.M., Cruz, N.A.: Cross-sensor quality assurance for marine observatories. Remote Sens. 12(21), 3470 (2020). https:\/\/doi.org\/10.3390\/rs12213470","journal-title":"Remote Sens."},{"issue":"8","key":"751_CR7","doi-asserted-by":"publisher","first-page":"2628","DOI":"10.3390\/s18082628","volume":"18","author":"Y Zhou","year":"2018","unstructured":"Zhou, Y., Qin, R., Xu, H., Sadiq, S., Yu, Y.: A data quality control method for seafloor observatories: the application of observed time series data in the East China Sea. Sensors 18(8), 2628 (2018). https:\/\/doi.org\/10.3390\/s18082628","journal-title":"Sensors"},{"key":"751_CR8","doi-asserted-by":"publisher","DOI":"10.3389\/fmars.2021.611742","volume":"8","author":"S Mieruch","year":"2021","unstructured":"Mieruch, S., Demirel, S., Simoncelli, S., Schlitzer, R., Seitz, S.: SalaciaML: a deep learning approach for supporting ocean data quality control. Front. Mar. Sci. 8, 611742 (2021)","journal-title":"Front. Mar. 
Sci."},{"key":"751_CR9","doi-asserted-by":"publisher","DOI":"10.1016\/j.cageo.2021.104803","volume":"155","author":"GP Castel\u00e3o","year":"2021","unstructured":"Castel\u00e3o, G.P.: A machine learning approach to quality control oceanographic data. Comput. Geosci. 155, 104803 (2021). https:\/\/doi.org\/10.1016\/j.cageo.2021.104803","journal-title":"Comput. Geosci."},{"key":"751_CR10","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1016\/j.ins.2019.11.004","volume":"513","author":"F Thabtah","year":"2020","unstructured":"Thabtah, F., Hammoud, S., Kamalov, F., Gonsalves, A.: Data imbalance in classification: experimental evaluation. Inf. Sci. 513, 429\u2013441 (2020). https:\/\/doi.org\/10.1016\/j.ins.2019.11.004","journal-title":"Inf. Sci."},{"key":"751_CR11","unstructured":"Settles, B.: Active Learning Literature Survey. Technical Report, University of Wisconsin-Madison Department of Computer Sciences (2009)"},{"key":"751_CR12","doi-asserted-by":"publisher","unstructured":"Argo: Argo float data and metadata from Global Data Assembly Centre (Argo GDAC) (2023). https:\/\/doi.org\/10.17882\/42182. https:\/\/www.seanoe.org\/data\/00311\/42182\/ Accessed 2023-08-02","DOI":"10.17882\/42182"},{"key":"751_CR13","doi-asserted-by":"publisher","unstructured":"Li, N., Qi, Y., Xin, R., Zhao, Z.: Ocean data quality assessment through outlier detection-enhanced active learning. In: 2023 IEEE International Conference on Big Data (BigData), pp. 102\u2013107 (2023). https:\/\/doi.org\/10.1109\/BigData59044.2023.10386969","DOI":"10.1109\/BigData59044.2023.10386969"},{"key":"751_CR14","doi-asserted-by":"publisher","first-page":"1152236","DOI":"10.3389\/fmars.2023.1152236","volume":"10","author":"AM Sk\u00e5lvik","year":"2023","unstructured":"Sk\u00e5lvik, A.M., Saetre, C., Fr\u00f8ysa, K.-E., Bj\u00f8rk, R.N., Tengberg, A.: Challenges, limitations, and measurement strategies to ensure data quality in deep-sea sensors. Front. Mar. Sci. 10, 1152236 (2023). 
https:\/\/doi.org\/10.3389\/fmars.2023.1152236","journal-title":"Front. Mar. Sci."},{"issue":"4","key":"751_CR15","doi-asserted-by":"publisher","first-page":"1305","DOI":"10.1175\/JCLI-D-15-0028.1","volume":"29","author":"F Gaillard","year":"2016","unstructured":"Gaillard, F., Reynaud, T., Thierry, V., Kolodziejczyk, N., Schuckmann, K.: In situ-based reanalysis of the global ocean temperature and salinity with ISAS: variability of the heat content and steric height. J. Clim. 29(4), 1305\u20131323 (2016)","journal-title":"J. Clim."},{"issue":"5","key":"751_CR16","doi-asserted-by":"publisher","first-page":"789","DOI":"10.1175\/JTECH-D-18-0244.1","volume":"37","author":"J Gourrion","year":"2020","unstructured":"Gourrion, J., Szekely, T., Killick, R., Owens, B., Reverdin, G., Chapron, B.: Improved statistical method for quality control of hydrographic observations. J. Atmos. Ocean. Technol. 37(5), 789\u2013806 (2020). https:\/\/doi.org\/10.1175\/JTECH-D-18-0244.1","journal-title":"J. Atmos. Ocean. Technol."},{"issue":"9","key":"751_CR17","doi-asserted-by":"publisher","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","volume":"21","author":"H He","year":"2009","unstructured":"He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263\u20131284 (2009). https:\/\/doi.org\/10.1109\/TKDE.2008.239","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"751_CR18","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-023-06353-6","author":"G Aguiar","year":"2023","unstructured":"Aguiar, G., Krawczyk, B., Cano, A.: A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework. Mach. Learn. (2023). https:\/\/doi.org\/10.1007\/s10994-023-06353-6","journal-title":"Mach. 
Learn."},{"key":"751_CR19","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321\u2013357 (2002). https:\/\/doi.org\/10.1613\/jair.953","journal-title":"J. Artif. Intell. Res."},{"key":"751_CR20","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.ins.2017.05.008","volume":"409\u2013410","author":"W-C Lin","year":"2017","unstructured":"Lin, W.-C., Tsai, C.-F., Hu, Y.-H., Jhang, J.-S.: Clustering-based undersampling in class-imbalanced data. Inf. Sci. 409\u2013410, 17\u201326 (2017). https:\/\/doi.org\/10.1016\/j.ins.2017.05.008","journal-title":"Inf. Sci."},{"key":"751_CR21","doi-asserted-by":"publisher","unstructured":"Junsomboon, N., Phienthrakul, T.: Combining over-sampling and under-sampling techniques for imbalance dataset. In: Proceedings of the 9th International Conference on Machine Learning and Computing. ICMLC \u201917, pp. 243\u2013247. Association for Computing Machinery, New York, NY, USA (2017). https:\/\/doi.org\/10.1145\/3055635.3056643","DOI":"10.1145\/3055635.3056643"},{"key":"751_CR22","doi-asserted-by":"publisher","unstructured":"Thai-Nghe, N., Gantner, Z., Schmidt-Thieme, L.: Cost-sensitive learning methods for imbalanced data. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1\u20138 (2010). https:\/\/doi.org\/10.1109\/IJCNN.2010.5596486. ISSN: 2161-4407. https:\/\/ieeexplore.ieee.org\/abstract\/document\/5596486. Accessed 2024-03-27","DOI":"10.1109\/IJCNN.2010.5596486"},{"key":"751_CR23","unstructured":"Schapire, R.E.: A brief introduction to boosting. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence\u2014Volume 2. IJCAI\u201999, pp. 1401\u20131406. 
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)"},{"key":"751_CR24","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.-Y.: LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS\u201917, pp. 3149\u20133157. Curran Associates Inc., Red Hook, NY, USA (2017)"},{"key":"751_CR25","doi-asserted-by":"publisher","unstructured":"Chen, T., Guestrin, C.: XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery And Data Mining, pp. 785\u2013794. ACM, San Francisco California USA (2016). https:\/\/doi.org\/10.1145\/2939672.2939785","DOI":"10.1145\/2939672.2939785"},{"issue":"1","key":"751_CR26","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L.: Random forests. Mach. Learn. 45(1), 5\u201332 (2001). https:\/\/doi.org\/10.1023\/A:1010933404324","journal-title":"Mach. Learn."},{"key":"751_CR27","doi-asserted-by":"publisher","unstructured":"Aghdam, H.H., Gonzalez-Garcia, A., Lopez, A., Weijer, J.: Active learning for deep detection neural networks. In: 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), pp. 3671\u20133679 (2019). https:\/\/doi.org\/10.1109\/ICCV.2019.00377. ISSN: 2380-7504. https:\/\/ieeexplore.ieee.org\/document\/9009535. Accessed 2024-04-03","DOI":"10.1109\/ICCV.2019.00377"},{"issue":"9","key":"751_CR28","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1145\/3472291","volume":"54","author":"P Ren","year":"2021","unstructured":"Ren, P., Xiao, Y., Chang, X., Huang, P.-Y., Li, Z., Gupta, B.B., Chen, X., Wang, X.: A survey of deep active learning. ACM Comput. Surv. 54(9), 180:1\u2013180:40 (2021). https:\/\/doi.org\/10.1145\/3472291","journal-title":"ACM Comput. 
Surv."},{"key":"751_CR29","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2021.102062","volume":"71","author":"S Budd","year":"2021","unstructured":"Budd, S., Robinson, E.C., Kainz, B.: A survey on active learning and human-in-the-loop deep learning for medical image analysis. Med. Image Anal. 71, 102062 (2021). https:\/\/doi.org\/10.1016\/j.media.2021.102062","journal-title":"Med. Image Anal."},{"key":"751_CR30","doi-asserted-by":"publisher","unstructured":"Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. COLT \u201992, pp. 287\u2013294. Association for Computing Machinery, New York, NY, USA (1992). https:\/\/doi.org\/10.1145\/130385.130417","DOI":"10.1145\/130385.130417"},{"key":"751_CR31","unstructured":"Chandra, A.L., Desai, S.V., Devaguptapu, C., Balasubramanian, V.N.: On initial pools for deep active learning. In: NeurIPS 2020 Workshop on Pre-registration in Machine Learning, pp. 14\u201332. PMLR, Virtual (2021). ISSN: 2640-3498. https:\/\/proceedings.mlr.press\/v148\/chandra21a.html Accessed 2024-03-27"},{"key":"751_CR32","doi-asserted-by":"publisher","unstructured":"Liu, F.T., Ting, K.M., Zhou, Z.-H.: Isolation forest. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 413\u2013422 (2008). https:\/\/doi.org\/10.1109\/ICDM.2008.17. ISSN: 2374-8486. https:\/\/ieeexplore.ieee.org\/document\/4781136. Accessed 2024-03-27","DOI":"10.1109\/ICDM.2008.17"},{"key":"751_CR33","unstructured":"Sch\u00f6lkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., Platt, J.: Support vector method for novelty detection. In: Proceedings of the 12th International Conference on Neural Information Processing Systems. NIPS\u201999, pp. 582\u2013588. MIT Press, Cambridge, MA, USA (1999)"},{"key":"751_CR34","doi-asserted-by":"publisher","unstructured":"Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. 
In: Proceedings of the 2000 ACM SIGMOD International Conference on Management Of data, pp. 93\u2013104. ACM, Dallas Texas USA (2000). https:\/\/doi.org\/10.1145\/342009.335388","DOI":"10.1145\/342009.335388"},{"issue":"5","key":"751_CR35","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.1214\/aos\/1013203451","volume":"29","author":"JH Friedman","year":"2001","unstructured":"Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189\u20131232 (2001)","journal-title":"Ann. Stat."},{"key":"751_CR36","unstructured":"Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A.: CatBoost: unbiased boosting with categorical features. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS\u201918, pp. 6639\u20136649. Curran Associates Inc., Red Hook, NY, USA (2018)"},{"key":"751_CR37","doi-asserted-by":"publisher","unstructured":"Ertekin, S., Huang, J., Bottou, L., Giles, L.: Learning on the border: active learning in imbalanced data classification. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management. CIKM \u201907, pp. 127\u2013136. Association for Computing Machinery, New York, NY, USA (2007). https:\/\/doi.org\/10.1145\/1321440.1321461. https:\/\/dl.acm.org\/doi\/10.1145\/1321440.1321461 Accessed 2024-03-27","DOI":"10.1145\/1321440.1321461"},{"issue":"86","key":"751_CR38","first-page":"2579","volume":"9","author":"LVD Maaten","year":"2008","unstructured":"Maaten, L.V.D., Hinton, G.: Visualizing Data using t-SNE. J. Mach. Learn. Res. 9(86), 2579\u20132605 (2008)","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["International Journal of Data Science and Analytics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-025-00751-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41060-025-00751-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-025-00751-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T10:49:21Z","timestamp":1758797361000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41060-025-00751-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,8]]},"references-count":38,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,10]]}},"alternative-id":["751"],"URL":"https:\/\/doi.org\/10.1007\/s41060-025-00751-w","relation":{},"ISSN":["2364-415X","2364-4168"],"issn-type":[{"value":"2364-415X","type":"print"},{"value":"2364-4168","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,8]]},"assertion":[{"value":"6 April 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}