Incremental Data Drifting: Evaluation Metrics, Data Generation, and Approach Comparison

Published: 25 July 2024


Incremental data drifting is a common problem when deploying a machine-learning model in industrial applications. The underlying data distribution evolves gradually, e.g., users of an e-commerce website change their buying preferences over time. The problem must be addressed to maintain high performance. Current studies of incremental data drifting suffer from two issues. First, there is a lack of clearly defined incremental drift datasets for evaluation. Existing efforts use either collected real datasets or synthetic datasets, which show two obvious limitations. One is that when the distribution drifts, and which type of drift it undergoes, is unknown; the other is that a simple synthesized dataset cannot reflect the complex representations normally encountered in the real world. Second, there is no well-defined protocol for evaluating a learner's knowledge transfer capability on an incremental drift dataset. To address these issues, we create approaches to generate datasets with specific drift types and define a novel evaluation protocol. In addition, we investigate recent advances in transfer learning, including Domain Adaptation and Lifelong Learning, and examine how they perform in the presence of incremental data drifting. The results reveal the relationships among drift types, knowledge preservation, and learning approaches.
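To make the notion of a gradually evolving distribution concrete, the following is a minimal illustrative sketch, not the generation approach proposed in the article. It assumes a one-dimensional Gaussian feature whose mean moves linearly between two values over a fixed number of time steps; the function name `incremental_drift_stream` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def incremental_drift_stream(n_steps=5, n_per_step=200,
                             start_mean=0.0, end_mean=2.0,
                             std=1.0, seed=0):
    """Return a list of (target_mean, samples) batches, one per time step.

    Assumption for illustration: the drift is a linear shift of the mean
    of a Gaussian from start_mean to end_mean across the steps.
    """
    rng = random.Random(seed)
    batches = []
    for t in range(n_steps):
        # Fraction of the total drift completed at step t (0.0 to 1.0).
        frac = t / (n_steps - 1) if n_steps > 1 else 0.0
        mean = start_mean + frac * (end_mean - start_mean)
        samples = [rng.gauss(mean, std) for _ in range(n_per_step)]
        batches.append((mean, samples))
    return batches


batches = incremental_drift_stream()
for step, (mean, samples) in enumerate(batches):
    empirical = sum(samples) / len(samples)
    print(f"step {step}: target mean {mean:.2f}, empirical mean {empirical:.2f}")
```

In a stream like this, the time and type of each drift are known by construction, which is the property the abstract notes is unknown in existing datasets. A single shifting Gaussian is, however, exactly the kind of simple synthesized data that the abstract argues cannot reflect real-world complexity.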


      Online AM: 24 May 2024
      Accepted: 28 February 2024
      Revised: 23 January 2024
      Received: 18 April 2023
      Published in TIST Volume 15, Issue 4

      Author Tags

      1. Concept drift
      2. incremental data drift
      3. data generation


      Funding Sources

      • National Science and Technology Council (NSTC) of Taiwan


