(Tutorial at IEEE ICDE 2025)
This hands-on tutorial provides a comprehensive introduction to key topics in data stream learning, combining theoretical foundations with practical demonstrations and code examples. Participants will explore essential concepts, including supervised learning for data streams, building efficient pipelines for online preprocessing and model training, detecting and visualizing concept drift, and applying anomaly detection algorithms to streaming data. We will also examine the challenges and opportunities of AutoML for data streams and address practical concerns that arise with partially labeled and delayed-label data streams. The tutorial features CapyMOA, an open-source library that offers efficient algorithm implementations through a high-level Python API. Participants will gain hands-on experience with this tool; all source code is available at https://github.com/adaptive-machine-learning/CapyMOA, and supporting tutorials and installation guides are available at https://capymoa.org/. By the end of the session, attendees will have practical skills and tools for addressing real-world challenges in data stream learning.
This hands-on tutorial aims to familiarize participants with applying a range of machine learning tasks to streaming data. In addition to an introductory overview of the typical supervised learning cycle (classification and regression) and the assumptions of this setting, we will focus on the following topics:
- Introduction to data stream learning and supervised tasks for stream learning;
- Pipelines for online preprocessing and supervised learning tasks;
- Concept drift detection, visualization and evaluation;
- Anomaly detection algorithms on streaming data;
- The limitations and opportunities of AutoML for data streams; and
- Practical concerns when dealing with partially labeled and delayed-label data streams.
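
As a rough illustration of the supervised learning cycle and of why concept drift matters, the sketch below runs a test-then-train (prequential) loop over a synthetic stream: each instance is first used to test the model and only then to train it. The sketch is plain Python and deliberately does not use CapyMOA's API. The names `BinnedMajorityLearner`, `synthetic_stream`, and `prequential_accuracy` are hypothetical, the stream and its abrupt drift are made up for illustration, and the assumption that every label arrives immediately after the prediction is a simplification that does not hold for partially labeled or delayed-label streams.

```python
import random


class BinnedMajorityLearner:
    """Toy incremental classifier (hypothetical, for illustration only).

    Splits x in [0, 1) into equal-width bins and predicts the label
    seen most often so far in the bin that x falls into.
    """

    def __init__(self, n_bins=2, n_classes=2):
        self.n_bins = n_bins
        self.counts = [[0] * n_classes for _ in range(n_bins)]

    def _bin(self, x):
        return min(int(x * self.n_bins), self.n_bins - 1)

    def predict(self, x):
        row = self.counts[self._bin(x)]
        if sum(row) == 0:
            return None  # nothing seen in this bin yet
        return row.index(max(row))

    def train(self, x, y):
        self.counts[self._bin(x)][y] += 1


def synthetic_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs; the labeling rule is inverted at `drift_at`."""
    rng = random.Random(seed)
    for i in range(n):
        x = rng.random()
        y = int(x >= 0.5)
        if i >= drift_at:  # abrupt concept drift: labels flip
            y = 1 - y
        yield x, y


def prequential_accuracy(stream, learner, window=100):
    """Test-then-train loop; returns accuracy per block of `window` instances."""
    accuracies, correct, seen = [], 0, 0
    for x, y in stream:
        correct += int(learner.predict(x) == y)  # 1) test on the new instance
        learner.train(x, y)                      # 2) then train on it
        seen += 1
        if seen == window:
            accuracies.append(correct / window)
            correct, seen = 0, 0
    return accuracies


windows = prequential_accuracy(
    synthetic_stream(1500, drift_at=500), BinnedMajorityLearner()
)
for i, acc in enumerate(windows):
    print(f"instances {i * 100:4d}-{i * 100 + 99:4d}: accuracy {acc:.2f}")
```

Because this toy learner never forgets old counts, its accuracy collapses right after the drift and recovers only once enough post-drift instances have accumulated. That slow recovery is the kind of behavior that drift detection and adaptive learners are meant to address.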