Data Science & Automation Master Course

The document outlines the course content for a Master's degree in data science and automation engineering, including introductions to various data science and machine learning techniques as well as industrial automation topics, with evaluation based on exams and an optional small data analysis project. The course is taught by Professor Mirko Mazzoleni at the University of Bergamo and covers both theoretical and practical applications of data science.

Uploaded by

a.steimers

Lesson 1. Introduction

DATA SCIENCE AND AUTOMATION COURSE
MASTER DEGREE SMART TECHNOLOGY ENGINEERING

TEACHER: Mirko Mazzoleni
PLACE: University of Bergamo

Who I am
• Name: Mirko Mazzoleni

• Studies: Ph.D. Engineering and Applied Sciences at University of Bergamo (Control specialization) + Master degree Computer Engineering (CE) at University of Bergamo

• Currently: Assistant Professor @ University of Bergamo
✓ System identification, machine learning, fault detection, condition monitoring
✓ System identification and data analysis (Master Degree Computer Engineering)
✓ Data science and automation (Master Degree Mechanical Engineering)

• Contact details:
✓ mirko.mazzoleni@unibg.it
✓ http://cal.unibg.it/ (CAL research laboratory)
✓ https://mirkomazzoleni.github.io/
✓ https://www.facebook.com/calunibg/

Course content
Part I: Data science
1. Introduction to data science
   1.1 The business perspective
   1.2 CRISP-DM process
   1.3 Supervised vs. unsupervised problems
2. Linear regression
3. Feasibility of learning
   3.1 Bias-Variance tradeoff
4. Logistic regression
5. Overfitting and regularization
   5.1 Validation and cross-validation
   5.2 Performance metrics
6. Decision trees
7. Neural networks
8. Machine vision
   8.1 Classic approaches
   8.2 Convolutional neural networks and deep learning
9. Unsupervised learning
   9.1 k-means clustering
   9.2 Principal Component Analysis
10. Fault diagnosis
   10.1 Model-based fault diagnosis
   10.2 Signal-based fault diagnosis
   10.3 Data-driven fault diagnosis

Course content
Part II: Automation
12. Introduction to industrial automation
13. Introduction to PLC
14. Ladder language
15. Structured text language
16. Automatic PLC code generation
17. Laboratory experience

Evaluation
• Written exam – 2 hours: theoretical open questions and exercises. Up to 25 points
+
• [OPTIONAL] Small data analysis project (groups of max 3 people). Up to 6 points

Data science projects in the CAL research group
1. Forecasting of sales volume (for food industry)

• Development of the data management platform

• Algorithm design
• Testing/validation

Data science projects in the CAL research group
2. Image processing
• Plant disease classification
• People identification and classification
• Blimp

Data science projects in the CAL research group
3. Fault diagnosis
• Bearing inner race fault
• Ballscrew jam in EMA

Data science projects in the CAL research group
4. Industrial automation
• ICT for remote maintenance
• Automatic transplant machine

Outline
1. Introduction to data science

2. The business perspective and the CRISP-DM process

3. Supervised vs. unsupervised problems

1. Introduction to data science

Why
Business value created by AI up to 2030 [1]: $13 trillion
Retail: $0.8T
Travel: $480B
Logistics: $475B
Automotive & assembly: $405B
Materials: $300B
Advanced electronics & semiconductors: $291B
Healthcare systems & services: $267B
High tech: $267B
Telecom: $174B
Oil & gas: $173B
Agriculture: $164B

• It is difficult to find an industrial sector that will not benefit from AI in the near future

Why
We will use the terms “machine learning”, “data mining”, “data science” quite interchangeably in this course

Data science has been called the sexiest job of the 21st century

• Virtually every aspect of business is now open to data collection (operations, manufacturing, supply-chain management, customer behaviour, marketing campaigns)

• Collected information needs to be analyzed properly in order to get actionable results

• A huge amount of data requires specific infrastructures to be handled

• A huge amount of data requires computational power to be analyzed

• We can let computers make decisions given previous examples

• Rise of specific job titles

Learning examples
Recent years: stunning breakthroughs in computer vision applications

What learning is about
Applying machine learning and data science makes sense if:
1. A pattern exists
2. We cannot pin it down mathematically (an analytical solution does not exist)
3. We have data on it

Assumptions 1 and 2 are not mandatory:

• If a pattern does not exist, I do not learn anything

• If I can describe the pattern mathematically, learning will presumably not give me the best relation
• The real constraint is assumption 3

Data types
The data can have different formats. The most typical is that of a table

House area (feet²)   # bedrooms   Price (1000$)
523                  1            115
645                  1            150
708                  2            210
1034                 3            280
2290                 4            355
2545                 4            440

• AIM: predict house prices (Regression)
• The data can come from a database or from .csv, Excel files…

A → B: Learn the relation from House area to Price
A → B: Learn the relation from House area AND # bedrooms to Price
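As a rough illustration of learning the relation from house area to price, the sketch below fits a straight line to the six rows of the table by ordinary least squares, using plain Python. The choice of a straight line, the function names `fit_line` and `predict`, and the 1500 square feet query are assumptions made for this example; the slides do not prescribe them.

```python
# House data copied from the table above: (area in square feet, price in 1000$).
houses = [(523, 115), (645, 150), (708, 210), (1034, 280), (2290, 355), (2545, 440)]


def fit_line(points):
    """Ordinary least squares for price = intercept + slope * area."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, area):
    """Apply the fitted line to a new house area."""
    intercept, slope = model
    return intercept + slope * area


model = fit_line(houses)
# 1500 square feet is a hypothetical new house, not a row of the table.
print(f"Predicted price for 1500 sq ft: {predict(model, 1500):.0f} (1000$)")
```

Using # bedrooms as a second input would give the two-input version of the same problem. Linear regression itself is item 2 of the course content.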
Data types
Another type of data can be an image

Picture   Label
(image)   Cat
(image)   Not cat
(image)   Cat
(image)   Not cat

• AIM: recognize if there is a cat in the image (Classification)
• Learn the relation from an image to a «class of belonging» (cat vs. not cat)

Data are dirty
Garbage IN, garbage OUT

House area (feet²)   # bedrooms   Price (1000$)
523                  1            115
645                  1            0.001
708                  unknown      210
1034                 3            unknown
unknown              4            355
2545                 unknown      440

Data problems:
• Missing values
• Incorrect values

Different data types:
• Structured data (tables)
• Unstructured data: images, audio, text
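A minimal sketch of how the two data problems on this slide could be flagged automatically, assuming the table has been read as strings (for instance from a .csv file). The plausibility threshold `MIN_PLAUSIBLE_PRICE` and the helper name `find_problems` are hypothetical choices made only for this example.

```python
# Rows copied from the dirty table above; "unknown" marks a missing entry.
rows = [
    {"area": "523", "bedrooms": "1", "price": "115"},
    {"area": "645", "bedrooms": "1", "price": "0.001"},
    {"area": "708", "bedrooms": "unknown", "price": "210"},
    {"area": "1034", "bedrooms": "3", "price": "unknown"},
    {"area": "unknown", "bedrooms": "4", "price": "355"},
    {"area": "2545", "bedrooms": "unknown", "price": "440"},
]

# Assumed threshold in 1000$, chosen only for illustration.
MIN_PLAUSIBLE_PRICE = 10.0


def find_problems(row):
    """Return a list of human-readable problems found in one row."""
    problems = [f"missing {name}" for name, value in row.items() if value == "unknown"]
    price = row["price"]
    if price != "unknown" and float(price) < MIN_PLAUSIBLE_PRICE:
        problems.append(f"implausible price {price}")
    return problems


# Map each row index to the problems detected in that row.
report = {index: find_problems(row) for index, row in enumerate(rows)}
for index, problems in report.items():
    print(index, problems or "ok")
```

Only the first row passes both checks; what to do with the flagged rows (drop, correct, or fill in) is a separate decision.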

Machine learning vs. data science
House area (feet²)   # bedrooms   # bathrooms   Recently renovated   Price (1000$)
523                  1            2             No                   115
645                  1            3             No                   150
708                  2            1             No                   210
1034                 3            3             Yes                  280
2290                 4            4             No                   355
2545                 4            5             Yes                  440

A (house features) → B (price)

Machine learning
• Predict B given A
• Output: code and a running software program (web site / mobile app)

Data science
• Houses with 3 bathrooms are more expensive than those with 2 bathrooms of the same size
• Recently renovated houses cost 15% more
• Output: slide deck

Machine learning vs. data science
[Diagram with the labels: AI, ML, Deep learning, Other tools, Data science]

2. The business perspective and the CRISP-DM process

Data-analytic thinking
Data-driven decision-making (DDD) refers to the practice of basing decisions on the analysis of data, rather than purely on intuition [2, 3]

• Some decisions can be made automatically (finance, recommendations)

• Data engineering and processing is a fundamental support to industrial analytics

• Data, and the capability to extract useful knowledge from data, should be regarded as key strategic assets
✓ Need to invest to acquire the right data (even at the cost of losing money)
✓ Understand data science even if you will not do it

Picture taken from [2]: Provost, Foster, and Tom Fawcett. “Data Science for Business: What you need to know about data mining and data-analytic thinking”. O'Reilly Media, Inc., 2013

Approaching a data mining problem
Cross Industry Standard Process for Data Mining (CRISP-DM)
https://mineracaodedados.files.wordpress.com/2012/04/the-crisp-dm-model-the-new-blueprint-for-data-mining-shearer-colin.pdf

Iteration is the rule rather than the exception:

• Business understanding
• Data understanding
• Data preparation
• Modeling
• Evaluation
• Deployment

CRISP-DM: Business understanding
Cast the business problem into one or more data science problems

• Frame the problem such that one or more sub-problems involve building models for a data mining task (classification, regression, probability estimation, and so on)

• Think carefully about the use scenario
✓ What exactly do we want to do?
✓ How exactly would we do it?
✓ What parts of this use scenario constitute possible data mining models?

CRISP-DM: Data understanding
Identify the available and needed data

• Costs/benefits of acquiring each source of data

• Are the available data related to the business problem?

• Can we use a proxy for data that we cannot have?

• As data understanding progresses, the solution paths may change

CRISP-DM: Data preparation
Clean and prepare data for use with algorithms

• The algorithms we employ usually require data in a format different from the available one
✓ Convert strings to numbers, infer missing data, import data from Excel files, …

• Data preprocessing/cleaning/labeling [4] (most of the time in a data science project is spent here)

• Take care not to use historical data that will not be available when your model is used
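The first two preparation steps named above (convert strings to numbers, infer missing data) can be sketched in a few lines of plain Python. The column `raw_bedrooms` is hypothetical, and filling gaps with the median is only one possible assumption; the slides do not say which imputation rule to use.

```python
from statistics import median

# Hypothetical raw column as it might arrive from a .csv or Excel export.
raw_bedrooms = ["1", "1", "unknown", "3", "4", "unknown"]


def to_number(text):
    """Convert a string to float, returning None when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def impute_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


parsed = [to_number(text) for text in raw_bedrooms]
bedrooms = impute_with_median(parsed)
print(bedrooms)
```

In line with the last bullet above, the fill value should be computed only from data that will also be available when the model is used.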

CRISP-DM: Modeling
Estimate a mathematical model to extract patterns from data

• In most cases, standard algorithms can be directly applied to the data

• The aim is to find a model in order to use it on unseen data

• The type of model has to be chosen based on:
✓ What data mining task we want to solve
✓ Performance measures
✓ Availability of libraries for deployment

CRISP-DM: Evaluation
Assess the validity of the results

• We could find patterns that exist only in the particular dataset that we have at our disposal (overfitting)

• The devised solution and the model’s decisions should be comprehensible to the stakeholders

• Evaluation is usually performed before deployment. In this case, build environments that closely mimic the real use scenario

• Evaluation can also be performed on-line (in production) [5]
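To make the overfitting bullet concrete, the sketch below evaluates a model that simply memorizes its training data (a 1-nearest-neighbour lookup on house area) on the data it was built from and on held-out rows. The hand-picked split and the choice of a nearest-neighbour model are assumptions for illustration, not the method of the course.

```python
# (area in square feet, price in 1000$), copied from the house table of this lesson.
houses = [(523, 115), (645, 150), (708, 210), (1034, 280), (2290, 355), (2545, 440)]

# Assumed split, fixed by hand so the example is reproducible.
train = [houses[0], houses[1], houses[3], houses[5]]
test = [houses[2], houses[4]]


def nearest_neighbour_price(train_set, area):
    """Predict the price of the training house whose area is closest."""
    closest = min(train_set, key=lambda house: abs(house[0] - area))
    return closest[1]


def mean_absolute_error(train_set, eval_set):
    """Average absolute gap between predicted and true prices on eval_set."""
    errors = [
        abs(nearest_neighbour_price(train_set, area) - price)
        for area, price in eval_set
    ]
    return sum(errors) / len(errors)


train_error = mean_absolute_error(train, train)
test_error = mean_absolute_error(train, test)
print(f"training error: {train_error}, held-out error: {test_error}")
```

The training error is zero because every training house is its own nearest neighbour, while the held-out error is not: the error measured on data the model has already seen says little about unseen data.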

CRISP-DM: Deployment
Put the model (or the data mining steps) into production

• Usually requires re-coding the model, to make it compatible with the existing technology
• This step can require a considerable investment of time. Usually the data science team builds a prototype that is then passed to the development team

• For this reason, it is suggested to involve a member of the development team in the early phases of the data science project

• Deployment can involve not only the final model, but also previous phases (data collection, model building, evaluation)

From business problems to data mining tasks
Each data science project is unique. The aim is to decompose the business problem into subtasks for which a common approach exists.

There are many machine learning algorithms. However, they address a handful of tasks:

• Classification and class probability estimation
• Regression
• Similarity matching
• Clustering
• Co-occurrence grouping
• Profiling
• Link prediction
• Data reduction
• Causal modeling

3. Supervised vs. unsupervised problems

Supervised vs unsupervised methods
A specific data science task can be tackled via a supervised or unsupervised approach

Unsupervised
“Do our customers naturally fall into different groups?”
There is no specific target (or purpose) for the grouping. The aim is only to find similarities between individuals

Supervised
“Can we find groups of customers who have particularly high likelihoods of canceling their service soon after their contract expires?”
There is a specific target: find people who will leave when the contract expires. In this case, there must be data on the target. The value of the target for an individual is called label or class. We need a dataset of people who we know left (labeled dataset)
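The two questions above can be contrasted on a toy dataset. In the sketch below, the `customers` list is entirely hypothetical (months as a customer, plus a label saying whether the person left after the contract expired). The unsupervised part groups customers without looking at the label, using a two-centroid k-means on one variable. The supervised part uses the label to choose a decision threshold. Both routines are simplified stand-ins chosen for brevity.

```python
# Hypothetical customers: (months as customer, left after contract expired?).
customers = [(2, True), (3, True), (4, True), (20, False), (24, False), (30, False)]


def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two groups, ignoring any label."""
    low, high = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        # Assign each value to the closer centroid, then recompute the centroids.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


def fit_threshold(labeled):
    """Supervised: pick the tenure threshold that best reproduces the labels."""
    candidates = sorted(tenure for tenure, _ in labeled)

    def accuracy(threshold):
        hits = sum((tenure <= threshold) == left for tenure, left in labeled)
        return hits / len(labeled)

    return max(candidates, key=accuracy)


groups = two_means([tenure for tenure, _ in customers])
threshold = fit_threshold(customers)
print("groups found without labels:", groups)
print("customers with tenure <=", threshold, "are predicted to leave")
```

The grouping only says that there are two kinds of customers; it takes the labeled dataset to say which kind tends to leave.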

Supervised vs unsupervised methods
Supervised
• Classification and class probability estimation
• Regression
• Causal modeling

Supervised or Unsupervised
• Similarity matching
• Link prediction
• Data reduction

Unsupervised
• Clustering
• Co-occurrence grouping
• Profiling

Business problems as data science examples
Supervised
• Spam e-mail detection system
• Credit approval
• Recognize objects in images
• Find the relation between house prices and house sizes
• Predict the stock market

Unsupervised
• Market segmentation
• Market basket analysis
• Language models (word2vec)
• Social network analysis
• Low-order data representations

Supervised or unsupervised
• Movie recommendation

Additional resources
MOOC
• Learning from data (Yaser S. Abu-Mostafa - edX)
• Machine learning (Andrew Ng - Coursera)
• Deep learning (Andrew Ng - Coursera)
• The analytics edge (Dimitris Bertsimas - edX)
• Statistical learning (Trevor Hastie and Robert Tibshirani - Stanford Lagunita)

Books
• Data science for business (Foster Provost, Tom Fawcett)
• An Introduction to Statistical Learning, with Applications in R (Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani)
• Neural Networks and Deep Learning (Michael Nielsen)
• Pattern Recognition and Machine Learning (Christopher Bishop)

References
1. Notes from the AI frontier: Modeling the impact of AI on the world economy, 2018.
2. Provost, Foster, and Tom Fawcett. “Data Science for Business: What you need to know about data mining and data-analytic thinking”. O'Reilly Media, Inc., 2013.
3. Brynjolfsson, E., Hitt, L. M., and Kim, H. H. “Strength in numbers: How does data driven decision making affect firm performance?” Tech. rep., available at SSRN: http://ssrn.com/abstract=1819486, 2011.
4. Pyle, D. “Data Preparation for Data Mining”. Morgan Kaufmann, 1999.
5. Kohavi, R., and Longbotham, R. “Online experiments: Lessons learned”. Computer, 40 (9), 103–105, 2007.
6. Abu-Mostafa, Yaser S., Malik Magdon-Ismail, and Hsuan-Tien Lin. “Learning from data”. AMLBook, 2012.
7. Andrew Ng. “Machine learning”. Coursera MOOC. (https://www.coursera.org/learn/machine-learning)
