Introduction To Machine Learning
Master ESA - University of Orléans
Christophe HURLIN
Christophe HURLIN (University of Orléans and IUF) Introduction to Machine Learning September 14, 2025 1 / 123
Introduction
Outline
1. Introduction
2. AI and ML: Key Definitions
3. Basic Concepts of ML
4. ML Algorithms
5. Taxonomy of Data
Recommended Readings
Varian, H. (2014). Big Data: New Tricks for Econometrics. Journal of Economic Perspectives, 28(2), 3–28.
Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., and Mullainathan, S. (2018). Human Decisions and Machine Predictions. Quarterly Journal of Economics, 133(1), 237–293.
Haghighi, M., Joseph, A., Kapetanios, G., Kurz, C., Lenza, M., and Marcucci, J. (2024). Machine Learning for Economic Policy. Journal of Econometrics, in press.
Desai, A. (2023). Machine Learning for Economics Research: When, What and How. Bank of Canada, Staff Analytical Note 2023–16.
Athey, S. and Imbens, G. (2019). Machine Learning Methods That Economists Should Know About. Annual Review of Economics, 11, 685–725.
AI and ML: Key Definitions
Remark: The terms Artificial Intelligence and Machine Learning are often confused. While AI is
the broader field, ML is a subfield of AI that focuses specifically on systems that learn from data.
Definition: Generative AI
Deep Learning is a branch of Machine Learning based on artificial neural networks with
many layers (hence “deep”). These models are capable of learning complex patterns from
large amounts of data and are particularly effective in tasks such as image recognition,
natural language processing, and speech analysis.
A Large Language Model (LLM) is a deep learning model trained on massive corpora
of text data to understand, generate, and manipulate human language.
LLMs rely on advanced neural network architectures, typically based on transformers,
to model the statistical relationships between words, phrases, and contexts. They are
capable of a wide range of natural language processing (NLP) tasks, including:
• Text generation
• Summarization
• Translation
• Question answering
• Dialogue systems
LLMs are pre-trained on vast corpora and often fine-tuned for specific applications or
domains.
These models are trained on large text corpora and are deployed through APIs or integrated into enterprise solutions.
Key Concepts
1 Artificial Intelligence.
3 Machine Learning.
4 Deep Learning.
Basic Concepts of ML
Terminology in ML (data):
• Target (dependent variable): The outcome variable $y_i$ to be predicted. Also called the label in classification problems.
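In supervised ML, the data is typically organized into a matrix of features $X \in \mathbb{R}^{n \times d}$ and a target vector $y \in \mathbb{R}^{n}$, where n is the number of observations and d the number of features.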
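As a minimal illustration, assuming NumPy and made-up values (the feature names in the comments are hypothetical), a dataset with n = 4 observations and d = 3 features can be stored as follows:

```python
import numpy as np

# Feature matrix X (n x d): one row per observation i, one column per feature j
# Hypothetical features: age, income (k€), loan duration (months)
X = np.array([
    [35, 42.0, 48],
    [52, 61.5, 36],
    [23, 18.0, 60],
    [41, 35.2, 24],
])

# Target vector y (length n): one value y_i per observation, here a 0/1 label
y = np.array([0, 0, 1, 0])

n, d = X.shape
print(n, d)      # number of observations and number of features
print(y.shape)   # y has exactly one entry per row of X
```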
Terminology in ML (model):
• Prediction function (hypothesis): The model’s output function, typically written as $\hat{f}(x)$ or $\hat{y}$, which approximates the relationship between inputs $x$ and target $y$.
• Loss function: A function that measures the error between predicted values $\hat{y}_i$ and actual values $y_i$, used to train the model. Common examples: squared error, cross-entropy.
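The two losses named above can be written in a few lines. The sketch below assumes NumPy and averages each loss over observations (mean squared error and binary cross-entropy); the function names are hypothetical, not those of a particular library.

```python
import numpy as np

def squared_error_loss(y, y_hat):
    """Mean squared error: average of (y_i - y_hat_i)^2 over observations."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

def cross_entropy_loss(y, p_hat, eps=1e-12):
    """Binary cross-entropy: -mean[y log p + (1 - y) log(1 - p)].

    p_hat holds predicted probabilities P(y_i = 1); clipping by eps avoids log(0).
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p_hat, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Regression case: actual vs predicted numerical values
print(squared_error_loss([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))  # (0.25 + 0 + 1) / 3
# Classification case: 0/1 labels vs predicted probabilities
print(cross_entropy_loss([1, 0, 1], [0.9, 0.2, 0.7]))
```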
In Machine Learning, tasks are categorized into different learning modes based on the structure of the data and the type of feedback available:
Supervised Learning: The algorithm is trained on labeled data, meaning each input $x_i$ is associated with an output $y_i$. The goal is to learn a function $f$ that maps inputs to outputs.
Unsupervised Learning: The data is unlabeled. The goal is to discover hidden patterns or structures in the input data $x_i$, such as clusters or latent factors.
Semi-supervised Learning: Combines a small amount of labeled data with a large amount of unlabeled data. The algorithm leverages both to improve learning accuracy.
Basic Concepts of ML Supervised Learning
Learning Modes
Supervised Learning
The objective is to learn a general mapping rule, called the model, that can be applied to unseen data to produce accurate predictions. Two main types of tasks are distinguished:
• Classification: Predict a discrete label or category (e.g., spam vs. not spam).
• Regression: Predict a continuous numerical value (e.g., house price, GDP
growth).
Once the model has been trained, it can be used to predict the output for new inputs
not encountered during training.
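A minimal sketch of this train-then-predict workflow, assuming scikit-learn and a synthetic dataset whose labeling rule (label 1 when x1 + x2 > 0) is made up for illustration. The decision tree and its depth are arbitrary choices; the held-out test set plays the role of inputs not encountered during training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic labeled data: 2 features, label = 1 when x1 + x2 > 0 (assumed rule)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 30% of the observations: the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Training phase: learn the mapping from inputs to labels
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)

# Prediction phase: apply the learned rule to unseen inputs
y_pred = model.predict(X_test)
accuracy = np.mean(y_pred == y_test)
print(f"Out-of-sample accuracy: {accuracy:.2f}")
```

Measuring accuracy on the held-out part, rather than on the training data, is what tells us whether the learned rule generalizes.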
The resulting model can then be used to classify new images that were never seen
during the training phase.
Regression vs Classification
In supervised learning, two major types of problems are distinguished based on the nature of the target variable ($Y$):
• Regression: when the target variable is continuous. The goal is to predict a
numerical value from the explanatory variables.
Example: predicting the price of a house or the temperature.
• Classification: when the target variable is categorical. The goal is to assign each
observation to one or more predefined categories.
Example: detecting whether an email is spam or not, or classifying images of
fruits.
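To make the distinction concrete, here is a sketch assuming scikit-learn and simulated data (the coefficients and the classification rule are invented): the same features are paired once with a continuous target and once with a binary target, and the fitted models return different kinds of outputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))

# Regression: continuous target (e.g. a price), linear in X plus small noise
y_cont = 2.0 + X @ np.array([1.5, -0.5, 0.8]) + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(X, y_cont)

# Classification: categorical target (e.g. spam = 1 / not spam = 0)
y_cat = (X[:, 0] - X[:, 2] > 0).astype(int)
clf = LogisticRegression().fit(X, y_cat)

x_new = np.array([[0.2, -1.0, 0.5]])
print(reg.predict(x_new))        # a real number
print(clf.predict(x_new))        # a class label, 0 or 1
print(clf.predict_proba(x_new))  # estimated probability of each class
```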
Supervised learning is the most widely used ML approach in economics and finance. It
is particularly suited to prediction tasks with labeled historical data. Typical applications
include:
• Credit risk prediction: estimating the probability of default using features from
loan applications or account behavior.
• Fraud detection: identifying anomalous or fraudulent transactions in real-time.
• Forecasting: predicting macroeconomic indicators (e.g., GDP growth, inflation) or
financial variables (e.g., stock returns, interest rates).
• Customer segmentation and targeting: classifying households or firms based
on consumption or investment profiles.
• Text classification: categorizing financial disclosures, news articles, or central
bank communications.
These applications often use structured datasets and require careful feature engineering
and validation.
Basic Concepts of ML Unsupervised Learning
Learning Modes
Unsupervised Learning
Unsupervised learning refers to machine learning methods that work with unlabeled
data, meaning there is no predefined target variable.
The objective is not to predict a target, but to uncover hidden structures, patterns, or
associations within the data.
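As one illustration of uncovering structure without a target, the sketch below uses k-means clustering from scikit-learn on simulated data. K-means is an assumed choice among many clustering methods, and the two groups are generated on purpose so that the result can be checked.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Unlabeled data: two groups of points, but the algorithm is never told which is which
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(100, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(100, 2))
X = np.vstack([group_a, group_b])  # no target vector y

# K-means looks for k clusters minimizing within-cluster squared distances
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # close to (0, 0) and (5, 5), in some order
print(np.bincount(labels))      # number of points in each discovered cluster
```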
The goal of unsupervised learning is to discover and extract useful information from data without
relying on labeled outputs.
Unsupervised learning is particularly useful for exploratory data analysis, allowing researchers to
reveal underlying structures or anomalies without prior assumptions.
Unsupervised learning is widely applied in empirical economics and financial analysis for
pattern discovery and dimensionality reduction. Common use cases include:
• Clustering of consumers or firms: identifying latent segments based on
behavior, preferences, or financial indicators.
• Anomaly detection: detecting unusual patterns in transactions, financial
statements, or macroeconomic time series.
• Topic modeling: extracting themes from large text corpora such as news articles,
policy documents, or academic literature.
• Dimensionality reduction: simplifying large datasets (e.g., survey data or panel
data) using methods like PCA before visualization or modeling.
• Market structure analysis: uncovering patterns in product characteristics or
pricing strategies across firms.
These techniques are particularly valuable in exploratory analysis and for preprocessing
data in supervised pipelines.
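The dimensionality-reduction use case can be sketched with PCA from scikit-learn. The data below are a hypothetical "survey" of 10 correlated answers simulated from 2 latent factors, so most of the variance should be captured by 2 components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

# Hypothetical survey: 10 correlated answers driven by 2 latent factors plus noise
factors = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 10))
X = factors @ loadings + 0.1 * rng.normal(size=(300, 10))

# Keep the first 2 principal components
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

print(Z.shape)                              # reduced representation: 300 x 2
print(pca.explained_variance_ratio_.sum())  # share of total variance retained
```

The reduced matrix Z can then be plotted or used as input features in a supervised model.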
Basic Concepts of ML Semi-supervised Learning
Learning Modes
Semi-supervised Learning
Semi-supervised methods are widely used in applications such as image classification, natural
language processing, and fraud detection, where unlabeled data is abundant but labels are scarce.
Semi-supervised learning is useful in situations where labeled data are scarce or costly
to obtain, but large amounts of unlabeled data are available. In economics and finance,
this setting is common. Typical applications include:
• Credit scoring: using a small set of labeled defaults with a large pool of
unlabeled accounts to improve risk prediction.
• Fraud detection: combining a few confirmed fraud cases with a broad set of
transactions to identify suspicious patterns.
• Text classification: labeling economic or financial documents (e.g., reports,
reviews, complaints) using limited annotations.
• Customer segmentation: leveraging sparse labeled information to guide
clustering and behavioral profiling.
• Predicting regulatory outcomes: using partial outcomes from past decisions to
generalize over unlabeled policy cases.
These methods allow analysts to reduce annotation costs while maintaining predictive
accuracy, making them attractive for applied research and industry practice.
Although not covered in this training program, several strategies are commonly used in
semi-supervised learning (SSL):
• Self-training: A supervised model is trained on labeled data and then used to
generate pseudo-labels for the unlabeled data. The most confident predictions are
iteratively added to the training set.
• Label Spreading: Labels from labeled examples are propagated to unlabeled
ones based on similarity. This approach assumes that nearby points (e.g., in
terms of nearest neighbors) are likely to share the same label.
These techniques exploit the structure of the data to improve learning performance, even
when labeled data is scarce.
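Both strategies are available in scikit-learn's `sklearn.semi_supervised` module. The sketch below assumes simulated data where only 20 of 400 points keep their label; following the scikit-learn convention, unlabeled points are coded as -1. The base classifier, threshold and number of neighbors are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelSpreading, SelfTrainingClassifier

rng = np.random.default_rng(4)

# 400 points in two groups; the true labels are known here only to evaluate results
X = np.vstack([rng.normal(-2, 1, size=(200, 2)), rng.normal(2, 1, size=(200, 2))])
y_true = np.array([0] * 200 + [1] * 200)

# Keep only 20 labels; -1 marks unlabeled observations
y = np.full(400, -1)
labeled_idx = rng.choice(400, size=20, replace=False)
y[labeled_idx] = y_true[labeled_idx]

# Self-training: fit on labeled data, then iteratively add confident pseudo-labels
self_training = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
self_training.fit(X, y)

# Label spreading: propagate labels along a k-nearest-neighbor similarity graph
spreading = LabelSpreading(kernel="knn", n_neighbors=7)
spreading.fit(X, y)

print("Self-training accuracy:", np.mean(self_training.predict(X) == y_true))
print("Label spreading accuracy:", np.mean(spreading.transduction_ == y_true))
```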
Basic Concepts of ML Reinforcement Learning
Learning Modes
Reinforcement Learning
Source: Wikipedia
Applications of RL in Economics
Families of RL Algorithms
These algorithms are often trained using simulation or interaction with complex environments.
Basic Concepts of ML Other Types of Learning
Learning Modes
In this training, we focus on the foundations of ML and will briefly mention some of these
paradigms without detailed coverage.
Deep Learning
Deep Learning is a subfield of machine learning that uses multi-layered neural networks to automatically learn data representations.
Unlike traditional machine learning, deep learning largely removes the need for manual feature engineering. Instead, the algorithm learns both the features and the decision rule directly from raw data through successive layers of abstraction.
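A toy illustration, assuming scikit-learn's `MLPClassifier`: on an XOR-type problem the label depends on an interaction that no single raw input reveals, and a small multi-layer network learns it without hand-crafted features. Real deep learning applications use far larger networks and dedicated libraries; the layer sizes here are arbitrary.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)

# XOR-type problem: label = 1 when exactly one of the two inputs is positive
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

# Two hidden layers of 16 units: each layer builds a new representation of the input
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
net.fit(X, y)

print("Training accuracy:", net.score(X, y))
```

No interaction term such as x1 * x2 is supplied: the hidden layers have to construct the relevant representation themselves.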
Natural Language Processing (NLP) refers to the branch of AI focused on enabling machines to
understand, interpret, generate, and respond to human language, both written and spoken.
NLP combines insights from computational linguistics, statistics, and machine learning, particularly
deep learning.
Source: Amazinum
Basic Concepts of ML Summary of Learning Modes
Basic Concepts of ML
Key Concepts
1 Model feature.
2 Target or label.
3 Prediction function or hypothesis.
4 Loss function.
5 Regression vs. classification model.
6 Labelled vs. unlabelled data.
7 Supervised learning.
8 Unsupervised learning.
9 Semi-supervised learning.
10 Reinforcement learning.
11 Deep learning.
ML Algorithms
ML Algorithms Most often used ML algorithms
What is an Algorithm?
Definition: Algorithm
In the context of ML, an algorithm defines the procedure by which a model is learned from data. It specifies how the model’s parameters are estimated, updated, and optimized based on a given objective function.
An algorithm is not the final model itself, but the method used to generate the model from data.
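The algorithm/model distinction can be seen in a small example. The sketch below assumes NumPy and uses gradient descent on the mean squared error of a linear regression (an illustrative choice of algorithm; the function name is hypothetical). The loop is the algorithm; the returned vector `beta_hat` is the model.

```python
import numpy as np

def gradient_descent_ols(X, y, learning_rate=0.1, n_iter=1000):
    """Algorithm: repeatedly update beta to reduce the mean squared error.

    Returns the estimated parameters (the model), not the procedure itself.
    """
    n, d = X.shape
    beta = np.zeros(d)                          # initial parameter values
    for _ in range(n_iter):
        residuals = X @ beta - y                # current prediction errors
        gradient = (2 / n) * X.T @ residuals    # gradient of the MSE w.r.t. beta
        beta -= learning_rate * gradient        # update step
    return beta

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # intercept + 1 feature
y = X @ np.array([1.0, 3.0]) + rng.normal(scale=0.5, size=200)

beta_hat = gradient_descent_ols(X, y)        # the learned model: y_hat = X @ beta_hat
print(beta_hat)                              # close to the simulated values [1.0, 3.0]
print(np.linalg.lstsq(X, y, rcond=None)[0])  # closed-form OLS, for comparison
```

Two different algorithms (iterative gradient descent and the closed-form least-squares solution) lead here to the same model.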
ML Algorithms Most often used ML algorithm
However, in economics and finance, some algorithms are more frequently employed in
practice. We now present those that are most commonly used in these fields for each
learning mode.
Source: Scikit-Learn
Source: LinkedIn
Source: Kaggle surveys 2017-202, cited in Capellupo (2021), Towards Data Science.
ML Algorithms What should ML training for economists look like?
Transition Question
ML Background of Economists
Breiman, L., Friedman, J., Olshen, R. and C. Stone (1984), Classification and Regression
Trees, Wadsworth, Int. Group.
Note: This figure shows the number of times each ML algorithm is used across 110 articles from a literature survey on credit scoring.
Source: Markov et al. (2022), Credit scoring methods: Latest trends and points to consider, Journal of Finance and Data Science, 8,
180-201.
ML Algorithms
Key Concepts
3 Logistic Regression.
4 Decision Trees.
5 Random Forests.
6 Neural Networks.
7 Ensemble Methods.
Taxonomy of Data
Typology of Data
Source: Luna-Reyes, L. F., Martin, E. G., and Ivonchyk, M. (2022). Data Analytics for Public Policy and Management. Pressbooks.
Quantitative Data
Quantitative data (or numerical data) refers to measurable information that takes numer-
ical values. It is typically divided into:
• Discrete quantitative data: distinct, countable values, usually integers.
• Continuous quantitative data: values that can take an infinite number of
possibilities within a given interval.
Qualitative Data
Qualitative data (or categorical data) refers to non-numeric information that describes
attributes or characteristics. It can be divided into:
• Nominal qualitative data: categories without any intrinsic order.
• Ordinal qualitative data: categories with a meaningful order.
Examples of nominal data: Eye color (blue, green, brown), housing type (apartment,
house, studio).
Examples of ordinal data: Satisfaction level (dissatisfied, neutral, satisfied), school ranking (first, second, third).
2. One-Hot Encoding:
Obs.  Category  Cat  Dog  Bird
1     Cat       1    0    0
2     Dog       0    1    0
3     Bird      0    0    1
4     Cat       1    0    0
5     Dog       0    1    0
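In pandas, one-hot encoding of the five observations in the table can be obtained with `pd.get_dummies`. This is a minimal sketch; the column names `ID` and `Animal` are assumptions made for the example.

```python
import pandas as pd

# Same five observations as in the one-hot encoding table
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                   "Animal": ["Cat", "Dog", "Bird", "Cat", "Dog"]})

# One indicator column per category; dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(df["Animal"], dtype=int)

# get_dummies sorts categories alphabetically; reorder to match the table
encoded = pd.concat([df, dummies[["Cat", "Dog", "Bird"]]], axis=1)
print(encoded)
```

Each row has exactly one indicator equal to 1, which is the defining property of one-hot encoding.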
A binary variable is a variable that can take only two values, typically 0 and 1, representing opposite states such as yes/no or true/false.
Note: Categorical variables with two categories can be directly encoded as 0 and 1.
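A two-category variable stored as text can be mapped directly to 0/1. The sketch assumes pandas and a hypothetical `Homeowner` column with "Yes"/"No" answers.

```python
import pandas as pd

# Hypothetical answers stored as text
df = pd.DataFrame({"Homeowner": ["Yes", "No", "No", "Yes"]})

# Encode the two categories directly as 1 (Yes) and 0 (No)
df["Homeowner_bin"] = df["Homeowner"].map({"Yes": 1, "No": 0})
print(df)
```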
The goal is to build a credit scoring model to estimate the probability that a borrower
will default within the first 12 months of a loan contract.
• The dataset includes several characteristics of the borrower and the loan, which
serve as explanatory variables in the logistic regression model.
• The target variable is a binary default indicator equal to 1 in case of default and 0
otherwise.
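A minimal sketch of such a credit scoring model, assuming scikit-learn and simulated borrowers. The variables, coefficients and default mechanism are invented for illustration and are not taken from the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 1000

# Simulated borrowers (hypothetical variables, not the course dataset)
data = pd.DataFrame({
    "Age": rng.integers(20, 70, size=n),
    "Homeowner": rng.integers(0, 2, size=n),
    "Loan duration": rng.choice([24, 36, 48, 60], size=n),
})

# Assumed data-generating process for the default indicator
score = (-1.0 - 0.03 * (data["Age"] - 40) - 0.8 * data["Homeowner"]
         + 0.02 * (data["Loan duration"] - 36))
p_default = 1 / (1 + np.exp(-score.to_numpy()))
data["Default"] = rng.binomial(1, p_default)  # 1 = default within 12 months

features = ["Age", "Homeowner", "Loan duration"]
model = LogisticRegression(max_iter=1000).fit(data[features], data["Default"])

# Estimated probability of default for a new applicant
applicant = pd.DataFrame({"Age": [30], "Homeowner": [0], "Loan duration": [60]})
pd_hat = model.predict_proba(applicant)[0, 1]
print(f"Estimated probability of default: {pd_hat:.3f}")
```

`predict_proba` returns the estimated probability of each class; column 1 corresponds to Default = 1.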
Many software tools require you to declare, or allow you to adjust, the data type when importing a dataset. For example, in Python, pandas automatically infers column types when reading a file, but it is possible (and often recommended) to specify them manually to ensure consistency and accuracy.
• Continuous variables (`float`, `int`): Represent numeric values that can take a wide range of values. Examples: Property price (`float`), age of an individual (`int`).
• Categorical variables (`category`, `object`): Take a limited number of distinct values representing categories. Examples: Type of contract (`category`), product color (`object`).
• Binary variables (`bool`, `category`): Variables with only two levels (0/1 or True/False). Examples: Homeownership (`bool`), default indicator (`category`).
• Ordinal variables (`category` with order): Categorical variables with a natural order. Examples: Satisfaction level (`category` with defined order), movie rating (1 star, 2 stars, etc.).
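The four cases can be declared explicitly in pandas. This is a sketch on a small hypothetical DataFrame; the ordinal variable uses `pd.CategoricalDtype(..., ordered=True)` so that comparisons follow the declared order.

```python
import pandas as pd

df = pd.DataFrame({
    "Price": [250000.0, 180500.0, 320000.0],         # continuous -> float
    "Age": [34, 51, 27],                             # numeric -> int
    "Contract": ["fixed", "variable", "fixed"],      # nominal -> category
    "Homeowner": [True, False, True],                # binary -> bool
    "Satisfaction": ["neutral", "satisfied", "dissatisfied"],
})

df["Contract"] = df["Contract"].astype("category")

# Ordinal variable: declare the categories and their order explicitly
satisfaction_type = pd.CategoricalDtype(
    categories=["dissatisfied", "neutral", "satisfied"], ordered=True)
df["Satisfaction"] = df["Satisfaction"].astype(satisfaction_type)

print(df.dtypes)
# Ordered categories support comparisons that follow the declared order
print(df["Satisfaction"] > "neutral")
```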
import pandas as pd

# Load the Excel file (reading .xlsx files requires the openpyxl package)
file_path = "Scoring_data.xlsx"
df = pd.read_excel(file_path, sheet_name="Data")

# Show types detected automatically by pandas
print("Automatically detected types:")
print(df.dtypes)

# Define categorical (binary) and continuous variables
binary_variables = ["Default", "Down payment", "Credit event", "Married", "Homeowner"]
continuous_variables = ["Age", "Car price", "Funding amount", "Job tenure",
                        "Loan duration", "Monthly payment"]

# Convert binary variables to 'category' (or use 'bool' if preferred)
df[binary_variables] = df[binary_variables].astype("category")

# Convert continuous variables to 'float' for consistency
df[continuous_variables] = df[continuous_variables].astype("float")

# Show types after explicit conversion
print("\nTypes after explicit conversion:")
print(df.dtypes)
Structured Data
Structured data refers to data that is predefined and formatted according to a specific structure (typically in relational databases or tables) before being stored in a data warehouse.
A relational database is a system for storing and managing structured data, organized
into interrelated tables. Each table consists of rows (records) and columns (attributes),
clearly defining the structure of the data.
A relational database for a company storing employee information might contain the following tables:
• Employees table: (ID, Last Name, First Name, Position, Salary)
• Departments table: (ID, Department Name, Manager)
• Projects table: (ID, Project Name, Budget, Associated Department)
Relationships between these tables allow for efficient querying using the SQL language.
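A small sketch of such a relationship, using Python's built-in `sqlite3` module and an in-memory database. Only the Departments and Projects tables are reproduced; the column names, key column `department_id` and rows are hypothetical. The SQL query joins the two tables through the department key.

```python
import sqlite3

# In-memory relational database with two related tables (hypothetical content)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, manager TEXT)")
cur.execute("""CREATE TABLE projects (
                   id INTEGER PRIMARY KEY, name TEXT, budget REAL,
                   department_id INTEGER REFERENCES departments(id))""")

cur.executemany("INSERT INTO departments VALUES (?, ?, ?)",
                [(1, "Risk", "Martin"), (2, "Marketing", "Dubois")])
cur.executemany("INSERT INTO projects VALUES (?, ?, ?, ?)",
                [(1, "Credit scoring", 120000.0, 1),
                 (2, "Fraud detection", 80000.0, 1),
                 (3, "Customer survey", 30000.0, 2)])

# Join projects to their department and aggregate budgets per department
query = """
    SELECT d.name, COUNT(p.id) AS n_projects, SUM(p.budget) AS total_budget
    FROM departments AS d
    JOIN projects AS p ON p.department_id = d.id
    GROUP BY d.name
    ORDER BY total_budget DESC
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The department name is stored only once, in the Departments table; the Projects table refers to it through a key, and the join recombines the information at query time.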
SQL
Definition: SQL
SQL (Structured Query Language) is a programming language used to interact with relational databases.
Source: medium.com
Unstructured Data
Social media content, free-text data, and audio recordings are typical examples of unstructured data. Unlike structured data stored in relational databases, these data require specialized technologies such as NoSQL databases or Natural Language Processing (NLP) techniques to be processed effectively.
Source: Lawtomated, *Structured vs. Unstructured Data: What are they and why care?*, April 2019.
Semi-Structured Data
Semi-structured data refers to information that does not follow a rigid schema like relational databases but still has an implicit organization through tags, metadata, or a hierarchical structure. It lies between structured and unstructured data.
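JSON is a common semi-structured format: keys act as tags, nesting gives a hierarchy, and records need not share the same fields. The sketch below parses two hypothetical customer records with the standard `json` module and flattens them into a table with `pandas.json_normalize`.

```python
import json
import pandas as pd

# Hypothetical records: the second one has no "accounts" field and no zip code
raw = """
[
  {"id": 1, "name": "Alice", "address": {"city": "Orléans", "zip": "45000"},
   "accounts": ["checking", "savings"]},
  {"id": 2, "name": "Bob", "address": {"city": "Tours"}}
]
"""
records = json.loads(raw)

# Flatten nested keys into columns such as "address.city"; missing fields become NaN
table = pd.json_normalize(records)
print(table)
```

Turning semi-structured records into a table is often the first step before applying the ML methods of the previous sections.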
Source: Educba.
Source: hackernoon.com.
Big Data
Big Data refers to large, diverse, and high-velocity datasets that exceed the capabilities
of traditional data management tools. These datasets require advanced technologies for
collection, storage, processing, and visualization.
Source: Analytixlabs
High-Dimensional Setting
Taxonomy of Data
Key Concepts
1 Quantitative data.
2 Qualitative data.
4 Binary variable.
5 Structured data.
6 Relational databases.
8 Semi-structured data.
9 Unstructured data.
10 Big data.
12 High-dimensional setting.
End of Session