Lecture 2: Linear Regression
Basics

[Figure: Deep Learning is a subset of Machine Learning, which is in turn a subset of Artificial Intelligence.]
Task: image classification. The task is hard because of, among other things, occlusions and background clutter. How should the image be represented?
Nearest Neighbor

A nearest neighbor (NN) classifier labels a test image with the label of its closest training image under a chosen distance function; in the example, the NN classifier predicts "dog".
Nearest Neighbor

[Figure: the training data, the decision regions of the NN classifier, and the decision regions of the 5-NN classifier.]
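The figure contrasts the decision regions of a 1-NN and a 5-NN classifier. A minimal NumPy sketch of a k-NN classifier, assuming Euclidean distance as the distance function (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test point by a majority vote among its k nearest
    training points under Euclidean distance."""
    predictions = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)    # distance to every training point
        nearest = np.argsort(dists)[:k]                # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])  # majority vote
    return np.array(predictions)
```

With k = 1 this reduces to the plain NN classifier from the previous slide.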
Machine learning combines a task (image classification), experience (data: images labeled as DOG or CAT), and a model $M_\theta$ with parameters $\theta$. Training searches for the parameters that minimize a "distance" (loss) function $D$ between the model output $M_\theta(I_i)$ and the label $Y_i \in \{\text{DOG}, \text{CAT}\}$:

$$\theta^* = \arg\min_{\theta} \sum_i D\big(M_\theta(I_i) - Y_i\big)$$
Basic Recipe for Machine Learning
• Task: image classification
• Experience: data (images labeled DOG or CAT)
• Performance measure: accuracy
• Split your data into training, validation, and test sets (a minimal split sketch follows below)
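A minimal NumPy sketch of such a split; the 60/20/20 fractions and all names are illustrative choices, not values from the slides:

```python
import numpy as np

def split_data(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the dataset once and split it into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(test_frac * len(X))
    n_val = int(val_frac * len(X))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])
```

The validation set is used for tuning choices such as hyperparameters, while the test set is only touched once at the very end.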
Reinforcement Learning

An agent interacts with an environment and receives a reward signal for its actions.
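A minimal sketch of this interaction loop; the agent and environment objects and their act/reset/step interface are assumptions made purely for illustration:

```python
def run_episode(agent, env, max_steps=100):
    """Let an agent interact with an environment and accumulate reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                # the agent chooses an action
        state, reward, done = env.step(action)   # the environment returns a new state and a reward
        total_reward += reward                   # the reward signal drives learning
        if done:
            break
    return total_reward
```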
Linear Regression

A linear model is fit on training data and evaluated on test data. The prediction for the $i$-th sample is a weighted sum of its $d$ input features:

$$\hat{y}_i = \theta_0 + \sum_{j=1}^{d} x_{ij}\,\theta_j$$

where $\theta_1, \dots, \theta_d$ are the weights (i.e., model parameters) and $\theta_0$ is the bias.
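Assuming the inputs are collected row-wise into a matrix with a leading column of ones (so that $\theta_0$ acts as the bias), all predictions are a single matrix-vector product. A minimal sketch with illustrative names:

```python
import numpy as np

def add_bias_column(X):
    """Prepend a column of ones so that theta[0] plays the role of theta_0."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def linear_predict(X, theta):
    """Compute y_hat_i = theta_0 + sum_j x_ij * theta_j for all samples at once."""
    return X @ theta
```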
Linear Prediction

Example: predict the temperature of a building from four input features: outside temperature ($x_1$, weight $\theta_1$), level of humidity ($x_2$, weight $\theta_2$), number of people ($x_3$, weight $\theta_3$), and sun exposure ($x_4$, weight $\theta_4$).

For two input samples and the model parameters below, the predictions are

$$\begin{pmatrix}\hat{y}_1\\ \hat{y}_2\end{pmatrix} = \begin{pmatrix}1 & 25 & 50 & 2 & 50\\ 1 & -10 & 50 & 0 & 10\end{pmatrix} \cdot \begin{pmatrix}0.2\\ 0.64\\ 0\\ 1\\ 0.14\end{pmatrix}$$
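A quick NumPy check of this matrix product, using the feature values and parameters from the example above (each row of X is one sample: bias term, outside temperature, humidity, number of people, sun exposure):

```python
import numpy as np

X = np.array([[1.0,  25.0, 50.0, 2.0, 50.0],
              [1.0, -10.0, 50.0, 0.0, 10.0]])   # two samples with a leading bias column
theta = np.array([0.2, 0.64, 0.0, 1.0, 0.14])   # model parameters theta_0 ... theta_4

y_hat = X @ theta                               # predicted building temperatures
print(y_hat)
```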
Linear Prediction

The predictions above are fully determined by the parameter vector. How do we obtain the model, i.e., the values of $\boldsymbol{\theta}$?
How to Obtain the Model?

We are given data points $\mathbf{X}$ and their labels (ground truth) $\mathbf{y}$; the model outputs a prediction (here, the temperature of the building). Obtaining the model means optimization: minimizing a loss function that compares predictions with labels. The objective function (also called cost function or energy) being minimized is

$$J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$
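The same objective as a one-line NumPy function (names are illustrative):

```python
import numpy as np

def mse_loss(y_hat, y):
    """Objective J(theta) = (1/n) * sum_i (y_hat_i - y_i)^2."""
    return np.mean((y_hat - y) ** 2)
```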
Optimization: Linear Least Squares
• Linear least squares: an approach to fit a linear model to the data

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

In matrix notation:

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T(\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$$

The objective is convex, so the optimum is attained where the gradient vanishes, $\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 0$.
Optimization (details in the exercise session!)

$$\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 2\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} - 2\mathbf{X}^T\mathbf{y} = 0$$

$$\boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
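A minimal sketch of this closed-form solution; solving the normal equations with np.linalg.solve instead of forming an explicit inverse is a standard numerical choice, not something prescribed here:

```python
import numpy as np

def fit_least_squares(X, y):
    """Solve X^T X theta = X^T y, i.e. theta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Usage sketch: theta = fit_least_squares(X, y); y_hat = X @ theta
```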
Maximum Likelihood

The model defines a distribution $p_{\text{model}}(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})$ that is controlled by the parameter(s) $\boldsymbol{\theta}$, together with an "i.i.d." assumption on the samples.

Assuming a Gaussian with mean $\mathbf{x}_i\boldsymbol{\theta}$:

$$y_i = \mathcal{N}(\mathbf{x}_i\boldsymbol{\theta}, \sigma^2) = \mathbf{x}_i\boldsymbol{\theta} + \mathcal{N}(0, \sigma^2)$$

$$p(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2}, \qquad y_i \sim \mathcal{N}(\mu, \sigma^2)$$
Back to Linear Regression

Under this Gaussian model, what is $p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta})$?
Back to Linear Regression

$$p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}$$
Back to Linear Regression

Original optimization problem (maximum likelihood):

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{\text{model}}(y_i \mid \mathbf{x}_i, \boldsymbol{\theta})$$

Plugging in $p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}$, the log-likelihood becomes

$$\sum_{i=1}^{n} -\frac{1}{2}\log(2\pi\sigma^2) + \sum_{i=1}^{n} -\frac{1}{2\sigma^2}\left(y_i - \mathbf{x}_i\boldsymbol{\theta}\right)^2$$

In matrix notation:

$$-\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)$$
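Since the first term does not depend on $\boldsymbol{\theta}$ and $\sigma^2$ is a positive constant, maximizing this expression is the same as minimizing the squared error; spelling out this step:

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \left[-\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)\right] = \arg\min_{\boldsymbol{\theta}} \left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)$$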
Back to Linear Regression

Maximum likelihood estimation under Gaussian noise therefore recovers the least-squares solution:

$$\boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
Logistic Regression

[Figure: inputs $x_1, x_2$ are weighted by $\theta_1, \theta_2$, summed, and passed through a sigmoid.]

The output can be interpreted as a probability:

$$\hat{y}_i = p(y_i = 1 \mid \mathbf{x}_i, \boldsymbol{\theta})$$
• Overall loss, with $\hat{y}_i = \sigma(\mathbf{x}_i\boldsymbol{\theta})$:

$$C(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}(\hat{y}_i, y_i) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

which is minimized over $\boldsymbol{\theta}$.
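A minimal NumPy sketch of this prediction and loss; the clipping constant is an added assumption for numerical stability, not part of the formula:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(X, y, theta, eps=1e-12):
    """Binary cross-entropy C(theta) = -(1/n) * sum_i [y_i log y_hat_i + (1 - y_i) log(1 - y_hat_i)]."""
    y_hat = sigmoid(X @ theta)              # y_hat_i = sigma(x_i theta)
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```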
Logistic Regression: Optimization
• No closed-form solution
• Gradient descent – covered later on! (a minimal sketch follows below)
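As a preview, a sketch of plain gradient descent for the logistic loss; the learning rate and iteration count are arbitrary illustrative values, and the gradient $\frac{1}{n}\mathbf{X}^T(\hat{\mathbf{y}} - \mathbf{y})$ is the standard one for the binary cross-entropy:

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
    """Minimize the binary cross-entropy by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ theta)))  # current predictions sigma(X theta)
        grad = X.T @ (y_hat - y) / len(y)           # gradient of the loss w.r.t. theta
        theta -= lr * grad                          # descent step
    return theta
```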