Lecture 2: Linear Regression
Basics

[Figure: Deep Learning is a subset of Machine Learning, which is in turn a subset of Artificial Intelligence.]
Task: image classification. The task is hard because of, among other things, occlusions and background clutter. How should the image be represented?
Nearest Neighbor

A nearest neighbor (NN) classifier labels a test image with the label of its closest training image under a chosen distance function; in the example, the NN classifier predicts "dog".
Nearest Neighbor

[Figure: the training data, the decision regions of the NN classifier, and the decision regions of the 5-NN classifier.]
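The figure contrasts the decision regions of a 1-NN and a 5-NN classifier. A minimal NumPy sketch of a k-NN classifier, assuming Euclidean distance as the distance function (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test point by a majority vote among its k nearest
    training points under Euclidean distance."""
    predictions = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)    # distance to every training point
        nearest = np.argsort(dists)[:k]                # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])  # majority vote
    return np.array(predictions)
```

With k = 1 this reduces to the plain NN classifier from the previous slide.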
Machine learning combines a task (image classification), experience (data: images labeled as DOG or CAT), and a model $M_\theta$ with parameters $\theta$. Training searches for the parameters that minimize a "distance" (loss) function $D$ between the model output $M_\theta(I_i)$ and the label $Y_i \in \{\text{DOG}, \text{CAT}\}$:

$$\theta^* = \arg\min_{\theta} \sum_i D\big(M_\theta(I_i) - Y_i\big)$$
Basic Recipe for Machine Learning
• Task: image classification
• Experience: data (images labeled DOG or CAT)
• Performance measure: accuracy
• Split your data into training, validation, and test sets (a minimal split sketch follows below)
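A minimal NumPy sketch of such a split; the 60/20/20 fractions and all names are illustrative choices, not values from the slides:

```python
import numpy as np

def split_data(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the dataset once and split it into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(test_frac * len(X))
    n_val = int(val_frac * len(X))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])
```

The validation set is used for tuning choices such as hyperparameters, while the test set is only touched once at the very end.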
Reinforcement Learning

An agent interacts with an environment and receives a reward signal for its actions.
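A minimal sketch of this interaction loop; the agent and environment objects and their act/reset/step interface are assumptions made purely for illustration:

```python
def run_episode(agent, env, max_steps=100):
    """Let an agent interact with an environment and accumulate reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                # the agent chooses an action
        state, reward, done = env.step(action)   # the environment returns a new state and a reward
        total_reward += reward                   # the reward signal drives learning
        if done:
            break
    return total_reward
```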
Linear Regression

A linear model is fit on training data and evaluated on test data. The prediction for the $i$-th sample is a weighted sum of its $d$ input features:

$$\hat{y}_i = \theta_0 + \sum_{j=1}^{d} x_{ij}\,\theta_j$$

where $\theta_1, \dots, \theta_d$ are the weights (i.e., model parameters) and $\theta_0$ is the bias.
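Assuming the inputs are collected row-wise into a matrix with a leading column of ones (so that $\theta_0$ acts as the bias), all predictions are a single matrix-vector product. A minimal sketch with illustrative names:

```python
import numpy as np

def add_bias_column(X):
    """Prepend a column of ones so that theta[0] plays the role of theta_0."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def linear_predict(X, theta):
    """Compute y_hat_i = theta_0 + sum_j x_ij * theta_j for all samples at once."""
    return X @ theta
```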
Linear Prediction

Example: predict the temperature of a building from four input features: outside temperature ($x_1$, weight $\theta_1$), level of humidity ($x_2$, weight $\theta_2$), number of people ($x_3$, weight $\theta_3$), and sun exposure ($x_4$, weight $\theta_4$).

For two input samples and the model parameters below, the predictions are

$$\begin{pmatrix}\hat{y}_1\\ \hat{y}_2\end{pmatrix} = \begin{pmatrix}1 & 25 & 50 & 2 & 50\\ 1 & -10 & 50 & 0 & 10\end{pmatrix} \cdot \begin{pmatrix}0.2\\ 0.64\\ 0\\ 1\\ 0.14\end{pmatrix}$$
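A quick NumPy check of this matrix product, using the feature values and parameters from the example above (each row of X is one sample: bias term, outside temperature, humidity, number of people, sun exposure):

```python
import numpy as np

X = np.array([[1.0,  25.0, 50.0, 2.0, 50.0],
              [1.0, -10.0, 50.0, 0.0, 10.0]])   # two samples with a leading bias column
theta = np.array([0.2, 0.64, 0.0, 1.0, 0.14])   # model parameters theta_0 ... theta_4

y_hat = X @ theta                               # predicted building temperatures
print(y_hat)
```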
Linear Prediction

The predictions above are fully determined by the parameter vector. How do we obtain the model, i.e., the values of $\boldsymbol{\theta}$?
How to Obtain the Model?

We are given data points $\mathbf{X}$ and their labels (ground truth) $\mathbf{y}$; the model outputs a prediction (here, the temperature of the building). Obtaining the model means optimization: minimizing a loss function that compares predictions with labels. The objective function (also called cost function or energy) being minimized is

$$J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$
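The same objective as a one-line NumPy function (names are illustrative):

```python
import numpy as np

def mse_loss(y_hat, y):
    """Objective J(theta) = (1/n) * sum_i (y_hat_i - y_i)^2."""
    return np.mean((y_hat - y) ** 2)
```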
Optimization: Linear Least Squares
• Linear least squares: an approach to fit a linear model to the data

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

In matrix notation:

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T(\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$$

The objective is convex, so the optimum is attained where the gradient vanishes, $\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 0$.
Optimization (details in the exercise session!)

$$\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 2\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} - 2\mathbf{X}^T\mathbf{y} = 0$$

$$\boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
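A minimal sketch of this closed-form solution; solving the normal equations with np.linalg.solve instead of forming an explicit inverse is a standard numerical choice, not something prescribed here:

```python
import numpy as np

def fit_least_squares(X, y):
    """Solve X^T X theta = X^T y, i.e. theta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Usage sketch: theta = fit_least_squares(X, y); y_hat = X @ theta
```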
Maximum Likelihood

The model defines a distribution $p_{\text{model}}(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\theta})$ that is controlled by the parameter(s) $\boldsymbol{\theta}$, together with an "i.i.d." assumption on the samples.

Assuming a Gaussian with mean $\mathbf{x}_i\boldsymbol{\theta}$:

$$y_i = \mathcal{N}(\mathbf{x}_i\boldsymbol{\theta}, \sigma^2) = \mathbf{x}_i\boldsymbol{\theta} + \mathcal{N}(0, \sigma^2)$$

$$p(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2}, \qquad y_i \sim \mathcal{N}(\mu, \sigma^2)$$
Back to Linear Regression

Under this Gaussian model, what is $p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta})$?
Back to Linear Regression

$$p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}$$
Back to Linear Regression

Original optimization problem (maximum likelihood):

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{\text{model}}(y_i \mid \mathbf{x}_i, \boldsymbol{\theta})$$

Plugging in $p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}$, the log-likelihood becomes

$$\sum_{i=1}^{n} -\frac{1}{2}\log(2\pi\sigma^2) + \sum_{i=1}^{n} -\frac{1}{2\sigma^2}\left(y_i - \mathbf{x}_i\boldsymbol{\theta}\right)^2$$

In matrix notation:

$$-\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)$$
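Since the first term does not depend on $\boldsymbol{\theta}$ and $\sigma^2$ is a positive constant, maximizing this expression is the same as minimizing the squared error; spelling out this step:

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \left[-\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)\right] = \arg\min_{\boldsymbol{\theta}} \left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)$$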
Back to Linear Regression

Maximum likelihood estimation under Gaussian noise therefore recovers the least-squares solution:

$$\boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
Logistic Regression

[Figure: inputs $x_1, x_2$ are weighted by $\theta_1, \theta_2$, summed, and passed through a sigmoid.]

The output can be interpreted as a probability:

$$\hat{y}_i = p(y_i = 1 \mid \mathbf{x}_i, \boldsymbol{\theta})$$
• Overall loss, with $\hat{y}_i = \sigma(\mathbf{x}_i\boldsymbol{\theta})$:

$$C(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}(\hat{y}_i, y_i) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

which is minimized over $\boldsymbol{\theta}$.
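A minimal NumPy sketch of this prediction and loss; the clipping constant is an added assumption for numerical stability, not part of the formula:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(X, y, theta, eps=1e-12):
    """Binary cross-entropy C(theta) = -(1/n) * sum_i [y_i log y_hat_i + (1 - y_i) log(1 - y_hat_i)]."""
    y_hat = sigmoid(X @ theta)              # y_hat_i = sigma(x_i theta)
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```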
Logistic Regression: Optimization
• No closed-form solution
• Gradient descent – covered later on! (a minimal sketch follows below)
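As a preview, a sketch of plain gradient descent for the logistic loss; the learning rate and iteration count are arbitrary illustrative values, and the gradient $\frac{1}{n}\mathbf{X}^T(\hat{\mathbf{y}} - \mathbf{y})$ is the standard one for the binary cross-entropy:

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
    """Minimize the binary cross-entropy by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ theta)))  # current predictions sigma(X theta)
        grad = X.T @ (y_hat - y) / len(y)           # gradient of the loss w.r.t. theta
        theta -= lr * grad                          # descent step
    return theta
```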