
Machine Learning Basics



AI vs ML vs DL

[Figure: nested circles — Deep Learning inside Machine Learning inside Artificial Intelligence]


A Simple Task: Image Classification


Image Classification

Task: assign a class label (e.g., CAT or DOG) to an input image.

Challenges illustrated on the slides: occlusions, background clutter, and variation in pose, illumination, and appearance.

Representation: how should images be represented so that a classifier can cope with these variations?


A Simple Classifier



Nearest Neighbor

The nearest neighbor (NN) classifier assigns a query image the label of its single closest training sample under a distance function (in the slide's example: dog).

The k-nearest neighbor (k-NN) classifier instead takes a majority vote over the k closest training samples, which can change the decision (in the example: cat).
Nearest Neighbor

[Figure: three panels — the data, NN classifier decision regions, 5-NN classifier decision regions. Source: https://commons.wikimedia.org/wiki/File:Data3classes.png]

How does the NN classifier perform on training data?
Which classifier is more likely to perform best on test data?
What are we actually learning?
Nearest Neighbor

• Hyperparameters:
  – number of neighbors $k$
  – distance function: L1 distance $|x - c|$ or L2 distance $\|x - c\|_2$

• These parameters are problem dependent.

• How do we choose these hyperparameters?
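To make these choices concrete, here is a minimal k-NN sketch in NumPy (not from the slides; the function name and toy data are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5, dist="L2"):
    """Classify one query point by majority vote among its k nearest neighbors."""
    if dist == "L1":
        d = np.abs(X_train - x_query).sum(axis=1)            # L1 distance |x - c|
    else:
        d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # L2 distance ||x - c||_2
    nearest = np.argsort(d)[:k]                              # k closest training samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                         # majority vote

# Toy usage: two classes in 2D
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1]), k=3))  # -> 0
```

Note how both hyperparameters ($k$ and the distance) change the prediction without any "training" in the usual sense.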


Machine Learning for Classification


Machine Learning

• How can we learn to perform image classification?

[Diagram: Task — image classification; Experience — data]


Machine Learning

• A model $M_\theta$ maps an image $I$ to a class label: $M_\theta(I) \in \{\text{DOG}, \text{CAT}\}$, where $\theta$ are the model parameters.

[Figure: example images labeled CAT, DOG, DOG, CAT, CAT, DOG]


Machine Learning

• Given training images $I_i$ with labels $Y_i \in \{\text{DOG}, \text{CAT}\}$, find the model parameters that minimize a "distance" function $D$ between predictions and labels:

$$\theta^* = \arg\min_\theta \sum_i D\big(M_\theta(I_i), Y_i\big)$$
Basic Recipe for Machine Learning

• Split your data: 60% train / 20% validation / 20% test
  (other splits are also possible, e.g., 80%/10%/10%; see the sketch after this list)

• Find the model parameters $\theta$ on the training set.

• Find your hyperparameters on the validation set.

• The test set is only used once!
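A minimal splitting sketch, assuming NumPy; shuffling before splitting and the fixed seed are illustrative choices, not from the slide:

```python
import numpy as np

def split_data(X, y, train=0.6, val=0.2, seed=0):
    """Shuffle, then split into train/validation/test (e.g., 60%/20%/20%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random order of sample indices
    n_train = int(train * len(X))
    n_val = int(val * len(X))
    i_tr = idx[:n_train]
    i_va = idx[n_train:n_train + n_val]
    i_te = idx[n_train + n_val:]           # test set: touched only once, at the very end
    return (X[i_tr], y[i_tr]), (X[i_va], y[i_va]), (X[i_te], y[i_te])
```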


Machine Learning

• How can we learn to perform image classification?

[Diagram: Task — image classification; Experience — data; Performance measure — accuracy]


Machine Learning

Supervised learning:
• Labels or target classes are given for each training sample (e.g., images annotated as CAT or DOG)

Unsupervised learning:
• No label or target class
• Find out properties of the structure of the data
• e.g., clustering (k-means), dimensionality reduction (PCA)


Machine Learning

Reinforcement learning:
• Agents learn by interacting with an environment and receiving rewards


Linear Decision Boundaries

Let's start with a simple linear model!

What are the pros and cons of using linear decision boundaries?


Linear Regression



Linear Regression

• Supervised learning
• Find a linear model that explains a target $\mathbf{y}$ given inputs $\mathbf{x}$

[Figure: scatter of data points with a fitted line; target $y$ over input $x$]
Linear Regression

Training: $\{\mathbf{x}_{1:n}, \mathbf{y}_{1:n}\}$ → Learner → $\boldsymbol{\theta}$

The data points pair inputs (e.g., an image or a measurement) with labels (e.g., cat/dog); the learner outputs the model parameters $\boldsymbol{\theta}$.


Linear Regression

Training: $\{\mathbf{x}_{1:n}, \mathbf{y}_{1:n}\}$ → Learner → $\boldsymbol{\theta}$ (these can equally be the parameters of a neural network)

Testing: $\mathbf{x}_{n+1}, \boldsymbol{\theta}$ → Predictor → $\hat{y}_{n+1}$ (the estimation)
Linear Prediction

• A linear model is expressed in the form

$$\hat{y}_i = \sum_{j=1}^{d} x_{ij}\,\theta_j$$

where $d$ is the input dimension, $x_{ij}$ are the input data (features), and $\theta_j$ are the weights (i.e., model parameters).


Linear Prediction

• Adding a bias term $\theta_0$:

$$\hat{y}_i = \theta_0 + \sum_{j=1}^{d} x_{ij}\,\theta_j = \theta_0 + x_{i1}\theta_1 + x_{i2}\theta_2 + \dots + x_{id}\theta_d$$

[Figure: fitted line over $(x, y)$ data; the bias $\theta_0$ is the intercept]
Linear Prediction

Example: predicting the temperature of a building from $x_1$ outside temperature (weight $\theta_1$), $x_2$ level of humidity ($\theta_2$), $x_3$ number of people ($\theta_3$), and $x_4$ sun exposure ($\theta_4$).


Linear Prediction

$$\begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix} = \theta_0 + \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ x_{21} & \cdots & x_{2d} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nd} \end{pmatrix} \cdot \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_d \end{pmatrix}$$

Folding the bias into the weights by prepending a constant 1 to each input:

$$\begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1d} \\ 1 & x_{21} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nd} \end{pmatrix} \cdot \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{pmatrix} \;\Rightarrow\; \hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\theta}$$


Linear Prediction

$$\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\theta}$$

$\hat{\mathbf{y}}$: the predictions; $\mathbf{X}$: the input features (one row per sample, each sample has $d$ features plus the leading 1); $\boldsymbol{\theta}$: the model parameters ($d$ weights and 1 bias).


Linear Prediction

Example: temperature of the building.

$$\begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \end{pmatrix} = \begin{pmatrix} 1 & 25 & 50 & 2 & 50 \\ 1 & -10 & 50 & 0 & 10 \end{pmatrix} \cdot \begin{pmatrix} 0.2 \\ 0.64 \\ 0 \\ 1 \\ 0.14 \end{pmatrix}$$
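The same computation as a NumPy sketch; the feature values and weights are as reconstructed above from the slide:

```python
import numpy as np

# Each row: [1 (bias), outside temp, humidity, #people, sun exposure]
X = np.array([[1.0,  25.0, 50.0, 2.0, 50.0],
              [1.0, -10.0, 50.0, 0.0, 10.0]])
theta = np.array([0.2, 0.64, 0.0, 1.0, 0.14])  # [bias theta_0, theta_1..theta_4]

y_hat = X @ theta   # predicted building temperatures
print(y_hat)        # [25.2, -4.8]
```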
How do we obtain the model?
How to Obtain the Model?

[Diagram: data points $\mathbf{X}$ and model parameters $\boldsymbol{\theta}$ produce the estimation $\hat{y}$; the loss function compares $\hat{y}$ against the ground-truth labels $y$, and optimization updates $\boldsymbol{\theta}$]
How to Obtain the Model?

• Loss function: measures how good my estimation is (how good my model is) and tells the optimization method how to make it better.

• Optimization: changes the model in order to decrease the loss (i.e., to improve my estimation).


Linear Regression: Loss Function

[Figure: predicted vs. measured temperature of the building; the loss penalizes the deviation between prediction and ground truth]


Linear Regression: Loss Function

$$J(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2$$

We minimize $J$; it is also called the objective function, cost function, or energy.
Optimization: Linear Least Squares

• Linear least squares: an approach to fit a linear model to the data

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2$$

• This is a convex problem; a unique closed-form solution exists.


Optimization: Linear Least Squares

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \mathbf{x}_i \boldsymbol{\theta} - y_i \right)^2$$

with $n$ training samples and $n$ labels; each input vector $\mathbf{x}_i$ has size $d$, and the estimation $\hat{y}_i = \mathbf{x}_i \boldsymbol{\theta}$ comes from the linear model. In matrix notation:

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$$

More on matrix notation in the next exercise session.


Optimization: Linear Least Squares

Since $J(\boldsymbol{\theta}) = (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$ is convex, the optimum is found by setting the derivative to zero:

$$\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 0$$
Optimization

(Details in the exercise session!)

$$\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 2\mathbf{X}^T \mathbf{X} \boldsymbol{\theta} - 2\mathbf{X}^T \mathbf{y} = 0$$

$$\boldsymbol{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$

We have found an analytical solution to a convex problem. Inputs: outside temperature, number of people, etc.; true output: temperature of the building.
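A sketch of this closed-form solution in NumPy (`fit_least_squares` is an illustrative name; `np.linalg.solve` avoids forming an explicit inverse, and `np.linalg.lstsq` is the more robust library route):

```python
import numpy as np

def fit_least_squares(X, y):
    """Normal equation: theta = (X^T X)^{-1} X^T y, solved without an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# More numerically robust alternative from NumPy itself:
# theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
```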
Is this the best estimate?

• Least squares estimate:

$$J(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2$$


Maximum Likelihood



Maximum Likelihood Estimate

$p_{\text{data}}(\mathbf{y}|\mathbf{X})$ — the true underlying distribution

$p_{\text{model}}(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})$ — a parametric family of distributions, controlled by the parameter(s) $\boldsymbol{\theta}$


Maximum Likelihood Estimate

• A method of estimating the parameters of a statistical model given observations (drawn from $p_{\text{data}}(\mathbf{y}|\mathbf{X})$), by finding the parameter values that maximize the likelihood of making those observations:

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \; p_{\text{model}}(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})$$


Maximum Likelihood Estimate

• MLE assumes that the training samples are independent and generated by the same probability distribution (the "i.i.d." assumption):

$$p_{\text{model}}(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta}) = \prod_{i=1}^{n} p_{\text{model}}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$$


Maximum Likelihood Estimate

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \prod_{i=1}^{n} p_{\text{model}}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$$

Taking the logarithm turns the product into a sum, using the property $\log(ab) = \log a + \log b$:

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{\text{model}}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$$


Back to Linear Regression

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{\text{model}}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$$

What shape does our probability distribution $p(y_i|\mathbf{x}_i, \boldsymbol{\theta})$ have?


Back to Linear Regression

Assume a Gaussian (normal) distribution: the target is the linear prediction plus Gaussian noise,

$$y_i = \mathbf{x}_i\boldsymbol{\theta} + \mathcal{N}(0, \sigma^2), \qquad \text{i.e.,} \quad y_i \sim \mathcal{N}(\mathbf{x}_i\boldsymbol{\theta}, \sigma^2),$$

with the linear model as the mean. For a Gaussian $y_i \sim \mathcal{N}(\mu, \sigma^2)$:

$$p(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2}$$

Substituting $\mu = \mathbf{x}_i\boldsymbol{\theta}$:

$$p(y_i|\mathbf{x}_i, \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}$$
Back to Linear Regression

Plugging this density into the original optimization problem $\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{\text{model}}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$ gives

$$\sum_{i=1}^{n} \log\left[(2\pi\sigma^2)^{-\frac{1}{2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}\right]$$

Canceling $\log$ and $e$:

$$\sum_{i=1}^{n} -\frac{1}{2}\log(2\pi\sigma^2) \;+\; \sum_{i=1}^{n} -\frac{1}{2\sigma^2}\,(y_i - \mathbf{x}_i\boldsymbol{\theta})^2$$

In matrix notation:

$$-\frac{n}{2}\log(2\pi\sigma^2) \;-\; \frac{1}{2\sigma^2}\,(\mathbf{y} - \mathbf{X}\boldsymbol{\theta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\theta})$$
Back to Linear Regression

How can we find the estimate of $\boldsymbol{\theta}$? As before, set the derivative to zero (details in the exercise session!):

$$\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 0 \quad\Rightarrow\quad \boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
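As a quick numerical sanity check of this equivalence — a sketch assuming NumPy and SciPy are available, with made-up data and noise scale — minimizing the negative Gaussian log-likelihood recovers the same $\boldsymbol{\theta}$ as the closed-form least squares solution:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # bias + 2 features
theta_true = np.array([1.0, 2.0, -0.5])
y = X @ theta_true + rng.normal(scale=0.1, size=100)            # Gaussian noise

# Least squares (closed form)
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Maximum likelihood: minimize the negative Gaussian log-likelihood.
# With sigma fixed, constant terms drop out and only the squared error remains.
nll = lambda th: np.sum((y - X @ th) ** 2)
theta_ml = minimize(nll, x0=np.zeros(3)).x

print(np.allclose(theta_ls, theta_ml, atol=1e-4))  # True
```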


Linear Regression

• The Maximum Likelihood Estimate (MLE) corresponds to the least squares estimate (given the assumptions above).

• We introduced the concepts of loss function and optimization to obtain the best model for regression.


Image Classification



Regression vs Classification

• Regression: predict a continuous output value (e.g., temperature of a room)

• Classification: predict a discrete value
  – Binary classification: output is either 0 or 1
  – Multi-class classification: output is one of N classes


Logistic Regression

[Figure: a CAT classifier]


Sigmoid for Binary Predictions

[Diagram: inputs $x_0, x_1, x_2$ with weights $\theta_0, \theta_1, \theta_2$ feed a weighted sum $\Sigma$, followed by the sigmoid]

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The sigmoid squashes the output to $(0, 1)$, so it can be interpreted as a probability:

$$\hat{y}_i = p(y_i = 1|\mathbf{x}_i, \boldsymbol{\theta})$$
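The forward pass of this unit, as a NumPy sketch (the sample values are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Squash a weighted sum into (0, 1) so it reads as a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# p(y=1 | x, theta) for one sample x (leading 1 carries the bias theta_0)
x = np.array([1.0, 0.5, -1.2])
theta = np.array([0.1, 0.8, 0.3])
p_cat = sigmoid(x @ theta)   # e.g., probability that the image is a cat
```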


Spoiler Alert: 1-Layer Neural Network

The same diagram is a single-layer neural network: a weighted sum of the inputs followed by a sigmoid nonlinearity.


Logistic Regression: Max. Likelihood

• Probability of a binary output (with $\hat{y}_i = p(y_i = 1|\mathbf{x}_i, \boldsymbol{\theta})$):

$$p(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta}) = \prod_{i=1}^{n} \hat{y}_i^{\,y_i} \left(1 - \hat{y}_i\right)^{(1 - y_i)}$$

• Maximum Likelihood Estimate:

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \log p(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})$$


Logistic Regression: Loss Function

$$\log p(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta}) = \sum_{i=1}^{n} \log\left[ \hat{y}_i^{\,y_i} \left(1 - \hat{y}_i\right)^{(1 - y_i)} \right] = \sum_{i=1}^{n} y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i)$$
Logistic Regression: Loss Function

$$\mathcal{L}(\hat{y}_i, y_i) = -\left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

Referred to as the binary cross-entropy (BCE) loss.

• Related to the multi-class loss you will see in this course (also called softmax loss).


Logistic Regression: Optimization

• Loss for each training sample, with $\hat{y}_i = \sigma(\mathbf{x}_i \boldsymbol{\theta})$:

$$\mathcal{L}(\hat{y}_i, y_i) = -\left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

• Overall loss to minimize:

$$C(\boldsymbol{\theta}) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}(\hat{y}_i, y_i) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
Logistic Regression: Optimization

• No closed-form solution

• Make use of an iterative method → gradient descent (covered later on!)
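A minimal gradient-descent sketch for this loss (learning rate, iteration count, and the function name are illustrative assumptions; the gradient of the mean BCE loss with respect to $\boldsymbol{\theta}$ works out to $\frac{1}{n}\mathbf{X}^T(\hat{\mathbf{y}} - \mathbf{y})$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Minimize binary cross-entropy by gradient descent.
    X is assumed to contain a leading column of 1s for the bias."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        y_hat = sigmoid(X @ theta)          # predictions in (0, 1)
        grad = X.T @ (y_hat - y) / len(y)   # gradient of the BCE loss
        theta -= lr * grad                  # gradient descent step
    return theta
```

Each step moves $\boldsymbol{\theta}$ against the gradient of the loss; the method itself is the subject of a later lecture.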


Insights from the First Lecture

• We can learn from experience → intelligence, a certain ability to infer the future!

• Even linear models are often pretty good for complex phenomena, e.g., weather:
  – a linear combination of time of day, day of year, etc. is often pretty good


Next Lectures

• Next exercise session: Math Recap II

• Next lecture (Lecture 3): jumping towards our first neural networks and computational graphs


References for Further Reading

• Cross-validation:
  – https://medium.com/@zstern/k-fold-cross-validation-explained-5aeba90ebb3
  – https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6

• General machine learning book:
  – Pattern Recognition and Machine Learning. C. Bishop.
