
Backpropagation

Fatemeh Seyyedsalehi

Sharif University of Technology

Spring 2025



Constructing the MLP

MLPs are capable of representing any function


But how do we construct it?
▶ I.e., how do we determine the weights (and biases) of the network to best represent a target function?
▶ Assuming that the architecture of the network is given
By minimizing the expected error:

  W^* = \arg\min_W \int_X e(f(X; W), t(X)) \, p(X) \, dX = \arg\min_W \mathbb{E}\left[ e(f(X; W), t(X)) \right]
Estimating the True Function

The true function t(x) is unknown, so sample it


▶ Basically, get input-output pairs for a number of input samples
▶ i.e., prepare the training dataset
Estimate the function from the samples
The empirical estimate of the expected error is the average error over the samples:

  \mathbb{E}\left[ e(f(X; W), t(X)) \right] \approx \frac{1}{T} \sum_{i=1}^{T} e(f(X_i; W), y_i)

We can hope that minimizing the empirical loss will minimize the true loss.
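As a concrete sketch (illustrative, not from the slides), the empirical loss is just an average of per-sample errors over the training pairs; a squared-error divergence and a toy linear model are assumed purely for the example:

```python
def empirical_loss(f, W, X, y, e=lambda pred, target: (pred - target) ** 2):
    """Empirical risk: (1/T) * sum_i e(f(X_i; W), y_i).

    f : callable taking (x, W) and returning a prediction
    e : per-sample error; squared error is assumed here only as an example
    """
    T = len(X)
    return sum(e(f(X[i], W), y[i]) for i in range(T)) / T

# Toy linear model f(x; W) = W * x, purely for illustration
print(empirical_loss(lambda x, W: W * x, 2.0, X=[1.0, 2.0], y=[2.1, 3.9]))
```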



Training Models and Loss Functions

We seek parameters that produce the best possible mapping from input to output for the task at hand.
A loss function or cost function returns a single number describing
the mismatch between:
▶ Model predictions f (X ; W )
▶ Ground-truth outputs yi
We now shift perspective to think of neural networks as computing probability distributions Pr(y | W) over the output space.
▶ This leads to a principled approach for building loss functions.
▶ Maximizing the likelihood of the observed data under these distributions.



Example 1: Univariate Regression

The loss function is given by:


  L[W] = -\sum_{i=1}^{N} \log \left[ p(y_i \mid x_i, W) \right]

Considering the conditional probability as a normal distribution, we have:

  \arg\min_W \left[ -\sum_{i=1}^{N} \log \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - f(x_i; W))^2}{2\sigma^2} \right) \right) \right] = \arg\min_W \left[ \sum_{i=1}^{N} (y_i - f(x_i; W))^2 \right]
Least squares!
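A small numerical illustration (my own sketch, not part of the slides): the Gaussian negative log-likelihood and the sum of squared errors differ only by an additive constant and the 1/(2σ²) scale, neither of which depends on W, so both are minimized by the same parameters:

```python
import numpy as np

def gaussian_nll(y, pred, sigma=1.0):
    # -sum_i log N(y_i | pred_i, sigma^2)
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2)
                  + (y - pred) ** 2 / (2 * sigma**2))

def sum_squared_error(y, pred):
    return np.sum((y - pred) ** 2)

y = np.array([1.0, 2.0, 3.0])
pred = np.array([1.1, 1.8, 3.2])
# The NLL differs from the squared error only by a constant offset and scale,
# so minimizing one minimizes the other.
print(gaussian_nll(y, pred), sum_squared_error(y, pred))
```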



Example 2: Binary Classification

The Bernoulli distribution is a suitable probability distribution over the domain of such predictions:

  p(y \mid \lambda) = (1 - \lambda)^{1-y} \cdot \lambda^{y}

The neural network can be trained to predict the parameter λ.


  L[W] = -\sum_{i=1}^{N} \left( (1 - y_i) \log\left[ 1 - f(x_i; W) \right] + y_i \log\left[ f(x_i; W) \right] \right)

Binary cross-entropy loss!
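A minimal sketch of this loss (illustrative only; the clipping constant eps is an added assumption to avoid log(0)):

```python
import numpy as np

def binary_cross_entropy(y, f, eps=1e-12):
    """L[W] = -sum_i ((1 - y_i) log(1 - f_i) + y_i log(f_i)).

    y : array of labels in {0, 1}
    f : array of predicted Bernoulli parameters lambda = f(x_i; W) in (0, 1)
    """
    f = np.clip(f, eps, 1 - eps)  # avoid log(0)
    return -np.sum((1 - y) * np.log(1 - f) + y * np.log(f))

y = np.array([0, 1, 1])
f = np.array([0.2, 0.9, 0.6])
print(binary_cross_entropy(y, f))
```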



Example 3: Multiclass Classification

The categorical distribution is a suitable one for this domain: y ∈ {1, 2, ..., k}
The neural network should predict k parameters λ_1, ..., λ_k ∈ [0, 1] that sum to 1.
Usually we use the Softmax function in this situation:
  \mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}

where the z_j are the outputs of the network.


Multi-class cross-entropy loss!
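A minimal sketch of the softmax and the resulting per-sample loss (illustrative; the max-subtraction for numerical stability is an added detail, not from the slides):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1.
    z = z - np.max(z)
    e = np.exp(z)
    return e / np.sum(e)

def multiclass_cross_entropy(z, y):
    """-log of the predicted probability of the true class y in {0, ..., k-1}."""
    return -np.log(softmax(z)[y])

z = np.array([2.0, 0.5, -1.0])   # network outputs (logits) for k = 3 classes
print(softmax(z), multiclass_cross_entropy(z, y=0))
```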



Our main problem



The learning algorithm

Searching in the hypothesis space


Next: a course on optimization and how to do it in neural networks.
The following slides are selected from the CMU 11-785 Deep Learning course.

