
Artificial Intelligence 2

CI 2 - S3

Pr. Hamza Alami

Academic year: 2024/2025


Outline
1. Recap

2. Backpropagation algorithm

3. Loss functions



Recap
• What is the difference between AI, ML, and DL?

• What is a perceptron, MLP, universal approximator?

• What is the difference between Heaviside and sigmoid functions?

• What is the algorithm used to optimize NNs?



Outline
1. Recap

2. Backpropagation algorithm

3. Loss functions



Backpropagation Algorithm
• Let’s consider the following network:

[Diagram: a chain of single-neuron layers, X → z^(0), a^(0) → z^(1), a^(1) → … → z^(L), a^(L) → loss_function, where each layer applies a weight w^(l), a bias b^(l), and the sigmoid σ.]

$$\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2, \qquad \hat{y} = a^{(L)}$$
Backpropagation Algorithm
• We have the following:

$$z^{(l)} = w^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = \sigma\left(z^{(l)}\right), \qquad \delta^{(l)} = \frac{\partial \mathbb{L}}{\partial z^{(l)}}$$

$$\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2, \qquad \hat{y} = a^{(L)}$$

• And:

$$\frac{\partial \mathbb{L}}{\partial w^{(l)}} = \frac{\partial \mathbb{L}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial w^{(l)}} = \delta^{(l)} a^{(l-1)}$$

• And:

$$\frac{\partial \mathbb{L}}{\partial b^{(l)}} = \frac{\partial \mathbb{L}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial b^{(l)}} = \delta^{(l)}$$
Backpropagation Algorithm
• In the case of the last layer:
$$\frac{\partial \mathbb{L}}{\partial w^{(L)}} = \frac{\partial \mathbb{L}}{\partial z^{(L)}} \frac{\partial z^{(L)}}{\partial w^{(L)}} = \delta^{(L)} a^{(L-1)}$$

$$\delta^{(L)} = \frac{\partial \mathbb{L}}{\partial a^{(L)}} \frac{\partial a^{(L)}}{\partial z^{(L)}} = (\hat{y} - \bar{y})\, a^{(L)} \left(1 - a^{(L)}\right)$$

$$\frac{\partial \mathbb{L}}{\partial b^{(L)}} = \frac{\partial \mathbb{L}}{\partial z^{(L)}} \frac{\partial z^{(L)}}{\partial b^{(L)}} = \delta^{(L)}$$

with $\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2$ and $\hat{y} = a^{(L)}$.
Backpropagation Algorithm
• In the case of an arbitrary layer:
$$\frac{\partial \mathbb{L}}{\partial w^{(l)}} = \frac{\partial \mathbb{L}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial w^{(l)}} = \delta^{(l)} a^{(l-1)}$$

$$\delta^{(l)} = \frac{\partial \mathbb{L}}{\partial z^{(l+1)}} \frac{\partial z^{(l+1)}}{\partial a^{(l)}} \frac{\partial a^{(l)}}{\partial z^{(l)}} = \delta^{(l+1)} w^{(l+1)} a^{(l)} \left(1 - a^{(l)}\right)$$

$$\frac{\partial \mathbb{L}}{\partial b^{(l)}} = \frac{\partial \mathbb{L}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial b^{(l)}} = \delta^{(l)}$$

with $\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2$ and $\hat{y} = a^{(L)}$.
Backpropagation Algorithm
• Now let's consider a network with arbitrary numbers of inputs and hidden layers. Since a layer now has multiple neurons, we have one auxiliary variable per neuron:

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = \sigma\left(z^{(l)}\right)$$

$$\delta_j^{(l)} = \frac{\partial \mathbb{L}}{\partial z_j^{(l)}}, \qquad \forall j \in \{0 \ldots \text{number of neurons in layer } l\}$$

$$\mathbb{L} = \text{loss\_function} = \frac{1}{2}\left\lVert a^{L} - \bar{y}\right\rVert^2$$
Backpropagation Algorithm
• Auxiliary variable in the case of the last layer

$$\delta_j^{(L)} = \frac{\partial \mathbb{L}}{\partial a_j^{(L)}} \frac{\partial a_j^{(L)}}{\partial z_j^{(L)}} = \left(a_j^{(L)} - \bar{y}_j\right) a_j^{(L)} \left(1 - a_j^{(L)}\right)$$

In vector form:

$$\delta^{(L)} = \left(a^{(L)} - \bar{y}\right) \odot a^{(L)} \odot \left(1 - a^{(L)}\right)$$

with $\mathbb{L} = \text{loss\_function} = \frac{1}{2}\left\lVert a^{L} - \bar{y}\right\rVert^2$.
Backpropagation Algorithm
• Auxiliary variable in the case of an arbitrary layer. Since $z_j^{(l)}$ affects the loss through all the neurons $z_0^{(l+1)}, z_1^{(l+1)}, z_2^{(l+1)}, \ldots$ of the next layer:

$$\delta_j^{(l)} = \frac{\partial \mathbb{L}}{\partial z_j^{(l)}} = \sum_k \frac{\partial \mathbb{L}}{\partial z_k^{(l+1)}} \frac{\partial z_k^{(l+1)}}{\partial a_j^{(l)}} \frac{\partial a_j^{(l)}}{\partial z_j^{(l)}} = \sum_k \delta_k^{(l+1)} w_{kj}^{(l+1)} a_j^{(l)} \left(1 - a_j^{(l)}\right)$$

In vector form:

$$\delta^{(l)} = \left(\left(W^{(l+1)}\right)^T \delta^{(l+1)}\right) \odot a^{(l)} \odot \left(1 - a^{(l)}\right)$$

with $\mathbb{L} = \text{loss\_function} = \frac{1}{2}\left\lVert a^{L} - \bar{y}\right\rVert^2$.
Backpropagation Algorithm
• Let’s consider the following network:

[Diagram: inputs $X_1$ and $X_2$ feed a hidden layer of two sigmoid neurons ($z_0^{(0)}, a_0^{(0)}$ and $z_1^{(0)}, a_1^{(0)}$, with weights $w_{00}^{(0)}, w_{01}^{(0)}, w_{10}^{(0)}, w_{11}^{(0)}$ and biases $b_0^{(0)}, b_1^{(0)}$), which feeds a single sigmoid output neuron ($z_0^{(1)}, a_0^{(1)}$, with weights $w_{00}^{(1)}, w_{10}^{(1)}$ and bias $b_0^{(1)}$) connected to the loss function.]

$$\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2, \qquad \hat{y} = a_0^{(1)}$$
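To make the formulas above concrete, here is a minimal NumPy sketch of one forward and backward pass through a 2-2-1 sigmoid network like the one in this example; the random initialization, the training example, and the learning rate are illustrative assumptions rather than values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W0, b0 = rng.normal(size=(2, 2)), np.zeros(2)   # hidden layer: 2 inputs -> 2 neurons
W1, b1 = rng.normal(size=(1, 2)), np.zeros(1)   # output layer: 2 hidden -> 1 neuron

x = np.array([0.5, -1.2])        # one training example (X1, X2), illustrative values
y_bar = np.array([1.0])          # expected output

# Forward pass: z^(l) = W^(l) a^(l-1) + b^(l),  a^(l) = sigmoid(z^(l))
z0 = W0 @ x + b0;  a0 = sigmoid(z0)
z1 = W1 @ a0 + b1; a1 = sigmoid(z1)
loss = 0.5 * np.sum((a1 - y_bar) ** 2)

# Backward pass, using the auxiliary variables delta^(l) derived above:
# delta^(L) = (a^(L) - y_bar) ⊙ a^(L) ⊙ (1 - a^(L))
# delta^(l) = (W^(l+1)^T delta^(l+1)) ⊙ a^(l) ⊙ (1 - a^(l))
delta1 = (a1 - y_bar) * a1 * (1 - a1)
delta0 = (W1.T @ delta1) * a0 * (1 - a0)

# Gradients: dL/dW^(l) = delta^(l) a^(l-1)^T,  dL/db^(l) = delta^(l)
dW1, db1 = np.outer(delta1, a0), delta1
dW0, db0 = np.outer(delta0, x), delta0

# One gradient-descent step (learning rate chosen arbitrarily)
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W0 -= lr * dW0; b0 -= lr * db0
print(f"loss before update: {loss:.4f}")
```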
Outline
1. Recap

2. Backpropagation algorithm

3. Loss functions



So far so good
• Neural networks (NNs) are essentially function approximators

• In the case of supervised learning, NNs model the decision boundaries between classes

• Training NNs is an iterative process which aims to find the weights that correctly predict the training outputs

• The loss function measures the deviation of the NN outputs from the desired outputs

Loss function
• The loss function can be viewed as a surface in a high dimensional space

• A loss function can be described by the equation:

$$\mathbb{L}(w, b) = \sum_{i=0}^{n-1} \mathbb{L}\left(f\left(x^{(i)} \mid w, b\right), \bar{y}^{(i)}\right) = \sum_{i=0}^{n-1} \mathbb{L}\left(y^{(i)}, \bar{y}^{(i)} \mid w, b\right)$$
Loss function

[Figure: visualization of a loss surface.]
Loss function and global minima
• The global minimum of the loss function using the training data is not
actually what provides the best generalization to the test data

• A study suggests that global minima are in practice irrelevant, as they often lead to overfitting:
“Choromanska, Anna, et al. "The loss surfaces of multilayer
networks." Artificial intelligence and statistics. PMLR, 2015.”

Regression loss
• Used to solve regression problems

• The squared L2 norm of the difference between the model predictions and the expected outputs (MSE)

• Other common regression losses: RMSE, MAE

$$\mathbb{L} = \frac{1}{2}\left\lVert a^{L} - \bar{y}\right\rVert^2$$
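As a quick illustration (the prediction and target values are my own example, not from the slides), these regression losses can be computed as follows:

```python
import numpy as np

a = np.array([2.5, 0.0, 2.1, 7.8])       # model predictions (illustrative)
y_bar = np.array([3.0, -0.5, 2.0, 7.5])  # expected outputs (illustrative)

mse = 0.5 * np.sum((a - y_bar) ** 2)      # the slide's 1/2 * ||a - y_bar||^2
rmse = np.sqrt(np.mean((a - y_bar) ** 2)) # root mean squared error
mae = np.mean(np.abs(a - y_bar))          # mean absolute error
print(mse, rmse, mae)
```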
Cross entropy loss

[Figure: an NN outputs predicted probabilities p(cat), p(dog), p(airplane), p(automobile); the expected probabilities are 1, 0, 0, 0.]

How do we effectively and quantitatively estimate the deviation of the predicted classes from the expected classes?
Cross entropy loss

[Figure: predicted probabilities compared against the expected classes, contrasting good predictions with bad predictions.]
Cross entropy loss
$$\text{cross\_entropy} = -\sum_i p_{\text{expected}}(i) \log\left(p_{\text{predicted}}(i)\right)$$

• If $p_{\text{expected}}(i)$ is close to 1 and $p_{\text{predicted}}(i)$ is close to 1, the CE is close to 0

• If $p_{\text{expected}}(i)$ is close to 1 and $p_{\text{predicted}}(i)$ is close to 0, then $\log\left(p_{\text{predicted}}(i)\right)$ will be close to $-\infty$ and the CE will be very high

• If $p_{\text{expected}}(i)$ is close to 0, then it will not contribute to the CE
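A minimal sketch of this formula; the probability vectors below are illustrative values of mine, not from the slides:

```python
import numpy as np

def cross_entropy(p_expected, p_predicted, eps=1e-12):
    # eps avoids log(0) when a predicted probability is exactly zero
    return -np.sum(p_expected * np.log(p_predicted + eps))

p_expected = np.array([1.0, 0.0, 0.0, 0.0])   # cat, dog, airplane, automobile
good = np.array([0.90, 0.05, 0.04, 0.01])
bad = np.array([0.01, 0.90, 0.05, 0.04])
print(cross_entropy(p_expected, good))   # small loss, ~0.105
print(cross_entropy(p_expected, bad))    # large loss, ~4.6
```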
Binary Cross entropy loss
• When the number of classes is 2, considering $p_0$ the predicted probability of class 0, then $p_1 = 1 - p_0$

• Thus the binary cross entropy is defined as:

$$\text{BCE} = -p_{\text{expected}}(0) \log\left(p_0\right) - \left(1 - p_{\text{expected}}(0)\right) \log\left(1 - p_0\right)$$

• Writing $y$ for the predicted probability and $\bar{y}$ for the expected one, the BCE is minimized exactly when the prediction matches the expectation, although the minimum value is not necessarily 0:

$$-\frac{\partial \mathbb{L}}{\partial y} = \bar{y}\,\frac{1}{y} - (1 - \bar{y})\,\frac{1}{1 - y} = \frac{\bar{y}}{y} - \frac{1 - \bar{y}}{1 - y} = 0 \implies y = \bar{y}, \qquad -0.25\log(0.25) - 0.75\log(0.75) = 0.56 \neq 0$$
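A small numerical check of the two facts above; the grid of candidate predictions is my own choice, and natural logarithms are assumed:

```python
import numpy as np

def bce(y, y_bar):
    # binary cross-entropy: -y_bar*log(y) - (1 - y_bar)*log(1 - y)
    return -y_bar * np.log(y) - (1 - y_bar) * np.log(1 - y)

y_bar = 0.25
ys = np.linspace(0.001, 0.999, 999)
best_y = ys[np.argmin(bce(ys, y_bar))]
print(best_y)             # ~0.25, i.e. the minimum is at y = y_bar
print(bce(0.25, 0.25))    # ~0.56, so the minimum value is not 0
```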
Softmax cross entropy loss
• Considering the previous example with classes cat, dog, airplane, automobile, a score vector may be [9, 10, 0.1, -3]

• The scores are unbounded; they can be any real number

• NNs behave better when the loss function involves a bounded set of numbers in the same range

• The Softmax converts unbounded scores into probabilities
Softmax cross entropy loss
• Given the score vector $s = [s_1, s_2, \ldots, s_{N-1}]$, the corresponding Softmax vector is:

$$\text{softmax}(s) = \left[\frac{e^{s_1}}{\sum_k e^{s_k}}, \frac{e^{s_2}}{\sum_k e^{s_k}}, \ldots, \frac{e^{s_{N-1}}}{\sum_k e^{s_k}}\right]$$

• The sum of the Softmax vector is 1

• An element of the Softmax vector represents the probability of a class
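A minimal sketch of the Softmax applied to the example score vector above; subtracting the maximum score is a standard numerical-stability trick, not something stated on the slide:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))   # subtracting the max avoids overflow
    return e / np.sum(e)

p = softmax(np.array([9.0, 10.0, 0.1, -3.0]))
print(p)            # probabilities, dominated by the two largest scores
print(p.sum())      # 1.0
```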
Softmax cross entropy loss
• Why Softmax?

• The Softmax is a smooth (differentiable) approximation of the one-hot argmax ($\text{argmax}_{\text{onehot}}$):

$$p = [9.99, 10] \implies \text{argmax}_{\text{onehot}}(p) = [0, 1], \qquad q = [10, 9.99] \implies \text{argmax}_{\text{onehot}}(q) = [1, 0] \quad \text{(far from each other)}$$

$$\text{softmax}(p) = [0.4975, 0.5025], \qquad \text{softmax}(q) = [0.5025, 0.4975] \quad \text{(close to each other)}$$
Softmax cross entropy loss
• The Softmax outputs probabilities, so it is used in the last layer

• The Softmax probabilities are then used to compute the cross entropy loss

• Various deep learning libraries combine the Softmax and CE loss in one operation

• That combination tends to be numerically more stable
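As a sketch of why the combined operation is numerically safer, here is a generic log-sum-exp formulation; it is not any particular library's API, and the score vector is the illustrative example from the earlier slide:

```python
import numpy as np

def softmax_cross_entropy(scores, expected_class):
    # log of the Softmax computed via the log-sum-exp trick, so tiny
    # probabilities are never formed explicitly
    shifted = scores - np.max(scores)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[expected_class]        # CE against a one-hot target

scores = np.array([9.0, 10.0, 0.1, -3.0])
print(softmax_cross_entropy(scores, expected_class=0))   # ~1.31 when the true class is "cat"
```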
Softmax cross entropy loss

[Figure: the Softmax and the cross entropy loss combined in the last layer.]
Focal loss
• Not all training data are equally important, and we need to use the available data wisely

• The idea behind the focal loss is to focus more on the examples on which the model is not doing well

• The focal loss can achieve better performance in the case of imbalanced data
Focal loss
• Let's consider the binary CE:

$$\mathbb{L}(y, \bar{y}) = \begin{cases} -\log(y) & \text{if the expected class is } 1 \\ -\log(1 - y) & \text{if the expected class is } 0 \end{cases}$$

• The focal loss is defined as:

$$\mathbb{L}(y, \bar{y}) = \begin{cases} -(1 - y)^{\gamma} \log(y) & \text{if the expected class is } 1 \\ -y^{\gamma} \log(1 - y) & \text{if the expected class is } 0 \end{cases}$$
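A minimal sketch comparing the two losses on an easy and a hard example; the probabilities and the choice γ = 2 are illustrative assumptions of mine, not values from the slides:

```python
import numpy as np

def binary_ce(y, expected):
    return -np.log(y) if expected == 1 else -np.log(1 - y)

def focal_loss(y, expected, gamma=2.0):
    if expected == 1:
        return -((1 - y) ** gamma) * np.log(y)
    return -(y ** gamma) * np.log(1 - y)

easy, hard = 0.9, 0.1   # predicted probability of class 1, when the expected class is 1
print(binary_ce(easy, 1), focal_loss(easy, 1))   # 0.105 vs ~0.001: easy example strongly down-weighted
print(binary_ce(hard, 1), focal_loss(hard, 1))   # 2.303 vs ~1.865: hard example keeps most of its loss
```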
Focal loss

[Figure: focal loss illustration.]
Hinge loss
• A hinge loss function increases if the goodness criterion is not satisfied and becomes 0 if the criterion is satisfied

"If you are not my friend, the distance between us can vary from small to large. But I don't distinguish between friends. All my friends are at a distance 0 from me."

[Image: a hinged door]
Hinge loss
• The hinge loss is defined by the equation:

$$\sum_{j=0,\, j \neq c}^{N-1} \max\left(0,\, y_j - y_c + m\right)$$

where $y_j$ is the score of an incorrect class, $y_c$ is the score of the correct class, and $m$ is the margin

• In the case of a good output: $y_j < y_c \implies \max\left(0,\, y_j - y_c\right) = 0$

• In the case of a bad output: $y_j > y_c \implies \max\left(0,\, y_j - y_c\right) = y_j - y_c$
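A minimal sketch of this multiclass hinge loss; the score vector and the margin m = 1 are illustrative assumptions:

```python
import numpy as np

def hinge_loss(scores, correct_class, m=1.0):
    # sum over j != c of max(0, y_j - y_c + m)
    margins = scores - scores[correct_class] + m
    margins[correct_class] = 0.0            # the correct class is excluded from the sum
    return np.sum(np.maximum(0.0, margins))

scores = np.array([9.0, 10.0, 0.1, -3.0])
print(hinge_loss(scores, correct_class=1))  # correct class scores highest: loss 0
print(hinge_loss(scores, correct_class=0))  # another class scores higher: loss 2.0
```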
Hinge loss

[Figure: hinge loss illustration.]
