Artificial Intelligence 2
CI 2 - S3
Pr. Hamza Alami
Academic year: 2024/2025
Outline
1. Recap
2. Backpropagation algorithm
3. Loss functions
Recap
• What is the difference between AI, ML, and DL?
• What is a perceptron, MLP, universal approximator?
• What is the difference between Heaviside and sigmoid functions?
• What is the algorithm used to optimize NNs?
Outline
1. Recap
2. Backpropagation algorithm
3. Loss functions
Backpropagation Algorithm
• Let's consider the following network:

[Figure: a chain of single-neuron layers. The input X feeds the first layer; each layer l has a weight w^(l), a bias b^(l), and a sigmoid activation, computing z^(l) = w^(l) a^(l-1) + b^(l) and a^(l) = σ(z^(l)); the last activation a^(L) is the prediction ŷ fed to the loss function.]

$\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2, \qquad \hat{y} = a^{(L)}$
Backpropagation Algorithm
• We have the following:

$z^{(l)} = w^{(l)} a^{(l-1)} + b^{(l)}$
$a^{(l)} = \sigma\left(z^{(l)}\right)$
$\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2, \qquad \hat{y} = a^{(L)}$
$\delta^{(l)} = \frac{\partial \mathbb{L}}{\partial z^{(l)}}$

• And:

$\frac{\partial \mathbb{L}}{\partial w^{(l)}} = \frac{\partial \mathbb{L}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial w^{(l)}} = \delta^{(l)} \frac{\partial z^{(l)}}{\partial w^{(l)}} = \delta^{(l)} a^{(l-1)}$

• And:

$\frac{\partial \mathbb{L}}{\partial b^{(l)}} = \frac{\partial \mathbb{L}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial b^{(l)}} = \delta^{(l)} \frac{\partial z^{(l)}}{\partial b^{(l)}} = \delta^{(l)}$
Backpropagation Algorithm
• In the case of the last layer:

$\frac{\partial \mathbb{L}}{\partial w^{(L)}} = \frac{\partial \mathbb{L}}{\partial z^{(L)}} \frac{\partial z^{(L)}}{\partial w^{(L)}} = \delta^{(L)} \frac{\partial z^{(L)}}{\partial w^{(L)}} = \delta^{(L)} a^{(L-1)}$

$\delta^{(L)} = \frac{\partial \mathbb{L}}{\partial a^{(L)}} \frac{\partial a^{(L)}}{\partial z^{(L)}} = (\hat{y} - \bar{y})\, a^{(L)} \left(1 - a^{(L)}\right)$

$\frac{\partial \mathbb{L}}{\partial b^{(L)}} = \frac{\partial \mathbb{L}}{\partial z^{(L)}} \frac{\partial z^{(L)}}{\partial b^{(L)}} = \delta^{(L)}$

with $\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2$ and $\hat{y} = a^{(L)}$.
Backpropagation Algorithm
• In the case of an arbitrary layer:

$\frac{\partial \mathbb{L}}{\partial w^{(l)}} = \frac{\partial \mathbb{L}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial w^{(l)}} = \delta^{(l)} \frac{\partial z^{(l)}}{\partial w^{(l)}} = \delta^{(l)} a^{(l-1)}$

$\delta^{(l)} = \frac{\partial \mathbb{L}}{\partial z^{(l+1)}} \frac{\partial z^{(l+1)}}{\partial a^{(l)}} \frac{\partial a^{(l)}}{\partial z^{(l)}} = \delta^{(l+1)} w^{(l+1)}\, a^{(l)} \left(1 - a^{(l)}\right)$

$\frac{\partial \mathbb{L}}{\partial b^{(l)}} = \frac{\partial \mathbb{L}}{\partial z^{(l)}} \frac{\partial z^{(l)}}{\partial b^{(l)}} = \delta^{(l)}$

with $\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2$ and $\hat{y} = a^{(L)}$.
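To make the recurrences concrete, here is a minimal NumPy sketch for a chain of single-neuron layers, following the formulas above (sigmoid activations, squared-error loss). The depth, the weight and bias values, and the sample (x, ȳ) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Chain network with one neuron per layer, as in the slides:
# z^(l) = w^(l) a^(l-1) + b^(l),  a^(l) = sigmoid(z^(l)),  L = 1/2 (y_hat - y_bar)^2
w = [0.5, -0.3, 0.8]          # illustrative weights w^(0), w^(1), w^(2)
b = [0.1, 0.2, -0.1]          # illustrative biases
x, y_bar = 1.5, 1.0           # one training sample

# Forward pass: store activations for reuse in the backward pass
a = [x]
for wl, bl in zip(w, b):
    a.append(sigmoid(wl * a[-1] + bl))
y_hat = a[-1]

# Backward pass using the slide recurrences
L = len(w)
delta = [0.0] * L
delta[L - 1] = (y_hat - y_bar) * a[L] * (1 - a[L])            # delta^(L)
for l in range(L - 2, -1, -1):
    delta[l] = delta[l + 1] * w[l + 1] * a[l + 1] * (1 - a[l + 1])

grad_w = [delta[l] * a[l] for l in range(L)]                   # dL/dw^(l) = delta^(l) a^(l-1)
grad_b = delta[:]                                              # dL/db^(l) = delta^(l)
print(grad_w, grad_b)
```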
Backpropagation Algorithm
• Now let's consider arbitrary inputs and hidden layers:

$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$
$a^{(l)} = \sigma\left(z^{(l)}\right)$
$\mathbb{L} = \text{loss\_function} = \frac{1}{2}\left\lVert a^{(L)} - \bar{y}\right\rVert^2$

• Now that a layer has multiple neurons, we have one auxiliary variable per neuron:

$\delta_j^{(l)} = \frac{\partial \mathbb{L}}{\partial z_j^{(l)}}, \qquad \forall j \in \{0, \dots, \text{number of neurons in layer } l\}$
Backpropagation Algorithm
• Auxiliary variable in the case of the last layer:

$\delta_j^{(L)} = \frac{\partial \mathbb{L}}{\partial a_j^{(L)}} \frac{\partial a_j^{(L)}}{\partial z_j^{(L)}} = \left(a_j^{(L)} - \bar{y}_j\right) a_j^{(L)} \left(1 - a_j^{(L)}\right)$

• In vector form:

$\delta^{(L)} = \left(a^{(L)} - \bar{y}\right) \odot a^{(L)} \odot \left(1 - a^{(L)}\right)$

with $\mathbb{L} = \text{loss\_function} = \frac{1}{2}\left\lVert a^{(L)} - \bar{y}\right\rVert^2$.
Backpropagation Algorithm
• Auxiliary variable in the case of an arbitrary layer (each $a_j^{(l)}$ affects all pre-activations $z_0^{(l+1)}, z_1^{(l+1)}, z_2^{(l+1)}, \dots$ of the next layer):

$\delta_j^{(l)} = \frac{\partial \mathbb{L}}{\partial z_j^{(l)}} = \sum_k \frac{\partial \mathbb{L}}{\partial z_k^{(l+1)}} \frac{\partial z_k^{(l+1)}}{\partial a_j^{(l)}} \frac{\partial a_j^{(l)}}{\partial z_j^{(l)}} = \sum_k \delta_k^{(l+1)} w_{kj}^{(l+1)}\, a_j^{(l)} \left(1 - a_j^{(l)}\right)$

• In vector form:

$\delta^{(l)} = \left(\left(W^{(l+1)}\right)^T \delta^{(l+1)}\right) \odot a^{(l)} \odot \left(1 - a^{(l)}\right)$

with $\mathbb{L} = \text{loss\_function} = \frac{1}{2}\left\lVert a^{(L)} - \bar{y}\right\rVert^2$.
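The vectorized equations translate directly into code. The sketch below assumes a small fully connected network with sigmoid activations and the squared-error loss from the slides; the layer sizes, random initialization, and sample values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [2, 2, 1]                      # illustrative: 2 inputs, one hidden layer of 2, 1 output
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]

x = np.array([1.5, -0.5])
y_bar = np.array([1.0])

# Forward pass: z^(l) = W^(l) a^(l-1) + b^(l),  a^(l) = sigmoid(z^(l))
a = [x]
for Wl, bl in zip(W, b):
    a.append(sigmoid(Wl @ a[-1] + bl))

# Backward pass using the vectorized recurrences
L = len(W)
delta = [None] * L
delta[L - 1] = (a[L] - y_bar) * a[L] * (1 - a[L])                  # delta^(L)
for l in range(L - 2, -1, -1):
    delta[l] = (W[l + 1].T @ delta[l + 1]) * a[l + 1] * (1 - a[l + 1])

grad_W = [np.outer(delta[l], a[l]) for l in range(L)]              # dL/dW^(l) = delta^(l) a^(l-1)^T
grad_b = delta                                                     # dL/db^(l) = delta^(l)
```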
Backpropagation Algorithm
• Let's consider the following network:

[Figure: a network with two inputs X1 and X2, one hidden layer of two sigmoid neurons (weights $w_{00}^{(0)}, w_{01}^{(0)}, w_{10}^{(0)}, w_{11}^{(0)}$, biases $b_0^{(0)}, b_1^{(0)}$, outputs $a_0^{(0)}, a_1^{(0)}$), and one sigmoid output neuron (weights $w_{00}^{(1)}, w_{10}^{(1)}$, bias $b_0^{(1)}$, output $a_0^{(1)}$) fed to the loss function.]

$\mathbb{L} = \text{loss\_function} = \frac{1}{2}\sum (\hat{y} - \bar{y})^2, \qquad \hat{y} = a_0^{(1)}$
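For a network like the 2-2-1 example above, the backpropagated gradients can be sanity-checked against finite differences of the loss. The sketch below does this for one hidden-layer weight; the helper function name and the chosen values are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W, b, x, y_bar):
    # Forward pass through the 2-2-1 network of the figure
    a = x
    for Wl, bl in zip(W, b):
        a = sigmoid(Wl @ a + bl)
    return 0.5 * np.sum((a - y_bar) ** 2)

rng = np.random.default_rng(1)
W = [rng.standard_normal((2, 2)), rng.standard_normal((1, 2))]
b = [np.zeros(2), np.zeros(1)]
x, y_bar = np.array([0.5, -1.0]), np.array([1.0])

# Backprop gradient of the hidden-layer weight w_00^(0)
a0 = sigmoid(W[0] @ x + b[0])
a1 = sigmoid(W[1] @ a0 + b[1])
delta1 = (a1 - y_bar) * a1 * (1 - a1)
delta0 = (W[1].T @ delta1) * a0 * (1 - a0)
grad_backprop = delta0[0] * x[0]                 # dL/dw_00^(0) = delta_0^(0) * x_0

# Finite-difference estimate of the same gradient
eps = 1e-6
W_plus = [W[0].copy(), W[1].copy()]
W_plus[0][0, 0] += eps
grad_numeric = (loss(W_plus, b, x, y_bar) - loss(W, b, x, y_bar)) / eps

print(grad_backprop, grad_numeric)               # the two values should be nearly equal
```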
Outline
1. Recap
2. Backpropagation algorithm
3. Loss functions
So far so good
• Neural networks (NNs) are essentially function approximators
• In the case of supervised learning, NNs model the decision boundaries
between classes
• Training NNs is an iterative process which aims to find the weights that
correctly predict the training outputs
• The loss function measures the deviation of the NN's outputs from the
desired outputs
Loss function
• The loss function can be viewed as a surface in a high dimensional space
• A loss function can be described by the equation:
$\mathbb{L}(w, b) = \sum_{i=0}^{n-1} \mathbb{L}\left( f\left(x^{(i)} \mid w, b\right), \bar{y}^{(i)} \right) = \sum_{i=0}^{n-1} \mathbb{L}\left( y^{(i)}, \bar{y}^{(i)} \mid w, b \right)$
Loss function
[Figure: visualization of a loss surface]
Loss function and global minima
• The global minimum of the loss function on the training data is not
actually what provides the best generalization to the test data
• A study suggests that global minima are in practice irrelevant, as they often
lead to overfitting
“Choromanska, Anna, et al. "The loss surfaces of multilayer
networks." Artificial intelligence and statistics. PMLR, 2015.”
Regression loss
• Used to solve regression problems
• The squared L2 norm of the difference between the model predictions and the
expected outputs (MSE)
• Related regression losses: RMSE, MAE
$\mathbb{L} = \frac{1}{2}\left\lVert a^{(L)} - \bar{y}\right\rVert^2$
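A minimal NumPy sketch of this regression loss, also returning its gradient with respect to the network output (the term that seeds backpropagation); the example arrays are illustrative assumptions.

```python
import numpy as np

def mse_loss(a_L, y_bar):
    """Half squared L2 norm between predictions a_L and targets y_bar, and its gradient."""
    diff = a_L - y_bar
    loss = 0.5 * np.sum(diff ** 2)
    grad = diff                      # dL/da_L = a_L - y_bar
    return loss, grad

a_L = np.array([0.8, 0.1])           # illustrative network outputs
y_bar = np.array([1.0, 0.0])         # illustrative targets
print(mse_loss(a_L, y_bar))
```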
Cross entropy loss
[Figure: a NN outputs predicted class probabilities, compared with the expected ones]
    Predicted        Expected
    p(cat)           1
    p(dog)           0
    p(airplane)      0
    p(automobile)    0

How do we effectively and quantitatively estimate the deviation of the
predicted classes from the expected classes?
Cross entropy loss
[Figure: predicted probability distributions compared with the expected classes, for good predictions and for bad predictions]
Cross entropy loss
$\text{cross\_entropy} = -\sum_i p_{\text{expected}}(i)\, \log\left(p_{\text{predicted}}(i)\right)$

• If $p_{\text{expected}}(i)$ is close to 1 and $p_{\text{predicted}}(i)$ is close to 1, the CE is close to 0
• If $p_{\text{expected}}(i)$ is close to 1 and $p_{\text{predicted}}(i)$ is close to 0, then $\log\left(p_{\text{predicted}}(i)\right)$ will be close to $-\infty$ and the CE will be very high
• If $p_{\text{expected}}(i)$ is close to 0, then it will not contribute to the CE
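A minimal NumPy sketch of this cross entropy; the probability vectors are illustrative assumptions, and a small epsilon is added only to avoid log(0).

```python
import numpy as np

def cross_entropy(p_expected, p_predicted, eps=1e-12):
    """CE = -sum_i p_expected(i) * log(p_predicted(i))."""
    return -np.sum(p_expected * np.log(p_predicted + eps))

p_expected = np.array([1.0, 0.0, 0.0, 0.0])      # one-hot: the true class is "cat"
good = np.array([0.9, 0.05, 0.03, 0.02])         # confident and correct -> small CE
bad = np.array([0.01, 0.9, 0.05, 0.04])          # confident and wrong   -> large CE
print(cross_entropy(p_expected, good))           # ~0.105
print(cross_entropy(p_expected, bad))            # ~4.6
```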
Binary Cross entropy loss
• When the number of classes is 2, considering $p_0$ the predicted
probability of class 0, then $p_1 = 1 - p_0$
• Thus the binary cross entropy is defined as:

$\text{BCE} = -p_{\text{expected}}(0)\, \log(p_0) - \left(1 - p_{\text{expected}}(0)\right) \log(1 - p_0)$

• Writing $y$ for the predicted probability and $\bar{y}$ for the expected one, the BCE is minimized when the prediction matches the target:

$-\frac{\partial \mathbb{L}}{\partial y} = \bar{y}\,\frac{1}{y} - (1 - \bar{y})\,\frac{1}{1 - y} = \frac{\bar{y}}{y} - \frac{1 - \bar{y}}{1 - y} = 0 \implies y = \bar{y}$

• Note that the minimum value is not necessarily 0, e.g. $-0.25\log(0.25) - 0.75\log(0.75) = 0.56 \neq 0$
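A short numerical check, under the same notation, that the BCE is minimized at y = ȳ and that its minimum value can be non-zero (0.56 for ȳ = 0.25, as on the slide); the grid of candidate predictions is an illustrative assumption.

```python
import numpy as np

def bce(y, y_bar, eps=1e-12):
    """Binary cross entropy between prediction y and target probability y_bar."""
    return -y_bar * np.log(y + eps) - (1 - y_bar) * np.log(1 - y + eps)

y_bar = 0.25
y_grid = np.linspace(0.001, 0.999, 999)          # candidate predictions
losses = bce(y_grid, y_bar)
best = y_grid[np.argmin(losses)]
print(best, losses.min())                        # minimum near y = 0.25, value ~0.56
```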
Softmax cross entropy loss
• Considering the previous example of classes cat, dog, airplane, and
automobile, a score vector may be [9, 10, 0.1, −3]
• The scores are unbounded: they can be any real number
• NNs behave better when the loss function involves a bounded set of
numbers in the same range
• The Softmax converts unbounded scores into probabilities
Softmax cross entropy loss
• Given the scores vector $s = [s_0, s_1, \dots, s_{N-1}]$, the corresponding Softmax vector is:

$\text{softmax}(s) = \left[ \frac{e^{s_0}}{\sum_k e^{s_k}}, \frac{e^{s_1}}{\sum_k e^{s_k}}, \dots, \frac{e^{s_{N-1}}}{\sum_k e^{s_k}} \right]$

• The sum of the Softmax vector is 1
• An element of the Softmax vector represents the probability of a class
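A minimal NumPy sketch of the Softmax; subtracting the maximum score before exponentiating is a standard numerical-stability trick added here (it does not change the result, since the Softmax is shift-invariant).

```python
import numpy as np

def softmax(s):
    """Convert a vector of unbounded scores into probabilities that sum to 1."""
    shifted = s - np.max(s)          # stability trick: softmax is shift-invariant
    e = np.exp(shifted)
    return e / np.sum(e)

scores = np.array([9.0, 10.0, 0.1, -3.0])   # cat, dog, airplane, automobile (slide example)
probs = softmax(scores)
print(probs, probs.sum())                   # probabilities summing to 1
```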
Softmax cross entropy loss
• Why Softmax?
• The Softmax is a smooth (differentiable) approximation of the one-hot argmax (argmax_onehot):

$p = [9.99, 10] \implies \text{argmax}_{\text{onehot}}(p) = [0, 1]$
$q = [10, 9.99] \implies \text{argmax}_{\text{onehot}}(q) = [1, 0]$   (far from each other)

$\text{softmax}(p) = [0.4975, 0.5025]$
$\text{softmax}(q) = [0.5025, 0.4975]$   (close to each other)
Softmax cross entropy loss
• The Softmax outputs probabilities, so it is used in the last layer
• The Softmax probabilities are then used to compute the cross entropy loss
• Various deep learning libraries combine the Softmax and the CE loss in one
operation
• That combination tends to be more numerically stable
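A minimal sketch of such a combined operation for a one-hot target, using the log-sum-exp form so that log(softmax) is never computed in two unstable steps; the function name and score values are assumptions, not a specific library API.

```python
import numpy as np

def softmax_cross_entropy(scores, target_index):
    """CE of softmax(scores) against a one-hot target, computed in a stable way."""
    shifted = scores - np.max(scores)                 # shift for numerical stability
    log_sum_exp = np.log(np.sum(np.exp(shifted)))
    # CE = -log softmax(scores)[target] = log_sum_exp - shifted[target]
    return log_sum_exp - shifted[target_index]

scores = np.array([9.0, 10.0, 0.1, -3.0])
print(softmax_cross_entropy(scores, target_index=0))  # true class "cat"
```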
Focal loss
• Not all training data are equally important, and we need to use the available
data wisely
• The idea behind the focal loss is to focus more on the examples on which the
model is not doing well
• The focal loss can achieve better performance in the case of imbalanced
data
Focal loss
• Let's consider the binary CE:

$\mathbb{L}(y, \bar{y}) = \begin{cases} -\log(y) & \text{if the expected class is } 1 \\ -\log(1 - y) & \text{if the expected class is } 0 \end{cases}$

• The focal loss is defined as:

$\mathbb{L}(y, \bar{y}) = \begin{cases} -(1 - y)^{\gamma} \log(y) & \text{if the expected class is } 1 \\ -y^{\gamma} \log(1 - y) & \text{if the expected class is } 0 \end{cases}$
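A minimal NumPy sketch of this binary focal loss; the value of γ and the example predictions are illustrative assumptions.

```python
import numpy as np

def focal_loss(y, expected_class, gamma=2.0, eps=1e-12):
    """Binary focal loss: down-weights examples that are already well classified."""
    if expected_class == 1:
        return -((1 - y) ** gamma) * np.log(y + eps)
    return -(y ** gamma) * np.log(1 - y + eps)

# A well-classified example contributes much less than with plain BCE,
# while a badly classified one keeps a large loss.
print(focal_loss(0.9, expected_class=1))   # small: (0.1)^2 * -log(0.9)
print(focal_loss(0.1, expected_class=1))   # large: (0.9)^2 * -log(0.1)
```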
Hinge loss
• A hinge loss function increases if the goodness criterion is not satisfied
and becomes 0 if the criterion is satisfied

"If you are not my friend, the distance between
us can vary from small to large.
But I don't distinguish between friends. All my
friends are at a distance 0 from me."

[Image: a hinged door]
Hinge loss
• The hinge loss is defined by the equation:

$\mathbb{L} = \sum_{j=0,\, j \neq c}^{N-1} \max\left(0,\, y_j - y_c + m\right)$

where $y_j$ is the score of an incorrect class, $y_c$ is the score of the correct class, and $m$ is the margin.

• In the case of a good output: $y_j + m < y_c \implies \max(0,\, y_j - y_c + m) = 0$
• In the case of a bad output: $y_j + m > y_c \implies \max(0,\, y_j - y_c + m) = y_j - y_c + m$
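A minimal NumPy sketch of this hinge loss; the margin value and the score vector (reusing the earlier cat/dog/airplane/automobile example) are illustrative assumptions.

```python
import numpy as np

def hinge_loss(scores, correct_class, margin=1.0):
    """Sum of max(0, y_j - y_c + m) over all incorrect classes j."""
    y_c = scores[correct_class]
    margins = np.maximum(0.0, scores - y_c + margin)
    margins[correct_class] = 0.0          # the correct class does not contribute
    return np.sum(margins)

scores = np.array([9.0, 10.0, 0.1, -3.0])   # cat, dog, airplane, automobile
print(hinge_loss(scores, correct_class=0))  # correct class "cat": only "dog" violates the margin
```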