Deep Learning
BITS Pilani
Pilani Campus
Deep Neural Network
Disclaimer and Acknowledgement
• The content for these slides has been obtained from books and various other sources on the Internet.
• I hereby acknowledge all the contributors for their material and inputs.
• I have provided source information wherever necessary.
• I have added and modified the content to suit the requirements of the course.
Session Agenda
• Back Propagation
Training: Go Forwards, then Backwards…
Step 1: Calculate ŷ using the computation graph (forward pass).
Step 2: Determine the cost.
Step 3: Calculate the partial derivatives (using backpropagation).
Step 4: Update each parameter.
Source: Brad Quinton, Scott Chin
Training Neural Networks: Optimizing Parameters
• We are given an architecture, parameterized by its weights 𝐖.
• We are also given training data 𝐷 = {(𝐱ᵢ, 𝑦ᵢ)}.
• We are given a loss function ℒ(𝐷; 𝐖).
• We can use gradient descent to minimize the loss.
• At each step, the weight vector is modified in the direction that produces the steepest descent along the error surface.
Algorithm for Gradient Computation –
Backpropagation
• For each parameter 𝑤ᵢ ∈ 𝐖:  𝑤ᵢ(t+1) = 𝑤ᵢ(t) − α · ∂ℒ/∂𝑤ᵢ
• It all comes down to efficiently computing ∂ℒ/∂𝑤ᵢ for every parameter.
• The calculus just gets a bit more complicated for a neural network.
• Depth gives more representational capacity to neural networks.
• However, training deep nets is not trivial.
• The solution is the "Backpropagation" algorithm!
• Backpropagation is a systematic and efficient method to calculate the partial derivatives (i.e., the partial derivative of the cost w.r.t. each parameter). A minimal sketch of the resulting update rule follows the reference below.
Rumelhart, Hinton, Williams, “Learning Representations by Back-Propagating Errors”, 1986
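As a concrete illustration of the update rule above, here is a minimal NumPy sketch of one gradient-descent step, assuming the gradients have already been computed by backpropagation (the parameter names and the grads dictionary are illustrative, not from the slides):

import numpy as np

def gradient_descent_step(params, grads, lr=0.01):
    """Apply w <- w - lr * dL/dw to every parameter.

    params: dict of parameter arrays, e.g. {"W1": ..., "b1": ...}
    grads:  dict of gradients with matching keys
    lr:     learning rate (alpha in the slides)
    """
    for name in params:
        params[name] = params[name] - lr * grads[name]
    return params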
Key Intuitions Required for Backpropagation
1. Gradient Descent
• Change the weights 𝐖 in the direction of the negative gradient to minimize the error function.
2. Chain Rule
• Use the chain rule to calculate the gradients with respect to the intermediate variables and weights.
3. Dynamic Programming (Memoization)
• Gradients at a layer depend on the gradients of the layers above it (computed earlier in the backward pass)!
• So, when computing gradients at each layer, we can reuse the gradients already computed for higher layers when handling lower layers (i.e., memoization). A tiny sketch of this caching idea follows the list.
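To make the chain-rule and caching ideas concrete, here is a minimal sketch (not from the slides) that differentiates the composite f(x) = σ(w·x + b) by reusing the cached forward values:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: cache intermediate values for reuse in the backward pass.
w, b, x = 0.8, 0.5, 1.0
z = w * x + b          # cached
a = sigmoid(z)         # cached

# Backward pass: chain rule, reusing the cached a and x.
da_dz = a * (1 - a)    # derivative of the sigmoid
df_dw = da_dz * x      # df/dw = (da/dz) * (dz/dw)
df_db = da_dz * 1.0    # df/db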
The Computation Graph of a Neural Network
• We can represent any neural network in terms of a computation graph.
• The loss function can be computed by moving from left to right.
The corresponding computation graph:
x → z[1] = W[1]x + b[1] → a[1] = σ(z[1]) → z[2] = W[2]a[1] + b[2] → a[2] = σ(z[2]) → ℒ(a[2], y)
Cross-entropy cost function:
Cost(ŷ, y) = −y log ŷ − (1 − y) log(1 − ŷ)
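A minimal NumPy sketch of this forward pass and cross-entropy cost (the layer sizes and random initialization are illustrative assumptions, not taken from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, n_1 = 3, 4                      # assumed layer sizes
x = rng.normal(size=(n_x, 1))        # single input column vector
y = 1.0                              # binary label

W1 = rng.normal(size=(n_1, n_x)); b1 = np.zeros((n_1, 1))
W2 = rng.normal(size=(1, n_1));   b2 = np.zeros((1, 1))

z1 = W1 @ x + b1                     # z[1] = W[1]x + b[1]
a1 = sigmoid(z1)                     # a[1] = sigma(z[1])
z2 = W2 @ a1 + b2                    # z[2] = W[2]a[1] + b[2]
a2 = sigmoid(z2)                     # a[2] = y_hat
loss = -(y * np.log(a2) + (1 - y) * np.log(1 - a2))  # cross-entropy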
Backward Propagation
(Computation graph as above, with ŷ = a[2].)
a[2] → ℒ
∂ℒ/∂a[2] = ∂/∂a[2] [ −y log a[2] − (1 − y) log(1 − a[2]) ]
         = −y/a[2] + (1 − y)/(1 − a[2])
Backward Propagation
(Computation graph as above.)
z[2] → a[2] → ℒ
a[2] = σ(z[2]), so ∂a[2]/∂z[2] = a[2](1 − a[2])
∂ℒ/∂z[2] = ∂ℒ/∂a[2] × ∂a[2]/∂z[2] = a[2] − y
Backward Propagation
(Computation graph as above.)
b[2] → z[2] → a[2] → ℒ
∂ℒ/∂b[2] = ∂ℒ/∂a[2] × ∂a[2]/∂z[2] × ∂z[2]/∂b[2] = ∂ℒ/∂z[2] × 1 = ∂ℒ/∂z[2]
Backward Propagation
(Computation graph as above.)
W[2] → z[2] → a[2] → ℒ
∂ℒ/∂W[2] = ∂ℒ/∂a[2] × ∂a[2]/∂z[2] × ∂z[2]/∂W[2] = ∂ℒ/∂z[2] × a[1]ᵀ
Dimension of dz[2] is (n[2], 1); dimension of a[1] is (n[1], 1), hence the transpose of a[1]; dimension of dW[2] is (n[2], n[1]).
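A quick NumPy shape check of dW[2] = dz[2] · a[1]ᵀ, with illustrative layer sizes n[1] = 4 and n[2] = 1 (assumed, not from the slides):

import numpy as np

n1, n2 = 4, 1                       # assumed layer widths
dz2 = np.ones((n2, 1))              # shape (n[2], 1)
a1 = np.ones((n1, 1))               # shape (n[1], 1)

dW2 = dz2 @ a1.T                    # outer product: shape (n[2], n[1])
assert dW2.shape == (n2, n1)        # matches the dimension of W[2]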
Backward Propagation
(Computation graph as above.)
a[1] → z[2] → a[2] → ℒ
∂ℒ/∂a[1] = ∂ℒ/∂a[2] × ∂a[2]/∂z[2] × ∂z[2]/∂a[1] = W[2]ᵀ × ∂ℒ/∂z[2]
Dimension of dz[2] is (n[2], 1); dimension of W[2] is (n[2], n[1]); dimension of da[1] is (n[1], 1).
But da[1] need not be computed on its own: a[1] is not a parameter, so it is only an intermediate step toward dz[1].
Backward Propagation
(Computation graph as above.)
z[1] → a[1] → z[2] → a[2] → ℒ
a[1] = σ(z[1]), so ∂a[1]/∂z[1] = a[1](1 − a[1])
∂ℒ/∂z[1] = ∂ℒ/∂a[2] × ∂a[2]/∂z[2] × ∂z[2]/∂a[1] × ∂a[1]/∂z[1] = W[2]ᵀ × ∂ℒ/∂z[2] ∗ a[1](1 − a[1])   (∗ denotes the element-wise product)
Dimensions: dz[2] is (n[2], 1); W[2] is (n[2], n[1]); da[1] is (n[1], 1); dz[1] is (n[1], 1).
Backward Propagation
(Computation graph as above.)
b[1] → z[1] → a[1] → z[2] → a[2] → ℒ
∂ℒ/∂b[1] = ∂ℒ/∂a[2] × ∂a[2]/∂z[2] × ∂z[2]/∂a[1] × ∂a[1]/∂z[1] × ∂z[1]/∂b[1] = ∂ℒ/∂z[1] × 1 = ∂ℒ/∂z[1]
Backward Propagation
(Computation graph as above.)
W[1] → z[1] → a[1] → z[2] → a[2] → ℒ
∂ℒ/∂W[1] = ∂ℒ/∂a[2] × ∂a[2]/∂z[2] × ∂z[2]/∂a[1] × ∂a[1]/∂z[1] × ∂z[1]/∂W[1] = ∂ℒ/∂z[1] × xᵀ
Backward Propagation: Summary
For cross-entropy loss and sigmoid activation in the last layer.

One training example:
dz[2] = ∂ℒ/∂z[2] = a[2] − y
db[2] = ∂ℒ/∂b[2] = dz[2]
dW[2] = ∂ℒ/∂W[2] = dz[2] · a[1]ᵀ
dz[1] = ∂ℒ/∂z[1] = W[2]ᵀ dz[2] ∗ g[1]′(z[1])
db[1] = ∂ℒ/∂b[1] = dz[1]
dW[1] = ∂ℒ/∂W[1] = dz[1] · xᵀ

All training examples (m examples stacked as columns):
dZ[2] = ∂ℒ/∂Z[2] = A[2] − Y
db[2] = ∂ℒ/∂b[2] = (1/m) Σ dZ[2]   (sum over the m columns)
dW[2] = ∂ℒ/∂W[2] = (1/m) dZ[2] · A[1]ᵀ
dZ[1] = ∂ℒ/∂Z[1] = W[2]ᵀ dZ[2] ∗ g[1]′(Z[1])
db[1] = ∂ℒ/∂b[1] = (1/m) Σ dZ[1]
dW[1] = ∂ℒ/∂W[1] = (1/m) dZ[1] · Xᵀ
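A minimal NumPy sketch of these summary equations for a batch of m examples stored as columns (layer sizes and the choice g[1] = sigmoid are illustrative assumptions):

import numpy as np

def backward_two_layer(X, Y, W2, A1, A2):
    """Gradients for a 2-layer sigmoid network with cross-entropy loss.

    X: (n_x, m), Y: (1, m), W2: (1, n_1), A1: (n_1, m), A2: (1, m).
    """
    m = X.shape[1]
    dZ2 = A2 - Y                                            # (1, m)
    dW2 = (1.0 / m) * dZ2 @ A1.T                            # (1, n_1)
    db2 = (1.0 / m) * np.sum(dZ2, axis=1, keepdims=True)    # (1, 1)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)                      # (n_1, m); g[1]'(Z1) = A1(1-A1)
    dW1 = (1.0 / m) * dZ1 @ X.T                             # (n_1, n_x)
    db1 = (1.0 / m) * np.sum(dZ1, axis=1, keepdims=True)    # (n_1, 1)
    return dW1, db1, dW2, db2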
Equations for layer l
Input: da[l] (one example) or dA[l] (all examples)
Output: da[l−1], dW[l], db[l] (or dA[l−1], dW[l], db[l])

One training example:
dz[l] = da[l] ∗ g[l]′(z[l])
db[l] = dz[l]
dW[l] = dz[l] · a[l−1]ᵀ
da[l−1] = W[l]ᵀ dz[l]

All training examples (m examples stacked as columns):
dZ[l] = dA[l] ∗ g[l]′(Z[l])
db[l] = (1/m) Σ dZ[l]   (sum over the m columns)
dW[l] = (1/m) dZ[l] · A[l−1]ᵀ
dA[l−1] = W[l]ᵀ dZ[l]
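These per-layer equations translate directly into a reusable backward step; here is a minimal NumPy sketch (the way the cached activation derivative g[l]′(Z[l]) is supplied is an assumption, not the slides' notation):

import numpy as np

def linear_activation_backward(dA, W, A_prev, g_prime_Z):
    """Backward step for one layer, vectorized over m examples.

    dA:        (n_l, m)      gradient of the loss w.r.t. this layer's activations
    W:         (n_l, n_prev) this layer's weight matrix
    A_prev:    (n_prev, m)   activations of the previous layer (X for layer 1)
    g_prime_Z: (n_l, m)      g[l]'(Z[l]) evaluated on the cached Z[l]
    """
    m = dA.shape[1]
    dZ = dA * g_prime_Z                                   # element-wise product
    dW = (1.0 / m) * dZ @ A_prev.T
    db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = W.T @ dZ
    return dA_prev, dW, db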
Scaling up for L layers and all training examples in NN
Forward propagation: takes the input, produces the output, and caches the intermediate values of each layer.
Backward propagation: takes the cached values as input and produces the gradients as output.
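A minimal sketch of how the forward cache feeds the backward pass for L layers (sigmoid on every layer and all variable names here are illustrative assumptions, not the slides' notation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, Ws, bs):
    """Forward pass through L layers; cache every activation for the backward pass."""
    A, cache = X, [X]
    for W, b in zip(Ws, bs):
        A = sigmoid(W @ A + b)
        cache.append(A)
    return A, cache

def backward(Y, Ws, cache):
    """Backward pass reusing the cached activations (cross-entropy + sigmoid output)."""
    m = Y.shape[1]
    grads = []
    dZ = cache[-1] - Y                                   # dZ[L] = A[L] - Y
    for l in range(len(Ws) - 1, -1, -1):
        A_prev = cache[l]
        dW = (1.0 / m) * dZ @ A_prev.T
        db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
        grads.append((dW, db))
        if l > 0:
            dZ = (Ws[l].T @ dZ) * A_prev * (1 - A_prev)  # dZ for the previous layer
    return grads[::-1]                                   # gradients ordered layer 1..L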
Scaling up for L layers and all training examples in NN
Update the parameters.
Scaling up for L layers and all training examples in NN
Example
Calculate all matrix dimensions and total number of parameters
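The network for this example is shown on the slide; as an illustration of the counting itself, here is a small helper applied to an assumed architecture with layer sizes [3, 4, 1] (the sizes are not taken from the slides):

def count_parameters(layer_sizes):
    """Return the total parameter count, printing per-layer W and b shapes."""
    total = 0
    for l in range(1, len(layer_sizes)):
        n_prev, n_curr = layer_sizes[l - 1], layer_sizes[l]
        w_params = n_curr * n_prev        # W[l] has shape (n[l], n[l-1])
        b_params = n_curr                 # b[l] has shape (n[l], 1)
        print(f"Layer {l}: W {n_curr}x{n_prev}, b {n_curr}x1")
        total += w_params + b_params
    return total

print(count_parameters([3, 4, 1]))        # assumed sizes; 3*4+4 + 4*1+1 = 21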
Array broadcasting
(Source: numpy.org)
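A small NumPy illustration of broadcasting as it appears in these networks, where a bias column vector is added to every column of a batch (the shapes are illustrative):

import numpy as np

Z = np.zeros((4, 3))                       # (n[1], m): 4 units, batch of 3 examples
b = np.array([[1.], [2.], [3.], [4.]])     # (n[1], 1) bias column

# b is broadcast across the 3 columns, so every example gets the same bias.
print((Z + b).shape)                       # (4, 3)
print(Z + b)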
Exercise - MSE Loss
Consider a neural network with two inputs x1 and x2 and initial weights w0 = 0.5, w1 = 0.8, w2 = 0.3. Draw the network, compute the output, the mean squared error loss, and the weight updates when the input is (1, 0), the learning rate is 0.01, and the target output is 1. Assume any other relevant information.
(Network: the bias input 1 and the inputs x1, x2 feed a single unit through w0, w1, w2; the unit computes the weighted sum Σ followed by the sigmoid σ to produce ŷ.)
Solution
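A minimal NumPy sketch of this exercise, assuming a sigmoid activation and the loss L = ½(y − ŷ)² (the ½ factor is an assumption the exercise leaves open):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, 0.8, 0.3])          # [w0, w1, w2]
x = np.array([1.0, 1.0, 0.0])          # [bias input 1, x1, x2]
y, lr = 1.0, 0.01

z = w @ x                              # 0.5 + 0.8*1 + 0.3*0 = 1.3
y_hat = sigmoid(z)                     # network output
loss = 0.5 * (y - y_hat) ** 2          # MSE loss (assumed 1/2 factor)

dL_dz = (y_hat - y) * y_hat * (1 - y_hat)   # chain rule through the sigmoid
w_new = w - lr * dL_dz * x                  # gradient step on w0, w1, w2
print(y_hat, loss, w_new)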
Exercise - BCE
Consider a neural network with two inputs x1 and x2 and initial weights w0 = 0.5, w1 = 0.8, w2 = 0.3. Draw the network, compute the output, the binary cross-entropy loss, and the weight updates when the input is (1, 0), the learning rate is 0.01, and the target output is 1. Assume any other relevant information.
(Network: the same single-unit network as in the MSE exercise above.)
Solution
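The same sketch with the binary cross-entropy loss instead of MSE (sigmoid activation assumed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, 0.8, 0.3])          # [w0, w1, w2]
x = np.array([1.0, 1.0, 0.0])          # [bias input 1, x1, x2]
y, lr = 1.0, 0.01

z = w @ x
y_hat = sigmoid(z)
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # BCE loss

dL_dz = y_hat - y                      # BCE + sigmoid simplifies to a - y
w_new = w - lr * dL_dz * x
print(y_hat, loss, w_new)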
Exercise (without vectorization)
Network: three inputs x1, x2, x3; one hidden layer with two sigmoid units (z1[1], a1[1] and z2[1], a2[1]); one sigmoid output unit (z1[2], a1[2] = ŷ).
Weights and biases:
w11[1] = 0.2, w12[1] = 0.4, w13[1] = −0.5
w21[1] = −0.3, w22[1] = 0.1, w23[1] = 0.2
w11[2] = −0.3, w12[2] = −0.2
b1[1] = −0.4, b2[1] = −0.2, b1[2] = 0.1
Learning rate = 0.9
For X = {1, 0, 1} and y = 1:
Find the cross-entropy loss and the weight updates after the 1st iteration.
Continued……
For layer 2:
dz1[2] = ∂ℒ/∂z1[2] = a1[2] − y                        (for the cross-entropy cost function)
dz1[2] = ∂ℒ/∂z1[2] = a1[2](1 − a1[2])(a1[2] − y)      (for the MSE cost function)
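A minimal NumPy sketch of the forward pass and the layer-2 error term for this exercise, using the weights listed above (sigmoid units and the cross-entropy loss assumed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.2, 0.4, -0.5],
               [-0.3, 0.1, 0.2]])       # rows: hidden units; columns: x1, x2, x3
b1 = np.array([[-0.4], [-0.2]])
W2 = np.array([[-0.3, -0.2]])
b2 = np.array([[0.1]])

x = np.array([[1.0], [0.0], [1.0]])
y = 1.0

z1 = W1 @ x + b1                        # hidden pre-activations
a1 = sigmoid(z1)                        # hidden activations
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)                        # output y_hat

loss = -(y * np.log(a2) + (1 - y) * np.log(1 - a2))   # cross-entropy
dz2 = a2 - y                            # layer-2 error term (CE + sigmoid)
print(a2, loss, dz2)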
Exercise – With vectorization
(Figure: a two-layer network given in vectorized form, with input x = [1, 0]ᵀ, target y = 1, and parameter matrices W[1], b[1], W[2], b[2] as shown. The forward pass computes prod = W[1]ᵀx, z[1] = prod + b[1], a[1] = σ(z[1]), then prod = W[2]ᵀa[1], z[2] = prod + b[2], ŷ = a[2] = σ(z[2]).)
Computation Graph for Forward Pass
(Figure: for the network above, the forward pass gives prod = [4, 3]ᵀ, z[1] = prod + b[1] = [3, 1]ᵀ, a[1] = σ(z[1]) = [0.95, 0.73]ᵀ, then prod = W[2]ᵀa[1] = −4.31, z[2] = prod + b[2] = 2.69, and ŷ = a[2] = σ(z[2]) = 0.94, with target y = 1.)
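A minimal NumPy check of this forward pass, assuming the parameter layout read off the figure (W[1] = [[4, 5], [3, 6]] with rows as hidden units, b[1] = [−1, −2]ᵀ, W[2] = [−3, −2], b[2] = 7); it reproduces the values shown above up to rounding:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[1.0], [0.0]])
W1 = np.array([[4.0, 5.0], [3.0, 6.0]])   # assumed layout (rows = hidden units)
b1 = np.array([[-1.0], [-2.0]])
W2 = np.array([[-3.0, -2.0]])
b2 = np.array([[7.0]])

z1 = W1 @ x + b1          # [[3], [1]]
a1 = sigmoid(z1)          # approx [[0.95], [0.73]]
z2 = W2 @ a1 + b2         # approx [[2.68]]
y_hat = sigmoid(z2)       # approx [[0.94]]
print(z1.ravel(), a1.ravel(), z2.ravel(), y_hat.ravel())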
Computation Graph for Cost Function
Computation Graph for Backward Pass
Demo
https://playground.tensorflow.org/
Practice problems
Example 1: Computational Graph for Back Propagation
Example 2
Optional
Derivative of the cost function w.r.t. the final-layer linear function z[2]
(uses the derivative of the sigmoid activation function)
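Since the derivation itself is on the slide figures, here is a hedged LaTeX sketch of the standard result for the cross-entropy cost with a sigmoid output:

\[
\mathcal{L} = -y \log a^{[2]} - (1-y)\log\!\left(1 - a^{[2]}\right),
\qquad a^{[2]} = \sigma\!\left(z^{[2]}\right)
\]
\[
\frac{\partial \mathcal{L}}{\partial a^{[2]}}
  = -\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}},
\qquad
\frac{\partial a^{[2]}}{\partial z^{[2]}} = a^{[2]}\left(1 - a^{[2]}\right)
\]
\[
\frac{\partial \mathcal{L}}{\partial z^{[2]}}
  = \frac{\partial \mathcal{L}}{\partial a^{[2]}}
    \cdot \frac{\partial a^{[2]}}{\partial z^{[2]}}
  = a^{[2]} - y
\]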
Thank You All !