Introduction to Machine Learning
Neural Networks

Varun Chandola

March 8, 2019

Outline

• Why not work with thresholded perceptron?
• How to learn non-linear surfaces?
• How to generalize to multiple outputs, numeric output?

Contents

1 Extending Perceptrons
2 Multi Layered Perceptrons
  2.1 Generalizing to Multiple Labels
  2.2 Properties of Sigmoid Function
  2.3 Motivation for Using Non-linear Surfaces
3 Feed Forward Neural Networks
4 Backpropagation
  4.1 Derivation of the Backpropagation Rules
5 Final Algorithm
6 Wrapping up Neural Networks
7 Bias Variance Tradeoff

1 Extending Perceptrons

[Figure: a single-layer perceptron with an input layer (x1 through x5) feeding one output node.]

• Questions?
  – Why not work with thresholded perceptron?
    ∗ Not differentiable
  – How to learn non-linear surfaces?
  – How to generalize to multiple outputs, numeric output?

The reason we do not use the thresholded perceptron is that the objective function is not differentiable. To understand this, recall that to compute the gradient for perceptron learning we compute the partial derivative of the objective function with respect to every component of the weight vector:

  ∂E/∂w_i = ∂/∂w_i [ (1/2) Σ_j (y_j − w^T x_j)^2 ]

Now if we use the thresholded perceptron, we need to replace w^T x_j with o in the above equation, where o is −1 if w^T x_j < 0 and 1 otherwise. Given that o is not smooth, the function is not differentiable. Hence we work with the unthresholded perceptron unit.

2 Multi Layered Perceptrons

2.1 Generalizing to Multiple Labels

• Distinguishing between multiple categories
• Solution: Add another layer - Multi Layer Neural Networks
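The differentiability argument above can be checked numerically for the unthresholded unit: the analytic gradient of the squared error matches a finite-difference estimate, while the thresholded unit's output is piecewise constant in w and admits no such gradient. A sketch with made-up data, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # 5 examples, 3 features (illustrative)
y = rng.normal(size=5)
w = rng.normal(size=3)

def error(w):
    # E(w) = 1/2 * sum_j (y_j - w.x_j)^2, the unthresholded objective
    return 0.5 * np.sum((y - X @ w) ** 2)

# Analytic gradient: dE/dw = -sum_j (y_j - w.x_j) x_j
grad = -X.T @ (y - X @ w)

# Central finite-difference estimate, one coordinate at a time
eps = 1e-6
fd = np.array([(error(w + eps * e) - error(w - eps * e)) / (2 * eps)
               for e in np.eye(3)])
```

The two estimates agree to several decimal places, which is exactly what fails for the thresholded unit.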
[Figure: a multi-layer network with inputs x1–x4 (plus bias x0 = 1), a hidden layer, and output nodes o1–o4.]

Multi-class classification is more applicable than binary classification. Applications include handwritten digit recognition, robotics, etc.

• Linear Unit

• Perceptron Unit

• Sigmoid Unit
  – Smooth, differentiable threshold function

      σ(net) = 1 / (1 + e^(−net))

  – Non-linear output

[Figure: a single sigmoid unit; inputs x0–x4 are combined as net = w^T x and the output is o = σ(net).]

As mentioned earlier, the perceptron unit cannot be used as it is not differentiable. The linear unit is differentiable but only learns linear discriminating surfaces. So to learn non-linear surfaces, we need to use a non-linear unit such as the sigmoid.

2.2 Properties of Sigmoid Function

[Figure: plots of three threshold functions, f_h(x), f_s(x), and f_t(x).]

The threshold output in the case of the sigmoid unit is continuous and smooth, as opposed to a perceptron unit or a linear unit. A useful property of the sigmoid is that its derivative can be easily expressed in terms of the function itself:

  dσ(y)/dy = σ(y)(1 − σ(y))

One can also use e^(−ky) instead of e^(−y), where k controls the "steepness" of the threshold curve.

2.3 Motivation for Using Non-linear Surfaces

The learning problem is to recognize 10 different vowel sounds from the audio input. The raw sound signal is compressed into two features using spectral analysis.
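The derivative identity for the sigmoid stated in Section 2.2 can be verified numerically. This is a sketch assuming NumPy; the k parameter implements the "steepness" variant mentioned above:

```python
import numpy as np

def sigmoid(y, k=1.0):
    # sigma_k(y) = 1 / (1 + e^{-k y}); k = 1 gives the standard sigmoid
    return 1.0 / (1.0 + np.exp(-k * y))

ys = np.linspace(-4.0, 4.0, 9)

# The claimed identity: d sigma / dy = sigma(y) (1 - sigma(y))
analytic = sigmoid(ys) * (1.0 - sigmoid(ys))

# Central finite-difference estimate of the same derivative
eps = 1e-6
numeric = (sigmoid(ys + eps) - sigmoid(ys - eps)) / (2 * eps)
```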
3 Feed Forward Neural Networks

[Figure: a feed forward network with inputs x1–x4 (plus bias x0 = 1), a hidden layer, and output nodes o1–o4.]

• d + 1 input nodes (including bias)

• m hidden nodes

• k output nodes

• At hidden nodes: w_j, 1 ≤ j ≤ m, w_j ∈ R^(d+1)

• At output nodes: w_l, 1 ≤ l ≤ k, w_l ∈ R^(m+1)

The multi-layer neural network shown above is used in a feed forward mode, i.e., information only flows in one direction (forward). Each hidden node "collects" the inputs from all input nodes, computes a weighted sum of the inputs, and then applies the sigmoid function to the weighted sum. The output of each hidden node is forwarded to every output node. Each output node "collects" the inputs (from hidden layer nodes), computes a weighted sum of its inputs, and then applies the sigmoid function to obtain the final output. The class corresponding to the output node with the largest output value is assigned as the predicted class for the input. For implementation, one can represent the weights as two matrices, W^(1) of size m × (d + 1) and W^(2) of size k × (m + 1).

4 Backpropagation

• Assume that the network structure is predetermined (number of hidden nodes and interconnections)

• Objective function for N training examples:

  J = Σ_{i=1}^N J_i = (1/2) Σ_{i=1}^N Σ_{l=1}^k (y_il − o_il)^2

• y_il - target value associated with the l-th class for input x_i

• y_il = 1 when l is the true class for x_i, and 0 otherwise

• o_il - predicted output value at the l-th output node for x_i

What are we learning?
Weight vectors for all output and hidden nodes that minimize J.

The first question that comes to mind is: why not use a standard gradient descent based minimization like the one we saw in single perceptron unit learning? The reason is that the output at every output node (o_l) is directly dependent on the weights associated with the output nodes but not on the weights at the hidden nodes. The input values are "used" by the hidden nodes and are not "visible" to the output nodes. Hence, to learn all the weights simultaneously, direct minimization is not possible; methods such as backpropagation need to be employed.

1. Initialize all weights to small values

2. For each training example ⟨x, y⟩:
   (a) Propagate the input forward through the network
   (b) Propagate the errors backward through the network

Gradient Descent

• Move in the opposite direction of the gradient of the objective function: −η∇J

  ∇J = Σ_{i=1}^N ∇J_i
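The forward pass of Section 3, with the weights stored as the two matrices W^(1) and W^(2), can be sketched as follows. The dimensions and names (d, m, k, forward) are illustrative choices, assuming NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    # W1 has shape (m, d+1); W2 has shape (k, m+1)
    u = np.concatenate(([1.0], x))    # prepend the bias input x0 = 1
    z = sigmoid(W1 @ u)               # hidden-node outputs, shape (m,)
    uz = np.concatenate(([1.0], z))   # bias input for the output layer
    o = sigmoid(W2 @ uz)              # final outputs, shape (k,)
    return z, o

rng = np.random.default_rng(1)
d, m, k = 4, 3, 2
W1 = rng.normal(scale=0.1, size=(m, d + 1))
W2 = rng.normal(scale=0.1, size=(k, m + 1))

z, o = forward(rng.normal(size=d), W1, W2)
pred = int(np.argmax(o))              # predicted class = largest output
```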
• What is the gradient computed with respect to?
  – Weights - m weight vectors at the hidden nodes and k at the output nodes
  – w_j (j = 1 . . . m)
  – w_l (l = 1 . . . k)

• w_j ← w_j − η ∂J/∂w_j = w_j − η Σ_{i=1}^N ∂J_i/∂w_j

• w_l ← w_l − η ∂J/∂w_l = w_l − η Σ_{i=1}^N ∂J_i/∂w_l

  ∇J_i = [ ∂J_i/∂w_1, ∂J_i/∂w_2, . . . , ∂J_i/∂w_{m+k} ]^T

  ∂J_i/∂w_r = [ ∂J_i/∂w_r1, ∂J_i/∂w_r2, . . . ]^T

• Need to compute ∂J_i/∂w_rq

• Update rule for the q-th entry in the r-th weight vector:

  w_rq ← w_rq − η ∂J/∂w_rq = w_rq − η Σ_{i=1}^N ∂J_i/∂w_rq

4.1 Derivation of the Backpropagation Rules

Assume that we have only one training example, i.e., J = J_i. We drop the subscript i from here onwards.

• Consider any weight w_rq

• Let u_rq be the q-th element of the input vector coming in to the r-th unit.

Observation 1
Weight w_rq is connected to J through net_r = Σ_q w_rq u_rq.

  ∂J/∂w_rq = (∂J/∂net_r)(∂net_r/∂w_rq) = (∂J/∂net_r) u_rq

Observation 2
net_l for an output node is connected to J only through the output value of the node (o_l):

  ∂J/∂net_l = (∂J/∂o_l)(∂o_l/∂net_l)

The first term above can be computed as:

  ∂J/∂o_l = ∂/∂o_l [ (1/2) Σ_{l'=1}^k (y_l' − o_l')^2 ]

The entries in the summation on the right hand side are non-zero only for l' = l. This results in:

  ∂J/∂o_l = ∂/∂o_l [ (1/2)(y_l − o_l)^2 ] = −(y_l − o_l)

Moreover, the second term in the chain rule above can be computed as:

  ∂o_l/∂net_l = ∂σ(net_l)/∂net_l = o_l(1 − o_l)

The last result arises from the fact that o_l is a sigmoid function of net_l. Using the above results, one can compute the following:

  ∂J/∂net_l = −(y_l − o_l) o_l (1 − o_l)

Let

  δ_l = (y_l − o_l) o_l (1 − o_l)

Therefore,

  ∂J/∂net_l = −δ_l
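The identity ∂J/∂net_l = −δ_l derived above can be checked with a finite difference on the net inputs of the output nodes. The target and net values below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = np.array([1.0, 0.0, 0.0])        # one-hot targets for k = 3 outputs
net = np.array([0.3, -0.2, 0.8])     # net inputs of the output nodes

def J(net):
    o = sigmoid(net)
    return 0.5 * np.sum((y - o) ** 2)

o = sigmoid(net)
delta = (y - o) * o * (1 - o)        # delta_l for every output node

# Finite-difference estimate of dJ/dnet_l, one node at a time
eps = 1e-6
fd = np.array([(J(net + eps * e) - J(net - eps * e)) / (2 * eps)
               for e in np.eye(3)])
```

The finite-difference column matches −δ, confirming the derivation.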
Finally we can compute the partial derivative of the error with respect to the weight w_lj as:

  ∂J/∂w_lj = −δ_l u_lj

Update Rule for Output Units

  w_lj ← w_lj + η δ_l u_lj

where δ_l = (y_l − o_l) o_l (1 − o_l).

• Question: What is u_lj for the l-th output node?

• u_lj is the j-th input to the l-th output node, which is the output coming from the j-th hidden node.

Observation 3
net_j for a hidden node is connected to J through all the output nodes:

  ∂J/∂net_j = Σ_{l=1}^k (∂J/∂net_l)(∂net_l/∂net_j)

Remember that we have already computed the first term on the right hand side for the output nodes, ∂J/∂net_l = −δ_l, where δ_l = (y_l − o_l) o_l (1 − o_l). Let z_j denote the output of the j-th hidden node. This result gives us:

  ∂J/∂net_j = Σ_{l=1}^k −δ_l (∂net_l/∂net_j)
            = Σ_{l=1}^k −δ_l (∂net_l/∂z_j)(∂z_j/∂net_j)
            = Σ_{l=1}^k −δ_l w_lj (∂z_j/∂net_j)
            = Σ_{l=1}^k −δ_l w_lj z_j(1 − z_j)
            = −z_j(1 − z_j) Σ_{l=1}^k δ_l w_lj

Thus, the gradient becomes:

  ∂J/∂w_jp = (∂J/∂net_j) u_jp
           = −z_j(1 − z_j) ( Σ_{l=1}^k δ_l w_lj ) u_jp
           = −δ_j u_jp

Update Rule for Hidden Units

  w_jp ← w_jp + η δ_j u_jp

where

  δ_j = z_j(1 − z_j) Σ_{l=1}^k δ_l w_lj
  δ_l = (y_l − o_l) o_l (1 − o_l)

• Question: What is u_jp for the j-th hidden node?

• u_jp is the p-th input to the j-th hidden node, which is the p-th attribute value of the input, i.e., x_p.

5 Final Algorithm

• While not converged:
  – Move forward to compute the outputs at the hidden and output nodes
  – Move backward to propagate the errors back
    ∗ Compute the δ errors at the output nodes (δ_l)
    ∗ Compute the δ errors at the hidden nodes (δ_j)
  – Update all weights according to the weight update equations
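The final algorithm above can be sketched end-to-end for a single training example. The following toy implementation combines the forward pass with both update rules; the names (train_step, eta) and the data are my own illustrative choices, assuming NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, W2, eta=0.5):
    # Forward pass
    u = np.concatenate(([1.0], x))           # inputs with bias x0 = 1
    z = sigmoid(W1 @ u)                      # hidden outputs z_j
    uz = np.concatenate(([1.0], z))          # hidden outputs with bias
    o = sigmoid(W2 @ uz)                     # final outputs o_l
    # Backward pass
    delta_l = (y - o) * o * (1 - o)          # output deltas
    delta_j = z * (1 - z) * (W2[:, 1:].T @ delta_l)  # hidden deltas
    # Updates: w <- w + eta * delta * input (in place)
    W2 += eta * np.outer(delta_l, uz)
    W1 += eta * np.outer(delta_j, u)
    return 0.5 * np.sum((y - o) ** 2)        # pre-update loss J

rng = np.random.default_rng(2)
W1 = rng.normal(scale=0.1, size=(3, 3))      # m = 3 hidden, d = 2 inputs
W2 = rng.normal(scale=0.1, size=(2, 4))      # k = 2 outputs
x, y = np.array([0.5, -1.0]), np.array([1.0, 0.0])

losses = [train_step(x, y, W1, W2) for _ in range(200)]
```

Note that the bias column of W^(2) is skipped when backpropagating the hidden deltas, since the bias input is not produced by any hidden node.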
6 Wrapping up Neural Networks

• Error function contains many local minima

• No guarantee of convergence
  – Not a "big" issue in practical deployments

• Improving backpropagation
  – Adding momentum
  – Using stochastic gradient descent
  – Training multiple times using different initializations

Adding momentum to the learning process refers to adding an "inertia" term which tries to keep the current update to a weight similar to the one applied in the previous round.

7 Bias Variance Tradeoff

• Neural networks are universal function approximators
  – By making the model more complex (increasing the number of hidden layers or m) one can lower the error

• Is the model with the least training error the best model?
  – The simple answer is no!
  – Risk of overfitting (chasing the data)
  – Overfitting ⇒ high generalization error

High Variance - Low Bias

• "Chases the data"
• Model parameters change significantly when the training data is changed, hence the term high variance
• Very low training error
• Poor performance on unseen data

Low Variance - High Bias

• Less sensitive to the training data
• Higher training error
• Better performance on unseen data

• General rule of thumb – if two models are giving similar training error, choose the simpler model

• What is simple for a neural network?

• Low weights in the weight matrices?
  – Why?
  – The simple answer is that if the weights in the weight vectors at each node are high, the resulting discriminating surface learnt by the neural network will be highly non-linear. If the weights are smaller, the surface will be smoother (and hence simpler).

• Penalize solutions in which the weights are high

• This can be done by introducing a penalty term in the objective function – Regularization

Regularization for Backpropagation

  J̃ = J + (λ / 2n) [ Σ_{j=1}^m Σ_{i=1}^{d+1} (w_ji^(1))^2 + Σ_{l=1}^k Σ_{j=1}^{m+1} (w_lj^(2))^2 ]
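The regularized objective above adds a "weight decay" term (λ/n) w to the gradient of every weight. A small sketch, where λ (lam) and the shapes are illustrative, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
m, d, k, n, lam = 3, 4, 2, 50, 0.1
W1 = rng.normal(size=(m, d + 1))
W2 = rng.normal(size=(k, m + 1))

def penalty(W1, W2, lam, n):
    # (lambda / 2n) * (sum of all squared weights in both matrices)
    return (lam / (2.0 * n)) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

# Gradient of the penalty term with respect to each weight matrix
grad_W1 = (lam / n) * W1
grad_W2 = (lam / n) * W2

# Finite-difference check on a single entry of W1
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
fd = (penalty(W1p, W2, lam, n) - penalty(W1, W2, lam, n)) / eps
```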
Other Extensions?
• Use a different loss function (why?)
  – Quadratic (Squared), Cross-entropy, Exponential, KL Divergence, etc.

• Use a different activation function (why?)
  – Sigmoid

      f(z) = 1 / (1 + exp(−z))

  – Tanh

      f(z) = (e^z − e^(−z)) / (e^z + e^(−z))

  – Rectified Linear Unit (ReLU)

      f(z) = max(0, z)
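For reference, the three activation functions listed above as runnable one-liners (a sketch assuming NumPy):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # written out as in the notes; equivalent to np.tanh
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

zs = np.array([-2.0, 0.0, 2.0])
```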