Backpropagation - Outline
NEURAL NETWORKS, COMP4302/COMP5322, Lecture 4, 5

- XOR problem
- Neuron model
- Backpropagation algorithm
  - Derivation
  - Example
  - Error space
- Heuristic modifications of backpropagation
  - Convergence example
  - Momentum
  - Learning rate
- Universality of backpropagation
- Generalization and overfitting
- Limitations and capabilities
- Interesting applications
XOR Problem - Example

The XOR problem is not linearly separable and, hence, cannot be solved
by a single-layer perceptron network:

  p1 = [0; 0], t1 = 0    p2 = [0; 1], t2 = 1
  p3 = [1; 0], t3 = 1    p4 = [1; 1], t4 = 0

But it can be solved by a multilayer perceptron network: the 2
perceptrons in the first layer identify linearly separable parts, and
their outputs are combined by another perceptron to form the final
solution.

Boundary Investigation for the XOR Problem

First-layer boundaries:
- 1st layer, 1st neuron: 1st decision boundary
- 1st layer, 2nd neuron: 2nd decision boundary
The 2nd layer combines the two boundaries together:
- 2nd layer, 1st neuron: combined boundary
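To make the combined-boundary idea concrete, here is a minimal Python sketch of a hand-crafted two-layer perceptron for XOR. The specific weights and thresholds are illustrative choices of mine (one OR-like and one NAND-like first-layer boundary, combined by an AND), not values given in the slides.

```python
# Hand-crafted two-layer perceptron solving XOR (hypothetical weights).
# Each first-layer perceptron carves out one linearly separable region;
# the second-layer perceptron combines the two boundaries.

def step(x):
    return 1 if x >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)    # boundary 1: roughly an OR
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)   # boundary 2: roughly a NAND
    return step(1.0 * h1 + 1.0 * h2 - 1.5)  # 2nd layer: AND of the two

for p, t in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
    assert xor_mlp(*p) == t
print("XOR reproduced by the two-layer perceptron")
```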
Can We Train Such a Network?

There exists a two-layer perceptron network capable of solving the XOR
problem! But how can we train such a network to learn from examples?

- Rosenblatt and others were not able to successfully modify the
  perceptron rule to train these more complex networks.
- Rosenblatt and Widrow were aware of this.
- 1969 book "Perceptrons" by Minsky and Papert:
  "there is no reason to suppose that any of the virtues of perceptrons
  carry over to the many-layered version"
  - A mortal blow to the area: the majority of the scientific community
    walked away from the field of NNs.

Can We Train Such a Network? (cont.)

Discovery of the backpropagation algorithm:
- 1974, by Paul Werbos
  - His thesis presented the algorithm in the context of general
    networks, with NNs as a special case, and was not disseminated in
    the NN community.
- Rediscovered by David Rumelhart, Geoffrey Hinton and Ronald Williams
  (1986); David Parker (1985); Yann Le Cun (1985).
- Book: Parallel Distributed Processing (1986) by Rumelhart and
  McClelland.
- Multi-layer networks can be trained by the backpropagation algorithm
  (also called generalized gradient descent and generalized delta rule).
- Multi-layer networks trained with backpropagation are currently the
  most widely used NNs.
Backpropagation Neuron Model

a = f(wp + b)

Any differentiable transfer function f can be used; most frequently the
sigmoid and the tan-sigmoid (hyperbolic tangent sigmoid) functions are
used:

  a = 1 / (1 + e^(-n))
  a = (e^n - e^(-n)) / (e^n + e^(-n))

Backpropagation Network

Example of a backpropagation network with one hidden layer
(figure: inputs, hidden neurons, output neurons).
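As a quick reference, a small sketch of the two transfer functions named above and of the generic neuron computation a = f(wp + b); the function names are mine.

```python
import math

def logsig(n):
    """Sigmoid (log-sigmoid) transfer function: a = 1 / (1 + e^-n)."""
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n):
    """Tan-sigmoid: a = (e^n - e^-n) / (e^n + e^-n), i.e. tanh(n)."""
    return math.tanh(n)

def neuron(w, p, b, f=logsig):
    """Single neuron: a = f(w.p + b) for weight vector w and input vector p."""
    return f(sum(wi * pi for wi, pi in zip(w, p)) + b)

print(neuron([0.1, -0.2], [0.6, 0.1], 0.1))  # ~0.535
```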
Backpropagation Learning

- Supervised learning, similar to ADALINE's learning.
- Backpropagation (generalized gradient descent) is a generalization of
  the LMS algorithm.
- We define an error function (based on the training set) and minimize
  it by adjusting the weights, using gradient descent (a hill-climbing
  style local search).
- Mean square error (mse) is the performance index:
  - Error: the difference between the target (t) and actual (a) network
    output.
  - Mean square error of one output neuron over all n examples:

      mse = (1/n) * Σ_{k=1..n} e(k)^2 = (1/n) * Σ_{k=1..n} (t(k) - a(k))^2

Backpropagation Algorithm

- Multilayer perceptrons use the backpropagation algorithm to adjust
  the weights and biases of the network in order to minimize the mean
  square error (over all outputs and all examples).
- The mean squared error is the performance index.
- This is achieved by adjusting the weights: the generalized delta rule
  does this by calculating the error for the current input example and
  then backpropagating this error from layer to layer.
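The performance index above is a one-liner in code; a minimal sketch (the function name is mine) of the mean square error of one output neuron over all n examples:

```python
def mse(targets, outputs):
    """mse = (1/n) * sum_k (t(k) - a(k))^2 over the n training examples."""
    n = len(targets)
    return sum((t - a) ** 2 for t, a in zip(targets, outputs)) / n

print(mse([1, 0, 1], [0.9, 0.2, 0.6]))  # (0.01 + 0.04 + 0.16) / 3 = 0.07
```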
Backpropagation Algorithm - Intuitive Understanding

How to adjust the weights?
- For an output neuron the desired (target) output is known, so the
  adjustment is simple.
- For hidden neurons it is not that obvious!
- Intuitively: if a hidden neuron is connected to an output with a large
  error, adjust its weights a lot; otherwise don't alter the weights too
  much.
- Mathematically: the weights of a hidden neuron are adjusted in direct
  proportion to the error in the neuron to which it is connected.
Backpropagation - Derivation

Consider a neural network with one hidden layer. Indexes: i over output
neurons, j over hidden neurons, k over inputs, μ over input patterns.

mse (over all output neurons, over all patterns):

  E = (1/2) * Σ_μ Σ_i (d_i^μ - o_i^μ)^2

  d_i^μ - target output of neuron i for input pattern μ
  o_i^μ - actual output of neuron i for input pattern μ

Express E in terms of the weights and input signals (the pattern
superscript μ is dropped below for readability):

1. Input of hidden neuron j:
     net_j = Σ_k w_kj·o_k + b_j
2. Activation of hidden neuron j as a function of its input:
     o_j = f(net_j) = f(Σ_k w_kj·o_k + b_j)
Backpropagation Derivation - 2

3. Input of output neuron i:
     net_i = Σ_j w_ji·o_j + b_i = Σ_j w_ji·f(Σ_k w_kj·o_k + b_j) + b_i
4. Output of output neuron i:
     o_i = f(net_i) = f(Σ_j w_ji·f(Σ_k w_kj·o_k + b_j) + b_i)
5. Substituting 4 into E:
     E = (1/2) * Σ_μ Σ_i (d_i - f(Σ_j w_ji·f(Σ_k w_kj·o_k + b_j) + b_i))^2
6. Steepest gradient descent: adjust the weights so that the change
   moves the system down the error surface in the direction of the
   locally steepest descent, given by the negative of the gradient:
     Δw_ji = -η·∂E/∂w_ji
7. For an output neuron - calculating the derivative:
     ∂E/∂w_ji = -(d_i - o_i)·f'(net_i)·o_j = -δ_i·o_j
   where δ_i = (d_i - o_i)·f'(net_i), so
     Δw_ji = η·δ_i·o_j

Backpropagation Derivation - 3

8. For a hidden neuron - calculating the derivative using the chain rule:
     Δw_kj = -η·∂E/∂w_kj = -η·(∂E/∂o_j)·(∂o_j/∂w_kj)
           = η·Σ_i (d_i - o_i)·f'(net_i)·w_ji·f'(net_j)·o_k
           = η·δ_j·o_k
   where δ_j = f'(net_j)·Σ_i w_ji·δ_i
9. In general, for a connection from node p to node q:
     Δw_pq = η·δ_q·o_p
   where o_p is the activation of an input or hidden neuron and δ_q is
   given either by eq. 7 (output neuron) or eq. 8 (hidden neuron).

Backpropagation Derivation - 4

10. From the formulas for δ => we must be able to calculate the
    derivative of f. For a sigmoid transfer function:
      f(net_l) = o_l = 1 / (1 + e^(-net_l))
      f'(net_l) = ∂o_l/∂net_l = e^(-net_l) / (1 + e^(-net_l))^2 = o_l·(1 - o_l)
11. Backpropagation rule for the sigmoid transfer function:
    - output neuron:
        δ_i = (d_i - o_i)·o_i·(1 - o_i)
        Δw_ji = η·(d_i - o_i)·o_i·(1 - o_i)·o_j
    - hidden neuron:
        δ_j = o_j·(1 - o_j)·Σ_i w_ji·δ_i
        Δw_kj = η·δ_j·o_k = η·o_j·(1 - o_j)·(Σ_i w_ji·δ_i)·o_k

Backpropagation - Summary

Notation:
  w_pq(t)   - weight from node p to node q at time t
  w_pq(t+1) = w_pq(t) + Δw_pq
  Δw_pq = η·δ_q·o_p                     - weight change
  δ_i = (d_i - o_i)·o_i·(1 - o_i)       - for output neuron i
  δ_j = o_j·(1 - o_j)·Σ_i w_ji·δ_i      - for hidden neuron j
        (the sum is over the i nodes in the layer above node j)

Algorithm:
1. Determine the architecture:
   - how many input and output neurons; what output encoding
   - how many hidden neurons and layers
2. Initialize all weights and biases to small random values, typically
   in [-1, 1].
3. Repeat until the termination criterion is satisfied:
   - Present a training example and propagate it through the network
     (forward pass).
   - Calculate the actual output.
   - Adapt the weights starting from the output layer and working
     backwards (backward pass).
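The delta rules above (eqs. 7-11) translate directly into code. Below is a minimal sketch, assuming a fully connected network with one hidden layer, sigmoid units in both layers, and online (per-example) updates; the class and variable names are mine.

```python
import math, random

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

class OneHiddenLayerNet:
    def __init__(self, n_in, n_hid, n_out, eta=0.1, seed=0):
        rnd = random.Random(seed)
        self.eta = eta
        # w[j][k]: input k -> hidden j;  v[i][j]: hidden j -> output i
        self.w = [[rnd.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
        self.bh = [rnd.uniform(-1, 1) for _ in range(n_hid)]
        self.v = [[rnd.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]
        self.bo = [rnd.uniform(-1, 1) for _ in range(n_out)]

    def forward(self, p):
        o_hid = [sigmoid(sum(wk * pk for wk, pk in zip(wj, p)) + bj)
                 for wj, bj in zip(self.w, self.bh)]
        o_out = [sigmoid(sum(vj * oj for vj, oj in zip(vi, o_hid)) + bi)
                 for vi, bi in zip(self.v, self.bo)]
        return o_hid, o_out

    def train_one(self, p, d):
        o_hid, o_out = self.forward(p)
        # output neurons: delta_i = (d_i - o_i) * o_i * (1 - o_i)
        d_out = [(di - oi) * oi * (1 - oi) for di, oi in zip(d, o_out)]
        # hidden neurons: delta_j = o_j * (1 - o_j) * sum_i w_ji * delta_i
        d_hid = [oj * (1 - oj) * sum(self.v[i][j] * d_out[i] for i in range(len(d_out)))
                 for j, oj in enumerate(o_hid)]
        # weight change: delta_w_pq = eta * delta_q * o_p (biases use input 1)
        for i, di in enumerate(d_out):
            for j, oj in enumerate(o_hid):
                self.v[i][j] += self.eta * di * oj
            self.bo[i] += self.eta * di
        for j, dj in enumerate(d_hid):
            for k, pk in enumerate(p):
                self.w[j][k] += self.eta * dj * pk
            self.bh[j] += self.eta * dj

# usage, with the same shape as the worked example later in the lecture:
net = OneHiddenLayerNet(2, 3, 2, eta=0.1)
net.train_one([0.6, 0.1], [1, 0])
```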
Stopping Criteria

The stopping criterion is checked at the end of each epoch:
- The error (mean absolute or mean square) at the end of an epoch is
  below a threshold.
  - All training examples are propagated and the mean (absolute or
    square) error is calculated.
  - The threshold is determined heuristically, e.g. 0.3.
- The maximum number of epochs is reached.
- Early stopping using a validation set (TTS).

It typically takes hundreds or thousands of epochs for an NN to
converge. Try nnd11bc!

Inputs Encoding

How to encode the inputs for nominal attributes?
Example: a nominal attribute A with values none, some and many.
- Local encoding: use a single input neuron and an appropriate number
  of distinct values to correspond to the attribute values, e.g.
  none=0, some=0.5 and many=1.
- Distributed (binary) encoding: use one neuron for each attribute
  value, which is on or off (i.e. 1 or 0) depending on the actual
  attribute value.
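A small sketch of the two input encodings for the nominal attribute A with values none/some/many; the mapping tables and function names are mine.

```python
# Local encoding: one input neuron, a distinct value per attribute value.
LOCAL = {"none": 0.0, "some": 0.5, "many": 1.0}

def encode_local(value):
    return [LOCAL[value]]

# Distributed (binary) encoding: one neuron per attribute value, on or off.
VALUES = ["none", "some", "many"]

def encode_distributed(value):
    return [1 if v == value else 0 for v in VALUES]

print(encode_local("some"))        # [0.5]
print(encode_distributed("some"))  # [0, 1, 0]
```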
Output Encoding

How to encode the outputs and represent the targets?
- Local encoding:
  - 1 output neuron
  - different output values represent different classes, e.g. <0.2 -
    class 1, >0.8 - class 2, in between - ambiguous class (class 3)
- Distributed (binary, 1-of-n) encoding - typically used in multi-class
  problems:
  - number of outputs = number of classes
  - Example: 3 classes, 3 output neurons; class 1 is represented as
    1 0 0, class 2 as 0 1 0 and class 3 as 0 0 1.
  - Another representation of the targets: use 0.1 instead of 0 and 0.9
    instead of 1.
- Motivation for choosing distributed over local encoding:
  - it provides more degrees of freedom to represent the target
    function (n times as many weights available)
  - the difference between the output with the highest value and the
    second highest can be used as a measure of how confident the
    prediction is (close values => ambiguous classification)

How to Determine if an Example is Correctly Classified?

- Binary (1-of-n) encoding: apply each example and get the resulting
  activations of the output neurons; the example belongs to the class
  corresponding to the output neuron with the highest activation.
  - Example: 3 classes; the outputs for ex. X are 0.3, 0.7, 0.5 =>
    ex. X belongs to class 2.
- Accuracy may be used to evaluate performance once training has
  finished, or as a stopping criterion checked at the end of each epoch.
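A sketch of 1-of-n target encoding and of the classification decision by highest activation, with the gap between the two highest activations as a confidence measure; the helper names are mine.

```python
def one_of_n_target(class_index, n_classes, low=0.1, high=0.9):
    """1-of-n target vector, optionally using 0.1/0.9 instead of 0/1."""
    return [high if i == class_index else low for i in range(n_classes)]

def classify(outputs):
    """Predicted class = output neuron with the highest activation;
    confidence = gap between the two highest activations."""
    order = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    confidence = outputs[order[0]] - outputs[order[1]]
    return order[0], confidence

print(one_of_n_target(1, 3))      # [0.1, 0.9, 0.1]
print(classify([0.3, 0.7, 0.5]))  # index 1 (i.e. class 2), gap ~0.2
```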
Backpropagation - Example

- 2 classes, 2-dimensional input data
- Training set:
    ex.1: 0.6 0.1 | class 1 (banana)
    ex.2: 0.2 0.3 | class 2 (orange)
- Network architecture:
  - How many inputs? 2 (the input data is 2-dimensional).
  - How many hidden neurons? Heuristic: n = (inputs + output_neurons)/2;
    here 3 hidden neurons (3, 4 and 5) are used.
  - How many output neurons? What encoding of the outputs?
    2 output neurons: 1 0 for class 1, 0 1 for class 2.
  - Initial weights and learning rate: let η = 0.1 and the weights be
    set as in the picture (figure: input neurons 1 and 2, hidden
    neurons 3, 4 and 5, output neurons 6 and 7, with initial weights
    w13=0.1, w23=-0.2, b3=0.1; w14=0, w24=0.2, b4=0.2; w15=0.3,
    w25=-0.4, b5=0.5; w36=-0.4, w46=0.1, w56=0.6, b6=-0.1; w37=0.2,
    w47=-0.1, w57=-0.2, b7=0.6).

Backpropagation Example (cont. 1)

1. Forward pass for ex. 1 - calculate the outputs o6 and o7.
   o1=0.6, o2=0.1, target output 1 0, i.e. class 1.

   Activations of the hidden units:
   net3 = o1*w13 + o2*w23 + b3 = 0.6*0.1 + 0.1*(-0.2) + 0.1 = 0.14
   o3 = 1/(1+e^(-net3)) = 0.53
   net4 = o1*w14 + o2*w24 + b4 = 0.6*0 + 0.1*0.2 + 0.2 = 0.22
   o4 = 1/(1+e^(-net4)) = 0.55
   net5 = o1*w15 + o2*w25 + b5 = 0.6*0.3 + 0.1*(-0.4) + 0.5 = 0.64
   o5 = 1/(1+e^(-net5)) = 0.65

   Activations of the output units:
   net6 = o3*w36 + o4*w46 + o5*w56 + b6
        = 0.53*(-0.4) + 0.55*0.1 + 0.65*0.6 - 0.1 = 0.13
   o6 = 1/(1+e^(-net6)) = 0.53
   net7 = o3*w37 + o4*w47 + o5*w57 + b7
        = 0.53*0.2 + 0.55*(-0.1) + 0.65*(-0.2) + 0.6 = 0.52
   o7 = 1/(1+e^(-net7)) = 0.63
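The forward pass above can be checked with a few lines of Python; a sketch, with the weight values taken from the figure.

```python
import math

def f(n):  # sigmoid transfer function
    return 1.0 / (1.0 + math.exp(-n))

o1, o2 = 0.6, 0.1                             # example 1
# hidden units 3, 4, 5
o3 = f(o1*0.1 + o2*(-0.2) + 0.1)              # net3 = 0.14 -> o3 ~ 0.53
o4 = f(o1*0.0 + o2*0.2 + 0.2)                 # net4 = 0.22 -> o4 ~ 0.55
o5 = f(o1*0.3 + o2*(-0.4) + 0.5)              # net5 = 0.64 -> o5 ~ 0.65
# output units 6, 7
o6 = f(o3*(-0.4) + o4*0.1 + o5*0.6 - 0.1)     # net6 ~ 0.13 -> o6 ~ 0.53
o7 = f(o3*0.2 + o4*(-0.1) + o5*(-0.2) + 0.6)  # net7 ~ 0.52 -> o7 ~ 0.63
print(round(o3, 2), round(o4, 2), round(o5, 2), round(o6, 2), round(o7, 2))
```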
Backpropagation Example (cont. 2)

2. Backward pass for ex. 1.

   Calculate the errors of the output units 6 and 7 (note that d6=1,
   d7=0 for class 1):
   δ6 = (d6-o6)*o6*(1-o6) = (1-0.53)*0.53*(1-0.53) = 0.12
   δ7 = (d7-o7)*o7*(1-o7) = (0-0.63)*0.63*(1-0.63) = -0.15

   Calculate the errors of the hidden units 3, 4 and 5:
   δ3 = o3*(1-o3)*(w36*δ6 + w37*δ7)
      = 0.53*(1-0.53)*((-0.4)*0.12 + 0.2*(-0.15)) = -0.019
   Similarly for δ4 and δ5.

   Calculate the new weights between the hidden and output units (η=0.1):
   Δw36 = η*δ6*o3 = 0.1*0.12*0.53 = 0.006
   w36new = w36old + Δw36 = -0.4 + 0.006 = -0.394
   Δw37 = η*δ7*o3 = 0.1*(-0.15)*0.53 = -0.008
   w37new = w37old + Δw37 = 0.2 - 0.008 = 0.192
   Similarly for w46new, w47new, w56new and w57new.
   For the biases b6 and b7 (remember: biases are weights with input 1):
   Δb6 = η*δ6*1 = 0.1*0.12 = 0.012
   b6new = b6old + Δb6 = -0.1 + 0.012 = -0.088
   Similarly for b7.

Backpropagation Example (cont. 3)

   Calculate the new weights between the input and hidden units (η=0.1):
   Δw13 = η*δ3*o1 = 0.1*(-0.019)*0.6 = -0.0011
   w13new = w13old + Δw13 = 0.1 - 0.0011 = 0.0989
   Similarly for w23new, w14new, w24new, w15new and w25new; b3, b4 and b5.

3. Repeat the same procedure for the other training examples:
   - forward pass for ex. 2, backward pass for ex. 2
   - forward pass for ex. 3, backward pass for ex. 3, ...
   - Note: it is better to apply the input examples in random order.
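And the backward pass for example 1 in code, starting from the (rounded) activations of the forward pass above; a sketch with variable names of my own.

```python
eta = 0.1
o1, o3, o6, o7 = 0.6, 0.53, 0.53, 0.63    # rounded activations from the forward pass
d6, d7 = 1, 0                             # target 1 0 (class 1)
w36, w37, w13, b6 = -0.4, 0.2, 0.1, -0.1  # current weights and bias

# output deltas: delta_i = (d_i - o_i) * o_i * (1 - o_i)
delta6 = (d6 - o6) * o6 * (1 - o6)                       # ~ 0.12
delta7 = (d7 - o7) * o7 * (1 - o7)                       # ~ -0.15
# hidden delta: delta_j = o_j * (1 - o_j) * sum_i w_ji * delta_i
delta3 = o3 * (1 - o3) * (w36 * delta6 + w37 * delta7)   # ~ -0.019

# weight and bias updates: delta_w = eta * delta * o (bias input is 1)
w36 += eta * delta6 * o3   # -0.4  + 0.006  ~ -0.394
w37 += eta * delta7 * o3   #  0.2  - 0.008  ~  0.192
w13 += eta * delta3 * o1   #  0.1  - 0.0011 ~  0.0989
b6  += eta * delta6 * 1    # -0.1  + 0.012  ~ -0.088
print(round(delta6, 3), round(delta7, 3), round(delta3, 3), round(w36, 3))
```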
Backpropagation Example (cont. 4)

4. At the end of the epoch, check if the stopping criterion is
   satisfied:
   - if yes: stop training
   - if not: continue training (epoch++, go to step 1)

Steepest Gradient Descent

- Not optimal: it is guaranteed to find a minimum, but it might be a
  local minimum!
- Backpropagation's error space has many local minima and 1 global
  minimum => the generalized gradient descent may not find the global
  minimum.
Error Surface - Example

(figure: squared error as a function of w1(1,1) and w2(1,1))
Try nnd12sd1!

Convergence for the Example

- Path a converges to the optimum solution but is very slow.
- Path b gets trapped in a local minimum.

Convergence Case a

- The algorithm is slow to converge, as there are flat surfaces along
  the path.
- A typical learning curve (squared error vs. iteration): long periods
  of little progress, short periods of rapid advance.
- What can we do?

Convergence Case b

- The algorithm converges to a local minimum.
- The trajectory is trapped in a valley and diverges from the optimal
  solution.
- What can be done? Try different initializations.
Speeding up the Convergence

Solution 1: Increase the learning rate.
- Faster on the flat part, but unstable when falling into the steep
  valley that contains the minimum point: overshooting the minimum.
- Try nnd12sd2!

Solution 2: Smooth out the trajectory by averaging the updates to the
parameters.
- The use of momentum might smooth out the oscillations and produce a
  stable trajectory.

Backpropagation with Momentum

The modified learning rule (α - momentum coefficient):

  without momentum:  w_pq(t+1) = w_pq(t) + Δw_pq
  with momentum:     w_pq(t+1) = w_pq(t) + α·(w_pq(t) - w_pq(t-1)) + Δw_pq

where
  Δw_pq = η·δ_q·o_p                 - weight change
  δ_i = (d_i - o_i)·o_i·(1 - o_i)   - for output neuron i
  δ_j = o_j·(1 - o_j)·Σ_i w_ji·δ_i  - for hidden neuron j
        (the sum is over the i nodes in the layer above node j)

- Typical values for the momentum coefficient: 0.6 - 0.9.
- The theory behind momentum comes from linear filters.
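A sketch of the momentum update as written above, keeping the previous weight value for each connection; the function signature and data layout are my own framing of the rule.

```python
def momentum_step(w, w_prev, delta_w, alpha=0.8):
    """w_pq(t+1) = w_pq(t) + alpha*(w_pq(t) - w_pq(t-1)) + delta_w_pq.
    Returns (new weight, value to remember as the previous weight)."""
    w_new = w + alpha * (w - w_prev) + delta_w
    return w_new, w

# example: one weight over three updates with the same raw change delta_w;
# the effective step grows while moving in a consistent direction
w, w_prev = 0.5, 0.5
for delta_w in [0.01, 0.01, 0.01]:
    w, w_prev = momentum_step(w, w_prev, delta_w, alpha=0.8)
    print(round(w, 4))   # 0.51, 0.528, 0.5524
```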
First Order Linear Filter

Observation: convergence might be improved by smoothing out the
trajectory by averaging the updates to the parameters.

First-order filter:
  y(k) = γ·y(k-1) + (1 - γ)·w(k),   0 <= γ < 1
  w(k) - input, y(k) - output, γ - momentum coefficient

Example input: w(k) = 1 + sin(2πk/16)

- The oscillation in the filter output y(k) is less than the
  oscillation in the filter input w(k).
- As the momentum coefficient increases, the oscillation in the output
  is reduced.
- The average filter output is the same as the average filter input,
  although as the momentum increases the filter output is slower to
  respond.
=> The filter tends to reduce the amount of oscillation while still
   tracking the average value.
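The filter itself is a couple of lines of code. The sketch below runs it on the example input w(k) = 1 + sin(2πk/16) for two momentum coefficients; the helper name is mine, and the π in the input formula is my reconstruction of the garbled expression.

```python
import math

def first_order_filter(w_seq, gamma):
    """y(k) = gamma*y(k-1) + (1 - gamma)*w(k), with y(0) = w(0)."""
    y = [w_seq[0]]
    for w in w_seq[1:]:
        y.append(gamma * y[-1] + (1 - gamma) * w)
    return y

w_seq = [1 + math.sin(2 * math.pi * k / 16) for k in range(64)]
for gamma in (0.6, 0.9):
    y = first_order_filter(w_seq, gamma)
    spread = max(y) - min(y)   # smaller spread = less oscillation
    print(gamma, round(spread, 3), round(sum(y) / len(y), 3))
```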
Backpropagation with Momentum - Example

Example - the same learning rate and initial position:
(figure: squared error vs. iteration, without momentum and with
momentum 0.8)
- Smooth and faster convergence
- Stable algorithm
- Try nnd12mo!

Backpropagation with Momentum (cont.)

- By the use of momentum we can use a larger learning rate while
  maintaining the stability of the algorithm.
- Momentum also tends to accelerate convergence when the trajectory is
  moving in a consistent direction.
More on the Learning Rate

- Constant throughout training (standard steepest descent):
  - The performance is very sensitive to the proper setting of the
    learning rate:
    - too small - slow convergence
    - too big - oscillation, overshooting of the minimum
  - It is not possible to determine the optimum learning rate before
    training, as it changes during training and depends on the error
    surface.
- Variable learning rate:
  - Goal: keep the learning rate as large as possible while keeping
    learning stable.
  - Several algorithms have been proposed.

Variable Learning Rate

One variable learning rate algorithm (a code sketch follows below):
- Update the weights and calculate the squared error (over the entire
  training set).
- If the error increases by more than a predefined percentage ξ
  (ξ = 5% typically):
  - the weight update is discarded
  - the learning rate is decreased by some factor ρ (0 < ρ < 1)
  - the momentum is set to 0
- If the error increases by less than ξ:
  - the weight update is accepted
  - the learning rate is unchanged
  - if the momentum has been set to 0, it is reset to its original value
- If the error decreases:
  - the weight update is accepted
  - the learning rate is increased by some factor η > 1
  - the momentum is unchanged
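A sketch of one epoch-level decision step following the rule above; the function signature and the trial-update mechanics are my own framing, with xi as the percentage threshold and rho < 1 and eta_up > 1 as example decrease/increase factors.

```python
def variable_lr_step(error_old, error_new, lr, momentum, momentum0,
                     xi=0.05, rho=0.7, eta_up=1.05):
    """Decide whether to keep a trial weight update and how to adapt the
    learning rate, given the squared error before and after the update.
    Returns (accept_update, new_lr, new_momentum)."""
    if error_new > error_old * (1 + xi):
        # error grew by more than xi: discard, shrink lr, switch momentum off
        return False, lr * rho, 0.0
    if error_new > error_old:
        # error grew, but by less than xi: accept, keep lr, restore momentum
        return True, lr, momentum if momentum != 0 else momentum0
    # error decreased: accept the update and grow the learning rate
    return True, lr * eta_up, momentum

print(variable_lr_step(0.40, 0.45, lr=0.1, momentum=0.8, momentum0=0.8))  # discarded
print(variable_lr_step(0.40, 0.41, lr=0.1, momentum=0.0, momentum0=0.8))  # accepted, momentum restored
print(variable_lr_step(0.40, 0.35, lr=0.1, momentum=0.8, momentum0=0.8))  # accepted, lr grows
```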
Variable Learning Rate - Example

(figure: squared error vs. iteration with a constant learning rate and
with a variable learning rate)
Try nnd12vl!

Universality of Backpropagation

- Boolean functions:
  - Every boolean function can be represented by a network with a
    single hidden layer.
- Continuous functions - universal approximation theorems:
  - Any bounded continuous function can be approximated with
    arbitrarily small error by a network with one hidden layer
    (Cybenko 1989, Hornik et al. 1989).
  - Any function can be approximated to arbitrarily small error by a
    network with two hidden layers (Cybenko 1988).
- These are existence theorems: they say the solution exists, but they
  don't say how to choose the number of hidden neurons!
Choice of Network Architecture

- The task constrains the number of input and output units, but not the
  number of hidden layers and the neurons in them:
  - too many free parameters (weights) - overtraining
  - too few - the network is not able to learn the input-output mapping
- A heuristic to start with: 1 hidden layer with n hidden neurons,
  n = (inputs + output_neurons)/2.
- An art! Typically chosen by trial and error.

Choice of Network Architecture - Example

How many hidden neurons? (1 input, 1 output)
Performance with a different number of hidden units when approximating
f(x) = 1 + sin(6πx/4):
- Unless there are at least 5 hidden neurons, the NN cannot represent
  the function.
- => backpropagation produces a network which minimizes the error, but
  the capabilities of the NN are limited by the number of hidden
  neurons.
- Try nnd11fa!
Generalization

- Supervised learning: training with a finite number of examples of
  proper behaviour: {p1,t1}, {p2,t2}, ..., {pn,tn}.
- Based on them, the network should be able to generalize what it has
  learned to the total population of examples.
- Overtraining (overfitting): the error on the training set is very
  small, but when new data is presented to the network the error is
  high => the network has memorized the training examples but has not
  learned to generalize to new situations!

When Does Overfitting Occur?

- The training examples are noisy.
- The number of free parameters is bigger than the number of training
  examples.

When Does Overfitting Occur? (cont.)

Example: f(x) = 1 + sin(6πx/4) was sampled to create 11 training
examples (x - training set, o - testing set):
- 7 free parameters: a good fit to the noisy data
- 25 free parameters: overfitting
Preventing Overtraining

- Use a network that is just large enough to provide an adequate fit:
  - Ockham's Razor: don't use a bigger network when a smaller one will
    work.
  - The network should not have more free parameters than there are
    training examples!
- However, it is difficult to know beforehand how large a network
  should be for a specific application!
- Try nnd11gn!

Preventing Overtraining - Validation Set Approach

- The available data is divided into 3 subsets:
  - Training set (TS): used for computing the gradient and updating the
    weights.
  - Training test set (TTS), also called validation set.
  - Testing set: not used during training, but to compare different
    algorithms once training has completed.
- Use an early stopping method:
  - The error on the TTS is monitored during training.
  - This error will normally decrease during the initial phase of
    training (as does the training error).
  - However, when the network begins to overfit the data, the error on
    the TTS will typically begin to rise.
  - Training is stopped when the error on the TTS increases for a
    pre-specified number of iterations, and the weights and biases at
    the minimum of the TTS error are returned.

Preventing Overtraining - Cross Validation Approach

- Problems with the validation set approach - small data sets:
  - Not enough data may be available to provide a validation set.
  - Overfitting is most severe for small data sets.
- K-fold cross validation may be used:
  - Perform k-fold cross validation.
  - Each time, determine the number of epochs ep that results in the
    best performance on the respective test partition.
  - Calculate the mean of ep, ep_mean.
  - Final run: train the network on all examples for ep_mean epochs.
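A sketch of the early-stopping loop with a validation (TTS) set. The helpers train_one_epoch and error are assumed to be provided by the surrounding code, and patience stands for the pre-specified number of iterations the TTS error may rise before training stops; all names are mine.

```python
import copy

def train_with_early_stopping(net, train_set, validation_set, train_one_epoch,
                              error, max_epochs=2000, patience=10):
    """Stop when the validation (TTS) error has not improved for `patience`
    epochs; return the weights recorded at the minimum TTS error.
    `train_one_epoch(net, data)` and `error(net, data)` are assumed helpers."""
    best_err = float("inf")
    best_net = copy.deepcopy(net)
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch(net, train_set)       # forward + backward passes
        val_err = error(net, validation_set)  # mean (absolute or square) error
        if val_err < best_err:
            best_err, best_net = val_err, copy.deepcopy(net)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:  # TTS error kept rising: stop
                break
    return best_net, best_err
```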
Limitations and Capabilities

- MLPs trained with backpropagation can perform function approximation
  and pattern classification.
- Theoretically they can:
  - perform any linear and non-linear computation
  - approximate any reasonable function arbitrarily well
  - => they are able to overcome the limitations of perceptrons and
    ADALINEs
- In practice:
  - they may not always find a solution - they can be trapped in a
    local minimum
  - their performance is sensitive to the starting conditions
    (initialization of the weights)
  - they are sensitive to the number of hidden layers and neurons:
    - too few neurons - underfitting, unable to learn what you want it
      to learn
    - too many - overfitting, learns slowly
  - => the architecture of an MLP network is not completely constrained
    by the problem to be solved, as the number of hidden layers and
    neurons is left to the designer

Limitations and Capabilities (cont.)

- Sensitive to the value of the learning rate:
  - too small - slow learning
  - too big - instability or poor performance
- The proper choices depend on the nature of the examples:
  - trial and error
  - refer to the choices that have worked well in similar problems
- => successful application of NNs requires time and experience:
  "NN training is an art. NN experts are artists; they are not mere
  handbook users" - P.H. Winston
Backpropagation Algorithm - Summary

- Backpropagation uses an approximate steepest descent algorithm for
  minimizing the mean square error.
- Standard gradient descent is slow, as it requires a small learning
  rate for stable learning.
- Gradient descent with momentum is faster, as it allows a higher
  learning rate while maintaining stability.
- There are several variations of the backpropagation algorithm.

When to Consider NNs?

- The input is high-dimensional discrete or continuous data.
- The output is discrete or continuous.
- The output is a vector of values.
- The data might be noisy.
- Long training times are acceptable.
- A fast reply is needed.
- The form of the target function is unknown (but there are examples
  available).
- Explaining the result to humans is not important (NNs are like black
  boxes).
See the following examples.
Some Interesting NN Applications

- A few examples of the many significant applications of NNs.
- Network design was the result of several months of trial and error
  experimentation.
- Moral: NNs are widely applicable, but they cannot magically solve
  problems; wrong choices lead to poor performance.
- "NNs are the second best way of doing just about anything" - John
  Denker
  - NNs provide passable performance on many tasks that would be
    difficult to solve explicitly with other techniques.

NETtalk

Sejnowski and Rosenberg, 1987
- Pronunciation of written English:
  - a fascinating problem in linguistics
  - a task with high commercial profit
- How?
  - Map the text stream to phonemes, e.g. cat -> [k], century -> [s].
  - Pass the phonemes to a speech generator.
- Task for the NN: learning to map the text to phonemes.
  - A good task for a NN, as most of the rules are approximately
    correct.
NETtalk - Architecture

- 203 input neurons: 7 (a sliding window: the character to be
  pronounced and the 3 characters before and after it) x 29 possible
  characters (26 letters + blank, period, other punctuation).
- 80 hidden neurons.
- 26 output neurons, corresponding to the phonemes.

NETtalk - Performance

- Training set:
  - 1024 words hand-transcribed into phonemes.
  - Accuracy on the training set: 90% after 50 epochs. Why not 100%?
  - A few dozen hours of training time + a few months of
    experimentation with different architectures.
- Testing: accuracy 78%.
- Importance:
  - A good showpiece for the philosophy of NNs.
  - The network appears to mimic the speech patterns of young children:
    incorrect babble at first (as the weights are random), then
    gradually improving to become understandable.
Handwritten Character Recognition

Le Cun et al., 1989
- Reads zip codes on hand-addressed envelopes.
- Task for the NN:
  - A preprocessor is used to recognize the segments in the individual
    digits.
  - Based on the segments, the network has to identify the digits.
- Network architecture:
  - 256 input neurons: a 16x16 array of pixels.
  - 3 hidden layers with 768, 192 and 30 neurons respectively.
  - 10 output neurons: the digits 0-9.
  - Not a fully connected network:
    - a fully connected network would have about 200 000 connections
      (impossible to train); instead there are only 9760 connections
    - units in the hidden layers act as feature detectors, e.g. each
      unit in the 1st hidden layer is connected to 25 input neurons
      (a 5x5 pixel region)

Handwritten Character Recognition (cont.)

- Training: 7300 examples; testing: 2000 examples.
- Accuracy: 99%.
- Hardware implementation (in VLSI) enables letters (zip codes) to be
  sorted at high speed.
- One of the largest applications of NNs.
Driving Motor Vehicles - ALVINN

ALVINN (Autonomous Land Vehicle In a Neural Network), Pomerleau, 1993
- Learns to drive a van along a single lane on a highway.
- Once trained on a particular road, ALVINN can drive at speeds greater
  than 40 miles per hour.
- Vehicles: a Chevy van and a US Army HMMWV personnel carrier
  - computer-controlled steering, acceleration and braking
  - sensors: colour stereo video camera, radar, positioning system,
    scanning laser range finders

ALVINN - Architecture

- Fully connected backpropagation NN with 1 hidden layer:
  - 960 input neurons: the signal from the camera is preprocessed to
    yield a 30x32 image intensity grid
  - 5 hidden neurons
  - 32 output neurons, corresponding to steering directions
- If the output node with the highest activation is:
  - the left-most, then ALVINN turns sharply left
  - the right-most, then ALVINN turns sharply right
  - a node in between, then ALVINN directs the van in a proportionally
    intermediate direction
- Smoothing: the direction is calculated as the average suggested not
  only by the output node with the highest activation, but also by that
  node's immediate neighbours.
ALVINN - Training

- Training examples: image-direction pairs.
  - Recorded while a human drives the vehicle.
  - After collecting 5 minutes of such data and 10 minutes of training,
    ALVINN can drive on its own.
- Potential problem: as the human driver is too good and (typically)
  does not stray from the lane, there are no training examples that
  show how to recover when the vehicle is misaligned with the road.
- Solution: ALVINN corrects this by creating synthetic training
  examples - it rotates each video image to create additional views of
  what the road would look like if the van were a little off course to
  the left or right.

ALVINN - Results

- Impressive results:
  - ALVINN has driven at speeds up to 70 miles per hour for up to 90
    miles on public highways near Pittsburgh.
  - Also at normal speeds on single-lane dirt roads, paved bike paths,
    and two-lane suburban streets.
- Limitations:
  - Unable to drive on a road type for which it hasn't been trained.
  - Not very robust to changes in lighting conditions and the presence
    of other vehicles.
- Comparison with traditional vision algorithms:
  - They use image processing to analyse the scene, find the road and
    then follow it.
  - Most of them achieve 3-4 miles per hour.

ALVINN - Discussion

Why is ALVINN so successful?
- Fast computation: once trained, the NN is able to compute a new
  steering direction 10 times a second => the computed direction can be
  off by 10% from the ideal, as long as the system is able to make a
  correction in a few tenths of a second.
- Learning from examples is very appropriate:
  - There is no good theory of driving, but it is easy to collect
    examples, which motivated the use of a learning algorithm (though
    not necessarily NNs).
  - Driving is a continuous, noisy domain in which almost all features
    contribute some information => NNs are a better choice than some
    other learning algorithms (e.g. decision trees).