UNIT III
MULTILAYER PERCEPTRON –
Back propagation algorithm
XOR problem, Heuristics,
Output representation and decision rule,
Computer experiment,
Feature detection.
BACK PROPAGATION -
Back propagation and differentiation,
Hessian matrix,
Generalization,
Cross validation,
Network pruning Techniques,
Virtues and limitations of back propagation learning,
Accelerated convergence,
Supervised learning.
A multilayer perceptron is a neural network with one or more hidden layers.
Rosenblatt’s perceptron is basically a single-layer neural network; this network is limited to the classification of linearly separable patterns.
Adaptive filtering uses Widrow and Hoff’s LMS algorithm. This algorithm is also based on a single linear neuron with adjustable weights, which limits the computing power of the algorithm.
To overcome the practical limitations of the perceptron and the LMS algorithm, we look to a neural network structure known as the multilayer perceptron.
The following three points highlight the basic features of multilayer perceptrons:
• The model of each neuron in the network includes a nonlinear activation function that is differentiable (a sketch of one such activation follows this list).
• The network contains one or more layers that are hidden from both the input and output nodes.
• The network exhibits a high degree of connectivity, the extent of which is determined by the synaptic weights of the network.
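As an illustration of the first point, here is a minimal sketch of a differentiable nonlinear activation. The logistic (sigmoid) function is used purely as an example; the text does not fix a particular choice:

```python
import numpy as np

def sigmoid(v):
    """Logistic activation: nonlinear and differentiable for all v."""
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_prime(y):
    """Derivative of the logistic function, written in terms of its output y = sigmoid(v)."""
    return y * (1.0 - y)
```

The y(1 - y) form of the derivative is what reappears later in the error terms δk and δj of the backpropagation equations.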
A popular method for the training of multilayer perceptrons is the back-propagation
algorithm, which includes the LMS algorithm as a special case. The training proceeds in
two phases:
1) In the forward phase, the synaptic weights of the network are fixed and the input signal is
propagated through the network, layer by layer, until it reaches the output. Thus, in
this phase, changes are confined to the activation potentials and outputs of the
neurons in the network.
2) In the backward phase, an error signal is produced by comparing the output of the
network with a desired response. The resulting error signal is propagated through the
network, again layer by layer, but this time the propagation is performed in the
backward direction. In this second phase, successive adjustments are made to the
synaptic weights of the network. Calculation of the adjustments for the output layer is
straightforward, but it is much more challenging for the hidden layers.
Multi-layer Perceptron neural architecture
• In a typical MLP network, the input units (Xi) are fully connected to all hidden layer units (Yj), and the hidden layer units are fully connected to all output layer units (Zk).
• Each of the connections between the input-to-hidden and hidden-to-output layer units has an associated weight attached to it (Wij or Wjk).
• The hidden and output layer units also derive their bias values (bj or bk) from weighted connections to units whose outputs are always 1 (true neurons); a sketch of this layout follows.
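A minimal sketch of that layout in code; the layer sizes and the initialisation range are illustrative assumptions, and the biases bj, bk are kept as separate vectors, which is equivalent to weighted connections from true neurons whose output is fixed at 1:

```python
import numpy as np

n_in, n_hidden, n_out = 4, 3, 2                     # illustrative layer sizes

rng = np.random.default_rng(0)
W_ij = rng.uniform(-0.5, 0.5, (n_in, n_hidden))     # input -> hidden weights Wij
W_jk = rng.uniform(-0.5, 0.5, (n_hidden, n_out))    # hidden -> output weights Wjk
b_j = rng.uniform(-0.5, 0.5, n_hidden)              # hidden biases bj ("true neuron" output = 1)
b_k = rng.uniform(-0.5, 0.5, n_out)                 # output biases bk ("true neuron" output = 1)
```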
Architectural graph of a multilayer perceptron with two hidden layers and an output layer.
The network shown here is fully connected. This means that a neuron in any layer of the
network is connected to all the neurons (nodes) in the previous layer.
Signal flow through the network progresses in a forward direction, from left to right and on
a layer-by-layer basis.
Two kinds of signals are identified in this network:
1. Function Signals. A function signal is an input signal (stimulus) that comes in at the
input end of the network, propagates forward (neuron by neuron) through the network,
and emerges at the output end of the network as an output signal.
2. Error Signals. An error signal originates at an output neuron of the network and propagates backward, layer by layer, through the network.
MLP training algorithm
A Multi-Layer Perceptron (MLP) neural network trained using
the Backpropagation learning algorithm is one of the most
powerful forms of supervised neural network system.
The training of such a network involves three stages:
• feedforward of the input training pattern,
• calculation and backpropagation of the associated error
• adjustment of the weights
This procedure is repeated for each pattern over several
complete passes (epochs) through the training set.
After training, application of the net only involves the
computations of the feedforward phase.
Fig. 4.3 depicts neuron j being fed by a set of function signals produced by a layer of neurons to its left.
Backpropagation Learning Algorithm
Feed Forward phase:
• Xi = input[i]
• Yj = f( bj + Σi Xi·Wij )
• Zk = f( bk + Σj Yj·Wjk )
Backpropagation of errors:
• δk = Zk[1 - Zk](dk - Zk)
• δj = Yj[1 - Yj] Σk δk·Wjk
Weight updating (η = learning rate, α = momentum term; a code sketch of one full training step follows these equations):
• Wjk(t+1) = Wjk(t) + η·δk·Yj + α[Wjk(t) - Wjk(t - 1)]
• bk(t+1) = bk(t) + η·δk·Ytn + α[bk(t) - bk(t - 1)]
• Wij(t+1) = Wij(t) + η·δj·Xi + α[Wij(t) - Wij(t - 1)]
• bj(t+1) = bj(t) + η·δj·Xtn + α[bj(t) - bj(t - 1)]
(Ytn and Xtn are the outputs of the true neurons and are always 1.)
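A minimal NumPy sketch of one per-pattern training step under these equations. The function and variable names, the default values of η and α, and the use of the logistic function for f are illustrative assumptions rather than a prescribed implementation; the previous weight changes are passed in so that the momentum term α[W(t) - W(t - 1)] can be applied:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_step(X, d, W_ij, b_j, W_jk, b_k, prev, eta=0.2, alpha=0.9):
    """One feedforward / backpropagation / weight-update pass for a single pattern X
    with target d. `prev` holds the previous weight changes (for the momentum term)."""
    dW_ij_prev, db_j_prev, dW_jk_prev, db_k_prev = prev

    # Feedforward phase
    Y = sigmoid(b_j + X @ W_ij)                    # hidden outputs Yj
    Z = sigmoid(b_k + Y @ W_jk)                    # output outputs Zk

    # Backpropagation of errors
    delta_k = Z * (1 - Z) * (d - Z)                # δk
    delta_j = Y * (1 - Y) * (W_jk @ delta_k)       # δj = Yj(1 - Yj) Σk δk·Wjk

    # Weight updating with momentum; the true-neuron output multiplying the bias terms is 1
    dW_jk = eta * np.outer(Y, delta_k) + alpha * dW_jk_prev
    db_k = eta * delta_k + alpha * db_k_prev
    dW_ij = eta * np.outer(X, delta_j) + alpha * dW_ij_prev
    db_j = eta * delta_j + alpha * db_j_prev

    W_jk += dW_jk; b_k += db_k; W_ij += dW_ij; b_j += db_j
    return Z, (dW_ij, db_j, dW_jk, db_k)           # pass the change tuple back in next call
```

The returned tuple of weight changes is supplied as `prev` on the next call, so each step's update carries the momentum of the previous one.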
Test stopping condition
After each epoch of training the Root Mean Square error of the
network for all of the patterns in a separate validation set is
calculated.
ERMS = √( Σn Σk (dk - Zk)² / (n·k) )
where:
• n is the number of patterns in the set
• k is the number of neuron units in the output layer
Training is terminated when the ERMS value for the validation set
either starts to increase or remains constant over several
epochs.
This prevents the network from being overtrained (i.e.
memorising the training set) and ensures that the ability of the
network to generalise (i.e. correctly classify non-trained
patterns) will be at its maximum.
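A sketch of this validation check, assuming the network outputs for every validation pattern have already been collected into an array; the tolerance and the number of epochs used to decide that the error "remains constant" are illustrative choices:

```python
import numpy as np

def rms_error(d, Z):
    """E_RMS over n patterns and k output units; d and Z have shape (n, k)."""
    d, Z = np.asarray(d), np.asarray(Z)
    n, k = d.shape
    return np.sqrt(np.sum((d - Z) ** 2) / (n * k))

def should_stop(val_errors, patience=5, tol=1e-4):
    """Stop when validation E_RMS has not improved (beyond tol) for `patience` epochs."""
    if len(val_errors) <= patience:
        return False
    best_before = min(val_errors[:-patience])
    return min(val_errors[-patience:]) >= best_before - tol
```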
Factors affecting network performance
Number of hidden nodes:
• Too many, and the network may memorise the training set
• Too few, and the network may not learn the training set
Initial weight set:
• some starting weight sets may lead to a local minimum
• other starting weight sets avoid the local minimum.
Training set:
• must be statistically relevant
• patterns should be presented in random order
Data representation:
• Low level - very large training set might be required
• High level – human expertise required
MLP as classifiers
MLP classifiers are used in a wide range of domains from
engineering to medical diagnosis. A classic example of use
is as an Optical Character Recogniser.
A simple example would be a 35-8-26 MLP network. This network could learn to map input patterns, corresponding to the 5x7 matrix representations of the capital letters A - Z, to 1 of 26 output patterns.
After training, this network then classifies ‘noisy’ input
patterns to the correct output pattern that the network
was trained to produce.
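For illustration, a forward pass through a 35-8-26 network of this kind might look as follows; the weights here are random placeholders standing in for trained values, and taking the argmax over the 26 outputs is one common decision rule:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
W_ij, b_j = rng.uniform(-0.5, 0.5, (35, 8)), np.zeros(8)    # 35 inputs -> 8 hidden units
W_jk, b_k = rng.uniform(-0.5, 0.5, (8, 26)), np.zeros(26)   # 8 hidden -> 26 outputs (A-Z)

x = rng.integers(0, 2, 35).astype(float)                    # one (noisy) 5x7 binary pattern
Z = sigmoid(b_k + sigmoid(b_j + x @ W_ij) @ W_jk)           # feedforward phase only
print("classified as:", chr(ord('A') + int(np.argmax(Z))))
```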