ANN Module 1

Dr. Praveen BLESSINGTON T
Professor
Email: praveentblessington@gmail.com
Mobile: 9730562120
 Artificial neural networks (ANNs) are
biologically inspired computer programs
designed to simulate the way in which the
human brain processes information.
 An ANN is formed from hundreds of single
units, artificial neurons or processing
elements, connected with coefficients
(weights), which constitute the neural
structure and are organized in layers.
 The power of neural computations
comes from connecting neurons in
a network.
 The behavior of a neural network is
determined by the transfer
functions of its neurons, by the
learning rule, and by the
architecture itself.
 In 1943, McCulloch and Pitts first modeled a
simple neural network using electrical
circuits in order to describe how neurons in
the brain might work.
 In 1949, the first learning law for artificial neural
networks was proposed by Donald Hebb.
 In 1958, Rosenblatt invented the Perceptron, a
learning method for the McCulloch-Pitts neuron model.
 In 1960, Widrow and Hoff developed
models called “ADALINE” and
“MADALINE”.
 In 1961, Rosenblatt proposed a
“backpropagation” scheme for multilayer
networks, though his attempt at a working
training procedure was unsuccessful.
 In 1969, Minsky and Papert published
Perceptrons, an analysis of the limitations of
the perceptron and of multilayer perceptron (MLP) networks.
 Robustness and fault tolerance: The decay of nerve cells
does not seem to affect the performance significantly.
 Flexibility: The network automatically adjusts to a new
environment without using any preprogrammed
instructions.
 Ability to deal with a variety of data situations: The
network can deal with information that is fuzzy,
probabilistic, noisy and inconsistent.
 Collective computation: The network performs routinely
many operations in parallel and also a given task in a
distributed manner.
 The term ‘Neural’ is derived from the basic
functional unit ‘neuron’ of human nervous
system.
 The brain contains approximately 10^11 neurons,
each of which has 10^2–10^5 connections
with other neurons.
 Neurons are organized in a fully connected
network and act as messengers, receiving
and sending impulses.
 The connections can be inhibitory
(decreasing strength) or excitatory
(increasing strength) in nature.
 The result is an intelligent brain
capable of learning, prediction and
recognition.
 Neurons consist of four basic parts–
Dendrite, Soma (cell body), Axon,
Synapse.
 ANNs are a digitized model of the human brain,
i.e., a biologically inspired computational model
that simulates the way in which the human brain
processes information.
 It is formed from hundreds of single units,
artificial neurons, connected with coefficients
(weights) which constitute the neural structure.
They are also known as processing elements as
they process information.
 Each processing element has weighted
inputs, a transfer function and one output.
A processing element is essentially an
equation which balances inputs and outputs.
 ANNs learn (or are trained) through
experience with appropriate learning
examples, just as people do, not from
programming.
 Processing unit: We can consider an artificial neural network
(ANN) as a highly simplified model of the structure of the biological
neural network. An ANN consists of interconnected processing units.
▪ The general model of a processing unit consists of a summing part
followed by an output part. The summing part receives N input
values, weights each value, and computes a weighted sum. The
weighted sum is called the activation value.
▪ The output part produces a signal from the activation value. The sign
of the weight for each input determines whether the input is
excitatory (positive weight) or inhibitory (negative weight).
▪ The inputs could be discrete or continuous data values, and likewise
the outputs could be discrete or continuous. The input and output
could also be deterministic, stochastic or fuzzy.
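The summing and output parts described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation; the function names and the binary step output are assumptions chosen for clarity.

```python
# A minimal sketch of a single processing unit: a summing part that
# computes the activation value, followed by an output part.
# (Names `step` and `processing_unit` are illustrative, not from the source.)

def step(x, threshold=0.0):
    # Binary output function: fires 1 when activation reaches the threshold.
    return 1 if x >= threshold else 0

def processing_unit(inputs, weights, threshold=0.0):
    # Summing part: the weighted sum of the N inputs is the activation value.
    activation = sum(w * a for w, a in zip(weights, inputs))
    # Output part: a signal produced from the activation value.
    return step(activation, threshold)

# Positive weights act as excitatory inputs, negative weights as inhibitory.
print(processing_unit([1, 1], [0.6, 0.6], threshold=1.0))   # 1
print(processing_unit([1, 0], [0.6, -0.6], threshold=1.0))  # 0
```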
 Interconnections: In an artificial neural network several processing units are
interconnected according to some topology to accomplish a pattern recognition task.
Therefore, the inputs to a processing unit may come from the outputs of other
processing units, and/or from external sources. The output of each unit may be given to
several units including itself.
▪ The amount of the output of one unit received by another unit depends on the
strength of the connection between the units, and it is reflected in the weight value
associated with the connecting link.
▪ If there are N units in a given ANN, then at any instant of time each unit will have a
unique activation value and a unique output value.
▪ The set of the N activation values of the network defines the activation state of the
network at that instant. Likewise, the set of the N output values of the network
defines the output state of the network at that instant.
▪ Depending on the discrete or continuous nature of the activation and output values,
the state of the network can be described by a discrete or continuous point in an N-
dimensional space.
 Operations: In operation, each unit of an ANN receives inputs from
other connected units and/or from an external source. A weighted
sum of the inputs is computed at a given instant of time.
▪ The activation value determines the actual output from the output function unit, i.e.,
the output state of the unit. The output values and other external inputs in turn
determine the activation and output states of the other units.
▪ Activation dynamics determines the activation values of all the units, i.e., the
activation state of the network as a function of time. The activation dynamics also
determines the dynamics of the output state of the network. The set of all activation
states defines the activation state space of the network.
▪ The set of all output states defines the output state space of the network. Activation
dynamics determines the trajectory of the path of the states in the state space of the
network.
▪ For a given network, defined by the units and their interconnections with appropriate
weights, the activation states determine the short term memory function of the
network.
 McCulloch-Pitts Model:
 The output signal (s) is typically a nonlinear
function f(x) of the activation value x.
 The following equations describe the operation of
an MP model, with inputs ai, weights wi and threshold θ:

x = w1 a1 + w2 a2 + ... + wN aN − θ
s = f(x)
 Non-linear functions: Three commonly used nonlinear functions
are the binary, ramp and sigmoid functions.
 Some examples of logic circuits can be realized using MP neurons.
In this model a binary output function is used with the following
logic:

s = f(x) = 1 if x ≥ 0, and 0 otherwise
 A single-input, single-output MP neuron with proper weight
and threshold gives an output one unit of time after the input is
applied. This unit delay property of the MP neuron can be used
to build sequential digital circuits.
 With feedback, it is also possible to have a memory cell which can
retain the output indefinitely in the absence of any input. In the
MP model the weights are fixed.
 Hence a network using this model does not have the capability of
learning. Moreover, the original model allows only binary output
states, operating at discrete time steps.
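Because the MP model has fixed weights and a binary output, simple logic gates can be realized directly by choosing suitable weights and thresholds. A small sketch (the particular weight and threshold choices are illustrative assumptions):

```python
# MP neurons with fixed weights and thresholds realizing logic gates.
# Since the weights are fixed, there is no learning involved.
def mp_neuron(inputs, weights, threshold):
    x = sum(w * a for w, a in zip(weights, inputs))  # activation value
    return 1 if x >= threshold else 0                # binary output

AND = lambda a1, a2: mp_neuron([a1, a2], [1, 1], threshold=2)
OR  = lambda a1, a2: mp_neuron([a1, a2], [1, 1], threshold=1)
NOT = lambda a1:     mp_neuron([a1],     [-1],   threshold=0)

print([AND(a, b) for a, b in [(0,0), (0,1), (1,0), (1,1)]])  # [0, 0, 0, 1]
print([OR(a, b)  for a, b in [(0,0), (0,1), (1,0), (1,1)]])  # [0, 1, 1, 1]
print([NOT(a) for a in (0, 1)])                              # [1, 0]
```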
 Rosenblatt's perceptron model of an artificial
neuron consists of outputs from sensory units fed to a fixed
set of association units, the outputs of which are fed to
an MP neuron; a learning mechanism is incorporated in the
operation of the unit.
 The desired or target output (b) is compared with the
actual binary output (s), and the error (δ) is used to
adjust the weights.
 There is a perceptron learning law which gives a
step-by-step procedure for adjusting the weights.
Whether the weight adjustment converges or not
depends on the nature of the desired input-output
pairs to be represented by the model.
 The perceptron convergence theorem enables us to
determine whether the given pattern pairs are
representable or not.
 If the weight values converge, then the
corresponding problem is said to be represented by
the perceptron network.
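The step-by-step weight adjustment described above can be sketched as follows. This is an illustrative implementation of the perceptron learning law on a linearly separable (hence representable) pattern set, with the bias treated as an extra weight on a fixed input of 1; the data and learning rate are assumptions.

```python
# Sketch of the perceptron learning law: weights change only when the
# actual binary output s differs from the target b (eta is the learning rate).
def train_perceptron(patterns, eta=0.5, epochs=20):
    w = [0.0, 0.0, 0.0]                     # two input weights + bias weight
    for _ in range(epochs):
        for (a1, a2), b in patterns:
            x = w[0]*a1 + w[1]*a2 + w[2]    # activation value
            s = 1 if x >= 0 else 0          # actual binary output
            err = b - s                     # error used to adjust weights
            w[0] += eta * err * a1
            w[1] += eta * err * a2
            w[2] += eta * err               # bias input fixed at 1
    return w

# OR is linearly separable, so the convergence theorem applies.
patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(patterns)
outputs = [1 if w[0]*a1 + w[1]*a2 + w[2] >= 0 else 0 for (a1, a2), _ in patterns]
print(outputs)  # [0, 1, 1, 1]
```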
 ADAptive LINear Element (ADALINE) is an early
single-layer artificial neural network model developed
by Bernard Widrow and Ted Hoff in 1960. It is a
supervised learning model primarily used for binary
classification and linear regression tasks.
 The main distinction between Rosenblatt's
perceptron model and Widrow's Adaline model is
that, in the Adaline, the analog activation value (x) is
compared with the target output (b). In other words,
the output is a linear function of the activation value
(x).
 The weight adjustment is given by Δwj = η δ aj,
where δ = b − x is the error and η is the learning
rate parameter. This weight update rule minimizes
the mean squared error δ², averaged over all inputs.
Hence it is called the Least Mean Squared
(LMS) error learning law.
 This law is derived using the negative
gradient of the error surface in the weight
space. Hence it is also known as a gradient
descent algorithm.
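The gradient-descent update can be sketched as follows, assuming a simple two-input Adaline and illustrative training data. Note that the analog activation x, not a thresholded output, is compared with the target b.

```python
# Sketch of the Adaline (LMS) update: the analog activation x is compared
# with the target b, and each step descends the gradient of delta**2.
# (Data, learning rate and epoch count are illustrative assumptions.)
def adaline_train(samples, eta=0.1, epochs=200):
    w, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (a1, a2), b in samples:
            x = w[0]*a1 + w[1]*a2 + bias   # analog activation value
            delta = b - x                  # error used in the LMS law
            w[0] += eta * delta * a1       # negative-gradient step on
            w[1] += eta * delta * a2       # the squared error delta**2
            bias += eta * delta
    return w, bias

# Fit the linear target b = 2*a1 - a2 + 1 (illustrative data).
samples = [((a1, a2), 2*a1 - a2 + 1) for a1 in (0, 1) for a2 in (0, 1)]
w, bias = adaline_train(samples)
print(round(w[0], 2), round(w[1], 2), round(bias, 2))  # approaches 2, -1, 1
```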
 The operation of a neural network is governed by
neuronal dynamics. Neuronal dynamics consists of
two parts…
 one corresponding to the dynamics of the activation
state and the other corresponding to the dynamics
of the synaptic weights.
▪ The Short-Term Memory (STM) in neural networks
is modelled by the activation state of the network.
▪ The Long-Term Memory (LTM) corresponds to the
encoded pattern information in the synaptic
weights due to learning.
 The Learning laws are merely implementation
models of synaptic dynamics. Typically, a model
of synaptic dynamics is described in terms of
expressions for the first derivative of the weights.
They are called learning equations.
 Learning laws describe the weight vector for the
ith processing unit at time instant (t+1) in terms of
the weight vector at time instant (t) as follows:

wi(t+1) = wi(t) + Δwi(t)


 Hebbian Law is an unsupervised learning algorithm
used in neural networks to adjust the weights
between nodes. It is based on the principle that the
connection strength between two neurons should
change depending on their activity patterns. The rule
can be summarized as follows:
▪ When two neighboring neurons operate in the same phase
at the same time, the weight between them increases.
▪ If the neurons operate in opposite phases, the weight
between them decreases.
▪ When there is no signal correlation between the neurons,
the weight remains unchanged.
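The three cases above can be illustrated with bipolar signal values. This small sketch assumes the common form Δw = η·s·a of the Hebbian law (the learning rate value is an illustrative assumption):

```python
# Sketch of the Hebbian law with bipolar signals: delta_w = eta * s * a,
# where s is the unit's output signal and a the input signal.
eta = 0.1  # illustrative learning rate

def hebbian_delta(s, a):
    return eta * s * a

print(hebbian_delta(+1, +1))  # same phase: weight increases (0.1)
print(hebbian_delta(+1, -1))  # opposite phase: weight decreases (-0.1)
print(hebbian_delta(+1, 0))   # no correlation: weight unchanged (0.0)
```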
 This law is applicable only for bipolar output
functions f(.). This is also called discrete
perceptron learning law.
 The expression for Δwij shows that the weights
are adjusted only if the actual output si is
incorrect, since the term in the square brackets
is zero for the correct output.
 This is a supervised learning law, as the law
requires a desired output for each input.
 Perceptron Learning law is an error-
correcting algorithm designed for
single-layer feedforward networks. It is
a supervised learning approach that
adjusts weights based on the error
calculated between the desired and
actual outputs. Weight adjustments are
made only when an error is present.
 This law is valid only for a differentiable
output function, as it depends on the
derivative of the output function f(.).
 It is a supervised learning law since the
change in the weight is based on the error
between the desired and the actual output
values for a given input.
 Delta learning law can also be viewed as a
continuous perceptron learning law.
 The weights can be initialized to any random
values as the values are not very critical.
 The weights converge to the final values
eventually by repeated use of the input-output
pattern pairs.
 The convergence can be more or less guaranteed
by using more layers of processing units in
between the input and output layers.
 The delta learning law can be generalized to the
case of multiple layers of a feedforward network.
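A sketch of the delta learning law with a sigmoid output function, chosen here because it is differentiable, as the law requires. The single-unit setup, input pattern, and constants are illustrative assumptions.

```python
import math

# Sketch of the delta learning law with a differentiable sigmoid output
# function: delta_w_j = eta * (b - f(x)) * f'(x) * a_j.
def f(x):
    return 1.0 / (1.0 + math.exp(-x))    # sigmoid output function

def f_prime(x):
    s = f(x)
    return s * (1.0 - s)                 # derivative of the sigmoid

def delta_update(w, a, b, eta=0.5):
    x = sum(wj * aj for wj, aj in zip(w, a))  # activation value
    err = b - f(x)                            # desired minus actual output
    return [wj + eta * err * f_prime(x) * aj for wj, aj in zip(w, a)]

w = [0.0, 0.0]                           # random/zero start is acceptable
for _ in range(2000):                    # repeated presentation of the pair
    w = delta_update(w, [1.0, 1.0], 1.0) # drive the output toward b = 1
print(f(w[0] + w[1]) > 0.9)              # True: output approaches the target
```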
 This is a supervised learning law and is a special case
of the delta learning law, where the output function is
assumed linear, i.e., f(xi)=xi.
 In this case the change in the weight is made
proportional to the negative gradient of the error
between the desired output and the continuous
activation value, which is also the continuous output
signal due to linearity of the output function.
 Hence, this is also called the Least Mean Squared
(LMS) error learning law.
 This is same as the learning law used in the
Adaline model of neuron.
 In implementation, the weights may be
initialized to any values.
 The input-output pattern pairs are presented
several times to achieve convergence of the
weights for a given set of training data.
 The convergence is not guaranteed for an
arbitrary training data set.
 This is a special case of the Hebbian learning, with
the output signal (si) being replaced by the desired
signal (bi).
 But the Hebbian learning is an unsupervised
learning, whereas the correlation learning is a
supervised learning, since it uses the desired
output value to adjust the weights.
 In the implementation of the learning law, the
weights are initialized to zero, i.e., wij = 0.
 This is relevant for a collection of neurons,
organized in a layer as shown in Figure.
 All the inputs are connected to each of the units in
the output layer in a feedforward manner.
 For a given input vector a, the output from each
unit i is computed using the weighted sum wiTa.
The unit k that gives the maximum output is
identified:

wkTa = max_i (wiTa)

 Then the weight vector leading to the kth unit is adjusted as follows:

Δwk = η (a − wk), i.e., Δwkj = η (aj − wkj), for j = 1,2,...,M

 The final weight vector tends to represent a group of input vectors within a
small neighborhood. This is a case of unsupervised learning. In
implementation, the values of the weight vectors are initialized to random
values prior to learning, and the vector lengths are normalized during
learning.
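The winner-take-all adjustment can be sketched as follows, assuming two units and a two-dimensional input with illustrative values:

```python
# Sketch of instar (competitive) learning: only the winning unit's weight
# vector moves toward the input, delta_w_k = eta * (a - w_k).
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def instar_step(W, a, eta=0.5):
    # Identify the unit k whose weighted sum wiTa is maximum.
    k = max(range(len(W)), key=lambda i: dot(W[i], a))
    # Move only the winner's weight vector toward the input a.
    W[k] = [wkj + eta * (aj - wkj) for wkj, aj in zip(W[k], a)]
    return k

W = [[1.0, 0.0], [0.0, 1.0]]          # two units, two inputs (illustrative)
winner = instar_step(W, [0.9, 0.1])   # input lies closest to unit 0
print(winner)  # 0
print(W[1])    # losing unit's weights are unchanged: [0.0, 1.0]
```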
 The outstar learning law is also related to a group of units
arranged in a layer.
 In this law the weights are adjusted so as to capture the
desired output pattern characteristics. The adjustment of
the weights is given by
Δwkj = η (bj − wkj), for j = 1,2,...,M
 where the kth unit is the only active unit in the input layer
 The outstar learning is a supervised
learning law, and it is used with a
network of instars to capture the
characteristics of the input and output
patterns for data compression.
 In implementation, the weight vectors
are initialized to zero prior to learning.
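The outstar adjustment Δwkj = η(bj − wkj) can be sketched as follows, with an illustrative desired pattern b; repeated presentations drive the active unit's fan-out weights toward b:

```python
# Sketch of outstar learning: the fan-out weights of the single active
# unit k move toward the desired output pattern b.
def outstar_step(w_k, b, eta=0.2):
    return [wkj + eta * (bj - wkj) for wkj, bj in zip(w_k, b)]

b = [1.0, 0.0, 0.5]        # desired output pattern (illustrative)
w_k = [0.0, 0.0, 0.0]      # weights initialized to zero prior to learning
for _ in range(50):        # repeated presentation of the pattern
    w_k = outstar_step(w_k, b)
print([round(x, 3) for x in w_k])  # approaches b: [1.0, 0.0, 0.5]
```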
