18CSE352T
NEURO FUZZY AND GENETIC
PROGRAMMING
SESSION 1
What is Soft Computing?
• Soft computing deals with approximate models
• It deals with partial truth, uncertainty and approximation
• Gives solutions to complex real-life problems
• The role model for soft computing is the human mind
• Soft computing is not a single method
• It is a combination of several methods such as
• Artificial Neural Networks
• Fuzzy Logic
• Genetic Algorithms
• Machine Learning
• Expert Systems
Hard Computing vs Soft Computing
Essential Features of the Brain
Aspect Description
Architecture The average human brain consists of about 100 billion neurons. There
are nearly 10^15 interconnections among these neurons. Hence the
brain’s architecture is highly connected.
Mode of operation The brain operates in an extremely parallel mode. The power of the brain
lies in the simultaneous operation of billions of neurons and their interactions.
Speed Very slow, and also very fast. Very slow in the sense that neurons
operate at milliseconds, which is miserably slow compared to the speed of
present day VLSI chips that operate at nanoseconds. Computers are
tremendously fast and flawless in number crunching and data
processing compared to human beings. Still, the brain can perform activities in
split-seconds (e.g., converse in natural language, carry out common sense reasoning,
interpret a visual scene, etc.) which a modern supercomputer finds extremely hard to
carry out.
Essential Features of the Brain
Aspect Description
Fault tolerance The brain is highly fault tolerant. Knowledge is stored within the
brain in a distributed manner. Consequently, if a portion of the
brain is damaged, it can still go on functioning by retrieving the
lost knowledge from the remaining neurons.
Storage mechanism The brain stores information as strengths of the interconnections
among the neurons. New information can be added by adjusting
the weights without disturbing the already stored information.
Control There is no global control in the brain. A neuron acts on the local
information available to it. The neurons pass on the
results of their processing only to the neurons adjacent to them.
Where does the power of a human being lie?
• Computers are fast processing systems
• But humans are good at recognizing a face
• Human beings do not think in terms of data, but in terms of patterns
• When we look at a face, we never think in terms of pixel values, but perceive the
face as a whole, as a pattern
• The structure of a human brain is drastically different from the architecture of a
computer
The Biological Neuron
• The biological neuron is the building block of a human brain
The Biological Neuron – contd.
• The biological neuron is the building block of a human brain
• It consists of 3 primary parts:
• Dendrites
• Collect stimuli from the neighbouring neurons and pass them on to the soma
• Soma
• It is the main body of the cell
• It accumulates the stimuli received through the dendrites
• It ‘fires’ when the accumulated stimulus is sufficient
• When a neuron fires, it transmits its own stimulus through the axon
• Axon
• It helps to pass the stimulus to the neighbouring neurons
The Biological Neuron – contd.
• There is a small gap between the end of an
axon terminal and the adjacent dendrite of
the neighbouring neuron
• This gap is called synapse
• A nervous stimulus is an electric impulse.
• It is transmitted across a synaptic gap by
means of an electrochemical process
• The synaptic gap has an important role to play in the activities of the
nervous system
• It scales the input signal by a weight
• If the input signal is x, and the synaptic weight is w, then the stimulus that
finally reaches the soma due to input x is the product x × w
• This weight, together with the other synaptic weights, embodies the knowledge
stored in the network of neurons
The Biological Neuron – contd.
Artificial Neural Networks (ANN)
• ANNs are information processing systems that are inspired by the way the biological
nervous system and the brain work
• They are usually configured for specific applications such as
• Pattern Recognition
• Data Recognition
• Image Processing
• Stock Market Prediction
• Weather Prediction
• Image Compression
• Aim: To bring the traditional computers a little closer to the way human brain
works
BNN vs ANN
A Single Neuron - Perceptron
The Artificial Neuron
• An artificial neuron is a computational model based on the structure and
functionality of a biological neuron.
• It consists of
1. a processing element
2. a number of inputs
3. weighted edges connecting each input to the
processing element.
• A processing unit is represented by a circle
• The input units are shown with boxes to distinguish
them from the processing units
The Artificial Neuron – contd.
Notational Convention
Symbol Used Description
Xi The ith input unit
Y The output unit. In case there is more than one output unit, the jth
output unit is Yj
xi Signal to the input unit Xi
wi Weight associated with the interconnection b/w the input unit Xi and the output unit
Y. In case there is more than one output unit, wij denotes the weight
between input unit Xi and the jth output unit Yj
y_in The total (or net) input to the output unit Y. It is the algebraic sum of
all weighted inputs to Y
y_out Signal transmitted by the output unit Y. It is known as the activation
of Y
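Putting the notational convention together, here is a minimal Python sketch of how a single artificial neuron computes y_in and y_out. The function names and the example numbers are illustrative choices made here, not taken from the slides.

```python
# Minimal sketch of a single artificial neuron (illustrative only).
# x[i] is the signal from input unit Xi, w[i] is the weight on the link Xi -> Y.

def neuron_output(x, w, f):
    """Return y_out = f(y_in), where y_in is the net input to the output unit Y."""
    y_in = sum(xi * wi for xi, wi in zip(x, w))   # algebraic sum of the weighted inputs
    return f(y_in)

# Example: a binary step activation with an assumed threshold theta = 1
step = lambda y_in, theta=1: 1 if y_in >= theta else 0
print(neuron_output([1, 0, 1], [0.5, 0.8, 0.7], step))   # y_in = 1.2, so the unit fires (1)
```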
The Artificial Neuron – contd.
Example for a Neural Network with more than one output
layer
Simple
Architecture
• Here is a simple model of
neurons
• Circles are the neurons and the
arrows are the synapses that
connect these neurons
Transmission of
inputs
Architecture with inputs, weights, threshold and adder
functions
The Artificial Neuron – contd.
•
The Artificial Neuron – contd.
•
The Artificial Neuron – contd.
•
The Artificial Neuron – A Summary
•
HISTORY OF ANN
History of ANN
• 1943 : Neurophysiologist Warren McCulloch and Mathematician Walter Pitts wrote
a paper on how neurons work. To describe it, they modeled a simple neural network
using electrical circuits.
• 1949 : Donald Hebb pointed out the fact that “Neural pathways are strengthened
each time they are used (the way in which humans learn). If two nerves fire at the
same time, the connection b/w them is enhanced”
• 1959 : Bernard Widrow and Marcian Hoff of Stanford developed models called
ADALINE (ADAptive LINear Elements) and MADALINE (Multiple ADAptive LINear
Elements)
• ADALINE was developed to recognize binary patterns
• MADALINE was the first neural network applied to a real world problem that
eliminates echoes on phone lines
History of ANN – contd.
• 1962 : Widrow & Hoff developed a learning procedure that examines the value
before the weight is adjusted.
• 1975 : The first multilayered network was developed, an unsupervised network.
• 1982 : Cooperative / Competitive Neural Networks.
• 1986 : There were efforts on extending the Widrow-Hoff rule to multiple layers.
Three independent groups of researchers worked on it and came up with similar
ideas which are now called Back Propagation Networks
• 1997 : A recurrent neural network framework, Long Short-Term Memory (LSTM),
was proposed by Hochreiter & Schmidhuber
• 1998 : Yann LeCun published Gradient-Based Learning Applied to Document
Recognition
History of ANN
ANN ARCHITECTURES
Artificial Neural Network (ANN) Architectures
• An ANN consists of a number of artificial neurons connected among
themselves in certain ways
• These neurons are arranged in layers, with interconnections across the layers
• The network may or may not be fully connected
• The nature of the interconnection paths also varies
• They are either unidirectional or bidirectional
• The topology of an ANN, together with the nature of its interconnection paths is
generally referred to as its architecture
1. Single-Layer Feed Forward ANN
• It is the simplest ANN architecture
• It consists of an array of i/p neurons connected to an array of o/p neurons
• The input neurons do not exercise any processing power, but simply forward
the i/p signals to the subsequent neurons. So they are not considered
to constitute a layer
• So the only layer in the ANN is composed of the o/p neurons Y1, …, Yn
[Figure: input neurons X1, X2, X3 fully connected to output neurons Y1, …, Y4 through
weights wij; each output unit Yj computes the net input yj_in and emits the activation yj_out]
Single-Layer Feed Forward ANN – cont..
•
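Since the formula slide did not extract, here is a hedged sketch of the computation a single-layer feed forward net performs. The weight layout and the step activation are illustrative assumptions.

```python
# Hedged sketch of a single-layer feed forward pass: the i/p units X1..Xm simply
# forward their signals; each o/p unit Yj computes yj_in = sum_i wij * xi and
# then yj_out = f(yj_in).
import numpy as np

def single_layer_forward(x, W, f):
    """x: (m,) input signals, W: (m, n) weight matrix wij, f: activation function."""
    y_in = x @ W          # net input to each of the n output units
    return f(y_in)        # activation (output signal) of each output unit

x = np.array([1.0, 0.0, 1.0])                 # signals forwarded by X1, X2, X3
W = np.random.uniform(-1, 1, size=(3, 4))     # weights wij to Y1..Y4
y_out = single_layer_forward(x, W, lambda s: (s >= 0).astype(int))
print(y_out)
```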
2. MultiLayer Feed Forward ANN
• Similar to the single layer feed forward net, except that there are one or more additional
layers of processing units between the input and the output layers
• The additional layers are called the hidden layers of the network
MultiLayer Feed Forward ANN – cont..
•
MultiLayer Feed Forward ANN – cont..
•
where m ⇒ number of nodes in the i/p layer
n ⇒ number of nodes in the hidden layer
r ⇒ number of nodes in the o/p layer
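As a hedged illustration of the forward pass through one hidden layer with m, n and r nodes as above; the sigmoid activation and random weights are assumptions made here, not taken from the slides.

```python
# Hedged sketch of a multilayer feed forward pass with one hidden layer:
# m i/p nodes, n hidden nodes, r o/p nodes.
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def mlff_forward(x, W_ih, W_ho):
    """W_ih: (m, n) input-to-hidden weights, W_ho: (n, r) hidden-to-output weights."""
    z_out = sigmoid(x @ W_ih)      # activations of the n hidden units
    y_out = sigmoid(z_out @ W_ho)  # activations of the r output units
    return y_out

m, n, r = 3, 4, 2
x = np.array([0.5, -1.0, 0.25])
print(mlff_forward(x, np.random.randn(m, n), np.random.randn(n, r)))
```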
MultiLayer Feed Forward ANN – cont..
It is possible to include more than one hidden layers
MultiLayer Feed Forward ANN – cont..
3. Competitive ANN
• Competitive networks are structurally
similar to single layer feed forward nets
• But, the o/p units are connected among
themselves usually through –ve weights
• Two types
1. O/p units are connected only to
their respective neighbours
2. O/p units are fully connected
• For a given i/p pattern, the output units
tend to compete among themselves to
represent that input
• Can be used for unsupervised learning
(clustering)
4. Recurrent Networks
• In feed forward networks, signals flow in
one direction only (from the i/p layer
towards the o/p layer through the hidden
layers). There are no feedback loops
• A recurrent network allows feedback
loops
• Fully connected recurrent networks
contain a bidirectional path between
every pair of processing elements
• Also, a recurrent network may contain
self loops
• Also called feedback neural networks
• They are designed to work with sequence
prediction problems
ACTIVATION FUNCTIONS
LEARNING ALGORITHMS
Activation Functions
• The function that maps the net input value to the output signal value is known as the
activation function
• The output from a processing unit is known as its activation
• Some of the common activation functions are
1. Identity Function
2. Step Function
3. Sigmoid Function
4. Hyperbolic Tangent Function
1. Identity Function
•
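The formula on this slide did not carry over; the standard definition, stated here for completeness, is

\[ f(x) = x \quad \text{for all } x \]

i.e. the output signal simply equals the net input.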
2. Step Function
• It is one of the frequently used activation functions
• The step function is also known as the Heaviside function
• There are 4 types of step functions:
1. Binary step function
2. Binary threshold function
3. Bipolar step function
4. Bipolar threshold function
Binary Step Function and Binary Threshold Function
•
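The defining equations did not extract from this slide; the usual forms are given below, where θ denotes the threshold (the use of ≥ versus > varies slightly between texts):

\[
f(y_{in}) = \begin{cases} 1, & y_{in} \ge 0 \\ 0, & y_{in} < 0 \end{cases} \quad \text{(binary step)}
\qquad
f(y_{in}) = \begin{cases} 1, & y_{in} \ge \theta \\ 0, & y_{in} < \theta \end{cases} \quad \text{(binary threshold)}
\]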
Bipolar Step Function Bipolar Threshold Function
• Sometimes it is more convenient to work with bipolar data (-1 and +1) than with binary data
• If a signal value 0 is sent through a weighted path, the information contained in the
interconnection weight is lost as it is multiplied by 0
• To overcome this, the binary input is converted to bipolar form and then a suitable bipolar
activation function is applied
• The output of bipolar step function is -1 or +1
•
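The corresponding bipolar forms, again reconstructed as standard definitions since the slide's formulas did not extract, are:

\[
f(y_{in}) = \begin{cases} +1, & y_{in} \ge 0 \\ -1, & y_{in} < 0 \end{cases} \quad \text{(bipolar step)}
\qquad
f(y_{in}) = \begin{cases} +1, & y_{in} \ge \theta \\ -1, & y_{in} < \theta \end{cases} \quad \text{(bipolar threshold)}
\]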
3. The Sigmoid Function
• The step function is not continuous and it is not differentiable
• Some ANN training algorithms require the activation function to be continuous and
differentiable
• The step function is not suitable for such cases
• Sigmoid functions have the nice property that they can approximate the step
function to the desired extent without losing its differentiability
• There are 2 types of sigmoid functions:
1. Binary sigmoid function (Logistic sigmoid function)
2. Bipolar sigmoid function
Binary Sigmoid Function
•
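The formula did not extract; the standard definition, with a steepness parameter σ > 0 (often σ = 1), is

\[ f(x) = \frac{1}{1 + e^{-\sigma x}}, \qquad f'(x) = \sigma \, f(x)\,\bigl(1 - f(x)\bigr) \]

Its range is (0, 1), and the simple form of the derivative is what makes it convenient for training.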
Bipolar Sigmoid Function
•
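Likewise, the standard bipolar sigmoid (a rescaling of the binary sigmoid to the range (-1, +1)) is

\[ g(x) = 2 f(x) - 1 = \frac{1 - e^{-\sigma x}}{1 + e^{-\sigma x}}, \qquad g'(x) = \frac{\sigma}{2}\,\bigl(1 + g(x)\bigr)\,\bigl(1 - g(x)\bigr) \]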
Hyperbolic Tangent Function
•
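The formula did not extract; the standard definition is

\[ h(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad h'(x) = 1 - h(x)^{2} \]

which coincides with the bipolar sigmoid for σ = 2.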
18CSE352T
NEURO FUZZY AND GENETIC
PROGRAMMING
• Learning Algorithms
• The Basic Principle of ANN Learning
• Supervised Learning
• Hebb Rule
• Perceptron Learning Rule
• Delta Rule
• Extended Delta Rule
Learning Algorithms
• ANN is characterized by three entities:
• Its architecture
• Activation function
• Learning technique
• Learning refers to the process of finding the appropriate set of weights of the
interconnections so that the ANN attains the ability to perform the designated task.
• This process is called Training the ANN
• How to find the appropriate set of weights so that the ANN is able to solve a given
problem?
• Start with an initial set of weights and then gradually modify them to arrive at the final
weights
The Basic Principle of ANN
Learning
•
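The formula slide did not extract; the basic principle is commonly stated as an iterative update

\[ w_i(\text{new}) = w_i(\text{old}) + \Delta w_i \]

where each learning rule below prescribes its own Δw_i, and the update is repeated over the training patterns until the network performs the designated task satisfactorily.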
Supervised Learning
• Labelled training data
• The computer is presented with example inputs & outputs
• A teacher is present
Input Arguments: SIZE, COLOR; Class / Target Attribute: FRUIT NAME
SIZE COLOR FRUIT NAME
BIG RED APPLE
SMALL RED CHERRY
BIG GREEN BANANA
SMALL GREEN GRAPE
• Classification : Target attribute → Categorical
• Prediction (Regression) : Target attribute → Numeric
Unsupervised Learning
• Unlabelled training data
• Derives structure from data based on relationships among attributes
• No teacher
Input Arguments: SIZE, COLOR
BIG RED
SMALL RED
BIG GREEN
SMALL GREEN
[Figure: the same patterns grouped in three ways: by Size (Big / Small), by Color (Red / Green),
and by Size & Color (Big & Red, Big & Green, Small & Red, Small & Green)]
• Clustering
Supervised Learning
• A neural network is trained with the help of a set of patterns known as the training
vectors
• The desired outputs for these vectors might or might not be known beforehand
• When these are known and that knowledge is employed in the training process, the
training is termed as supervised learning
• Otherwise, the learning is said to be unsupervised
• Some popular supervised learning methods are
• Perceptron Learning
• Delta Learning
• Least-Mean-Square (LMS) Learning
• Correlation Learning
• Outstar Learning
Linearly Separable data
1. Hebb Rule
•
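The formula on this slide did not extract; the Hebb rule is usually written as

\[ \Delta w_i = x_i \, y_{out}, \qquad \Delta b = y_{out} \]

i.e. a connection is strengthened whenever its input and the output are active together, in line with Hebb's 1949 postulate quoted earlier (the bias update applies when a bias unit is used).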
2. Perceptron Learning
Rule
•
Perceptron Learning Rule –
cont..
•
Perceptron Learning Rule – cont..
•
Learning Strategy
• If the perceptron produces the
desired output, then the weights need
not be changed
• If the perceptron misclassifies X
negatively (if it erroneously produces
-1 instead of +1) then the weights
should be appropriately increased
• If the perceptron misclassifies X
positively (if it erroneously produces
+1 instead of -1) then the weights
should be appropriately decreased
Perceptron Learning Rule –
cont..
Perceptron Learning Rule –
cont..
•
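The update formula did not extract; the perceptron rule is commonly written as

\[ w_i(\text{new}) = w_i(\text{old}) + \eta \, t \, x_i, \qquad b(\text{new}) = b(\text{old}) + \eta \, t \]

applied only when the output differs from the target, where η is the learning rate and t is the bipolar target (+1 or -1). When the output is correct the weights are left unchanged, matching the learning strategy above.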
3. Delta / LMS (Least Mean Square) /
Widrow-Hoff Rule
•
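The formula did not extract; the delta (LMS, Widrow-Hoff) rule is usually stated as

\[ \Delta w_i = \eta \,(t - y_{in})\, x_i \]

where the error is measured on the net input y_in rather than on the thresholded output.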
4. Extended Delta Rule
•
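The formula did not extract; the extended (generalized) delta rule is usually stated as

\[ \Delta w_i = \eta \,(t - y_{out})\, f'(y_{in})\, x_i \]

i.e. the delta rule scaled by the derivative of a continuous, differentiable activation function f.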
Unsupervised
Learning
• In unsupervised learning, the target output may not be
available during the learning phase
• Pattern Clustering problem comes under Unsupervised
Learning
• Let us consider a set of points on a Cartesian plane
• The problem is to divide the given patterns into two
clusters so that when the neural net is presented with one
of these patterns, its output indicates the cluster to which
the pattern belongs
• Patterns that are close to each other should form a cluster
• We must have a suitable measure of closeness
Unsupervised
Learning
Clustering
• Clustering is an instance of
unsupervised learning
• It is not assisted by any teacher or
any target output
• The network itself has to understand
the patterns and put them into
appropriate clusters
• The only clue is the given number of clusters
• The neural network output layer has one unit for each cluster
• When an input pattern is given, exactly one of the output units should fire
• This is achieved through a mechanism called competition
Winner Takes All
•
Winner Takes All –
cont..
• The network finds the output unit that matches best for the current input vector and
makes it the winner
• The weight vector for the winner is updated according to the learning algorithm
• One way of deciding the winner is to employ the square of the Euclidean distance
between the input vector and the exemplar
• The unit that has the smallest Euclidean distance is chosen as the winner
Winner Takes All –
cont..
•
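A minimal sketch of one winner-takes-all step, assuming the squared-Euclidean criterion described above and an illustrative learning rate:

```python
# Hedged sketch of one winner-takes-all update: the winner is the output unit whose
# weight (code) vector has the smallest squared Euclidean distance to the input,
# and only the winner's weights are moved towards the input.
import numpy as np

def winner_takes_all_step(x, W, eta=0.5):
    """x: (d,) input vector, W: (k, d) array with one code vector per output unit."""
    dists = np.sum((W - x) ** 2, axis=1)      # squared Euclidean distances
    j = int(np.argmin(dists))                 # index of the winning unit
    W[j] += eta * (x - W[j])                  # move the winner towards x
    return j, W

W = np.array([[0.0, 0.0], [1.0, 1.0]])
winner, W = winner_takes_all_step(np.array([0.9, 0.8]), W)
print(winner, W)
```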
Competitive learning through Winner-takes-all
strategy
•
Competitive learning through Winner-takes-all strategy-
cont..
•
Initial Cluster Formation
Competitive learning through Winner-takes-all strategy-
cont..
Successive positions of the code vectors during the first epoch of the clustering process
Summary of ANN Learning
Rules
18CSE352T
NEURO FUZZY AND GENETIC
PROGRAMMING
THE
MCCULLOCH-PITTS
NEURAL MODEL
THE MCCULLOCH-PITTS NEURAL
MODEL
• The earliest artificial neural model was proposed by McCulloch and Pitts in 1943
• It consists of a number of input units connected to a single output unit
• The interconnecting links are unidirectional
THE MCCULLOCH-PITTS NEURAL
MODEL – cont..
1 There are two kinds of input units: excitatory and inhibitory
The excitatory inputs are connected to the output unit through positively weighted links.
Inhibitory inputs have negative weights on their connecting paths to the output unit.
2 All excitatory weights have the same positive magnitude w and all inhibitory weights have
the same negative magnitude −v.
3 The activation y_out = f(y_in) is binary, i.e., either 1 (in case the neuron fires), or 0 (in case
the neuron does not fire).
4 The activation function is a binary step function. It is 1 if the net input y_in is greater than
or equal to a given threshold value θ, and 0 otherwise.
5 The inhibition is absolute. A single inhibitory input should prevent the neuron from firing
irrespective of the number of active excitatory inputs.
The inhibitory inputs are those that have maximum effect on the decision making
irrespective of other inputs
THE MCCULLOCH-PITTS NEURAL
MODEL – cont..
•
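Summarising the five assumptions above as a hedged sketch; the parameter values in the example are illustrative choices:

```python
# Hedged sketch of a McCulloch-Pitts unit: all excitatory links share weight w,
# inhibition is absolute, and the output is a binary step on the net input
# with threshold theta.
def mcp_neuron(excitatory, inhibitory, w, theta):
    """excitatory, inhibitory: lists of 0/1 signals; returns 0 or 1."""
    if any(inhibitory):            # absolute inhibition: any active inhibitory
        return 0                   # input prevents the neuron from firing
    y_in = w * sum(excitatory)
    return 1 if y_in >= theta else 0

# Example: two excitatory inputs with w = 1 and theta = 2 realize logical AND
print(mcp_neuron([1, 1], [], w=1, theta=2))   # 1
print(mcp_neuron([1, 0], [], w=1, theta=2))   # 0
```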
Topics that will be covered in this
Session
• Implementation of Logical Functions in McCulloch-Pitts Model
• AND, OR, AND-NOT, XOR
• Perceptron
• A Simple Classification Problem
• Linear & Non Linear Problems
• XOR problem
• Implementing AND function using Perceptron
MCCULLOCH-PITTS NEURON TO IMPLEMENT
LOGICAL ‘AND’
• All inputs are excitatory
• No inhibitory input is required to
implement the logical AND operation
• The interconnection weights and the
activation functions are so chosen that
the output is 1 if and only if both the
inputs are 1, otherwise it is 0
Truth Table : Logical
AND
MCCULLOCH-PITTS NEURON TO IMPLEMENT
LOGICAL ‘OR’
• The neuron outputs a 1 whenever there
is at least one 1 at its inputs.
• The neuron is structurally identical to
that of ‘AND’
• Only the Activation function is changed
appropriately so that the desired
functionality is ensured
Truth Table : Logical
OR
MCCULLOCH-PITTS NEURON TO IMPLEMENT LOGICAL
‘AND-NOT’
Truth Table : AND-NOT
MCCULLOCH-PITTS NEURON TO IMPLEMENT
LOGICAL ‘XOR’
• Simple neurons performing basic logical
operations can be combined together to
implement complex logic functions
• XOR function can be implemented with two
AND-NOT operations
• It is implemented with the help of a network
of neurons
• All the processing elements Y1, Y2 and Z have
the same activation function
Truth Table : XOR
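As a hedged sketch of the composition described above, with weights and thresholds assumed for illustration (they are consistent with the node tables that follow, but the slide's own figure did not extract):

```python
# Hedged sketch: XOR built from two McCulloch-Pitts AND-NOT units feeding an
# OR-like output unit. Excitatory weight 2, inhibitory weight -1 and threshold 2
# are illustrative assumptions.
def mcp_fire(y_in, theta=2):
    return 1 if y_in >= theta else 0

def and_not(a, b):                     # realizes a AND (NOT b)
    return mcp_fire(2 * a - 1 * b)

def xor(x1, x2):
    y1 = and_not(x1, x2)               # fires only for (1, 0)
    y2 = and_not(x2, x1)               # fires only for (0, 1)
    return mcp_fire(2 * y1 + 2 * y2)   # OR of y1, y2 (at most one is active)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))
```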
FINDING THE FUNCTION OF A GIVEN MCCULLOCH-PITTS NET
Node A
x1 x2   A_in (net input to A)   A_out (activation of A)
0  0    0                       0
0  1    1                       0
1  0    1                       0
1  1    2                       1
Logic function realized by node A: AND
Node B
x1 x2   B_in (net input to B)   B_out (activation of B)
0  0    0                       0
0  1    -1                      0
1  0    2                       1
1  1    1                       0
Logic function realized by node B: AND-NOT
Node Z
x1 x2   A_out   B_out   Z_in   Z_out
0  0    0       0       0      0
0  1    0       0       0      0
1  0    0       1       2      1
1  1    1       0       2      1
Function realized by the network is f(x1, x2) = x1
Perceptron
Perceptron Linearly Separable data
• Perceptrons have the capacity to classify patterns
• For a given set of linearly separable patterns, it is always
possible to find a perceptron that solves the
classification problem (A linearly separable set of
patterns is one that can be completely partitioned by a
decision plane into 2 classes)
• The only thing is we should find the appropriate
combination of values for the weights
• This is achieved through a process called learning or
training
• The famous “perceptron convergence theorem” states
that for a set of linearly separable patterns, a perceptron
is guaranteed to learn the appropriate values of the
weights
Perceptron
A Simple Classification Problem
•
A Simple Classification Problem
•
A Simple Classification Problem
•
Forms of Equations
•
The XOR Problem
• Most of the real life classification problems are not
linearly separable
• A perceptron cannot learn to compute even a 2-bit
XOR as it is non linearly separable
• There is no single straight line to separate the patterns
producing 1s {(0,1), (1,0)} from the patterns producing
0s {(0,0), (1,1)}
• How to overcome this limitation?
1. Draw a curved decision surface. But a
perceptron cannot model any curved surface
2. Employ two decision lines (a multi-layered
perceptron)
Learning the logical AND function by a Perceptron
• Let us train a perceptron to realize the logical
AND function
• The training patterns and the corresponding
target outputs for AND operation where the
input and outputs are in bipolar form are given
in the table
• The bias is permanently set to 1
• The structure of the perceptron is shown in the
figure
• Activation function for the output unit is
Learning the logical AND function by a Perceptron –
cont..
•
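A minimal training sketch for this task follows; the learning rate, zero initial weights and the simple sign-like activation are assumptions made here, since the slide's exact parameters did not extract.

```python
# Hedged sketch of perceptron training on the bipolar AND patterns.
# The bias input is fixed at 1 as stated above.
def activation(y_in, theta=0.0):
    return 1 if y_in > theta else -1

def train_perceptron(patterns, eta=1.0, epochs=3):
    w0, w1, w2 = 0.0, 0.0, 0.0            # bias weight and the two input weights
    for _ in range(epochs):
        for x1, x2, t in patterns:
            y = activation(w0 + w1 * x1 + w2 * x2)
            if y != t:                     # update only on misclassification
                w0 += eta * t
                w1 += eta * t * x1
                w2 += eta * t * x2
    return w0, w1, w2

AND = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, -1)]
print(train_perceptron(AND))
```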
EPOCH 1
•
EPOCH 1 – cont..
•
EPOCH 2
•
EPOCH 2 – cont..
•
EPOCH 3
•
EPOCH 3 – cont..
•
Training / Learning
Hebbian Network
ADALINE Networks
MADALINE Networks
Realizing the logical AND function through Hebb
Learning
•
Realizing the logical AND function through Hebb Learning – cont..
•
EPOCH 1
•
EPOCH 1 – cont..
•
EPOCH 2
•
EPOCH 2 –
cont..
•
EPOCH 3
•
EPOCH 3 – cont..
•
ADALINE
ADALINE - ADAptive LInear NEuron
•
Procedure: ADALINE-Learning
Realizing the logical AND-NOT function using ADALINE
•
x0  x1  x2   t
1   1   1   -1
1   1  -1    1
1  -1   1   -1
1  -1  -1   -1
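A hedged sketch of ADALINE (delta-rule) training on these patterns follows; the learning rate and initial weights are illustrative and need not match the epoch-by-epoch calculations below.

```python
# Hedged sketch of ADALINE (LMS / delta-rule) training on the bipolar AND-NOT
# patterns listed above (x0 is the bias input, fixed at 1).
def train_adaline(patterns, eta=0.2, epochs=2, w=(0.1, 0.1, 0.1)):
    w0, w1, w2 = w
    for _ in range(epochs):
        for x0, x1, x2, t in patterns:
            y_in = w0 * x0 + w1 * x1 + w2 * x2
            err = t - y_in                 # the delta rule uses the net input
            w0 += eta * err * x0
            w1 += eta * err * x1
            w2 += eta * err * x2
    return w0, w1, w2

AND_NOT = [(1, 1, 1, -1), (1, 1, -1, 1), (1, -1, 1, -1), (1, -1, -1, -1)]
w = train_adaline(AND_NOT)
# Testing: apply the bipolar step to the net input for each pattern
for x0, x1, x2, t in AND_NOT:
    y_in = w[0] * x0 + w[1] * x1 + w[2] * x2
    print((x1, x2), 1 if y_in >= 0 else -1, "target", t)
```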
EPOCH 1
•
EPOCH 1 – cont..
•
EPOCH 1 – cont..
•
EPOCH 1 – cont..
•
Application/Testing – At the end of EPOCH 1
•
Final Training & Testing
• One pattern is not correctly classified
• So, training has to be repeated for another epoch
• Testing after 2 epochs
• The net learns the designated task after two epochs
MADALINE
MADALINE – Many ADAptive LInear NEurons
• Several ADALINEs arranged in a multilayer net
• It is computationally more powerful than ADALINE
• The enhanced computational power of the MADALINE is
due to the hidden ADALINE units
• However, existence of the hidden units makes the
training process more complicated
• A MADALINE network with two inputs, one output and
one hidden layer with two hidden units is shown in fig.
• All units except the inputs employ the same activation
function as in ADALINE
• There are two training algorithms for MADALINE: MR-I
and MR-II
Procedure: MADALINE-MR-I Learning
MR-I Algorithm
•
MR-I Algorithm – Weight Adjustments
•
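Since the procedure boxes above did not extract, the following is a rough sketch of the MR-I weight adjustment as it is commonly described in the literature; the fixed OR-like output unit, the value of alpha and the 2-2-1 layout are assumptions for illustration, not taken from the slide.

```python
# Hedged sketch of one MR-I adjustment step for a 2-2-1 MADALINE.
def mri_update(x, t, W, b, alpha=0.5):
    """x: (x1, x2); W[j], b[j]: weights and bias of hidden ADALINE Zj.
    The output unit is assumed fixed, acting like a logical OR of z1, z2."""
    z_in = [b[j] + W[j][0] * x[0] + W[j][1] * x[1] for j in range(2)]
    z = [1 if s >= 0 else -1 for s in z_in]
    y = 1 if (z[0] == 1 or z[1] == 1) else -1
    if t == y:
        return W, b                            # no change if the output is correct
    if t == 1:                                 # should have fired but did not:
        j = min(range(2), key=lambda k: abs(z_in[k]))  # unit with net input closest to 0
        corr = alpha * (1 - z_in[j])
        b[j] += corr
        W[j] = [W[j][0] + corr * x[0], W[j][1] + corr * x[1]]
    else:                                      # fired but should not have:
        for j in range(2):
            if z_in[j] >= 0:                   # push units with positive net input towards -1
                corr = alpha * (-1 - z_in[j])
                b[j] += corr
                W[j] = [W[j][0] + corr * x[0], W[j][1] + corr * x[1]]
    return W, b
```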
MADALINE Training for the XOR function
• Let us train a MADALINE net through the MR-I algorithm
to realize the two-input XOR function
• The bipolar training set, including the bias input x0 which
is permanently fixed at 1 is given in Table
• The randomly chosen initial weights and the learning rate
are also given
MADALINE Training for the XOR function –Epoch 1
Steps 1 to 4 are already taken care of
MADALINE Training for the XOR function – Epoch 1 –
contd..
MADALINE Training for the XOR function – Epoch 1 - cont.
MADALINE Training for the XOR function – Epochs 2, 3 & 4