
Multilayer Perceptron

Unit-5

Syllabus
• Multilayer Perceptron: Introduction to ANN, Perceptron, types of
activation function, training a Perceptron, Learning Boolean Functions,
Multilayer Perceptron (MLP), Back Propagation Algorithm
Artificial Neural Network
An artificial neural network is an information processing system constructed and implemented to model the human brain.
The aim of a NN is to mimic the human ability to adapt to changing circumstances and environments.
In other words, a NN is a machine learning approach inspired by the way in which the brain performs a particular learning task:
– Knowledge about the learning task is given in the form of examples.
– Inter-neuron connection strengths (weights) are used to store the acquired information (the training examples).
– During the learning process, the weights are modified so as to model the particular learning task correctly on the training examples.
– Neural network models: perceptron, feed-forward networks, radial basis function networks, support vector machines.
In fact, the human brain is a highly complex structure, viewed as a massive, highly interconnected network of simple processing elements called neurons. On average, the human brain has around 10^11 neurons.
Artificial neural networks (ANNs), or simply neural networks (NNs), are simplified models (i.e., imitations) of the biological nervous system, and have therefore been motivated by the kind of computing performed by the human brain.
The behavior of a biological neural network can be captured by a simple model called the artificial neural network.
Structure of a Neuron
Dendrite: A bush of very thin fibres.
Axon: A long cylindrical fibre.
Soma: Also called the cell body; it contains the nucleus of the cell.
Synapse: A junction where an axon makes contact with the dendrites of neighboring neurons; messages are received at the dendrites.
The message is sent quickly down the axon using electrical impulses. What happens when the signal reaches the end of the axon?
A neuron is part of an interconnected network of the nervous system and serves the following functions:
– Computing input signals
– Transporting signals (at very high speed)
– Storing information
– Perception, automatic training and learning
We can also see the analogy between the biological neuron and the artificial neuron.
Truly, every component of the model (i.e., the artificial neuron) bears a direct analogy to that of a biological neuron.
It is this model which forms the basis of the neural network (i.e., the artificial neural network). The representation of an ANN is as shown in the next slide.
Here,
x1, x2, · · · , xn are the n inputs to the artificial neuron, and
w1, w2, · · · , wn are the weights attached to the inputs.
Note that a biological neuron receives all inputs through the dendrites, sums them, and produces an output if the sum is greater than a threshold value.
The input signals are passed on to the cell body through the synapses, which may accelerate or retard an arriving signal.
It is this acceleration or retardation of the input signals that is modeled by the weights.
An effective synapse, which transmits a stronger signal, will have a correspondingly larger weight, while a weak synapse will have a smaller weight.
Thus, weights here are multiplicative factors on the inputs that account for the strength of the synapse.
Hence, the total input, say I, received by the soma of the artificial neuron is
I = w1x1 + w2x2 + · · · + wnxn
To generate the final output y, the sum is passed to a filter φ called the transfer function, which releases the output. That is, y = φ(I).
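As a rough sketch (not from the original slides), this computation can be written in Python; the step transfer function and the example weights below are illustrative assumptions:

# A minimal sketch of one artificial neuron: weighted sum plus transfer function.
def neuron_output(inputs, weights, threshold=0.5):
    # Total input received by the soma: I = w1*x1 + w2*x2 + ... + wn*xn
    I = sum(w * x for w, x in zip(weights, inputs))
    # Transfer function phi: a simple step function is assumed here
    return 1 if I >= threshold else 0

print(neuron_output([1, 1], [0.4, 0.3]))  # 0.7 >= 0.5, so y = 1
print(neuron_output([1, 0], [0.4, 0.3]))  # 0.4 <  0.5, so y = 0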
Differences between ANN and BNN :
Biological Neural Networks (BNNs) and Artificial Neural Networks (ANNs) are both composed of similar
basic components, but there are some differences between them.
Neurons: In both BNNs and ANNs, neurons are the basic building blocks that process and transmit information. However, BNN neurons are more complex and diverse than ANN neurons. In BNNs, neurons have multiple dendrites that receive input from multiple sources, and their axons transmit signals to other neurons, while in ANNs, neurons are simplified and usually have only a single output.
Synapses: In both BNNs and ANNs, synapses are the points of connection between neurons, where
information is transmitted. However, in ANNs, the connections between neurons are usually fixed, and the
strength of the connections is determined by a set of weights, while in BNNs, the connections between
neurons are more flexible, and the strength of the connections can be modified by a variety of factors,
including learning and experience.
Neural Pathways: In both BNNs and ANNs, neural pathways are the connections between neurons that
allow information to be transmitted throughout the network. However, in BNNs, neural pathways are highly
complex and diverse, and the connections between neurons can be modified by experience and learning.
In ANNs, neural pathways are usually simpler and predetermined by the architecture of the network.
Advantages of NN

1. ANNs exhibit mapping capabilities; that is, they can map input patterns to their associated output patterns.
2. ANNs learn by examples. In other words, they can identify new objects on which they were not previously trained.
3. ANNs possess the capability to generalize. This gives them power in applications where an exact mathematical model of the problem is not possible.
4. ANNs are robust and fault-tolerant systems. They can therefore recall full patterns from incomplete, partial or noisy patterns.
5. ANNs can process information in parallel, at high speed and in a distributed manner. Thus a massively parallel distributed processing system, made up of highly interconnected (artificial) neural computing elements with the ability to learn and acquire knowledge, is possible.
Applications of NN
1. Voice recognition
2. Weather Prediction
3. Strategies for games, business and war
4. Fraud Detection
5. Data mining
6. Medical Diagnosis, Photo and fingerprint recognition
Perceptron
 The artificial neuron simply activates its output when more than a certain number of its inputs are active.
 A perceptron is a single-layer neural network, or simply a neuron.
 So a perceptron is an ANN with a single layer and no hidden layers.
A perceptron consists of 4 parts:
 input values
 weights and a constant/bias
 a weighted sum, and
 a step function / activation function
Neuron: The neuron is the basic information processing unit of a NN. It consists of:
1. A set of synapses or connecting links, each link characterized by a weight: w1, w2, …, wm
2. An adder function (linear combiner) which computes the weighted sum of the inputs: u = w1x1 + w2x2 + · · · + wmxm
3. An activation function (squashing function) for limiting the amplitude of the output of the neuron.
Each neuron consists of three major components:
1. A set of n synapses, the i-th having weight wi. A signal xi forms the input to the i-th synapse having weight wi. The value of any weight may be positive or negative. A positive weight has an excitatory effect, while a negative weight has an inhibitory effect on the output of the summation junction.

2. A summation junction that weights each input signal by its respective synaptic weight. Because it is a linear combiner or adder of the weighted input signals, the output of the summation junction can be expressed as u = w1x1 + w2x2 + · · · + wnxn.

3. A threshold activation function (or simply the activation function, also known as a squashing function), which produces an output signal only when the input signal exceeds a specific threshold value. It is similar in behaviour to the biological neuron, which transmits a signal only when the total input meets the firing threshold.
Summary…
Why connection weight and bias
•Weights are the parameters that the model learns during training. They determine the strength of the influence of each input feature and capture the relationships between input features and target outputs.

•Biases provide flexibility to the model, allowing it to adjust the output independently of the weights. Biases are akin to the intercept in a linear equation: they enable the model to fit the data better by providing a starting point in the output space.
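As a toy illustration of the intercept analogy (the weight and bias values below are made up), the bias shifts the neuron's output line away from the origin:

# Toy sketch: the bias acts like the intercept of a linear equation.
w, b = 2.0, -1.0                  # assumed weight and bias
for x in [0.0, 0.5, 1.0]:
    print(x, w * x, w * x + b)    # without bias the line passes through the origin;
                                  # with bias it is shifted down by 1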
Linear and non-linear parts of a neuron
What is an activation function and why use them?
The activation function decides whether a neuron should be
activated or not by calculating the weighted sum and further
adding bias to it. The purpose of the activation function is to
introduce non-linearity into the output of a neuron.
Why do we need a non-linear activation function?
A neural network without an activation function is essentially just a linear regression model. The activation function performs the non-linear transformation of the input, making the network capable of learning and performing more complex tasks.
Need of activation function:
 They are used in the hidden and output layers.
 An activation function is a function added to an artificial neural network in order to help the network learn complex patterns in the data.
 The activation function will decide what is to be fired to the next neuron.
Popular Activation Functions
Popular types of activation functions are:
1. Step function
2. Sign function
3. Linear function
4. ReLU (Rectified Linear Unit): no -ve values
5. Leaky ReLU
6. Tanh
7. Sigmoid
8. Softmax
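These functions can be sketched in Python as follows (standard textbook definitions; NumPy is assumed to be available):

import numpy as np

def step(x):    return np.where(x >= 0, 1, 0)     # 1. binary threshold at 0
def sign(x):    return np.where(x >= 0, 1, -1)    # 2. sign function
def linear(x):  return x                          # 3. identity
def relu(x):    return np.maximum(0, x)           # 4. clips away negative values
def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)          # 5. small slope for negatives
def tanh(x):    return np.tanh(x)                 # 6. range (-1, 1)
def sigmoid(x): return 1 / (1 + np.exp(-x))       # 7. range (0, 1), S-shaped
def softmax(x):
    e = np.exp(x - np.max(x))                     # 8. stabilised exponentials
    return e / e.sum()                            #    outputs sum to 1

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), sigmoid(x), softmax(x))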
Types of Activation Functions –
Several different types of activation functions are used in Deep Learning.
Some of them are explained below:
1) Step Function:
The step function is one of the simplest kinds of activation functions. Here, we consider a threshold value, and if the value of the net input, say y, is greater than the threshold then the neuron is activated.
Mathematically,
f(x) = 1 if x >= 0
f(x) = 0 if x < 0
2) Sign function
3) Sigmoid Function:
The sigmoid function is a widely used activation function. It is defined as:
f(x) = 1/(1 + e^(-x)). This is a smooth and continuously differentiable function.
The biggest advantage that it has over the step and linear functions is that it is non-linear. This is an incredibly useful feature of the sigmoid function.
This essentially means that when there are multiple neurons having the sigmoid function as their activation function, the output is non-linear as well.
The function ranges from 0 to 1 and has an S shape.
3) Sigmoid Function
4) Linear function
5) ReLU:
ReLU is the Rectified Linear Unit. It is the most widely used activation function. It is defined as: f(x) = max(0, x).
The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time.
5) ReLU function
6) Leaky Rectified Linear Unit
6) Leaky ReLU
 Leaky Rectified Linear Unit, or Leaky ReLU, is a type of
activation function based on a ReLU, but it has a small
slope for negative values instead of a flat slope.
7) Tanh (Hyperbolic Tangent): outputs any value between -1 and +1.
8) Softmax function: a variant of the sigmoid function used for multi-class classification.
Activation Functions
 In most cases you can use the ReLU activation function in the hidden layers. It is a bit faster to compute than other activation functions.
 For the output layer, the softmax activation function is generally a good choice for classification tasks.
It is very well known that the most fundamental unit of neural networks is called an artificial neuron (McCulloch-Pitts) / perceptron (Rosenblatt). But the very first step towards the neuron was taken in 1943 by McCulloch and Pitts, by mimicking the functionality of a biological neuron.
The neuron may be divided into 2 parts. The first part, g, takes an input and performs an aggregation; based on the aggregated value, the second part, f, makes a decision.
We can see that g(x) is just doing a sum of the inputs: a simple aggregation. And theta here is called the thresholding parameter (used by the activation function).

A single McCulloch-Pitts neuron can be used to represent boolean functions which are linearly separable.
Linear Separability: there exists a line (plane) such that all inputs which produce 1 lie on one side of the line (plane) and all other inputs, which produce 0, lie on the other side of the line (plane).
In 2-dimensional space the separating boundary is a line, in 3-dimensional space it is a plane, and in higher-dimensional spaces it is a hyperplane.
AND Gate: In this case, the decision boundary equation is x_1 + x_2 = 2. Here, the only input point that lies ON or ABOVE the boundary, namely (1,1), outputs 1 when passed through the AND-function M-P neuron. It fits! The decision boundary works!
This representation just denotes that, for the boolean inputs x_1, x_2 and x_3, if g(x), i.e., the sum, is ≥ theta, the neuron will fire; otherwise, it won't.
AND Gate
An AND-function neuron would only fire when ALL the inputs are ON, i.e., g(x) ≥ 3 here (for three inputs).
OR Gate: An OR-function neuron would fire when AT LEAST ONE of the inputs is ON, i.e., g(x) ≥ 1 here.

A Function With An Inhibitory Input

Let's verify that g(x), i.e., x_1 + x_2, would be ≥ 1 in only 3 cases:
Case 1: when x_1 is 1 and x_2 is 0
Case 2: when x_1 is 1 and x_2 is 1
Case 3: when x_1 is 0 and x_2 is 1
OR Function
The OR function's thresholding parameter theta is 1, for obvious reasons. The inputs are boolean, so only 4 combinations are possible: (0,0), (0,1), (1,0) and (1,1). Plotting them on a 2D graph and making use of the OR function's aggregation equation, i.e., x_1 + x_2 ≥ 1, we can draw the decision boundary as shown in the graph below.
ANDNOT Function using the McCulloch-Pitts Neuron
NOR Function: For a NOR neuron to fire, we want ALL the inputs to be 0, so the thresholding parameter should also be 0 and we take all the inputs as inhibitory.

NOT Function: For a NOT neuron, input 1 outputs 0 and input 0 outputs 1. So we take the input as an inhibitory input and set the thresholding parameter to 0.
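A small sketch of an M-P neuron covering the gates above (the function name mp_neuron and its calling convention are illustrative assumptions): any active inhibitory input suppresses the output; otherwise the neuron fires when the excitatory sum g(x) reaches the threshold theta.

def mp_neuron(excitatory, inhibitory, theta):
    # An active inhibitory input suppresses the neuron completely.
    if any(inhibitory):
        return 0
    # Fire when the aggregation g(x) meets the threshold theta.
    return 1 if sum(excitatory) >= theta else 0

# AND (2 inputs, theta = 2): fires only for (1, 1)
print([mp_neuron([a, b], [], 2) for a, b in [(0,0), (0,1), (1,0), (1,1)]])  # [0, 0, 0, 1]
# OR (2 inputs, theta = 1): fires when at least one input is on
print([mp_neuron([a, b], [], 1) for a, b in [(0,0), (0,1), (1,0), (1,1)]])  # [0, 1, 1, 1]
# NOT (one inhibitory input, theta = 0)
print([mp_neuron([], [a], 0) for a in (0, 1)])                              # [1, 0]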
Perceptrons - 1958 - Frank Rosenblatt
What about non-boolean (say, real) inputs? We need a way of learning the weights. What if some of the inputs have to be given more importance (weightage) than the others? And what about functions which are not linearly separable?
A perceptron is a more general computational model.
Main differences: the introduction of numerical weights for inputs, and a mechanism for learning these weights.

Mathematically: the neuron fires if x1w1 + x2w2 + x3w3 + ... > theta.

The input values are boolean for the McCulloch-Pitts neuron, whereas they can be real-valued for the perceptron.
Bias of a Neuron
• The bias w0 represents the prior (prejudice): v = u + w0
• u is the output of the linear combiner and v is the induced field of the neuron.
• The weights w1, w2, w3, …, wn and the bias w0 depend on the data.
PERCEPTRON

The perceptron model is a more general computational model than the McCulloch-Pitts neuron.
It takes an input, aggregates it (weighted sum) and returns 1 only if the aggregated sum is more than some threshold, else returns 0. Rewriting the threshold as shown above, making it a constant input with a variable weight, we end up with something like in the next slide.
A single perceptron can only be used to implement linearly separable functions. It takes both real and boolean inputs and associates a set of weights with them, along with a bias.
OR Function Using A Perceptron
• Though Boolean functions like AND and OR are linearly separable and are solvable using the perceptron, certain functions like XOR are not.
Limitations of Perceptrons:
● Perceptron works only with linearly separable classes.
● Perceptron can only classify linearly separable sets of vectors.
● The output values of a perceptron can take on only one of two values (0 or 1) due
to the hard-limit transfer function.
● Perceptron doesn’t have the ability to learn a simple logical function like
‘XOR’.
Multilayer Perceptrons
• A perceptron that has a single layer of weights cannot solve nonlinear problems such as XOR.
• Multilayer perceptrons (MLPs), however, can implement nonlinear discriminants.
• An MLP is a feedforward network with intermediate or hidden layers between the input and the output layers.

 The resulting ANN is called a Multi-Layer Perceptron (MLP).

 An MLP will have one or more hidden layers.
Multi Layer Perceptron (MLP)
Implementing an XOR operation using an MLP
• Any Boolean function can be implemented by a multilayer perceptron with one hidden layer.
• Any continuous function can be approximated with arbitrary accuracy by a multilayer perceptron.
• A Boolean function is implemented by having each conjunction realized by one hidden unit and the disjunction realized by the output unit.

• So two perceptrons can in parallel implement the two ANDs, and another perceptron on top can OR them together, as sketched below.
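A sketch of this construction with hand-picked (not learned) weights: each hidden unit computes one conjunction and the output unit computes the disjunction. The specific weight values below are illustrative assumptions.

import numpy as np

def step(z):
    return (z >= 0).astype(int)

# Hidden layer: h1 = x1 AND (NOT x2), h2 = (NOT x1) AND x2
W_hidden = np.array([[ 1, -1],     # weights into h1
                     [-1,  1]])    # weights into h2
b_hidden = np.array([-1, -1])      # thresholds rewritten as biases

# Output layer: OR of the two hidden units
w_out = np.array([1, 1]); b_out = -1

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x) + b_hidden)
    y = step(w_out @ h + b_out)
    print(x, "->", int(y))         # prints the XOR truth table: 0, 1, 1, 0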
How many hidden layers?
 For any application, the number of hidden layers and the number of nodes in each hidden layer is not fixed.

 They are varied till the output moves towards zero error, or till we get a satisfactory output.
Number of Neurons per Hidden Layer
 Usually the number of neurons in the input and output layers is determined by the type of input and output your task requires.
 For the hidden layers, the common practice is to size them to form a funnel, with fewer and fewer neurons at each layer.
 For example, a typical neural network may have two hidden layers, the first with 300 neurons and the second with 100.
 However, this practice is not as common now, and you may simply use the same size for all hidden layers; for example, all hidden layers with 150 neurons.
 The number of neurons can be gradually increased until the network starts overfitting.
Training a Perceptron
 Perceptrons are trained by considering the error made by the network.
 For every output neuron that produced a wrong prediction, the learning rule reinforces the connection weights from the inputs that would have contributed to the correct prediction.
Perceptron Learning Rule
 Perceptron learning rule (weight update):

Δwi,j = η (yj − ŷj) xi

Update = Learning Factor × (Desired Output − Actual Output) × Input

 • wi,j is the connection weight between the ith input neuron and the jth output neuron.
 • xi is the ith input value of the current training instance.
 • ŷj is the actual output of the jth output neuron for the current training instance.
 • yj is the target output of the jth output neuron for the current training instance.
 • η is the learning rate.
 This process is repeated till the error rate is close to zero.
Perceptron Learning Algorithm
Our goal is to find the w vector that can perfectly classify
positive inputs and negative inputs in our data.
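A sketch of this algorithm on a small linearly separable example (the AND function; the learning rate and epoch count are illustrative choices):

import numpy as np

# AND function: a linearly separable dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

eta = 0.1                 # learning rate
w = np.zeros(2)           # weight vector
b = 0.0                   # bias (the threshold, learned as a weight)

for epoch in range(20):   # repeat till the error rate is close to zero
    errors = 0
    for x_i, target in zip(X, y):
        y_hat = 1 if w @ x_i + b >= 0 else 0      # actual output
        update = eta * (target - y_hat)           # eta * (desired - actual)
        w += update * x_i                         # reinforce contributing inputs
        b += update
        errors += int(update != 0)
    if errors == 0:
        break

print("learned weights:", w, "bias:", b)

Because AND is linearly separable, this loop typically converges within a few epochs.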
Forward and Backward Propagation
 Forward propagation is the way of moving from the input layer (left) to the output layer (right) in the neural network. It is also called feedforward.

 The process of moving from right to left, i.e., backward from the output to the input layer, is called backward propagation.

 Backward propagation is required to correct the error, or, as it is generally said, to make the system learn.
Backpropagation
• Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration).
• Proper tuning of the weights reduces error rates and makes the model reliable by increasing its generalization.
• "Backpropagation" in neural networks is short for "backward propagation of errors." It is a standard method of training artificial neural networks. This method helps calculate the gradient of a loss function with respect to all the weights in the network.
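A minimal numerical sketch of backpropagation: a 2-4-1 network trained on XOR with sigmoid units and squared error. The architecture, learning rate and epoch count are illustrative assumptions, and convergence may vary with the random seed.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output
eta = 1.0

for epoch in range(5000):
    # Forward propagation
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward propagation of errors: chain rule through each layer
    d_out = (out - y) * out * (1 - out)   # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # error signal pushed back to the hidden layer
    W2 -= eta * h.T @ d_out; b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h;   b1 -= eta * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))  # should move towards [0, 1, 1, 0]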
Feedforward and Backward Propagation
Example: a neural network to determine whether a given input is a square, a circle or a triangle.
End of Machine Learning.

THANK YOU AND ALL THE BEST!!
