
DEEP LEARNING

UNIT-1-1(A)
Introduction to Deep Learning: Basics: Biological Neuron, Idea of computational
units, McCulloch Pitts unit and Thresholding logic, Linear Perceptron, Perceptron
Learning Algorithm, Linear separability. Convergence theorem for Perceptron Learning
Algorithm.

Biological Neuron & Idea of Computational Units:- A neuron is a special biological cell

that processes information. According to one estimate, the brain contains a huge number of
neurons, approximately 10^11, with numerous interconnections, approximately 10^15.

Working of a Biological Neuron


A typical neuron consists of the following four parts, with the help of which we can
explain its working −
 Dendrites − Tree-like branches responsible for receiving information from the
other neurons the cell is connected to. In a sense, we can say that they are like
the ears of the neuron.
 Soma − The cell body of the neuron, responsible for processing the
information received from the dendrites.
 Axon − A cable-like structure through which the neuron sends information.
 Synapses − The connections between an axon and the dendrites of other neurons.
Steps of neuron processing are:
1. Signals from connected neurons are collected by the dendrites.
2. The cell body (soma) sums the incoming signals (spatially and temporally).
3. When sufficient input is received (i.e., a threshold is exceeded), the neuron
generates an action potential or ‘spike’ (i.e., it ‘fires’).
4. That action potential is transmitted along the axon to other neurons, or to
structures outside the nervous system (e.g., muscles).
5. If sufficient input is not received (i.e., the threshold is not exceeded), the
inputs quickly decay and no action potential is generated.
6. Timing is clearly important: input signals must arrive together, and strong inputs
will generate more action potentials per unit time.

ARTIFICIAL NEURON:-

An artificial neuron is a connection point in an artificial neural network. Artificial


neural networks, like the human body's biological neural network, have a layered
architecture and each network node (connection point) has the capability to process
input and forward output to other nodes in the network.

An artificial neuron is a mathematical function conceived as a model of a


biological neuron. Artificial neurons are the elementary units of
an artificial neural network. An artificial neuron receives one or more inputs
(representing excitatory postsynaptic potentials and inhibitory postsynaptic
potentials at neural dendrites) and sums them to produce an output (or activation,
representing the neuron's action potential, which is transmitted along its axon). Usually
each input is separately weighted, and the sum is passed through a non-linear
function known as an activation function or transfer function. The transfer functions
usually have a sigmoid shape, but they may also take the form of other non-linear
functions.

Model of Artificial Neural Network


The processing of an ANN depends upon the following three building blocks −
 Network Topology
 Adjustments of Weights or Learning
 Activation Functions
In this section, we will discuss these three building blocks of ANN in detail.
Network Topology
A network topology is the arrangement of a network along with its nodes and
connecting lines. According to the topology, ANN can be classified as the following
kinds −
Feedforward Network
It is a non-recurrent network having processing units/nodes in layers, where all the nodes
in a layer are connected with the nodes of the previous layer. The connections carry
different weights. There is no feedback loop, meaning the signal can only
flow in one direction, from input to output. It may be divided into the following two
types −
 Single layer feedforward network − A feedforward ANN
having only one weighted layer. In other words, the input layer is
fully connected to the output layer.
 Multilayer feedforward network − A feedforward ANN
having more than one weighted layer. As this network has one or more layers
between the input and the output layer, these are called hidden layers.

Feedback Network
As the name suggests, a feedback network has feedback paths, which means the signal
can flow in both directions using loops. This makes it a non-linear dynamic system,
which changes continuously until it reaches a state of equilibrium. It may be divided
into the following types −
 Recurrent networks − They are feedback networks with closed loops.
An example is the following.
 Fully recurrent network − It is the simplest neural network architecture,
because all nodes are connected to all other nodes and each node works as both
input and output.
Adjustments of Weights or Learning
Learning, in artificial neural network, is the method of modifying the weights of
connections between the neurons of a specified network. Learning in ANN can be
classified into three categories namely supervised learning, unsupervised learning, and
reinforcement learning.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher.
This learning process is dependent on the teacher.
During the training of an ANN under supervised learning, the input vector is presented to
the network, which produces an output vector. This output vector is compared with the
desired output vector. An error signal is generated if there is a difference between the
actual and the desired output vector. On the basis of this error signal, the
weights are adjusted until the actual output matches the desired output.
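
For illustration, here is a minimal Python sketch of one such error-driven weight update; the specific update rule, learning rate lr, and step activation are assumptions for illustration, not prescribed by the text above.

import numpy as np

def train_step(w, b, x, desired, lr=0.1):
    # Present the input vector and compute the network's actual output.
    actual = 1.0 if np.dot(w, x) + b > 0 else 0.0
    # Error signal: difference between desired and actual output.
    error = desired - actual
    # Adjust weights and bias on the basis of the error signal.
    w = w + lr * error * x
    b = b + lr * error
    return w, b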

Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a
teacher. This learning process is independent.
During the training of ANN under unsupervised learning, the input vectors of similar
type are combined to form clusters. When a new input pattern is applied, then the
neural network gives an output response indicating the class to which the input pattern
belongs.
There is no feedback from the environment as to what the desired output should be
or whether it is correct or incorrect. Hence, in this type of learning, the network itself must
discover the patterns and features in the input data, and the relation between the input
data and the output.

Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the
network based on some critic information. This learning process is similar to supervised
learning; however, we might have much less information.
During the training of network under reinforcement learning, the network receives
some feedback from the environment. This makes it somewhat similar to supervised
learning. However, the feedback obtained here is evaluative not instructive, which
means there is no teacher as in supervised learning. After receiving the feedback, the
network performs adjustments of the weights to get better critic information in future.
Activation Functions
An activation function may be defined as the extra transformation applied over the net
input to obtain the desired output. The following are some activation functions.
Activation functions are functions used in a neural network that transform the weighted
sum of inputs and biases, and the result is in turn used to decide whether a neuron can be
activated or not. The activation function manipulates the presented data and produces an
output for the neural network that contains the parameters in the data. Activation functions
are also referred to as transfer functions in some literature. They can be either linear or
nonlinear, depending on the function they represent, and are used to control the outputs
of neural networks across different domains.

For a linear model, a linear mapping of an input function to output is performed in the
hidden layers before the final prediction for each label is given. The input
vector x transformation is given by

f(x) = w^T · x + b

where, x = input, w = weight, and b = bias.

Linear results are produced from the mappings of the above equation, and the need for
the activation function arises here: first to convert these linear outputs into non-linear
outputs for further computation, and then to learn the patterns in the data. The output of
these models is given by

y = (w1 x1 + w2 x2 + … + wn xn + b)

These outputs of each layer are fed into the next subsequent layer for multilayered
networks until the final output is obtained, but they are linear by default. The expected
output is said to determine the type of activation function that has to be deployed in a
given network.
However, since these outputs are linear in nature, nonlinear activation functions are
required to convert them into non-linear outputs. These transfer functions,
applied to the outputs of the linear models, produce transformed non-linear
outputs that are ready for further processing. The non-linear output after the application of
the activation function is given by

y = α (w1 x1 + w2 x2 + … + wn xn + b)

where α is the activation function.
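
As a concrete illustration, here is a minimal Python sketch of this computation; the input, weight, and bias values are made-up examples, and sigmoid stands in for α.

import numpy as np

def sigmoid(z):
    # A common choice for the activation function alpha.
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs x1..xn (example values)
w = np.array([0.4, 0.3, -0.2])   # weights w1..wn (example values)
b = 0.1                          # bias

z = np.dot(w, x) + b             # linear output: w1*x1 + ... + wn*xn + b
y = sigmoid(z)                   # non-linear output: y = alpha(z)
print(z, y)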

Why Activation Functions?

The need for these activation functions includes converting the linear input signals
and models into non-linear output signals, which aids the learning of high-order
polynomials in deeper networks.

How to use it?

In a neural network every neuron will do two computations:


 Linear summation of inputs: Suppose a neuron has inputs x1, x2, … with
weights w1, w2, …, and bias b. The linear sum is z = w1 x1 + w2 x2 + … + wn xn + b.
 Activation computation: This computation decides whether a neuron should be
activated or not, by passing the weighted sum (plus bias) through the activation
function. The purpose of the activation function is to introduce non-linearity into the
output of a neuron.

Most neural networks begin by computing the weighted sum of the inputs. Each node
in the layer can have its own unique weights. However, the activation function is
the same across all nodes in the layer. Activation functions typically have a fixed form,
whereas the weights are the learned parameters.

Types of Activation Functions

Activation Functions in Neural Networks


 Linear or Identity Activation Function.
 Non-linear Activation Function.
 Sigmoid or Logistic Activation Function.
 Tanh or hyperbolic tangent Activation Function.
 ReLU (Rectified Linear Unit) Activation Function.
 Leaky ReLU.

1) Linear Activation Functions

A linear function is also known as a straight-line function where the activation is


proportional to the input i.e. the weighted sum from neurons. It has a simple function
with the equation:

f(x) = ax + c
The problem with this activation is that its output is not confined to a specific range.
Applying this function in all the nodes makes the network work like linear
regression: the final layer of the neural network will be a linear function
of the first layer. Another issue arises with gradient descent: when the function is
differentiated, it has a constant output, which is a problem because during
backpropagation the rate of change of error is then constant, and this undermines the
weight updates and the logic of backpropagation.

2) Non-Linear Activation Functions

Non-linear functions are the most used activation functions. They make
it easy for a neural network model to adapt to a variety of data and to differentiate
between the outcomes.

These functions are mainly divided on the basis of their range or curves:

a) Sigmoid Activation Functions

Sigmoid takes a real value as input and outputs another value between 0 and 1.
The sigmoid activation function translates an input in the range (-∞, ∞) to the range
(0, 1).

b) Tanh Activation Functions

The tanh function is just another possible function that can be used as a non-linear
activation function between layers of a neural network. It shares a few things in
common with the sigmoid activation function. Unlike a sigmoid function that will
map input values between 0 and 1, the Tanh will map values between -1 and 1.
Similar to the sigmoid function, one of the interesting properties of the tanh function
is that the derivative of tanh can be expressed in terms of the function itself.
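Concretely, tanh'(x) = 1 - tanh^2(x), just as the sigmoid satisfies σ'(x) = σ(x)(1 - σ(x)).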
c) ReLU Activation Functions

The formula is deceptively simple: max(0, z). Despite its name, the Rectified Linear Unit
is not linear, and it provides similar benefits to sigmoid but with better performance.

(i) Leaky ReLU

Leaky ReLU is a variant of ReLU. Instead of being 0 when z < 0, a leaky ReLU allows a
small, non-zero, constant gradient α (normally, α = 0.01). However, the consistency of
the benefit across tasks is presently unclear. Leaky ReLUs attempt to fix the “dying
ReLU” problem.

(ii) Parametric ReLU

PReLU gives the neurons the ability to choose what slope is best in the negative
region. They can become ReLU or leaky ReLU with certain values of α.

d) Maxout:

The Maxout activation is a generalization of the ReLU and the leaky ReLU functions.
It is a piecewise linear function that returns the maximum of inputs, designed to be
used in conjunction with the dropout regularization technique. Both ReLU and leaky
ReLU are special cases of Maxout. The Maxout neuron, therefore, enjoys all the
benefits of a ReLU unit and does not have any drawbacks like dying ReLU. However,
it doubles the total number of parameters for each neuron, and hence, a higher total
number of parameters need to be trained.

e) ELU

The Exponential Linear Unit (ELU) is a function that tends to converge faster and
produce more accurate results. Unlike other activation functions, ELU has an extra
alpha constant, which should be a positive number. ELU is very similar to ReLU
except for negative inputs: both have the identity form for non-negative inputs, but for
negative inputs ELU smoothly saturates towards -α, whereas ReLU changes sharply
at zero.

f) Softmax Activation Functions

The softmax function calculates a probability distribution over ‘n’
different events. In a general way, this function calculates the probability of each
target class over all possible target classes. The calculated probabilities then help
determine the target class for the given inputs.
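
For reference, here are minimal NumPy sketches of the activation functions discussed above; the vectorized forms and default constants are illustrative assumptions.

import numpy as np

def linear(z, a=1.0, c=0.0):       # f(z) = a*z + c
    return a * z + c

def sigmoid(z):                    # maps (-inf, inf) to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                       # maps (-inf, inf) to (-1, 1)
    return np.tanh(z)

def relu(z):                       # max(0, z)
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):     # small constant slope alpha for z < 0
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):             # smoothly saturates towards -alpha for z < 0
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def softmax(z):                    # probability distribution over n classes
    e = np.exp(z - np.max(z))      # shift by max(z) for numerical stability
    return e / e.sum()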

When to use which Activation Function in a Neural Network?

Specifically, it depends on the problem type and the value range of the expected
output. For example, to predict values that are larger than 1, tanh or sigmoid are not
suitable to be used in the output layer, instead, ReLU can be used. On the other hand,
if the output values have to be in the range (0,1) or (-1, 1) then ReLU is not a good
choice, and sigmoid or tanh can be used here. While performing a classification task
and using the neural network to predict a probability distribution over the mutually
exclusive class labels, the softmax activation function should be used in the last layer.
However, regarding the hidden layers, as a rule of thumb, use ReLU as an activation
for these layers.

In the case of a binary classifier, the sigmoid activation function should be used in the
output layer. The sigmoid and tanh activation functions often perform poorly in hidden
layers; for hidden layers, ReLU or its improved variant, leaky ReLU, should be
used. For a multiclass classifier, softmax is the most used output activation function.
Though more activation functions are known, these are the most commonly used
ones.
Difference between ANN and BNN:-
1. Artificial Neural Network : An Artificial Neural Network (ANN) is a type of neural
network based on a feed-forward strategy. It is called this because it passes
information through the nodes continuously till it reaches the output node. It
is also known as the simplest type of neural network. Some advantages of ANN :
 Ability to learn irrespective of the type of data (linear or non-linear).
 ANN is highly volatile and serves best in financial time series forecasting.
Some disadvantages of ANN :
 The simple architecture makes it difficult to explain the behavior of the
network.
 This network is dependent on hardware.
2. Biological Neural Network : A Biological Neural Network (BNN) is a structure
that consists of synapses, dendrites, a cell body, and an axon. In this neural network, the
processing is carried out by neurons. Dendrites receive signals from other neurons,
the soma sums all the incoming signals, and the axon transmits the signals to other cells.

Some advantages of BNN :


 The synapses are the input processing element.
 It is able to process highly complex parallel inputs.
Some disadvantages of BNN :
 There is no controlling mechanism.
 The speed of processing is slow because it is complex.

Differences between ANN and BNN :

Parameters              ANN                                             BNN
Structure               input, weight, output, hidden layer             dendrites, synapse, axon, cell body
Learning                very precise structures and formatted data      they can tolerate ambiguity
Processor               complex; high speed; one or a few               simple; low speed; a large number
Memory                  separate from the processor; localized;         integrated into the processor; distributed;
                        non-content addressable                         content-addressable
Computing               centralized; sequential; stored programs        distributed; parallel; self-learning
Reliability             very vulnerable                                 robust
Expertise               numerical and symbolic manipulations            perceptual problems
Operating Environment   well-defined; well-constrained                  poorly defined; un-constrained
Fault Tolerance         the potential for fault tolerance               performance degrades even with partial damage

McCulloch–Pitts NEURAL NETWORK:-

The McCulloch–Pitts neural network is considered to be the first neural network. The
neurons are connected by directed weighted paths. A McCulloch–Pitts neuron allows only
binary activation (1 ON or 0 OFF), i.e., it either fires with an activation of 1 or does
not fire with an activation of 0.

A connection path in the network is said to be excitatory if its weight is positive;


otherwise it is known as inhibitory. Excitatory connections have positive weights and
inhibitory connections have negative weights. Each neuron has a fixed threshold for
firing: if the net input to the neuron reaches the threshold, it fires.
Geometric Interpretation of the McCulloch & Pitts Model
Let us understand the geometric interpretation of the model using the following
functions.
OR Function
We know that the thresholding parameter for the OR function is 1, i.e. theta is 1. The
possible combinations of inputs are: (0,0), (0,1), (1,0), and (1,1). Considering the OR
function’s aggregation equation, i.e. x_1 + x_2 ≥ 1, let us plot the graph.

All inputs that lie ON or ABOVE the line x_1 + x_2 = 1 produce output 1 (positive)
when passed through the OR-function M-P neuron, and all inputs that lie BELOW that
line produce output 0 (negative).
Therefore, the McCulloch–Pitts model creates a linear decision boundary which
splits the inputs into two classes, positive and negative.
AND Function
Similar to the OR function, we can plot the graph for the AND function, considering the
equation x_1 + x_2 = 2.

Here, the decision boundary separates the input points: only (1,1) lies ON or ABOVE
the line and gives output 1 when passed through the AND function.
From these examples, we can understand that as the number of inputs increases,
the number of dimensions plotted on the graph also increases. This means that if
we consider 3 inputs with the OR function, we plot the graph in a three-dimensional
(3D) space and draw a decision boundary in 3 dimensions (a plane).
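
Here is a minimal Python sketch of a McCulloch–Pitts unit, reproducing the OR (theta = 1) and AND (theta = 2) examples above:

# McCulloch-Pitts unit: binary inputs, fixed threshold theta,
# output 1 iff the sum of inputs reaches the threshold.
def mp_neuron(inputs, theta):
    return 1 if sum(inputs) >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2),
              "OR:", mp_neuron([x1, x2], theta=1),   # fires when x1 + x2 >= 1
              "AND:", mp_neuron([x1, x2], theta=2))  # fires only for (1, 1)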

PERCEPTRON:-
A Perceptron is an Artificial Neuron

The Perceptron is a building block of an Artificial Neural Network. Frank Rosenblatt


invented the Perceptron in 1957, in the middle of the 20th century, for performing
certain computations on input data. The Perceptron is a linear machine learning
algorithm used for supervised learning of various binary classifiers. The algorithm
enables neurons to learn, processing training elements one by one. In this section, we will
discuss the Perceptron and its basic functions in brief. Let's start
with the basic introduction of the Perceptron.
Scientists had discovered that brain cells (Neurons) receive input from our senses by
electrical signals.

The Neurons, then again, use electrical signals to store information, and to make
decisions based on previous input.

Rosenblatt had the idea that Perceptrons could simulate brain principles, with the ability
to learn and make decisions.

Perceptron model is also treated as one of the best and simplest types of Artificial
Neural networks. However, it is a supervised learning algorithm of binary classifiers.
Hence, we can consider it as a single-layer neural network with four main parameters,
i.e., input values, weights and Bias, net sum, and an activation function.

Differences between the McCulloch & Pitts Model and the Perceptron Model


1. The McCulloch–Pitts model accepts only boolean inputs, whereas the Perceptron
model can process inputs in various real-valued forms.
2. In the McCulloch–Pitts model the inputs are not weighted, which means that the
model is not flexible. In comparison, the Perceptron model
accepts weights with respect to the provided inputs, which makes it much more
flexible.
Similarities between the McCulloch & Pitts Model and the Perceptron Model
1. Both models can handle linearly separable data.
2. The thresholds can be adjusted in both models so that they can fit the respective
datasets.

The Perceptron

The original Perceptron was designed to take a number of binary inputs, and
produce one binary output (0 or 1).

The idea was to use different weights to represent the importance of each input, and
that the sum of the values should be greater than a threshold value before making a
decision like true or false (0 or 1).
Basic Components of Perceptron

Frank Rosenblatt invented the perceptron model as a binary classifier which
contains three main components. These are as follows:

o Input Nodes or Input Layer:

This is the primary component of the Perceptron, which accepts the initial data into the
system for further processing. Each input node contains a real numerical value.

o Weight and Bias:

The weight parameter represents the strength of the connection between units. This is
another important parameter of the Perceptron's components. A weight is directly
proportional to the strength of the associated input neuron in deciding the output.
Further, the bias can be considered as the intercept in a linear equation.

o Activation Function:

The activation function decides whether the neuron fires or not, based on the weighted
sum of the inputs; in the classic perceptron this is a step (threshold) function.
How does Perceptron work?

In Machine Learning, Perceptron is considered as a single-layer neural network that


consists of four main parameters named input values (Input nodes), weights and Bias,
net sum, and an activation function. The perceptron model begins with the
multiplication of all input values and their weights, then adds these values together to
create the weighted sum. Then this weighted sum is applied to the activation function
'f' to obtain the desired output. This activation function is also known as the step
function and is represented by 'f'.

This step function or activation function plays a vital role in ensuring that the output is
mapped between required values such as (0, 1) or (-1, 1). It is important to note that the
weight of an input is indicative of the strength of a node. Similarly, an input's bias value
gives the ability to shift the activation function curve up or down.

Perceptron model works in two important steps as follows:

Step-1

In the first step, multiply all input values with their corresponding weight values and
then add the products to determine the weighted sum. Mathematically, we can calculate
the weighted sum as follows:

∑wi*xi = x1*w1 + x2*w2 + … + xn*wn


Add a special term called bias 'b' to this weighted sum to improve the model's
performance.

∑wi*xi + b

Step-2

In the second step, the activation function 'f' is applied to the weighted sum from
Step-1, which decides whether the neuron fires (output 1) or not (output 0):

Y = f(∑wi*xi + b)
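Putting the two steps together, here is a minimal Python sketch; the example inputs, weights, and bias are illustrative assumptions.

import numpy as np

def perceptron_output(x, w, b):
    z = np.dot(w, x) + b        # Step-1: weighted sum plus bias
    return 1 if z > 0 else 0    # Step-2: step activation function f

x = np.array([1.0, 0.0, 1.0])   # example inputs
w = np.array([0.7, 0.6, 0.5])   # example weights
print(perceptron_output(x, w, b=-1.0))
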
Types of Perceptron Models

Based on the layers, Perceptron models are divided into two types. These are as
follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model

Single Layer Perceptron Model:

This is one of the easiest types of artificial neural networks (ANNs). A single-layered
perceptron model consists of a feed-forward network and also includes a threshold transfer
function inside the model. The main objective of the single-layer perceptron model is
to analyze linearly separable objects with binary outcomes.

In a single-layer perceptron model, the algorithm does not start with recorded data; it
begins with randomly allocated values for the weight parameters. Further, it sums up all the
weighted inputs. If the total sum of all inputs is more than a
pre-determined value, the model gets activated and shows the output value as +1.

If the outcome matches the desired or threshold value, then the performance of


this model is stated as satisfied, and the weights do not change. However, this
model produces errors when multiple weighted input values
are fed into it. Hence, to find the desired output and minimize errors, some
changes to the weights are necessary.

"Single-layer perceptron can learn only linearly separable patterns."

Multi-Layered Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model also has the
same model structure but has a greater number of hidden layers.

The multi-layer perceptron model is commonly trained with the backpropagation algorithm,


which executes in two stages as follows:
o Forward Stage: Activations propagate from the input layer through the hidden
layers and terminate at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified
as per the model's requirement. In this stage, the error between the actual output
and the desired output is propagated backward, starting at the output layer and
ending at the input layer.

Hence, a multi-layered perceptron model can be considered as multiple artificial neural


network layers in which the activation function does not remain linear, unlike in a
single-layer perceptron model. Instead of a linear function, the activation function can be
sigmoid, tanh, ReLU, etc.

A multi-layer perceptron model has greater processing power and can process linear
and non-linear patterns. Further, it can also implement logic gates such as AND, OR,
XOR, NAND, NOT, XNOR, NOR.
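
As an illustration of the XOR case (which no single perceptron can represent), here is a minimal two-layer sketch in Python; the hand-picked weights are an assumption chosen to realize XOR and are not from the text.

def step(z):
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: two perceptrons computing OR and NAND (hand-picked weights).
    h1 = step(x1 + x2 - 0.5)          # OR
    h2 = step(-x1 - x2 + 1.5)         # NAND
    # Output layer: AND of the two hidden units gives XOR.
    return step(h1 + h2 - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print((a, b), xor_mlp(a, b))  # prints 1 only for (0,1) and (1,0)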

Advantages of Multi-Layer Perceptron:

o A multi-layered perceptron model can be used to solve complex non-linear


problems.
o It works well with both small and large input data.
o It helps us to obtain quick predictions after the training.
o It helps to obtain the same accuracy ratio with large as well as small data.

Disadvantages of Multi-Layer Perceptron:

o In Multi-layer perceptron, computations are difficult and time-consuming.


o In a multi-layer Perceptron, it is difficult to predict how much each independent
variable affects the dependent variable.
o The model functioning depends on the quality of the training.

Perceptron Function

The Perceptron function 'f(x)' is achieved as output by multiplying the input 'x' with
the learned weight coefficient 'w' and adding the bias 'b'.

Mathematically, we can express it as follows:


f(x)=1; if w.x+b>0

otherwise, f(x)=0

o 'w' represents real-valued weights vector


o 'b' represents the bias
o 'x' represents a vector of input x values.

Characteristics of Perceptron

The perceptron model has the following characteristics.

1. Perceptron is a machine learning algorithm for supervised learning of binary


classifiers.
2. In Perceptron, the weight coefficient is automatically learned.
3. Initially, weights are multiplied with input features, and the decision is made
whether the neuron is fired or not.
4. The activation function applies a step rule to check whether the weighted sum
is greater than zero.
5. The linear decision boundary is drawn, enabling the distinction between the
two linearly separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, it must
have an output signal; otherwise, no output will be shown.

Limitations of Perceptron Model

A perceptron model has limitations as follows:

o The output of a perceptron can only be a binary number (0 or 1) due to the hard
limit transfer function.
o Perceptron can only be used to classify the linearly separable sets of input
vectors. If input vectors are non-linear, it is not easy to classify them properly.

Future of Perceptron

The future of the Perceptron model is bright and significant, as it helps to
interpret data by building intuitive patterns and applying them in the future. Machine
learning is a rapidly growing technology of Artificial Intelligence that is continuously
evolving; hence perceptron technology will continue to support and facilitate analytical
behavior in machines that will, in turn, add to the efficiency of computers.

The perceptron model is continuously becoming more advanced and working


efficiently on complex problems with the help of artificial neurons.

Perceptron Example

Imagine a perceptron (in your brain).

The perceptron tries to decide if you should go to a concert.

Is the artist good? Is the weather good?

What weights should these facts have?

Criteria             Input          Weight

Artist is Good       x1 = 0 or 1    w1 = 0.7
Weather is Good      x2 = 0 or 1    w2 = 0.6
Friend will Come     x3 = 0 or 1    w3 = 0.5
Food is Served       x4 = 0 or 1    w4 = 0.3
Alcohol is Served    x5 = 0 or 1    w5 = 0.4

The Perceptron Algorithm

Frank Rosenblatt suggested this algorithm:

1. Set a threshold value


2. Multiply all inputs by their weights
3. Sum all the results
4. Activate the output

1. Set a threshold value:

 Threshold = 1.5

2. Multiply all inputs by their weights:

 x1 * w1 = 1 * 0.7 = 0.7
 x2 * w2 = 0 * 0.6 = 0
 x3 * w3 = 1 * 0.5 = 0.5
 x4 * w4 = 0 * 0.3 = 0
 x5 * w5 = 1 * 0.4 = 0.4

3. Sum all the results:

 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (The Weighted Sum)

4. Activate the Output:

 Return true if the sum > 1.5 ("Yes I will go to the Concert")
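
The same worked example expressed as a minimal Python sketch:

inputs  = [1, 0, 1, 0, 1]            # artist, weather, friend, food, alcohol
weights = [0.7, 0.6, 0.5, 0.3, 0.4]  # node weights from the table above
threshold = 1.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))  # 1.6
print(weighted_sum > threshold)      # True -> "Yes I will go to the Concert"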

Perceptron Terminology

 Perceptron Inputs
 Node values
 Node Weights
 Activation Function
Perceptron Inputs

Perceptron inputs are called nodes.

The nodes have both a value and a weight.

Node Values

In the example above, the node values are: 1, 0, 1, 0, 1

The binary input values (0 or 1) can be interpreted as (no or yes) or (false or true).

Node Weights

Weights show the strength of each node.

In the example above, the node weights are: 0.7, 0.6, 0.5, 0.3, 0.4

The Activation Function

The activation function maps the result (the weighted sum) into a required value like
0 or 1.

In the example above, the activation function is simple: (sum > 1.5)

The binary output (1 or 0) can be interpreted as (yes or no) or (true or false).

Note

It is obvious that a decision is NOT made by one neuron alone.

Other neurons must provide input: Is the artist good? Is the weather good? ...

In neuroscience, there is a debate over whether single-neuron encoding or distributed


encoding is most relevant for understanding brain functions.
Neural Networks
The Perceptron defines the first step into Neural Networks.

Multi-Layer Perceptrons can be used for very sophisticated decision making.

In the Neural Network Model, input data (yellow) are processed against a hidden
layer (blue) and modified against more hidden layers (green) to produce the final
output (red).

The First Layer:


The 3 yellow perceptrons are making 3 simple decisions based on the input evidence.
Each single decision is sent to the 4 perceptrons in the next layer.

The Second Layer:


The blue perceptrons are making decisions by weighing the results from the first
layer. This layer makes more complex decisions at a more abstract level than the first
layer.

The Third Layer:


Even more complex decisions are made by the green perceptrons.

A perceptron can be trained to recognize the points above a line, without knowing the
formula for the line.
A Perceptron is often used to classify data into two parts.

A Perceptron is also known as a Linear Binary Classifier.

The perceptron model is a more general computational model than the McCulloch-Pitts


neuron. It takes an input, aggregates it (weighted sum) and returns 1 only if the
aggregated sum exceeds some threshold, else returns 0. Rewriting the threshold and
treating it as a constant input with a variable weight (the bias), we end up
with the perceptron function given earlier.
A single perceptron can only be used to implement linearly separable functions. It
takes both real and boolean inputs and associates a set of weights to them, along with
a bias (the threshold mentioned above). We learn the weights, and we get the
function. Let's use a perceptron to learn an OR function.

OR Function Using A Perceptron
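
Here is a minimal sketch of the perceptron learning algorithm applied to the OR function; the learning rate, initial weights, and epoch count are illustrative assumptions.

import numpy as np

# OR truth table: inputs and desired outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 1, 1, 1])

w = np.zeros(2)   # initial weights (assumed)
b = 0.0           # bias, playing the role of the learned threshold
lr = 0.1          # learning rate (assumed)

for epoch in range(20):                       # perceptron learning rule
    for x, t in zip(X, T):
        y = 1 if np.dot(w, x) + b > 0 else 0  # current prediction
        w += lr * (t - y) * x                 # adjust weights on error
        b += lr * (t - y)

print(w, b)  # a separating line for OR; converges since OR is linearly separable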
