DL Unit 1a
UNIT-1-1(A)
Introduction to Deep Learning: Basics: Biological Neuron, Idea of computational
units, McCulloch Pitts unit and Thresholding logic, Linear Perceptron, Perceptron
Learning Algorithm, Linear separability. Convergence theorem for Perceptron Learning
Algorithm.
ARTIFICIAL NEURON:-
Feedback Network
As the name suggests, a feedback network has feedback paths, which means the signal
can flow in both directions using loops. This makes it a non-linear dynamic system,
which changes continuously until it reaches a state of equilibrium. It may be divided
into the following types −
Recurrent networks − These are feedback networks with closed loops. The simplest
example is the fully recurrent network, in which every node is connected to every
other node and each node works as both input and output.
Adjustments of Weights or Learning
Learning, in an artificial neural network, is the process of modifying the weights of
the connections between the neurons of a specified network. Learning in an ANN can be
classified into three categories: supervised learning, unsupervised learning, and
reinforcement learning.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher,
so the learning process is dependent on that teacher.
During the training of an ANN under supervised learning, the input vector is presented to
the network, which produces an output vector. This output vector is compared with the
desired output vector, and an error signal is generated if the two differ. On the basis
of this error signal, the weights are adjusted until the actual output matches the
desired output.
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a
teacher, so the learning process is independent.
During the training of an ANN under unsupervised learning, input vectors of a similar
type are combined to form clusters. When a new input pattern is applied, the neural
network gives an output response indicating the class to which the input pattern
belongs.
There is no feedback from the environment about what the desired output should be or
whether it is correct. Hence, in this type of learning, the network itself must
discover the patterns and features in the input data and the relation between the input
data and the output.
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the
network based on some critic information. The learning process is similar to supervised
learning, but much less information is available.
During the training of a network under reinforcement learning, the network receives
some feedback from the environment, which makes it somewhat similar to supervised
learning. However, the feedback obtained here is evaluative rather than instructive,
which means there is no teacher as in supervised learning. After receiving the feedback,
the network adjusts its weights so as to obtain better critic information in the future.
Activation Functions
An activation function may be thought of as an extra transformation applied to the
weighted input in order to obtain the desired output. In an ANN, activation functions
are applied in this way to produce the final output. The following paragraphs describe
some common activation functions.
An activation function takes the weighted sum of a neuron's inputs and bias and uses it
to decide whether the neuron should be activated or not. It shapes the presented data
and produces the output that the neuron passes on to the rest of the network. Activation
functions are also referred to as transfer functions in some literature. They can be
either linear or nonlinear, depending on the function they represent, and are used to
control the outputs of neural networks across different domains.
For a linear model, a linear mapping from input to output is performed in the hidden
layers before the final prediction for each label is given. The transformation of the
input vector x is given by
f(x) = wᵀ · x + b
Linear results are produced from the mapping in the above equation, and the need for an
activation function arises here: first to convert these linear outputs into non-linear
outputs for further computation, and then to learn the patterns in the data. The output
of such a model is given by
y = (w1 x1 + w2 x2 + … + wn xn + b)
In multilayered networks, the outputs of each layer are fed into the next layer until
the final output is obtained, but by default they remain linear. The expected output
determines the type of activation function that has to be deployed in a given network.
However, since these outputs are linear in nature, nonlinear activation functions are
required to convert them into non-linear outputs. These transfer functions are applied
to the outputs of the linear models, and the transformed non-linear outputs are then
ready for further processing. The non-linear output after the application of the
activation function is given by
y = α (w1 x1 + w2 x2 + … + wn xn + b)
The need for these activation functions includes converting the linear input signals
and models into non-linear output signals, which aids the learning of high order
polynomials for deeper networks.
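To make this concrete, here is a minimal Python/NumPy sketch: a linear mapping
wᵀ·x + b followed by a sigmoid used as the non-linearity α. The input vector,
weights and bias values are illustrative assumptions, not values from these notes.

import numpy as np

def linear(x, w, b):
    # y = w1*x1 + w2*x2 + ... + wn*xn + b
    return np.dot(w, x) + b

def sigmoid(z):
    # non-linear "squashing" of the linear output into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # example input vector
w = np.array([0.2, 0.4, -0.1])   # example weights
b = 0.1                          # example bias

z = linear(x, w, b)              # linear output
y = sigmoid(z)                   # non-linear output y = α(w1*x1 + ... + wn*xn + b)
print(z, y)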
Most neural networks begin by computing the weighted sum of the inputs. Each node
in the layer can have its own unique weights, but the activation function is the same
across all nodes in the layer. Activation functions are typically of a fixed form,
whereas the weights are the learned parameters.
The simplest choice is a linear activation function of the form
f(x) = ax + c
The problem with this activation is that its output is not confined to a specific range.
Applying this function in all the nodes makes the network behave like linear regression:
the final layer of the neural network is just a linear function of the first layer.
Another issue arises with gradient descent: when this function is differentiated, the
result is a constant, so during backpropagation the rate of change of the error is
constant and carries no information about the input, which ruins the output and the
logic of backpropagation.
Non-linear functions are the most widely used activation functions. They make it easy
for a neural network model to adapt to a variety of data and to differentiate between
the outcomes.
a) Sigmoid Activation Function
Sigmoid takes a real value as input and outputs another value between 0 and 1:
σ(x) = 1 / (1 + e^(-x)). The sigmoid activation function translates an input in the
range (-∞, ∞) to the range (0, 1).
b) Tanh Activation Function
The tanh function is another possible non-linear activation function between the layers
of a neural network. It shares a few things in common with the sigmoid activation
function, but unlike the sigmoid, which maps input values to the range (0, 1), tanh maps
values to the range (-1, 1). Similar to the sigmoid, one of the interesting properties
of tanh is that its derivative can be expressed in terms of the function itself:
tanh'(x) = 1 - tanh(x)².
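A small sketch of both functions and their derivatives, using the standard identities
stated above (the sample points are arbitrary):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # output in (0, 1)

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)                 # σ'(z) = σ(z)(1 - σ(z))

def tanh(z):
    return np.tanh(z)                    # output in (-1, 1)

def tanh_prime(z):
    t = np.tanh(z)
    return 1.0 - t ** 2                  # tanh'(z) = 1 - tanh(z)^2

z = np.linspace(-3, 3, 7)
print(sigmoid(z), sigmoid_prime(z))
print(tanh(z), tanh_prime(z))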
c) ReLU Activation Functions
The formula is deceptively simple: max(0,z). Despite its name, Rectified Linear Units,
it’s not linear and provides the same benefits as Sigmoid but with better performance.
Leaky ReLU is a variant of ReLU. Instead of being 0 when z < 0, a leaky ReLU allows a
small, non-zero, constant gradient α (normally α = 0.01). However, the consistency of
this benefit across tasks is presently unclear. Leaky ReLUs attempt to fix the “dying
ReLU” problem.
PReLU gives the neurons the ability to choose what slope is best in the negative
region. They can become ReLU or leaky ReLU with certain values of α.
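A sketch of the rectifier family follows; α = 0.01 is the usual leaky ReLU default
mentioned above, while the PReLU α shown is just a placeholder for what would in
practice be a learned parameter.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)               # max(0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)    # small constant slope for z < 0

def prelu(z, alpha):
    # same shape as leaky ReLU, but alpha is learned per neuron/channel
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), prelu(z, alpha=0.25))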
d) Maxout:
The Maxout activation is a generalization of the ReLU and leaky ReLU functions. It is a
piecewise linear function that returns the maximum of several linear functions of the
input, designed to be used in conjunction with the dropout regularization technique.
Both ReLU and leaky ReLU are special cases of Maxout. The Maxout neuron therefore enjoys
all the benefits of a ReLU unit and does not have drawbacks like dying ReLU. However, it
doubles the number of parameters for each neuron, and hence a higher total number of
parameters needs to be trained.
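A minimal sketch of a Maxout unit with two linear pieces (all weight values below are
made-up examples; with suitable weights the unit reduces to ReLU or leaky ReLU):

import numpy as np

def maxout(x, W, b):
    # W has shape (k, n_inputs), b has shape (k,); returns max over k affine pieces
    return np.max(W @ x + b)

x = np.array([0.5, -1.0])
W = np.array([[ 0.3, 0.8],      # piece 1
              [-0.2, 0.1]])     # piece 2 (k = 2 doubles the parameters per neuron)
b = np.array([0.0, 0.1])
print(maxout(x, W, b))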
e) ELU
The Exponential Linear Unit or ELU is a function that tends to converge faster and
produce more accurate results. Unlike other activation functions, ELU has an extra
alpha constant, which should be a positive number. ELU is very similar to ReLU except
for negative inputs: both have the identity form for non-negative inputs. For negative
inputs, however, ELU bends smoothly until its output approaches -α, whereas ReLU has a
sharp corner at zero.
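A sketch of ELU, assuming α = 1 for illustration:

import numpy as np

def elu(z, alpha=1.0):
    # identity for z >= 0, smooth exponential curve saturating at -alpha for z < 0
    return np.where(z >= 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-3.0, -1.0, 0.0, 2.0])
print(elu(z))        # large negative inputs approach -alpha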
f) Softmax
The Softmax function calculates the probability distribution over 'n' different events.
In general, this function calculates the probabilities of each target class over all
possible target classes. The calculated probabilities then help determine the target
class for the given inputs.
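A sketch of the softmax computation over a few example class scores (the scores are
arbitrary; subtracting the maximum is a common numerical-stability trick that does not
change the result):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()               # probabilities over all target classes

scores = np.array([2.0, 1.0, 0.1])   # example class scores
probs = softmax(scores)
print(probs, probs.sum())            # sums to 1; the largest score gets the largest probability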
Specifically, it depends on the problem type and the value range of the expected
output. For example, to predict values that are larger than 1, tanh or sigmoid are not
suitable to be used in the output layer, instead, ReLU can be used. On the other hand,
if the output values have to be in the range (0,1) or (-1, 1) then ReLU is not a good
choice, and sigmoid or tanh can be used here. While performing a classification task
and using the neural network to predict a probability distribution over the mutually
exclusive class labels, the softmax activation function should be used in the last layer.
However, regarding the hidden layers, as a rule of thumb, use ReLU as an activation
for these layers.
In the case of a binary classifier, the Sigmoid activation function should be used. The
sigmoid activation function and the tanh activation function work terribly for the
hidden layer. For hidden layers, ReLU or its better version leaky ReLU should be
used. For a multiclass classifier, Softmax is the best-used activation function. Though
there are more activation functions known, these are known to be the most used
activation functions.
Activation Functions and their Derivatives
Difference between ANN and BNN:-
1. Artificial Neural Network: An Artificial Neural Network (ANN) is a type of neural
network based on a feed-forward strategy. It is called this because information passes
through the nodes in one direction until it reaches the output node. It is also known as
the simplest type of neural network. Some advantages of ANN:
Ability to learn irrespective of the type of data (linear or non-linear).
ANN is highly volatile and serves best in financial time-series forecasting.
Some disadvantages of ANN:
The simple architecture makes it difficult to explain the behavior of the network.
The network is dependent on hardware.
2. Biological Neural Network: A Biological Neural Network (BNN) is a structure that
consists of synapses, dendrites, a cell body (soma), and an axon. In this neural
network, the processing is carried out by neurons: dendrites receive signals from other
neurons, the soma sums all the incoming signals, and the axon transmits the signals to
other cells.
Differences between ANN and BNN:
Structure: the input, weight, hidden layer and output of an ANN correspond to the
dendrites, synapse, cell body and axon of a BNN.
Processor: an ANN uses one or a few complex, high-speed processors, whereas a BNN uses
a very large number of simple, low-speed processors (neurons).
McCULLOCH–PITTS NEURON:-
The McCulloch–Pitts neural network is considered to be the first neural network. The
neurons are connected by directed weighted paths. A McCulloch–Pitts neuron allows only
binary activation (1 ON or 0 OFF), i.e., it either fires with an activation of 1 or does
not fire, with an activation of 0.
OR Function
For the OR function, the decision boundary of the M-P neuron is the line x_1 + x_2 = 1.
All inputs that lie ON or ABOVE that line give the output 1 (positive), and all inputs
that lie BELOW that line give the output 0 (negative).
Therefore, the McCulloch–Pitts model produces a linear decision boundary which splits
the inputs into two classes, positive and negative.
AND Function
Similar to the OR function, we can plot the graph for the AND function, for which the
decision boundary is the line x_1 + x_2 = 2.
Here, the decision boundary separates the input points: only (1, 1) lies ON or ABOVE the
line and gives the output 1 when passed through the AND function.
From these examples, we can understand that as the number of inputs increases, the
dimension of the plot also increases: if we consider 3 inputs with the OR function, we
plot the graph in a three-dimensional (3D) space and draw the decision boundary in 3
dimensions.
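A minimal sketch of an M-P unit with thresholding logic, using threshold 1 for OR and
threshold 2 for AND, matching the decision boundaries above:

# McCulloch–Pitts unit: binary inputs, unit weights, and a threshold;
# it fires (outputs 1) only when the sum of inputs reaches the threshold.
def mp_neuron(inputs, threshold):
    return 1 if sum(inputs) >= threshold else 0

# OR: decision boundary x1 + x2 = 1, AND: decision boundary x1 + x2 = 2
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2),
              "OR:", mp_neuron([x1, x2], threshold=1),
              "AND:", mp_neuron([x1, x2], threshold=2))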
PERCEPTRON:-
A Perceptron is an Artificial Neuron
Biological neurons, on the other hand, use electrical signals to store information and
to make decisions based on previous input.
Frank Rosenblatt had the idea that Perceptrons could simulate brain principles, with the
ability to learn and make decisions.
The Perceptron model is also treated as one of the best and simplest types of artificial
neural network; it is a supervised learning algorithm for binary classifiers. Hence, we
can consider it a single-layer neural network with four main parameters: input values,
weights and bias, net sum, and an activation function.
The Perceptron
The original Perceptron was designed to take a number of binary inputs, and
produce one binary output (0 or 1).
The idea was to use different weights to represent the importance of each input, and
that the sum of the values should be greater than a threshold value before making a
decision like true or false (0 or 1).
Basic Components of Perceptron
Mr. Frank Rosenblatt invented the perceptron model as a binary classifier containing
three main components. These are as follows:
Input Nodes or Input Layer: This is the primary component of the Perceptron, which
accepts the initial data into the system for further processing. Each input node
contains a real numerical value.
Weight and Bias: The weight parameter represents the strength of the connection between
units and is another important component of the Perceptron. A weight is directly
proportional to the strength of the associated input neuron in deciding the output. The
bias can be considered the intercept term in a linear equation.
Activation Function: The step function that determines whether the neuron fires,
described in the next section.
How does Perceptron work?
The step function or activation function plays a vital role in ensuring that the output
is mapped between the required values, (0, 1) or (-1, 1). It is important to note that
the weight of an input is indicative of the strength of a node. Similarly, an input's
bias value gives the ability to shift the activation function curve up or down.
Step-1
In the first step, multiply all input values by their corresponding weight values and
then add them to determine the weighted sum. Mathematically, the weighted sum is
calculated as follows:
∑wi*xi + b
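As a small illustration (the input, weight and bias values below are assumptions):

# Step-1 as code: the weighted sum sum(wi * xi) + b
inputs  = [1, 0, 1]
weights = [0.6, 0.4, 0.9]
bias    = 0.2

weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
print(weighted_sum)    # 0.6 + 0 + 0.9 + 0.2 = 1.7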
Step-2
In the second step, an activation function f is applied to the weighted sum obtained
above, which gives an output either in binary form or as a continuous value:
Y = f(∑wi*xi + b)
Types of Perceptron Models
Based on the layers, Perceptron models are divided into two types. These are as follows:
Single Layer Perceptron Model
This is one of the simplest types of artificial neural network (ANN). A single-layer
perceptron model consists of a feed-forward network and also includes a threshold
transfer function inside the model. The main objective of the single-layer perceptron
model is to analyze linearly separable objects with binary outcomes.
In a single-layer perceptron model, the algorithm does not rely on previously recorded
data, so it begins with randomly allocated weight parameters. It then sums up all the
weighted inputs. If the total sum is more than a pre-determined value, the model is
activated and shows the output value as +1.
Multi-Layer Perceptron Model
Like a single-layer perceptron model, a multi-layer perceptron model has the same basic
structure but a greater number of hidden layers.
A multi-layer perceptron model has greater processing power and can process both linear
and non-linear patterns. Further, it can also implement logic gates such as AND, OR,
XOR, NAND, NOT, XNOR and NOR.
Perceptron Function
The Perceptron function f(x) is obtained by multiplying the input x with the learned
weight coefficient w and adding the bias b:
f(x) = 1 if w·x + b > 0
otherwise, f(x) = 0
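A minimal sketch of this function (the weights and bias are illustrative assumptions):

# Perceptron function: output 1 if w.x + b exceeds 0, else 0.
def perceptron(x, w, b):
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else 0

print(perceptron([1, 0], w=[0.5, 0.5], b=-0.4))   # 1, since 0.5 - 0.4 > 0
print(perceptron([0, 0], w=[0.5, 0.5], b=-0.4))   # 0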
Characteristics of Perceptron
o The output of a perceptron can only be a binary number (0 or 1) due to the hard
limit transfer function.
o Perceptron can only be used to classify the linearly separable sets of input
vectors. If input vectors are non-linear, it is not easy to classify them properly.
Future of Perceptron
The future of the Perceptron model is very bright and significant, as it helps interpret
data by building intuitive patterns and applying them in the future. Machine learning is
a rapidly growing technology of Artificial Intelligence that is continuously evolving;
hence perceptron technology will continue to support and facilitate analytical behavior
in machines, which will in turn add to the efficiency of computers.
Perceptron Example
Imagine a perceptron that decides whether to go to a concert based on five binary inputs
(x1 ... x5), each with its own weight, and a threshold of 1.5.
Threshold = 1.5
x1 * w1 = 1 * 0.7 = 0.7
x2 * w2 = 0 * 0.6 = 0
x3 * w3 = 1 * 0.5 = 0.5
x4 * w4 = 0 * 0.3 = 0
x5 * w5 = 1 * 0.4 = 0.4
Return true if the sum > 1.5 ("Yes I will go to the Concert")
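The same example as code:

# Concert example: five binary inputs, the weights listed above, threshold 1.5.
inputs  = [1, 0, 1, 0, 1]
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

weighted_sum = sum(w * x for w, x in zip(weights, inputs))   # 0.7 + 0.5 + 0.4 = 1.6
print(weighted_sum > threshold)    # True -> "Yes I will go to the Concert"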
Perceptron Terminology
Perceptron Inputs
Node values
Node Weights
Activation Function
Perceptron Inputs
Node Values
The binary input values (0 or 1) can be interpreted as (no or yes) or (false or true).
Node Weights
In the example above, the node weights are: 0.7, 0.6, 0.5, 0.3, 0.4
The Activation Function
The activation function maps the result (the weighted sum) to a required value such as
0 or 1.
In the example above, the activation function is simply: (sum > 1.5)
Note
Other neurons must provide input: Is the artist good? Is the weather good? ...
In the Neural Network Model, input data (yellow) are processed against a hidden
layer (blue) and modified against more hidden layers (green) to produce the final
output (red).
A perceptron can be trained to recognize the points above a line, without knowing the
formula for the line.
A Perceptron is often used to classify data into two parts.
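A minimal sketch of the Perceptron Learning Algorithm on this task: the perceptron
learns to classify points as above (1) or below (0) the line y = x without being given
the line's formula. The learning rate, number of epochs and randomly generated points
are assumptions chosen for illustration.

import random

def predict(point, w, b):
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0

# training data: random points labelled by whether they lie above the line y = x
random.seed(0)
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labels = [1 if y > x else 0 for (x, y) in data]

w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(20):                        # epochs
    for (x, y), target in zip(data, labels):
        error = target - predict((x, y), w, b)
        # perceptron weight update rule: w <- w + lr * error * input
        w[0] += lr * error * x
        w[1] += lr * error * y
        b    += lr * error

print(w, b)                                # learned boundary approximates y = x
print(predict((0.2, 0.8), w, b))           # point above the line -> 1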