Unit-1 NN

What is Deep Learning?

Deep learning is a subfield within machine learning, and it’s gaining traction for its
ability to extract features from data. Deep learning uses Artificial Neural Networks
(ANNs) to extract higher-level features from raw data. ANNs, though much
different from human brains, were inspired by the way humans biologically
process information. The learning a computer does is considered “deep” because
the networks use layering to learn from, and interpret, raw information.

For example, deep learning is an important asset for image processing in everything from e-commerce to medical imagery. Google is equipping its programs with deep learning to discover patterns in images in order to display the correct image for whatever you search. If you search for a winter jacket, Google's machine and deep learning will team up to discover patterns in images (sizes, colors, shapes, relevant brand names) and display pertinent jackets that satisfy your query.

Deep learning is also making headway in radiology, pathology and any medical sector that relies heavily on imagery. The technology draws on knowledge gained from studying millions of other scans to immediately recognize disease or injury, saving doctors and hospitals both time and money.

The three popular types of deep neural networks are:

 Multi-Layer Perceptrons (MLP)
 Convolutional Neural Networks (CNN)
 Recurrent Neural Networks (RNN)

Feed-Forward Neural Network:

A feed-forward neural network allows signals to travel in one direction only, from input to output. There is no feedback (no loops), so the output of a layer does not influence that same layer. Feed-forward networks tend to be simple networks that associate inputs with outputs, and they can be used in pattern recognition. This type of organization is also described as bottom-up or top-down.
Each unit in the hidden layer is generally fully connected to the units in the input layer. Because this network contains standard units, the units in the hidden layer compute their output by multiplying the value of each input by its corresponding weight, summing these products, and applying the transfer function.
How Feedforward Neural Networks Work

Feedforward neural networks were among the first and most successful learning algorithms. They are also called deep networks, multi-layer perceptrons (MLP), or simply neural networks. As data travels through the network's artificial mesh, each layer processes an aspect of the data, filters outliers, spots familiar entities and produces the final output.

Feedforward neural networks are made up of the following:

1. Input layer: This layer consists of the neurons that receive inputs and pass them on to the other layers. The number of neurons in the input layer should be equal to the number of attributes or features in the dataset.
2. Output layer: The output layer produces the predicted feature and depends on the type of model you're building.
3. Hidden layer: In between the input and output layers, there are hidden layers, depending on the type of model. Hidden layers contain a vast number of neurons which apply transformations to the inputs before passing them on. As the network is trained, the weights are updated to be more predictive.
4. Neuron weights: Weights refer to the strength or amplitude of a connection between two neurons. If you are familiar with linear regression, you can compare weights on inputs to regression coefficients. Weights are often initialized to small random values, such as values in the range 0 to 1.
Here’s how the neural network computes the data in three simple steps:

1. Multiplication of weights and inputs: The input is multiplied by the assigned weight values, which in this case would be the following:

(x1* w1) = (0 * 0.1) = 0

(x2* w2) = (1 * 12) = 12

(x3* w3) = (11 * 1) = 11

2. Adding the biases: In the next step, the products found in the previous step are added to their respective biases. The modified inputs are then summed to a single value.

(x1* w1) + b1 = 0 + 1

(x2* w2) + b2 = 12 + 0

(x3* w3) + b3 = 11 + 0

weighted_sum = (x1* w1) + b1 + (x2* w2) + b2 + (x3* w3) + b3 = 24

3. Activation: An activation function maps the summed weighted input to the output of the neuron. It is called an activation/transfer function because it governs the threshold at which the neuron is activated and the strength of the output signal.

Output signal: Finally, the weighted sum obtained is turned into an output signal
by feeding the weighted sum into an activation function (also called transfer
function). Since the weighted sum in our example is greater than 20, the perceptron
predicts it to be a rainy day.
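To make the three steps concrete, here is a minimal Python sketch (not part of the original text) that reproduces the example's numbers; the step-function threshold of 20 follows the rainy-day rule above, and all variable names are illustrative.

x = [0, 1, 11]    # inputs x1, x2, x3 from the example
w = [0.1, 12, 1]  # weights w1, w2, w3 from the example
b = [1, 0, 0]     # biases b1, b2, b3 from the example

# Steps 1 and 2: multiply each input by its weight, add the bias,
# then sum everything to a single value.
weighted_sum = sum(xi * wi + bi for xi, wi, bi in zip(x, w, b))

# Step 3: activation. Here a simple step function with threshold 20,
# matching the "greater than 20 means rainy day" rule above.
prediction = "rainy" if weighted_sum > 20 else "not rainy"
print(weighted_sum, prediction)  # 24 rainy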
Layers of a feed-forward neural network:

 Input layer:
The neurons of this layer receive input and pass it on to the other layers of
the network. Feature or attribute numbers in the dataset must match the
number of neurons in the input layer.
 Output layer:
According to the type of model being built, this layer represents the forecasted feature.
 Hidden layer:
Hidden layers separate the input and output layers. Depending on the type of model, there may be several hidden layers.
The neurons in the hidden layers transform the input before passing it on to the next layer. As the network is trained, the weights are constantly updated to make its predictions more accurate.
 Neuron weights:
A weight measures the strength or magnitude of the connection between two neurons, much like a coefficient in linear regression.

Weights are normally initialized to small values, often between 0 and 1.

 Neurons:
Feed-forward networks are built from artificial neurons, which are loosely adapted from biological neurons. Neurons operate in two steps: first, they compute the weighted sum of their inputs, and second, they apply an activation function to that sum to produce an output.
Activation functions can be either linear or nonlinear. Each neuron's inputs carry weights, and the network learns these weights during the training phase.
 Activation Function:
The activation function is where a neuron makes its decision.
Depending on the activation function, the neuron makes a linear or nonlinear decision. Because the signal passes through so many layers, a bounded activation function also prevents neuron outputs from growing in a cascading fashion.
Activation functions fall into three major categories: sigmoid, Tanh, and Rectified Linear Unit (ReLU).
o Sigmoid:
Input values are mapped to output values between 0 and 1.
o Tanh:
Input values are mapped to output values between -1 and 1.
o Rectified Linear Unit (ReLU):
Only positive values are allowed to flow through this function. Negative values get mapped to 0.
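As an illustrative sketch of these three categories (using NumPy; not from the original text):

import numpy as np

def sigmoid(z):
    # Maps any real input to the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Maps any real input to the range (-1, 1).
    return np.tanh(z)

def relu(z):
    # Lets positive values flow through; maps negative values to 0.
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))  # approx. [0.119 0.5 0.953]
print(tanh(z))     # approx. [-0.964 0. 0.995]
print(relu(z))     # [0. 0. 3.]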
Advantages of feed forward Neural Networks

 The simplified architecture of feed-forward neural networks makes them easy to apply in machine learning.
 The layers in a feed-forward network operate independently of one another, with each layer passing a moderated signal on to the next.
 Complex tasks need several neurons in the network.
 Neural networks can handle and process nonlinear data easily, compared to perceptrons and sigmoid neurons, which otherwise struggle with it.
 Neural networks can deal with the complicated problem of decision boundaries.
 Depending on the data, the neural network architecture can vary. For example, convolutional neural networks (CNNs) perform exceptionally well in image processing, whereas recurrent neural networks (RNNs) perform well in text and voice processing.
 Neural networks need graphics processing units (GPUs) to handle large datasets with high computational performance. GPU access is widely available through platforms such as Kaggle Notebooks and Google Colab.

Multi Layer Perceptron:

A Multilayer Perceptron (MLP) is one of the simplest feed-forward neural networks. Multilayer Perceptrons are bidirectional in the sense that inputs are propagated forward and weight updates are propagated backward.

Some machine learning practitioners confuse the Perceptron and the Multilayer Perceptron with each other. The Perceptron is the most basic architecture of the neural network, also known as a single-layer neural network. The Perceptron is specially designed for binary classification problems, whereas an MLP is not limited to them.

A Multilayer Perceptron has an input layer and an output layer with one or more hidden layers. In MLPs, all neurons in one layer are connected to all neurons in the next layer. The input layer receives the input signals, the desired task is performed by the output layer, and the hidden layers are responsible for all the calculations in between.
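Since the original architecture figure is not reproduced here, the following minimal NumPy sketch (not from the original text) shows the same structure: a fully connected forward pass through one hidden layer. The layer sizes and random weights are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3 input features, 4 hidden neurons, 1 output neuron.
W1 = rng.normal(size=(3, 4))  # input -> hidden weights (fully connected)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))  # hidden -> output weights (fully connected)
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x):
    h = sigmoid(x @ W1 + b1)     # hidden layer does the calculations
    return sigmoid(h @ W2 + b2)  # output layer produces the prediction

print(mlp_forward(np.array([0.0, 1.0, 11.0])))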

Representation of MultiLayer Perceptron:

Representation power refers to the ability of a neural network to assign the proper label to a particular instance and to create well-defined, accurate decision boundaries for that class.

It all begins with the MP (McCulloch-Pitts) Neuron, which is a very simplified model of a neuron. The idea is simple: if the sum of all inputs is larger than a threshold, the neuron fires; otherwise it does not. A very primitive beginning indeed.
To check its representation power, consider a geometric interpretation: first a 2-D analysis with 2 inputs for approximating the OR function, and then a 3-D analysis with 3 inputs.
Remember: Any Boolean function of n inputs can be represented by a network of perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer containing 1 perceptron. This is a sufficient condition, not a necessary one.
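As a sketch of this construction (not from the original text), the following Python builds one hidden perceptron per input pattern and an OR-style output perceptron, then uses it to represent XOR:

from itertools import product

def perceptron(x, w, theta):
    # Fires (returns 1) when the weighted sum reaches the threshold theta.
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) >= theta else 0

def boolean_mlp(f, n):
    # One hidden perceptron per input pattern (2^n in total): the unit for
    # pattern p fires only on p, using weights +1/-1 and threshold = ones in p.
    patterns = list(product([0, 1], repeat=n))
    def net(x):
        hidden = [perceptron(x, [1 if p_i else -1 for p_i in p], sum(p))
                  for p in patterns]
        # Output perceptron ORs the hidden units whose pattern maps to 1.
        out_w = [1 if f(p) else 0 for p in patterns]
        return perceptron(hidden, out_w, 1)
    return net

xor = boolean_mlp(lambda p: p[0] ^ p[1], 2)
print([xor(x) for x in product([0, 1], repeat=2)])  # [0, 1, 1, 0]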
What is the Sigmoid Function?

A sigmoid function is a mathematical function that has a characteristic S-shaped curve. There are a number of common sigmoid functions, such as the logistic function, the hyperbolic tangent, and the arctangent.

In machine learning, the term sigmoid function is normally used to refer specifically to the logistic function, also called the logistic sigmoid function.

All sigmoid functions have the property that they map the entire number line into a
small range such as between 0 and 1, or -1 and 1, so one use of a sigmoid function
is to convert a real value into one that can be interpreted as a probability.

Logistic Sigmoid Function Formula

 One of the most common sigmoid functions is the logistic sigmoid function. This is often referred to as the Sigmoid Function in the field of machine learning. The logistic sigmoid function is defined as follows:

σ(x) = 1 / (1 + e^(−x))

 The logistic function takes any real-valued input and outputs a value between zero and one.

Hyperbolic Tangent Function Formula

 Another common sigmoid function is the hyperbolic tangent. This maps any real-valued input to the range between -1 and 1. It is defined as follows:

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))

Arctangent Function Formula

 A third alternative sigmoid function is the arctangent, which is the inverse of the tangent function. It is defined as arctan(x) = tan⁻¹(x).
 The arctangent function maps any real-valued input to the range −π/2 to π/2.
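A quick numerical check of the stated ranges (an illustrative snippet, not from the original text):

import math

def logistic(x):
    # Maps the real line to (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# tanh maps to (-1, 1); atan maps to (-pi/2, pi/2).
for x in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(x, logistic(x), math.tanh(x), math.atan(x))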
Gradient Descent

 Gradient Descent is an optimization algorithm used in machine/deep learning. The goal of Gradient Descent is to minimize a convex objective function f(x) by iteration.
 For ease, let's take a simple linear model.
 Error = Y(Predicted) - Y(Actual)
 A machine learning model always wants low error with maximum accuracy. In order to decrease the error, we need to tell the algorithm that it is doing something wrong that needs to be rectified; that is done through Gradient Descent.
 To minimize the error and move toward the minimum, we take steps of a size known as alpha (the learning rate).

Steps to implement Gradient Descent

1. Randomly initialize values.
2. Update values.
3. Repeat until the slope is 0.

A derivative is a term that comes from calculus and is calculated as the slope of the graph at a particular point. The slope is found by drawing a tangent line to the graph at that point. So, if we are able to compute this tangent line, we may be able to compute the desired direction in which to reach the minimum.
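A minimal sketch of these steps in Python, assuming the simple convex function f(x) = x**2, whose derivative (the slope of the tangent line) is 2*x; the function, starting point, and learning rate are all illustrative choices:

def gradient_descent(grad, x0, alpha=0.1, steps=100, tol=1e-8):
    x = x0                # step 1: initialize (randomly, or as chosen here)
    for _ in range(steps):
        g = grad(x)       # slope of the tangent line at the current point
        if abs(g) < tol:  # step 3: stop once the slope is (nearly) 0
            break
        x -= alpha * g    # step 2: update by walking alpha against the slope
    return x

# Minimize f(x) = x**2, whose derivative is 2*x; the minimum is at x = 0.
print(gradient_descent(lambda x: 2 * x, x0=5.0))  # approx. 0.0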

The learning rate must be chosen wisely:

1. If it is too small, the model will take a long time to learn.
2. If it is too large, the model will not converge, because our pointer will overshoot and we will not be able to get to the minimum.
Vanilla gradient descent, however, can't guarantee good convergence, for the following reasons:

 Picking an appropriate learning rate can be troublesome. A learning rate that is too low will lead to slow training, while a higher learning rate will lead to overshooting the minimum.
 Another key hurdle faced by vanilla Gradient Descent is getting trapped in local minima; these local minima are surrounded by hills of the same error, which makes it really hard for vanilla Gradient Descent to escape them.
