
ASHOKA WOMEN’S ENGINEERING COLLEGE (AUTONOMOUS)

UNIT- 4
MULTILAYER PERCEPTRON (or) ANN (Artificial Neural Network) (or) Feed Forward:

 The Perceptron consists of an input layer and an output layer which are fully connected.

 A fully connected Multi-Layered Neural Network is known as Multi-Layer Perceptron.

 A Multi-Layered Neural Network consists of multiple layers of artificial neurons or nodes.

 MLPs have the same input and output layers but may have multiple hidden layers in between
them.

Sigmoid: takes real-valued input and squashes it to the range between 0 and 1.

When we plot the output from sigmoid units given various weighted sums as input, it looks remarkably
like a step function:

tanh: takes real-valued input and squashes it to the range [-1, 1].

ReLU: ReLU stands for Rectified Linear Unit. It takes real-valued input and thresholds it at 0 (replaces
negative values with 0).
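As an illustration, here is a minimal NumPy sketch of the three activation functions described above (the function names and test values are choices made for this example, not part of the original notes):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real value into the range (-1, 1).
    return np.tanh(x)

def relu(x):
    # Thresholds at 0: negative values are replaced with 0, positive values pass through.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))  # values between 0 and 1
print(tanh(x))     # values between -1 and 1
print(relu(x))     # negative inputs replaced with 0
```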

Example Multi-layer ANN with Sigmoid Units:

 We will concern ourselves here with ANNs containing only one hidden layer, as this makes
describing the back propagation routine easier.
 Note that networks where you can feed in the input on the left and propagate it forward to get an
output are called feed forward networks.
 Below is such an ANN, with two sigmoid units in the hidden layer. The weights have been set
arbitrarily between all the units.

 Note that the sigma units have been identified with sigma signs in the nodes of the graph. As we did
with perceptrons, we can give this network an input and determine the output. We can also look to see
which units "fired", i.e., had a value closer to 1 than to 0.
 Suppose we input the values 10, 30, 20 into the three input units, from top to bottom. Then the
weighted sum coming into H1 will be:
S_H1 = (0.2 * 10) + (-0.1 * 30) + (0.4 * 20) = 2 - 3 + 8 = 7.
 Then the σ function is applied to S_H1 to give:
σ(S_H1) = 1/(1 + e^(-7)) = 1/(1 + 0.000912) = 0.999
 [Don't forget to negate S]. Similarly, the weighted sum coming into H2 will be:
S_H2 = (0.7 * 10) + (-1.2 * 30) + (1.2 * 20) = 7 - 36 + 24 = -5


 and σ applied to S_H2 gives:
σ(S_H2) = 1/(1 + e^5) = 1/(1 + 148.4) = 0.0067
 From this, we can see that H1 has fired, but H2 has not. We can now calculate that the weighted
sum going in to output unit O1 will be:
S_O1 = (1.1 * 0.999) + (0.1 * 0.0067) = 1.0996
 and the weighted sum going in to output unit O2 will be:
S_O2 = (3.1 * 0.999) + (1.17 * 0.0067) = 3.1047
 The output sigmoid unit in O1 will now calculate the output values from the network for O1:
σ(S_O1) = 1/(1 + e^(-1.0996)) = 1/(1 + 0.333) = 0.750
 and the output from the network for O2:
σ(S_O2) = 1/(1 + e^(-3.1047)) = 1/(1 + 0.045) = 0.957
 Therefore, if this network represented the learned rules for a categorisation problem, the input triple
(10, 30, 20) would be categorised into the category associated with O2, because this has the larger
output.
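The same forward pass can be reproduced with a few lines of NumPy; the weight values below are the arbitrarily chosen ones from the worked example above, and the variable names are illustrative:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

x = np.array([10.0, 30.0, 20.0])            # the three input units, top to bottom

# weights from the three inputs to the two hidden units H1 and H2
W_hidden = np.array([[0.2, -0.1, 0.4],       # -> H1
                     [0.7, -1.2, 1.2]])      # -> H2

# weights from the two hidden units to the two output units O1 and O2
W_output = np.array([[1.1, 0.1],             # -> O1
                     [3.1, 1.17]])           # -> O2

s_hidden = W_hidden @ x                      # [7.0, -5.0]
h = sigmoid(s_hidden)                        # [0.999, 0.0067]: H1 fired, H2 did not

s_output = W_output @ h                      # [1.0996, 3.1047]
o = sigmoid(s_output)                        # [0.750, 0.957]

print(h, o)  # O2 has the larger output, so (10, 30, 20) falls in O2's category
```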

BACK PROPAGATION:


 With back propagation, the weights of the model are adjusted during training: the network's output is compared with the target output, the error is propagated backwards through the layers, and each weight is updated in the direction that reduces the error.
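The derivation slides for this section did not survive in this copy. As a minimal sketch, here is one gradient-descent update for a one-hidden-layer sigmoid network trained on squared error; the function name, learning rate and the made-up target values are assumptions for this example, not the original notes' notation:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_step(x, t, W1, W2, lr=0.1):
    """One gradient-descent update for a one-hidden-layer sigmoid network
    trained on squared error E = 0.5 * sum((t - o)**2)."""
    # forward pass
    h = sigmoid(W1 @ x)                           # hidden activations
    o = sigmoid(W2 @ h)                           # output activations

    # backward pass: error terms (deltas) for each layer
    delta_out = (o - t) * o * (1 - o)             # dE/ds for the output units
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # error propagated back to the hidden units

    # gradient-descent weight updates: w <- w - lr * dE/dw
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2

# one update on the earlier example, with a made-up target of (1, 0) for (O1, O2)
W1 = np.array([[0.2, -0.1, 0.4], [0.7, -1.2, 1.2]])
W2 = np.array([[1.1, 0.1], [3.1, 1.17]])
W1, W2 = backprop_step(np.array([10., 30., 20.]), np.array([1., 0.]), W1, W2)
```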



LOSS FUNCTIONS:


 A loss function measures how far the network's predictions are from the desired outputs, and training aims to minimise it. Loss functions can be classified into two major categories depending upon the type of learning task
we are dealing with: regression losses and classification losses.

Loss functions for Classification:


1. Binary Cross Entropy Loss:
It is used when the model outputs a probability between 0 and 1 for a binary classification task. Cross-entropy
calculates the average difference between the predicted and the actual probabilities.
Mathematical formulation:
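The formula itself appears to have been lost in this copy; the standard binary cross-entropy over N examples, with y_i the true label (0 or 1) and p_i the predicted probability, is:

BCE = -(1/N) * Σ_{i=1}^{N} [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]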


2. Hinge Loss:
 This type of loss is used when the target variable has 1 or -1 as class labels. It penalizes the model
when there is a difference in the sign between the actual and predicted class values.
 Hinge loss is used for maximum-margin classification.
Mathematical formulation:
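The formula was likewise lost in this copy; the standard hinge loss for a true label y_i in {-1, +1} and a predicted score ŷ_i is:

Hinge = (1/N) * Σ_{i=1}^{N} max(0, 1 - y_i * ŷ_i)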

EPOCHS AND BATCH SIZES:


 An epoch means training the neural network with all the training data for one cycle.
 In an epoch, we use all of the data exactly once. A forward pass and a backward pass together are
counted as one pass.

 An epoch is made up of one or more batches, where we use a part of the data set to train the neural network. We call
one pass through a single batch of training examples an iteration.
 An epoch is sometimes confused with an iteration. To clarify the concepts, let's consider a simple example where we
have 1,000 data points:

 If the batch size is 1000, we can complete an epoch with a single iteration. Similarly, if the batch size is
500, an epoch takes two iterations, and if the batch size is 100, an epoch takes 10 iterations to complete. In short,
for each epoch, the number of iterations times the batch size equals the number of data points.
 We can use multiple epochs in training. In this case, the neural network is fed the same data more than
once.
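As an illustration of this relationship, a minimal sketch for the 1,000-point example above (the loop and print format are purely illustrative):

```python
import math

n_points = 1000                               # the example data set above

for batch_size in (1000, 500, 100):
    # number of iterations (batches) needed to see every data point once
    iterations_per_epoch = math.ceil(n_points / batch_size)
    print(f"batch size {batch_size}: {iterations_per_epoch} iteration(s) per epoch")

# batch size 1000: 1 iteration(s) per epoch
# batch size 500:  2 iteration(s) per epoch
# batch size 100:  10 iteration(s) per epoch
```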


RECURRENT NEURAL NETWORK (RNN):

A Recurrent Neural Network is a neural network in which the hidden state computed at one time step is fed back as an input to the next time step, giving the network a memory of the sequence it has processed so far.

Types of Recurrent Neural Networks:


There are four types of Recurrent Neural Networks:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
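As an example of the "many to one" pattern above, here is a minimal sketch of a vanilla RNN forward pass, assuming the standard recurrence h_t = tanh(Wx·x_t + Wh·h_(t-1) + b); all sizes, weight names and random values are hypothetical choices for this example:

```python
import numpy as np

def rnn_many_to_one(xs, Wx, Wh, Wy, b, by):
    """Run a vanilla RNN over a sequence and return a single output
    (the 'many to one' pattern, e.g. sentiment analysis)."""
    h = np.zeros(Wh.shape[0])                # initial hidden state
    for x_t in xs:                           # one step per element of the sequence
        h = np.tanh(Wx @ x_t + Wh @ h + b)   # hidden state carries memory forward
    return Wy @ h + by                       # single output from the final state

# hypothetical sizes: 4-dimensional inputs, 8 hidden units, 2 outputs
rng = np.random.default_rng(0)
Wx, Wh, Wy = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(2, 8))
b, by = np.zeros(8), np.zeros(2)
print(rnn_many_to_one([rng.normal(size=4) for _ in range(5)], Wx, Wh, Wy, b, by))
```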


Applications of Recurrent Neural Networks:


 Image Captioning: RNNs are used to caption an image by analyzing the activities present.
 Time Series Prediction: Any time series problem, like predicting the prices of stocks in a
particular month, can be solved using an RNN.
 Natural Language Processing: Text mining and Sentiment analysis can be carried out using an
RNN for Natural Language Processing (NLP).
 Machine Translation: Given an input in one language, RNNs can be used to translate the input
into different languages as output.
Advantages of Recurrent Neural Network:
1. An RNN remembers information across time steps.
2. RNNs can be used with convolutional layers to extend the effective pixel neighborhood.
Disadvantages of Recurrent Neural Network:
1. Gradient vanishing and exploding problems.
2. Training an RNN is a very difficult task.
3. It cannot process very long sequences when tanh or ReLU is used as the activation function.
LONG SHORT-TERM MEMORY (LSTM):
 Long Short-Term Memory (LSTM) networks are an extension of RNNs that extend the memory,
which makes it easier to retain past data.
 LSTMs are used as the building blocks for the layers of an RNN.
 LSTMs assign the data "weights", which help the network either let new information in, forget
information, or give it enough importance to impact the output.
 The units of an LSTM are used as building units for the layers of an RNN, often called an LSTM
network.
 LSTMs enable RNNs to remember inputs over a long period of time. This is because LSTMs
contain information in a memory, much like the memory of a computer. The LSTM can read, write and
delete information from its memory.
 In an LSTM you have three gates: input, forget and output gate. These gates determine whether or
not to let new input in (input gate), delete the information because it isn’t important (forget gate), or let
it impact the output at the current time step (output gate).


Architecture of LSTM network:


 LSTM networks have a chain-like structure, but the repeating module differs from that of a plain RNN.
Instead of having a single neural network layer, it has several small parts connected to each other which
handle the storing and removal of memory.

1. Input gate - It discovers which values from the input should be used to modify the
memory. A sigmoid function decides which values to let through (0 or 1), and a tanh function gives
weightage to the values which are passed, deciding their level of importance in the range -1 to 1.

2. Forget gate - It discovers which details should be discarded from the block. A sigmoid function decides this: it
looks at the previous state (h_(t-1)) and the current input (x_t) and outputs a number between 0 (omit this)
and 1 (keep this) for each number in the cell state C_(t-1).

3. Output gate - The input and the memory of the block are used to decide the output. A sigmoid function decides
which values to let through (0 or 1), and a tanh function gives weightage to the values which are passed, deciding
their level of importance in the range -1 to 1; this is multiplied with the output of the sigmoid.
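Putting the three gates together, here is a minimal sketch of one LSTM time step, assuming the standard gate equations (sigmoid gates, tanh candidate); the parameter layout, dictionary keys and sizes are hypothetical choices for this example:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W and b hold one weight matrix and bias per gate
    ('f', 'i', 'o') and for the candidate values ('g') -- a hypothetical layout."""
    z = np.concatenate([x_t, h_prev])        # current input and previous hidden state

    f = sigmoid(W["f"] @ z + b["f"])         # forget gate: what to discard from c_prev
    i = sigmoid(W["i"] @ z + b["i"])         # input gate: which new values to write
    o = sigmoid(W["o"] @ z + b["o"])         # output gate: what to expose as h_t
    g = np.tanh(W["g"] @ z + b["g"])         # candidate values, weighted in [-1, 1]

    c_t = f * c_prev + i * g                 # update the cell (memory) state
    h_t = o * np.tanh(c_t)                   # new hidden state / output
    return h_t, c_t

# hypothetical sizes: 3 inputs and 2 hidden units, so each gate matrix is (2, 5)
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(2, 5)) for k in "fiog"}
b = {k: np.zeros(2) for k in "fiog"}
h, c = lstm_step(np.ones(3), np.zeros(2), np.zeros(2), W, b)
```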


CONVOLUTIONAL NEURAL NETWORK (CNN):


 Convolutional Neural Network is a special kind of multi-layer neural network.
 Convolutional Neural Networks are one of the main approaches to image classification and image
recognition with neural networks. Scene labeling, object detection, and face recognition are some
of the areas where convolutional neural networks are widely used.
 A CNN takes an image as input and classifies it under a certain category such as dog,
cat, lion, or tiger. The computer sees an image as an array of pixels whose size depends on the resolution of
the image: based on the image resolution, it sees it as h * w * d, where h = height, w = width and d =
depth (number of channels).
 A fully connected network architecture does not take the spatial structure of the image into account.
 In a CNN, each input image passes through a sequence of convolution layers with filters (also known as
kernels), along with pooling and fully connected layers. After that, we apply the softmax function
to classify the object with probabilistic values between 0 and 1.
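As a sketch of that final step, one common way to write the softmax over the class scores coming out of the last fully connected layer (the scores below are made-up values for this example):

```python
import numpy as np

def softmax(scores):
    # Subtracting the max keeps the exponentials numerically stable.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities over the categories; they sum to 1
```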
Why Convolutions:
 Parameter sharing: a feature detector (such as a vertical edge detector) that’s useful in one part of
the image is probably useful in another part of the image.
 Sparsity of connections: In each layer, each output value depends only on a small number of inputs.

Convolution Layer:
The convolution layer is the first layer used to extract features from an input image. By learning image features
using small squares of input data, the convolutional layer preserves the relationship between pixels. Convolution is
a mathematical operation which takes two inputs: an image matrix and a kernel or filter.
o The dimension of the image matrix is h × w × d.
o The dimension of the filter is f_h × f_w × d.
o The dimension of the output is (h - f_h + 1) × (w - f_w + 1) × 1.
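A minimal sketch of a "valid" (no padding, stride 1) convolution that produces the (h - f_h + 1) × (w - f_w + 1) output shape stated above, for a single channel; the image and kernel values are illustrative:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' convolution, stride 1, no padding: output is (h-fh+1) x (w-fw+1)."""
    h, w = image.shape
    fh, fw = kernel.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output cell is the sum of an fh x fw patch multiplied by the kernel
            out[i, j] = np.sum(image[i:i+fh, j:j+fw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)        # 5x5 single-channel image
kernel = np.array([[1.0, 0.0, -1.0]] * 3)    # 3x3 vertical-edge filter
print(conv2d_valid(image, kernel).shape)     # (3, 3) = (5-3+1, 5-3+1)
```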


Stride:
Stride means how many cells the filter is moved in the input to calculate the next cell in the result.
When the stride equals 1, we move the filter 1 pixel at a time; similarly, when the stride
equals 2, we move the filter 2 pixels at a time.
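In general, the number of output cells along one dimension is floor((n - f) / s) + 1 for input size n, filter size f and stride s; a minimal sketch (the sizes are illustrative, and the padding parameter is used in the next section):

```python
def conv_output_size(n, f, stride=1, padding=0):
    # floor((n + 2p - f) / s) + 1 output cells along one dimension
    return (n + 2 * padding - f) // stride + 1

print(conv_output_size(7, 3, stride=1))  # 5: the 3-wide filter fits in 5 positions
print(conv_output_size(7, 3, stride=2))  # 3: moving 2 pixels at a time skips every other position
```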

Padding:
1. It allows us to use a CONV layer without necessarily shrinking the height and width of the
volumes. This is important for building deeper networks, since otherwise the height/width would shrink
as we go to deeper layers.
2. It helps us keep more of the information at the border of an image. Without padding, very few
values at the next layer would be affected by pixels at the edges of an image.
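Continuing the output-size sketch from the stride section, with padding p the output size along one dimension becomes floor((n + 2p - f) / s) + 1, so a padding of (f - 1)/2 keeps the height/width unchanged for an odd filter size (the sizes below are illustrative):

```python
def conv_output_size(n, f, stride=1, padding=0):
    # floor((n + 2p - f) / s) + 1 output cells along one dimension
    return (n + 2 * padding - f) // stride + 1

print(conv_output_size(6, 3, padding=0))  # 4: a 6x6 input shrinks to 4x4 under a 3x3 filter
print(conv_output_size(6, 3, padding=1))  # 6: 'same' padding p = (f - 1) // 2 preserves the size
```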


Pooling Layer:

 Pooling layer is used to reduce the size of the representations and to speed up calculations.
 In conventional CNNs, the feature map from the convolutional layer is subsampled in a pooling
layer before being passed on to the next convolutional layer.
 The pooling layer works to replace a small patch in the feature map with its summary statistic.
 For example, the popular max-pooling layer reduces the input patch to a single value, the
maximum of all values within that patch. Other pooling strategies involve taking the average or a
weighted average of the patch as a subsampling technique.
 Average Pooling: Down-scaling is performed through average pooling by dividing the input into
rectangular pooling regions and computing the average value of each region.
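A minimal sketch of both strategies with non-overlapping 2 x 2 patches (the feature-map values are made up for this example):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Replace each non-overlapping size x size patch with its max or its average."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 7., 8.],
               [3., 2., 1., 0.],
               [1., 2., 3., 4.]])
print(pool2d(fm, mode="max"))   # [[6. 8.] [3. 4.]]
print(pool2d(fm, mode="avg"))   # [[3.75 5.25] [2.   2.  ]]
```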

Advantages:
 Good at detecting patterns and features in images, videos, and audio signals, and robust to translation of the input.
 Very High accuracy in image recognition problems.
 Automatically detects the important features without any human supervision.
Disadvantages:
 Computationally expensive to train and require a lot of memory.
 Requires large amounts of labeled data.
