B.Tech.
(ARTIFICIAL INTELLIGENCE AND DATA
SCIENCE) (Semester IV)
DEEP LEARNING (U20ADT508)
Day and Date: Tuesday 27/08/2024
PART A (5 X 2 = 10 Marks)
Answer all the Questions
S.No  Key Points  Marks Allotted
Q.1 What is Deep Learning? 2 Marks
Deep learning is a type of machine learning that uses artificial neural networks to teach
computers to process data in a way that mimics the human brain.
Deep learning models can recognize complex patterns in data to make predictions and produce
insights.
Q.2 Define LSE. 2 Marks
Least squares error is the sum of the squared deviations between the actual and predicted values
in a set of data.
The method of least squares is a statistical technique that uses this error to find the best-fit line
or curve for a given set of data.
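For illustration, a minimal NumPy sketch of least squares: fit a best-fit line to toy data and compute the least squares error. The data points are assumptions for the example.
import numpy as np
# Toy data (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 5.9, 8.2, 9.8])
slope, intercept = np.polyfit(x, y, 1)   # best-fit line that minimizes the squared error
y_pred = slope * x + intercept
lse = np.sum((y - y_pred) ** 2)          # least squares error: sum of squared deviations
print(slope, intercept, lse)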
Q.3 What is ReLU? 2 Marks
ReLU, or Rectified Linear Unit, is a non-linear activation function used in deep neural networks
for machine learning. It's the most commonly used activation function in neural networks,
especially CNNs.
The ReLU function is defined as:
f(x) = max(0, x)
If x < 0, f(x) = 0
If x ≥ 0, f(x) = x
Q.4 What is CNN? 2 Marks
A convolutional neural network (CNN or ConvNet) is a network architecture for deep learning
that learns directly from data.
CNNs are particularly useful for finding patterns in images to recognize objects, classes, and
categories.
Q.5 Define ResNet. 2 Marks
ResNet, or Residual Neural Network, is a deep learning architecture that uses residual blocks to
improve the performance of deep models.
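For illustration, a minimal Keras sketch of a residual block of the kind ResNet stacks; the filter count, kernel size, and input shape are assumptions for the example.
import tensorflow as tf
def residual_block(x, filters=64):
    shortcut = x  # identity skip connection
    y = tf.keras.layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding='same')(y)
    y = tf.keras.layers.Add()([y, shortcut])  # add the input back: the block learns a residual
    return tf.keras.layers.Activation('relu')(y)
inputs = tf.keras.Input(shape=(32, 32, 64))
model = tf.keras.Model(inputs, residual_block(inputs))
model.summary()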
PART B (4x5 = 20 Marks)
Answer all questions
S.No  Key Points  Marks Allotted
Q.6 What is Deep Learning? List the major architectures of deep learning networks. 5 Marks
Deep learning is the branch of machine learning that is based on artificial neural network architectures. An artificial neural network (ANN) uses layers of interconnected nodes called neurons that work together to process and learn from the input data.
In a fully connected deep neural network, there is an input layer followed by one or more hidden layers connected one after the other. Each neuron receives input from the neurons of the previous layer or from the input layer. The output of one neuron becomes the input to the neurons in the next layer, and this process continues until the final layer produces the output of the network. The layers of the neural network transform the input data through a series of non-linear transformations, allowing the network to learn complex representations of the input data.
Major deep learning network architectures include: feedforward/fully connected networks (MLPs), convolutional neural networks (CNNs), recurrent neural networks (RNNs) and LSTMs, autoencoders, and generative adversarial networks (GANs).
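For illustration, a minimal Keras sketch of a fully connected deep neural network of the kind described above; the input dimension, layer sizes, and output layer are assumptions for the example.
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                     # input layer (20 features, assumed)
    tf.keras.layers.Dense(64, activation='relu'),    # hidden layer 1
    tf.keras.layers.Dense(32, activation='relu'),    # hidden layer 2
    tf.keras.layers.Dense(1, activation='sigmoid'),  # output layer (binary classification)
])
model.summary()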
Q.7 Explain multilayer perceptron with diagram 5 Marks
A multilayer perceptron (MLP) is a feedforward neural network: an artificial neural network in which the nodes of each layer are connected to the nodes of the adjacent layers.
Frank Rosenblatt introduced the term perceptron. A perceptron is the basic unit of an artificial neural network and models the artificial neuron: it is a supervised learning unit that combines inputs, weights, a bias, and an activation function to calculate its output.
The multilayer perceptron passes information only in the forward direction. All nodes of one layer are fully connected to the nodes of the next layer, and each node passes its value only to the nodes ahead of it. The MLP is trained with the backpropagation algorithm to improve the accuracy of the model.
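For illustration, a minimal NumPy sketch of one forward pass through a small MLP; the layer sizes, random weights, and input values are assumptions for the example.
import numpy as np
def relu(z):
    return np.maximum(0.0, z)
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])                    # input layer (3 features)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)     # weights and bias of the hidden layer
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)     # weights and bias of the output layer
h = relu(x @ W1 + b1)   # hidden layer: weighted sum plus bias, then activation
y = h @ W2 + b2         # output layer: values flow only in the forward direction
print(y)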
Q.8 Write a note on loss function for regression. 5 Marks
Loss functions are essential in regression tasks as they measure the difference between predicted and
actual values, guiding the model during training to minimize this difference and improve accuracy. Here’s
a breakdown of common loss functions used in regression:
1. Mean Absolute Error (MAE)
Definition: MAE is the average of the absolute differences between actual and predicted values.
Characteristics: MAE treats all errors equally by measuring their absolute size. It’s robust to
outliers, as each error contributes proportionally to the total.
2. Mean Squared Error (MSE)
Definition: MSE is the average of the squared differences between actual and predicted values.
Characteristics: MSE emphasizes larger errors by squaring them, which makes it sensitive to
outliers. The squared nature of this error helps in gradient-based optimization as it’s smooth and
differentiable.
3. Root Mean Squared Error (RMSE)
Definition: RMSE is the square root of MSE, making the error metric in the same units as the
target values.
Characteristics: Like MSE, RMSE penalizes larger errors but provides interpretable results in the
original units of the data.
4. Huber Loss
Definition: Huber Loss combines MAE and MSE, being less sensitive to outliers than MSE but still
smooth. It’s quadratic for small errors and linear for large errors.
Characteristics: Huber Loss offers a balance between MAE and MSE, making it robust to outliers
but sensitive enough to maintain accuracy.
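For illustration, NumPy sketches of the four loss functions above; the sample values and the Huber threshold delta=1.0 are assumptions for the example.
import numpy as np
def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))
def huber(y_true, y_pred, delta=1.0):
    err = y_true - y_pred
    small = np.abs(err) <= delta
    # quadratic for small errors, linear for large errors
    return np.mean(np.where(small, 0.5 * err ** 2, delta * (np.abs(err) - 0.5 * delta)))
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred), huber(y_true, y_pred))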
Q.9 Explain about building pooling layer in CNN. 5 Marks
In Convolutional Neural Networks (CNNs), pooling layers are essential for reducing the spatial dimensions
of feature maps while retaining important information. Pooling layers make the model more
computationally efficient and help in controlling overfitting. Here’s how to build and understand a
pooling layer in CNNs:
Types of Pooling Layers
1. Max Pooling: Takes the maximum value from each pool of values in the feature map.
o Purpose: Helps in identifying the most prominent feature in each region, making the
model invariant to small translations.
2. Average Pooling: Computes the average of values in each pool.
o Purpose: Smooths the feature map and retains overall trends in each region.
import tensorflow as tf
# Example: Max Pooling layer with 2x2 window and stride of 2
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)  # Add pooling layer
])
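With this configuration, the 3x3 convolution turns the 64x64 input into 62x62 feature maps, and the 2x2 pooling with stride 2 reduces them to 31x31; tf.keras.layers.AveragePooling2D can be substituted to use average pooling instead.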
PART C (20 Marks)
Answer any two questions
S.No  Key Points  Marks Allotted
Q.10 Explain backpropagation with diagram. 10 Marks
Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial neural
networks, particularly feed-forward networks. It works iteratively, minimizing the cost function by
adjusting weights and biases.
In each epoch, the model adapts these parameters, reducing loss by following the error gradient.
Backpropagation often utilizes optimization algorithms like gradient descent or stochastic gradient
descent. The algorithm computes the gradient using the chain rule from calculus, allowing it to
effectively navigate complex layers in the neural network to minimize the cost function.
Working of Backpropagation Algorithm
The Backpropagation algorithm involves two main steps: the Forward Pass and the Backward Pass.
How Does the Forward Pass Work?
In the forward pass, the input data is fed into the input layer. These inputs, combined with their
respective weights, are passed to hidden layers.
For example, in a network with two hidden layers (h1 and h2 as shown in Fig. (a)), the output from h1
serves as the input to h2. Before applying an activation function, a bias is added to the weighted inputs.
Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which returns the input
if it’s positive and zero otherwise. This adds non-linearity, allowing the model to learn complex
relationships in the data. Finally, the outputs from the last hidden layer are passed to the output layer,
where an activation function, such as softmax, converts the weighted outputs into probabilities for
classification.
How Does the Backward Pass Work?
In the backward pass, the error (the difference between the predicted and the actual output) is propagated back through the network. Using the chain rule, the gradient of the loss with respect to each weight and bias is computed layer by layer, starting from the output layer. The weights and biases are then updated in the direction that reduces the loss, typically with gradient descent or stochastic gradient descent, and the process repeats over many epochs.
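For illustration, a minimal NumPy sketch of a training loop with a forward and backward pass for a tiny 2-3-1 network; the toy data, layer sizes, and learning rate are assumptions for the example.
import numpy as np
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy inputs
y = np.array([[0.], [1.], [1.], [0.]])                   # toy targets
W1, b1 = rng.normal(size=(2, 3)), np.zeros((1, 3))
W2, b2 = rng.normal(size=(3, 1)), np.zeros((1, 1))
lr = 0.5  # learning rate (assumed)
for epoch in range(1000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)       # hidden layer activations
    y_hat = sigmoid(h @ W2 + b2)   # network output
    # Backward pass: apply the chain rule layer by layer
    d_out = (y_hat - y) * y_hat * (1 - y_hat)   # gradient at the output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)        # error propagated back to the hidden layer
    # Gradient descent updates of weights and biases
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0, keepdims=True)
print(y_hat.ravel())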
Q.11 Explain activation functions with a diagram and the properties they must hold in a neural network model. 10 Marks
An activation function in the context of neural networks is a mathematical function applied to the output
of a neuron. The purpose of an activation function is to introduce non-linearity into the model, allowing
the network to learn and represent complex patterns in the data. Without non-linearity, a neural
network would essentially behave like a linear regression model, regardless of the number of layers it
has.
Variants of Activation Function
Linear Function
Equation : The linear function has the equation of a straight line, i.e. y = x.
No matter how many layers we have, if all of them are linear, the output of the last layer is still just a linear function of the input of the first layer.
Range : -inf to +inf
Uses : The linear activation function is used in just one place, i.e. the output layer (typically for regression).
Issues : Its derivative is a constant that does not depend on the input x, so backpropagation gains nothing from stacking linear layers, and the network cannot learn non-linear behaviour.
Sigmoid Function
It is a function which is plotted as an 'S'-shaped graph.
Equation : A = 1 / (1 + e^(-x))
Nature : Non-linear. For x values between -2 and 2, the curve is very steep, so small changes in x bring about large changes in the value of Y.
Value Range : 0 to 1
Tanh Function
The activation that almost always works better than the sigmoid function is the Tanh function, also known as the hyperbolic tangent function. It is a mathematically shifted and scaled version of the sigmoid function; the two are closely related and can be derived from each other.
Equation :-
f(x) = tanh(x) = 2 / (1 + e^(-2x)) - 1
OR
tanh(x) = 2 * sigmoid(2x) - 1
Value Range :- -1 to +1
Nature :- non-linear
ReLU Function
It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of neural networks.
Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0 otherwise.
Value Range :- [0, inf)
Nature :- non-linear, which means we can easily backpropagate the errors and have multiple
layers of neurons being activated by the ReLU function.
Softmax Function
The softmax function is also a type of sigmoid function but is handy when we are trying to handle multi-
class classification problems.
Nature :- non-linear
Output:- The softmax function is ideally used in the output layer of the classifier where we are
actually trying to attain the probabilities to define the class of each input.
The basic rule of thumb is: if you really don't know which activation function to use, simply use ReLU, as it is a general-purpose activation function for hidden layers and is used in most cases these days.
If your output is for binary classification, the sigmoid function is a very natural choice for the output layer.
If your output is for multi-class classification, softmax is very useful for predicting the probability of each class.
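For illustration, NumPy sketches of the activation functions discussed above; the sample input values are assumptions for the example.
import numpy as np
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
def relu(x):
    return np.maximum(0.0, x)
def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))   # squashed into (0, 1)
print(np.tanh(x))   # squashed into (-1, 1)
print(relu(x))      # negatives clipped to 0
print(softmax(x))   # non-negative values that sum to 1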
Q.12 Draw and explain architecture of CNN. 10 Marks
Basic structure of convolution block:
The input image is passed through one or more convolutional layers, where filters are applied to
the image to extract features such as edges, textures, and shapes.
The output of the convolutional layers is then passed through one or more pooling layers, which
are used to down-sample the feature maps.
This reduces the spatial dimensions of the feature maps, making the network more
computationally efficient and reducing the risk of overfitting.
The output of the pooling layers is then passed through one or more fully connected layers,
which are used to make a prediction or classify the image.
Convolutional blocks are used in the initial stages of a CNN, where the primary goal is to extract
features from the input image.
They are typically repeated multiple times in the network, with each block extracting more
complex features from the output of the previous block.
This hierarchical structure allows the network to learn increasingly complex features, leading to
improved image recognition performance.
Convolutional Layer :
This is the core building block of a CNN. It applies a set of learnable filters (also known as
kernels) to the input data.
These filters are small grids that slide over the input image to perform element-wise
multiplications and additions.
Each filter extracts specific features from the input data, such as edges, textures, or more complex patterns.
Multiple filters are used to capture different features.
The output of this layer is called feature maps.
Pooling Layer:
Pooling layers reduce the spatial dimensions of the feature maps while retaining the most
important information.
This helps in reducing computation and making the model translation invariant.
Max-pooling and average-pooling are common pooling techniques.
Max-pooling, for example, selects the maximum value within a small region of the feature map,
reducing the size and introducing translational invariance.
Pooling helps reduce the computational complexity of the network and makes the model more
robust to small shifts in the input data.
Activation Layer :
After each convolutional layer, an activation function is applied element-wise to the feature
maps.
It introduces non-linearity into the model which is essential for the network to learn complex
patterns.
The most common activation function used in CNNs is the Rectified Linear Unit (ReLU).
ReLU activation function replaces negative values with zero and leaves positive values
unchanged, introducing non-linearity into the model.
Dropout Layer :
Dropout is a regularization technique used to prevent overfitting.
During training, a fraction of randomly selected neurons (typically set as a hyperparameter) is
temporarily “dropped out” or ignored.
It prevents the network from relying too heavily on specific neurons and features.
Fully Connected Layer (Dense Layer) :
The Fully Connected (FC) layer consists of the weights and biases along with the neurons.
These layers connect every neuron in one layer to every neuron in the next layer.
They are typically used in the final layers of the CNN for classification or regression tasks.
This layer is usually placed just before the output layer and combines the learned features to produce the final prediction.
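For illustration, a minimal Keras sketch assembling the layers described above (convolution with ReLU, pooling, dropout, fully connected); the filter counts, input shape, and number of classes are assumptions for the example.
import tensorflow as tf
model = tf.keras.Sequential([
    # Convolutional block 1: convolution + ReLU activation, then max pooling
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Convolutional block 2: extracts more complex features from block 1's output
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Dropout regularization to reduce overfitting
    tf.keras.layers.Dropout(0.25),
    # Fully connected (dense) layers for the final classification
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # class probabilities
])
model.summary()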