
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI (RAJ.)
First Semester 2023-24 CS F425 Deep Learning
COMPREHENSIVE EXAMINATION [Closed Book]
Date: 20th December 2023 Weightage: 35 % Max. Marks: 70 Duration: 180 minutes
Part A [CLOSED BOOK] [28 Marks]

Important Instructions:
1. The exam has THREE parts – A, B, and C. Parts A and B are closed book and are provided to you at the beginning of the exam. The recommended time for them is two hours in total. You can collect Part C (which is Open Book) whenever you submit Parts A and B.
2. This is Part A, containing short-answer type questions.
3. Any overwritten answers will not be considered for a recheck request.
4. For Q1 & Q2: write your answers in the grid provided on the right.

[Answer grid: rows i)-x) for Q1 answers (MCQ) and Q2 answers (True/False), with boxes for Marks of Q1, Marks of Q2, Marks of Q3, TOTAL MARKS, and a recheck request.]

1. Multiple Choice Questions (+1 for right answer, -0.5 for wrong). Only one correct answer. [1*10 = 10]

i). Which of the following activation functions can lead to the vanishing gradient problem?
A). ReLU, B). tanh, C). Leaky ReLU, D). None of these.

ii). Which of the following techniques can NOT help prevent a model from overfitting?
A). Data augmentation, B). Dropout, C). Early stopping, D). None of these

iii). After training a neural network, you observe a large gap between the training accuracy (95%) and the test accuracy
(35%). Which of the following methods can be used to reduce this gap?
A). Generative adversarial network, B). Sigmoid activation, C). RMSprop optimizer, D). Dropout.

iv). Which of the following regularization methods leads to weight sparsity?
A). L1 regularization, B). L2 regularization, C). Early stopping, D). None of these.

v) Which of the following layers is generally NOT a part of a CNN?
A) Convolutional Layer B) Pooling Layer C) Code Layer D) Fully connected Layer

vi). Which of the following can you use to solve the exploding gradient problem?
A) Use SGD optimization, B) Oversample minority classes, C) Increase the batch size, D) Impose gradient clipping.

vii). If an input of size 24x24 to a CNN is convolved with a kernel of size 7x7 and same padding is used, what will be the size of the output matrix? Consider a stride of 1. (A sketch of the output-size formula follows this question set.)
A) 18x18 B) 24x24 C) 17x17 D) Cannot be determined with the information provided

viii). The convolution operation doesn't fully use the pixels at the corners of an image. This is resolved by the use of:
A) Padding B) Striding C) Kernels D) Pooling
ix). Which of the following is true about dropout?
A) Dropout leads to sparsity in the trained weights B) At test time, dropout is applied with inverted keep probability
C) The larger the keep probability of a layer, the stronger the regularization of the weights in that layer D) None of these

x). Which of the following is TRUE about Momentum?
A) It helps in accelerating SGD in a relevant direction B) It helps SGD in avoiding local minima
C) It helps in faster convergence D) All of these
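
For reference (an illustrative sketch, not part of the original paper), the standard convolution output-size formula is out = floor((in - k + 2p) / s) + 1. With "same" padding and stride 1 the spatial size is preserved; with "valid" convolution (no padding) it shrinks by k - 1:

```python
# Illustrative sketch (not part of the exam paper): convolution output size,
# out = (in - k + 2p) // s + 1.

def conv_out(size, k, p=0, s=1):
    return (size - k + 2 * p) // s + 1

# "Same" padding with stride 1 preserves spatial size: p = (k - 1) // 2.
print(conv_out(24, k=7, p=(7 - 1) // 2))  # 24
# "Valid" convolution (no padding) shrinks the output by k - 1.
print(conv_out(24, k=7))                  # 18
```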

2. Answer as TRUE or FALSE. No reasoning or justification required. (+1 for right answer, -0.5 for wrong) [1*10=10]
i) Convolutional networks generally have more parameters than their equivalent fully connected networks.
ii) Autoencoders are able to compress data and thus can be used as a generic data compression algorithm.
iii) The output of the autoencoder will not be exactly the same as the input, and thus they are “lossy”.
iv) Autoencoders are considered a supervised learning technique since they produce the reconstructed image
using the original image as an input.
v) An autoencoder can be forced to learn useful features by adding random noise to its inputs and making it
recover the original noise-free data.
vi) Apart from being an optimization technique, Batch normalization also acts as a regularizer and often eliminates
the need for using Dropout.
vii) Regularization is intended to reduce the training error as well as the generalization error.
viii) Pooling layers involve many fixed computations and hence they slow down the computation in a neural
network.
ix) The basic concept behind RNNs is that RNNs use recurrent features from the dataset to find the best optimization.
x) In general, training a GAN involves alternating periods where the discriminator trains for one or more epochs,
followed by the generator being trained for one or more epochs.

3. If the input is of size 256x256x6 and the neural network structure is as indicated in the first column below, calculate the
output feature map dimensions for each layer. [1*8=8]
The notation follows the convention:
• CONV-K-N denotes a convolutional layer with N filters, each of them of size KxK. Padding and stride parameters
are always 0 and 1, respectively.
• POOL-K indicates a KxK pooling layer with stride K and padding 0.
• FC-N stands for a fully-connected layer with N neurons.

Write your answer in the space provided in the table below; an illustrative dimension-arithmetic sketch follows the table.

Layer         Feature map dimensions
INPUT         256x256x6
CONV-57-64
POOL-2
CONV-5-32
POOL-2
CONV-5-64
POOL-2
POOL-2
FC-9
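
As noted above, here is a minimal Python sketch (illustrative, not part of the paper) that mechanically applies the stated conventions, valid convolution with stride 1 and KxK pooling with stride K, to track feature-map dimensions layer by layer:

```python
# Illustrative sketch (not part of the exam paper): track feature-map sizes
# under the stated conventions (CONV: padding 0, stride 1; POOL-K: stride K).

def conv_out(size, k):
    # Valid convolution with stride 1: output = size - k + 1
    return size - k + 1

def pool_out(size, k):
    # KxK pooling with stride K, padding 0: output = size // k
    return size // k

h = w, c = 256, 6
h = 256
print(f"INPUT      {h}x{w}x{c}")
for layer in ["CONV-57-64", "POOL-2", "CONV-5-32", "POOL-2",
              "CONV-5-64", "POOL-2", "POOL-2"]:
    parts = layer.split("-")
    if parts[0] == "CONV":
        k, n = int(parts[1]), int(parts[2])
        h, w, c = conv_out(h, k), conv_out(w, k), n
    else:  # POOL
        k = int(parts[1])
        h, w = pool_out(h, k), pool_out(w, k)
    print(f"{layer:10} {h}x{w}x{c}")
# FC-9 then maps the flattened feature map to a 9-dimensional output.
```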
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI (RAJ.)
First Semester 2023-24 CS F425 Deep Learning
COMPREHENSIVE EXAMINATION [Closed Book]
Date: 20th December 2023 Weightage: 35 % Max. Marks: 70 Duration: 180 minutes
Part B [CLOSED BOOK] [22 Marks]

This is Part B, Closed Book. Together with Part A, you are recommended to finish this in two hours. Once you submit
Parts A and B, you can collect Part C [Open Book].

Q.1. Consider the following types of sequence modelling scenarios represented using an unfolded recurrent neural network
(RNN) over time-steps: [1+1+1+1=4]
[Figure: four unfolded RNN input/output configurations, labelled A, B, C, and D - not reproduced.]
Categorize each of the following applications into any one of the above types i.e. A, B, C, or D. No reasoning or explanation
required. (Note: No marks will be awarded if more than one answer is written for an application):
i). Image captioning, ii). Sentiment prediction, iii). Machine translation, iv). Video frame classification.

Q.2. Fill in the blanks in the following graph with regard to the regularization method “Early stopping”. [3]
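
For background (a minimal illustrative sketch, not part of the paper; the validation curve here is a hypothetical stand-in), early stopping halts training once the validation error stops improving, even though the training error would keep decreasing:

```python
# Illustrative early-stopping sketch (not part of the exam paper).

def val_error(epoch):
    # Hypothetical U-shaped validation curve: falls, then rises (overfitting).
    return (epoch - 30) ** 2 / 900 + 0.1

best_val, best_epoch, patience, wait = float("inf"), 0, 5, 0
for epoch in range(100):
    v = val_error(epoch)
    if v < best_val:
        best_val, best_epoch, wait = v, epoch, 0  # new best: reset counter
    else:
        wait += 1
        if wait >= patience:
            break  # validation error has stopped improving
print(best_epoch)  # near the minimum of the validation curve (epoch 30)
```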

Q.3. You use vanilla (batch) gradient descent to optimize your loss function, but realise you are getting poor training loss. You
notice that you're not shuffling the training data and feel that it might be a cause. Would shuffling the training data help
in this regard? Give a clear YES or NO as an answer and then give a 1-2 lines justification. [2]
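
A minimal numerical check (illustrative, not part of the paper; the data here is synthetic) of the key fact behind this question: the full-batch gradient is a sum over all training points, and sums are order-independent:

```python
import numpy as np

# Illustrative check (not part of the exam paper): with vanilla (full-batch)
# gradient descent, the gradient sums over ALL training points, so reordering
# (shuffling) the data cannot change the update.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)
w = rng.normal(size=5)

def batch_grad(X, y, w):
    # Gradient of mean squared error over the full batch
    return 2 * X.T @ (X @ w - y) / len(y)

perm = rng.permutation(len(y))
g1 = batch_grad(X, y, w)
g2 = batch_grad(X[perm], y[perm], w)
print(np.allclose(g1, g2))  # True: the order of the data does not matter
```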
Q.4. Suppose we train two different deep CNNs to classify images: (i) using ReLU as the activation function, and (ii) using
sigmoid as the activation function. For each of them, we try initializing the weights with three different initialization
methods, while the biases are always initialized to all zeros. We plot the validation accuracies against training iterations
below: [3]

[Plots: validation accuracy vs. training iterations for weight initializations A, B, and C - not reproduced.]

What is the weight initialization method for A, B, and C in the above plots, from among zero initialization, Xavier
initialization, and Kaiming He initialization? (Answer with only one initialization method each for A, B, and C.)

Q.5. You are solving the binary classification task of classifying images as "car vs. no car". You design a CNN with a single
output neuron. The final output of your network, ŷ, is given by:

ŷ = σ(ReLU(z))    (where z, as usual, is w·x + b)

You classify all inputs with a final value ŷ ≥ 0.5 as car images. What problem are you going to encounter? Justify. [2]
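
A quick illustrative check (not part of the paper) of the property at issue here: ReLU(z) ≥ 0 for all z, and σ(t) ≥ 0.5 whenever t ≥ 0, so the composition σ(ReLU(z)) can never fall below 0.5:

```python
import math

# Illustrative check (not part of the exam paper): sigmoid(ReLU(z)) >= 0.5
# for every z, since ReLU(z) >= 0 and sigmoid(t) >= 0.5 for t >= 0.

def relu(z):
    return max(0.0, z)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

for z in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    y_hat = sigmoid(relu(z))
    print(f"z = {z:6.1f}  ->  y_hat = {y_hat:.4f}")  # always >= 0.5
```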

Q.6. Given N training data points {(x_i, y_i)}, i = 1:N, with x_i ∈ R^d and labels y_i ∈ {1, −1}, we need a linear classifier
f(x) = sign(w·x) (read as w dot x) optimizing the loss function L(z) = e^(−z), for z = y(w·x). In the plot below, one marker
represents data points of class 1 and the other represents data points of class −1: [6+2=8]

[Plot: labelled data points 1 to 6 for the two classes - not reproduced.]

a). Explain the penalties given by this loss function for the different data points (1 to 6) shown in the plot.

b). Derive the stochastic gradient descent update ∆w for L(z).
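
For part (b), an illustrative sketch (not part of the paper) of the gradient computation: with L(z) = e^(−z) and z = y(w·x), the chain rule gives ∂L/∂w = −y e^(−y(w·x)) x, so a gradient-descent step with learning rate η is ∆w = η y e^(−y(w·x)) x:

```python
import numpy as np

# Illustrative SGD step (not part of the exam paper) for the exponential
# loss L(z) = exp(-z) with z = y * (w . x):
#   dL/dw = -y * exp(-y * (w . x)) * x
#   update: w <- w - eta * dL/dw = w + eta * y * exp(-y * (w . x)) * x

def sgd_step(w, x, y, eta=0.1):
    z = y * np.dot(w, x)
    grad = -y * np.exp(-z) * x   # gradient of L at this single point
    return w - eta * grad

# Hypothetical single data point, for illustration only
w = np.zeros(3)
x = np.array([1.0, -2.0, 0.5])
y = 1
print(sgd_step(w, x, y))  # w moves in the direction of y * x
```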
