
EE782 Advanced Topics in Machine Learning

End-Semester Examination Question Paper and Answer Sheet


November 24, 2023; 08:30 am to 11:30 am

ROLL NO. ________________ NAME: _________________________


Instructions:
• Exam is open notes as long as the notes are on paper and not on an electronic device
• Collaboration between a student and any other person or the Internet is prohibited
• SUBMIT THIS SHEET ONLY. Answer all questions in the space given on this sheet only. Use a separate sheet for rough work
• Total marks = 28; weight in course 28%

1. Let f(x, y) = 2x + 3y² − 2x². Find the critical points of this function and characterize them as local maxima, local minima, furrows (flat in one direction, minimum in the other), saddle points, or points of inflection in a given direction. Show your work. [1.5]
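As a sanity check for Question 1, the critical points and their classification can be verified symbolically. The sketch below uses SymPy to solve the gradient equations and inspect the Hessian's eigenvalues (mixed signs indicate a saddle point):

```python
# Symbolic check: critical points of f(x, y) = 2x + 3y^2 - 2x^2
# and their classification via the Hessian's eigenvalues.
import sympy as sp

x, y = sp.symbols('x y')
f = 2*x + 3*y**2 - 2*x**2

# Critical points: solve grad f = 0.
grad = [sp.diff(f, v) for v in (x, y)]
critical = sp.solve(grad, (x, y), dict=True)

# Classify: eigenvalues of the Hessian at each critical point.
# All positive -> local minimum; all negative -> local maximum;
# mixed signs -> saddle point.
H = sp.hessian(f, (x, y))
for pt in critical:
    print(pt, H.subs(pt).eigenvals())
```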

2. Not carefully initializing weights in a deep neural network where every layer has a sigmoid nonlinearity will lead to (a) no problem, (b)
vanishing gradients, (c) exploding gradients, or (d) both vanishing and exploding gradients? Explain the case for both vanishing and
exploding gradients. [2]

3. Suppose one layer has an output of size C (one-dimensional array) before a nonlinearity is applied. A second layer is a convolutional layer
with ReLU nonlinearity, whose output has dimensions H×W×C (three-dimensional tensor). The output of the first layer needs to be used as
channel-wise attention weight for the output of the second layer. Suggest a nonlinearity and any additional operations that need to be
applied to the output of the first layer for this purpose. [2]
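One plausible design for Question 3 is squeeze-and-excitation-style channel attention: a sigmoid keeps the C weights in (0, 1), and reshaping them to 1×1×C lets them broadcast over all H×W spatial positions. A hedged NumPy sketch of that scheme:

```python
# Channel-wise attention sketch: sigmoid on a length-C vector, then
# reshape to (1, 1, C) so it broadcasts over an (H, W, C) feature map.
import numpy as np

H, W, C = 4, 4, 8
rng = np.random.default_rng(0)

logits = rng.standard_normal(C)                          # first layer, pre-nonlinearity
feat = np.maximum(rng.standard_normal((H, W, C)), 0.0)   # conv layer output after ReLU

attn = 1.0 / (1.0 + np.exp(-logits))      # sigmoid -> per-channel weight in (0, 1)
attended = feat * attn.reshape(1, 1, C)   # broadcast multiply, channel-wise

print(attended.shape)
```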

4. For a convolutional layer with output of size H×W×C×B, where B is the batch size and C is the number of channels, what will be the
number of elements that will be averaged for computing one mean during batch normalization? [1]

5. What will happen if we train a GAN that has a discriminator with low capacity (e.g. not enough learnable parameters)? Will it lead to (a)
mode collapse, (b) generation of unrealistic samples, or (c) a discriminator that easily classifies between real and fake? Justify your answer.
[1]

6. Which of the following is a good principle for designing a loss function for regression that is robust to outliers? Justify with an example of
a robust loss function and how it treats inliers (non-outliers) versus outliers. [1.5]
a) The loss function should be convex
b) The loss function should have a constant upper bound
c) The gradient of the loss function should have a constant upper bound
d) The absolute value of the gradient of the loss function should have a constant upper bound
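As one illustrative example for Question 6 (not the only possible answer), the Huber loss has a gradient whose absolute value is capped at δ, so a single large outlier residual cannot dominate a parameter update, while small inlier residuals retain the smooth quadratic behaviour:

```python
# Huber loss: quadratic for |r| <= delta, linear beyond; its gradient
# magnitude never exceeds delta, which limits the influence of outliers.
import numpy as np

def huber(r, delta=1.0):
    """Huber loss of residual r (elementwise)."""
    quad = 0.5 * r**2
    lin = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quad, lin)

def huber_grad(r, delta=1.0):
    """Derivative of the Huber loss w.r.t. r; clipped to [-delta, delta]."""
    return np.clip(r, -delta, delta)

# Inlier residual 0.3 keeps its gradient; outliers 10 and -50 are capped.
print(huber_grad(np.array([0.3, 10.0, -50.0])))
```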

7. Write the formula for generalized cross entropy and explain how it might help deal with mislabeled samples by drawing an approximate
graph and explaining the role of its hyperparameter. [1.5]
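For reference when attempting Question 7: generalized cross entropy (Zhang and Sabuncu) is L_q(p_y) = (1 − p_y^q)/q for q ∈ (0, 1], where p_y is the predicted probability of the labeled class. As q → 0 it recovers cross entropy −log p_y; at q = 1 it becomes the bounded loss 1 − p_y, which penalizes confidently-contradicted (possibly mislabeled) samples far less. A small numeric sketch:

```python
# Generalized cross entropy: L_q(p_y) = (1 - p_y**q) / q, q in (0, 1].
# Small q ~ cross entropy (unbounded); q = 1 ~ bounded MAE-like loss.
import numpy as np

def gce(p_y, q):
    """Generalized cross entropy for true-class probability p_y."""
    return (1.0 - p_y**q) / q

p = 0.01  # model strongly disagrees with the (possibly wrong) label
print(gce(p, q=0.7))   # bounded penalty
print(-np.log(p))      # cross entropy: huge penalty that dominates training
```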
8. According to the paper titled “Normalized Loss Functions for Deep Learning with Noisy Labels” by Ma et al., (a) draw the approximate
graphs of cross entropy and (b) normalized cross entropy for the predicted probability of the correct class [Hint: assume binary
classification], and (c) explain the advantage of normalized cross entropy over cross entropy when dealing with mislabeled samples, as well as
(d) the disadvantage of using only an active loss (e.g. NCE) based on the graphs. [2]

9. List four different methods of augmenting images for self-supervised learning. [2]

10. Give the general architecture of a neural network that is being trained in a self-supervised manner to restore images of old degraded
photographs. Your answer must give one example each of (a) the dimensions of the input layer, (b) the dimensions and nonlinearity (if any)
of the output layer, (c) the loss function, (d) a plausible architecture, and (e) a method to create the training dataset. [2.5]

11. What could be an advantage in few-shot learning of decreasing the relative distance of a query sample from the prototype of its class as
opposed to the support samples of its class? [1]

12. Suppose that we want to use graph neural networks on Facebook communities to classify them into those who might respond to a
particular ad versus those who will not. Give at least two examples of vertex attributes and two examples of edge attributes that one can use.
[2]

13. Write the Degree, Adjacency, and Laplacian matrices for the following graph. Vertices are numbered 1 to 4. [1.5]
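The graph figure for Question 13 is not reproduced here, so the sketch below uses a hypothetical 4-vertex path graph 1–2–3–4 purely to illustrate the three matrices asked for: adjacency A, diagonal degree matrix D, and the combinatorial Laplacian L = D − A:

```python
# Example (hypothetical path graph 1-2-3-4): adjacency, degree, and
# Laplacian matrices. Rows of the Laplacian always sum to zero.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])        # edges: 1-2, 2-3, 3-4
D = np.diag(A.sum(axis=1))          # vertex degrees on the diagonal
L = D - A                           # combinatorial Laplacian

print(D.diagonal())                 # degrees of vertices 1..4
print(L.sum(axis=1))                # all zeros
```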

14. What is a pseudo-label for a classification problem? [1]

15. Write the formula for entropy and describe one way to use it for semi-supervised classification. [1]
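For reference when attempting Question 15: Shannon entropy is H(p) = −Σ_c p_c log p_c. One common semi-supervised use is entropy minimization, where H of the model's predicted distribution on unlabeled samples is added to the loss, encouraging confident predictions and pushing decision boundaries away from dense unlabeled regions. A small sketch:

```python
# Shannon entropy of a predicted class distribution (in nats).
# Low entropy = confident prediction; maximal entropy = uniform.
import numpy as np

def entropy(p, eps=1e-12):
    """H(p) = -sum_c p_c log p_c for a probability vector p."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

confident = entropy([0.98, 0.01, 0.01])   # near zero
uncertain = entropy([1/3, 1/3, 1/3])      # log(3), the maximum for 3 classes
print(confident, uncertain)
```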

16. Explain how Grad-CAM (gradient-weighted class activation map) works to localize objects when a CNN is only trained for image
classification. [1.5]

17. List and briefly describe the key differences between how dropout is used for regular training and inference versus how it is used for
uncertainty estimation. [1.5]
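The key distinction asked about in Question 17 can be sketched with a toy model: for regular inference dropout is disabled (a single deterministic pass), whereas for MC-dropout uncertainty estimation it stays active at test time and the spread over T stochastic forward passes serves as an uncertainty proxy. A minimal sketch, assuming a toy one-layer linear model with inverted dropout:

```python
# Regular inference: dropout off, one deterministic pass.
# MC dropout: dropout ON at test time; spread over passes = uncertainty.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 1))
x = rng.standard_normal(16)
p_keep = 0.8

def forward(x, dropout_active):
    h = x.copy()
    if dropout_active:  # inverted dropout: random mask, rescale by 1/p_keep
        h *= (rng.random(16) < p_keep) / p_keep
    return float(h @ W)

deterministic = forward(x, dropout_active=False)  # standard inference
mc = np.array([forward(x, dropout_active=True) for _ in range(200)])
mean, std = mc.mean(), mc.std()  # std acts as a predictive-uncertainty proxy
print(deterministic, mean, std)
```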

18. Describe one way to compare the performance of uncertainty estimation methods for outlier detection, and another way for determining trustworthy outcomes on non-outliers. [0.5 + 1]
