1. Given a square matrix A of size n × n, explain the process of computing its
eigenvalues and eigenvectors. Use the 2 × 2 matrix A = (4 1; 2 3), i.e. the
matrix with rows (4, 1) and (2, 3), to
demonstrate the process. Additionally, discuss how eigenvectors are
utilized in the context of Principal Component Analysis (PCA).
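A minimal sketch, using Python and NumPy, of how the eigen-decomposition asked for above can be checked numerically, together with the PCA connection. The data matrix used in the PCA part is made up purely for illustration and is not part of the question.

```python
import numpy as np

# Eigen-decomposition of the 2 x 2 matrix from the question.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)     # solves det(A - lambda*I) = 0
print("eigenvalues:", eigenvalues)
print("eigenvectors (as columns):", eigenvectors)

# PCA connection: the principal components are the eigenvectors of the data
# covariance matrix, ordered by eigenvalue (explained variance).
# The data matrix below is made up purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
variances, components = np.linalg.eigh(cov)      # eigh: symmetric matrix
order = np.argsort(variances)[::-1]              # largest variance first
principal_components = components[:, order]
```

In PCA, the eigenvectors of the covariance matrix point in the directions of maximal variance, which is why they are sorted by eigenvalue above.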
2. Explain the vanishing gradient problem in deep neural networks. How does
it affect the training of deep networks? Describe two techniques that help
mitigate this problem, and explain how they work.
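A toy numerical sketch of why gradients shrink through stacked sigmoid layers, with comments pointing at two common mitigations (ReLU activations and residual connections). The pre-activation values are randomly generated and purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy illustration: the gradient reaching an early layer contains a product of
# per-layer sigmoid derivatives, each at most 0.25, so it shrinks quickly.
rng = np.random.default_rng(0)
grad_factor = 1.0
for _ in range(10):                          # 10 stacked sigmoid layers
    z = rng.normal()                         # made-up pre-activation value
    grad_factor *= sigmoid(z) * (1.0 - sigmoid(z))
print("gradient factor after 10 sigmoid layers:", grad_factor)

# Mitigation 1: ReLU has derivative 1 for positive inputs, so the activation
# function itself no longer forces the product toward zero.
# Mitigation 2: residual (skip) connections add an identity path, so the
# local Jacobian is (I + dF/dx) rather than dF/dx alone.
```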
3. Consider a deep feedforward network with 4 hidden layers. Explain how
backpropagation works in this network and derive the expression for the
gradient of the loss function with respect to the weights of the first hidden
layer. Discuss the role of the chain rule in this process.
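A minimal numerical sketch of backpropagation through a network with 4 hidden layers, showing how the chain rule accumulates the gradient for the first hidden layer's weights. The layer sizes, random weights, tanh activation, and MSE loss are assumptions made for illustration, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 4, 4, 1]                      # input, 4 hidden layers, output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=(3, 1))
y = np.array([[1.0]])

# Forward pass, keeping activations for the backward pass.
acts = [x]
for W in Ws:
    acts.append(np.tanh(W @ acts[-1]))

# Backward pass: dL/dW1 is built by chaining local Jacobians layer by layer.
delta = (acts[-1] - y) * (1 - acts[-1] ** 2)    # dL/dz at the output (MSE, tanh)
for i in range(len(Ws) - 1, 0, -1):
    delta = (Ws[i].T @ delta) * (1 - acts[i] ** 2)   # chain rule through layer i
dL_dW1 = delta @ acts[0].T                      # gradient for the first hidden layer
print(dL_dW1)
```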
4. Compare and contrast L1 and L2 regularization in the context of training
deep learning models. Explain the mathematical formulation of each and
discuss how these regularization techniques affect the model's weights
and sparsity.
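A short sketch of the two penalty terms and their gradient contributions; the weight vector and regularization strength below are made-up example values.

```python
import numpy as np

w = np.array([0.5, -1.2, 0.0, 2.0])         # example weight vector
lam = 0.01                                  # example regularization strength

l1_penalty = lam * np.sum(np.abs(w))        # L1: encourages exact zeros (sparsity)
l2_penalty = lam * np.sum(w ** 2)           # L2: shrinks weights smoothly toward zero

# Their contributions to the gradient of the regularized loss:
l1_grad = lam * np.sign(w)                  # constant-magnitude push toward zero
l2_grad = 2 * lam * w                       # push proportional to the weight itself
```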
5. Explain the concept of overfitting in deep learning models. Describe three
strategies that can be employed to prevent overfitting during the training
of deep models. Provide an example scenario for each strategy.
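As one concrete example of such a strategy, a minimal sketch of inverted dropout; the keep probability and activation values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.8, training=True):
    if not training:
        return activations                      # no dropout at inference time
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob       # scale so the expected value is unchanged

hidden = rng.normal(size=(4, 5))                # made-up hidden activations
print(dropout(hidden, keep_prob=0.8))
```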
6. Explain how a Convolutional Neural Network (CNN) processes an image
from input to output, highlighting the roles of convolutional layers, pooling
layers, and fully connected layers. Provide an example with specific
dimensions, filters, and activation functions. Also, describe how CNNs
differ from traditional fully connected networks in handling image data.
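A small arithmetic sketch of how spatial dimensions evolve through convolution and pooling layers using the standard output-size formula; the 32 × 32 input, 3 × 3 filters, 2 × 2 pooling, and 64 feature maps are example values, not part of the question.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    # Standard formula: (size - kernel + 2 * padding) / stride + 1
    return (size - kernel + 2 * padding) // stride + 1

h = w = 32                                          # e.g. a 32 x 32 input image
h = w = conv_output_size(h, kernel=3, padding=1)    # 3x3 conv, 'same' padding -> 32
h = w = conv_output_size(h, kernel=2, stride=2)     # 2x2 max pooling -> 16
h = w = conv_output_size(h, kernel=3, padding=1)    # second conv block -> 16
h = w = conv_output_size(h, kernel=2, stride=2)     # second pooling -> 8
flattened = h * w * 64                              # 64 feature maps feed the dense layers
print(h, w, flattened)
```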
7. A factory produces light bulbs, and it is known that 2% of them are
defective. If you randomly select 10 light bulbs from the production line,
what is the probability that exactly 2 of them are defective? Use the
binomial distribution to calculate the answer.
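A direct way to evaluate the binomial formula for this question in Python:

```python
from math import comb

# Binomial probability P(X = 2) for n = 10 trials with defect rate p = 0.02.
n, k, p = 10, 2, 0.02
probability = comb(n, k) * p**k * (1 - p)**(n - k)   # C(10, 2) * 0.02^2 * 0.98^8
print(probability)
```

With n = 10, k = 2, and p = 0.02 this comes out to roughly 0.015.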
8. You are training a simple neural network with a single hidden layer. The
network uses a sigmoid activation function, and the loss function is mean
squared error (MSE). Suppose the weights between the input and hidden
layers are the 2 × 2 matrix W = (0.1 −0.2; 0.4 0.5), the biases are
b = (0.3, 0.1), and the input is x = (1, −1). Compute the output of the
hidden layer and the final
output of the network. Then, calculate the gradient of the loss with respect
to the weights using backpropagation.
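A sketch of the forward pass and gradient computation for this question. W, b, and x are taken from the question (with W read row-wise); the hidden-to-output weights w_out, output bias b_out, and target y are not specified in the question, so the values below are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[0.1, -0.2],
              [0.4,  0.5]])          # input-to-hidden weights from the question
b = np.array([0.3, 0.1])             # hidden biases from the question
x = np.array([1.0, -1.0])            # input from the question

z_hidden = W @ x + b                  # pre-activations of the hidden layer
h = sigmoid(z_hidden)                 # hidden-layer output

w_out, b_out, y = np.array([0.3, -0.1]), 0.0, 1.0   # ASSUMED: not given in the question
z_out = w_out @ h + b_out
y_hat = sigmoid(z_out)                # final network output

# Backpropagation for MSE loss L = 0.5 * (y_hat - y)^2
delta_out = (y_hat - y) * y_hat * (1 - y_hat)
delta_hidden = delta_out * w_out * h * (1 - h)
dL_dW = np.outer(delta_hidden, x)     # gradient w.r.t. the input-to-hidden weights W
print(h, y_hat, dL_dW)
```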