1. What is the primary use of 'gradient boosting' in machine learning?
• A. To remove noise from datasets
• B. To make a single model interpretable
• ✅ C. To combine multiple weak learners into a strong learner
• D. To increase feature importance
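A minimal sketch of the idea behind answer C, assuming squared-error regression on a single input feature. Each weak learner is a one-split decision stump fit to the current residuals, and its output is added with a small learning rate. The function names and toy data are illustrative and not taken from any library.

```python
# Minimal gradient boosting for 1-D regression with squared loss.
# Each weak learner is a decision stump fit to the current residuals,
# which are the negative gradient of the squared loss.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) with the lowest squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must put points on both sides
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model of stumps; returns a predict(x) function."""
    base = sum(ys) / len(ys)          # initial prediction: the target mean
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)

    return predict
```

For example, `gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])` starts every prediction at the mean (3) and, round by round, moves predictions toward 1 on the left and 5 on the right.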
2. In ensemble learning, bagging and boosting are two different techniques.
What is the primary difference between them?
• ✅ A. Bagging trains models independently (in parallel) on bootstrap samples and combines their predictions, while boosting trains models sequentially, each one focusing on the errors of its predecessors.
• B. Bagging reduces bias, while boosting reduces variance.
• C. Boosting uses a single model, while bagging uses multiple models.
• D. Boosting is used only for regression, while bagging is for classification.
3. Which algorithm is commonly used for anomaly detection and one-class classification?
• A. K-Means Clustering
• B. Decision Trees
• ✅ C. One-Class Support Vector Machines (One-Class SVM)
• D. Logistic Regression
4. What is the purpose of dimensionality reduction techniques in machine
learning?
• A. To improve model interpretability by adding more parameters
• B. To increase the number of features
• ✅ C. To reduce the complexity of data while preserving important information
• D. To remove all noise from data
5. In reinforcement learning, what is the term for the expected cumulative reward of taking a particular action in a specific state and then following a given policy?
• A. Exploration
• B. Temporal Difference Learning
• ✅ C. Q-Value
• D. Discount Factor
6. Which machine learning algorithms are often used on imbalanced datasets, such as those found in fraud detection?
• A. K-Nearest Neighbors (KNN)
• B. Naive Bayes
• C. Logistic Regression
• ✅ D. Random Forest or Gradient Boosting
7. Which algorithm is commonly used for clustering and groups data points based on how many neighbors lie within a given radius of each point?
• ✅ A. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• B. K-Means Clustering
• C. Principal Component Analysis (PCA)
• D. Decision Trees
8. What is the key idea behind word embeddings in natural language
processing?
• A. Representing words as one-hot encoded vectors
• B. Using only frequency-based word representations
• ✅ C. Representing words as dense vectors in a continuous space
• D. Sorting words alphabetically in datasets
9. What is the primary goal of 'convolutional neural networks' (CNNs) in deep
learning?
• A. To analyze tabular data
• B. To classify text documents
• ✅ C. To perform image recognition and feature extraction
• D. To process sequential data like time series
10. In reinforcement learning, what is the term for the immediate numerical feedback the environment gives an agent to indicate how good or desirable its action (or the resulting state) is?
• ✅ A. Reward
• B. Discount Factor
• C. Policy Gradient
• D. Bellman Equation
11. What is the primary goal of 'Kullback-Leibler divergence' (KL divergence)
in information theory?
• A. To find the shortest path between two data points
• ✅ B. To quantify the difference between two probability distributions
• C. To measure the variance of a dataset
• D. To optimize hyperparameters in a neural network
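A short plain-Python illustration of answer B for discrete distributions, measured in nats; the helper name is hypothetical. Note that KL divergence is not symmetric: D_KL(P || Q) generally differs from D_KL(Q || P).

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For instance, `kl_divergence([0.5, 0.5], [0.9, 0.1])` and `kl_divergence([0.9, 0.1], [0.5, 0.5])` are both positive but not equal, and the divergence of any distribution from itself is 0.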
12. In deep learning, what is a convolutional layer primarily used for?
• ✅ A. Feature extraction from images
• B. Reducing noise in numerical data
• C. Combining multiple classifiers
• D. Splitting datasets into training and testing sets
13. What is the purpose of batch normalization in deep neural networks?
• A. To remove outliers from training data
• ✅ B. To stabilize and accelerate training by normalizing each layer's inputs using mini-batch statistics
• C. To increase the number of neurons in each layer
• D. To reduce model accuracy for faster inference
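A plain-Python sketch of answer B in training mode, assuming a mini-batch given as a list of feature vectors. Each feature is normalized with the batch mean and variance, then scaled by `gamma` and shifted by `beta`. For brevity, `gamma` and `beta` are shared scalars here (real layers learn one pair per feature), and the running averages used at inference time are omitted.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    batch: list of samples, each a list of feature values.
    gamma/beta: simplified to shared scalars for this sketch.
    """
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in batch]
    for j in range(n_features):
        col = [row[j] for row in batch]
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)  # biased batch variance
        for i, x in enumerate(col):
            out[i][j] = gamma * (x - mean) / math.sqrt(var + eps) + beta
    return out
```

After normalization, each feature column has (approximately) zero mean and unit variance within the batch, regardless of its original scale.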
14. In unsupervised learning, what is the main goal of 'hierarchical
clustering'?
• A. To assign each data point to a unique cluster
• B. To improve performance of supervised learning models
• ✅ C. To build a hierarchy of clusters by successively merging (or splitting) clusters based on similarity
• D. To train a deep learning model using small datasets
15. Which machine learning algorithm is often used for collaborative filtering
and recommendation systems?
• A. Decision Trees
• B. Logistic Regression
• ✅ C. Matrix Factorization (e.g., Singular Value Decomposition - SVD)
• D. K-Means Clustering
16. Which machine learning algorithm is often used for sentiment analysis
and text classification and is based on a probabilistic approach?
• A. Support Vector Machine (SVM)
• B. K-Nearest Neighbors (KNN)
• ✅ C. Naive Bayes
• D. Decision Trees
17. What is the main goal of semi-supervised learning?
• A. To train models using only labeled data
• ✅ B. To learn from both labeled and unlabeled data
• C. To reduce the number of features in a dataset
• D. To create synthetic data for training
18. What is the primary use of 'word embeddings' in natural language
processing?
• ✅ A. To represent words as dense vectors in a continuous space
• B. To convert words into one-hot encoded vectors
• C. To count the frequency of words in a document
• D. To group similar words using clustering
19. What is the primary objective of hyperparameter tuning in machine
learning?
• ✅ A. To improve model generalization
• B. To increase dataset size
• C. To remove bias from a dataset
• D. To train models faster without overfitting
20. What is the main benefit of using 'word2vec' embeddings in natural
language processing?
• A. They store words in tabular format
• ✅ B. They capture semantic relationships between words
• C. They reduce the need for labeled data
• D. They remove all redundant words from text
21. Which method is commonly used for handling missing data in a dataset?
• A. Deleting rows with missing values
• B. Replacing missing values with zero
• ✅ C. Imputation (e.g., mean, median, mode, interpolation)
• D. Ignoring missing values during training
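A small sketch of answer C using Python's `statistics` module, assuming missing values are represented as `None`. Interpolation and model-based imputation are omitted; the function name is illustrative.

```python
import statistics


def impute(column, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [x for x in column if x is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if x is None else x for x in column]
```

Mean and median suit numeric columns; mode also works for categorical values such as strings.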
22. In reinforcement learning, what is the term for the process of selecting
actions that maximize expected cumulative rewards while considering
exploration and exploitation?
• A. Reward Selection
• B. Gradient Descent
• C. Backpropagation
• ✅ D. Policy Optimization
23. What does the term 'overfitting' refer to in the context of machine
learning?
• A. When a model generalizes well to unseen data
• ✅ B. When a model performs well on training data but poorly on unseen
data
• C. When a model underestimates the complexity of a dataset
• D. When a model is unable to find a pattern in data
24. What is the primary objective of 'one-class classification' in machine
learning?
• A. To classify multiple categories of data
• ✅ B. To identify outliers and anomalies
• C. To increase training accuracy
• D. To reduce computation cost in training
25. What is the purpose of the 'sigmoid' activation function in neural
networks?
• ✅ A. To introduce non-linearity into the model
• B. To ensure weights do not change in a neural network
• C. To reduce overfitting in CNN models
• D. To increase training time
26. What is the primary goal of 'autoencoders' in deep learning?
• A. To classify images in supervised learning
• ✅ B. To compress data and reduce dimensionality
• C. To generate new labeled datasets
• D. To predict numerical values
27. Which machine learning technique is best suited for time series
forecasting and stock price prediction?
• A. Convolutional Neural Networks (CNNs)
• ✅ B. Recurrent Neural Networks (RNN) or Long Short-Term Memory (LSTM)
• C. Random Forest
• D. Naive Bayes
28. In reinforcement learning, what is the 'Bellman equation' used for?
• A. To increase dataset size
• ✅ B. To estimate the optimal value function
• C. To train deep neural networks
• D. To balance bias and variance
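A tiny, hypothetical three-state example illustrating answer B: value iteration applies the Bellman optimality update V(s) = max over a of [R(s, a) + gamma * V(s')] repeatedly until the estimates stop changing. The MDP, its rewards, and the discount factor are made up for illustration.

```python
# Value iteration on a tiny deterministic MDP (hypothetical example).
# Bellman optimality update: V(s) = max_a [ R(s, a) + gamma * V(s') ]

# transitions[state][action] = (next_state, reward); terminal states have no actions.
TOY_MDP = {
    "A": {"right": ("B", 0.0), "stay": ("A", 0.0)},
    "B": {"right": ("goal", 1.0), "left": ("A", 0.0)},
    "goal": {},
}


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(r + gamma * values[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once no value changed noticeably
            return values
```

On `TOY_MDP`, state B is worth 1 (one step from the reward) and state A is worth 0.9, the reward discounted by one extra step.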
29. What is the primary use of a confusion matrix in classification tasks?
• A. To split data into training and testing sets
• ✅ B. To evaluate the performance of a classifier
• C. To increase model interpretability
• D. To visualize decision boundaries
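A minimal sketch of answer B for a binary classifier, using the convention rows = actual class and columns = predicted class (some tools transpose this). Precision and recall can be read directly off the matrix. Function names and labels are illustrative.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall(matrix, positive_index):
    """Precision and recall for the class at positive_index."""
    tp = matrix[positive_index][positive_index]
    fp = sum(row[positive_index] for row in matrix) - tp  # predicted positive, actually not
    fn = sum(matrix[positive_index]) - tp                 # actually positive, predicted not
    return tp / (tp + fp), tp / (tp + fn)
```

For `y_true = [1, 0, 1, 1, 0]` and `y_pred = [1, 0, 0, 1, 1]` with `labels = [0, 1]`, the matrix is `[[1, 1], [1, 2]]`: two true positives, one false positive, and one false negative.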
30. What is the primary purpose of the 'mean squared error' (MSE) loss
function in regression tasks?
• ✅ A. To measure the average squared difference between predicted and actual values, penalizing large errors more heavily than small ones
• B. To improve model explainability
• C. To reduce bias in supervised learning
• D. To train deep learning models faster
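A one-function illustration of answer A. Because errors are squared, a single error of 2 costs more than two errors of 1, which is why MSE penalizes large mistakes heavily.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, errors of `[1, 1]` give an MSE of 1.0, while errors of `[0, 2]` give 2.0, even though the total absolute error is the same.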
31. Which deep learning architecture is designed to process sequences of
data and is commonly used in natural language processing?
• A. Convolutional Neural Networks (CNNs)
• ✅ B. Long Short-Term Memory (LSTM)
• C. Random Forest
• D. Naive Bayes
32. What is the primary use of 'online learning' in machine learning?
• A. To train models with static datasets
• ✅ B. To continuously update the model as new data becomes available
• C. To reduce model complexity
• D. To increase dataset interpretability
33. What is the primary advantage of using dropout layers in neural
networks?
• A. To speed up training time
• ✅ B. To reduce overfitting by randomly deactivating neurons during training
• C. To reduce dataset size
• D. To replace backpropagation
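A minimal sketch of answer B using inverted dropout, a common implementation choice. During training, each activation is zeroed with probability `p` and survivors are scaled by 1/(1-p) so the expected value is unchanged; at inference time the layer passes its input through untouched. The `rng` parameter is an assumption added so the example can be made reproducible.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout over a list of activations.

    Training: zero each unit with probability p, scale survivors by 1/(1-p).
    Inference: return the activations unchanged.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Passing `rng=random.Random(0)` makes the random mask reproducible, which is handy when testing.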
34. In the context of reinforcement learning, what does the term 'exploration'
refer to?
• A. Avoiding new actions to reduce risk
• ✅ B. The process of selecting actions to discover new information
• C. Training a model using only supervised learning
• D. Increasing dataset complexity
35. What is the primary goal of unsupervised learning?
• A. To improve the performance of supervised models
• B. To train models using only labeled data
• ✅ C. To discover patterns and relationships in data
• D. To increase training speed in deep learning models
36. Which machine learning algorithm is commonly used for binary
classification problems and is based on a linear decision boundary?
• A. Decision Trees
• ✅ B. Logistic Regression (or a linear Support Vector Machine)
• C. K-Means Clustering
• D. Random Forest
37. What is the purpose of one-hot encoding in feature engineering?
• A. To compress high-dimensional data into a single value
• ✅ B. To convert categorical variables into a binary representation
• C. To cluster data points in an unsupervised manner
• D. To perform hyperparameter tuning
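A short illustration of answer B: each category becomes its own binary column, and each value turns on exactly one column. The sorted category order and the helper name are arbitrary choices made for this sketch.

```python
def one_hot(values):
    """Return (categories, vectors) where each vector has a single 1."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors
```

For `["red", "green", "red", "blue"]`, the columns are `["blue", "green", "red"]` and "red" maps to `[0, 0, 1]`.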
38. In natural language processing, what is the 'tf-idf' (term frequency-
inverse document frequency) measure used for?
• A. To count the number of times a word appears in a document
• ✅ B. To measure the importance of a word in a document relative to a
corpus
• C. To represent words as dense vectors
• D. To remove stopwords from text
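A minimal plain-Python sketch of answer B, assuming the unsmoothed definitions tf = term count / document length and idf = log(N / df), where df is the number of documents containing the term. Libraries commonly use smoothed or normalized variants, so their exact weights will differ. Words that appear in every document get weight 0.

```python
import math


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):  # count each term once per document
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            w[term] = tf * math.log(n_docs / df[term])
        weights.append(w)
    return weights
```

With the documents `["the", "cat", "sat"]`, `["the", "dog", "sat"]`, and `["the", "cat", "ran"]`, "the" scores 0 everywhere, while "dog" (found in only one document) scores higher than "cat" (found in two).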
39. What is the primary purpose of the 'softmax' activation function in neural networks?
• ✅ A. To convert logits into probabilities for multi-class classification
• B. To normalize input data before training
• C. To reduce overfitting in deep learning models
• D. To balance the number of neurons in each layer
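A numerically stable softmax in plain Python illustrating answer A. Subtracting the maximum logit before exponentiating leaves the result unchanged but avoids overflow for large logits.

```python
import math


def softmax(logits):
    """Map a list of logits to probabilities that sum to 1."""
    m = max(logits)  # shift for numerical stability; does not change the output
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The largest logit always receives the largest probability, and very large inputs such as `[1000, 1000]` still work, giving `[0.5, 0.5]`.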
40. In reinforcement learning, what does the term 'policy' refer to?
• A. A dataset used for training
• B. A reinforcement learning algorithm
• ✅ C. A mapping from states to actions, defining the behavior of an agent
• D. A method for tuning hyperparameters
41. What is a well-known challenge when training generative adversarial networks (GANs)?
• A. Lack of training data
• ✅ B. Mode collapse, where the generator produces limited diversity
• C. Slow training due to backpropagation
• D. High variance in predictions
42. Which machine learning model is commonly used for handling graph-structured data?
• ✅ A. Graph Neural Networks (GNNs)
• B. Decision Trees
• C. Random Forest
• D. Recurrent Neural Networks (RNNs)
43. What is the primary advantage of using Transformers over RNNs for NLP
tasks?
• A. They require fewer training examples
• B. They use convolutional layers instead of recurrent connections
• C. They process input sequences sequentially
• ✅ D. They can process the entire input sequence in parallel instead of
sequentially
44. Which method explains individual predictions of deep learning (or other black-box) models by fitting a simple, interpretable surrogate model around each prediction?
• A. PCA (Principal Component Analysis)
• B. SHAP (Shapley Additive Explanations)
• ✅ C. LIME (Local Interpretable Model-agnostic Explanations)
• D. K-Means Clustering
45. What is the key advantage of Bayesian Optimization for hyperparameter
tuning?
• A. It reduces bias in models
• ✅ B. It efficiently searches for the best hyperparameters using probabilistic models
• C. It requires minimal computational resources
• D. It is used only in supervised learning
46. In reinforcement learning, what is the main purpose of the 'reward shaping' technique?
• ✅ A. To accelerate learning by providing additional guidance to the agent
• B. To increase exploration by penalizing incorrect actions
• C. To balance bias and variance in models
• D. To avoid overfitting in Q-learning
47. What is the main challenge of federated learning?
• A. Ensuring the dataset is large enough
• B. Reducing model overfitting
• ✅ C. Ensuring privacy and secure communication across distributed devices
• D. Increasing training speed
48. What does multimodal learning involve?
• A. Using ensemble methods
• B. Training deep learning models with multiple architectures
• ✅ C. Combining multiple data sources such as text, images, and audio
• D. Using unsupervised learning exclusively
49. In explainable AI (XAI), what is the purpose of the LIME algorithm?
• A. To improve hyperparameter selection
• ✅ B. To approximate and explain model predictions using interpretable models
• C. To detect and prevent bias in models
• D. To improve deep learning model generalization
50. What is the primary function of self-supervised learning?
• ✅ A. To learn representations from data without requiring labeled examples
• B. To train deep learning models using only supervised methods
• C. To ensure models are robust against adversarial attacks
• D. To classify images into pre-defined categories
51. What is a key application of contrastive learning in machine learning?
• A. To reduce the number of layers in deep networks
• B. To replace traditional loss functions in supervised learning
• ✅ C. Learning useful embeddings by pulling similar samples together and
pushing dissimilar ones apart
• D. To detect outliers in labeled datasets
52. Which optimizer is a common default choice for training very deep neural networks?
• A. Stochastic Gradient Descent (SGD)
• B. AdaGrad
• ✅ C. Adam optimizer
• D. K-Means
53. What is the primary role of zero-shot learning in NLP?
• A. To improve training speed
• ✅ B. To enable a model to recognize classes for which it saw no labeled training examples
• C. To replace labeled data with synthetic data
• D. To optimize the structure of neural networks
54. Which technique is commonly used to mitigate catastrophic forgetting in continual learning?
• A. Data Augmentation
• B. Hyperparameter Tuning
• ✅ C. Experience Replay
• D. Batch Normalization
55. What is the primary purpose of knowledge distillation in machine
learning?
• A. To increase dataset size
• B. To replace deep learning models with simpler models
• ✅ C. To transfer knowledge from a large model to a smaller model
• D. To remove redundant layers in deep networks
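A hedged sketch of answer C, assuming the common soft-target formulation: the small student model is trained to match the large teacher's temperature-softened output distribution, measured with KL divergence and scaled by T squared (a common convention that keeps gradient magnitudes comparable across temperatures). The temperature value and function names are illustrative assumptions; in practice this term is usually combined with an ordinary cross-entropy loss on the true labels.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (higher T = softer distribution)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher_soft || student_soft) for a single example."""
    p = softmax_t(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax_t(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as their softened distributions diverge.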
56. Which activation function is commonly used in the feed-forward layers of transformer models such as BERT and GPT?
• A. ReLU
• B. Sigmoid
• ✅ C. GELU (Gaussian Error Linear Unit)
• D. Softmax
57. In reinforcement learning, what is the 'multi-armed bandit' framework used for?
• A. Predicting time-series data
• B. Training generative adversarial networks
• ✅ C. Solving exploration-exploitation trade-offs in decision-making
• D. Hyperparameter tuning in neural networks
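A minimal epsilon-greedy sketch of the exploration-exploitation trade-off in answer C, assuming Bernoulli-reward arms with hypothetical success probabilities. With probability `epsilon` the agent explores a random arm; otherwise it exploits the arm with the best estimated mean so far.

```python
import random


def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms.

    true_means: hypothetical success probability of each arm.
    Returns (estimated means, pull counts per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return estimates, counts
```

With arms `[0.2, 0.5, 0.8]`, exploration lets the agent discover that the third arm is best, after which exploitation pulls it most of the time.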
58. What is the primary advantage of using diffusion models in generative
AI?
• A. They require fewer labeled training samples
• B. They eliminate the need for adversarial training
• C. They process data faster than other generative models
• ✅ D. They generate high-quality images by learning to reverse a gradual, step-by-step noising process
59. Which type of neural network is used in AlphaFold for protein structure
prediction?
• A. Convolutional Neural Networks (CNNs)
• B. Recurrent Neural Networks (RNNs)
• ✅ C. Transformer-based models
• D. Generative Adversarial Networks (GANs)
60. What is the main role of masked language modeling in NLP?
• A. To reduce the number of tokens in a sentence
• ✅ B. To train models like BERT by predicting missing words in a sentence
• C. To segment text into predefined categories
• D. To convert text into one-hot encoded vectors
61. In adversarial machine learning, what is an 'evasion attack'?
• A. An attack that removes neurons from a network
• ✅ B. An attack that manipulates input data to fool a model at inference time
• C. An attack that reduces model training time
• D. An attack that modifies the loss function
62. Which deep learning framework is known for composable function transformations such as automatic differentiation (grad), just-in-time compilation (jit), and vectorization (vmap)?
• A. TensorFlow
• ✅ B. JAX
• C. PyTorch
• D. Scikit-learn
63. What is the primary use of the Mixture of Experts (MoE) model in
machine learning?
• A. To increase model size for better accuracy
• B. To replace traditional neural networks
• ✅ C. To route each input to a small number of specialized subnetworks (experts) chosen by a gating network
• D. To fine-tune deep learning models
64. What is the advantage of meta-learning in few-shot learning tasks?
• ✅ A. It allows models to quickly adapt to new tasks with minimal data
• B. It increases dataset interpretability
• C. It reduces computation in deep learning models
• D. It replaces reinforcement learning algorithms
65. In unsupervised learning, what is the primary role of spectral clustering?
• A. To replace deep learning models in image classification
• B. To group labeled datasets efficiently
• ✅ C. To cluster complex (e.g., non-convex) data using the eigenvectors of a similarity graph's Laplacian matrix
• D. To optimize the number of layers in a neural network
66. What is the key challenge of deploying deep learning models on edge
devices?
• A. Reducing dataset size
• ✅ B. Reducing computational and memory requirements
• C. Improving feature engineering
• D. Using more training samples
67. In reinforcement learning, what is 'curriculum learning'?
• A. A training method that ignores complex tasks
• ✅ B. A training strategy where tasks progressively increase in complexity
• C. A technique for reducing bias in machine learning models
• D. A method for feature selection
68. What is the purpose of an 'attention mechanism' in NLP?
• ✅ A. To focus on important parts of an input sequence while making
predictions
• B. To replace deep learning models in NLP
• C. To encode text into one-hot representations
• D. To generate word embeddings
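A plain-Python sketch of answer A using scaled dot-product attention, the form used in Transformers. Each query scores every key, the scores are turned into weights with a softmax, and the output is the weighted average of the values. Names and toy vectors are illustrative.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query q: weights = softmax(q . k / sqrt(d)) over all keys,
    output = sum of weights * values. Returns (outputs, weights per query)."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much this query attends to each position
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

A query that closely matches the first key puts nearly all of its weight on the first position, so its output is dominated by the first value vector.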
69. Which metric is commonly used to evaluate image generative models?
• A. Mean Squared Error (MSE)
• B. Accuracy
• ✅ C. Fréchet Inception Distance (FID)
• D. Precision
70. What is the primary goal of entropy regularization in reinforcement
learning?
• A. To minimize training time
• B. To maximize accuracy
• ✅ C. Encouraging exploration by maximizing policy entropy
• D. To reduce computational complexity
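A minimal sketch of answer C, assuming a discrete action distribution and an objective that is maximized. Adding `beta` times the policy entropy rewards policies that keep probability spread across actions, which encourages exploration. The coefficient `beta` is a hypothetical value.

```python
import math


def policy_entropy(probs):
    """H(pi) = -sum_a pi(a) * log pi(a), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def regularized_objective(expected_return, probs, beta=0.01):
    """Return plus an entropy bonus; larger beta favors more exploration."""
    return expected_return + beta * policy_entropy(probs)
```

A uniform policy over four actions has the maximum entropy, log 4, while a deterministic policy has entropy 0 and receives no bonus.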
71. What is the purpose of group normalization in deep learning?
• A. To increase model accuracy
• B. To reduce dataset imbalance
• ✅ C. To normalize activations by dividing features into groups instead of
using batch statistics
• D. To prevent adversarial attacks
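72. Which frameworks are well suited for training physics-informed neural networks (PINNs), which rely on automatic differentiation to compute the residuals of the governing equations?
• ✅ A. PyTorch
• ✅ B. TensorFlow or JAX, optionally with differentiable physics solvers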
• C. Scikit-learn
• D. XGBoost
73. What is the primary function of a hypernetwork in deep learning?
• A. To generate synthetic datasets
• ✅ B. To generate the weights of another neural network dynamically
• C. To replace the need for labeled data
• D. To improve dataset scaling
74. Which deep learning approach is designed to model part-whole hierarchical relationships in data (e.g., in images)?
• ✅ A. Capsule Networks
• B. Support Vector Machines (SVM)
• C. Random Forest
• D. Decision Trees
75. What is the main insight of the Lottery Ticket Hypothesis in deep learning?
• ✅ A. It suggests that large networks contain smaller sub-networks that, when trained in isolation from their original initialization, can match the performance of the full network
• B. It eliminates the need for labeled data
• C. It reduces training dataset size
• D. It ensures neural networks converge faster
76. In self-supervised learning, what is a common pretext task for learning
image representations?
• A. One-hot encoding of images
• B. Using decision trees for image classification
• C. Applying reinforcement learning on labeled images
• ✅ D. Predicting missing parts of an image or solving jigsaw puzzles
77. Which loss function is commonly used for training Variational Autoencoders (VAEs)?
• A. Cross-Entropy Loss
• ✅ B. Evidence Lower Bound (ELBO) loss (training maximizes the ELBO, i.e., minimizes its negative)
• C. Mean Squared Error (MSE)
• D. Hinge Loss
78. What is the main goal of entropy regularization in reinforcement
learning?
• A. To penalize incorrect actions
• ✅ B. Encouraging exploration by maximizing policy entropy
• C. To prevent underfitting
• D. To optimize hyperparameters
79. In multi-task learning, what is negative transfer?
• ✅ A. One task adversely affecting the learning of another task
• B. Reducing dataset size
• C. Training multiple tasks simultaneously
• D. Overfitting due to excessive data
80. What is a key advantage of using diffusion models for generative tasks?
• A. They eliminate noise from real images
• ✅ B. They generate high-quality samples by iteratively denoising random
noise
• C. They remove the need for training deep models
• D. They are less computationally expensive than GANs
81. Which method is commonly used for out-of-distribution detection?
• A. Cross-validation
• B. Gradient Boosting
• ✅ C. Mahalanobis distance
• D. Word2Vec
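A plain-Python sketch of answer C for 2-D data: fit a Gaussian (mean and covariance) to in-distribution points, then flag inputs whose Mahalanobis distance from that Gaussian is large. In practice this is often applied to a network's feature vectors rather than raw inputs; the toy data, the 2-D restriction, and any threshold you choose are assumptions of this sketch.

```python
import math


def fit_gaussian_2d(points):
    """Estimate the mean and (population) covariance of 2-D in-distribution data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T  cov^-1  (x - mean)) for 2-D x."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))  # inverse of a 2x2 matrix
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return math.sqrt(dx * (inv[0][0] * dx + inv[0][1] * dy)
                     + dy * (inv[1][0] * dx + inv[1][1] * dy))
```

For points clustered around the origin, a query at the center has distance 0, while a far-away query such as `(5, 5)` gets a large distance and would be flagged as out-of-distribution by a threshold.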
82. What is the primary purpose of quantization in deep learning?
• ✅ A. Reducing model size and computational cost
• B. Increasing accuracy of deep networks
• C. Improving image quality in generative models
• D. Eliminating adversarial attacks
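A minimal sketch of answer A, assuming symmetric per-tensor int8 quantization: each float weight is stored as an integer in [-127, 127] plus one shared float scale, which cuts storage roughly 4x compared with float32 at the cost of a bounded rounding error. Function names are illustrative.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integers and the shared scale."""
    return [scale * v for v in q]
```

Each reconstructed weight differs from the original by at most half the scale, and an exact zero weight stays exactly zero.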
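83. Which frameworks are well suited for training physics-informed neural networks (PINNs), which rely on automatic differentiation to compute the residuals of the governing equations?
• ✅ A. PyTorch
• ✅ B. TensorFlow or JAX, optionally with differentiable physics solvers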
• C. Scikit-learn
• D. XGBoost
84. What is the primary function of a hypernetwork in deep learning?
• A. To generate synthetic datasets
• ✅ B. To generate the weights of another neural network dynamically
• C. To replace the need for labeled data
• D. To improve dataset scaling
85. Which deep learning approach is designed to model part-whole hierarchical relationships in data (e.g., in images)?
• ✅ A. Capsule Networks
• B. Support Vector Machines (SVM)
• C. Random Forest
• D. Decision Trees
86. What is the purpose of curriculum learning in training deep learning
models?
• A. To randomly shuffle data to improve accuracy
• ✅ B. To progressively introduce harder tasks during training to improve
learning efficiency
• C. To create new labeled datasets
• D. To increase the number of parameters in a model
87. In few-shot learning, what is the purpose of prototypical networks?
• A. To perform reinforcement learning on small datasets
• B. To classify data using decision trees
• C. To optimize hyperparameters in deep learning models
• ✅ D. To classify new samples based on their proximity to learned class
prototypes
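A sketch of answer D, assuming the embeddings have already been produced by a learned encoder (omitted here). Each class prototype is the mean of that class's support embeddings, and a query receives the label of the nearest prototype. The helper names and toy vectors are hypothetical.

```python
import math


def class_prototypes(support):
    """support: {label: [embedding, ...]}. Prototype = mean embedding per class."""
    protos = {}
    for label, embs in support.items():
        dim = len(embs[0])
        protos[label] = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return protos


def classify(query, protos):
    """Assign the query to the class whose prototype is nearest (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))
```

With two support examples per class, a new sample is labeled by whichever class average it lies closest to, so no per-class classifier has to be trained.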
88. What is the primary goal of the Neural Tangent Kernel (NTK) in deep
learning?
• A. To train neural networks with fewer parameters
• ✅ B. To analyze the training dynamics of infinitely wide neural networks
• C. To optimize deep learning architectures for faster convergence
• D. To increase dataset complexity
89. In adversarial training, what is the role of the PGD (Projected Gradient Descent) attack?
• ✅ A. To iteratively generate adversarial examples that maximize model error within a bounded perturbation
• B. To reduce overfitting in neural networks
• C. To improve the robustness of deep learning models
• D. To minimize loss functions in supervised learning
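A minimal sketch of answer A against a hypothetical logistic-regression model, where the input gradient of the binary cross-entropy loss has the closed form (p - y) * w. Each step moves the input in the sign of that gradient (increasing the loss) and then projects it back into an L-infinity ball of radius `epsilon` around the original input. The model weights, step size, and radius are illustrative.

```python
import math


def pgd_attack(x, y, w, b, epsilon=0.3, step=0.1, n_steps=10):
    """L-infinity PGD against p = sigmoid(w . x + b); y is the true label in {0, 1}."""
    x_adv = list(x)
    for _ in range(n_steps):
        z = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        # gradient of binary cross-entropy with respect to the input: (p - y) * w
        grad = [(p - y) * wi for wi in w]
        # ascent step along the sign of the gradient
        x_adv = [xi + step * (1 if g > 0 else -1 if g < 0 else 0) for xi, g in zip(x_adv, grad)]
        # projection: clip each coordinate to [x0 - epsilon, x0 + epsilon]
        x_adv = [min(max(xa, x0 - epsilon), x0 + epsilon) for xa, x0 in zip(x_adv, x)]
    return x_adv
```

For `w = [1.0, 1.0]`, `b = 0.0`, and input `[0.2, 0.2]` labeled 1, the clean input is classified correctly, but the attack pushes both coordinates to about -0.1, which flips the prediction while staying within 0.3 of the original.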
90. What is the main insight of the Lottery Ticket Hypothesis in deep learning?
• ✅ A. It suggests that large networks contain smaller sub-networks that, when trained in isolation from their original initialization, can match the performance of the full network
• B. It eliminates the need for labeled data
• C. It reduces training dataset size
• D. It ensures neural networks converge faster
91. Which type of neural network is most suitable for object detection in
images?
• A. Recurrent Neural Networks (RNNs)
• ✅ B. Convolutional Neural Networks (CNNs)
• C. Graph Neural Networks (GNNs)
• D. Long Short-Term Memory (LSTM) Networks
92. Which of the following is a policy optimization method designed specifically for reinforcement learning?
• A. Stochastic Gradient Descent (SGD)
• ✅ B. Trust Region Policy Optimization (TRPO)
• C. Principal Component Analysis (PCA)
• D. Support Vector Machines (SVM)
93. What is a key challenge in federated learning?
• A. Lack of labeled data
• ✅ B. Secure aggregation of distributed models while maintaining privacy
• C. Optimizing hyperparameters for deep learning models
• D. Generating adversarial examples
94. Which algorithm is commonly used for recommendation systems apart
from matrix factorization?
• A. Decision Trees
• B. Reinforcement Learning
• ✅ C. Deep Neural Networks (DNNs)
• D. K-Means Clustering
95. Which technique is commonly used to mitigate mode collapse in GANs?
• A. Increasing the batch size
• ✅ B. Using Wasserstein GAN (WGAN) with gradient penalty
• C. Applying dropout layers
• D. Adding more convolutional layers
96. Which activation function is known for being computationally efficient in
deep networks?
• ✅ A. ReLU (Rectified Linear Unit)
• B. Sigmoid
• C. Tanh
• D. Softmax
97. Which reinforcement learning algorithm is model-based?
• A. Deep Q-Network (DQN)
• ✅ B. AlphaGo (Monte Carlo Tree Search + Deep Learning)
• C. Proximal Policy Optimization (PPO)
• D. Trust Region Policy Optimization (TRPO)
98. What is a primary advantage of meta-learning?
• A. It reduces the size of training datasets
• ✅ B. It enables fast adaptation to new tasks with minimal training examples
• C. It enhances feature selection in supervised learning
• D. It improves deep learning model accuracy
99. What is a key feature of contrastive divergence in machine learning?
• A. Used in reinforcement learning
• ✅ B. Used to efficiently train Restricted Boltzmann Machines (RBMs)
• C. Used to improve backpropagation in CNNs
• D. Used for dimensionality reduction
100. What is the primary advantage of using a Transformer over an LSTM for
NLP tasks?
• A. Faster training but higher computational cost
• ✅ B. Parallel processing of input sequences for faster computation
• C. Better suited for small datasets
• D. Less memory usage than LSTMs