
ML 1-100

The document consists of a series of multiple-choice questions related to machine learning and deep learning concepts. It covers various topics such as gradient boosting, reinforcement learning, dimensionality reduction, and different algorithms used for classification, clustering, and natural language processing. Each question is followed by options, with the correct answer marked for reference.

Uploaded by

rahul.kumar

1-25

1. What is the primary use of 'gradient boosting' in machine learning?

o A. To remove noise from datasets

o B. To make a single model interpretable

o ✅ C. To combine multiple weak learners into a strong learner

o D. To increase feature importance
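The idea in option C can be sketched in a few lines. This is only an illustration, assuming squared-error loss, one-dimensional inputs with at least two distinct values, and decision stumps as the weak learners; the names `fit_stump`, `gradient_boost`, and `predict` are hypothetical and not taken from any library.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error on residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]


def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Each weak learner contributes a small, shrunken correction.
        preds = [p + lr * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, lr, stumps


def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)
```

Each stump alone is a poor model, but their sum fits the training data progressively better as rounds are added.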

2. In ensemble learning, bagging and boosting are two different techniques. What is the primary difference between them?

o ✅ A. Bagging trains models independently and aggregates their predictions, while boosting trains models sequentially, each one correcting its predecessors' errors.

o B. Bagging reduces bias, while boosting reduces variance.

o C. Boosting uses a single model, while bagging uses multiple models.

o D. Boosting is used only for regression, while bagging is for classification.

3. Which algorithm is commonly used for anomaly detection and one-class classification?

o A. K-Means Clustering

o B. Decision Trees

o ✅ C. One-Class Support Vector Machines (One-Class SVM)

o D. Logistic Regression

4. What is the purpose of dimensionality reduction techniques in machine learning?

o A. To improve model interpretability by adding more parameters

o B. To increase the number of features

o ✅ C. To reduce the complexity of data while preserving important information

o D. To remove all noise from data

5. In reinforcement learning, what is the term for the estimate of the expected cumulative reward of taking a particular action in a specific state?

o A. Exploration
o B. Temporal Difference Learning

o ✅ C. Q-Value

o D. Discount Factor
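One common way such estimates are learned is the tabular Q-learning update. The sketch below assumes that algorithm (the question itself does not name it), a nested-dictionary Q-table, and hypothetical values for the learning rate `alpha` and discount factor `gamma`.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Move Q(state, action) a step toward reward + gamma * max_a' Q(next_state, a')."""
    # Unknown or terminal next states contribute no future value.
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state table: actions "left" and "right" in states "s0" and "s1".
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_update(q_table, "s0", "right", reward=1.0, next_state="s1")
```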

6. Which machine learning algorithm is known for its ability to handle imbalanced datasets and is often used in fraud detection?

o A. K-Nearest Neighbors (KNN)

o B. Naive Bayes

o C. Logistic Regression

o ✅ D. Random Forest or Gradient Boosting

7. Which algorithm is commonly used for clustering and is based on the density of points in each data point's neighborhood?

o ✅ A. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

o B. K-Means Clustering

o C. Principal Component Analysis (PCA)

o D. Decision Trees

8. What is the key idea behind word embeddings in natural language processing?

o A. Representing words as one-hot encoded vectors

o B. Using only frequency-based word representations

o ✅ C. Representing words as dense vectors in a continuous space

o D. Sorting words alphabetically in datasets

9. What is the primary goal of 'convolutional neural networks' (CNNs) in deep learning?

o A. To analyze tabular data

o B. To classify text documents

o ✅ C. To perform image recognition and feature extraction

o D. To process sequential data like time series


10. In reinforcement learning, what is the term for the numerical value that
represents the goodness or desirability of a state or action?

• ✅ A. Reward

• B. Discount Factor

• C. Policy Gradient

• D. Bellman

11. What is the primary goal of 'Kullback-Leibler divergence' (KL divergence) in information theory?

• A. To find the shortest path between two data points

• ✅ B. To quantify the difference between two probability distributions

• C. To measure the variance of a dataset

• D. To optimize hyperparameters in a neural network
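For discrete distributions the quantity in option B is short enough to compute by hand. The sketch below assumes both distributions are given as equal-length lists of probabilities and that `q` is nonzero wherever `p` is nonzero; the function name is illustrative.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions."""
    # Terms with p_i == 0 contribute nothing, by the convention 0 * log(0) = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
```

The two directions generally differ, which is why KL divergence is called a divergence rather than a distance.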

12. In deep learning, what is a convolutional layer primarily used for?

• ✅ A. Feature extraction from images

• B. Reducing noise in numerical data

• C. Combining multiple classifiers

• D. Splitting datasets into training and testing sets

13. What is the purpose of batch normalization in deep neural networks?

• A. To remove outliers from training data

• ✅ B. To help stabilize and accelerate training by normalizing input features at each layer

• C. To increase the number of neurons in each layer

• D. To reduce model accuracy for faster inference

14. In unsupervised learning, what is the main goal of 'hierarchical clustering'?

• A. To assign each data point to a unique cluster

• B. To improve performance of supervised learning models

• ✅ C. To merge clusters based on similarity to create a hierarchy

• D. To train a deep learning model using small datasets


15. Which machine learning algorithm is often used for collaborative filtering
and recommendation systems?

• A. Decision Trees

• B. Logistic Regression

• ✅ C. Matrix Factorization (e.g., Singular Value Decomposition - SVD)

• D. K-Means Clustering

16. Which machine learning algorithm is often used for sentiment analysis
and text classification and is based on a probabilistic approach?

• A. Support Vector Machine (SVM)

• B. K-Nearest Neighbors (KNN)

• ✅ C. Naive Bayes

• D. Decision Trees

17. What is the main goal of semi-supervised learning?

• A. To train models using only labeled data

• ✅ B. To learn from both labeled and unlabeled data

• C. To reduce the number of features in a dataset

• D. To create synthetic data for training

18. What is the primary use of 'word embeddings' in natural language processing?

• A. ✅ To represent words as dense vectors in a continuous space

• B. To convert words into one-hot encoded vectors

• C. To count the frequency of words in a document

• D. To group similar words using clustering

19. What is the primary objective of hyperparameter tuning in machine learning?

• A. ✅ To improve model generalization

• B. To increase dataset size

• C. To remove bias from a dataset


• D. To train models faster without overfitting

20. What is the main benefit of using 'word2vec' embeddings in natural language processing?

• A. They store words in tabular format

• ✅ B. They capture semantic relationships between words

• C. They reduce the need for labeled data

• D. They remove all redundant words from text

21. Which method is commonly used for handling missing data in a dataset?

• A. Deleting rows with missing values

• B. Replacing missing values with zero

• ✅ C. Imputation (e.g., mean, median, mode, interpolation)

• D. Ignoring missing values during training
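Mean imputation, the simplest case of option C, might look like the following. The sketch assumes missing entries are represented as `None` in a single numeric column; the function name is illustrative.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


filled = impute_mean([1.0, None, 3.0])
```

Median or mode imputation follows the same pattern with a different summary statistic.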

22. In reinforcement learning, what is the term for the process of selecting
actions that maximize expected cumulative rewards while considering
exploration and exploitation?

• A. Reward Selection

• B. Gradient Descent

• C. Backpropagation

• ✅ D. Policy Optimization

23. What does the term 'overfitting' refer to in the context of machine
learning?

• A. When a model generalizes well to unseen data

• ✅ B. When a model performs well on training data but poorly on unseen data

• C. When a model underestimates the complexity of a dataset

• D. When a model is unable to find a pattern in data

24. What is the primary objective of 'one-class classification' in machine learning?

• A. To classify multiple categories of data


• B. ✅ To identify outliers and anomalies

• C. To increase training accuracy

• D. To reduce computation cost in training

25. What is the purpose of the 'sigmoid' activation function in neural networks?

• A. ✅ To introduce non-linearity into the model

• B. To ensure weights do not change in a neural network

• C. To reduce overfitting in CNN models

• D. To increase training time

26. What is the primary goal of 'autoencoders' in deep learning?

• A. To classify images in supervised learning

• ✅ B. To compress data and reduce dimensionality

• C. To generate new labeled datasets

• D. To predict numerical values

27. Which machine learning technique is best suited for time series
forecasting and stock price prediction?

• A. Convolutional Neural Networks (CNNs)

• ✅ B. Recurrent Neural Networks (RNN) or Long Short-Term Memory (LSTM)

• C. Random Forest

• D. Naive Bayes

28. In reinforcement learning, what is the 'Bellman equation' used for?

• A. To increase dataset size

• ✅ B. To estimate the optimal value function

• C. To train deep neural networks

• D. To balance bias and variance
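For reference, the Bellman optimality equation for the state-value function can be written as follows. The notation is an assumption, since the quiz defines no symbols: $P$ is the transition probability, $R$ the reward, and $\gamma$ the discount factor.

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]
```

Applying the right-hand side repeatedly as an update rule is the basis of value iteration.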

29. What is the primary use of a confusion matrix in classification tasks?

• A. To split data into training and testing sets


• ✅ B. To evaluate the performance of a classifier

• C. To increase model interpretability

• D. To visualize decision boundaries
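A binary confusion matrix and two metrics derived from it can be computed as in this sketch. It assumes labels are encoded so that one value (here `1`) marks the positive class; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


def precision_recall(counts):
    """Derive precision and recall, returning 0.0 when a denominator is empty."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
precision, recall = precision_recall(counts)
```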

30. What is the primary purpose of the 'mean squared error' (MSE) loss
function in regression tasks?

• A. ✅ To calculate the mean of squared errors between predicted and actual values

• B. To improve model explainability

• C. To reduce bias in supervised learning

• D. To train deep learning models faster

31. Which deep learning architecture is designed to process sequences of data and is commonly used in natural language processing?

• A. Convolutional Neural Networks (CNNs)

• ✅ B. Long Short-Term Memory (LSTM)

• C. Random Forest

• D. Naive Bayes

32. What is the primary use of 'online learning' in machine learning?

• A. To train models with static datasets

• ✅ B. To continuously update the model as new data becomes available

• C. To reduce model complexity

• D. To increase dataset interpretability

33. What is the primary advantage of using dropout layers in neural networks?

• A. To speed up training time

• ✅ B. They prevent overfitting by randomly deactivating neurons during training

• C. To reduce dataset size

• D. To replace backpropagation
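The random deactivation in option B can be sketched as below. This assumes the common "inverted dropout" variant, in which surviving activations are scaled by 1 / (1 - p) so that no rescaling is needed at inference time, and a drop probability `p` strictly below 1; the function name is illustrative.

```python
import random


def dropout(activations, p=0.5, training=True, rng=None):
    """Zero each activation with probability p during training; identity otherwise."""
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Survivors are scaled up so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


train_out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=random.Random(0))
eval_out = dropout([1.0, 2.0, 3.0, 4.0], training=False)
```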
34. In the context of reinforcement learning, what does the term 'exploration'
refer to?

• A. Avoiding new actions to reduce risk

• ✅ B. The process of selecting actions to discover new information

• C. Training a model using only supervised learning

• D. Increasing dataset complexity

35. What is the primary goal of unsupervised learning?

• A. To improve the performance of supervised models

• B. To train models using only labeled data

• ✅ C. To discover patterns and relationships in data

• D. To increase training speed in deep learning models

36. Which machine learning algorithm is commonly used for binary classification problems and is based on a linear decision boundary?

• A. Decision Trees

• ✅ B. Support Vector Machine (SVM); Logistic Regression also fits this description

• C. K-Means Clustering

• D. Random Forest

37. What is the purpose of a 'one-hot encoding' in feature engineering?

• A. To compress high-dimensional data into a single value

• ✅ B. To convert categorical variables into a binary representation

• C. To cluster data points in an unsupervised manner

• D. To perform hyperparameter tuning
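The binary representation in option B can be built by hand as in this sketch. It assumes categories are ordered alphabetically, which is an arbitrary choice; the function name is illustrative.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "red"])
# categories is ["green", "red"]; encoded is [[0, 1], [1, 0], [0, 1]]
```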

38. In natural language processing, what is the 'tf-idf' (term frequency-inverse document frequency) measure used for?

• A. To count the number of times a word appears in a document

• ✅ B. To measure the importance of a word in a document relative to a corpus

• C. To represent words as dense vectors


• D. To remove stopwords from text
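A minimal version of the measure is sketched below. It assumes the plain textbook weighting, tf = count / document length and idf = log(N / document frequency), and non-empty tokenized documents; libraries typically use smoothed variants, so their numbers will differ.

```python
import math


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    weights = []
    for doc in docs:
        doc_weights = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


weights = tf_idf([["the", "cat"], ["the", "dog"]])
```

A term that appears in every document, such as "the" here, gets weight zero, while a term unique to one document gets a positive weight.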

39. What is the primary purpose of the 'softmax' activation function in neural networks?

• A. ✅ To convert logits into probabilities for multi-class classification

• B. To normalize input data before training

• C. To reduce overfitting in deep learning models

• D. To balance the number of neurons in each layer
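The conversion from logits to probabilities can be written in a few lines. The sketch subtracts the maximum logit before exponentiating, a standard trick to avoid overflow that leaves the result unchanged; the function name is illustrative.

```python
import math


def softmax(logits):
    """Convert a list of real-valued logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
```

Larger logits map to larger probabilities, so the ordering of the classes is preserved.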

40. In reinforcement learning, what does the term 'policy' refer to?

• A. A dataset used for training

• B. A reinforcement learning algorithm

• ✅ C. A mapping from states to actions, defining the behavior of an agent

• D. A method for tuning hyperparameters

41. What is the primary challenge when training GANs?

• A. Lack of training data

• ✅ B. Mode collapse, where the generator produces limited diversity

• C. Slow training due to backpropagation

• D. High variance in predictions

42. Which machine learning model is commonly used for handling graph-structured data?

• A. ✅ Graph Neural Networks (GNNs)

• B. Decision Trees

• C. Random Forest

• D. Recurrent Neural Networks (RNNs)

43. What is the primary advantage of using Transformers over RNNs for NLP
tasks?

• A. They require fewer training examples

• B. They use convolutional layers instead of recurrent connections

• C. They process input sequences sequentially


• ✅ D. They can process the entire input sequence in parallel instead of
sequentially

44. Which method is commonly used for interpreting deep learning models?

• A. PCA (Principal Component Analysis)

• B. SHAP (Shapley Additive Explanations)

• ✅ C. LIME (Local Interpretable Model-agnostic Explanations)

• D. K-Means Clustering

45. What is the key advantage of Bayesian Optimization for hyperparameter tuning?

• A. It reduces bias in models

• B. ✅ It efficiently searches for the best hyperparameters using probabilistic models

• C. It requires minimal computational resources

• D. It is used only in supervised learning

46. In reinforcement learning, what is the main purpose of the 'reward shaping' technique?

• A. ✅ To accelerate learning by providing additional guidance to the agent

• B. To increase exploration by penalizing incorrect actions

• C. To balance bias and variance in models

• D. To avoid overfitting in Q-learning

47. What is the main challenge of federated learning?

• A. Ensuring the dataset is large enough

• B. Reducing model overfitting

• ✅ C. Ensuring privacy and secure communication across distributed devices

• D. Increasing training speed

48. Which machine learning approach is best suited for multimodal learning?

• A. Using ensemble methods

• B. Training deep learning models with multiple architectures


• ✅ C. Combining multiple data sources such as text, images, and audio

• D. Using unsupervised learning exclusively

49. In explainable AI (XAI), what is the purpose of the LIME algorithm?

• A. To improve hyperparameter selection

• B. ✅ To approximate and explain model predictions using interpretable models

• C. To detect and prevent bias in models

• D. To improve deep learning model generalization

50. What is the primary function of self-supervised learning?

• A. ✅ To learn representations from data without requiring labeled examples

• B. To train deep learning models using only supervised methods

• C. To ensure models are robust against adversarial attacks

• D. To classify images into pre-defined categories

51. What is a key application of contrastive learning in machine learning?

• A. To reduce the number of layers in deep networks

• B. To replace traditional loss functions in supervised learning

• ✅ C. Learning useful embeddings by pulling similar samples together and pushing dissimilar ones apart

• D. To detect outliers in labeled datasets

52. Which optimization technique is best suited for training very deep neural
networks?

• A. Stochastic Gradient Descent (SGD)

• B. AdaGrad

• ✅ C. Adam optimizer

• D. K-Means

53. What is the primary role of zero-shot learning in NLP?

• A. To improve training speed

• ✅ B. To enable a model to handle classes or tasks it was never explicitly trained on


• C. To replace labeled data with synthetic data

• D. To optimize the structure of neural networks

54. Which technique is most effective for handling catastrophic forgetting in continual learning?

• A. Data Augmentation

• B. Hyperparameter Tuning

• ✅ C. Experience Replay

• D. Batch Normalization

55. What is the primary purpose of knowledge distillation in machine learning?

• A. To increase dataset size

• B. To replace deep learning models with simpler models

• ✅ C. To transfer knowledge from a large model to a smaller model

• D. To remove redundant layers in deep networks

56. Which activation function is commonly used in transformers?

• A. ReLU

• B. Sigmoid

• ✅ C. GELU (Gaussian Error Linear Unit)

• D. Softmax

57. In reinforcement learning, what is the 'multi-armed bandit' problem used for?

• A. Predicting time-series data

• B. Training generative adversarial networks

• ✅ C. Solving exploration-exploitation trade-offs in decision-making

• D. Hyperparameter tuning in neural networks
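The trade-off in option C can be illustrated with an epsilon-greedy agent. This is only one of several bandit strategies, and the setup is an assumption: Gaussian rewards with unit variance, hypothetical arm means, and a fixed exploration rate `epsilon`.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Pull arms for `steps` rounds, exploring at random with probability epsilon."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick any arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best estimate
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy_bandit([0.0, 5.0])
```

With a large gap between the arm means, most pulls should end up on the better arm once exploration has sampled it.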

58. What is the primary advantage of using diffusion models in generative AI?

• A. They require fewer labeled training samples

• B. They eliminate the need for adversarial training


• C. They process data faster than other generative models

• ✅ D. They generate high-quality images by modeling noise transitions over time

59. Which type of neural network is used in AlphaFold for protein structure
prediction?

• A. Convolutional Neural Networks (CNNs)

• B. Recurrent Neural Networks (RNNs)

• ✅ C. Transformer-based models

• D. Generative Adversarial Networks (GANs)

60. What is the main role of masked language modeling in NLP?

• A. To reduce the number of tokens in a sentence

• B. ✅ To train models like BERT by predicting missing words in a sentence

• C. To segment text into predefined categories

• D. To convert text into one-hot encoded vectors

61. In adversarial machine learning, what is an 'evasion attack'?

• A. An attack that removes neurons from a network

• ✅ B. An attack that manipulates input data to fool a model at inference time

• C. An attack that reduces model training time

• D. An attack that modifies the loss function

62. Which deep learning framework is known for its composable function transformations, such as automatic differentiation (grad), JIT compilation (jit), and vectorization (vmap)?

• A. TensorFlow

• ✅ B. JAX

• C. PyTorch

• D. Scikit-learn

63. What is the primary use of the Mixture of Experts (MoE) model in machine learning?

• A. To increase model size for better accuracy

• B. To replace traditional neural networks

• ✅ C. To use multiple specialized subnetworks that activate based on input data

• D. To fine-tune deep learning models

64. What is the advantage of meta-learning in few-shot learning tasks?

• A. ✅ It allows models to quickly adapt to new tasks with minimal data

• B. It increases dataset interpretability

• C. It reduces computation in deep learning models

• D. It replaces reinforcement learning algorithms

65. In unsupervised learning, what is the primary role of spectral clustering?

• A. To replace deep learning models in image classification

• B. To group labeled datasets efficiently

• ✅ C. Using eigenvalues of similarity matrices for clustering complex data

• D. To optimize the number of layers in a neural network

66. What is the key challenge of deploying deep learning models on edge
devices?

• A. Reducing dataset size

• B. ✅ Reducing computational and memory requirements

• C. Improving feature engineering

• D. Using more training samples

67. In reinforcement learning, what is 'curriculum learning'?

• A. A training method that ignores complex tasks

• ✅ B. A training strategy where tasks progressively increase in complexity

• C. A technique for reducing bias in machine learning models

• D. A method for feature selection

68. What is the purpose of an 'attention mechanism' in NLP?


• ✅ A. To focus on important parts of an input sequence while making
predictions

• B. To replace deep learning models in NLP

• C. To encode text into one-hot representations

• D. To generate word embeddings
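The "focus" in option A is usually implemented as a weighted average. The sketch below assumes scaled dot-product attention for a single query vector, with plain Python lists standing in for tensors; the function names are illustrative.

```python
import math


def _softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = _softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0], [20.0]],
)
```

The first key aligns with the query, so it receives the larger weight and pulls the output toward its value.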

69. Which metric is most suitable for evaluating generative models?

• A. Mean Squared Error (MSE)

• B. Accuracy

• C. ✅ Fréchet Inception Distance (FID)

• D. Precision

70. What is the primary goal of entropy regularization in reinforcement learning?

• A. To minimize training time

• B. To maximize accuracy

• ✅ C. Encouraging exploration by maximizing policy entropy

• D. To reduce computational complexity

71. What is the purpose of group normalization in deep learning?

• A. To increase model accuracy

• B. To reduce dataset imbalance

• ✅ C. To normalize activations by dividing features into groups instead of using batch statistics

• D. To prevent adversarial attacks

72. Which framework is best suited for training physics-informed neural networks (PINNs)?

• A. PyTorch

• ✅ B. TensorFlow or JAX with differentiable physics solvers

• C. Scikit-learn

• D. XGBoost

73. What is the primary function of a hypernetwork in deep learning?

• A. To generate synthetic datasets


• ✅ B. To generate the weights of another neural network dynamically

• C. To replace the need for labeled data

• D. To improve dataset scaling

74. Which deep learning approach is best suited for learning hierarchical
structures in data?

• A. ✅ Capsule Networks

• B. Support Vector Machines (SVM)

• C. Random Forest

• D. Decision Trees

75. What is the main benefit of using the Lottery Ticket Hypothesis in deep
learning?

• A. ✅ It suggests that smaller sub-networks can be trained to achieve similar performance as large networks

• B. It eliminates the need for labeled data

• C. It reduces training dataset size

• D. It ensures neural networks converge faster

76. In self-supervised learning, what is a common pretext task for learning image representations?

• A. One-hot encoding of images

• B. Using decision trees for image classification

• C. Applying reinforcement learning on labeled images

• ✅ D. Predicting missing parts of an image or solving jigsaw puzzles

77. Which loss function is commonly used for training Variational Autoencoders (VAEs)?

• A. Cross-Entropy Loss

• ✅ B. Evidence Lower Bound (ELBO) loss

• C. Mean Squared Error (MSE)

• D. Hinge Loss
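For reference, the ELBO in option B is commonly written as a reconstruction term minus a KL term. The notation is an assumption, since the quiz defines no symbols: $q_\phi(z \mid x)$ is the encoder, $p_\theta(x \mid z)$ the decoder, and $p(z)$ the prior.

```latex
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr] - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
```

Training maximizes this bound, which is equivalent to minimizing its negative as a loss.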

78. What is the main goal of entropy regularization in reinforcement learning?

• A. To penalize incorrect actions

• ✅ B. Encouraging exploration by maximizing policy entropy

• C. To prevent underfitting

• D. To optimize hyperparameters

79. In multi-task learning, what is the main challenge of negative transfer?

• A. ✅ One task adversely affecting the learning of another task

• B. Reducing dataset size

• C. Training multiple tasks simultaneously

• D. Overfitting due to excessive data

80. What is a key advantage of using diffusion models for generative tasks?

• A. They eliminate noise from real images

• ✅ B. They generate high-quality samples by iteratively denoising random noise

• C. They remove the need for training deep models

• D. They are less computationally expensive than GANs

81. Which method is commonly used for out-of-distribution detection?

• A. Cross-validation

• B. Gradient Boosting

• ✅ C. Mahalanobis distance

• D. Word2Vec

82. What is the primary purpose of quantization in deep learning?

• A. ✅ Reducing model size and computational cost

• B. Increasing accuracy of deep networks

• C. Improving image quality in generative models

• D. Eliminating adversarial attacks
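The size reduction in option A comes from storing weights in fewer bits. The sketch below assumes symmetric per-tensor quantization to signed 8-bit integers, which is one simple scheme among many; the function names are illustrative.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each restored weight differs from the original by at most about half the scale, which is the price paid for the smaller representation.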

83. Which framework is best suited for training physics-informed neural networks (PINNs)?

• A. PyTorch

• ✅ B. TensorFlow or JAX with differentiable physics solvers

• C. Scikit-learn

• D. XGBoost

84. What is the primary function of a hypernetwork in deep learning?

• A. To generate synthetic datasets

• ✅ B. To generate the weights of another neural network dynamically

• C. To replace the need for labeled data

• D. To improve dataset scaling

85. Which deep learning approach is best suited for learning hierarchical
structures in data?

• ✅ A. Capsule Networks

• B. Support Vector Machines (SVM)

• C. Random Forest

• D. Decision Trees

86. What is the purpose of curriculum learning in training deep learning models?

• A. To randomly shuffle data to improve accuracy

• ✅ B. To progressively introduce harder tasks during training to improve learning efficiency

• C. To create new labeled datasets

• D. To increase the number of parameters in a model

87. In few-shot learning, what is the purpose of prototypical networks?

• A. To perform reinforcement learning on small datasets

• B. To classify data using decision trees

• C. To optimize hyperparameters in deep learning models

• ✅ D. To classify new samples based on their proximity to learned class prototypes
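The classification step in option D can be sketched as a nearest-prototype rule. The sketch assumes the embeddings have already been produced by some trained network (the vectors below are made up), and it uses squared Euclidean distance; the function names are illustrative.

```python
def class_prototypes(support):
    """support: {label: [embedding, ...]}. Each prototype is the mean embedding."""
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in support.items()
    }


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical 2-D embeddings for a two-class, two-shot support set.
prototypes = class_prototypes({
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[10.0, 10.0], [12.0, 10.0]],
})
label = classify([1.0, 1.0], prototypes)
```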
88. What is the primary goal of the Neural Tangent Kernel (NTK) in deep
learning?

• A. To train neural networks with fewer parameters

• ✅ B. To analyze the training dynamics of infinitely wide neural networks

• C. To optimize deep learning architectures for faster convergence

• D. To increase dataset complexity

89. In adversarial training, what is the primary purpose of the PGD (Projected
Gradient Descent) attack?

• ✅ A. To iteratively generate adversarial examples that maximize model error

• B. To reduce overfitting in neural networks

• C. To improve the robustness of deep learning models

• D. To minimize loss functions in supervised learning

90. What is the main benefit of using the Lottery Ticket Hypothesis in deep
learning?

• A. ✅ It suggests that smaller sub-networks can be trained to achieve similar performance as large networks

• B. It eliminates the need for labeled data

• C. It reduces training dataset size

• D. It ensures neural networks converge faster

91. Which type of neural network is most suitable for object detection in
images?

• A. Recurrent Neural Networks (RNNs)

• ✅ B. Convolutional Neural Networks (CNNs)

• C. Graph Neural Networks (GNNs)

• D. Long Short-Term Memory (LSTM) Networks

92. Which optimization method is best suited for reinforcement learning tasks?

• A. Stochastic Gradient Descent (SGD)


• ✅ B. Trust Region Policy Optimization (TRPO)

• C. Principal Component Analysis (PCA)

• D. Support Vector Machines (SVM)

93. What is a key challenge in federated learning?

• A. Lack of labeled data

• ✅ B. Secure aggregation of distributed models while maintaining privacy

• C. Optimizing hyperparameters for deep learning models

• D. Generating adversarial examples

94. Which algorithm is commonly used for recommendation systems apart from matrix factorization?

• A. Decision Trees

• B. Reinforcement Learning

• ✅ C. Deep Neural Networks (DNNs)

• D. K-Means Clustering

95. Which technique is commonly used to prevent mode collapse in GANs?

• A. Increasing the batch size

• ✅ B. Using Wasserstein GAN (WGAN) with gradient penalty

• C. Applying dropout layers

• D. Adding more convolutional layers

96. Which activation function is known for being computationally efficient in deep networks?

• ✅ A. ReLU (Rectified Linear Unit)

• B. Sigmoid

• C. Tanh

• D. Softmax

97. Which reinforcement learning algorithm is model-based?

• A. Deep Q-Network (DQN)


• ✅ B. AlphaGo (Monte Carlo Tree Search + Deep Learning)

• C. Proximal Policy Optimization (PPO)

• D. Trust Region Policy Optimization (TRPO)

98. What is a primary advantage of meta-learning?

• A. It reduces the size of training datasets

• B. ✅ It enables fast adaptation to new tasks with minimal training examples

• C. It enhances feature selection in supervised learning

• D. It improves deep learning model accuracy

99. What is a key feature of contrastive divergence in machine learning?

• A. Used in reinforcement learning

• ✅ B. Used to efficiently train Restricted Boltzmann Machines (RBMs)

• C. Used to improve backpropagation in CNNs

• D. Used for dimensionality reduction

100. What is the primary advantage of using a Transformer over an LSTM for
NLP tasks?

• A. Faster training but higher computational cost

• ✅ B. Parallel processing of input sequences for faster computation

• C. Better suited for small datasets

• D. Less memory usage than LSTMs
