Multiple Choice Questions (MCQ)
UNIT 1
1. Which of the following is NOT a subfield of Artificial Intelligence?
a) Machine Learning
b) Natural Language Processing
c) Quantum Computing
d) Robotics
Answer: c) Quantum Computing
2. Who is known as the father of Artificial Intelligence?
a) Alan Turing
b) John McCarthy
c) Marvin Minsky
d) Herbert Simon
Answer: b) John McCarthy
3. Which of the following is an application of AI in healthcare?
a) Automated language translation
b) Autonomous driving
c) Predictive diagnostics and personalized medicine
d) Financial fraud detection
Answer: c) Predictive diagnostics and personalized medicine
4. In which area is AI commonly used for improving customer service?
a) Virtual Reality
b) Chatbots and virtual assistants
c) Renewable energy
d) Quantum computing
Answer: b) Chatbots and virtual assistants
5. What is the primary role of a problem-solving agent in AI?
a) To learn from data
b) To interact with users
c) To find a sequence of actions that leads to a desirable goal
d) To translate languages
Answer: c) To find a sequence of actions that leads to a desirable goal
6. Which of the following characteristics is essential for a problem-solving agent?
a) Learning capability
b) Perception
c) Goal formulation
d) Speech recognition
Answer: c) Goal formulation
7. Which search algorithm explores the search space level by level, expanding every node at one depth before moving deeper?
a) Depth-first search
b) Breadth-first search
c) Best-first search
d) Hill-climbing search
Answer: b) Breadth-first search
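Illustration: the level-by-level behaviour of breadth-first search can be sketched as below. This is a minimal sketch, not a reference implementation; the function name `bfs_path` and the example graph are hypothetical and chosen only for this note.

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([[start]])   # FIFO queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()  # oldest (shallowest) path first
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None

# Hypothetical example graph given as adjacency lists.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"],
         "D": ["F"], "E": ["F"], "F": []}
print(bfs_path(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```

Because the queue is first-in, first-out, every node at depth d is expanded before any node at depth d + 1.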
8. Which of the following is a characteristic of uninformed search algorithms?
a) They use heuristics to guide the search
b) They have no information about the number of steps to the goal
c) They guarantee to find the shortest path
d) They require a heuristic function
Answer: b) They have no information about the number of steps to the goal
9. Which of the following is an uninformed search strategy?
a) A* search
b) Greedy search
c) Depth-first search
d) Hill-climbing search
Answer: c) Depth-first search
10. What is the primary drawback of Depth-First Search (DFS)?
a) It requires exponential memory
b) It may not find the shortest path
c) It is too slow
d) It needs a heuristic function
Answer: b) It may not find the shortest path
11. Which search algorithm uses heuristics to improve efficiency?
a) Depth-First Search
b) Breadth-First Search
c) A* search
d) Uniform Cost Search
Answer: c) A* search
12. What is the heuristic function used for in heuristic search strategies?
a) To guarantee the shortest path
b) To estimate the cost to reach the goal
c) To eliminate unnecessary nodes
d) To explore all possible paths
Answer: b) To estimate the cost to reach the goal
13. Which local search algorithm is known for escaping local maxima by allowing some bad moves?
a) Hill-Climbing
b) Simulated Annealing
c) Genetic Algorithm
d) Best-First Search
Answer: b) Simulated Annealing
14. Which of the following is an example of an optimization problem?
a) Pathfinding in a maze
b) Scheduling tasks with limited resources
c) Playing chess
d) Translating languages
Answer: b) Scheduling tasks with limited resources
15. What is the primary focus of adversarial search in AI?
a) Finding the shortest path
b) Maximizing utility in a competitive environment
c) Optimizing a single agent's performance
d) Learning from examples
Answer: b) Maximizing utility in a competitive environment
16. Which algorithm is commonly used in adversarial search for games like chess?
a) A* search
b) Minimax algorithm
c) Breadth-First Search
d) Simulated Annealing
Answer: b) Minimax algorithm
17. What is a Constraint Satisfaction Problem (CSP)?
a) A problem where the solution must satisfy a set of constraints
b) A problem focused on learning from data
c) A problem that requires searching for the shortest path
d) A problem with a single, fixed solution
Answer: a) A problem where the solution must satisfy a set of constraints
18. Which technique is commonly used to solve CSPs?
a) Heuristic search
b) Backtracking
c) Local search
d) Minimax algorithm
Answer: b) Backtracking
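Illustration: backtracking on a CSP can be sketched with a tiny map-colouring problem. This is a sketch under stated assumptions: the three regions, their adjacency, and the function signature are hypothetical, and no variable-ordering or inference heuristics are used.

```python
def backtrack(assignment, variables, domains, neighbors):
    """Assign variables one at a time, undoing choices that lead to dead ends."""
    if len(assignment) == len(variables):
        return dict(assignment)            # every variable has a value
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Constraint: adjacent regions must not share a colour.
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]            # undo and try the next value
    return None                            # no value works: backtrack

# Hypothetical problem: three mutually adjacent regions, three colours.
variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
solution = backtrack({}, variables, domains, neighbors)
print(solution)
```

With only two colours the same call returns None, since three mutually adjacent regions cannot be coloured without a clash.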
UNIT 2
1. Which of the following best describes acting under uncertainty in AI?
a) Making decisions with perfect information
b) Making decisions without any information
c) Making decisions based on incomplete or probabilistic information
d) Making decisions with predetermined outcomes
Answer: c) Making decisions based on incomplete or probabilistic information
2. What is the primary goal when an AI agent acts under uncertainty?
a) To minimize computation time
b) To maximize expected utility
c) To gather more data
d) To simplify the problem
Answer: b) To maximize expected utility
3. Bayesian inference is used to update the probability estimate for a hypothesis as more __________ becomes available.
a) hypotheses
b) data
c) algorithms
d) errors
Answer: b) data
4. Which formula is fundamental to Bayesian inference?
a) Bayes' Theorem
b) Pythagorean Theorem
c) Euler's Formula
d) Heisenberg's Uncertainty Principle
Answer: a) Bayes' Theorem
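Illustration: Bayes' Theorem states P(H | E) = P(E | H) P(H) / P(E). The sketch below applies it to a binary hypothesis; the function name and all the numbers (a 1% prior, a 90% true-positive rate, a 5% false-positive rate) are made up for this example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis H and positive evidence E."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: rare condition, fairly accurate test.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))  # 0.1538
```

Even with a positive result the posterior stays low here because the prior is small, which is the kind of update Bayesian inference performs as data arrives.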
5. The Naïve Bayes model assumes that all features are __________ given the class.
a) dependent
b) equally probable
c) independent
d) deterministic
Answer: c) independent
6. Naïve Bayes is particularly effective for __________ tasks.
a) classification
b) regression
c) clustering
d) optimization
Answer: a) classification
7. Probabilistic reasoning in AI deals with reasoning and making decisions based on __________.
a) certain outcomes
b) exact calculations
c) probabilities
d) deterministic rules
Answer: c) probabilities
8. A common framework for probabilistic reasoning in AI is the __________.
a) decision tree
b) neural network
c) Bayesian network
d) support vector machine
Answer: c) Bayesian network
9. A Bayesian network is a directed acyclic graph where nodes represent __________ and edges represent __________.
a) variables, dependencies
b) outcomes, probabilities
c) probabilities, outcomes
d) features, labels
Answer: a) variables, dependencies
10. In a Bayesian network, the conditional probabilities are stored in __________.
a) edges
b) nodes
c) tables
d) graphs
Answer: c) tables
11. Which of the following is a method for exact inference in Bayesian networks?
a) Markov Chain Monte Carlo
b) Gibbs Sampling
c) Variable Elimination
d) Particle Filtering
Answer: c) Variable Elimination
12. Exact inference in Bayesian networks is computationally feasible when the network is __________.
a) very large
b) very small
c) densely connected
d) sparsely connected
Answer: d) sparsely connected
13. Which of the following is a technique for approximate inference in Bayesian networks?
a) Variable Elimination
b) Junction Tree Algorithm
c) Gibbs Sampling
d) Exact Enumeration
Answer: c) Gibbs Sampling
14. Approximate inference methods are used when exact inference is __________.
a) unnecessary
b) infeasible
c) too simple
d) perfectly accurate
Answer: b) infeasible
15. A causal network is a type of Bayesian network that explicitly represents __________ relationships.
a) temporal
b) causal
c) spatial
d) independent
Answer: b) causal
16. In causal networks, the direction of the arrows indicates the direction of __________.
a) correlation
b) causation
c) time
d) independence
Answer: b) causation
UNIT 3
1. Which of the following is a type of supervised learning?
a) Clustering
b) Regression
c) Dimensionality reduction
d) Anomaly detection
Answer: b) Regression
2. Unsupervised learning is primarily used for __________.
a) classification
b) clustering
c) regression
d) supervised tasks
Answer: b) clustering
3. In linear regression, the method of least squares is used to __________.
a) maximize the number of variables
b) minimize the sum of squared residuals
c) maximize the correlation coefficient
d) minimize the variance
Answer: b) minimize the sum of squared residuals
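Illustration: for a single predictor, the line that minimizes the sum of squared residuals has a closed form. The sketch below assumes simple (one-variable) linear regression; the function name and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```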
4. Which of the following statements is true for multiple linear regression?
a) It can only have one predictor variable.
b) It can have multiple predictor variables.
c) It is only applicable for binary outcomes.
d) It cannot be used for prediction.
Answer: b) It can have multiple predictor variables.
5. Bayesian linear regression incorporates __________ into the model.
a) prior distributions
b) maximum likelihood estimates
c) clustering
d) feature scaling
Answer: a) prior distributions
6. Gradient descent is an optimization algorithm used to find the __________ of a function.
a) maximum
b) minimum
c) derivative
d) integral
Answer: b) minimum
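Illustration: gradient descent repeatedly steps against the gradient. This is a minimal sketch on a one-dimensional function; the function f(x) = (x - 3)^2, the learning rate, and the step count are assumptions chosen for the example.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Follow the negative gradient from x0 for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)   # move downhill
    return x

# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))  # 3.0
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step; a rate that is too large would make the iterates diverge instead.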
7. A discriminant function is used in linear classification models to __________.
a) calculate probabilities
b) separate classes
c) generate random data
d) reduce dimensionality
Answer: b) separate classes
8. Logistic regression is a type of __________ model.
a) probabilistic generative
b) probabilistic discriminative
c) unsupervised learning
d) clustering
Answer: b) probabilistic discriminative
9. Naive Bayes models assume that all features are __________ given the class.
a) dependent
b) independent
c) continuous
d) binary
Answer: b) independent
10. Naive Bayes is particularly useful for __________ tasks.
a) regression
b) classification
c) clustering
d) anomaly detection
Answer: b) classification
11. Support Vector Machines aim to find a hyperplane that __________ the margin between classes.
a) minimizes
b) maximizes
c) equalizes
d) reduces
Answer: b) maximizes
12. The kernel trick in SVM is used to __________.
a) handle linear data
b) handle non-linear data
c) increase computational cost
d) reduce overfitting
Answer: b) handle non-linear data
13. Decision trees split the data into subsets based on __________.
a) random selection
b) a single feature value
c) the mean of all features
d) the sum of all feature values
Answer: b) a single feature value
14. One common issue with decision trees is __________.
a) underfitting
b) overfitting
c) low interpretability
d) high computational cost
Answer: b) overfitting
15. Random forests are an ensemble method that uses multiple __________.
a) linear models
b) decision trees
c) support vector machines
d) logistic regression models
Answer: b) decision trees
16. One advantage of random forests over single decision trees is __________.
a) increased overfitting
b) reduced interpretability
c) reduced variance
d) increased computational cost
Answer: c) reduced variance
UNIT 4
1. Which of the following is a model combination scheme where the final prediction is made by taking the majority vote of multiple models?
a) Bagging
b) Boosting
c) Voting
d) Stacking
Answer: c) Voting
2. In model combination schemes, combining multiple learners often leads to __________.
a) increased bias
b) decreased variance
c) decreased accuracy
d) increased computational cost
Answer: b) decreased variance
Ensemble Learning - Bagging, Boosting, Stacking
3. Bagging, short for Bootstrap Aggregating, primarily aims to reduce __________.
a) bias
b) variance
c) overfitting
d) underfitting
Answer: b) variance
4. Boosting combines multiple weak learners to create a strong learner by focusing on __________ examples.
a) correctly classified
b) random
c) incorrectly classified
d) underrepresented
Answer: c) incorrectly classified
5. Stacking, an ensemble learning method, uses __________ to combine the predictions of multiple base models.
a) voting
b) a meta-learner
c) averaging
d) boosting
Answer: b) a meta-learner
Unsupervised Learning: K-means
6. In K-means clustering, the goal is to partition the data into __________ clusters.
a) hierarchical
b) overlapping
c) K non-overlapping
d) probabilistic
Answer: c) K non-overlapping
7. The K-means algorithm updates the cluster centers by calculating the __________ of all points in a cluster.
a) median
b) mode
c) mean
d) range
Answer: c) mean
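Illustration: the assign-then-average loop of K-means can be sketched on one-dimensional data. This is a simplified sketch; the function name, the data points, and the initial centers are hypothetical, and a fixed iteration count stands in for a proper convergence check.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points and recomputing centers as means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assignment step: each point joins its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Hypothetical data with two obvious groups and deliberately poor starting centers.
print(kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5]))  # [2.0, 11.0]
```

Each point belongs to exactly one cluster at every step, which is what makes the K clusters non-overlapping.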
Instance Based Learning: KNN
8. The K-nearest neighbors (KNN) algorithm is an example of __________ learning.
a) supervised
b) unsupervised
c) reinforcement
d) instance-based
Answer: d) instance-based
9. In KNN, the classification of a new instance is determined by the __________ of its nearest neighbors.
a) distance
b) mean
c) majority vote
d) regression
Answer: c) majority vote
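Illustration: a minimal KNN classifier, assuming Euclidean distance and unweighted votes. The function name and the training points are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Instance-based: no model is fitted, the stored examples are used directly.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labelled points: (features, label).
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B")]
print(knn_predict(train, (2, 2)))  # A
print(knn_predict(train, (6, 5)))  # B
```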
Gaussian Mixture Models and Expectation Maximization
10. A Gaussian Mixture Model (GMM) represents a mixture of __________ Gaussian distributions.
a) dependent
b) independent
c) overlapping
d) hierarchical
Answer: b) independent
11. The Expectation-Maximization (EM) algorithm is commonly used to estimate the parameters of __________.
a) decision trees
b) linear regression
c) Gaussian Mixture Models
d) neural networks
Answer: c) Gaussian Mixture Models
12. In the EM algorithm, the E-step involves calculating the __________ of the latent variables given the current parameter estimates.
a) maximization
b) expectation
c) classification
d) minimization
Answer: b) expectation
13. In the M-step of the EM algorithm, the goal is to __________ the expected complete-data log-likelihood.
a) maximize
b) minimize
c) average
d) differentiate
Answer: a) maximize
UNIT 5
1. The perceptron algorithm is used for __________ classification problems.
a) linear
b) non-linear
c) clustering
d) regression
Answer: a) linear
2. A multilayer perceptron (MLP) consists of __________.
a) only input and output layers
b) multiple layers of neurons including hidden layers
c) no hidden layers
d) a single layer of neurons
Answer: b) multiple layers of neurons including hidden layers
3. Which activation function is most commonly used in the hidden layers of modern deep networks to introduce non-linearity?
a) Linear
b) Sigmoid
c) ReLU
d) Step
Answer: c) ReLU
4. The sigmoid activation function outputs values in the range of __________.
a) -1 to 1
b) 0 to 1
c) -∞ to +∞
d) 0 to +∞
Answer: b) 0 to 1
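Illustration: the sigmoid is defined as 1 / (1 + e^(-z)), so its output lies strictly between 0 and 1. The sketch below is a plain-Python version for moderate inputs; the function name and the sample inputs are chosen only for this example, and very large negative inputs would overflow `math.exp` in this simple form.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Sample inputs: large negative, zero, large positive.
for z in (-10, 0, 10):
    print(z, sigmoid(z))
```

The output is 0.5 at z = 0 and approaches 0 and 1 for large negative and positive inputs, without reaching either bound mathematically.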
5. Gradient descent is used to __________ the cost function in neural network training.
a) maximize
b) minimize
c) approximate
d) stabilize
Answer: b) minimize
6. Stochastic gradient descent updates the model parameters using __________.
a) the entire dataset
b) a single data point
c) a batch of data points
d) none of the above
Answer: b) a single data point
7. Backpropagation is used to compute the __________ of the loss function with respect to each weight by the chain rule.
a) value
b) gradient
c) sum
d) product
Answer: b) gradient
8. In backpropagation, the gradients are propagated from the __________ to the __________ layer.
a) input, output
b) output, input
c) hidden, output
d) input, hidden
Answer: b) output, input
9. Deep networks are characterized by having __________ hidden layers compared to shallow networks.
a) fewer
b) more
c) no
d) single
Answer: b) more
10. Deep networks can model more complex functions due to their ability to capture __________.
a) linear relationships
b) hierarchical features
c) low-dimensional data
d) simple patterns
Answer: b) hierarchical features
11. The vanishing gradient problem is more common with the __________ activation function.
a) ReLU
b) Sigmoid
c) Tanh
d) Softmax
Answer: b) Sigmoid
12. The ReLU activation function helps mitigate the vanishing gradient problem because it __________.
a) is linear
b) has a constant derivative
c) has a derivative of 0 for negative inputs
d) avoids saturation for positive inputs
Answer: d) avoids saturation for positive inputs
13. Hyperparameter tuning involves selecting the best set of hyperparameters for a model using techniques like __________.
a) gradient descent
b) grid search
c) backpropagation
d) dropout
Answer: b) grid search
14. An example of a hyperparameter in a neural network is the __________.
a) weights
b) learning rate
c) input data
d) output prediction
Answer: b) learning rate
15. Batch normalization is used to __________ the training of deep networks.
a) slow down
b) accelerate
c) eliminate
d) complicate
Answer: b) accelerate
16. Batch normalization works by normalizing the __________ of each mini-batch.
a) gradients
b) activations
c) weights
d) biases
Answer: b) activations
17. Regularization techniques are used to prevent __________ in neural networks.
a) underfitting
b) overfitting
c) gradient descent
d) backpropagation
Answer: b) overfitting
18. A common regularization technique is __________, which adds a penalty to the loss function.
a) dropout
b) batch normalization
c) L2 regularization
d) activation
Answer: c) L2 regularization
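Illustration: L2 regularization adds the sum of squared weights, scaled by a coefficient, to the data loss. This is a minimal sketch; the function name, the coefficient name `lam`, and the numbers are assumptions for this example, and some formulations include an extra factor of 1/2.

```python
def l2_regularized_loss(data_loss, weights, lam):
    """Total loss = data loss + lam * sum of squared weights."""
    penalty = lam * sum(w * w for w in weights)
    return data_loss + penalty

# Hypothetical values: larger weights produce a larger penalty.
small = l2_regularized_loss(0.5, [0.1, -0.2], lam=0.1)
large = l2_regularized_loss(0.5, [1.0, -2.0], lam=0.1)
print(small, large)
```

Since the penalty grows with the magnitude of the weights, minimizing the total loss discourages the large weights that often accompany overfitting.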
19. Dropout is a regularization technique where __________ are randomly dropped during training.
a) input data
b) weights
c) activations
d) gradients
Answer: c) activations
20. The purpose of dropout is to prevent __________ by making the network less sensitive to specific weights.
a) underfitting
b) overfitting
c) gradient vanishing
d) gradient explosion
Answer: b) overfitting