Gondwana University Gadchiroli
B.E. (Computer Science & Engineering)
Sem-VI (Model Curriculum)
Subject: Machine Learning
MCQs Question Bank with Answer Key
UNIT-1
Que1:- Identify the type of learning in which labeled training data is used.
A) Semi-supervised learning
B) Supervised learning
C) Reinforcement learning
D) Unsupervised learning
Ans:- B
Que2:- Machine learning is a subset of which of the following.
A) Artificial intelligence
B) Deep learning
C) Data learning
D) None of the above
Ans:- A
Que3:- What is machine learning?
A) The selective acquisition of knowledge through the use of manual programs
B) The selective acquisition of knowledge through the use of computer programs
C) The autonomous acquisition of knowledge through the use of manual programs
D) The autonomous acquisition of knowledge through the use of computer programs
Ans:- D
Que4:- The robotic arm will be able to paint every corner in the automotive parts while
minimizing the quantity of paint wasted in the process. Which learning technique is used in
this problem?
(A) Supervised Learning.
(B) Unsupervised Learning.
(C) Reinforcement Learning.
(D) Both (A) and (B).
Ans:- C
Que5:- ________refers to a model that can neither model the training data nor generalize to
new data.
A) good fitting
B) overfitting
C) underfitting
D) all of the above
Ans:- C
Que6:- The supervised learning problems can be grouped as _______.
A) Regression problems
B) Classification problems
C) Both A and B
D) None of the above
Ans:- C
Que7:- The unsupervised learning problems can be grouped as _______.
A. Clustering
B. Association
C. Both A and B
D. None of the above
Ans:- C
Que8:- Which machine learning models are trained to make a series of decisions based on
the rewards and feedback they receive for their actions?
A. Supervised learning
B. Unsupervised learning
C. Reinforcement learning
D. All of the above
Ans:- C
Que9:- Machine learning algorithms build a model based on sample data, known as _____
A. Training Data
B. Transfer Data
C. Data Training
D. None of the above
Ans:- A
Que10:- Bias refers to how well your model can represent all possible outcomes, whereas
variance refers to how sensitive your predictions are to changes in the model’s parameters.
True or False?
A. True
B. False
Ans:- A
Que11:- The ______ is one way to quantify generalization error.
A) Bias
B) Variance
C) Bias-Variance Composition
D) Bias – Variance Decomposition
Ans:- D
Que12:- Because of low bias and high variance, we get a(n) ________ model.
A) high error
B) perfectly fitting
C) underfitting
D) overfitting
Ans:- D
Que13:- Features of Reinforcement learning are
A. RL is a set of problems rather than a set of techniques
B. RL is training by reward
C. RL is learning from trial and error
D. All of these
Ans:- D
Que14:- Choose the correct statement.
A) A hypothesis is a function that best describes the target function in machine learning.
B) The hypothesis that an algorithm comes up with depends upon the data and also
depends upon the restrictions and bias that we have imposed on the data.
C) Both A and B
D) None of the above
Ans:- C
Que15:- What is hypothesis space?
A. The set of all the possible legal hypotheses.
B. The set from which the machine learning algorithm selects the single best
hypothesis, i.e. the one that best describes the target function or the outputs.
C. Both A & B
D. None of the above
Ans:- C
Que16:- How large is the hypothesis space when we have n Boolean attributes?
A. |H| = 3^n
B. |H| = 2^n
C. |H| = 1^n
D. |H| = 4^n
Ans:-A
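Worked check for Que16: the count of 3^n holds under an assumption the question does not state, namely that hypotheses are conjunctions in which each Boolean attribute is required to be true, required to be false, or left unconstrained. A minimal Python sketch under that assumption (the function name `conjunctive_hypotheses` is illustrative):

```python
from itertools import product

def conjunctive_hypotheses(n):
    """Enumerate conjunctive hypotheses over n Boolean attributes.

    Assumption: each attribute is constrained to True, constrained to
    False, or left unconstrained ("?"), giving 3 choices per attribute.
    """
    return list(product((True, False, "?"), repeat=n))

# Size of the hypothesis space for n = 1..4 attributes.
sizes = {n: len(conjunctive_hypotheses(n)) for n in range(1, 5)}
print(sizes)
```

Under a different hypothesis language (for example, arbitrary Boolean functions) the count would differ.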
Que17:- In general, to have a well-defined learning problem, we must identify which of the
following?
(A) The class of tasks
(B) The measure of performance to be improved
(C) The source of experience
(D) All of the above
Ans:- D
Que18:- Successful applications of ML
(A) Learning to recognize spoken words
(B) Learning to drive an autonomous vehicle
(C) Learning to classify new astronomical structures
(D) All of the above
Ans:- D
Que19:- Designing a machine learning approach involves:-
(A) Choosing the type of training experience
(B) Choosing the target function to be learned
(C) Choosing a function approximation algorithm
(D) All of the above
Ans:- D
Que20:- Fraud Detection, Disease detection, Diagnostic, and Customer Retention are
applications in which of the following
A) Unsupervised Learning: Regression
B) Supervised Learning: Classification
C) Unsupervised Learning: Clustering
D) Reinforcement Learning
Ans:- B
UNIT-2
Que1:- Choose a disadvantage of decision trees among the following.
A) Decision trees are robust to outliers
B) Factor analysis
C) Decision trees are prone to overfit.
D) All of the above
Ans:- C
Que2:- Feature can be used as a
A. binary split
B. predictor
C. both a and b
D. none of the above
Ans:- C
Que3:- PCA works better if there is
1. A linear structure in the data
2. If the data lies on a curved surface and not on a flat surface.
3. If variables are scaled in the same unit
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1, 2 and 3
Ans:- C
Que4:- Support Vector Machine is
A. logical model
B. probabilistic model
C. geometric model
D. none of the above
Ans:- C
Que5:- In multiclass classification number of classes must be
A. less than two
B. equals to two
C. greater than two
D. both A and B
Ans:- C
Que6:- Which of the following can only be used when training data are linearly separable?
A. linear hard-margin svm
B. linear logistic regression
C. linear soft margin svm
D. the centroid method
Ans:- A
Que7:- What is the impact of high variance on the training set?
A. overfitting
B. underfitting
C. both underfitting & overfitting
D. depends upon the dataset
Ans:- A
Que8:- What are support vectors?
A. all the examples that have a non-zero weight α_k in an SVM
B. the only examples necessary to compute f(x) in an svm.
C. all of the above
D. none of the above
Ans:- C
Que9:- High entropy means that the partitions in classification are
A. pure
B. not pure
C. useful
D. useless
Ans:- B
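Worked check for Que9: a minimal Python sketch of Shannon entropy over the class counts in a partition. The counts below are made up for illustration; a pure partition gives zero entropy, and an evenly mixed two-class partition gives the two-class maximum of 1 bit.

```python
from math import log2

def entropy(class_counts):
    """Shannon entropy (in bits) of a partition given its per-class counts."""
    total = sum(class_counts)
    # Skip empty classes: the term p * log2(p) is taken as 0 when p == 0.
    probabilities = [count / total for count in class_counts if count > 0]
    return -sum(p * log2(p) for p in probabilities)

pure_partition = entropy([10, 0])    # every example in one class
mixed_partition = entropy([5, 5])    # classes evenly mixed
print(pure_partition, mixed_partition)
```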
Que10:- Which of the following statements is FALSE in the case of the KNN Algorithm?
(A) For a very large value of K, points from other classes may be included in the
neighborhood.
(B) For the very small value of K, the algorithm is very sensitive to noise.
(C) KNN is used only for classification problem statements.
(D) KNN is a lazy learner.
Ans:- C
Que11:- Which of the following statements is TRUE?
(A) Outliers should be identified and removed always from a dataset.
(B) Outliers can never be present in the testing dataset.
(C) An outlier is a data point that is significantly close to other data points.
(D) The nature of our business problem determines how outliers are used.
Ans:- D
Que12:- Which one of the following statements is TRUE for a Decision Tree?
(A) Decision tree is only suitable for the classification problem statement.
(B) In a decision tree, the entropy of a node decreases as we go down a decision tree.
(C) In a decision tree, entropy determines purity.
(D) Decision tree can be used only for numeric-valued and continuous attributes.
Ans:- B
Que13:- How do you choose the right node while constructing a decision tree?
(A) An attribute having high entropy
(B) An attribute having high entropy and information gain
(C) An attribute having the lowest information gain.
(D) An attribute having the highest information gain.
Ans:- D
Que14:- Which of the following is FALSE for neural networks?
(A) Artificial neurons are similar in operation to biological neurons.
(B) Training time for a neural network depends on network size.
(C) Neural networks can be simulated on conventional computers.
(D) The basic units of neural networks are neurons.
Ans: A
Que15:- A machine learning problem involves four attributes plus a class. The attributes
have 3, 2, 2, and 2 possible values each. The class has 3 possible values. What is the
maximum number of different examples?
A) 12
B) 24
C) 48
D) 72
Ans:- D
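Worked check for Que15: the count is the product of the number of values of each attribute and of the class, 3 x 2 x 2 x 2 x 3 = 72. A minimal Python sketch that enumerates every combination (values are represented by integer indices purely for illustration):

```python
from itertools import product

attribute_sizes = [3, 2, 2, 2]   # possible values per attribute (from the question)
class_size = 3                   # possible class values (from the question)

# Each example is one choice of value per attribute plus one class label.
examples = list(product(*(range(k) for k in attribute_sizes + [class_size])))
total_examples = len(examples)
print(total_examples)
```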
Que16:- What kind of distance metric(s) are suitable for categorical variables to find the
closest neighbors?
(A) Euclidean distance.
(B) Manhattan distance.
(C) Minkowski distance.
(D) Hamming distance.
Ans: D
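Illustration for Que16: Hamming distance counts the attribute positions at which two records differ, so it needs no numeric ordering of the categories. A minimal Python sketch, using made-up records:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical categorical records: (colour, body type, fuel).
query = ("red", "suv", "petrol")
records = [
    ("red", "sedan", "diesel"),
    ("blue", "suv", "petrol"),
    ("green", "hatchback", "electric"),
]

distances = [hamming_distance(query, r) for r in records]
nearest = min(records, key=lambda r: hamming_distance(query, r))
print(distances, nearest)
```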
Que17:- Which of the following logic functions cannot be implemented by a perceptron
having 2 inputs?
(A) AND.
(B) OR.
(C) NOR.
(D) XOR.
Ans: D
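Illustration for Que17: XOR is not linearly separable, so no single set of weights and bias can realize it. The Python sketch below trains a 2-input perceptron with the classic error-correction rule on each gate. The training routine and the epoch budget of 100 are assumptions made for this sketch; failing to converge within the budget is only suggestive, and the actual reason for the failure is the lack of linear separability.

```python
def train_perceptron(samples, max_epochs=100):
    """Train a 2-input perceptron with the error-correction rule.

    Returns True if some epoch finishes with zero mistakes, else False.
    """
    w1, w2, bias = 0, 0, 0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            # Step activation: fire only when the weighted sum is positive.
            output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            error = target - output
            if error != 0:
                w1 += error * x1
                w2 += error * x2
                bias += error
                mistakes += 1
        if mistakes == 0:
            return True
    return False

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
gates = {
    "AND": [0, 0, 0, 1],
    "OR": [0, 1, 1, 1],
    "NOR": [1, 0, 0, 0],
    "XOR": [0, 1, 1, 0],
}
results = {
    name: train_perceptron(list(zip(inputs, targets)))
    for name, targets in gates.items()
}
print(results)
```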
Que18:- Which is true for neural networks?
A) It has a set of nodes and connections
B) Each node computes its weighted input
C) Node could be in excited state or non-excited state
D) All of the above
Ans:- D
Que19:- What is perceptron?
A) a single layer feed-forward neural network with pre-processing
B) an auto-associative neural network
C) a double layer auto-associative neural network
D) a neural network that contains feedback
Ans:- A
Que20:- A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the
constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20 respectively.
What will be the output?
A) 238
B) 76
C) 119
D) 123
Ans:- A
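Worked check for Que20: with a linear transfer function, the output is the constant of proportionality times the weighted sum of the inputs, 2 x (1x4 + 2x10 + 3x5 + 4x20) = 2 x 119 = 238. A minimal Python sketch (variable names are illustrative):

```python
weights = [1, 2, 3, 4]
inputs = [4, 10, 5, 20]
proportionality_constant = 2   # slope of the linear transfer function

# Weighted sum of the inputs: 4 + 20 + 15 + 80.
weighted_sum = sum(w * x for w, x in zip(weights, inputs))

# Linear transfer function: output = constant * weighted sum.
neuron_output = proportionality_constant * weighted_sum
print(weighted_sum, neuron_output)
```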
UNIT-3
Que1:- Which of the following statements is FALSE about Ridge and Lasso Regression?
(A) These are types of regularization methods to solve the overfitting problem.
(B) Lasso Regression is a type of regularization method.
(C) Ridge regression shrinks the coefficient to a lower value.
(D) Ridge regression lowers some coefficients to a zero value.
Ans:- D
Que2:- Which of the following is not a kernel method?
A. Linear
B. Polynomial
C. Gaussian
D. Continuous
Ans:- D
Que3:- What do you mean by generalization error in terms of the SVM?
A) How far the hyperplane is from the support vectors
B) How accurately the SVM can predict outcomes for unseen data
C) The threshold amount of error in an SVM
D) All of the above
Ans:- B
Que4:- What do you mean by a hard margin?
A) The SVM allows very low error in classification
B) The SVM allows a high amount of error in classification
C) None of the above
D) All of the above
Ans:-A
Que5:- Suppose you are using an RBF kernel in an SVM with a high gamma value. What does this
signify?
A) The model would consider even points far away from the hyperplane for modeling
B) The model would consider only the points close to the hyperplane for modeling
C) The model would not be affected by the distance of points from the hyperplane for
modeling
D) None of the above
Ans:-B
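Illustration for Que5: the RBF kernel is K(x, z) = exp(-gamma * ||x - z||^2), so a larger gamma makes the similarity between two points fall off faster with distance, and each training point influences only its close neighbourhood. A minimal Python sketch with made-up points and gamma values:

```python
from math import exp

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * squared_distance)

reference = (0.0, 0.0)
near_point = (0.1, 0.0)   # made-up point close to the reference
far_point = (2.0, 0.0)    # made-up point far from the reference

for gamma in (0.1, 10.0):
    near = rbf_kernel(reference, near_point, gamma)
    far = rbf_kernel(reference, far_point, gamma)
    print(f"gamma={gamma}: near={near:.4f}, far={far:.2e}")
```

With the larger gamma, the far point contributes almost nothing while the near point still contributes strongly.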
Que6:- Which of the following can only be used when training data are linearly separable?
A) Linear hard-margin SVM.
B) Linear Logistic Regression.
C) Linear Soft margin SVM.
D) The centroid method.
Ans:- A
Que7:- What is/are true about RBF network?
A) A kind of supervised learning
B) Design of NN as curve fitting problem
C) Use of multidimensional surface to interpolate the test data
D) All of these
Ans:- D
Que8:- The cost parameter in the SVM means:
A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above
Ans:- C
Que9:- If I am using all features of my dataset and I achieve 100% accuracy on my training
set, but ~70% on validation set, what should I look out for?
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
D) Both A and C
Ans:- C
Que10:- In SVM, the dimension of the hyperplane depends upon which one?
(A) the number of features
(B) the number of samples
(C) the number of target variables
(D) All of the above
Ans:- A
Que11:- For Ridge Regression, if the regularization parameter = 0, what does it mean?
(A) Large coefficients are not penalized
(B) Overfitting problems are not accounted for
(C) The loss function is the same as the ordinary least squares loss function
(D) All of the above
Ans:- D
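Illustration for Que11: with the regularization parameter set to 0, the ridge penalty term vanishes and the estimate coincides with ordinary least squares. The Python sketch below assumes the simplest possible setting, one feature and no intercept, where minimizing sum((y - w*x)^2) + lam * w^2 gives w = sum(x*y) / (sum(x*x) + lam). The data and the function name `ridge_slope` are made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for one feature and no intercept.

    Minimizes sum((y - w*x)^2) + lam * w^2, which gives
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # made-up data lying exactly on y = 2x

ols_slope = ridge_slope(xs, ys, 0.0)      # lam = 0: no penalty, same as OLS
shrunk_slope = ridge_slope(xs, ys, 14.0)  # lam > 0: coefficient is shrunk
print(ols_slope, shrunk_slope)
```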
Que12:- For Lasso Regression, if the regularization parameter = 0, what does it mean?
(A) The loss function is the same as the ordinary least squares loss function
(B) Can be used to select important features of a dataset
(C) Shrinks the coefficients of less important features to exactly 0
(D) All of the above
Ans:- A
Que13:- What is correct about kernel in SVM?
A) A kernel function is used to map lower-dimensional data into higher-dimensional
data.
B) A kernel function is used to map higher-dimensional data into lower-dimensional
data.
C) Kernel function is used to find hard margin.
D) None of these.
Ans:- A
Que14:- What is the purpose of the Kernel Trick?
A) To map a lower dimensional data into a higher dimensional data.
B) To map a higher dimensional data into a lower dimensional data.
C) To transform the problem from supervised to unsupervised learning.
D) All of the above
Ans:- A
Que15:- The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
Ans:-D
Que16:- When the C parameter is set to infinite, which of the following holds true?
A) The optimal hyperplane, if it exists, will be the one that completely separates the data
B) The soft-margin classifier will separate the data
C) Both A and B
D) None of the above
Ans:-A
Que17:- The cost parameter in the SVM means:
A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above
Ans:- C
Que18:- Which of the following are real world applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above
Ans:-D
Que19:- Choose the correct statement.
A) A hard margin means that an SVM is very rigid in classification and
B) A hard margin means that an SVM tries to work extremely well in the training set,
causing overfitting.
C) Both A and B
D) None of these
Ans:- C
Que20:- A soft margin allows an SVM to make a certain number of mistakes and keep the margin as
wide as possible so that other points can still be classified correctly.
A) True
B) False
Ans:- A
UNIT-4
Que1:- What is the minimum number of variables/features required to perform clustering?
A) 0
B) 1
C) 2
D) 3
Ans:- B
Que2:- Suppose we would like to perform clustering on spatial data such as the geometrical
locations of houses. We wish to produce clusters of many different sizes and shapes. Which
of the following methods is the most appropriate?
A. Decision Trees
B. Model-based clustering
C. K-means clustering
D. Density-based clustering
Ans:- D
Que3:- In which of the following cases will K-means clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with nonconvex shapes
A. 1 & 2
B. 1, 2, & 3
C. 2 & 3
D. 1 & 3
Ans:-B
Que4:- Are two runs of K-Means clustering expected to give the same clustering results?
A : TRUE
B : FALSE
Ans:- B
Que5:- Can decision trees be used for performing clustering?
A) TRUE
B) FALSE
Ans:- A
Que6:- _____ is a clustering procedure where all objects start out in one giant cluster.
Clusters are formed by dividing this cluster into smaller and smaller clusters.
A : Non-hierarchical clustering
B : Divisive clustering
C : Agglomerative clustering
D : K-means clustering
Ans:- B
Que7:- The k-means algorithm…
A : always converges to a clustering that minimizes the mean-square vector-representative
distance
B : can converge to different final clustering, depending on initial choice of representatives
C : is typically done by hand, using paper and pencil
D : should only be attempted by trained professionals
Ans:- B
Que8:- Which of the following is required by K-means clustering?
A : defined distance metric
B : number of clusters
C : initial guess as to cluster centroids
D : All of the above
Ans:- D
Que9:- What is true about single linkage hierarchical clustering?
A : we merge in each step the two clusters, whose two closest members have the smallest
distance.
B : we merge in the members of the clusters in each step, which provide the smallest
maximum pairwise distance.
C : the distance between two clusters is defined as the average distance between each point
in one cluster to every point in the other cluster.
D : none of the above
Ans:- A
Que10:- What is true about complete linkage hierarchical clustering?
A : we merge in each step the two clusters, whose two closest members have the smallest
distance.
B : we merge in the members of the clusters in each step, which provide the smallest
maximum pairwise distance.
C : the distance between two clusters is defined as the average distance between each point
in one cluster to every point in the other cluster.
D : none of the above
Ans:- B
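Illustration for Que9 and Que10: single linkage measures the distance between two clusters by their closest pair of members, while complete linkage uses their farthest pair. A minimal Python sketch on made-up one-dimensional clusters (function names are illustrative):

```python
def single_linkage(cluster_a, cluster_b):
    """Cluster distance = smallest pairwise distance between members."""
    return min(abs(a - b) for a in cluster_a for b in cluster_b)

def complete_linkage(cluster_a, cluster_b):
    """Cluster distance = largest pairwise distance between members."""
    return max(abs(a - b) for a in cluster_a for b in cluster_b)

# Made-up 1-D clusters.
cluster_a = [1, 2]
cluster_b = [5, 9]

single = single_linkage(cluster_a, cluster_b)      # closest pair: 2 and 5
complete = complete_linkage(cluster_a, cluster_b)  # farthest pair: 1 and 9
print(single, complete)
```

At each agglomerative step, the pair of clusters with the smallest such distance is merged.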
Que11:- Agglomerative clustering is
A : the initial state is a single cluster with all samples and the process proceeds by splitting
the intermediate cluster until all elements are separated
B : process starts from the bottom (each initial cluster is made up of a single element) and
proceeds by merging the clusters until a stop criterion is reached.
C : requires prior knowledge of no. of clusters you want to divide your data into.
D : None of the above
Ans:- B
Que12:- What could be the possible reason(s) for producing two different dendrograms
using agglomerative clustering algorithm for the same dataset?
A : Proximity function used
B : Number of data points used
C : Number of variables used
D : all of the above
Ans:- D
Que13:- Which of the following is finally produced by Hierarchical Clustering?
A : final estimate of cluster centroids
B : tree showing how close things are to each other
C : assignment of each point to clusters
D : all of the mentioned
Ans:- B
Que14:- Which of the following clustering requires merging approach?
A : Partitional
B : Hierarchical
C : Naive Bayes
D : None of the mentioned
Ans:- B
Que15:- Use K-means clustering to cluster the following data into two groups:
{ 2, 4, 10, 12, 3, 20, 30, 11, 25 }. Assume the initial cluster centroids are m1=2 and m2=4.
The distance function used is Euclidean distance. The final clusters are
A) C1= {2, 3, 4, 10, 12, 11} and C2= {20, 30, 25}
B) C1= {2, 3, 4, 10, 11} and C2= {12, 20, 30, 25}
C) C1= {2, 3, 4, 10, 12 } and C2= {11, 20, 30, 25}
D) C1= {2, 3, 4, 10, 12, 11, 25} and C2= {20, 30}
Ans:- A
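Worked check for Que15: a minimal Python sketch of K-means on one-dimensional data, starting from centroids 2 and 4. Two details are assumptions of this sketch rather than part of the question: ties go to the lower-indexed centroid, and iteration stops when the centroids no longer change. The run ends with centroids 7 and 25 and the clusters listed in option A.

```python
def kmeans_1d(points, initial_centroids, max_iter=100):
    """Plain K-means on 1-D data (Euclidean distance is |p - c| in 1-D)."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            # min() returns the first minimum, so ties go to the lower index.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = kmeans_1d(data, [2, 4])
print(sorted(clusters[0]), sorted(clusters[1]), centroids)
```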
Que16:- Which of the following statements is true for Partitioning Methods?
A) These methods consider the clusters as the dense region having some similarities and
differences from the lower dense region of the space.
B) The clusters formed in this method form a tree-type structure based on the hierarchy.
C) These methods partition the objects into k clusters and each partition forms one
cluster.
D) All of the above
Ans:- C
Que17:- Agglomerative hierarchical clustering has a
A) Bottom-up approach
B) Top-down approach
C) Left-right approach
D) Both A and B
Ans:- A
Que18:- Point out the correct statement.
A) The choice of an appropriate metric will influence the shape of the clusters
B) Hierarchical clustering is also called HCA
C) In general, the merges and splits are determined in a greedy manner
D) All of the mentioned
Ans:- D
Que19:- What is the advantage of hierarchical clustering over K-means clustering?
A) Hierarchical clustering is computationally faster than K-means clustering.
B) You don't have to assign the number of clusters from the beginning in the case of
hierarchical clustering.
C) None of the above
D) There is no difference. Both are equally proficient.
Ans:- B
Que20:- Which of the following tasks can be best solved using clustering?
A) Predicting the amount of rainfall based on various cues
B) Detecting fraudulent credit card transactions
C) Training a robot to solve a maze
D) All of the above
Ans:- B
UNIT-5
Que1:- Which of the following algorithms cannot be used for reducing the dimensionality
of data?
A. t-SNE
B. PCA
C. LDA
D. None of these
Ans:- (D)
Que2:- PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
Ans:- (A)
Que3:- What will happen when eigen values are roughly equal?
A. PCA will perform outstandingly
B. PCA will perform badly
C. Can’t Say
D. None of the above
Ans:- (B)
Que4:- Which of the following methods would result in better class prediction?
A. Building a classification algorithm with PCA (A principal component in direction
of PCA)
B. Building a classification algorithm with LDA
C. Can’t say
D. None of these
Ans:- (B)
Que5:- What does dimensionality reduction reduce?
A. stochastics
B. collinearity
C. performance
D. entropy
Ans:- B
Que6:- Which of the following techniques would perform better for reducing dimensions of
a data set?
A. Removing columns which have too many missing values
B. Removing columns which have high variance in data
C. Removing columns with dissimilar data trends
D. None of these
Ans:- A
Que7:- ______________ is a dimensionality reduction technique which is commonly used
for supervised classification problems.
A. Value analysis
B. Function Analysis
C. Pure analysis
D. None of these
Ans : B
Que8:- ________ is an important factor in predictive modeling
A. Dimensionality Reduction
B. feature selection
C. feature extraction
D. None of these
Ans. A
Que9:- Feature selection has _____ different approaches.
A. 2
B. 3
C. 4
D. 5
Ans. C
Que10:- Parameters for Feature Selection are classified on ____ factors.
A. 3
B. 2
C. 4
D. 5
Ans : B
Que11:- In PCA, the number of input dimensions is equal to the number of principal components
A. true
B. false
Ans:- A
Que12:- Dimensionality Reduction Algorithms are one of the possible ways to reduce the
computation time required to build a model
A. true
B. false
Ans:- A
Que13:- The following are the main methods for reducing dimensionality:
I) Feature Selection
II) Feature Extraction
III) Feature Mapping
IV) Attribute Mapping
A) Both I and III
B) Both II and III
C) Both I and II
D) Both III and IV
Ans:- C
Que14: The best known and most widely used feature extraction methods are principal
component analysis and linear discriminant analysis.
A) True
B) False
Ans:- A
Que15:- In ___________, we are interested in finding the best subset of the set of features.
A) Subset selection
B) Feature selection
C) Feature extraction
D) None of the above
Ans:- A
Que16:- Pick out the correct statement
A) Principal component analysis is a supervised method
B) Linear discriminant analysis is an unsupervised method
C) Principal component analysis and Linear discriminant analysis both are linear
projection methods.
D) Subset selection is an unsupervised method.
Ans:- C
Que17:- PCA explains variance and is sensitive to outliers.
A) True
B) False
Ans:- A
Que18:- What is the use of factor analysis?
A) It can be used for knowledge extraction when we find the loadings
B) Try to express the variables using fewer factors.
C) Both A and B
D) None of the above
Ans:- C
Que19:- PCA reduces the dimension by finding a few ________.
A. Hexagonal linear combination
B. Orthogonal linear combinations
C. Octagonal linear combination
D. Pentagonal Linear Combination
Ans : B
Que20:- ________ is a tool which is used to reduce the dimension of the data.
A. Principal components analysis
B. Product Components analysis
C. Principle Components analysis
D. Pre Complex analysis
Ans : A