This Sheet Is For 1 Mark Questions S.R No

This document is a sheet containing 131 multiple choice questions for a 1 mark exam. The questions cover a range of machine learning topics including supervised learning, unsupervised learning, reinforcement learning, linear regression, Naive Bayes classification, support vector machines, and feature selection.

Uploaded by

Sohel Sutriya


This sheet is for 1 Mark questions

Question
Write down question

In reinforcement learning, if the feedback is negative, it is defined as ____.

According to ____, it’s a key success factor for the survival and evolution of all species.

How can you avoid overfitting?

What are the popular algorithms of Machine Learning?

What is ‘Training set’?

Common deep learning applications include____

What is the function of ‘Supervised Learning’?

Common unsupervised applications include ____

Reinforcement learning is particularly efficient when______________.

If there is only a discrete number of possible outcomes (called categories), the process becomes a ______.

Which of the following are supervised learning applications?

During the last few years, many ______ algorithms have been applied to deep
neural networks to learn the best policy for playing Atari video games and to teach an agent how to
associate the right action with an input representing the state.

Which of the following sentences is correct?

What is ‘Overfitting’ in Machine learning?

What is ‘Test set’?

________is much more difficult because it's necessary to determine a supervised strategy to train a
model for each feature and, finally, to predict their value

It's possible to use a different placeholder through the parameter _______.

If you need a more powerful scaling feature, with a superior control on outliers and the possibility to
select a quantile range, there's also the class________.
scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply________to
each element of a dataset
There are also many univariate methods that can be used in order to select the best features according to
specific criteria based on________.
Which of the following selects only a subset of features belonging to a certain percentile

________performs a PCA with non-linearly separable data sets.

A feature F1 can take the values A, B, C, D, E, and F and represents the grades of students from a college.
Which of the following statements is true in this case?

What would you do in PCA to get the same projection as SVD?

What is PCA, KPCA and ICA used for?

Can a model trained for item based similarity also choose from a given set of items?

What are common feature selection methods in regression task?

The parameter______ allows specifying the percentage of elements to put into the test/training set

In many classification problems, the target ______ is made up of categorical labels which cannot
immediately be processed by any algorithm.
_______adopts a dictionary-oriented approach, associating to each category label a progressive integer
number.

If a linear regression model fits perfectly, i.e., train error is zero, then _____________________

Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE / MSE / MAE

How many coefficients do you need to estimate in a simple linear regression model (One independent variable)?
In a simple linear regression model (One independent variable), If we change the input variable by 1 unit. How much outpu

Function used for linear regression in R is __________

In syntax of linear model lm(formula,data,..), data refers to ______
In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to __________
Linear Regression is a supervised machine learning algorithm.
Is it possible to design a linear regression algorithm using a neural network?

Which of the following methods do we use to find the best fit line for data in Linear Regression?

Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?

Which of the following is true about Residuals ?

Overfitting is more likely when you have a huge amount of data to train?

Which of the following statements is true about outliers in linear regression?

Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and
you found that there is a relationship between them. Which of the following conclusions do you make
about this situation?
Naive Bayes classifiers are a collection of ____ algorithms
Naive Bayes classification is ____ learning
Features being classified are independent of each other in the Naïve Bayes classifier
Features being classified are ____ of each other in the Naïve Bayes classifier

In given image, P(H|E) is__________probability.

In given image, P(H) is__________probability.
Conditional probability is a measure of the probability of an event given that another event has already occurred.
Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
Bernoulli Naïve Bayes Classifier is ___________distribution
Multinomial Naïve Bayes Classifier is ___________distribution
Gaussian Naïve Bayes Classifier is ___________distribution
Binarize parameter in BernoulliNB scikit sets threshold for binarizing of sample features.
Gaussian distribution when plotted, gives a bell shaped curve which is symmetric about the _______ of the feature values.
SVMs directly give us the posterior probabilities P(y = 1|x) and P(y = −1|x)
Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian.
Solving a non linear separation problem with a hard margin Kernelized SVM (Gaussian RBF Kernel) might lead to overfitting
SVM is a ------------------ algorithm
SVM is a ------------------ learning
The linear SVM classifier works by drawing a straight line between two classes

Which of the following functions provides unsupervised prediction?

Which of the following is characteristic of the best machine learning method?
What are the different algorithm techniques in Machine Learning?

What is the standard approach to supervised learning?

Which of the following is not Machine Learning?

What is Model Selection in Machine Learning?

Which are two techniques of Machine Learning ?

Even if there are no actual supervisors ________ learning is also based on feedback provided
by the environment

What does learning exactly mean?

When it is necessary to allow the model to develop a generalization ability and avoid a common
problem called______.

Techniques involve the usage of both labeled and unlabeled data is called___.

In reinforcement learning if feedback is negative one it is defined as____.

According to____ , it’s a key success factor for the survival and evolution of all species.
A supervised scenario is characterized by the concept of a _____.
Overlearning is caused by an excessive ______.

Which of the following is an example of a deterministic algorithm?

Which of the following models includes a backwards elimination feature selection routine?
Can we extract knowledge without applying feature selection?
While using feature selection on the data, does the number of features decrease?
Which of the following are models for feature extraction?

_____ provides some built-in datasets that can be used for testing purposes.

While using _____ all labels are turned into sequential numbers.
_______produce sparse matrices of real numbers that can be fed into any machine learning
model.
scikit-learn offers the class______, which is responsible for filling the holes using a strategy
based on the mean, median, or frequency
Which of the following scales data by removing elements that don't belong to a given range or
by considering a maximum absolute value?
scikit-learn also provides a class for per-sample normalization,_____
______dataset with many features contains information proportional to the independence of all
features and their variance.
In order to assess how much information is brought by each component, and the correlation
among them, a useful tool is the_____.
The_____ parameter can assume different values which determine how the data matrix is
initially processed.

______allows exploiting the natural sparsity of data while extracting principal components.

Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?

Which of the following is true about Residuals ?

Overfitting is more likely when you have a huge amount of data to train?

Which of the following statements is true about outliers in linear regression?

Let’s say a “Linear regression” model perfectly fits the training data (train error is zero). Now, which
of the following statements is true?
In a linear regression problem, we are using “R-squared” to measure goodness-of-fit. We add a feature
to the linear regression model and retrain the same model. Which of the following options is true?

Which one of the following is true about heteroskedasticity?

Which of the following assumptions do we make while deriving linear regression parameters? 1. The
true relationship between dependent y and predictor x is linear 2. The model errors are statistically
independent 3. The errors are normally distributed with a 0 mean and constant standard deviation 4. The
predictor x is non-stochastic and is measured error-free
To test the linear relationship of y (dependent) and x (independent) continuous variables, which of the
following plots is best suited?

Which of the following steps / assumptions in regression modeling impacts the trade-off between
under-fitting and over-fitting the most?

Can we calculate the skewness of variables based on mean and median?

Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature
selection?

Which of the following statement(s) can be true post adding a variable in a linear regression model? 1.
R-Squared and Adjusted R-squared both increase 2. R-Squared increases and Adjusted R-squared
decreases 3. R-Squared decreases and Adjusted R-squared decreases 4. R-Squared decreases and
Adjusted R-squared increases
How many coefficients do you need to estimate in a simple linear regression model (One independent
variable)?
In given image, P(H) is__________probability.
Conditional probability is a measure of the probability of an event given that another event has
already occurred.

Gaussian distribution when plotted, gives a bell shaped curve which is symmetric about the _______ of the feature
SVMs directly give us the posterior probabilities P(y = 1|x) and P(y = −1|x)
SVM is a ------------------ algorithm
What is/are true about kernel in SVM? 1. Kernel function maps low dimensional data to high
dimensional space 2. It’s a similarity function
Suppose you are building an SVM model on data X. The data X can be error prone, which means
that you should not trust any specific data point too much. Now think that you want to build an
SVM model which has a quadratic kernel function of polynomial degree 2 that uses slack
variable C as one of its hyperparameters. What would happen when you use a very small C
(C~0)?

The cost parameter in the SVM means:

Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions
that might be related to the event.
Bernoulli Naïve Bayes Classifier is ___________distribution
If you remove the non-red circled points from the data, the decision boundary will change?

How do you handle missing or corrupted data in a dataset?

Binarize parameter in BernoulliNB scikit sets threshold for binarizing of sample features.

Which of the following statements about Naive Bayes is incorrect?

The SVM’s are less effective when:

Naive Bayes classification is ____ learning

Features being classified are independent of each other in the Naïve Bayes classifier
Features being classified are ____ of each other in the Naïve Bayes classifier

Bayes' theorem is given by P(H|E) = P(E|H) · P(H) / P(E), where
1. P(H) is the probability of hypothesis H being true.
2. P(E) is the probability of the evidence (regardless of the hypothesis).
3. P(E|H) is the probability of the evidence given that the hypothesis is true.
4. P(H|E) is the probability of the hypothesis given that the evidence is there.
Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian.
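
The Bayes' theorem terms named in the questions above (prior P(H), evidence P(E), likelihood P(E|H), posterior P(H|E)) can be checked numerically. The following is a minimal sketch and is not part of the question sheet: the function name `posterior` and the example probabilities are assumptions chosen only for illustration, and P(E) is expanded with the law of total probability over H and not-H.

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) = P(E|H) * P(H) / P(E) for a binary hypothesis H."""
    # P(E) via the law of total probability over H and not-H.
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / evidence


# Hypothetical numbers: prior P(H) = 0.01, P(E|H) = 0.9, P(E|not H) = 0.05.
p_h_given_e = posterior(0.01, 0.9, 0.05)
print(round(p_h_given_e, 4))
```

With these assumed inputs the posterior is 2/13, far above the prior of 0.01, which is the sense in which P(H|E) is "posterior" and P(H) is "prior".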
Image a b c d Correct Answer
img.jpg Option a Option b Option c Option d a/b/c/d
a) Penalty b) Overlearning c) Reward d) None of above A
a) Claude Shannon's theory b) Gini Index c) Darwin’s theory d) None of above C
a) By using a lot of data b) By using inductive machine learning c) By using validation only d) None of above A
a) Decision Trees and Neural Networks (back propagation) b) Probabilistic networks and Nearest Neighbor c) Support vector machines d) All D
a) Training set is used to test the accuracy of the hypotheses generated by the learner. b) A set of data is used to discover the potentially predictive relationship. c) Both A & B d) None of above B
a) Image classification, Real-time visual tracking b) Autonomous car driving, Logistic optimization c) Bioinformatics, Speech recognition d) All above D
a) Classifications, Predict time series, Annotate strings b) Speech recognition, Regression c) Both A & B d) None of above C
a) Object segmentation b) Similarity detection c) Automatic labeling d) All above D
a) the environment is not completely deterministic b) it's often very dynamic c) it's impossible to have a precise error measure d) All above D
a) Regression b) Classification. c) Modelfree d) Categories B
a) Spam detection, Pattern detection, Natural Language Processing b) Image classification, Real-time visual tracking c) Autonomous car driving, Logistic optimization d) Bioinformatics, Speech recognition A

a) Logical b) Classical c) Classification d) None of above D
a) Machine learning relates with the study, design and development of the algorithms that give computers the capability to learn without being explicitly programmed. b) Data mining can be defined as the process in which the unstructured data tries to extract knowledge or unknown interesting patterns. c) Both A & B d) None of the above C
a) When a statistical model describes random error or noise instead of underlying relationship ‘overfitting’ occurs. b) Robots are programmed so that they can perform the task based on data they gather from sensors. c) While involving the process of learning ‘overfitting’ occurs. d) A set of data is used to discover the potentially predictive relationship A
a) Test set is used to test the accuracy of the hypotheses generated by the learner. b) It is a set of data is used to discover the potentially predictive relationship. c) Both A & B d) None of above A
a) Creating sub-model to predict those features b) Using an automatic strategy to input them according to the other known values c) Removing the whole line d) All above B
a) regression b) classification c) random_state d) missing_values D
a) RobustScaler b) DictVectorizer c) LabelBinarizer d) FeatureHasher A
a) max, l0 and l1 norms b) max, l1 and l2 norms c) max, l2 and l3 norms d) max, l3 and l4 norms B
a) F-tests and p-values b) chi-square c) ANOVA d) All above A
a) SelectPercentile b) FeatureHasher c) SelectKBest d) All above A
a) SparsePCA b) KernelPCA c) SVD d) None of the Mentioned B
a) Feature F1 is an example of nominal variable. b) Feature F1 is an example of ordinal variable. c) It doesn’t belong to any of the above category. d) Both of these B
a) Transform data to zero mean b) Transform data to zero median c) Not possible d) None of these A
a) Principal Components Analysis b) Kernel based Principal Component Analysis c) Independent Component Analysis d) All above D
a) YES b) NO A
a) correlation coefficient b) Greedy algorithms c) All above d) None of these C
a) test_size b) training_size c) All above d) None of these C
a) random_state b) dataset c) test_size d) All above B
a) LabelEncoder class b) LabelBinarizer class c) DictVectorizer d) FeatureHasher A

a) Test error is also b) Test error is noc) Couldn’t co d) Test error i c

Adjusted R Squarediii) F Statist a) ii and iv b) i and ii c) ii, iii and iv d) i, ii, iii and i d

dependent variable)? a) 1 b) 2 c) 3 d) 4 b
ble by 1 unit. How much output a) by 1 b) no change c) by interceptd) by its slope d

a) lm(formula, data) b) lr(formula, datac) lrm(formula,d) regression.la

a) Matrix b) Vector c) Array d) List b
a) (X-intercept, Slopeb) (Slope, X-Intercc) (Y-Interceptd) (slope, Y-In c
A) TRUE B) FALSE a
A) TRUE B) FALSE a
A) Least Square Error B) Maximum Likelihood C) Logarithmic Loss D) Both A and B a
A) AUC-ROC B) Accuracy C) Logloss D) Mean-Squared-Error d
A) Lower is better B) Higher is better C) A or B depend on the situation D) None of these a
A) TRUE B) FALSE b
A) Linear regression is sensitive to outliers B) Linear regression is not sensitive to outliers C) Can’t say D) None of these a
A) Since there is a relationship, our model is not good B) Since there is a relationship, our model is good C) Can’t say D) None of these a
Classification Clustering Regression All a
Supervised Unsupervised Both None a
False 1 b
Independent Dependent Partial Dependent None a

bayes.jpg True 0 a
bayes.jpg Posterior Prior a
bayes.jpg Posterior Prior b
has already occurred. True 0 a
ns that might be related to the e True 0 a
Continuous Discrete Binary c
Continuous Discrete Binary b
Continuous Discrete Binary a
True 0 a
_______ of the feature values. Mean Variance Discrete Random a
True 0 b
True 0 a
ernel) might lead to overfitting True 0 a
Classification Clustering Regression All a
Supervised Unsupervised Both None a
True 0 a
-- a) cl_forecast b) cl_nowcast c) cl_precast d) None of the Mentioned D
-- fast accuracy scalable All above D
-- a) Supervised Learning and Semi-supervised Learning b) Unsupervised Learning and Transduction c) Both A & B d) None of the Mentioned C
-- a) split the set of example into the training set and the test b) group the set of example into the training set and the test c) a set of observed instances tries to induce a general rule d) learns programs from data A
-- a) Artificial Intelligence b) Rule based inference c) Both A & B d) None of the Mentioned B
-- a) The process of selecting models among different mathematical models, which are used to describe the same data set b) when a statistical model describes random error or noise instead of underlying relationship c) Find interesting directions in data and find novel observations / database cleaning d) All above A
-- a) Genetic Programming and Inductive Learning b) Speech recognition and Regression c) Both A & B d) None of the Mentioned A
-- a) Supervised b) Reinforcement c) Unsupervised d) None of the above B
-- a) Robots are programmed so that they can perform the task based on data they gather from sensors. b) A set of data is used to discover the potentially predictive relationship. c) Learning is the ability to change according to external stimuli and remembering most of all previous experiences. d) It is a set of data is used to discover the potentially predictive relationship. C
-- a) Overfitting b) Overlearning c) Classification d) Regression A
-- a) Supervised b) Semi-supervised c) Unsupervised d) None of the above B
-- a) Penalty b) Overlearning c) Reward d) None of above A
-- a) Claude Shannon's theory b) Gini Index c) Darwin’s theory d) None of above C
-- a) Programmer b) Teacher c) Author d) Farmer B
Reinforceme
-- Capacity Regression Accuracy A
nt
-- a) PCA b) K-Means c) None of the above A
-- MCV MARS MCRS All above B
-- YES NO A
-- NO YES B
-- a) regression b) classification c) None of the above C
-- a) scikit-learn b) classification c) regression d) None of the above A
-- a) LabelEncoder class b) LabelBinarizer class c) DictVectorizer d) FeatureHasher A
-- a) DictVectorizer b) FeatureHasher c) Both A & B d) None of the Mentioned C
-- a) LabelEncoder b) LabelBinarizer c) DictVectorizer d) Imputer D
-- a) MinMaxScaler b) MaxAbsScaler c) Both A & B d) None of the Mentioned C
-- a) Normalizer b) Imputer c) Classifier d) All above A
-- a) normalized b) unnormalized c) Both A & B d) None of the Mentioned B
-- a) Concurrent matrix b) Convergence matrix c) Supportive matrix d) Covariance matrix D

-- a) run b) start c) init d) stop C
-- a) SparsePCA b) KernelPCA c) SVD d) init parameter A
-- a) AUC-ROC b) Accuracy c) Logloss d) Mean-Squared-Error D
-- a) Lower is better b) Higher is better c) A or B depend on the situation d) None of these A
-- 1 0 B

-- a) Linear regression is sensitive to outliers b) Linear regression is not sensitive to outliers c) Can’t say d) None of these A
-- a) Since there is a relationship, our model is not good b) Since there is a relationship, our model is good c) Can’t say d) None of these A
-- a) You will always have test error zero b) You can not have test error zero c) None of the above C
-- a) If R Squared increases, this variable is significant. b) If R Squared decreases, this variable is not significant. c) Individually R squared cannot tell about variable importance. We can’t say anything about it right now. d) None of these. C
-- a) Linear Regression with varying error terms b) Linear Regression with constant error terms c) Linear Regression with zero error terms d) None of these A

-- 1,2 and 3. 1,3 and 4. 1 and 3. All of above. D

-- Scatter plot Barchart Histograms None of these A

-- a) The polynomial degree b) Whether we learn the weights by matrix inversion or gradient descent c) The use of a constant-term A
-- 1 0 B
-- a) Ridge regression uses subset selection of features b) Lasso regression uses subset selection of features c) Both use subset selection of features d) None of above B
-- a) 1 and 2 b) 1 and 3 c) 2 and 4 d) None of the above A

-- 1 2 Can’t Say B

bayes.jpg Posterior Prior B

-- True 0 A

-- Mean Variance Discrete Random A

-- True 0 B
-- Classification Clustering Regression All A
-- a) 1 b) 2 c) 1 and 2 d) None of these C
-- a) Misclassification would happen b) Data will be correctly classified c) Can’t say d) None of these A
-- a) The number of cross-validations to be made b) The kernel to be used c) The tradeoff between misclassification and simplicity of the model d) None of the above C

-- True 0 A
-- Continuous Discrete Binary C
svm.jpg 1 0 B
-- a. Drop missing rows or columns b. Replace missing values with mean/median/mode c. Assign a unique category to missing values d. All of the above D
-- True 0 A

-- A. Attributes are equally important. B. Attributes are statistically dependent of one another given the class value. C. Attributes are statistically independent of one another given the class value. D. Attributes can be nominal or numeric B
-- a) The data is linearly separable b) The data is clean and ready to use c) The data is noisy and contains overlapping points C
-- Supervised Unsupervised Both None A
-- False 1 B
-- a) Independent b) Dependent c) Partial Dependent d) None A

bayes.jpg True 0 A

-- True 0 A
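
Several questions in the sheet above refer to scikit-learn's `LabelEncoder` (labels turned into sequential integers) and `Normalizer` (per-sample max, l1 and l2 norms). The following is a dependency-free sketch of those two ideas only; the helper names `label_encode` and `normalize_rows` are hypothetical and this is not scikit-learn's implementation, just an illustration of the behavior the questions describe.

```python
import math


def label_encode(labels):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


def normalize_rows(rows, norm="l2"):
    """Scale every sample (row) so that its chosen norm equals 1."""
    scaled = []
    for row in rows:
        if norm == "l1":
            scale = sum(abs(v) for v in row)
        elif norm == "l2":
            scale = math.sqrt(sum(v * v for v in row))
        elif norm == "max":
            scale = max(abs(v) for v in row)
        else:
            raise ValueError(f"unknown norm: {norm}")
        # Leave all-zero rows untouched to avoid dividing by zero.
        scaled.append([v / scale if scale else v for v in row])
    return scaled


codes, classes = label_encode(["B", "A", "C", "A"])
l2_rows = normalize_rows([[3.0, 4.0]], norm="l2")
max_rows = normalize_rows([[3.0, 4.0]], norm="max")
l1_rows = normalize_rows([[3.0, 4.0]], norm="l1")
print(codes, classes, l2_rows, max_rows, l1_rows)
```

Note that normalization here works per sample (row), which is what distinguishes it from feature-wise scalers such as `MinMaxScaler` or `RobustScaler` that operate per column.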
This sheet is for 2 Mark questions
S.r No Question
e.g 1 Write down question
1 A supervised scenario is characterized by the concept of a _____.
2 Overlearning is caused by an excessive ______.

3 If there is only a discrete number of possible outcomes called _____.

4 What is the standard approach to supervised learning?

5 Some people are using the term ___ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.

6 The term _____ can be freely used, but with the same meaning adopted in physics or system theory.

7 Which are two techniques of Machine Learning?

8 Even if there are no actual supervisors, ________ learning is also based on feedback provided by the environment

9 Common deep learning applications / problems can also be solved using____

10 Identify the various approaches for machine learning.

11 What is the function of ‘Unsupervised Learning’?

12 What are the two methods used for the calibration in Supervised Learning?

13 What is the standard approach to supervised learning?

14 Which of the following is not Machine Learning?

15 What is Model Selection in Machine Learning?

16 _____ provides some built-in datasets that can be used for testing purposes.

17 While using _____ all labels are turned into sequential numbers.
18 _______produce sparse matrices of real numbers that can be fed into any machine learning model.

19 scikit-learn offers the class______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency
20 Which of the following scales data by removing elements that don't belong to a given range or by considering a maximum absolute value?

21 Which of the following models includes a backwards elimination feature selection routine?

22 Can we extract knowledge without applying feature selection?

23 While using feature selection on the data, does the number of features decrease?

24 Which of the following are models for feature extraction?

25 scikit-learn also provides a class for per-sample normalization,_____

26 ______dataset with many features contains information proportional to the independence of all features and their variance.
27 In order to assess how much information is brought by each component, and the correlation among them, a useful tool is the_____.
28 The_____ parameter can assume different values which determine how the data matrix is initially processed.
29 ______allows exploiting the natural sparsity of data while extracting principal components.

30 Which of the following is an example of a deterministic algorithm?

31 Let’s say, a “Linear regression” model perfectly fits the training data (train error is zero). Now,
Which of the following statement is true?

32 In a linear regression problem, we are using “R-squared” to measure goodness-of-fit. We add a feature to the linear regression model and retrain the same model. Which of the following options is true?

33 Which one of the following is true about heteroskedasticity?

34 Which of the following assumptions do we make while deriving linear regression parameters? 1. The true relationship between dependent y and predictor x is linear 2. The model errors are statistically independent 3. The errors are normally distributed with a 0 mean and constant standard deviation 4. The predictor x is non-stochastic and is measured error-free

35 To test the linear relationship of y (dependent) and x (independent) continuous variables, which of the following plots is best suited?

36 Generally, which of the following method(s) is used for predicting a continuous dependent variable? 1. Linear Regression 2. Logistic Regression
37 Suppose you are training a linear regression model. Now consider these points. 1. Overfitting is more likely if we have less data 2. Overfitting is more likely when the hypothesis space is small. Which of the above statement(s) are correct?

38 Suppose we fit “Lasso Regression” to a data set, which has 100 features (X1,X2…X100). Now, we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter. Now, which of the following options will be correct?

39 Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature selection?

40 Which of the following statement(s) can be true post adding a variable in a linear regression model? 1. R-Squared and Adjusted R-squared both increase 2. R-Squared increases and Adjusted R-squared decreases 3. R-Squared decreases and Adjusted R-squared decreases 4. R-Squared decreases and Adjusted R-squared increases

41 We can also compute the coefficients of linear regression with the help of an analytical method called “Normal Equation”. Which of the following is/are true about “Normal Equation”? 1. We don’t have to choose the learning rate 2. It becomes slow when the number of features is very large 3. No need to iterate

42 How many coefficients do you need to estimate in a simple linear regression model (One
independent variable)?
43 If two variables are correlated, is it necessary that they have a linear relationship?
44 Correlated variables can have zero correlation coefficient. True or False?

45 Which of the following options is true regarding “Regression” and “Correlation”? Note: y is the dependent variable and x is the independent variable.

46 What is/are true about kernel in SVM? 1. Kernel function maps low dimensional data to high dimensional space 2. It’s a similarity function
47 Suppose you are building a SVM model on data X. The data X can be error prone which means that you should no
48 Suppose you are using a Linear SVM classifier with 2 class classification problem. Now you have be
49 If you remove the non-red circled points from the data, the decision boundary will change?

50 When the C parameter is set to infinite, which of the following holds true?
51 Suppose you are building a SVM model on data X. The data X can be error prone which means that you should no

52 SVM can solve linear and non-linear problems

53 The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space(N — the

54 Hyperplanes are _____________boundaries that help classify the data points.
55 The _____of the hyperplane depends upon the number of features.
56 Hyperplanes are decision boundaries that help classify the data points.
57 SVM algorithms use a set of mathematical functions that are defined as the kernel.
58 In SVM, Kernel function is used to map a lower dimensional data into a higher dimensional data.
59 In SVR we try to fit the error within a certain threshold.

60 When the C parameter is set to infinite, which of the following holds true?

61 How do you handle missing or corrupted data in a dataset?

62 What is the purpose of performing cross-validation?

63 Which of the following is true about Naive Bayes?

64 Which of the following statements about Naive Bayes is incorrect?

65 Which of the following is not supervised learning?

66 How can you avoid overfitting ?

67 What are the popular algorithms of Machine Learning?

68 What is ‘Training set’?

69 Identify the various approaches for machine learning.

70 what is the function of ‘Unsupervised Learning’?

71 What are the two methods used for the calibration in Supervised Learning?

72 ______can be adopted when it's necessary to categorize a large amount of data with a few complete examples or when there's the need to impose some constraints to a clustering algorithm.

73 In reinforcement learning, this feedback is usually called ___.

74 In the last decade, many researchers started training bigger and bigger models, built with several different layers; that's why this approach is called_____.
75 There's a growing interest in pattern recognition and associative memories whose structure and functioning are similar to what happens in the neocortex. Such an approach also allows simpler algorithms called _____

76 ______ showed better performance than other approaches, even without a context-based model

77 Common deep learning applications / problems can also be solved using____

78 Some people are using the term ___ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.
79 The term _____ can be freely used, but with the same meaning adopted in physics or system theory.

80 If there is only a discrete number of possible outcomes called _____.

81 A feature F1 can take the values A, B, C, D, E, and F and represents the grades of students from a college. Which of the following statements is true in this case?

82 What would you do in PCA to get the same projection as SVD?

83 What is PCA, KPCA and ICA used for?

84 Can a model trained for item based similarity also choose from a given set of items?
85 What are common feature selection methods in regression task?

86 The parameter______ allows specifying the percentage of elements to put into the test/training set

87 In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm.
88 _______ adopts a dictionary-oriented approach, associating to each category label a progressive integer number.

89 ________ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value

90 It's possible to use a different placeholder through the parameter _______.

91 If you need a more powerful scaling feature, with a superior control on outliers and the possibility to select a quantile range, there's also the class ________.
92 scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ________ to each element of a dataset
93 There are also many univariate methods that can be used in order to select the best features according to specific criteria based on ________.
94 Which of the following selects only a subset of features belonging to a certain percentile

95 ________performs a PCA with non-linearly separable data sets.

96 If two variables are correlated, is it necessary that they have a linear relationship?
97 Correlated variables can have zero correlation coefficient. True or False?
98 Suppose we fit “Lasso Regression” to a data set, which has 100 features (X1,X2…X100). Now, we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter. Now, which of the following options will be correct?

99 If Linear regression model fits perfectly, i.e., train error is zero, then _____________________

100 Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statistics iv) RMSE / MSE / MAE
101 In syntax of linear model lm(formula,data,..), data refers to ______
102 Linear Regression is a supervised machine learning algorithm.
103 It is possible to design a Linear regression algorithm using a neural network?

104 Which of the following methods do we use to find the best fit line for data in Linear Regression?
105 Suppose you are training a linear regression model. Now consider these points. 1. Overfitting is more likely if we have less data 2. Overfitting is more likely when the hypothesis space is small. Which of the above statement(s) are correct?
106 We can also compute the coefficient of linear regression with the help of an analytical method called “Normal Equation”. Which of the following is/are true about “Normal Equation”? 1. We don’t have to choose the learning rate 2. It becomes slow when number of features is very large 3. No need to iterate

107 Which of the following option is true regarding “Regression” and “Correlation” ? Note: y is dependent variable and x is independent variable.

108 In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
109 Generally, which of the following method(s) is used for predicting continuous dependent variable? 1. Linear Regression 2. Logistic Regression
110 How many coefficients do you need to estimate in a simple linear regression model (one independent variable)?

111 Suppose you are building a SVM model on data X. The data X can be error prone which means that you should not trust any specific data point too much. Now think that you want to build a SVM model which has quadratic kernel function of polynomial degree 2 that uses Slack variable C as one of its hyper parameters. What would happen when you use very large value of C (C->infinity)?
112 SVM can solve linear and non-linear problems
113 The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N is the number of features) that distinctly classifies the data points.
114 Hyperplanes are _____________boundaries that help classify the data points.

115 When the C parameter is set to infinite, which of the following holds true?

116 SVM is a ------------------ learning

117 The linear SVM classifier works by drawing a straight line between two classes
118 In a real problem, you should check to see if the SVM is separable and then include slack variables if it is not separable.

119 Which of the following are real world applications of the SVM?
120 The _____of the hyperplane depends upon the number of features.
121 Hyperplanes are decision boundaries that help classify the data points.
122 SVM algorithms use a set of mathematical functions that are defined as the kernel.
123 Naive Bayes classifiers are a collection of ------------------ algorithms
124 In given image, P(H|E) is__________probability.
125 Solving a non linear separation problem with a hard margin Kernelized SVM (Gaussian RBF Kernel) might lead to overfitting
126 100 people are at party. Given data gives information about how many wear pink or not, and if a man or not. Imagine a pink wearing guest leaves, was it a man?
127 For the given weather data, Calculate probability of playing
128 In SVM, Kernel function is used to map a lower dimensional data into a higher dimensional data.
129 In SVR we try to fit the error within a certain threshold.

130 When the C parameter is set to infinite, which of the following holds true?
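
Questions 96 and 97 above turn on the fact that the correlation coefficient only measures linear association. The following is a minimal sketch, assuming the Pearson coefficient is the one meant; the helper name `pearson` and the toy data are illustrative and not part of the original sheet.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data (an assumption for illustration): x is symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_quadratic = [x ** 2 for x in xs]    # y depends on x, but not linearly
ys_linear = [2 * x + 1 for x in xs]    # y is an exact linear function of x

r_quadratic = pearson(xs, ys_quadratic)
r_linear = pearson(xs, ys_linear)
print(r_quadratic, r_linear)
```

Here y = x^2 is fully determined by x, yet its linear correlation with x vanishes because the positive and negative halves cancel; the linear case gives a coefficient of 1.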
Image a b c
img.jpg Option a Option b Option c
Programmer Teacher Author
Capacity Regression Reinforcement

Modelfree Categories Prediction

a set of observed
split the set of example
group the set of example into the instances tries to
into the training set and
training set and the test induce a general
the test
rule

Inference Interference Accuracy

Accuracy Cluster Regression

Genetic Programming
Speech recognition and
and Both A & B
Regression
Inductive Learning

Supervised Reinforcement Unsupervised

Real-time visual object

Classic approaches Automatic labeling
identification
Concept Vs Classification Learning Symbolic Vs Statistical Learning Inductive Vs Analytical Learning
Find clusters of the data and find low-dimensional representations of the data Find interesting directions in data and find novel observations/ database cleaning Interesting coordinates and correlations
Platt Calibration and Statistics and
Isotonic Regression Informal Retrieval
a set of observed
split the set of example
group the set of example into the instances tries to
into the training set and
training set and the test induce a general
the test
rule
Artificial
The Intelligence
process of Rule based inference Both A & B
selecting models Find interesting
when a statistical model
among different directions in data
describes random error or noise
mathematical models, and find novel
instead of underlying
which are used to observations/
relationship
describe the same data database cleaning
scikit-learn
set classification regression

LabelEncoder class LabelBinarizer class DictVectorizer

DictVectorizer FeatureHasher Both A & B

LabelEncoder LabelBinarizer DictVectorizer

MinMaxScaler MaxAbsScaler Both A & B

MCV MARS MCRS

YES NO
NO YES

regression classification None of the above

Normalizer Imputer Classifier

normalized unnormalized Both A & B

Concurrent matrix Convergence matrix Supportive matrix

run start
init
SparsePCA KernelPCA SVD
None of the
PCA K-Means
above
A. You will always have B. You can not have test error C. None of the
test error zero zero above

C. Individually R
squared cannot
tell about variable
A. If R Squared importance. We
increases, this variable B. If R Squared decreases, this can’t say anything
is significant. variable is not significant. about it right now.

A. Linear Regression C. Linear

with varying error B. Linear Regression with Regression with
terms constant error terms zero error terms

A. 1,2 and 3. B. 1,3 and 4. C. 1 and 3.

A. Scatter plot B. Barchart C. Histograms

A. 1 and 2 B. only 1 C. only 2

C. 1 is True and 2
A. Both are False B. 1 is False and 2 is True is False

A. It is more likely for

X1 to be excluded from B. It is more likely for X1 to be
the model included in the model C. Can’t say

A. Ridge regression C. Both use subset

uses subset selection of B. Lasso regression uses subset selection of
features selection of features features

A. 1 and 2 B. 1 and 3 C. 2 and 4

A. 1 and 2 B. 1 and 3. C. 2 and 3.

A. 1 B. 2 C. Can’t Say
A. Yes B. No
A. True B. False

C. The relationship
is not symmetric
between x and y in
case of correlation
A. The relationship is B. The relationship is not but in case of
symmetric between x symmetric between x and y in regression it is
and y in both. both. symmetric.

1 2 1 and 2

Misclassification would Data will be correctly classified Can’t say
svm.jpg yes no
svm.jpg 1 0

The optimal
hyperplane if exists,
will be the one that
completely separates The soft-margin classifier will
the data separate the data None of the above
We can still classify data correctly for given setting of hyper parameter C We can not classify data correctly for given setting of hyper parameter C Can’t Say

1 0

1 0

usual decision parallel

dimension classification reduction

1 0

The optimal
hyperplane if exists,
will be the one that
completely separates The soft-margin classifier will
the data separate the data None of the above
c. Assign a unique
a. Drop missing rows or b. Replace missing values with category to
columns mean/median/mode missing values

a. To assess the b. To judge how the trained

predictive performance model performs outside the
of the models sample on test data c. Both A and B

a. Assumes that all the

features in a dataset are b. Assumes that all the features
equally important in a dataset are independent c. Both A and B

C.     Attributes are
statistically
B.     Attributes are statistically independent of one
A.     Attributes are dependent of one another given another given the
equally important. the class value. class value.
  PCA   Decision Tree   Naive Bayesian
-- By using a lot of data By using inductive machine learning By using validation only
-- Decision Trees and Neural Networks (back propagation) Probabilistic networks and Nearest Neighbor Support vector machines

-- Training set is used to test the accuracy of the hypotheses generated by the learner. A set of data is used to discover the potentially predictive relationship. Both A & B

-- Concept Vs Classification Learning Symbolic Vs Statistical Learning Inductive Vs Analytical Learning

-- Find clusters of the data and find low-dimensional representations of the data Find interesting directions in data and find novel observations/ database cleaning Interesting coordinates and correlations
-- Platt Calibration and Isotonic Regression Statistics and Informal Retrieval

-- Supervised Semi-supervised Reinforcement

-- Overfitting Overlearning Reward

-- Deep learning Machine learning Reinforcement learning

-- Regression Accuracy Modelfree

-- Machine learning Deep learning Reinforcement learning

-- Real-time visual object identification Classic approaches Automatic labeling

-- Inference Interference Accuracy

-- Accuracy Cluster Regression

-- Modelfree Categories Prediction

-- Feature F1 is an example of nominal variable. Feature F1 is an example of ordinal variable. It doesn’t belong to any of the above category.
-- Transform data to zero mean Transform data to zero median Not possible
-- Principal Components Analysis Kernel based Principal Component Analysis Independent Component Analysis
-- YES NO
-- correlation coefficient Greedy algorithms All above

-- test_size training_size All above

-- random_state dataset test_size

-- LabelEncoder class LabelBinarizer class DictVectorizer

-- Removing the whole line Creating sub-model to predict those features Using an automatic strategy to input them according to the other known values

-- regression classification random_state

-- RobustScaler DictVectorizer LabelBinarizer

-- max, l0 and l1 norms max, l1 and l2 norms max, l2 and l3 norms

-- F-tests and p-values chi-square ANOVA

-- SelectPercentile FeatureHasher SelectKBest

-- SparsePCA KernelPCA SVD

-- Yes No
-- 1 0

-- It is more likely for X1 to be excluded from the model It is more likely for X1 to be included in the model Can’t say

-- Test error is also always zero Test error is non zero Couldn’t comment on Test error

-- ii and iv i and ii ii, iii and iv

-- Matrix Vector Array

-- 1 0
-- 1 0

-- Least Square Error Maximum Likelihood Logarithmic Loss

-- Both are False 1 is False and 2 is True 1 is True and 2 is False

-- 1 and 2 1 and 3. 2 and 3.

-- The relationship is symmetric between x and y in both. The relationship is not symmetric between x and y in both. The relationship is not symmetric between x and y in case of correlation but in case of regression it is symmetric.

-- by 1 no change by intercept

-- 1 and 2 only 1 only 2

-- 1 2 3

-- We can still classify data correctly for given setting of hyper parameter C We can not classify data correctly for given setting of hyper parameter C Can’t Say

-- 1 0
-- 1 0
-- usual decision parallel
-- The optimal hyperplane if exists, will be the one that completely separates the data The soft-margin classifier will separate the data None of the above
-- Supervised Unsupervised Both
-- True 0
-- 1 0

-- Text and Hypertext Categorization Image Classification Clustering of News Articles
-- dimension classification reduction
-- 1 0
-- 1 0
-- Classification Clustering Regression
bayes.jpg Posterior Prior
True 0

man.jpg 1 0
weather data.jpg 0.4 0.64 0.29
-- 1 0
-- 1 0
-- The optimal hyperplane if exists, will be the one that completely separates the data The soft-margin classifier will separate the data None of the above
d Correct Answer
Option d a/b/c/d
Farmer B
Accuracy A

None of above B

learns programs from data A

None of above A

Prediction D

None of the Mentioned A

None of the above B
Bio-inspired adaptive systems B
All above D

All D

learns programs from data A

None of the Mentioned B

All above A

None of the above A

FeatureHasher A
None of the Mentioned C

Imputer D

None of the Mentioned C

All above B

A
B

All above A
None of the Mentioned B
Covariance matrix D

stop C

init parameter A

D. None of
these. c

D. None of
these a

D. All of above. d
D. None of
these a
D. None of
these. b
D. Both are
True c

D. None of
these b

D. None of
above b

D. None of the
above a

D. 1,2 and 3. d

b
b
a

D. The
relationship is
symmetric
between x and
y in case of
correlation but
in case of
regression it is
not symmetric. d

None of these c

None of these a
a
b

a
None of these a

d. All of the
above d

d. None of the
above option c

D. Attributes
can be nominal
or numeric b
Linear regression a

None of above A
All D

None of above B

All above D

All D

Clusters B

None of above C

Unsupervised learning A

Scalable C

Supervised learning B
Bio-inspired adaptive systems B

None of above A

Prediction D

None of above B

Both of these B

None of these A
All above D

A
None of these C

None of these C

All above B

FeatureHasher A

All above B

missing_values D

FeatureHasher A

max, l3 and l4 norms B

All above A

All above A
None of the Mentioned B
B
A

None of these B

Test error is equal to Train error C

i, ii, iii and iv D

List B
A
A

Both A and B A
Both are True C

1,2 and 3. D

The relationship is symmetric between x and y in case of correlation but in case of regression it is not symmetric. D

by its slope D

None of these. B

4 B

None of these A

A
A
B

None A
A
B

All of the above D
A
A
A
All A
A
A

A
0.75 B
A
A

A
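
Questions 106, 108 and 110 of the 1-mark sheet concern the closed-form ("Normal Equation") fit of a simple linear regression. The sketch below is illustrative only: the function name `fit_simple_linear_regression` and the toy data are assumptions, not part of the sheet. It shows that the fit needs no learning rate and no iteration, estimates exactly two coefficients, and that a 1-unit change in the input moves the prediction by the slope.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squares.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data generated from y = 1 + 2x (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_linear_regression(xs, ys)


def predict(x):
    return intercept + slope * x


# Changing the input by 1 unit changes the output by the slope.
delta = predict(3.0) - predict(2.0)
print(intercept, slope, delta)
```

With many features the same idea requires solving a linear system in the feature matrix, which is where the closed form becomes slow.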
This sheet is for 3 Mark questions
S.r No Question Image a
e.g 1 Write down question img.jpg Option a
1 Which of the following is characteristic of best machine learning method ? fast

2 What are the different Algorithm techniques in Machine Learning? Supervised Learning and Semi-supervised Learning
3 ______can be adopted when it's necessary to categorize a large amount of data with a few complete examples or when there's the need to impose some constraints to a clustering algorithm. Supervised
4 In reinforcement learning, this feedback is usually called as___. Overfitting

5 In the last decade, many researchers started training bigger and bigger models, built with several different layers that's why this approach is called_____. Deep learning
6 What does learning exactly mean? Robots are programed so that they can perform the task based on data they gather from sensors.
7 When it is necessary to allow the model to develop a generalization ability and avoid a common problem called______. Overfitting
8 Techniques involve the usage of both labeled and unlabeled data is called___. Supervised

9 there's a growing interest in pattern recognition and associative memories whose structure and functioning are similar to what happens in the neocortex. Such an approach also allows simpler algorithms called _____ Regression
10 ______ showed better performance than other approaches, even without a context-based model Machine learning
11 Which of the following sentence is correct? -- Machine learning relates with the study, design and development of the algorithms that give computers the capability to learn without being explicitly programmed.
12 What is ‘Overfitting’ in Machine learning? -- when a statistical model describes random error or noise instead of underlying relationship ‘overfitting’ occurs.
13 What is ‘Test set’? -- Test set is used to test the accuracy of the hypotheses generated by the learner.
14 What is the function of ‘Supervised Learning’? -- Classifications, Predict time series, Annotate strings
15 Common unsupervised applications include -- Object segmentation
16 Reinforcement learning is particularly efficient when______________. -- the environment is not completely deterministic
17 During the last few years, many ______ algorithms have been applied to deep neural networks to learn the best policy for playing Atari video games and to teach an agent how to associate the right action with an input representing the state. -- Logical
18 Common deep learning applications include____ -- Image classification, Real-time visual tracking
19 If there is only a discrete number of possible outcomes (called categories), the process becomes a______. -- Regression
20 Which of the following are supervised learning applications -- Spam detection, Pattern detection, Natural Language Processing
21 Let’s say, you are working with categorical feature(s) and you have not looked at the distribution of the categorical variable in the test data. You want to apply one hot encoding (OHE) on the categorical feature(s). What challenges you may face if you have applied OHE on a categorical variable of train dataset? -- All categories of categorical variable are not present in the test dataset.
22 Which of the following sentence is FALSE regarding regression? -- It relates inputs to outputs.
23 Which of the following method is used to find the optimal features for cluster analysis -- k-Means
24 scikit-learn also provides functions for creating dummy datasets from scratch: -- make_classification()
25 _____which can accept a NumPy RandomState generator or an integer seed. -- make_blobs
26 In many classification problems, the target dataset is made up of categorical labels which cannot immediately be processed by any algorithm. An encoding is needed and scikit-learn offers at least_____valid options -- 1
27 In which of the following each categorical label is first turned into a positive integer and then transformed into a vector where only one feature is 1 while all the others are 0. -- LabelEncoder class
28 ______is the most drastic one and should be considered only when the dataset is quite large, the number of missing features is high, and any prediction could be risky. -- Removing the whole line
29 It's possible to specify if the scaling process must include both mean and standard deviation using the parameters________. -- with_mean=True/False
30 Which of the following selects the best K high-score features. -- SelectPercentile
31 How does number of observations influence overfitting? Choose the correct answer(s). Note: Rest all parameters are same. 1. In case of fewer observations, it is easy to overfit the data. 2. In case of fewer observations, it is hard to overfit the data. 3. In case of more observations, it is easy to overfit the data. 4. In case of more observations, it is hard to overfit the data. -- 1 and 4

32 Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge regression with tuning parameter lambda to reduce its complexity. Choose the option(s) below which describes relationship of bias and variance with lambda. -- In case of very large lambda; bias is low, variance is low
33 What is/are true about ridge regression? 1. When lambda is 0, model works like linear regression model 2. When lambda is 0, model doesn’t work like linear regression model 3. When lambda goes to infinity, we get very, very small coefficients approaching 0 4. When lambda goes to infinity, we get very, very large coefficients approaching infinity -- 1 and 3
34 Which of the following method(s) does not have closed form solution for its coefficients? -- Ridge regression
35 Function used for linear regression in R is __________ -- lm(formula, data)
36 In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ, (β1, β2) refers to __________ -- (X-intercept, Slope)

37 Suppose that we have N independent variables (X1,X2… Xn) and dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using least square error on this data. You found that the correlation coefficient for one of its variables (say X1) with Y is -0.95. Which of the following is true for X1? -- Relation between the X1 and Y is weak
38 We have been given a dataset with n records in which we have input attribute as x and output attribute as y. Suppose we use a linear regression method to model this data. To test our linear regressor, we split the data in training set and test set randomly. Now we increase the training set size gradually. As the training set size increases, what do you expect will happen with the mean training error? -- Increase
39 We have been given a dataset with n records in which we have input attribute as x and output attribute as y. Suppose we use a linear regression method to model this data. To test our linear regressor, we split the data in training set and test set randomly. What do you expect will happen with bias and variance as you increase the size of training data? -- Bias increases and Variance increases
40 Suppose, you got a situation where you find that your linear regression model is under fitting the data. In such situation which of the following options would you consider? 1. I will add more variables 2. I will start introducing polynomial degree variables 3. I will remove some variables -- 1 and 2

41 Problem: Players will play if weather is sunny. Is this statement correct? weather data.jpg 1
42 Multinomial Naïve Bayes Classifier is ___________distribution Continuous
43 For the given weather data, Calculate probability of not playing weather data.jpg 0.4
44 Suppose you have trained an SVM with linear decision boundary after training SVM, you correctly infer that your SVM model is under fitting. Which of the following option would you more likely to consider iterating SVM next time? -- You want to increase your data points
45 The minimum time complexity for training an SVM is O(n2). According to this fact, what sizes of datasets are not best suited for SVM’s? -- Large datasets
46 The effectiveness of an SVM depends upon: -- Selection of Kernel
47 What do you mean by generalization error in terms of the SVM? -- How far the hyperplane is from the support vectors
48 What do you mean by a hard margin? -- The SVM allows very low error in classification

49 We usually use feature normalization before using the Gaussian kernel in SVM. What is true about feature normalization? 1. We do feature normalization so that new feature will dominate other 2. Some times, feature normalization is not feasible in case of categorical variables 3. Feature normalization always helps when we use Gaussian kernel in SVM -- 1
50 Support vectors are the data points that lie closest to the decision surface. -- 1
51 Which of the following is not supervised learning? -- PCA
52 Suppose you are using RBF kernel in SVM with high Gamma value. What does this signify? -- The model would consider even far away points from hyperplane for modeling
53 Gaussian Naïve Bayes Classifier is ___________distribution -- Continuous
54 If I am using all features of my dataset and I achieve 100% accuracy on my training set, but ~70% on validation set, what should I look out for? -- Underfitting
55 What is the purpose of performing cross-validation? -- a. To assess the predictive performance of the models
56 Which of the following is true about Naive Bayes ? -- a. Assumes that all the features in a dataset are equally important
57 Suppose you are using a Linear SVM classifier with 2 class classification problem. Now you have been given the following data in which some points are circled red that are representing support vectors. If you remove any one of the red points from the data, will the decision boundary change? svm.jpg yes
58 Linear SVMs have no hyperparameters that need to be set by cross-validation -- 1
59 For the given weather data, what is the probability that players will play if weather is sunny weather data.jpg 0.5
60 100 people are at party. Given data gives information about how many wear pink or not, and if a man or not. Imagine a pink wearing guest leaves, what is the probability of being a man man.jpg 0.4
61 Problem: Players will play if weather is sunny. Is this statement correct? weather data.jpg 1
62 For the given weather data, Calculate probability weather data.jpg 0.4
63 For the given weather data, Calculate probability oweather data.jpg 0.4
64 For the given weather data, what is the probabilityweather data.jpg 0.5
65 100 people are at party. Given data gives informatman.jpg 0.4
66 100 people are at party. Given data gives informa man.jpg 1
67 What do you mean by generalization error in terms of the SVM? How far the hyp
68 What do you mean by a hard margin? The SVM allows
69 The minimum time complexity for training an SVM is O(n2). Accordin Large datasets
70 The effectiveness of an SVM depends upon: Selection of Ke
71 Support vectors are the data points that lie closest to the decision 1
72 The SVM’s are less effective when: The data is line
73 Suppose you are using RBF kernel in SVM with high Gamma value. The model would
74 The cost parameter in the SVM means: The number of cross-validations to be made
75 If I am using all features of my dataset and I achieve 100% accurac Underfitting
76 Which of the following are real world applications of the SVM? Text and Hypert
77 Suppose you have trained an SVM with linear decision boundary after training SVM, you correctly infer that your SVM model is under fitting. Which of the following option would you more likely to consider iterating SVM next time? You want to inc
78 We usually use feature normalization before using the Gaussian kernel in SVM. 1What is true about feature
79 Linear SVMs have no hyperparameters that need to be set by cross-vali 1
80 In a real problem, you should check to see if the SVM is separable and the 1
b c d Correct Answer
Option b Option c Option d a/b/c/d
accuracy scalable All above D

Unsupervised Both A & B None of the C

Learning and Mentioned
Transduction
Semi- Reinforcement Clusters B
supervised

Overlearning Reward None of above C

Machine Reinforcement Unsupervised A

learning learning learning

A set of data is Learning is the It is a set of data C

used to ability to is used to
discover the change discover the
potentially
Overlearning according to
Classification potentially
Regression A
predictive external stimuli predictive
relationship. and relationship.
remembering
Semi- Unsupervised None of the B
most of all
supervised above
previous
experiences.
Accuracy Modelfree Scalable C

Deep learning Reinforcement Supervised B

Data mining learning learning
can be defined
as the process
in which the
unstructured
None of the
data tries to Both A & B C
above
extract
Robots
knowledgeare or
programed
unknown so While involving a set of data is
that they can
interesting the process of used to discover
perform
patterns. the learning the potentially A
task based on
‘overfitting’ predictive
data they
occurs. relationship
gather from
sensors.
It is a set of
data is used to
discover the
Both A & B None of above A
potentially
predictive
relationship.

Speech
recognition, Both A & B None of above C
Regression

Similarity Automatic
All above D
detection labeling

it's impossible
it's often very to have a
All above D
dynamic precise error
measure

Classical Classification None of above D

Autonomous
car driving, Bioinformatics,
All above D
Logistic Speech
optimization recognition

It may be used It discovers

It is used for
for causal D
prediction.
interpretation. relationships.

Density-Based Spectral
Spatial Clustering Find All above D
Clustering clusters

make_regressio
make_blobs() All above D
n()

random_state test_size training_size B

2 3 4 B

LabelBinarizer
DictVectorizer FeatureHasher C
class

Using an
Creating sub- automatic
model to strategy to input
All above A
predict those them according
features to the other
known values

with_std=True/ None of the

Both A & B C
False Mentioned

FeatureHasher SelectKBest All above C

2 and 3 1 and 3 None of these A

In case of
In case of In case of very
very large
very large large lambda;
lambda; bias
lambda; bias is bias is high, C
is low,
high, variance variance is
variance is
is low high
high

1 and 4 2 and 3 2 and 4 A

Both Ridge
Lasso None of both B
and Lasso

lr(formula, lrm(formula, regression.linear

A
data) data) (formula, data)

(Slope, X- (Y-Intercept, (slope, Y-

C
Intercept) Slope) Intercept)

Relation Relation
Correlation
between the between the
can’t judge the B
X1 and Y is X1 and Y is
relationship
strong neutral
Remain
Decrease Can’t Say D
constant

Bias
Bias decreases Bias increases
decreases and
and Variance and Variance D
Variance
decreases decreases
increases

2 and 3 1 and 3 1, 2 and 3 A

0 A

Discrete Binary B

0.64 0.36 0.5 C

You want to You will try to You will try to

decrease your calculate more reduce the C
data points variables features

Small Medium sized Size does not

A
datasets datasets matter

Kernel Soft Margin All of the

D
Parameters Parameter C above
How
accurately the The threshold
SVM can amount of
B
predict error in an
outcomes for SVM
unseen data

The SVM
allows high
None of the
amount of A
above
error in
classification

1 and 2 1 and 3 2 and 3 B

0 A

Decision Tree Naive Bayesian Linear regression A

The model The model

would would not be
consider only affected by
None of the
the points distance of B
above
close to the points from
hyperplane hyperplane for
for modeling modeling

Discrete Binary A

Nothing, the
model is Overfitting C
perfect
b. To judge
how the
trained model
c. Both A and
performs C
B
outside the
sample on test
data

b. Assumes
that all the
c. Both A and d. None of the
features in a C
B above option
dataset are
independent

no A

0 B

0.26 0.73 0.6 D

0.2 0.6 0.45 B

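0 | a
0.64 | 0.29 | 0.75 | b
0.64 | 0.36 | 0.5 | c
0.26 | 0.73 | 0.6 | d
0.2 | 0.6 | 0.45 | b
0 | a
How accurately the SVM can predict outcomes for unseen data | The threshold amount of error in an SVM | b
The SVM allows high amount of error in classification | None of the above | a
Small datasets | Medium sized datasets | Size does not matter
Kernel Parameters | Soft Margin Parameter C | All of the above | d
0 | a
The data is cl | The data is noisy and contains | c
The model would consider only the points close to the hyperplane for modeling | The model would not be affected by distance of points from hyperplane for modeling | None of the above | b
The kernel to be used | The tradeoff between misclassification and simplicity of the model | None of the above | c
Nothing, the model is perfect | Overfitting | c
Image Classific | Clustering of N | All of the above | d
You want to decrease your data points | You will try to calculate more variables | You will try to reduce the features | c
1 and 2 | 1 and 3 | 2 and 3 | b
0 | b
0 | b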
