This Sheet Is For 1 Mark Questions S.R No
                                                 Question
Write down question
According to ____, it’s a key success factor for the survival and evolution of all species.
During the last few years, many ______ algorithms have been applied to deep
neural networks to learn the best policy for playing Atari video games and to teach an agent how to
associate the right action with an input representing the state.
If you need a more powerful scaling feature, with a superior control on outliers and the possibility to
select a quantile range, there's also the class ________.
scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ________ to
each element of a dataset.
There are also many univariate methods that can be used in order to select the best features according to
specific criteria based on ________.
Which of the following selects only a subset of features belonging to a certain percentile?
A feature F1 can take certain values: A, B, C, D, E, and F, and represents the grade of students from a college.
Which of the following statements is true in the following case?
Can a model trained for item-based similarity also choose from a given set of items?
The parameter ______ allows specifying the percentage of elements to put into the test/training set.
In many classification problems, the target ______ is made up of categorical labels which cannot
immediately be processed by any algorithm.
_______ adopts a dictionary-oriented approach, associating to each category label a progressive integer
number.
If a linear regression model fits perfectly, i.e., train error is zero, then _____________________
Which of the following metrics can be used for evaluating regression models? i) R Squared ii) Adjusted R Squared iii) F Statis
How many coefficients do you need to estimate in a simple linear regression model (One independent variable)?
In a simple linear regression model (One independent variable), If we change the input variable by 1 unit. How much outpu
Which of the following methods do we use to find the best fit line for data in Linear Regression?
Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?
Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and
you found that there is a relationship between them. Which of the following conclusion do you make
about this situation?
Naive Bayes classifiers are a collection of ____ algorithms.
Naive Bayes classification is _______________ learning.
Features being classified are independent of each other in a Naïve Bayes classifier.
Features being classified are __________ of each other in a Naïve Bayes classifier.
Bayes’ theorem is given by the formula in the image, where:
1. P(H) is the probability of hypothesis H being true.
2. P(E) is the probability of the evidence (regardless of the hypothesis).
3. P(E|H) is the probability of the evidence given that the hypothesis is true.
4. P(H|E) is the probability of the hypothesis given that the evidence is there.
Even if there are no actual supervisors, ________ learning is also based on feedback provided
by the environment.
When it is necessary to allow the model to develop a generalization ability and avoid a common
problem called ______.
Techniques that involve the use of both labeled and unlabeled data are called ___.
According to ____, it’s a key success factor for the survival and evolution of all species.
A supervised scenario is characterized by the concept of a _____.
Overlearning is caused by an excessive ______.
_____ provides some built-in datasets that can be used for testing purposes.
______ allows exploiting the natural sparsity of data while extracting principal components.
Which of the following evaluation metrics can be used to evaluate a model while modeling a
continuous output variable?
Is overfitting more likely when you have a huge amount of data to train on?
Suppose you plotted a scatter plot between the residuals and predicted values in linear regression and
you found that there is a relationship between them. Which of the following conclusion do you make
about this situation?
Let’s say a “Linear regression” model perfectly fits the training data (train error is zero). Now, which
of the following statements is true?
In a linear regression problem, we are using “R-squared” to measure goodness-of-fit. We add a feature
to the linear regression model and retrain the same model. Which of the following options is true?
Which of the following assumptions do we make while deriving linear regression parameters? 1. The
true relationship between dependent y and predictor x is linear 2. The model errors are statistically
independent 3. The errors are normally distributed with a 0 mean and constant standard deviation 4. The
predictor x is non-stochastic and is measured error-free
To test the linear relationship of y (dependent) and x (independent) continuous variables, which of the
following plots is best suited?
Which of the following steps/assumptions in regression modeling impacts the trade-off between
under-fitting and over-fitting the most?
Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature
selection?
Which of the following statement(s) can be true post adding a variable in a linear regression model? 1.
R-Squared and Adjusted R-squared both increase 2. R-Squared increases and Adjusted R-squared
decreases 3. R-Squared decreases and Adjusted R-squared decreases 4. R-Squared decreases and
Adjusted R-squared increases
How many coefficients do you need to estimate in a simple linear regression model (One independent
variable)?
In the given image, P(H) is the __________ probability.
Conditional probability is a measure of the probability of an event given that another event has
already occurred.
Gaussian distribution when plotted, gives a bell shaped curve which is symmetric about the _______ of the feature
SVMs directly give us the posterior probabilities P(y = 1|x) and P(y = −1|x)
SVM is a ____ algorithm
What is/are true about kernel in SVM? 1. Kernel function map low dimensional data to high
dimensional space 2. It’s a similarity function
Suppose you are building an SVM model on data X. The data X can be error prone, which means
that you should not trust any specific data point too much. Now suppose you want to build an
SVM model which has a quadratic kernel function of polynomial degree 2 that uses the slack
variable C as one of its hyperparameters. What would happen when you use a very small C
(C~0)?
Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions
that might be related to the event.
Bernoulli Naïve Bayes Classifier is ___________ distribution
If you remove the non-red circled points from the data, will the decision boundary change?
The binarize parameter of scikit-learn's BernoulliNB sets the threshold for binarizing sample features.
Bayes’ theorem is given by the formula in the image, where:
1. P(H) is the probability of hypothesis H being true.
2. P(E) is the probability of the evidence (regardless of the hypothesis).
3. P(E|H) is the probability of the evidence given that the hypothesis is true.
4. P(H|E) is the probability of the hypothesis given that the evidence is there.
Any linear combination of the components of a multivariate Gaussian is a univariate Gaussian.
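For reference, the four quantities named in the Bayes' theorem questions above are related by the standard statement of the theorem. The block below is only a restatement in the same symbols H (hypothesis) and E (evidence); it is not taken from the sheet's image file.

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

In this notation P(H) is the prior probability and P(H|E) is the posterior probability.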
Image | a | b | c | d | Correct Answer
img.jpg | Option a | Option b | Option c | Option d | a/b/c/d
 | Penalty | Overlearning | Reward | None of above | A
 | Claude Shannon's theory | Gini Index | Darwin’s theory | None of above | C
 | By using a lot of data | By using inductive machine learning | By using validation only | None of above | A
 | Decision Trees and Neural Networks (back propagation) | Probabilistic networks and Nearest Neighbor | Support vector machines | All | D
 | Training set is used to test the accuracy of the hypotheses generated by the learner. | A set of data is used to discover the potentially predictive relationship. | Both A & B | None of above | B
 | Image classification, Real-time visual tracking | Autonomous car driving, Logistic optimization | Bioinformatics, Speech recognition | All above | D
 | Classifications, Predict time series, Annotate strings | Speech recognition, Regression | Both A & B | None of above | C
 | Object segmentation | Similarity detection | Automatic labeling | All above | D
 | the environment is not completely deterministic | it's often very dynamic | it's impossible to have a precise error measure | All above | D
 | Regression | Classification. | Modelfree | Categories | B
 | regression | classification | random_state | missing_values | D
 | RobustScaler | DictVectorizer | LabelBinarizer | FeatureHasher | A
 | max, l0 and l1 norms | max, l1 and l2 norms | max, l2 and l3 norms | max, l3 and l4 norms | B
 | Feature F1 is an example of nominal variable. | Feature F1 is an example of ordinal variable. | It doesn’t belong to any of the above category. | Both of these | B
Adjusted R Squarediii) F Statist a) ii and iv b) i and ii c) ii, iii and iv d) i, ii, iii and i d
dependent variable)?            a) 1                    b) 2               c) 3           d) 4            b
 ble by 1 unit. How much output a) by 1                 b) no change       c) by interceptd) by its slope d
                                  A) AUC-ROC   B) Accuracy   C) Logloss   D) Mean-Squared-Error   d
                                  A) Lower is better   B) Higher is better   C) A or B depend on the situation   D) None of these   a
                                  A) TRUE   B) FALSE   b
                                  A) Linear regression is sensitive to outliers   B) Linear regression is not sensitive to outliers   C) Can’t say   D) None of these   a
                                  A) Since there is a relationship, our model is not good   B) Since there is a relationship, our model is good   C) Can’t say   D) None of these   a
                                  Classification        Clustering         Regression All                a
                                  Supervised            Unsupervised       Both          None            a
                                  False                                  1                               b
                                  Independent           Dependent          Partial DependNone            a
           bayes.jpg              True                                   0                               a
           bayes.jpg              Posterior              Prior                                           a
           bayes.jpg              Posterior              Prior                                           b
has already occurred.             True                                   0                               a
ns that might be related to the e True                                   0                               a
                                  Continuous             Discrete            Binary                      c
                                  Continuous             Discrete            Binary                      b
                                  Continuous             Discrete            Binary                      a
                                  True                                   0                               a
_______ of the feature values. Mean                      Variance            Discrete      Random        a
                                  True                                   0                               b
                                  True                                   0                               a
ernel) might lead to overfitting True                                    0                               a
                                  Classification         Clustering          Regression    All           a
                                  Supervised             Unsupervised        Both          None          a
                                  True                                   0                               a
-- | cl_forecastB | cl_nowcastC | cl_precastD | None of the Mentioned | D
-- | fast | accuracy | scalable | All above | D
-- | Supervised Learning and Semi-supervised Learning | Unsupervised Learning and Transduction | Both A & B | None of the Mentioned | C
-- | split the set of example into the training set and the test | group the set of example into the training set and the test | a set of observed instances tries to induce a general rule | learns programs from data | A
-- | Artificial Intelligence | Rule based inference | Both A & B | None of the Mentioned | B
-- | The process of selecting models among different mathematical models, which are used to describe the same data set | when a statistical model describes random error or noise instead of underlying relationship | Find interesting directions in data and find novel observations / database cleaning | All above | A
-- | Genetic Programming and Inductive Learning | Speech recognition and Regression | Both A & B | None of the Mentioned | A
-- | Linear regression is sensitive to outliers | Linear regression is not sensitive to outliers | Can’t say | None of these | A
-- | Linear Regression with varying error terms | Linear Regression with constant error terms | Linear Regression with zero error terms | None of these | A
-- | The polynomial degree | Whether we learn the weights by matrix inversion or gradient descent | The use of a constant-term | | A
-- | 1 | 0 | | | B
-- | Ridge regression uses subset selection of features | Lasso regression uses subset selection of features | Both use subset selection of features | None of above | B
-- | 1 and 2 | 1 and 3 | 2 and 4 | None of the above | A
-- | 1 | 2 | Can’t Say | | B
-- | The number of cross-validations to be made | The kernel to be used | The tradeoff between misclassification and simplicity of the model | None of the above | C
-- | True | 0 | | | A
-- | Continuous | Discrete | Binary | | C
svm.jpg | 1 | 0 | | | B
-- | a. Drop missing rows or columns | b. Replace missing values with mean/median/mode | c. Assign a unique category to missing values | d. All of the above | D
-- | True | 0 | | | A
-- | A. Attributes are equally important. | B. Attributes are statistically dependent of one another given the class value. | C. Attributes are statistically independent of one another given the class value. | D. Attributes can be nominal or numeric | B
-- | The data is linearly separable | The data is clean and ready to use | The data is noisy and contains overlapping points | | C
-- | Supervised | Unsupervised | Both | None | A
-- | False | 1 | | | B
-- | Independent | Dependent | Partial Dependent | None | A
bayes.jpg | True | 0 | | | A
-- | True | 0 | | | A
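Several of the regression questions above turn on two facts about simple linear regression: the model has two coefficients (an intercept and a slope), and a one-unit change in the input changes the prediction by the slope. The sketch below illustrates both with an ordinary least-squares fit in plain Python. The function names and the toy data are assumptions made for this illustration and do not come from the question sheet.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the sum of co-deviations divided by the sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Prediction of the fitted line at input x."""
    return intercept + slope * x


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_linear_regression(xs, ys)  # two coefficients
change_per_unit = predict(intercept, slope, 5.0) - predict(intercept, slope, 4.0)
print(intercept, slope, change_per_unit)
```

Because the toy data has no noise, the training error is zero here; that says nothing about how the line would perform on unseen data.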
                                       This sheet is for 2 Mark questions
    S.r No                                                 Question
e.g 1        Write down question
1   A supervised scenario is characterized by the concept of a _____.
2   Overlearning is caused by an excessive ______.
5   Some people are using the term ___ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.
6   The term _____ can be freely used, but with the same meaning adopted in physics or system theory.
8   Even if there are no actual supervisors, ________ learning is also based on feedback provided by the environment.
12  What are the two methods used for the calibration in Supervised Learning?
16  _____ provides some built-in datasets that can be used for testing purposes.
19  scikit-learn offers the class ______, which is responsible for filling the holes using a strategy based on the mean, median, or frequency.
20  Which of the following scales data by removing elements that don't belong to a given range or by considering a maximum absolute value?
21  Which of the following models includes a backwards elimination feature selection routine?
31  Let’s say a “Linear regression” model perfectly fits the training data (train error is zero). Now, which of the following statements is true?
32
33  Which one is true about heteroskedasticity?
34  Which of the following assumptions do we make while deriving linear regression parameters? 1. The true relationship between dependent y and predictor x is linear 2. The model errors are statistically independent 3. The errors are normally distributed with a 0 mean and constant standard deviation 4. The predictor x is non-stochastic and is measured error-free
35  To test the linear relationship of y (dependent) and x (independent) continuous variables, which of the following plots is best suited?
36  Generally, which of the following method(s) is used for predicting a continuous dependent variable? 1. Linear Regression 2. Logistic Regression
37  Suppose you are training a linear regression model. Now consider these points. 1. Overfitting is more likely if we have less data 2. Overfitting is more likely when the hypothesis space is small. Which of the above statement(s) are correct?
38  Suppose we fit “Lasso Regression” to a data set which has 100 features (X1, X2…X100). Now, we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit Lasso regression with the same regularization parameter. Now, which of the following options will be correct?
39  Which of the following is true about “Ridge” or “Lasso” regression methods in case of feature selection?
40  Which of the following statement(s) can be true post adding a variable in a linear regression model? 1. R-Squared and Adjusted R-squared both increase 2. R-Squared increases and Adjusted R-squared decreases 3. R-Squared decreases and Adjusted R-squared decreases 4. R-Squared decreases and Adjusted R-squared increases
41  We can also compute the coefficient of linear regression with the help of an analytical method called “Normal Equation”. Which of the following is/are true about “Normal Equation”? 1. We don’t have to choose the learning rate 2. It becomes slow when number of features is very large 3. No need to iterate
42  How many coefficients do you need to estimate in a simple linear regression model (One independent variable)?
43  If two variables are correlated, is it necessary that they have a linear relationship?
44  Correlated variables can have zero correlation coefficient. True or False?
45  Which of the following options is true regarding “Regression” and “Correlation”? Note: y is the dependent variable and x is the independent variable.
46  What is/are true about kernel in SVM? 1. Kernel function map low dimensional data to high dimensional space 2. It’s a similarity function
47  Suppose you are building a SVM model on data X. The data X can be error prone which means that you should no
48  Suppose you are using a Linear SVM classifier with 2 class classification problem. Now you have be
49  If you remove the non-red circled points from the data, the decision boundary will change?
50  When the C parameter is set to infinite, which of the following holds true?
51  Suppose you are building a SVM model on data X. The data X can be error prone which means that you should no
52  SVM can solve linear and non-linear problems
53  The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space(N — the
54  Hyperplanes are _____________ boundaries that help classify the data points.
55  The _____ of the hyperplane depends upon the number of features.
56  Hyperplanes are decision boundaries that help classify the data points.
57  SVM algorithms use a set of mathematical functions that are defined as the kernel.
58  In SVM, Kernel function is used to map a lower dimensional data into a higher dimensional data.
59  In SVR we try to fit the error within a certain threshold.
60  When the C parameter is set to infinite, which of the following holds true?
61  How do you handle missing or corrupted data in a dataset?
62
63
64
71  What are the two methods used for the calibration in Supervised Learning?
72  ______ can be adopted when it's necessary to categorize a large amount of data with a few complete examples or when there's the need to impose some constraints to a clustering algorithm.
74  In the last decade, many researchers started training bigger and bigger models, built with several different layers; that's why this approach is called _____.
75  There's a growing interest in pattern recognition and associative memories whose structure and functioning are similar to what happens in the neocortex. Such an approach also allows simpler algorithms called _____.
76  ______ showed better performance than other approaches, even without a context-based model.
78  Some people are using the term ___ instead of prediction only to avoid the weird idea that machine learning is a sort of modern magic.
79  The term _____ can be freely used, but with the same meaning adopted in physics or system theory.
81  A feature F1 can take certain values: A, B, C, D, E, and F, and represents the grade of students from a college. Which of the following statements is true in the following case?
84  Can a model trained for item-based similarity also choose from a given set of items?
85  What are common feature selection methods in a regression task?
86  The parameter ______ allows specifying the percentage of elements to put into the test/training set.
87  In many classification problems, the target ______ is made up of categorical labels which cannot immediately be processed by any algorithm.
88  _______ adopts a dictionary-oriented approach, associating to each category label a progressive integer number.
89  ________ is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value.
91  If you need a more powerful scaling feature, with a superior control on outliers and the possibility to select a quantile range, there's also the class ________.
92  scikit-learn also provides a class for per-sample normalization, Normalizer. It can apply ________ to each element of a dataset.
93  There are also many univariate methods that can be used in order to select the best features according to specific criteria based on ________.
94  Which of the following selects only a subset of features belonging to a certain percentile?
96     If two variables are correlated, is it necessary that they have a linear relationship?
97    Correlated variables can have zero correlation coeffficient. True or False?
      Suppose we fit “Lasso Regression” to a data set which has 100 features (X1, X2, …, X100). Now,
      we rescale one of these features by multiplying it by 10 (say that feature is X1), and then refit
98
      Lasso regression with the same regularization parameter. Now, which of the following options will
      be correct?
99 If a linear regression model fits perfectly, i.e., train error is zero, then _____________________
      Which of the following metrics can be used for evaluating regression models? i) R Squared ii)
100
      Adjusted R Squared iii) F Statistics iv) RMSE / MSE / MAE
101   In syntax of linear model lm(formula,data,..), data refers to ______
102   Linear Regression is a supervised machine learning algorithm.
103   It is possible to design a Linear regression algorithm using a neural network?
104   Which of the following methods do we use to find the best fit line for data in Linear Regression?
      Suppose you are training a linear regression model. Now consider these points. 1. Overfitting is
105   more likely if we have less data. 2. Overfitting is more likely when the hypothesis space is
      small. Which of the above statement(s) are correct?
      We can also compute the coefficient of linear regression with the help of an analytical method
      called “Normal Equation”. Which of the following is/are true about “Normal Equation”? 1. We
106
      don’t have to choose the learning rate. 2. It becomes slow when number of features is very large. 3.
      No need to iterate.
      Which of the following options is true regarding “Regression” and “Correlation”? Note: y is the
107
      dependent variable and x is the independent variable.
      In a simple linear regression model (one independent variable), if we change the input variable by
108
      1 unit, how much will the output variable change?
      Generally, which of the following method(s) is used for predicting continuous dependent variable?
109
      1. Linear Regression 2. Logistic Regression
      How many coefficients do you need to estimate in a simple linear regression model (One
110
      independent variable)?
      Suppose you are building a SVM model on data X. The data X can be error prone which
      means that you should not trust any specific data point too much. Now think that you want
111   to build a SVM model which has quadratic kernel function of polynomial degree 2 that
      uses Slack variable C as one of its hyperparameters. What would happen when you use
      a very large value of C (C -> infinity)?
112   SVM can solve linear and non-linear problems
      The objective of the support vector machine algorithm is to find a hyperplane in an N-
113
      dimensional space(N — the number of features) that distinctly classifies the data points.
114   Hyperplanes are _____________boundaries that help classify the data points. 
115 When the C parameter is set to infinite, which of the following holds true?
119   Which of the following are real world applications of the SVM?
120   The _____of the hyperplane depends upon the number of features.
121   Hyperplanes are decision boundaries that help classify the data points. 
122   SVM algorithms use a set of mathematical functions that are defined as the kernel.
123   Naive Bayes classifiers are a collection ------------------of algorithms 
124   In given image, P(H|E) is__________probability.
      Solving a non linear separation problem with a hard margin Kernelized SVM (Gaussian
125
      RBF Kernel) might lead to overfitting
      100 people are at party. Given data gives information about how many wear pink or not,
126
      and if a man or not. Imagine a pink wearing guest leaves, was it a man?
127   For the given weather data, Calculate probability of playing
      In SVM, Kernel function is used to map a lower dimensional data into a higher
128
      dimensional data.
129    In SVR we try to fit the error within a certain threshold.
130   When the C parameter is set to infinite, which of the following holds true?
 Image                 a                            b                               c
img.jpg   Option a                  Option b                             Option c
          Programmer                Teacher                              Author
          Capacity                  Regression                           Reinforcement
          Genetic Programming
                                    Speech recognition and
          and                                                            Both A & B
                                    Regression
          Inductive Learning
YES                      NO
NO                       YES
run                      start
                                                           init
SparsePCA                KernelPCA                         SVD
                                                           None of the
PCA                      K-Means
                                                           above
A. You will always have B. You can not have test error     C. None of the
test error zero         zero                               above
                                                           C. Individually R
                                                           squared cannot
                                                           tell about variable
A. If R Squared                                            importance. We
increases, this variable B. If R Squared decreases, this   can’t say anything
is significant.          variable is not significant.      about it right now.
                                   A. 1                        B. 2                               C. Can’t Say
                                   A. Yes                      B. No
                                   A. True                     B. False
                                                                                     C. The relationship
                                                                                     is not symmetric
                                                                                     between x and y in
                                                                                     case of correlation
                                   A. The relationship is B. The relationship is not but in case of
                                   symmetric between x symmetric between x and y in regression it is
                                   and y in both.         both.                      symmetric.
1 2 1 and 2
Misclassification would Data will be correctly classified Can’t say
            svm.jpg                 yes                     no
            svm.jpg                                       1                                  0
                                    The optimal
                                   hyperplane if exists,
                                   will be the one that
                                   completely separates        The soft-margin classifier will
                                   the data                    separate the data                  None of the above
                                   We can still classify dat We can not classify data correctl Can’t Say
1 0
1 0
1 0
1 0
1 0
                                    The optimal
                                   hyperplane if exists,
                                   will be the one that
                                   completely separates          The soft-margin classifier will
                                   the data                      separate the data                      None of the above
                                                                                                        c. Assign a unique
                                   a. Drop missing rows or b. Replace missing values with               category to
                                   columns                 mean/median/mode                             missing values
                                                                                                      C.     Attributes are
                                                                                                          statistically
                                                                  B.     Attributes are statistically independent of one
                                     A.     Attributes are        dependent of one another given another given the
                                     equally important.                    the class value.               class value.
                                               PCA                           Decision Tree               Naive Bayesian
                                                                 By using inductive machine           By using
                       --          By using a lot of data
                                                                 learning                             validation only
     Decision Trees and
                           Probabilistic networks and           Support vector
--   Neural Networks (back
                           Nearest Neighbor                     machines
     propagation)
                                                              Inductive Vs
     Concept Vs
--   Classification Learning
                             Symbolic Vs Statistical Learning Analytical
                                                              Learning
                                                                Reinforcement
--   Deep learning           Machine learning
                                                                learning
                                                                Reinforcement
--   Machine learning        Deep learning
                                                                learning
                                                                   Using an
                                                                   automatic strategy
     Removing the whole         Creating sub-model to predict      to input them
--   line                       those features                     according to the
                                                                   other known
                                                                   values
                                                                   max, l2 and l3
--   max, l0 and l1 norms       max, l1 and l2 norms
                                                                   norms
--               Yes                               No
--                1                                0
                                                          The relationship is
                                                            not symmetric
       The relationship is      The relationship is not   between x and y in
--    symmetric between x    symmetric between x and y in case of correlation
         and y in both.                  both.               but in case of
                                                            regression it is
                                                              symmetric.
-- by 1 no change by intercept
-- 1 2 3
--             1                           0
--             1                           0
--           usual                     decision                  parallel
          The optimal
     hyperplane if exists,
                           The soft-margin classifier will     None of the
--    will be the one that
                                 separate the data               above
     completely separates
            the data
--        Supervised                Unsupervised                   Both
--          True                         0
--             1                           0
    man.jpg                 1                           0
weather data.jpg           0.4                        0.64                    0.29
       --                   1                           0
       --                   1                           0
                        The optimal
                   hyperplane if exists,
                                         The soft-margin classifier will   None of the
       --           will be the one that
                                               separate the data             above
                   completely separates
                          the data
       d          Correct Answer
Option d              a/b/c/d
Farmer                   B
Accuracy                A
None of above B
learns programs
                        A
from data
None of above A
Prediction D
None of the
                        A
Mentioned
None of the
                        B
above
Bio-inspired
adaptive                B
systems
All above               D
All D
learns programs
                        A
from data
None of the
                        B
Mentioned
All above A
None of the
                        A
above
FeatureHasher           A
None of the
                     C
Mentioned
Imputer D
None of the
                     C
Mentioned
All above B
                     A
                     B
All above            A
None of the
                     B
Mentioned
Covariance
                     D
matrix
stop C
init parameter A
D. None of
these.           c
D. None of
these            a
D. All of above. d
D. None of
these            a
D. None of
these.           b
D. Both are
True            c
D. None of
these           b
D. None of
above           b
D. None of the
above          a
D. 1,2 and 3. d
                b
                b
                a
D.          The
relationship is
symmetric
between x and
y in case of
correlation but
in    case    of
regression it is
not symmetric. d
None of these c
None of these a
              a
              b
                a
None of these    a
d. All of the
above            d
d. None of the
above option     c
 D.     Attributes
 can be nominal
   or numeric b
Linear regression   a
None of above        A
All             D
None of above B
All above D
All D
Clusters B
None of above C
Unsupervised
                A
learning
Scalable C
Supervised
                B
learning
Bio-inspired
adaptive        B
systems
None of above A
Prediction D
None of above B
Both of these B
None of these   A
All above            D
                     A
None of these        C
None of these C
All above B
FeatureHasher A
All above B
missing_values D
FeatureHasher A
max, l3 and l4
                     B
norms
All above A
All above            A
None of the
                     B
Mentioned
                     B
                     A
None of these B
  Test error is
 equal to Train      C
     error
       List          B
                     A
                     A
 Both A and B        A
 Both are True      C
1,2 and 3. D
The relationship
  is symmetric
between x and y
    in case of
                    D
 correlation but
    in case of
 regression it is
 not symmetric.
by its slope D
None of these. B
4 B
None of these A
                    A
                    A
                    B
     None           A
                    A
                    B
   All of the
                    D
    above
                    A
       A
       A
All    A
       A
       A
       A
0.75   B
       A
       A
       A
               This sheet is for 3 Mark questions
  S.r No                          Question                             Image           a
e.g 1      Write down question                                        img.jpg   Option a
     1     Which of the following is characteristic of best                     fast
           machine learning method?
14
                                                                  Classifications,
                                                                  Predict time
     What is the function of ‘Supervised Learning’?          --   series, Annotate
                                                                  strings
15                                                                Object
     Common unsupervised applications include                --   segmentation
16
                                                                  the
     Reinforcement learning is particularly efficient             environment is
     when______________.
                                                             --   not completely
                                                                  deterministic
17
     During the last few years, many ______ algorithms
     have been applied to deep
     neural networks to learn the best policy for playing
     Atari video games and to teach an agent how to
                                                             --   Logical
     associate the right action with an input representing
     the state.
18
                                                                  Image
                                                                  classification,
     Common deep learning applications include____           --   Real-time
                                                                  visual tracking
     Let’s say, you are working with categorical feature(s)        All categories
     and you have not looked at the distribution of the            of categorical
     categorical variable in the test data.                   --   variable are not
                                                                   present in the
     You want to apply one hot encoding (OHE) on the               test dataset.
     categorical feature(s). What challenges may you face
     if you have applied OHE on a categorical variable of
     the train dataset?
22
     Which of the following sentences is FALSE regarding           It relates inputs
     regression?
                                                              --   to outputs.
23
     Which of the following method is used to find the
     optimal features for cluster analysis
                                                              --   k-Means
26
     In many classification problems, the target dataset is
     made up of categorical labels which cannot
     immediately be processed by any algorithm. An            --   1
     encoding is needed and scikit-learn offers at
     least_____valid options
27   In which of the following each categorical label is
     first turned into a positive integer and then                 LabelEncoder
     transformed into a vector where only one feature is 1
                                                              --   class
     while all the others are 0.
28
32
     Suppose you have fitted a complex regression
                                                                  In case of very
     model on a dataset. Now, you are using Ridge
                                                                   large lambda;
     regression with tuning parameter lambda to
                                                             --     bias is low,
     reduce its complexity. Choose the option(s)
                                                                     variance is
     below which describes relationship of bias and
                                                                        low
     variance with lambda.
33
     What is/are true about ridge regression? 1. When
     lambda is 0, model works like linear regression
     model 2. When lambda is 0, model doesn’t work
     like linear regression model 3. When lambda goes
                                                             --      1 and 3
     to infinity, we get very, very small coefficients
     approaching 0 4. When lambda goes to infinity,
     we get very, very large coefficients approaching
     infinity
34   Which of the following method(s) does not have                   Ridge
                                                             --
     closed form solution for its coefficients?                     regression
35
      Function used for linear regression in R is                  lm(formula,
                                                             --
     __________                                                       data)
36
     In the mathematical Equation of Linear Regression             (X-intercept,
                                                             --
     Y = β1 + β2X + ϵ, (β1, β2) refers to __________                  Slope)
37
     Suppose that we have N independent variables
     (X1,X2… Xn) and dependent variable is Y. Now
                                                                    Relation
     Imagine that you are applying linear regression
                                                                   between the
     by fitting the best fit line using least square error   --
                                                                   X1 and Y is
     on this data. You found that correlation
                                                                      weak
     coefficient for one of its variables (say X1) with
     Y is -0.95. Which of the following is true for X1?
38
     We have been given a dataset with n records in
     which we have input attribute as x and output
     attribute as y. Suppose we use a linear regression
     method to model this data. To test our linear
     regressor, we split the data in training set and test    --              Increase
     set randomly. Now we increase the training set
     size gradually. As the training set size increases,
     what do you expect will happen with the mean
     training error?
39
     We have been given a dataset with n records in
     which we have input attribute as x and output
     attribute as y. Suppose we use a linear regression
                                                                            Bias increases
     method to model this data. To test our linear
                                                              --            and Variance
     regressor, we split the data in training set and test
                                                                              increases
     set randomly. What do you expect will happen
     with bias and variance as you increase the size of
     training data?
40
     Suppose, you got a situation where you find that
     your linear regression model is under fitting the
     data. In such situation which of the following
                                                              --              1 and 2
     options would you consider? 1. I will add more
     variables 2. I will start introducing polynomial
     degree variables 3. I will remove some variables
48
                                                                   The SVM
                                                                  allows very
     What do you mean by a hard margin?                     --
                                                                  low error in
                                                                 classification
49
     We usually use feature normalization before
     using the Gaussian kernel in SVM. What is true
     about feature normalization? 1. We do feature
     normalization so that new feature will dominate
                                                            --         1
     others. 2. Sometimes, feature normalization is not
     feasible in case of categorical variables. 3. Feature
     normalization always helps when we use
     Gaussian kernel in SVM
50   Support vectors are the data points that lie closest
                                                            --         1
     to the decision surface.
51   Which of the following is not supervised
                                                            --         PCA
     learning?
52
                                                                   The model
                                                                     would
                                                                 consider even
     Suppose you are using RBF kernel in SVM with
                                                            --      far away
     high Gamma value. What does this signify?
                                                                  points from
                                                                 hyperplane for
                                                                   modeling
                                                                             a. To assess
     What is the purpose of performing cross-                               the predictive
                                                                 --
     validation?                                                             performance
                                                                            of the models
56
                                                                            a. Assumes
                                                                             that all the
     Which of the following is true about Naive Bayes                       features in a
                                                                 --
     ?                                                                       dataset are
                                                                               equally
                                                                              important
57
     Suppose you are using a Linear SVM classifier
     with 2 class classification problem. Now you
     have been given the following data in which
     some points are circled red that are representing       svm.jpg             yes
     support vectors. If you remove any one of the
     red points from the data, will the decision
     boundary change?
58   Linear SVMs have no hyperparameters that need
                                                                 --               1
     to be set by cross-validation
59   For the given weather data, what is the
     probability that players will play if weather is    weather data.jpg        0.5
     sunny
60
     100 people are at party. Given data gives
     information about how many wear pink or not,
                                                             man.jpg             0.4
     and if a man or not. Imagine a pink wearing guest
     leaves, what is the probability of being a man
                                                                       The number of
                                                                       cross-
                                                                       validations to
     The cost parameter in the SVM means:                              be made
75   If I am using all features of my dataset and I achieve 100% accurac
                                                                       Underfitting
76   Which of the following are real world applications of the SVM?    Text and Hypert
77
Speech
recognition,      Both A & B        None of above     C
Regression
Similarity        Automatic
                                    All above         D
detection         labeling
                  it's impossible
it's often very   to have a
                                    All above         D
dynamic           precise error
                  measure
Autonomous
car driving,      Bioinformatics,
                                  All above           D
Logistic          Speech
optimization      recognition
Image             Autonomous
classification,   car driving,      Bioinformatics,
                                                      A
Real-time         Logistic          Speech
visual tracking   optimization      recognition
Frequency
distribution of
                    Train and Test
categories is
                    always have
different in                       Both A and B      D
                    same
train as
                    distribution.
compared to
the test dataset.
Density-Based Spectral
Spatial       Clustering Find All above              D
Clustering    clusters
make_regression()  make_blobs()      All above       D
2 3 4 B
                    LabelBinarizer
DictVectorizer                     FeatureHasher     C
                    class
                    Using an
Creating sub-       automatic
model to            strategy to input
                                      All above      A
predict those       them according
features            to the other
                    known values
  In case of
                In case of In case of very
  very large
                very large   large lambda;
lambda; bias
             lambda; bias is bias is high,      C
   is low,
              high, variance variance is
 variance is
                  is low          high
     high
               Both Ridge
   Lasso                       None of both     B
               and Lasso
 Relation       Relation
                             Correlation
between the    between the
                           can’t judge the      B
X1 and Y is    X1 and Y is
                            relationship
   strong        neutral
                 Remain
  Decrease                      Can’t Say      D
                 constant
    Bias
              Bias decreases Bias increases
decreases and
              and Variance and Variance        D
  Variance
                decreases      decreases
  increases
0 A
Discrete Binary B
  The SVM
 allows high
                 None of the
  amount of                                  A
                   above
   error in
classification
0 A
Discrete Binary A
Nothing, the
 model is        Overfitting                 C
  perfect
  b. To judge
    how the
trained model
               c. Both A and
   performs                                       C
                     B 
  outside the
sample on test
      data
 b. Assumes
  that all the
               c. Both A and d. None of the
 features in a                                    C
                     B        above option
  dataset are
 independent
no A
0 B
             0                                a
         0.64             0.29           0.75 b
         0.64             0.36            0.5 c
         0.26             0.73            0.6 d
           0.2             0.6           0.45 b
             0                                a
How accuratelyThe threshold amount of error b
The SVM allows None of the above              a
Small dataset Medium sized dSize does not ma
Kernel ParameSoft Margin Pa All of the abov d
             0                                a
The data is cl The data is noisy and contains c
The model woul
            The model would
                          None of the abb
               The tradeoff
               between
               misclassification
               and
The kernel to simplicity of     None of the
be used        the model       above           c
Nothing, the moOverfitting                     c
Image ClassificClustering of N All of the abov d