Machine Learning
Assignment-2
Q.1: Consider a two-class classification problem with a dataset of inputs {x1 = (−1, −1),
x2 = (−1, +1), x3 = (+1, −1), x4 = (+1, +1)}. Can this dataset be shattered by an SVM
classifier with the following kernels?
a) Linear kernel
b) Polynomial kernel of degree 2
c) Gaussian kernel.
Show the shattering.
Q.2: Consider the training dataset given in the following table. Use weighted k-NN with
K = 3 to determine the class of the test instance (7.6, 60, 8).
S. No.  CGPA  Assessment  Project Submitted  Result
1       9.2   85          8                  Pass
2       8     80          7                  Pass
3       8.5   81          8                  Pass
4       6     45          5                  Fail
5       6.5   50          4                  Fail
6       8.2   72          7                  Pass
7       5.8   38          5                  Fail
8       8.9   91          9                  Pass
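As a way to check a hand calculation for Q.2, the following is a minimal sketch of weighted k-NN on the table above. The question does not fix the distance metric or the weighting scheme, so two assumptions are made here: unscaled Euclidean distance and inverse-distance weights (1/d). The function name `weighted_knn` is illustrative only.

```python
import math
from collections import defaultdict

# Training rows from the table: (CGPA, Assessment, Project Submitted) -> Result
train = [
    ((9.2, 85, 8), "Pass"),
    ((8.0, 80, 7), "Pass"),
    ((8.5, 81, 8), "Pass"),
    ((6.0, 45, 5), "Fail"),
    ((6.5, 50, 4), "Fail"),
    ((8.2, 72, 7), "Pass"),
    ((5.8, 38, 5), "Fail"),
    ((8.9, 91, 9), "Pass"),
]

def weighted_knn(train, query, k=3):
    """Return (predicted label, per-class vote totals, the k nearest neighbours)."""
    # Assumption: plain Euclidean distance on the raw (unscaled) features.
    by_distance = sorted((math.dist(x, query), label) for x, label in train)
    nearest = by_distance[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        # Assumption: inverse-distance weighting; no distance is zero for this query.
        votes[label] += 1.0 / d
    return max(votes, key=votes.get), dict(votes), nearest

label, votes, nearest = weighted_knn(train, (7.6, 60, 8), k=3)
print(nearest)  # the three closest rows as (distance, label)
print(votes)    # summed weights per class
print(label)    # class with the largest total weight
```

Because Assessment has a much larger range than the other two features, it dominates the unscaled distance. Normalising the features first is a reasonable alternative and may change which neighbours are selected.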
Q.3: Consider a dataset containing information about loan borrowers and corresponding labels
indicating whether each borrower is suitable for loan approval (Yes or No).
Age  Income  Marital Status  Credit Score  Approved Loan
35   50000   Married         650           Yes
45   80000   Single          720           Yes
30   30000   Single          600           No
55   75000   Married         680           Yes
40   60000   Married         700           Yes
28   35000   Single          580           No
50   90000   Married         750           Yes
33   45000   Single          620           No
48   70000   Married         710           Yes
38   55000   Single          670           Yes
Construct a decision tree using the ID3 algorithm to predict loan approval based on credit
score or marital status.
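The first step of ID3 is to compare the information gain of the candidate attributes at the root. The sketch below computes that gain for Marital Status and for a binarised Credit Score. ID3 works on categorical attributes, so the credit score has to be discretised; the threshold of 635 used here is an assumption chosen for illustration, not something the question specifies. The helper names are likewise illustrative.

```python
import math
from collections import Counter

# (Marital Status, Credit Score, Approved Loan) taken from the table above
rows = [
    ("Married", 650, "Yes"), ("Single", 720, "Yes"), ("Single", 600, "No"),
    ("Married", 680, "Yes"), ("Married", 700, "Yes"), ("Single", 580, "No"),
    ("Married", 750, "Yes"), ("Single", 620, "No"), ("Married", 710, "Yes"),
    ("Single", 670, "Yes"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, key):
    """Gain from splitting rows by key(row); the label is the last field of each row."""
    labels = [r[-1] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(key(r), []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Assumption: binarise the credit score at a hypothetical threshold of 635.
CREDIT_THRESHOLD = 635

gain_marital = information_gain(rows, lambda r: r[0])
gain_credit = information_gain(rows, lambda r: r[1] >= CREDIT_THRESHOLD)
print(f"gain(Marital Status) = {gain_marital:.4f}")
print(f"gain(Credit Score >= {CREDIT_THRESHOLD}) = {gain_credit:.4f}")
```

ID3 places the attribute with the larger gain at the root and then recurses on each branch that is not yet pure. A different discretisation of the credit score would give a different gain.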
Q.4: Determine the covariance and the Pearson correlation coefficient for the datasets
X = {1, 2, 3, 4, 5} and Y = {1, 4, 9, 16, 25} within the context of machine learning.
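A short sketch for checking the Q.4 arithmetic is given below. The question does not say whether the sample (divide by n - 1) or population (divide by n) covariance is intended, so the code exposes both through a `sample` flag; that flag and the function names are assumptions made for this illustration.

```python
import math

X = [1, 2, 3, 4, 5]
Y = [1, 4, 9, 16, 25]

def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences (sample or population form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1 if sample else n)

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    # The n or n - 1 factor cancels, so either convention gives the same r.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

print(covariance(X, Y))                # sample covariance
print(covariance(X, Y, sample=False))  # population covariance
print(round(pearson(X, Y), 4))         # correlation coefficient
```

Since Y = X squared, the relationship is exact but not linear, which is why the coefficient comes out close to, but below, 1.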
Q.5: Analyze the distinctions and inherent trade-offs between bias and variance. Discuss how
these two factors influence the performance and generalization of machine learning
models, and elucidate their roles within the context of model evaluation and selection.
Q.6: Consider a Random Forest regression model with 50 trees, each trained on a dataset
containing 3000 samples and 10 features. During training, the maximum number of
features considered at each split is set to 3. Calculate the total number of nodes in the entire
Random Forest.
Q.7: Illustrate the method for determining feature importance within Linear Regression,
elucidating its significance in identifying the predictive strength of individual features in
the model.
Q.8: Investigate the training procedure of a Naive Bayes classifier, focusing on the application of
maximum likelihood estimation to determine the probability distribution of features given
class labels, and elucidate its role in model parameter estimation.
Q.9: For the simple linear regression model y = 2x + 1 and the following dataset, calculate the
Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error
(RMSE):
x y
1 3
2 5
3 7
4 9
5 11
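The three error metrics in Q.9 can be checked with a few lines of plain Python. The sketch below applies the given model y = 2x + 1 to the table and computes MAE, MSE and RMSE from the residuals; the variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

def predict(x):
    """The model given in the question: y = 2x + 1."""
    return 2 * x + 1

# Residuals: observed value minus predicted value
errors = [y - predict(x) for x, y in zip(xs, ys)]

mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
mse = sum(e * e for e in errors) / len(errors)   # mean squared error
rmse = math.sqrt(mse)                            # root mean squared error

print(errors, mae, mse, rmse)
```

Every point in the table lies exactly on the line, so each residual is zero and all three metrics are zero.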
Q.10: A decision tree splits a node with 50 samples (30 of class A and 20 of class B) into two
child nodes. The left child has 20 samples (15 of class A and 5 of class B) and the right
child has 30 samples (15 of class A and 15 of class B). Compute the information gain from
this split.
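The calculation in Q.10 can be verified with the sketch below. It assumes the usual entropy-based definition of information gain with base-2 logarithms, that is, parent entropy minus the size-weighted entropy of the children; a Gini-based criterion would give a different number.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a node given its per-class sample counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (class A count, class B count) for each node in the question
parent = (30, 20)
left = (15, 5)
right = (15, 15)

n = sum(parent)
weighted_children = (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)
gain = entropy(parent) - weighted_children

print(f"H(parent) = {entropy(parent):.4f}")
print(f"H(left) = {entropy(left):.4f}, H(right) = {entropy(right):.4f}")
print(f"information gain = {gain:.4f}")
```

The gain is small (roughly 0.046 bits) because the right child is a 15/15 split with the maximum entropy of 1 bit, which offsets most of the purity gained in the left child.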