ML Question Bank

The document is a comprehensive question bank for a Machine Learning course, covering various algorithms, concepts, and applications across multiple modules. It includes questions on the Candidate Elimination Algorithm, Find-S algorithm, supervised vs unsupervised learning, decision trees, support vector machines, Bayesian networks, and reinforcement learning. Each module presents theoretical and practical problems aimed at assessing understanding and application of machine learning techniques.

Uploaded by

harshimustoor
Copyright
© All Rights Reserved
SUB NAME: MACHINE LEARNING (22CII51) QUESTION BANK

MODULE-1

1. Examine the steps of the Candidate Elimination Algorithm in identifying a consistent hypothesis set.
2. Apply the Find-S algorithm to a given dataset and derive the most specific hypothesis.

S.No  Age          Blood Pressure  Cholesterol  Exercise  Heart Disease
1     Young        High            High         No        Yes
2     Middle-aged  High            High         No        Yes
3     Old          High            High         Yes       No
4     Young        Normal          Low          Yes       No
5     Middle-aged  High            Low          Yes       Yes

3. Assess how handling bad data improves the performance of a well-posed learning system.

4. Analyze the difference between supervised and unsupervised learning in terms of problem-solving
capabilities. Provide examples.
5. Given a dataset with missing values and irrelevant features, describe the steps you would take to
preprocess the data before training a machine learning model.

6. List the primary steps involved in designing a machine learning system.


7. Design a well-posed learning system for predicting house prices. Your design should include:
- Selection of relevant features
- Handling of missing or noisy data
- Measures to prevent overfitting or underfitting

8. Develop pseudocode for an implementation of the Candidate Elimination Algorithm. Test your pseudocode
on the given hypothesis space and sample dataset. Imagine a dataset where we want to classify whether an
animal is an "elephant." The features and a hypothetical dataset are:

Size    Color  Trunk  Legs  Elephant
Large   Gray   Yes    4     Yes
Medium  Gray   Yes    4     No
Large   Brown  Yes    4     No
Large   Gray   No     4     No
Large   Gray   Yes    4     Yes
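
As a starting point for the pseudocode, here is a minimal Python sketch of Candidate Elimination restricted to conjunctive hypotheses, run on the elephant table. The names `covers`, `more_general`, and `candidate_elimination` are hypothetical. The sketch assumes the first example is positive and that the data are noise-free, and it specializes the general boundary only with values taken from the current specific hypothesis.

```python
def covers(hypothesis, instance):
    """True if the conjunctive hypothesis accepts the instance ('?' = any value)."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))

def more_general(a, b):
    """True if hypothesis a is at least as general as hypothesis b."""
    return all(x == "?" or x == y for x, y in zip(a, b))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = None              # specific boundary; None means "covers nothing yet"
    G = [("?",) * n]      # general boundary starts with the all-wildcard hypothesis
    for instance, positive in examples:
        if positive:
            # Drop general hypotheses that reject the positive example.
            G = [g for g in G if covers(g, instance)]
            # Minimally generalize S so that it covers the example.
            if S is None:
                S = tuple(instance)
            else:
                S = tuple(s if s == v else "?" for s, v in zip(S, instance))
        else:
            if S is None:
                raise ValueError("this sketch needs a positive example first")
            if covers(S, instance):
                raise ValueError("no conjunctive hypothesis fits the data")
            specialized = set()
            for g in G:
                if not covers(g, instance):
                    specialized.add(g)  # already excludes the negative example
                    continue
                # Minimal specializations: fix one wildcard to S's value
                # wherever that value differs from the negative example.
                for i in range(n):
                    if g[i] == "?" and S[i] != "?" and S[i] != instance[i]:
                        specialized.add(g[:i] + (S[i],) + g[i + 1:])
            # Keep only the maximally general members.
            G = sorted(g for g in specialized
                       if not any(o != g and more_general(o, g) for o in specialized))
    return S, G

# (Size, Color, Trunk, Legs) -> Elephant, copied from the table above.
elephant_data = [
    (("Large", "Gray", "Yes", "4"), True),
    (("Medium", "Gray", "Yes", "4"), False),
    (("Large", "Brown", "Yes", "4"), False),
    (("Large", "Gray", "No", "4"), False),
    (("Large", "Gray", "Yes", "4"), True),
]

S_final, G_final = candidate_elimination(elephant_data)
print("S:", S_final)
print("G:", G_final)
```

Because Legs is 4 in every row, the data cannot distinguish between requiring 4 legs and ignoring the attribute, which is why the two boundaries differ only in that position.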

9. Apply the Find-S algorithm to determine the hypothesis for the following training data:
Example  Sky    Temperature  Humidity  Wind    Water  EnjoySport
1        Sunny  Warm         Normal    Strong  Warm   Yes
2        Sunny  Warm         High      Strong  Warm   Yes
3        Rainy  Cold         High      Strong  Warm   No
4        Sunny  Warm         High      Strong  Cool   Yes
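
The Find-S procedure for this table can be sketched in a few lines of Python. The function name `find_s` and the use of "?" as a wildcard are assumptions of this sketch. It starts from the first positive example and generalizes an attribute to "?" whenever a later positive example disagrees.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positives.

    examples: list of (attribute_tuple, label) pairs; label is True for "Yes".
    """
    hypothesis = None  # None stands for the most specific hypothesis (covers nothing)
    for attributes, label in examples:
        if not label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)
        else:
            # Keep values that agree, generalize the rest to '?'.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# (Sky, Temperature, Humidity, Wind, Water) -> EnjoySport, from the table above.
enjoy_sport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool"), True),
]

enjoy_hypothesis = find_s(enjoy_sport)
print(enjoy_hypothesis)  # ['Sunny', 'Warm', '?', 'Strong', '?']
```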
10. Compare and contrast the four types of machine learning algorithms.
11. Consider a training dataset of 4 instances containing details of students' performance
and their likelihood of getting a job offer in their final semester.
Apply the Find-S algorithm.
CGPA  Interactiveness  Practical Knowledge  Communication Skills  Logical Thinking  Interest  Job Offer
>=9   Yes              Excellent            Good                  Fast              Yes       Yes
>=9   Yes              Good                 Good                  Fast              Yes       Yes
>=8   No               Good                 Good                  Fast              No        No
>=9   Yes              Good                 Good                  Slow              No        Yes

12. Discuss methods to prevent overfitting and underfitting with their advantages and
limitations in machine learning.
13. Identify the key challenges in designing a well-posed learning system. Propose potential
solutions to these challenges.
14. Propose an evaluation framework for determining the effectiveness of a learning system
in a real-world scenario (e.g., fraud detection in banking).
15. Analyze the issues involved in designing a learning system for autonomous vehicles.
Discuss key challenges and solutions.
MODULE-2
1. Apply the concepts of entropy and information gain to build a simple decision tree for the following dataset:
Study Hours  Extracurricular  Placement Offer
4            Yes              Yes
6            Yes              Yes
2            No               No
5            No               Yes
3            Yes              No
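
The entropy and information-gain calculations for the placement dataset can be checked with a short Python sketch. The helper names and the 3.5-hour threshold used to turn Study Hours into a binary feature are assumptions made here for illustration and are not given in the question.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

study_hours = [4, 6, 2, 5, 3]
extracurricular = ["Yes", "Yes", "No", "No", "Yes"]
placement = ["Yes", "Yes", "No", "Yes", "No"]

HOURS_THRESHOLD = 3.5  # assumed split point, not part of the question
hours_above = [h > HOURS_THRESHOLD for h in study_hours]

gain_hours = information_gain(hours_above, placement)
gain_extra = information_gain(extracurricular, placement)
print(f"Gain(Study Hours > {HOURS_THRESHOLD}) = {gain_hours:.3f}")
print(f"Gain(Extracurricular) = {gain_extra:.3f}")
```

Under that assumed threshold, the Study Hours split separates the classes perfectly, so it would be chosen as the root of the tree.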
2. Analyze the following confusion matrix for a binary classification problem and calculate the accuracy,
precision, recall, and F1-score:
                 Predicted Positive  Predicted Negative
Actual Positive  50                  10
Actual Negative  5                   35
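
As a quick check of the arithmetic, the four metrics for the confusion matrix in this question can be computed directly from the cell counts. The variable names are illustrative.

```python
# Cell counts read from the confusion matrix in the question.
tp, fn = 50, 10   # actual positive row
fp, tn = 5, 35    # actual negative row

accuracy = (tp + tn) / (tp + fn + fp + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
# accuracy=0.8500 precision=0.9091 recall=0.8333 f1=0.8696
```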

3. Evaluate the effectiveness of a logistic regression model for a binary classification problem by examining the
ROC curve and AUC score.
4. Create a detailed implementation plan for using a Support Vector Machine to solve a real-world classification
problem (e.g., spam email classification, image recognition).
5. Describe how a non-linear SVM classifier is constructed using kernel functions.
6. Construct a regression tree using the following dataset, which consists of 10 data instances and 3 attributes:
"Assessment", "Assignment", and "Project". The target attribute is "Result", which is a continuous attribute.

Assessment  Assignment  Project  Result
Good        Yes         Yes      95
Average     Yes         No       70
Good        No          Yes      75
Poor        No          No       45
Good        Yes         Yes      98
Average     No          Yes      80
Good        No          No       75
Poor        Yes         Yes      65
Average     No          No       58
Good        Yes         Yes      89

7. Design an algorithm to optimize the choice of K in the KNN algorithm based on cross-validation. Provide
pseudocode and explain its implementation.
8. Analyze the linear support vector machine for classification on a 2D dataset. Explain the concept of the
hyperplane and how the support vectors contribute to finding the optimal boundary.
9. A healthcare company uses logistic regression to predict whether a patient has a particular disease based on
symptoms. Evaluate the model's effectiveness using the confusion matrix metrics: Precision, Recall, F1-score, and
ROC curve. Provide hypothetical values for demonstration.

10. Apply multiple linear regression to the following values, where weekly sales and the sales of products X1 and X2
are provided. Use the matrix approach to find the multiple regression equation.
X1 (Product one sales)  X2 (Product two sales)  Y (Output weekly sales, in thousands)
1                       4                       1
2                       5                       6
3                       8                       8
4                       2                       12
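
One way to check a hand calculation for this question is to solve the normal equations (XᵀX)β = Xᵀy numerically. The sketch below does so in plain Python with a small Gauss-Jordan solver. The helper `solve_linear_system` is hypothetical, and the design matrix is assumed to include an intercept column of ones.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

x1 = [1, 2, 3, 4]
x2 = [4, 5, 8, 2]
y = [1, 6, 8, 12]

# Design matrix with an intercept column (assumed model: Y = b0 + b1*X1 + b2*X2).
X = [[1.0, a, b] for a, b in zip(x1, x2)]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(3)]

b0, b1, b2 = solve_linear_system(XtX, Xty)
print(f"Y = {b0:.4f} + {b1:.4f}*X1 + {b2:.4f}*X2")
```

With these four rows the fitted coefficients come out at roughly b0 = -1.70, b1 = 3.48, and b2 = -0.05.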
11. Evaluate the use of stepwise regression for selecting variables in a multiple linear regression model. What
are the potential benefits and limitations of this approach?

12. Explain the difference between simple and multiple linear regression. When would you use each?
13. Consider the training dataset given in the following table. Use weighted k-NN to
determine the class of the test instance (7.6, 60, 8) with K=3.

S.No.  CGPA  Assessment  Project Submitted  Result
1.     9.2   85          8                  Pass
2.     8     80          7                  Pass
3.     8.5   81          8                  Pass
4.     6     45          5                  Fail
5.     6.5   50          4                  Fail
6.     8.2   72          7                  Pass
7.     5.8   38          5                  Fail
8.     8.9   91          9                  Pass
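
A minimal sketch of weighted k-NN for this table follows. It assumes Euclidean distance on the raw (unscaled) features and inverse-distance weights, which is one common weighting scheme; the question does not specify either choice. The function name `weighted_knn` is illustrative.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+

# (CGPA, Assessment, Project Submitted) -> Result, from the table above.
students = [
    ((9.2, 85, 8), "Pass"),
    ((8.0, 80, 7), "Pass"),
    ((8.5, 81, 8), "Pass"),
    ((6.0, 45, 5), "Fail"),
    ((6.5, 50, 4), "Fail"),
    ((8.2, 72, 7), "Pass"),
    ((5.8, 38, 5), "Fail"),
    ((8.9, 91, 9), "Pass"),
]

def weighted_knn(train, query, k):
    """Classify query by inverse-distance-weighted vote of its k nearest neighbours."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        # Assumes the query never coincides exactly with a training point.
        votes[label] += 1.0 / dist(features, query)
    return max(votes, key=votes.get)

student_class = weighted_knn(students, (7.6, 60, 8), k=3)
print(student_class)  # Fail
```

Because the features are not rescaled, the Assessment score dominates the distance. The three nearest rows are 5, 6, and 4, and the two Fail neighbours outweigh the single Pass.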
14. Use the K-Nearest Neighbor classifier to predict whether a patient is diabetic, given the features BMI and Age.
The training examples are:

BMI   Age  Sugar
33.6  50   1
26.6  30   0
23.4  40   0
43.1  67   0
35.3  23   1
35.9  67   1
36.7  45   1
25.7  46   0
23.3  29   0
31    56   1
Assume K=3. Test example: BMI=43.6, Age=40, Sugar=?
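
A small Python sketch of plain (unweighted) k-NN for this question, assuming Euclidean distance on the raw BMI and Age values and a simple majority vote. The function name `knn_predict` is illustrative.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# (BMI, Age) -> Sugar, from the training examples above.
patients = [
    ((33.6, 50), 1), ((26.6, 30), 0), ((23.4, 40), 0), ((43.1, 67), 0),
    ((35.3, 23), 1), ((35.9, 67), 1), ((36.7, 45), 1), ((25.7, 46), 0),
    ((23.3, 29), 0), ((31.0, 56), 1),
]

def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

predicted_sugar = knn_predict(patients, (43.6, 40), k=3)
print(predicted_sugar)  # 1
```

The three nearest neighbours are (36.7, 45), (33.6, 50), and (25.7, 46), with labels 1, 1, and 0, so the majority vote is 1.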

15. Design a hybrid model combining K-Nearest Neighbors and Support Vector Machines for classification
tasks. Explain how they would work together and under what scenarios your model might perform better.
16. Evaluate the trade-offs involved in choosing a kernel function for a specific problem and its effect on model
complexity and performance.
17. Construct a decision tree based on ID3 and list the steps to construct a decision tree.
18. What is the role of the margin in SVM, and how does it affect the model’s generalization?

MODULE-3

1. Design a probabilistic model using the Bayesian framework for spam email detection.
2. Assess the impact of prior probabilities on Bayesian learning outcomes.
3. Compare and contrast the Maximum Likelihood and MDL principles.
4. Create a Bayesian Belief Network to predict the likelihood of project success based on parameters like
budget, team size, and timeline.

5. Apply the Gibbs Algorithm to generate samples from a joint distribution in a simple example (e.g., a
2-variable Gaussian distribution).

6. Consider the scenario in which 'Ayden attends classes' and 'Ayden reads daily' have a direct effect on
'Ayden writes exam'. The event 'Ayden writes exam' has a direct effect on his scoring marks and getting
a job. This scenario is modelled with a Bayesian network, which uses the joint probability distribution for each
variable considering only a subset of it in the network. Find the probability that he does not read
daily, attends classes, writes exam and gets a job.

Network structure: Reads Daily (RD) -> Writes Exam (WE) <- Attends Classes (AC); Writes Exam -> Scores Marks (SM); Writes Exam -> Gets Job (GJ)

P(RD)  P(¬RD)
0.4    0.6

P(AC)  P(¬AC)
0.6    0.4

AC  RD  P(WE | AC and RD)  P(¬WE | AC and RD)
T   T   0.9                0.1
T   F   0.6                0.4
F   T   0.5                0.5
F   F   0.1                0.9

WE  P(SM | WE)  P(¬SM | WE)
T   0.8         0.2
F   0.0         1

WE  P(GJ | WE)  P(¬GJ | WE)
T   0.7         0.3
F   0.1         0.9
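
The requested probability follows from the chain-rule factorization of the network: P(¬RD, AC, WE, GJ) = P(¬RD) · P(AC) · P(WE | AC, ¬RD) · P(GJ | WE), with Scores Marks summed out because it is not mentioned in the query. The sketch below assumes the first column of each table is the probability of the event being true (for example P(RD) = 0.4 and P(AC) = 0.6), which is the most natural reading of the flattened tables.

```python
# Priors (assumed reading: first table column is the "true" probability).
p_rd = 0.4   # P(Reads Daily)
p_ac = 0.6   # P(Attends Classes)

# P(WE = True | AC, RD), keyed by (AC, RD).
p_we_given = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}

# P(GJ = True | WE), keyed by WE.
p_gj_given_we = {True: 0.7, False: 0.1}

# P(not RD, AC, WE, GJ); Scores Marks sums to 1 and drops out.
joint = (1 - p_rd) * p_ac * p_we_given[(True, False)] * p_gj_given_we[True]
print(round(joint, 4))  # 0.1512
```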
7. Suppose you have a dataset with continuous features. How would you compute the Bayes Optimal
Classifier in such a case, and what are the challenges you would face compared to working with discrete
features?
8. Describe the Gibbs Sampling algorithm in the context of probabilistic graphical models. How does it
help in estimating parameters when the direct computation is intractable?

9. Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake
has occurred, and David and Sophia both called Harry, by using a Bayesian network and creating a
directed acyclic graph:

10. Discuss the Minimum Description Length (MDL) principle. How does it help in model selection?
11. Assess the performance of Gibbs Algorithm compared to other probabilistic inference methods.
12. Define Bayes’ Theorem and explain its significance in probabilistic learning.
13. Explain the Maximum Likelihood Estimation (MLE) method with an example.
14. What is the Minimum Description Length (MDL) principle, and how does it relate to probabilistic
learning?
15. Derive the formula for the Bayes Optimal Classifier.
16. Write detailed notes on Bayesian Belief Networks.

MODULE-4
1. Compare and contrast Bagging, Pasting, and Boosting in terms of bias-variance trade-off.

2. Construct a Voting Classifier using logistic regression, SVM, and decision trees.

3. Judge the utility of Random Forests in handling datasets with missing values.

4. Illustrate the process of stacking in ensemble learning with a clear example of how base learners
and meta-learners interact.
5. Apply the K-means clustering algorithm to the following dataset with 6 points:
Points: (2, 3), (3, 3), (6, 6), (8, 8), (9, 9), (10, 10). Assume k=2 and initial centroids at (2, 3) and (10, 10). Show
iterations of the clustering process.
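
The iterations for this question can be traced with a short Python sketch. It assumes Euclidean distance and stops when the centroids no longer change. The function name `k_means` is illustrative.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    for iteration in range(1, max_iterations + 1):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        print(f"iteration {iteration}: clusters={clusters} centroids={new_centroids}")
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(2, 3), (3, 3), (6, 6), (8, 8), (9, 9), (10, 10)]
final_centroids, final_clusters = k_means(points, [(2, 3), (10, 10)])
```

On this data the assignments settle after the first pass: (2, 3), (3, 3), and (6, 6) form one cluster with centroid (11/3, 4), the remaining three points form the other with centroid (9, 9), and the second pass only confirms convergence.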

6. Perform dimensionality reduction using PCA on the following dataset and visualize the reduced dimension.
Feature  Ex1  Ex2  Ex3  Ex4
X1       4    8    13   7
X2       11   4    5    14
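
The PCA steps for this two-feature dataset can be sketched in plain Python using the closed-form eigen-decomposition of a 2×2 covariance matrix. The sketch assumes the sample covariance (dividing by n − 1) and omits plotting. The sign of the principal component is arbitrary, so the projected values may appear with flipped signs in a hand calculation.

```python
from math import sqrt

x1 = [4, 8, 13, 7]
x2 = [11, 4, 5, 14]
n = len(x1)

# Step 1: centre each feature on its mean.
mean1, mean2 = sum(x1) / n, sum(x2) / n
c1 = [v - mean1 for v in x1]
c2 = [v - mean2 for v in x2]

# Step 2: sample covariance matrix [[var1, cov12], [cov12, var2]].
var1 = sum(a * a for a in c1) / (n - 1)
var2 = sum(b * b for b in c2) / (n - 1)
cov12 = sum(a * b for a, b in zip(c1, c2)) / (n - 1)

# Step 3: eigenvalues of the symmetric 2x2 matrix via trace and determinant.
trace = var1 + var2
det = var1 * var2 - cov12 ** 2
root = sqrt(trace ** 2 / 4 - det)
lambda1, lambda2 = trace / 2 + root, trace / 2 - root

# Step 4: unit eigenvector for the largest eigenvalue (valid because cov12 != 0).
norm = sqrt(cov12 ** 2 + (lambda1 - var1) ** 2)
pc1 = (cov12 / norm, (lambda1 - var1) / norm)

# Step 5: project the centred data onto the first principal component.
projected = [a * pc1[0] + b * pc1[1] for a, b in zip(c1, c2)]

print("eigenvalues:", round(lambda1, 4), round(lambda2, 4))
print("first component:", tuple(round(v, 4) for v in pc1))
print("projected:", [round(p, 4) for p in projected])
```

The first component carries lambda1 / (lambda1 + lambda2), about 82% of the total variance, which is the usual justification for keeping a single dimension here.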

7. Differentiate between linear and non-linear SVMs with appropriate examples.


8. Analyze the limitations of K-means clustering and suggest alternative clustering algorithms that could overcome
these limitations.
9. Evaluate how combining clustering techniques with dimensionality reduction methods like PCA can enhance the
analysis of high-dimensional data.
10. Explain the K-means clustering algorithm.
11. Define clustering in the context of unsupervised learning.
12. Describe the steps involved in performing PCA on a dataset.
13. Explain how the Random Forest algorithm works, with examples.
14. Identify how Bagging reduces variance and why it is useful for high-variance models like decision trees.
15. Differentiate between hard voting and soft voting in ensemble methods.

MODULE-5
1. Discuss the advantages and limitations of dynamic programming in reinforcement learning tasks.
2. Design a policy for an agent using active reinforcement learning in a maze navigation task.
3. Analyze the impact of non-deterministic rewards on the learning process in reinforcement learning.
4. Apply Q-learning to solve a maze-navigation problem where the agent needs to find the shortest path.
Show the step-by-step computation of the Q-values.
5. Design an experiment to compare the efficiency of Active Reinforcement Learning versus Passive
Reinforcement Learning in a stochastic environment. Critically assess your results.
6. Differentiate between deterministic and non-deterministic rewards in reinforcement learning.
7. Formulate this problem as a reinforcement learning task by using Q-learning:
A robot is navigating a grid-like environment to reach a target destination. The robot receives a reward of +10
for reaching the destination and -1 for each move it makes. Certain cells in the grid are obstacles that cannot be
traversed.
a. Define the state space, action space, rewards, and policy.
b. Represent the grid environment using a Q-table format.
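
One possible formulation of the robot problem as tabular Q-learning is sketched below. The 3×3 grid size, start and goal cells, obstacle position, and the learning parameters are all assumptions made for illustration, since the question leaves them open. States are grid cells, actions are the four moves, and the Q-table is a dictionary mapping each free cell to its four action values.

```python
import random

ROWS, COLS = 3, 3                 # assumed grid size
START, GOAL = (0, 0), (2, 2)      # assumed start and target cells
OBSTACLES = {(1, 1)}              # assumed blocked cell
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed learning parameters

def step(state, action):
    """Apply an action; blocked moves leave the robot in place."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS) or nxt in OBSTACLES:
        nxt = state
    reward = 10 if nxt == GOAL else -1   # +10 at the destination, -1 per other move
    return nxt, reward, nxt == GOAL

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per free cell, one column per action.
    q = {(r, c): {a: 0.0 for a in ACTIONS}
         for r in range(ROWS) for c in range(COLS) if (r, c) not in OBSTACLES}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(q[state], key=q[state].get)
            nxt, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action.
            target = reward if done else reward + GAMMA * max(q[nxt].values())
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_path(q, limit=20):
    """Follow the greedy policy from START until GOAL or the step limit."""
    path, state = [START], START
    while state != GOAL and len(path) <= limit:
        action = max(q[state], key=q[state].get)
        state, _, _ = step(state, action)
        path.append(state)
    return path

q_table = train()
robot_path = greedy_path(q_table)
print(robot_path)
for cell, values in sorted(q_table.items()):
    print(cell, {a: round(v, 2) for a, v in values.items()})
```

The policy in part (a) is then the greedy choice over each row of the printed table. The -1 move cost is what pushes the learned policy toward the shortest route around the obstacle.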

8. Explain temporal-difference (TD) learning and how it is different from Monte Carlo methods.
9. Differentiate between deterministic and non-deterministic rewards in reinforcement learning.
10. How are dynamic programming methods like value iteration and policy iteration
related to reinforcement learning?
11. You are tasked with training a self-driving car using RL. How would you define the
learning task (goal) for such an agent?
12. Compare tabular methods and generalization methods in reinforcement learning.
13. Describe the agent-environment interaction framework in reinforcement learning.
14. Examine how Temporal-Difference learning differs from other learning approaches
such as Monte Carlo methods.
15. Explain how the expected reward is calculated in the presence of non-deterministic rewards.
