Exam in DD2421 Machine Learning
2023-03-16, 14.00 – 18.00
Aids allowed: calculator, language dictionary.
To take this exam you must be registered for this specific exam as well as for the course.
In order to pass this exam, your score x first needs to be 20 or more (out of a maximum of 42 points).
In addition, given your points y from the Programming Challenge (out of a maximum of 18 points), the
requirements on the total points, p = x + y, are preliminarily set for different grades as:
54 < p ≤ 60 → A
48 < p ≤ 54 → B
42 < p ≤ 48 → C
36 < p ≤ 42 → D
29 < p ≤ 36 → E (A pass is guaranteed with the required points for 'E'.)
0 ≤ p ≤ 29 → F
This exam consists of sections A, B, and C. NB. Use different papers (answer sheets) for
different sections.
A Graded problems
Potential inquiries to be addressed to Atsuto Maki.
A-1 Terminology (4p)
For each term (a–d) in the left list, find the explanation from the right list which best describes
how the term is used in machine learning.
a) Dropout
b) RANSAC
c) Occam's razor
d) k-fold cross validation

1) An approach to find useful dimensions for classification
2) Random strategy for area compression
3) An approach to train artificial neural networks
4) Algorithm to learn with latent variables
5) A robust method to fit a model to data with outliers
6) A technique for margin maximization
7) A technique for assessing a model by exploiting available data for training and testing
8) A principle to choose the simplest explanation
A-2 Nearest Neighbor, Classification (4p)
Suppose that we take a data set, divide it into training and test sets, and then try out two different classification procedures. We use three-quarters of the data for training and the remaining one-quarter for testing. First we use AdaBoost and get an error rate of 10% on the training data; we also get an average error rate of 12% (weighted average over both the training and test data sets). Next we use k-nearest neighbor (where k = 1) and get an average error rate of 10% (weighted average over both the training and test data sets).

Answer the following questions, motivating each answer.
a) What was the error rate with 1-nearest neighbor on the training set? (1p)
b) What was the error rate with 1-nearest neighbor on the test set? (1p)
c) What was the error rate with AdaBoost on the test set? (1p)
d) Based on these results, indicate which method we should prefer for classifying new observations, with brief reasoning. (1p)
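For concreteness, a weighted average error under a 75/25 split combines the per-split error rates in proportion to the split sizes. A minimal Python sketch, with placeholder rates rather than the figures from the question:

```python
# How a weighted average error under a 75/25 split combines the per-split
# rates; the inputs below are placeholders, not the figures from the question.
def weighted_average_error(train_err, test_err, train_frac=0.75):
    """Average error weighted by the share of data in each split."""
    return train_frac * train_err + (1.0 - train_frac) * test_err

print(weighted_average_error(0.10, 0.20))  # 0.125, i.e. 12.5%
```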
Figure 1. Playing cards consisting of 52 patterns.
A-3 Entropy and Decision Trees/Forests (6p)
a) Consider decision trees, and indicate the correct basic strategy for selecting a question (attribute) at each node:
i. To minimize the expected reduction of the entropy.
ii. To maximize the expected reduction of the entropy.
iii. To minimize the expected reduction of gini impurity.
Simply indicate your choice. (1p)
b) Briefly explain the two kinds of randomness involved in the design of Decision Forests. (2p)
Now imagine that you are playing with cards and randomly sample four cards out of the pile of 52 cards (see Figure 1) with replacement, i.e. you sequentially draw a card but return it to the pile each time you have seen what it is.
c) At each instance of drawing a card, what is the Shannon information content of the outcome with respect to the suit, one of {Spades, Hearts, Diamonds, Clubs}, measured in bits? Motivate your answer. (1p)
d) You play a game with a rule that you win if the suits of all four cards are of the same colour, either black or red; otherwise you lose. With respect to the outcome of this game, what is the expected information gain from drawing the first two cards, i.e. by seeing (the suit colours of) the first and second cards? Motivate your answer. (2p)
Note: if you do not have a calculator, answer with an expression but simplify it as much as
possible.
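For reference, the Shannon information content of an outcome with probability p is −log₂ p bits, and entropy is the expected information content. A minimal Python sketch of these two quantities, with placeholder probabilities (the specific values for parts c) and d) are left to the reader):

```python
import math

# Shannon information content of an outcome with probability p, in bits,
# and the entropy of a binary outcome; probabilities here are placeholders.
def information_content_bits(p):
    return -math.log2(p)

def binary_entropy_bits(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(information_content_bits(0.5))  # a fair coin flip carries 1.0 bit
print(binary_entropy_bits(0.3))       # entropy of a biased binary outcome
```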
A-4 Regression with regularization (4p)
For a set of N training samples $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$, each consisting of an input vector $\mathbf{x}$ and output $y$, suppose we estimate the regression coefficients $\mathbf{w} \in \mathbb{R}^d$, $\mathbf{w}^\top = (w_1, \ldots, w_d)$, in a linear regression model by minimizing

$$\sum_{n=1}^{N} \left( y_n - \mathbf{w}^\top \mathbf{x}_n \right)^2 + \lambda \sum_{i=1}^{d} w_i^2$$
for a particular value of λ.
Now, let us consider different models trained with different values of λ, starting from a very
large value (infinity) and decreasing it down to 0. Then, for parts a) through c), indicate which of
i. through v. is most likely to be the case.
Briefly justify each of your answers.
a) As we decrease λ, the variance of the model will:
i. Remain constant.
ii. Steadily increase.
iii. Steadily decrease.
iv. Decrease initially, and then eventually start increasing in a U shape.
v. Increase initially, and then eventually start decreasing in an inverted U shape. (1p)
b) Repeat a) for the training error (residual sum of squares, RSS). (1p)
c) Repeat a) for test RSS. (2p)
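For intuition, the objective above is ridge regression, which has the closed-form solution $\mathbf{w} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$. A minimal sketch on synthetic data, sweeping λ from very large down to zero and printing the training RSS:

```python
import numpy as np

# Ridge regression via the closed form w = (X^T X + lambda I)^{-1} X^T y,
# on synthetic data, sweeping lambda from very large down to zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=100)

for lam in (1e3, 1.0, 0.0):
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    rss = np.sum((y - X @ w) ** 2)    # training RSS at this lambda
    print(f"lambda={lam:g}  training RSS={rss:.1f}")
```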
A-5 PCA, Subspace Methods (4p)
We consider solving a K-class classification problem with the Subspace Method, and for that we compute a subspace $L^{(j)}$ $(j = 1, \ldots, K)$ using the training data of each class, respectively. That is, given a set of feature vectors (as training data) which belong to a specific class C (i.e. with an identical class label), we perform PCA on them and generate an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ which spans a p-dimensional subspace, L, as the outcome.

Answer the following questions.
a) Given that we compute $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ as eigenvectors of the auto-correlation matrix Q based on the training data, how should we choose the eigenvectors in relation to the corresponding eigenvalues of Q? (1p)
b) We have a new input vector $\mathbf{x}$ whose class is unknown, and consider its projection length on each $L^{(j)}$. Describe how the projection length is represented, using a simple formula. (2p)
c) Given $\mathbf{x}$, we computed its projection length on each subspace as $S^{(j)}$ $(j = 1, \ldots, K)$, respectively. For a few classes among those with labels {l, m, n}, we had the following observations:

$S^{(l)}$ was the minimum of all $S^{(j)}$'s,
$S^{(m)}$ was the maximum of all $S^{(j)}$'s, and
$S^{(n)}$ was the closest to the average of all $S^{(j)}$'s.
Which class should x belong to? Simply choose a class label. (1p)
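As a companion to this problem, a minimal sketch of one common formulation of the Subspace Method, stated here as an assumption rather than as the intended answer:

```python
import numpy as np

# One common formulation: build a per-class basis from eigenvectors of the
# autocorrelation matrix, then measure how much of a new vector lies in
# the spanned subspace.
def class_subspace(X, p):
    """X: (n_samples, d) training vectors of one class; returns a (d, p) basis."""
    Q = X.T @ X / X.shape[0]              # autocorrelation matrix Q
    eigvals, eigvecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
    return eigvecs[:, -p:]                # basis vectors for the subspace

def projection_length(U, x):
    """Squared norm of the projection of x onto span(U), for orthonormal U."""
    return np.sum((U.T @ x) ** 2)

X = np.random.default_rng(0).normal(size=(40, 8))  # toy class data
U = class_subspace(X, p=3)
print(projection_length(U, X[0]))
```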
B Graded problems
Potential inquiries to be addressed to Bob Sturm.
B-1 Identify Bayes’ Theorem (1p)
For two events A and B, which of the following is Bayes’ Theorem:
a) P(A|B) = P(A, B)P(A)/P(B)
b) P(A|B) = P(B|A)P(A, B)/P(B)
c) P(A|B) = P(B|A)P(A)/P(B)
d) P(A|B) = P(A, B)/P(B)
B-2 Conditioning (1p)
For three events A, B, and C, which of the following are not equivalent to P(A, B, C):
a) P(A, C|B)P(B)
b) P(A|B, C)P(B, C)
c) P(A, B|C)P(B)
d) P(A|C, B)P(B|C)P(C)
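Identities like those in B-1 and B-2 can be checked numerically by brute force on a small joint distribution. A minimal sketch, with all values illustrative:

```python
import numpy as np

# A brute-force check of a proposed factorization against P(A,B,C),
# using a random joint table over three binary events. Illustrative only.
rng = np.random.default_rng(1)
P = rng.random((2, 2, 2))
P /= P.sum()                      # joint distribution P(A, B, C)

a, b, c = 1, 1, 0                 # an arbitrary outcome
P_c = P[:, :, c].sum()            # P(C=c): marginalize A and B
P_bc = P[:, b, c].sum()           # P(B=b, C=c): marginalize A
chain = (P[a, b, c] / P_bc) * (P_bc / P_c) * P_c   # P(A|B,C) P(B|C) P(C)
print(np.isclose(chain, P[a, b, c]))               # True: chain rule holds
```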
B-3 Shoogee picks (2p)
Our dog Shoogee¹ (pictured in Figure 2) has a big box of N balls, all identical except for color: one is red and the others are brown. But since Shoogee is colorblind she can't see a difference. She likes to take the balls out one by one at random and lay them beside the box. Find an expression for the expected number of balls Shoogee takes out of the box before taking out the one red ball. Show your work.

¹ About the name. Being non-native Swedish speakers, we were amused hearing people on television saying "tjugo-noll-noll". Except they don't say "shoogo", but "shoogee". "Shoogo" sounds like a cleaning product. "Shoogee" sounds like a dog's name. So we named her "Shoogee Nulnul".
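A Monte Carlo check of whatever closed-form expression you derive is straightforward; a minimal sketch, with ball 0 standing in for the red ball:

```python
import random

# Monte Carlo check for the derived expression: balls are labelled
# 0..N-1 with ball 0 standing in for the red one.
def draws_before_red(N):
    order = list(range(N))
    random.shuffle(order)
    return order.index(0)   # balls taken out before the red ball appears

N, trials = 10, 100_000
estimate = sum(draws_before_red(N) for _ in range(trials)) / trials
print(estimate)             # compare with your closed-form expression at N=10
```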
B-4 Inference by Minimizing Expected Loss (3p)
A good strategy for probabilistic inference given an observation x is to choose the value $y \in \mathcal{Y}$ that minimizes the expected loss:

$$y^*(x) = \arg\min_{y' \in \mathcal{Y}} E_{Y|X=x}\left[L(y, y')\right] = \arg\min_{y' \in \mathcal{Y}} \int_{y \in \mathcal{Y}} L(y, y')\, \Pr(y \mid X = x)\, dy \qquad (1)$$

where $L(y, y')$ is a loss function. Define $L(y, y') = (y - y')^2$ and assume $Y \mid X = x \sim \mathrm{Unif}[0, 2x]$, $x > 0$, that is, a uniform distribution over the interval $[0, 2x]$ for all positive x. Derive a formula for $y^*(x)$ in this case. Show your work.
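The derived $y^*(x)$ can be sanity-checked numerically by sampling from $\mathrm{Unif}[0, 2x]$ and scanning candidate values; a minimal sketch with an arbitrary positive x:

```python
import numpy as np

# Numeric sanity check: sample Y | X=x ~ Unif[0, 2x] for an arbitrary
# positive x, then scan candidate values y' for the one minimizing the
# mean squared loss. Compare the printed value with the derived y*(x).
x = 3.0
rng = np.random.default_rng(2)
y_samples = rng.uniform(0.0, 2.0 * x, size=50_000)

candidates = np.linspace(0.0, 2.0 * x, 241)
losses = [np.mean((y_samples - yp) ** 2) for yp in candidates]
print(candidates[int(np.argmin(losses))])
```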
B-5 Regression (1p)
Figure 2. Shoogee (dog) with one of her favorite things in the world.
Consider you have modelled $D = \{(x, y)_i : x, y \in \mathbb{R}\}$ with

$$y = w_0 + w_1 x^2 + \epsilon \qquad (2)$$

where $\epsilon \sim N(0, \sigma^2)$ and $w_0 < 0$. Graph a solution in the x-y plane for $w_1 < 0$ and a solution for $w_1 > 0$. Label the intersection on the y-axis.
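If you want to check your sketch afterwards, the noise-free curves can be plotted directly; a minimal matplotlib sketch with placeholder coefficient values (not prescribed by the question):

```python
import numpy as np
import matplotlib.pyplot as plt

# The noise-free shape of y = w0 + w1 * x^2 for placeholder coefficients
# with w0 < 0; the exact values are illustrative, not prescribed above.
x = np.linspace(-3, 3, 200)
w0 = -1.0
for w1 in (-0.5, 0.5):            # one curve for w1 < 0, one for w1 > 0
    plt.plot(x, w0 + w1 * x**2, label=f"w1 = {w1}")
plt.axhline(0.0, color="gray", linewidth=0.5)
plt.legend()
plt.show()                        # both curves cross the y-axis at y = w0
```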
B-6 Probabilistic classification (1p)
I have a dataset of labeled feature vectors, $D = \{(\mathbf{x}, y)_i\}$, where $y \in \{0, 1\}$ and $\mathbf{x} = (x_1, x_2, x_3)$ is three-dimensional. It looks like the first element of $\mathbf{x}$ is one of three characters {r, g, b}; the second element is one of two characters {y, n}; and the third element is a positive number. Which of the following is the Naive Bayes classifier for an observation $\mathbf{x}$?

a) $y(\mathbf{x}) = \arg\max_{y \in \{0,1\}} \Pr(x_1, x_2, x_3 \mid y)\Pr(y)$
b) $y(\mathbf{x}) = \arg\max_{y \in \{0,1\}} \Pr(x_1 \mid y)\Pr(x_2 \mid y)\Pr(x_3 \mid y)\Pr(y)$
c) $y(\mathbf{x}) = \arg\max_{y \in \{0,1\}} \Pr(y \mid x_1)\Pr(y \mid x_2)\Pr(y \mid x_3)\Pr(y)$
d) $y(\mathbf{x}) = \arg\max_{y \in \{0,1\}} \Pr(y \mid x_1, x_2, x_3)\Pr(y)$
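For context, the "naive" assumption is that features are conditionally independent given the class, so each feature gets its own class-conditional model. A minimal sketch for mixed feature types, where the categorical tables and the Gaussian for the positive-valued feature are placeholder assumptions:

```python
import math

# Per-feature class-conditional models under the conditional-independence
# assumption: categorical tables for x1, x2 and (as an assumption) a
# Gaussian for the positive number x3. All probability values are placeholders.
p_x1 = {0: {"r": 0.5, "g": 0.3, "b": 0.2}, 1: {"r": 0.1, "g": 0.4, "b": 0.5}}
p_x2 = {0: {"y": 0.7, "n": 0.3}, 1: {"y": 0.2, "n": 0.8}}
gauss = {0: (1.0, 0.5), 1: (2.0, 0.7)}   # (mean, std) of x3 per class
prior = {0: 0.6, 1: 0.4}

def gaussian_pdf(v, mu, sigma):
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def classify(x1, x2, x3):
    score = lambda y: p_x1[y][x1] * p_x2[y][x2] * gaussian_pdf(x3, *gauss[y]) * prior[y]
    return max((0, 1), key=score)

print(classify("r", "y", 1.2))
```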
B-7 Learning by Minimizing the Minimum Expected Loss (3p)
A good strategy for probabilistic learning is to choose the parameters $\theta \in \Theta$ of the conditional distribution $\Pr(y \mid X = x, \theta)$ that minimize the minimum expected loss:

$$\theta^* = \arg\min_{\theta \in \Theta} \int_{x \in \mathcal{X}} \min_{y' \in \mathcal{Y}} E_{Y|X=x,\theta}\left[L(y, y')\right] \Pr(x)\, dx = \arg\min_{\theta \in \Theta} \int_{x \in \mathcal{X}} \min_{y' \in \mathcal{Y}} \left[\, \int_{y \in \mathcal{Y}} L(y, y')\, \Pr(y \mid X = x, \theta)\, dy \right] \Pr(x)\, dx \qquad (3)$$

where $L(y, y')$ is a loss function. Define $L(y, y') = (y - y')^2$ and assume $Y \mid X = x, \theta \sim \mathrm{Unif}[0, \theta x]$, $\theta > 0$, that is, a uniform distribution over the interval $[0, \theta x]$ for all positive x. Show that $\theta^* \to 0$ results. Show your work.
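The stated limit can be probed numerically: for each θ, estimate the inner minimum expected squared loss under $\mathrm{Unif}[0, \theta x]$ and average over sampled x values as a stand-in for the integral against $\Pr(x)$. A minimal sketch:

```python
import numpy as np

# Numeric probe of the stated limit: for each theta, estimate the inner
# min over y' of E[(Y - y')^2] under Y ~ Unif[0, theta * x], then average
# over sampled x values as a stand-in for the integral against Pr(x).
rng = np.random.default_rng(3)
xs = rng.uniform(0.5, 2.0, size=10)       # assumed positive inputs

for theta in (2.0, 1.0, 0.1):
    inner_minima = []
    for x in xs:
        y = rng.uniform(0.0, theta * x, size=5_000)
        cands = np.linspace(0.0, theta * x, 101)
        inner_minima.append(min(np.mean((y - c) ** 2) for c in cands))
    print(f"theta={theta}: objective ~ {np.mean(inner_minima):.5f}")
```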
C Graded problems
Potential inquiries to be addressed to Jörg Conradt.
C-1 Support Vector Classification (4p)
The following diagram shows a small data set consisting of four RED samples (A, B, C, D)
and four BLUE samples (E, F, G, H). This data set can be linearly separated.
a) We use a linear support vector machine (SVM) without a kernel function to correctly separate the BLUE and RED classes. Which of the data points (A-H) can be removed from training without changing the resulting SVM decision boundary? Name the point(s) that can be removed, and justify with KEYWORDS (or very short sentences). (2p)
b) Assume someone suggests using a non-linear kernel for the SVM classification of the above
data set (A-H). Give one argument in favor and one argument against using non-linear SVM
classification for such a data set. USE KEYWORDS! (2p)
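Since the exam's diagram is not reproduced here, the following sketch uses made-up coordinates; it shows how a (near) hard-margin linear SVM exposes its support vectors, the points whose removal could change the boundary:

```python
import numpy as np
from sklearn.svm import SVC

# A (near) hard-margin linear SVM on made-up coordinates; the exam's
# diagram is not reproduced here, so these points are stand-ins for A-H.
X = np.array([[1, 1], [1, 2], [2, 1], [2, 2],    # one class
              [4, 4], [4, 5], [5, 4], [5, 5]])   # the other class
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)      # large C ~ hard margin
print(clf.support_)   # indices of the support vectors; all other points
                      # can be dropped without moving the decision boundary
```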
C-2 Neural Networks (4p)
The following diagram shows a simple neural network with step activation functions in all
neurons. The weights for incoming signals are shown directly at the neurons.
a) Draw a 2D diagram of the input space (two-dimensional plot of X1 vs. X2) and show for which area of the input space the network produces a positive output. (2p)
b) Can this network be implemented in a single neuron with linear activation function (yes/no)?
Explain in KEYWORDS. (1p)
c) Assume all weights in the network double their value (the small numbers right next to the neuron input signals, including the weight for the constant bias input). What happens to the output? (1p)
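Since the diagram is not reproduced here, the weights below are hypothetical; the sketch shows how a two-layer network of step-activation neurons is evaluated, with a constant bias input prepended at each layer:

```python
import numpy as np

# Forward pass of a two-layer network of step-activation neurons; the
# weights are hypothetical, since the exam's diagram is not reproduced here.
step = lambda s: (np.asarray(s) >= 0).astype(float)

def forward(x1, x2, W_hidden, w_out):
    """Each row of W_hidden is (bias, w1, w2) for one hidden neuron."""
    inp = np.array([1.0, x1, x2])                     # prepend bias input
    h = step(W_hidden @ inp)                          # hidden-layer outputs
    return step(w_out @ np.concatenate(([1.0], h)))   # output neuron

W_hidden = np.array([[-1.0, 1.0, 0.0],   # fires when x1 >= 1
                     [-1.0, 0.0, 1.0]])  # fires when x2 >= 1
w_out = np.array([-1.5, 1.0, 1.0])       # AND of the two half-planes
print(forward(2.0, 2.0, W_hidden, w_out))  # 1.0 inside the region
```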