6.867 Machine Learning
                                      Mid-term exam
                                       October 8, 2003
  (2 points) Your name and MIT ID:
  Problem 1
  In this problem we use sequential active learning to estimate a linear model

                                        y = w1 x + w0 + ε,

  where the inputs (x values) are restricted to the interval [−1, 1]. The noise term ε
  is assumed to be zero-mean Gaussian with unknown variance σ². Recall that our
  sequential active learning method selects as the next query the input point whose
  predicted output has the highest variance. Figure 1 below illustrates what outputs
  would be returned for each query (the outputs are not available unless specifically
  queried).
  We start the learning algorithm by querying outputs at two input points, x = −1 and
  x = 1, and let the sequential active learning algorithm select the remaining query points.
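  As an illustration of the selection rule: for the model above, the variance of the
  predicted output at a candidate input x is proportional to [x, 1] (X^T X)^{-1} [x, 1]^T,
  where the rows of X are the feature vectors [x_i, 1] of the inputs queried so far. Below
  is a minimal sketch of the resulting query loop; the candidate grid and the number of
  selection steps are arbitrary choices of the sketch, not part of the problem.

      import numpy as np

      def predictive_variances(queried_x, candidates):
          # For y = w1*x + w0 + eps, the variance of the predicted output at a
          # candidate x is proportional to [x, 1] (X^T X)^{-1} [x, 1]^T.
          X = np.column_stack([queried_x, np.ones_like(queried_x)])
          A_inv = np.linalg.inv(X.T @ X)
          C = np.column_stack([candidates, np.ones_like(candidates)])
          return np.einsum('ij,jk,ik->i', C, A_inv, C)

      queried = np.array([-1.0, 1.0])           # the two initial queries
      grid = np.linspace(-1.0, 1.0, 201)        # candidate inputs in [-1, 1]
      for _ in range(2):                        # select the next two query points
          x_next = grid[np.argmax(predictive_variances(queried, grid))]
          print('next query:', x_next)
          queried = np.append(queried, x_next)  # its output would be queried here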
    1. (4 points) Give the next two inputs that the sequential active learning method would
       pick. Explain why.
Figure 1: Samples from the underlying relation between the inputs x and outputs y. The
outputs are not available to the learning algorithm unless specifically queried.
  2. (4 points) In Figure 1 above, draw (approximately) the linear relation between
     the inputs and outputs that the active learning method would find after a large
     number of iterations.
  3. (6 points) Would the result be any different if we started with query points x = 0
     and x = 1 and let the sequential active learning algorithm select the remaining query
     points? Explain why or why not.
Problem 2
                 [Figure 2 appears here: the average log-probability of the training labels and
                  the average log-probability of the test labels, each plotted against the
                  regularization parameter C over the range 0 to 4.]
      Figure 2: Log-probability of labels as a function of regularization parameter C
Here we use a logistic regression model to solve a classification problem. In Figure 2, we have
plotted the mean log-probability of labels in the training and test sets after having trained
the classifier with a quadratic regularization penalty and different values of the regularization
parameter C.
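The training criterion assumed here is the average log-likelihood of the labels minus a
quadratic penalty, with larger C meaning stronger regularization (the opposite convention
from, e.g., scikit-learn's C parameter). A minimal sketch of this objective follows; whether
the penalty multiplies the sum or the average log-likelihood, and whether it includes the
bias, are details assumed by the sketch.

    import numpy as np

    def avg_log_prob(w, b, X, y):
        # average log-probability of labels y in {0, 1} under a logistic model
        z = X @ w + b
        return np.mean(y * -np.logaddexp(0.0, -z) + (1 - y) * -np.logaddexp(0.0, z))

    def penalized_objective(w, b, X, y, C):
        # the quantity maximized during training: fit term minus the quadratic penalty
        return avg_log_prob(w, b, X, y) - C * np.dot(w, w)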
   1. (T/F – 2 points) In training a logistic regression model by maximizing
      the likelihood of the labels given the inputs we have multiple locally
      optimal solutions.
   2. (T/F – 2 points) A stochastic gradient algorithm for training logistic
      regression models with a fixed learning rate will find the optimal setting
      of the weights exactly.
   3. (T/F – 2 points) The average log-probability of training labels as in
      Figure 2 can never increase as we increase C.
     4. (4 points) Explain why in Figure 2 the test log-probability of labels decreases for
        large values of C.
     5. (T/F – 2 points) The log-probability of labels in the test set would
        decrease for large values of C even if we had a large number of training
        examples.
     6. (T/F – 2 points) Adding a quadratic regularization penalty for the
        parameters when estimating a logistic regression model ensures that
        some of the parameters (weights associated with the components of the
        input vectors) vanish.
  Problem 3
  Consider a training set consisting of the following eight examples, each a vector of
  three features (x1, x2, x3):
                         Examples labeled “0”    Examples labeled “1”
                              (3, 3, 0)               (2, 2, 0)
                              (3, 3, 1)               (1, 1, 1)
                              (3, 3, 0)               (1, 1, 0)
                              (2, 2, 1)               (1, 1, 1)
  The questions below pertain to various feature selection methods that we could use with
  the logistic regression model.
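  The filter criterion referred to below is the mutual information between a single feature
  and the label, I(x_i; y) = sum over (v, c) of p(v, c) log [ p(v, c) / (p(v) p(c)) ],
  estimated from the empirical counts in the table. A small sketch is given here; the data
  list simply repeats the table with the label appended as a fourth entry, and base-2
  logarithms are an arbitrary choice.

      import numpy as np
      from collections import Counter

      # the eight examples above, written as (x1, x2, x3, label)
      data = [(3, 3, 0, 0), (3, 3, 1, 0), (3, 3, 0, 0), (2, 2, 1, 0),
              (2, 2, 0, 1), (1, 1, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1)]

      def mutual_information(i):
          # I(x_i; y) = sum over (value, label) pairs of p(v,c) * log2[ p(v,c) / (p(v) p(c)) ]
          n = len(data)
          joint = Counter((ex[i], ex[3]) for ex in data)
          feat = Counter(ex[i] for ex in data)
          lab = Counter(ex[3] for ex in data)
          return sum((c / n) * np.log2(c * n / (feat[v] * lab[y]))
                     for (v, y), c in joint.items())

      for i in range(3):
          print('I(x%d; y) = %.3f bits' % (i + 1, mutual_information(i)))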
     1. (2 points) What is the mutual information between the third feature
        and the target label based on the training set?
     2. (2 points) Which feature(s) would a filter feature selection method
        choose? You can assume here that the mutual information criterion is
        evaluated between a single feature and the label.
     3. (2 points) Which two features would a greedy wrapper feature selection
        method choose?
     4. (4 points) Which features would a regularization approach with the 1-norm
        penalty |w1| + |w2| + |w3| choose? Explain briefly.
  Problem 4
     1. (6 points) Figure 3 shows the first decision stump that the AdaBoost algorithm
        finds (starting with the uniform weights over the training examples). We claim that
        the weights associated with the training examples after including this decision stump
        will be [1/8, 1/8, 1/8, 5/8] (the weights here are enumerated as in the figure). Are
        these weights correct? Why or why not?
        Do not provide an explicit calculation of the weights.
     2. (T/F – 2 points) The votes that the AdaBoost algorithm assigns to the
        component classifiers are optimal in the sense that they ensure larger
        “margins” in the training set (higher majority predictions) than any
        other setting of the votes.
     3. (T/F – 2 points) In the boosting iterations, the training error of each
        new decision stump and the training error of the combined classifier
        vary roughly in concert.
                 [Figure 3 appears here: four training points at positions 1, 2, 3, 4 with labels
                  +1, +1, −1, +1 (drawn as x, x, o, x); the stump predicts +1 on the region to the
                  left of its threshold and −1 on the region to the right.]
              Figure 3: The first decision stump that the boosting algorithm finds.
   Problem 5
   Figure 4: Training set, maximum margin linear separator, and the support vectors (in
   bold).
      1. (4 points) What is the leave-one-out cross-validation error estimate for
         maximum margin separation in Figure 4? (We are asking for a number.)
      2. (T/F – 2 points) We would expect the support vectors to remain
         the same in general as we move from a linear kernel to higher order
         polynomial kernels.
      3. (T/F – 2 points) Structural risk minimization is guaranteed to find
         the model (among those considered) with the lowest expected loss.
    4. (6 points) What is the VC-dimension of a mixture of two Gaussians model in the
       plane with equal covariance matrices? Why?
 Problem 6
  Using a set of 100 labeled training examples (two classes), we train the following models
  (a rough code sketch of the four models follows their definitions):
 GaussI A Gaussian mixture model (one Gaussian per class), where the covariance matrices
     are both set to I (identity matrix).
 GaussX A Gaussian mixture model (one Gaussian per class) without any restrictions on
     the covariance matrices.
 LinLog A logistic regression model with linear features.
 QuadLog A logistic regression model, using all linear and quadratic features.
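  For concreteness, the sketch below spells out one way these four models could be set up
  and evaluated using numpy and scikit-learn; the synthetic data, the large-C setting used
  to approximate unregularized maximum likelihood, and the small GaussIdentity helper class
  are choices of the sketch rather than part of the problem.

      import numpy as np
      from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import log_loss
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures

      class GaussIdentity:
          # one Gaussian per class, both covariance matrices fixed to the identity
          def fit(self, X, y):
              self.classes_ = np.unique(y)
              self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
              self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
              return self
          def predict_proba(self, X):
              d2 = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
              scores = self.log_priors_ - 0.5 * d2   # log prior + log N(x; mu_k, I) + const
              scores -= scores.max(axis=1, keepdims=True)
              p = np.exp(scores)
              return p / p.sum(axis=1, keepdims=True)
          def predict(self, X):
              return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

      rng = np.random.default_rng(0)                 # a stand-in for the 100 training examples
      X = rng.normal(size=(100, 2))
      y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)

      models = {
          'GaussI':  GaussIdentity(),
          'GaussX':  QuadraticDiscriminantAnalysis(),
          'LinLog':  LogisticRegression(C=1e6, max_iter=10_000),   # large C ~ unregularized
          'QuadLog': make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                                   LogisticRegression(C=1e6, max_iter=10_000)),
      }
      for name, model in models.items():
          model.fit(X, y)
          avg_log_prob = -log_loss(y, model.predict_proba(X))   # mean log-prob of training labels
          train_error = np.mean(model.predict(X) != y)
          print('%-8s avg log-prob = %.3f   training error = %.3f'
                % (name, avg_log_prob, train_error))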
    1. (6 points) After training, we measure for each model the average log probability of
       labels given examples in the training set. Specify all the equalities or inequalities that
       must always hold between the models relative to this performance measure. We are
       looking for statements like “model 1 ≤ model 2” or “model 1 = model 2”. If no such
       statement holds, write “none”.
    2. (4 points) Which equalities and inequalities must always hold if we instead use the
       mean classification error in the training set as the performance measure? Again use
       the format “model 1 ≤ model 2” or “model 1 = model 2”. Write “none” if no such
       statement holds.
Cite as: Tommi Jaakkola, course materials for 6.867 Machine Learning, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].