6.867 Machine Learning
                                 Final exam (Fall 2003)
                                      December 10, 2003
  Problem 1: your information
  1.1. Your name and MIT ID:
  1.2. The grade you would give to yourself + brief justification (if you feel that
       there’s no question your grade should be an A, then just say A):
 Problem 2
  2.1. (3 points) Let F be a set of classifiers whose VC-dimension is 5. Suppose we have
        four training examples and labels, {(x1 , y1 ), . . . , (x4 , y4 )}, and select a classifier f̂
        from F by minimizing classification error on the training set. In the absence of any
        other information about the set of classifiers F, can we say that the prediction f̂(x5 )
       for a new example x5 has any relation to the training set? Briefly justify your answer.
  2.2. (T/F – 2 points) Consider a set of classifiers that includes all linear
       classifiers that use different choices of strict subsets of the components
       of the input vectors x ∈ Rd . Claim: the VC-dimension of this combined
       set cannot be more than d + 1.
  2.3. (T/F – 2 points) Structural risk minimization is based on comparing
       upper bounds on the generalization error, where the bounds hold with
       probability 1 − δ over the choice of the training set. Claim: the value
       of the confidence parameter δ cannot affect model selection decisions.
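  For reference, one common textbook form of the VC bound behind structural risk minimization
  (this particular expression is not taken from the exam itself): with probability at least 1 − δ
  over a training set of n examples, every f in a class F with VC-dimension d_VC satisfies

  $$ R(f) \;\le\; \hat{R}_n(f) \;+\; \sqrt{\frac{d_{\mathrm{VC}}\left(\ln\frac{2n}{d_{\mathrm{VC}}} + 1\right) + \ln\frac{4}{\delta}}{n}} $$

  where R(f) is the generalization error and \hat{R}_n(f) the training error.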
  2.4. (6 points) Suppose we use class-conditional Gaussians to solve a binary classification
       task. The covariance matrices of the two Gaussians are constrained to be σ 2 I, where
       the value of σ 2 is fixed and I is the identity matrix. The only adjustable parameters
       are therefore the means of the class conditional Gaussians, and the prior class
       frequencies. We use the maximum likelihood criterion to train the model. Check all
       that apply.
       ( ) For any three distinct training points and sufficiently small σ 2 , the classifier
           would have zero classification error on the training set
       ( ) For any three training points and sufficiently large σ 2 , the classifier would always
           make one classification error on the training set
       ( ) The classification error of this classifier on the training set is always at least that
           of a linear SVM, whether the points are linearly separable or not
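  As a hedged illustration of the model in 2.4 (our own sketch, not part of the original exam,
  with made-up variable names and data), the snippet below fits the maximum likelihood
  parameters (per-class sample means and empirical class priors) and classifies by comparing
  the two log posteriors, which with shared σ²I covariances reduces to a linear decision rule.

```python
import numpy as np

def fit_class_conditional_gaussians(X, y):
    """ML estimates for class-conditional Gaussians with shared covariance sigma^2 * I.
    X: (n, d) inputs, y: (n,) labels in {-1, +1}."""
    params = {}
    for c in (-1, +1):
        Xc = X[y == c]
        params[c] = {"mean": Xc.mean(axis=0),    # ML mean: per-class average
                     "prior": len(Xc) / len(X)}  # ML prior: empirical class frequency
    return params

def predict(X, params, sigma2):
    """Pick the class with the larger posterior; with sigma^2 * I covariances this
    amounts to comparing two linear discriminant functions."""
    scores = []
    for c in (-1, +1):
        m, p = params[c]["mean"], params[c]["prior"]
        # log posterior up to a class-independent constant
        scores.append(-np.sum((X - m) ** 2, axis=1) / (2 * sigma2) + np.log(p))
    return np.where(scores[1] >= scores[0], 1, -1)

# tiny made-up example with three training points
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([-1, 1, 1])
print(predict(X, fit_class_conditional_gaussians(X, y), sigma2=0.01))
```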
 Problem 3
 3.1. (T/F – 2 points) In the AdaBoost algorithm, the weights on all the
      misclassified points will go up by the same multiplicative factor.
 3.2. (3 points) Provide a brief rationale for the following observation about AdaBoost.
      The weighted error of the k th weak classifier (measured relative to the weights at the
      beginning of the k th iteration) tends to increase as a function of the iteration k.
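 To make the weight dynamics in 3.1 and 3.2 concrete, here is a minimal sketch (ours, not from
 the exam) of one round of the standard AdaBoost update: every misclassified point is scaled up
 by the same factor e^{α_k}, every correctly classified point is scaled down by e^{−α_k}, and
 the weights are renormalized before the next round's weighted error is measured.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round. weights: current normalized weights; y_true, y_pred in {-1, +1}.
    Assumes the weighted error eps lies strictly between 0 and 1."""
    eps = np.sum(weights[y_pred != y_true])        # weighted training error of the weak learner
    alpha = 0.5 * np.log((1 - eps) / eps)          # vote of the k-th weak learner
    # misclassified points: weight * exp(+alpha); correct points: weight * exp(-alpha)
    new_w = weights * np.exp(-alpha * y_true * y_pred)
    return alpha, new_w / new_w.sum()              # renormalize so the weights sum to 1

# toy usage: four points, one of them misclassified
alpha, w = adaboost_round(np.full(4, 0.25),
                          y_true=np.array([1, 1, -1, -1]),
                          y_pred=np.array([1, -1, -1, -1]))
print(alpha, w)
```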
 Consider a text classification problem, where documents are represented by binary (0/1)
 feature vectors φ = [φ1 , . . . , φm ]T ; here φi indicates whether word i appears in the document.
 We define a set of weak classifiers, h(φ; θ) = yφi , parameterized by θ = {i, y} (the choice
 of the component, i ∈ {1, . . . , m}, and the class label, y ∈ {−1, 1}, that the component
 should be associated with). There are exactly 2m possible weak learners of this type.
 We use this boosting algorithm for feature selection. The idea is to simply run the boosting
 algorithm and select the features or components in the order in which they were identified
 by the weak learners. We assume that the boosting algorithm finds the best available weak
 classifier at each iteration.
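 A minimal sketch (ours) of the exhaustive weak learner search described above: each of the 2m
 candidates h(φ; θ) = yφi is scored by its weighted error and the best pair (i, y) is returned.
 How a prediction of 0 (word absent) is scored varies between boosting variants; here it is
 simply counted as an error.

```python
import numpy as np

def best_weak_classifier(Phi, labels, weights):
    """Search over the 2m weak classifiers h(phi; theta) = y * phi_i.
    Phi: (n, m) binary word-indicator matrix; labels in {-1, +1};
    weights: current boosting weights (nonnegative, summing to 1)."""
    n, m = Phi.shape
    best = None
    for i in range(m):
        for y in (-1, +1):
            preds = y * Phi[:, i]                      # predictions in {0, y}
            err = np.sum(weights[preds != labels])     # weighted error of this candidate
            if best is None or err < best[0]:
                best = (err, i, y)
    return best  # (weighted error, word index i, associated label y)

# toy usage: 3 documents, 2 words
Phi = np.array([[1, 0], [1, 1], [0, 1]])
print(best_weak_classifier(Phi, labels=np.array([1, 1, -1]),
                           weights=np.full(3, 1 / 3)))
```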
 3.3. (T/F – 2 points) The boosting algorithm described here can select
      the exact same weak classifier more than once.
  3.4. (4 points) Is the ranking of features generated by the boosting algorithm likely to
       be more useful for a linear classifier than the ranking from simple mutual information
       calculations (estimates Î(y; φi))? Briefly justify your answer.
 Problem 4
  [Figure 1a) and Figure 1b): scatter plots of observation x (vertical axis, 0 to 5) versus
  time step t (horizontal axis, 0 to 6).]
 Figure 1: Time dependent observations. The data points in the figure are generated as
 sets of five consecutive time dependent observations, x1 , . . . , x5 . The clusters come from
 repeatedly generating five consecutive samples. Each visible cluster consists of 20 points,
 and has approximately the same variance. The mean of each cluster is shown with a large
 X.
 Consider the data in Figure 1 (see the caption for details). We begin by modeling this data
 with a three state HMM, where each state has a Gaussian output distribution with some
 mean and variance (means and variances can be set independently for each state).
 4.1. (4 points) Draw the state transition diagram and the initial state distribution for
      a three state HMM that models the data in Figure 1 in the maximum likelihood
      sense. Indicate the possible transitions and their probabilities in the figure below
       (whether or not the state is reachable after the first two steps). In other words, your
       drawing should characterize the 1st order homogeneous Markov chain governing the
      evolution of the states. Also indicate the means of the corresponding Gaussian output
      distributions (please use the boxes).
                              [Answer template: state transition diagram with a Begin state,
                              boxes for the output means, and columns marked t = 1 and t = 2.]
  4.2. (4 points) In Figure 1a draw as ovals the clusters of outputs that would form if
       we repeatedly generated samples from your HMM over time steps t = 1, . . . , 5. The
       height of the ovals should reflect the variance of the clusters.
  4.3. (4 points) Suppose at time t = 2 we observe x2 = 1.5 but don’t see
       the observations for other time points. What is the most likely state at
       t = 2 according to the marginal posterior probability γ2 (s) defined as
        P (s2 = s|x2 = 1.5)?
  4.4. (2 points) What would be the most likely state at t = 2 if we also saw
       x3 = 0 at t = 3? In this case γ2 (s) = P (s2 = s|x2 = 1.5, x3 = 0).
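  A minimal sketch (ours, not from the exam) of how the marginal posteriors γt(s) in 4.3 and
  4.4 could be computed for a Gaussian-output HMM via the forward-backward recursions. The
  transition matrix, initial distribution, means and variance below are dummy placeholders
  only; they are not the answer to 4.1 and would be replaced by your maximum likelihood
  estimates.

```python
import numpy as np

def gaussian_pdf(x, means, sigma):
    return np.exp(-(x - means) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def posterior_marginals(pi0, A, means, sigma, obs):
    """gamma[t, s] = P(state_t = s | observed x's) for an HMM with Gaussian outputs.
    pi0: (K,) initial distribution; A: (K, K) transition matrix; means: (K,) output
    means; sigma: shared output std; obs: dict {t: x_t} with 0-indexed time steps
    (so the exam's x_2 is index 1). Unobserved steps contribute likelihood 1."""
    T, K = max(obs) + 1, len(pi0)
    B = np.ones((T, K))
    for t, x in obs.items():
        B[t] = gaussian_pdf(x, means, sigma)
    alpha = np.zeros((T, K))                          # forward pass
    alpha[0] = pi0 * B[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    beta = np.ones((T, K))                            # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# dummy placeholder parameters only (not the answer to 4.1)
pi0 = np.full(3, 1 / 3)
A = np.full((3, 3), 1 / 3)
means, sigma = np.array([0.0, 1.0, 2.0]), 0.5
print(posterior_marginals(pi0, A, means, sigma, obs={1: 1.5}))          # 4.3: x_2 = 1.5 only
print(posterior_marginals(pi0, A, means, sigma, obs={1: 1.5, 2: 0.0}))  # 4.4: also x_3 = 0
```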
  4.5. (4 points) We can also try to model the data with conditional mixtures (mixtures
       of experts), where the conditioning is based on the time step. Suppose we only use
       two experts which are linear regression models with additive Gaussian noise, i.e.,
                       $$ P(x \mid t, \theta_i) \;=\; \frac{1}{\sqrt{2\pi\sigma_i^2}}
                          \exp\!\left( -\frac{1}{2\sigma_i^2}\,(x - \theta_{i0} - \theta_{i1} t)^2 \right) $$
        for i = 1, 2. The gating network is a logistic regression model from t to binary
        selection of the experts. Assuming your estimation of the conditional mixture model
         is successful in the maximum likelihood sense, draw the resulting mean predictions
         of the two linear regression models as a function of time t in Figure 1b). Also, with
         a vertical line, indicate where the gating network would change its preference from
        one expert to the other.
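   A minimal sketch (ours, with dummy placeholder parameters rather than the maximum likelihood
   estimates the question asks you to reason about) of the two-expert conditional mixture
   defined in 4.5: each expert is a linear-Gaussian regression of x on t, and a logistic gate
   over t mixes them; the code evaluates the mixture density P(x | t).

```python
import numpy as np

def gaussian_pdf(x, mean, sigma2):
    return np.exp(-(x - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def mixture_density(x, t, theta, sigma2, gate_w):
    """P(x | t) for the two-expert conditional mixture.
    theta: (2, 2) array with rows (theta_i0, theta_i1); sigma2: (2,) expert noise
    variances; gate_w = (w0, w1): logistic gate, P(expert 1 | t) = sigmoid(w0 + w1 * t)."""
    g1 = 1.0 / (1.0 + np.exp(-(gate_w[0] + gate_w[1] * t)))   # gating probability of expert 1
    means = theta[:, 0] + theta[:, 1] * t                     # expert means theta_i0 + theta_i1 * t
    p0 = gaussian_pdf(x, means[0], sigma2[0])
    p1 = gaussian_pdf(x, means[1], sigma2[1])
    return (1 - g1) * p0 + g1 * p1

# dummy placeholder parameters only
theta = np.array([[0.0, 1.0],     # expert 0: mean 0.0 + 1.0 * t
                  [5.0, -0.5]])   # expert 1: mean 5.0 - 0.5 * t
print(mixture_density(x=1.5, t=2, theta=theta,
                      sigma2=np.array([0.25, 0.25]), gate_w=(0.0, 1.0)))
```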
    4.6. (T/F – 2 points) Claim: by repeatedly sampling from your conditional
         mixture model at successive time points t = 1, 2, 3, 4, 5, the
         resulting samples would resemble the data in Figure 1.
  4.7. (4 points) Having two competing models for the same data, the HMM and the
       mixture of experts model, we’d like to select the better one. We think that any
       reasonable model selection criterion would be able to select the better model in this
       case. Which model would we choose? Provide a brief justification.
  Problem 5
                [Figure 2: Graphical models. a) Bayesian network (directed) and b) Markov random
                field (undirected), each over the variables x1, x2, x3 and y1, y2, y3.]
  5.1. (2 points) List two different types of independence properties satisfied by the Bayesian
       network model in Figure 2a.
  5.2. (2 points) Write the factorization of the joint distribution implied by the directed
       graph in Figure 2a.
  5.3. (2 points) Provide an alternative factorization of the joint distribution, different
       from the previous one. Your factorization should be consistent with all the properties
       of the directed graph in Figure 2a. Consistency here means: whatever is implied by
       the graph should hold for the associated distribution.
  5.4. (4 points) Provide an independence statement that holds for the undirected model
       in Figure 2b but does NOT hold for the Bayesian network. Which edge(s) should
       we add to the undirected model so that it would be consistent with (wouldn’t imply
       anything that is not true for) the Bayesian network?
  5.5. (2 points) Is your resulting undirected graph triangulated (Y/N)?
  5.6. (4 points) Provide two directed graphs representing 1) a mixture of two experts
        model for classification, and 2) a mixture-of-Gaussians classifier with two mixture
       components per class. Please use the following notation: x for the input observation,
       y for the class, and i for any selection of components.
 Additional set of figures
   [Duplicate copies for your use: Figure 1a) and 1b) (observation x versus time step t), the
   state transition diagram template for Problem 4.1, and Figure 2a) Bayesian network (directed)
   and 2b) Markov random field (undirected).]
Cite as: Tommi Jaakkola, course materials for 6.867 Machine Learning, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].