UNIVERSITY of PENNSYLVANIA
CIS 520: Machine Learning
Midterm, Fall 2016
Exam policy: This exam allows one one-page, two-sided cheat sheet; no
other materials are allowed.
Time: 80 minutes. Be sure to write your name and Penn student ID (the
8 bigger digits on your ID card) on the answer form and fill in the associated
bubbles in pencil.
If you think a question is ambiguous, mark what you think is the best answer.
As always, we will consider written regrade requests if your interpretation of
a question differed from what we intended. We will only grade the scantron
forms.
For the “TRUE or FALSE” questions, note that “TRUE” is (a) and “FALSE”
is (b). For the multiple choice questions, select exactly one answer.
There are 60 questions.
1. [0 points] This is version A of the exam. Please fill in the “bubble”
for that letter.
2. [2 points] Suppose we have a regularized linear regression model: $\arg\min_w \|Y - Xw\|_2^2 + \lambda \|w\|_1$. What is the effect of increasing λ on bias and variance?
(a) Increases bias, increases variance
(b) Increases bias, decreases variance
(c) Decreases bias, increases variance
(d) Decreases bias, decreases variance
(e) Not enough information to tell
F SOLUTION: B
3. [2 points] Suppose we have a regularized linear regression model: $\arg\min_w \|Y - Xw\|_2^2 + \|w\|_p^p$. What is the effect of increasing p on bias and variance (p ≥ 1) if the weights are all larger than 1?
(a) Increases bias, increases variance
(b) Increases bias, decreases variance
(c) Decreases bias, increases variance
(d) Decreases bias, decreases variance
(e) Not enough information to tell
F SOLUTION: B
4. [2 points] Compared to the variance of the MLE estimate $\hat{w}_{MLE}$, do you expect that the variance of the MAP estimate $\hat{w}_{MAP}$ is:
(a) higher
(b) same
(c) lower
(d) it could be any of the above
F SOLUTION: C
5. [1 points] True or False? Given a distribution p(X, y), it is always (in
theory) possible to compute p(y|X).
F SOLUTION: T
6. [1 points] True or False? It is generally more important to use consistent estimators when one has smaller numbers of training examples.
F SOLUTION: F
7. [1 points] True or False? It is generally more important to use unbiased estimators when one has smaller numbers of training examples.
F SOLUTION: F
8. [1 points] True or False? LASSO is a parametric method.
F SOLUTION: T
9. [2 points] Ridge regression uses what penalty on the regression weights?
(a) L0
(b) L1
(c) L2
(d) L22
(e) none of the above
F SOLUTION: D
10. [1 points] True or False? Linear regression gives the MLE for data from a model $y_i \sim N(\sum_j w_j x_{ij}, \sigma^2)$.
F SOLUTION: T
11. [2 points] Suppose you are given the following training inputs X = [−3, 5, 4] and Y = [−10, 20, 20]. Which of the following is closest to the MLE estimate $\hat{w}_{MLE}$?
(a) 2.0
(b) 4.1
(c) 4.2
(d) 8.2
(e) 10.0
F SOLUTION: C
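A quick check of the arithmetic, assuming the intended model is a single-weight fit y ≈ wx with no intercept (an assumption; the question does not state it explicitly):

```python
# MLE (ordinary least squares) for a one-dimensional, no-intercept model y ~ w*x:
# w_mle = sum(x*y) / sum(x^2)
X = [-3, 5, 4]
Y = [-10, 20, 20]

w_mle = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
print(w_mle)  # (30 + 100 + 80) / (9 + 25 + 16) = 210 / 50 = 4.2
```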
12. [2 points] Using the same data as above, X = [−3, 5, 4] and Y = [−10, 20, 20], and assuming a ridge penalty λ = 50, what ratio versus the MLE estimate $\hat{w}_{MLE}$ do you think the ridge regression (L2) estimate $\hat{w}_{ridge}$ will be?
(a) 2
(b) 1
(c) 0.666
(d) 0.5
(e) 0.25
F SOLUTION: D
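Under the same no-intercept assumption as the sketch above, ridge regression just adds λ to the denominator:

```python
# Ridge estimate for y ~ w*x with penalty lambda * w^2:
# w_ridge = sum(x*y) / (sum(x^2) + lambda)
X = [-3, 5, 4]
Y = [-10, 20, 20]
lam = 50

sxy = sum(x * y for x, y in zip(X, Y))  # 210
sxx = sum(x * x for x in X)             # 50
w_ridge = sxy / (sxx + lam)             # 210 / 100 = 2.1
print(w_ridge / (sxy / sxx))            # 2.1 / 4.2 = 0.5
```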
13. [1 points] True or False? A probability density function (PDF) cannot
be less than 0 or bigger than 1.
F SOLUTION: F
14. [1 points] True or False? A cumulative distribution function (CDF)
cannot be less than 0 or bigger than 1.
F SOLUTION: T
15. [2 points] A and B are two events. If P (A, B) decreases while P (A)
increases, what must be true:
(a) P (A|B) decreases
(b) P (B|A) decreases
(c) P (B) decreases
(d) All of above
F SOLUTION: B
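One way to see this: P(B|A) = P(A, B)/P(A), so a shrinking numerator over a growing denominator must shrink. A tiny numerical illustration (the numbers are made up):

```python
# P(B|A) = P(A, B) / P(A): numerator down and denominator up forces the ratio down.
p_ab_before, p_a_before = 0.20, 0.40
p_ab_after,  p_a_after  = 0.10, 0.50

print(p_ab_before / p_a_before)  # 0.5
print(p_ab_after / p_a_after)    # 0.2 -- P(B|A) decreased
```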
16. [2 points] After applying a regularization penalty in linear regression,
you find that some of the coefficients of w are zeroed out. Which of
the following penalties might have been used?
(a) L0 norm
(b) L1 norm
(c) L2 norm
(d) either (a) or (b)
(e) any of the above
F SOLUTION: D
17. [1 points] True or False? 1-nearest neighbor is a consistent estimator.
F SOLUTION: T
18. [1 points] True or False? In the limit of infinite training and test data,
consistent estimators always give at least as low a test error as biased
estimators.
F SOLUTION: T
19. [1 points] True or False? Stepwise regression with an RIC penalty is
an unbiased estimator.
F SOLUTION: F
20. [1 points] True or False? Leave-one-out cross-validation (LOOCV) generally gives less accurate estimates of true test error than 10-fold cross-validation.
F SOLUTION: F
21. [2 points] The optimal code for a long series of symbols A, B, C, D
drawn with probabilities P (A) = 1/2, P (B) = 1/4, P (C) = P (D) =
1/8 should take on average how many bits per symbol?
(a) 0.5
(b) 1
(c) 1.5
(d) 1.75
(e) none of the above
F SOLUTION: D
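A quick check: for an optimal code, the average length per symbol equals the entropy of the source distribution.

```python
import math

# Entropy of the source = average bits per symbol for an optimal code.
p = [1/2, 1/4, 1/8, 1/8]
H = -sum(pi * math.log2(pi) for pi in p)
print(H)  # 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75
```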
22. [2 points] If we want to encode a true distribution P (A) = 1/2, P (B) = 1/2, P (C) = P (D) = 0 with an estimated distribution P (A) = 1/2, P (B) = 1/4, P (C) = P (D) = 1/8, what is the KL divergence (using log base 2) between them?
(a) 0.5
(b) 1
(c) 4/3
(d) 1.5
(e) none of the above
F SOLUTION: A
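A quick check of the arithmetic (terms with P(x) = 0 contribute nothing to the KL divergence):

```python
import math

# KL(P || Q) = sum_x P(x) * log2(P(x) / Q(x)), skipping x with P(x) = 0.
P = {'A': 1/2, 'B': 1/2, 'C': 0, 'D': 0}
Q = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/8}

kl = sum(p * math.log2(p / Q[x]) for x, p in P.items() if p > 0)
print(kl)  # 0.5*log2(1) + 0.5*log2(2) = 0.5
```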
23. [2 points] Given a coding that would be optimal for a stream of A’s, B’s, and C’s generated from the probability distribution (pA = 1/2, pB = 1/4, pC = 1/4), how many bits would it take to code the data stream “AABB”?
(a) 1
(b) 2
(c) 4
(d) 6
(e) none of the above
F SOLUTION: D
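A quick check: an optimal code assigns each symbol about −log2(p) bits, so A costs 1 bit and B costs 2 bits.

```python
import math

# Optimal code lengths: -log2(1/2) = 1 bit for A, -log2(1/4) = 2 bits for B and C.
p = {'A': 1/2, 'B': 1/4, 'C': 1/4}
bits = sum(-math.log2(p[s]) for s in "AABB")
print(bits)  # 1 + 1 + 2 + 2 = 6
```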
24. [2 points] We have a stream of A’s, B’s, and C’s generated from the
probability distribution (pA = 1/2, pB = 1/2, pC = 0). However, we
instead use a coding that would be optimal if the distribution was
(pA = 1/2, pB = 1/4, pC = 1/4). Our coding will (in expectation)
(a) take more bits than the optimal coding
(b) take fewer bits than the optimal coding
(c) not enough information is provided to tell
F SOLUTION: A
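A quick check: the expected length under the mismatched code is the cross-entropy H(p, q), which exceeds the entropy H(p) whenever the coding distribution differs from the true one.

```python
import math

# Cross-entropy H(p, q) vs. entropy H(p); terms with p = 0 contribute nothing.
p = {'A': 1/2, 'B': 1/2, 'C': 0}
q = {'A': 1/2, 'B': 1/4, 'C': 1/4}

H_p  = -sum(pi * math.log2(pi)   for pi in p.values() if pi > 0)    # 1.0 bit/symbol
H_pq = -sum(pi * math.log2(q[s]) for s, pi in p.items() if pi > 0)  # 1.5 bits/symbol
print(H_pq - H_p)  # 0.5 extra bits per symbol, so the coding takes more bits
```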
25. [1 points] True or False? Adding a feature to a linear regression model
during streamwise regression always decreases model bias.
F SOLUTION: T
26. [1 points] True or False? Adding a feature to a linear regression model
during streamwise regression always increases model variance.
F SOLUTION: T
27. [1 points] True or False? 5-NN has lower bias than 1-NN.
F SOLUTION: F
28. [1 points] True or False? 5-NN is more robust to outliers than 1-NN.
F SOLUTION: T
29. [1 points] True or False? Choosing different norms does not affect the
decision boundary of 1-NN.
F SOLUTION: F
In the following four questions, assume both trees are trained on the
same data.
30. [1 points] True or False? A tree with depth of 3 has higher variance
than a tree with depth of 1.
F SOLUTION: T
31. [1 points] True or False? A tree with depth of 3 has higher bias than
a tree with depth 1.
F SOLUTION: F
32. [1 points] True or False? A tree with depth of 3 never has higher
training error than a tree with depth 1.
F SOLUTION: T
33. [1 points] True or False? A tree with depth of 3 never has higher test
error than a tree with depth 1.
F SOLUTION: F
34. [1 points] True or False? If decision trees such as the ones we built in class are allowed to have decision nodes based on questions that can have many possible answers (e.g. “What country are you from?”) in addition to binary questions, they will in general tend to add the multiple-answer questions to the tree before adding the binary questions.
F SOLUTION: T
In the following three questions, assume models are trained on the same
data without transformations or interactions.
35. [1 points] True or False? K-nearest neighbors will always give a linear
decision boundary.
F SOLUTION: F
36. [1 points] True or False? Decision trees with depth one will always
give a linear decision boundary.
F SOLUTION: T
37. [1 points] True or False? Gaussian Naive Bayes will always give a
linear decision boundary.
F SOLUTION: F
38. [1 points] True or False? Logistic Regression will always give a linear
decision boundary.
F SOLUTION: T
39. [2 points] Suppose you have picked the parameter θ for a model using
10-fold cross validation. The best way to pick a final model to use and
estimate its error is to
(a) pick any of the 10 models you built for your model; use its error
estimate on the held-out data
(b) pick any of the 10 models you built for your model; use the average CV error for the 10 models as its error estimate
(c) average all of the 10 models you got; use the average CV error as
its error estimate
(d) average all of the 10 models you got; use the error the combined
model gives on the full training set
(e) train a new model on the full data set, using the θ you found; use
the average CV error as its error estimate
F SOLUTION: E
40. [2 points] Suppose we want to compute 10-fold cross-validation error on 100 training examples. We need to compute the error N1 times, and the cross-validation error is the average of those errors. To compute each error, we need to build a model with data of size N2, and test the model on data of size N3.
What are the appropriate numbers for N1, N2, N3?
(a) N1 = 10, N2 = 90, N3 = 10
(b) N1 = 1, N2 = 90, N3 = 10
(c) N1 = 10, N2 = 100, N3 = 10
(d) N1 = 10, N2 = 100, N3 = 100
F SOLUTION: A
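A minimal sketch of the bookkeeping (the fold_error helper is hypothetical; the point is just the sizes N1 = 10, N2 = 90, N3 = 10):

```python
# 10-fold CV on 100 examples: 10 error computations, each trained on 90 and tested on 10.
n, k = 100, 10
indices = list(range(n))
fold_size = n // k                     # N3 = 10 test examples per fold

fold_errors = []
for i in range(k):                     # N1 = 10 error computations
    test = indices[i * fold_size:(i + 1) * fold_size]
    train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]  # N2 = 90 training examples
    assert len(train) == 90 and len(test) == 10
    # fold_errors.append(fold_error(train, test))  # hypothetical: fit on train, score on test

# cv_error = sum(fold_errors) / k      # the reported CV error is the average over folds
```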
The following question refers to this picture, which shows a set of 6 points in a two-dimensional space, where each point is either labeled “.” or “*”.
41. [2 points] What is the training error (fraction labeled wrong) in the above picture using 1-nearest neighbor and the L1 distance norm between points? If there is more than one equally close nearest neighbor, use the majority label of all the equally close nearest neighbors. Please make sure you really are computing training error, not testing error.
(a) 0
(b) 1/6
(c) 2/6
(d) 3/6
(e) 4/6
F SOLUTION: A
42. [2 points] MLE estimates are often undesirable because
(a) they are biased
(b) they have high variance
(c) they are not consistent estimators
(d) None of the above
F SOLUTION: B
43. [2 points] For a data set with p features, of which q will eventually
enter the model, streamwise feature selection will test approximately
how many models?
(a) p
(b) q
(c) pq
(d) p2
F SOLUTION: A
44. [2 points] For a data set with p features, of which q will eventually enter
the model, stepwise feature selection will test approximately how many
models?
(a) p
(b) q
(c) pq
(d) p2
F SOLUTION: C
45. [2 points] Which penalty should you use if you are doing stepwise
regression and expect 10 out of 100,000 features, with n = 100 observations?
(a) AIC
(b) BIC
(c) RIC
F SOLUTION: C
46. [2 points] Which penalty should you use if you are doing stepwise
regression and expect 200 out of 1,000 features, with n = 10,000,000
observations?
(a) AIC
(b) BIC
(c) RIC
F SOLUTION: B
47. [2 points] You are doing feature selection with 1,000 features and 1,000
observations. Assume roughly half of the features will be selected.
What would be the best (or least bad) MDL model?
(a) AIC
(b) BIC
(c) RIC
(d) Both AIC and BIC are equally good
(e) Both AIC and RIC are equally good
F SOLUTION: A
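One way to see it, assuming the usual per-coefficient penalties of roughly 2 for AIC, ln n for BIC, and 2 ln p for RIC (the exact constants vary by convention):

```python
import math

# Approximate penalty added per selected coefficient (one common convention).
n, p = 1000, 1000
penalties = {"AIC": 2, "BIC": math.log(n), "RIC": 2 * math.log(p)}
print(penalties)  # AIC = 2, BIC ~ 6.9, RIC ~ 13.8

# With roughly 500 of the 1000 features expected to enter, the heavier BIC/RIC
# penalties would block many genuinely useful features, so the lightest penalty
# (AIC) is the least bad choice here.
```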
48. [2 points] Which of the following best describes what discriminative
approaches try to model? (w are the parameters in the model)
(a) p(y|x, w)
(b) p(y, x)
(c) p(x|y, w)
(d) p(w|x, y)
(e) p(y|w)
F SOLUTION: A
49. [2 points] Which of the following tends to work best on small data sets
(few observations)?
(a) Logistic regression
(b) Naive Bayes
F SOLUTION: B
50. [2 points] Suppose you have a three-class problem where the class label y ∈ {0, 1, 2} and each training example X has 3 binary attributes X1, X2, X3 ∈ {0, 1}. How many parameters do you need to know to classify an example using the Naive Bayes classifier?
(a) 5
(b) 9
(c) 11
(d) 13
(e) 23
F SOLUTION: C
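A quick parameter count (assuming Bernoulli Naive Bayes with one free parameter per feature per class):

```python
# Naive Bayes with 3 classes and 3 binary features:
n_classes, n_features = 3, 3

prior_params = n_classes - 1                # P(y): 3 probabilities summing to 1 -> 2 free parameters
likelihood_params = n_features * n_classes  # P(X_j = 1 | y): one free parameter each -> 9
print(prior_params + likelihood_params)     # 2 + 9 = 11
```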
51. [1 points] True or False? Radial basis functions could be used to make the following dataset linearly separable.
|
x o | x o
o x | o x
|
________________________________
| x1
x o | o x
o x | x o
|
x2
F SOLUTION: T
F You could just center an RBF at each point.
52. [2 points] Given the following table of observations, calculate the information gain IG(Y |X) that would result from learning the value of X.
X Y
Red True
Green False
Brown False
Brown True
(a) 1/2
(b) 1
(c) 3/2
(d) 2
(e) none of the above
F SOLUTION: A
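A quick check of the computation IG(Y |X) = H(Y) − H(Y|X):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

X = ['Red', 'Green', 'Brown', 'Brown']
Y = [True, False, False, True]

H_Y = entropy(Y)                     # 1.0 (two True, two False)
H_Y_given_X = sum(                   # 0.25*0 + 0.25*0 + 0.5*1 = 0.5
    (X.count(x) / len(X)) * entropy([y for xi, y in zip(X, Y) if xi == x])
    for x in set(X)
)
print(H_Y - H_Y_given_X)             # IG = 0.5
```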
53. [2 points] We are comparing two models A and B for the same data
set.
We find that model A requires 127 bits to code the residual and 260
bits to code the model.
We find that model B requires 160 bits to code the residual and 250
bits to code the model.
(a) model B is underfit compared to A
(b) model B is overfit compared to A
(c) not enough information is provided to tell if model B is over- or
under-fit relative to A
F SOLUTION: A
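A quick check using the MDL view that the better model minimizes the total description length (model bits + residual bits):

```python
# Total description length = bits to code the model + bits to code the residual.
total_A = 260 + 127   # 387 bits
total_B = 250 + 160   # 410 bits
print(total_A, total_B)
# B spends fewer bits on the model but more on the residual: relative to A it is
# too simple for the data, i.e. underfit.
```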
54. [2 points] In fitting some data using radial basis functions with kernel
width σ, we compute a training error of 345 and a testing error of 390.
(a) increasing σ will most likely reduce test set error
(b) decreasing σ will most likely reduce test set error
(c) not enough information is provided to determine how σ should
be changed
F SOLUTION: C
55. [2 points] When choosing one feature from X1, ..., Xn while building a Decision Tree, which of the following criteria is the most appropriate to maximize? (Here, H() denotes entropy and P() denotes probability.)
(a) P (Y |Xj )
(b) P (Y ) − P (Y |Xj )
(c) H(Y ) − H(Y |Xj )
(d) H(Y |Xj )
(e) H(Y ) − P (Y )
F SOLUTION: C
56. [2 points] How many bits are required to code the decision tree (below)
which is derived from the data set below it?
(a) 3
(b) 5
(c) 7
(d) None of the above
F SOLUTION: A
F Code what to split on: there are three possible features or no split, so log2(4) = 2 bits. Then code which way is ‘yes’, so one more bit, giving a total of 3 bits.
X1 X2 X3 Y
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 0
57. [2 points] How many bits are required to code the residual from using
the above decision tree to code the above data set?
(a) 1
(b) 2
(c) 3
(d) None of the above
F SOLUTION: C
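A sketch of where the 3 bits come from, assuming the (unshown) tree is the single split on X1, so it predicts Y = X1, and assuming the convention that the residual is coded by naming the misclassified rows:

```python
import math

# Assuming the tree splits on X1 and predicts Y = X1 (an assumption; the tree is not shown):
X1 = [0, 0, 0, 0, 1, 1, 1, 1]
Y  = [0, 0, 0, 0, 1, 1, 1, 0]

errors = sum(pred != y for pred, y in zip(X1, Y))
print(errors)             # 1 misclassified row (the last one)
print(math.log2(len(Y)))  # naming which of the 8 rows is wrong costs log2(8) = 3 bits
```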
58. [2 points] In each round of AdaBoost, the misclassification penalty for
a particular training observation is increased going from round t to
round t + 1 if the observation was...
(a) classified incorrectly by the weak learner trained in round t
(b) classified incorrectly by the full ensemble trained up to round t
(c) classified incorrectly by a majority of the weak learners trained
up to round t
(d) B and C
(e) A, B, and C
F SOLUTION: A
59. [1 points] True or False? Boosting minimizes an exponential loss function (subject to the model constraints).
F SOLUTION: T
60. [2 points] Which of the following regularization methods are scale-invariant?
(a) L0 and L2 but not L1
(b) L1 and L2 but not L0
(c) L0 but not L1 or L2
(d) L2 but not L0 or L1
(e) None of the above
F SOLUTION: C