
Introduction to Machine Learning: Homework 2

1. Linear Regression [50 points]


Given $X_{n \times d}$, $y_{n \times 1}$, $w_{d \times 1}$, with $y = Xw + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$.

1. [ 8 points ] Write down the loss function for Linear Regression.
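For reference, a minimal sketch of the usual squared-error objective under the Gaussian-noise model above (the 1/2 factor is a common convention, not required):

$$
L(w) = \frac{1}{2}\lVert y - Xw \rVert_2^2 = \frac{1}{2}(y - Xw)^T (y - Xw)
$$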

2. [ 8 points ] Derive the closed-form estimate of $w$ by the least squares method. (Assume $X^T X$ is invertible, i.e., $(X^T X)^{-1}$ exists.)
Hint: Compute the derivative of the loss function, using $\frac{\partial \beta^T A}{\partial \beta} = A^T$ and $\frac{\partial f(\beta)}{\partial \beta^T} = \left( \frac{\partial f(\beta)}{\partial \beta} \right)^T$.
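A minimal sketch of one route to the answer, following the hint and assuming the squared-error loss above:

$$
\frac{\partial L(w)}{\partial w} = -X^T (y - Xw) = 0
\;\Longrightarrow\; X^T X \, \hat{w} = X^T y
\;\Longrightarrow\; \hat{w} = (X^T X)^{-1} X^T y .
$$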
   
3. [ 24 points ] Given
$X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}$ and $y = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}$.
Please compute $X^T X$, $X^T y$, and the estimated $w$ (denoted by $\hat{w}$) by the least squares method.

$X^T X =$

$X^T y =$

$\hat{w} =$
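As a way to check the arithmetic, here is a minimal NumPy sketch that evaluates these three quantities for the given $X$ and $y$ (the variable names are illustrative, not part of the assignment):

```python
import numpy as np

# Design matrix and targets from question 3 (first column is the intercept).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 5.0])

XtX = X.T @ X                       # 2x2 Gram matrix X^T X
Xty = X.T @ y                       # 2-vector X^T y
w_hat = np.linalg.solve(XtX, Xty)   # least squares estimate (X^T X)^{-1} X^T y

print("X^T X =\n", XtX)
print("X^T y =", Xty)
print("w_hat =", w_hat)
```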

4. [ 10 points ] Name some methods that you could use to handle overfitting. Write down the loss or objective function you would minimize to avoid overfitting in the linear regression setting, and describe how you would set the parameters in your loss function.
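One common choice (a sketch, not the only acceptable answer) is ridge regression, which adds an L2 penalty $\lambda \lVert w \rVert_2^2$ to the squared-error loss; $\lambda$ is typically chosen by cross-validation. A minimal NumPy version of its closed form, with an illustrative function name:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (ridge regression).

    The closed form is w = (X^T X + lam * I)^{-1} X^T y; the added
    lam * I keeps the system well conditioned even when X^T X is
    nearly singular, which is one way overfitting shows up.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```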

2. Logistic Regression [50 points]
Consider a binary logistic regression model with data $D = \{(x_i, y_i)\}_{i=1}^{N}$, $x_i \in \mathbb{R}^m$ and $y_i \in \{-1, 1\}$, with the linear discriminative function given by $w^T x_i$.

1. [ 10 points ] What is the probability $P(y = 1 \mid x, w)$ under a logistic regression model? Denote the result by $g(w)$, where $g$ is a sigmoid function.
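For concreteness, a small sketch of the sigmoid and the resulting probability, assuming the standard logistic model $P(y = 1 \mid x, w) = \sigma(w^T x)$ (the helper names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    # np.logaddexp(0, -z) = log(1 + exp(-z)) without overflow for large |z|.
    return np.exp(-np.logaddexp(0.0, -z))

def prob_y_pos(w, x):
    """P(y = +1 | x, w) for logistic regression with linear score w^T x."""
    return sigmoid(w @ x)
```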

2. [ 20 points ] What is the full-data log-likelihood $L(w)$ under this model, i.e., the log of the probability that all $N$ samples of data $D$ are observed? What is the gradient of $L(w)$ with respect to $w$?
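A minimal sketch of both quantities under the $y_i \in \{-1, +1\}$ encoding, assuming the common form $L(w) = \sum_i \log \sigma(y_i w^T x_i)$ with gradient $\sum_i y_i x_i \, \sigma(-y_i w^T x_i)$ (the function names are illustrative):

```python
import numpy as np

def log_likelihood(w, X, y):
    """L(w) = sum_i log sigma(y_i * w^T x_i) for labels y_i in {-1, +1}.

    Uses log sigma(z) = -log(1 + exp(-z)) = -logaddexp(0, -z) for stability.
    """
    margins = y * (X @ w)                 # y_i * w^T x_i, shape (N,)
    return -np.sum(np.logaddexp(0.0, -margins))

def gradient(w, X, y):
    """Gradient of L(w): sum_i y_i * x_i * sigma(-y_i * w^T x_i)."""
    margins = y * (X @ w)
    weights = np.exp(-np.logaddexp(0.0, margins))   # sigma(-margins)
    return X.T @ (weights * y)
```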

3. [ 20 points ] Suppose that you train a logistic regression classifier for $m = 2$ with parameters $w = [w_0, w_1, w_2]^T$ and obtain $w_0 = 6$, $w_1 = -1$, and $w_2 = 0$. What is your decision rule when using this classifier? Draw a figure that represents the decision boundary found by your classifier.
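As a sanity check of the geometry, assuming the score is $w_0 + w_1 x_1 + w_2 x_2$ with the given values (so the boundary is where $6 - x_1 = 0$), a short matplotlib sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

w0, w1, w2 = 6.0, -1.0, 0.0   # given parameters; score = w0 + w1*x1 + w2*x2

# Since w2 = 0, the boundary w0 + w1*x1 = 0 is the vertical line x1 = 6:
# predict y = +1 when the score is >= 0 (x1 <= 6), y = -1 otherwise.
x2 = np.linspace(-5.0, 5.0, 100)
plt.plot(np.full_like(x2, 6.0), x2, "k--", label="decision boundary x1 = 6")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.show()
```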
