
GEOP 592: HW2

Deadline: 7 February 2019


1 Multivariate Linear Regression
1.1 Data
Download ex2Data.zip, and extract the files from the zip file. This is a training set of housing prices in
Portland, Oregon, where the outputs y^{(i)} are the prices and the inputs x^{(i)} are the living area and the
number of bedrooms. There are m = 47 training examples.

1.2 Data Processing


Load the data for the training examples into your program and add the x_0 = 1 intercept term to your X
matrix.
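For example, assuming the extracted files contain one file with the input features and one with the prices
(the exact file names inside ex2Data.zip may differ), loading the data and adding the intercept column
might look like this sketch:

x = load('ex2x.dat');   % assumed feature file: living area and number of bedrooms
y = load('ex2y.dat');   % assumed price file
m = length(y);          % number of training examples (m = 47)
x = [ones(m, 1), x];    % prepend the x0 = 1 intercept column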
The living areas are roughly 1000 times larger than the numbers of bedrooms. This difference in scale
means that preprocessing the inputs will significantly improve gradient descent's efficiency. In your program,
scale both types of inputs by their standard deviations and set their means to zero. In Matlab/Octave, this
can be done with

sigma = std(x);   % per-column standard deviations
mu = mean(x);     % per-column means
x(:,2) = (x(:,2) - mu(2)) ./ sigma(2);   % scale the living areas
x(:,3) = (x(:,3) - mu(3)) ./ sigma(3);   % scale the numbers of bedrooms

1.3 Gradient descent


Previously, you implemented gradient descent on a univariate regression problem. The only difference now
is that there is one more feature in the matrix X.
The hypothesis function is still
h_θ(x) = θ^T x = Σ_{i=0}^{n} θ_i x_i,

and the batch gradient descent update rule is:


θ_j := θ_j − α · (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)}        (simultaneously for all j)

Initialize your parameters to the zero vector, θ = 0.
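For reference, one vectorized way to express this update in Matlab/Octave is the sketch below (it assumes
x is the m-by-(n+1) design matrix with the intercept column and y is the m-by-1 vector of prices):

grad = (1/m) * x' * (x*theta - y);   % (n+1)-by-1 gradient of J(theta)
theta = theta - alpha * grad;        % simultaneous update of all theta_j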

1.4 Selecting a learning rate using J(θ)


Now it’s time to select a learning rate α. The goal of this part is to pick a good learning rate in the range of

0.001 ≤ α ≤ 10

Do this by making an initial selection, running gradient descent and observing the cost function, and adjusting
the learning rate accordingly. Recall that the cost function is defined as
J(θ) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})^2.
The cost function can also be written in the following vectorized form,

J(θ) = (1/(2m)) (Xθ − y)^T (Xθ − y).
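The vectorized form translates directly into Matlab/Octave; a minimal sketch (using the same x, y and
theta variables as above) is shown here. Inside the loop below you would store this value in
J(num_iterations).

err = x*theta - y;                  % m-by-1 vector of residuals
J_current = (err' * err) / (2*m);   % scalar cost J(theta)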
While in the previous exercise you calculated J(θ) over a grid of θ_0 and θ_1 values, you will now calculate
J(θ) at the current θ of each gradient descent iteration. After stepping through many iterations, you will see
how J(θ) changes as the iterations advance.
Now, run gradient descent for about 50 iterations at your initial learning rate. In each iteration, calculate
J(θ) and store the result in a vector J. After the last iteration, plot the J values against the number of the
iteration. In Matlab/Octave, the steps would look something like this:

theta = zeros(size(x(1,:)))’; % initialize fitting parameters


alpha = %% Your initial learning rate %%
J = zeros(50, 1);

for num_iterations = 1:50


J(num_iterations) = %% Calculate your cost function here %%
theta = %% Result of gradient descent update %%
end

%% now plot J %%

To compare how different learning rates affect convergence, it's helpful to plot J for several learning
rates on the same graph. If you've tried three different values of alpha (you should probably try more than
three) and stored the costs in J1, J2 and J3, you can use the following commands to plot them on the
same figure:

plot(0:49, J1(1:50), 'b-');
hold on;
plot(0:49, J2(1:50), 'r-');
plot(0:49, J3(1:50), 'k-');

Observe how the cost function changes as the learning rate changes. What happens when the learning
rate is too small? Too large?
Using the best learning rate that you found, run gradient descent until convergence to find:
1. The final values of θ
2. The predicted price of a house with 1650 square feet and 3 bedrooms (see the sketch after this list).
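Because gradient descent was run on scaled inputs, the query house must be scaled with the same mu and
sigma from Section 1.2 before predicting. A sketch of the prediction:

x_query = [1, (1650 - mu(2))/sigma(2), (3 - mu(3))/sigma(3)];   % scaled query with intercept term
price_gd = x_query * theta;                                     % predicted price from gradient descent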

1.5 Normal Equations


The normal equation is given as:
θ = (X^T X)^{-1} X^T y.
Use the formula above to calculate θ. Once you have found θ from this method, use it to make a price
prediction for a 1650-square-foot house with 3 bedrooms. Do you get the same price that you found through
gradient descent? If not, why?
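A sketch of this calculation is given below. Note that the normal equation is applied to the unscaled
features, so the query point is not scaled here; x_unscaled is an assumed variable holding the original
features with the intercept column.

theta_ne = (x_unscaled' * x_unscaled) \ (x_unscaled' * y);   % solves (X'X) theta = X'y
price_ne = [1, 1650, 3] * theta_ne;                          % prediction on the raw features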

2 Logistic Regression
2.1 Data
Download ex3Data.zip and extract the files from the zip file. For this exercise, suppose that a high school
has a dataset representing 40 students who were admitted to college and 40 students who were not admitted.
Each (x^{(i)}, y^{(i)}) training example contains a student's score on two standardized exams and a label of
whether the student was admitted.
Your task is to build a binary classification model that estimates college admission chances based on a stu-
dent’s scores on two exams. In your training data,
1. The first column of x represents all Test 1 scores, and the second column represents all Test 2 scores.
2. The y vector uses ‘1’ to label a student who was admitted and ‘0’ to label a student who was not admitted.

2.2 Plot the data


Load the data for the training examples into your program and add the x_0 = 1 intercept term to your
X matrix. Plot the data using different symbols to represent the two classes. In Matlab/Octave, you can
separate the positive class and the negative class using the find command:

% find returns the indices of the


% rows meeting the specified condition
pos = find(y == 1); neg = find(y == 0);

% Assume the features are in the 2nd and 3rd


% columns of x
plot(x(pos, 2), x(pos, 3), '+'); hold on
plot(x(neg, 2), x(neg, 3), 'o')

2.3 Cost Function and Gradient


Implement the cost function and gradient for logistic regression. The cost function in logistic regression is:
J(θ) = −(1/m) Σ_{i=1}^{m} [ y^{(i)} log(h_θ(x^{(i)})) + (1 − y^{(i)}) log(1 − h_θ(x^{(i)})) ]

and the gradient of the cost is a vector of the same length as θ where the jth element (for j = 0, 1, ..., n) is
defined as follows:
∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)}
Note that while this gradient looks identical to the linear regression gradient, the formula is actually different
because linear and logistic regression have different definitions of hθ (x).
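A minimal sketch of such a cost function is given below; the name costFunction matches the fminunc call
in the next section, while the sigmoid helper and the convention of saving each function in its own .m file
are assumptions, not part of the provided materials.

function g = sigmoid(z)
% logistic function g(z) = 1/(1 + e^(-z)), applied element-wise
g = 1 ./ (1 + exp(-z));
end

function [J, grad] = costFunction(theta, X, y)
% logistic regression cost and gradient (sketch)
m = length(y);
h = sigmoid(X * theta);                               % m-by-1 vector of predictions
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % scalar cost
grad = (1/m) * X' * (h - y);                          % (n+1)-by-1 gradient
end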

2.4 Learning parameters using fminunc


Octave/MATLAB’s fminunc is an optimization solver that finds the minimum of an unconstrained function.
For logistic regression, you want to optimize the cost function J(θ) with parameters θ.
The syntax to use fminunc is:

% Set options for fminunc


options = optimset('GradObj', 'on', 'MaxIter', 400);
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

We first defined the options to be used with fminunc. We set the GradObj option to on, which tells fminunc
that our function returns both the cost and the gradient. This allows fminunc to use the gradient when
minimizing the function. Furthermore, we set the MaxIter option to 400, so that fminunc will run for
at most 400 steps before it terminates. To specify the actual function we are minimizing, we use
@(t) costFunction(t, X, y). This creates an anonymous function, with argument t, that calls your
costFunction, which allows us to wrap costFunction for use with fminunc.
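Note that initial_theta must be defined before the call; a reasonable starting point is the zero vector,
for example:

initial_theta = zeros(size(X, 2), 1);   % one parameter per column of X, initialized to zero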

2.5 Plot the decision boundary
After convergence, use your values of theta to plot the decision boundary in the classification problem. The
decision boundary is defined as the line where

P(y = 1 | x; θ) = g(θ^T x) = 0.5,

which corresponds to
θ^T x = 0.
Plotting the decision boundary is equivalent to plotting the line θ^T x = 0.
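With two exam-score features, θ^T x = 0 means θ_0 + θ_1 x_1 + θ_2 x_2 = 0, so the boundary is a straight
line in the plane of the two scores. A sketch of plotting it on top of the scatter plot from Section 2.2
(using the design matrix x from that section; with Matlab indexing, theta(1) is θ_0):

plot_x = [min(x(:,2)) - 2, max(x(:,2)) + 2];            % two Exam 1 scores spanning the data
plot_y = (-1/theta(3)) * (theta(2)*plot_x + theta(1));  % Exam 2 scores on the boundary line
plot(plot_x, plot_y, 'g-')
legend('Admitted', 'Not admitted', 'Decision boundary')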
Finally, record your answers to these questions.
1. What values of θ did you get? How many iterations were required for convergence?
2. What is the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will
not be admitted?
