
Homework assignment 1

Optimization and Data Fitting - 02610


Michella Ravn Søndergaard - s184005
4th of October 2020


The Matlab code for this report is in a separate folder.

1 One-page report on the exercises for week 5


Gradient and Hessian for the function $f(x_1, x_2) = 100 \cdot (x_2 - x_1^2)^2 + (1 - x_1)^2$:

$$\nabla f(x_1, x_2) = \begin{pmatrix} f'_{x_1}(x_1, x_2) \\ f'_{x_2}(x_1, x_2) \end{pmatrix} = \begin{pmatrix} 2x_1 - 400x_1(x_2 - x_1^2) - 2 \\ 200x_2 - 200x_1^2 \end{pmatrix}$$

$$\nabla^2 f(x_1, x_2) = \begin{pmatrix} f''_{x_1,x_1}(x_1, x_2) & f''_{x_1,x_2}(x_1, x_2) \\ f''_{x_2,x_1}(x_1, x_2) & f''_{x_2,x_2}(x_1, x_2) \end{pmatrix} = \begin{pmatrix} 1200x_1^2 - 400x_2 + 2 & -400x_1 \\ -400x_1 & 200 \end{pmatrix}$$
A contour plot of f is made and shown below.

[Figure: Contour plot of the function f with the minimum (1, 1) marked in red; x1 and x2 on the axes, contour levels from 0 to 200.]

Applying the steepest descent method to approximate the minimizer of f(x) from the two different starting points (1.2, 1.2) and (−1.2, 1), we vary the step size.
Firstly we run the steepest descent algorithm from the starting point close to the minimum and try different step sizes. When choosing a relatively large step size, e.g. 0.5, the algorithm oscillates between two points. The intuition is to take a smaller alpha to avoid these oscillations, but gradually decreasing the step size increases the number of steps, and since we have a constraint of at most 2000 iterations, we run out of iterations. The conclusion is that with a constant step size the algorithm is unable to converge before the iteration limit is reached. If we instead run the steepest descent algorithm from the point further away from the minimum with a relatively large step size, it jumps quickly towards the minimum, but then jumps out again and oscillates across the valley, since the step size is constant and too large. With a relatively small step size we move slowly towards the minimum, but more iterations are required to converge and reach the minimum, as with the other starting point.
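To illustrate the constant-step iteration discussed above, a minimal Matlab sketch could look as follows (this is not the handed-out course code; the step size, tolerance and iteration limit shown here are assumptions chosen to match the description):

% Steepest descent with a constant step size on the function above.
f     = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
gradf = @(x) [2*x(1) - 400*x(1)*(x(2) - x(1)^2) - 2;
              200*(x(2) - x(1)^2)];

x       = [1.2; 1.2];    % starting point (the second run uses [-1.2; 1])
alpha   = 0.005;         % constant step size (assumed value)
maxiter = 2000;          % iteration budget used in the report
tol     = 1e-10;         % tolerance on the gradient norm (assumed)

for k = 1:maxiter
    g = gradf(x);
    if norm(g) < tol, break; end
    x = x - alpha*g;     % constant-step steepest descent update
end
fprintf('Stopped after %d iterations at x = (%.4f, %.4f), f = %.3e\n', k, x(1), x(2), f(x));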


[Figure: The error ||x_k − x*|| (left) and the function values (right) versus iteration number for steepest descent, Newton and BFGS, on logarithmic axes.]

Steepest descent is computationally cheap to run, and each step follows the most direct downhill direction. Steepest descent is globally convergent, but it may require many iterations to reach the minimum, as illustrated by the blue line in the graph above: the error never gets below $10^{-10}$, which is the convergence tolerance.
Newton's method is computationally heavier, because a new Hessian matrix is computed in every iteration. Newton's method is only locally convergent, which means we have to choose the starting point carefully, but when close to the minimum the algorithm is very fast and requires few iterations. This is illustrated by the red line in the graph.
BFGS is computationally cheaper than Newton's method because it only approximates the Hessian matrix. This also means that it takes somewhat longer to converge, as illustrated by the yellow line, which is slightly slower than Newton's method.
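For reference, the BFGS update that maintains the approximation of the inverse Hessian can be written as a small Matlab helper (the function name bfgs_update and its calling convention are assumptions for illustration, not the course implementation):

% One BFGS update of the inverse Hessian approximation Hinv, where
% s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k).
function Hinv = bfgs_update(Hinv, s, y)
    rho  = 1/(y'*s);                 % the curvature condition requires y'*s > 0
    I    = eye(length(s));
    Hinv = (I - rho*(s*y'))*Hinv*(I - rho*(y*s')) + rho*(s*s');
end

The search direction is then p = -Hinv*g followed by a line search, so no linear system with the exact Hessian has to be solved, which is what makes each iteration cheaper than a Newton step.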


2 Stationary points and steepest descent method


$$f(x) = f_1(x)^2 + f_2(x)^2$$
where
$$f_1(x) = x_1 + x_2^2 - 1, \qquad f_2(x) = x_1 + 3$$

a. Gradient and Hessian


$$\nabla f(x_1, x_2) = \begin{pmatrix} f'_{x_1}(x_1, x_2) \\ f'_{x_2}(x_1, x_2) \end{pmatrix} = \begin{pmatrix} 2x_1 + 2x_2^2 - 2 + 2x_1 + 6 \\ 2x_2(2x_1 + 2x_2^2 - 2) \end{pmatrix} = \begin{pmatrix} 4x_1 + 2x_2^2 + 4 \\ 4x_2(x_1 + x_2^2 - 1) \end{pmatrix}$$

$$\nabla^2 f(x_1, x_2) = \begin{pmatrix} f''_{x_1,x_1}(x_1, x_2) & f''_{x_1,x_2}(x_1, x_2) \\ f''_{x_2,x_1}(x_1, x_2) & f''_{x_2,x_2}(x_1, x_2) \end{pmatrix} = \begin{pmatrix} 4 & 4x_2 \\ 4x_2 & 12x_2^2 + 4x_1 - 4 \end{pmatrix}$$

b. Find the three stationary points by calculating the first-order optimality condition, and prove that two are global minimizers and one is a saddle point.

The first-order optimality condition is $\nabla f(x_1, x_2) = (0, 0)$, and whether a stationary point is a minimum, a maximum or a saddle point is revealed by the eigenvalues of the Hessian matrix at that point.
Solving the first-order optimality condition gives the three points (−1, 0), (−3, −2) and (−3, 2). (From the second component of the gradient, either $x_2 = 0$, which inserted in the first component gives $x_1 = -1$, or $x_1 + x_2^2 = 1$, which combined with the first component gives $x_2 = \pm 2$ and $x_1 = -3$.) Analyzing these points we get:

 
$$\mathrm{sign}(\mathrm{eig}(H_{(-1,0)})) = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad \mathrm{sign}(\mathrm{eig}(H_{(-3,-2)})) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mathrm{sign}(\mathrm{eig}(H_{(-3,2)})) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

The signs of the eigenvalues reveal that (−1, 0) is a saddle point and that (−3, −2) and (−3, 2) are local minimizers. Since f is a sum of squares we have f(x) ≥ 0 everywhere, and f(−3, −2) = f(−3, 2) = 0, so these two points are in fact global minimizers.
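The stationary points and the eigenvalue signs above can be double-checked with a short Matlab script; the sketch below assumes the Symbolic Math Toolbox is available:

% Symbolic check of the stationary points of f and the Hessian eigenvalue signs.
syms x1 x2 real
f = (x1 + x2^2 - 1)^2 + (x1 + 3)^2;
g = gradient(f, [x1, x2]);            % first-order optimality condition: g = 0
H = hessian(f, [x1, x2]);

S = solve(g == 0, [x1, x2]);          % the three stationary points
for i = 1:numel(S.x1)
    xi = double([S.x1(i), S.x2(i)]);
    Hi = double(subs(H, [x1, x2], xi));
    fprintf('(%g, %g): eigenvalue signs %s\n', xi(1), xi(2), mat2str(sign(eig(Hi))'));
end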


c. Contour plot of the function f, and mark these three points on the
contour plot.
A contour plot of f is shown below.

[Figure: Contour plot of the function f with the minimizers (−3, −2) and (−3, 2) marked in red and the saddle point (−1, 0) marked in green; x1 and x2 on the axes.]


d. Steepest descent method with backtracking line search on the problem of minimizing f.

An illustration of the walk from the starting points to the points of convergence is shown below.

The three different starting points converged to the three stationary points, as shown in the table below.

Initial point    Number of iterations    Function value at converged point    Converged point
(-2, -1)         54                      1.7407e-11                           (-3, -2)
(-2, 0)          1                       8                                    (-1, 0)
(-2, 1)          54                      1.7407e-11                           (-3, 2)
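A minimal Matlab sketch of the steepest descent iteration with backtracking line search behind this table is given below (not the handed-out code; the Armijo parameters c and rho, the tolerance and the iteration limit are assumptions):

% Steepest descent with backtracking (Armijo) line search for f = f1^2 + f2^2.
f     = @(x) (x(1) + x(2)^2 - 1)^2 + (x(1) + 3)^2;
gradf = @(x) [4*x(1) + 2*x(2)^2 + 4;
              4*x(2)*(x(1) + x(2)^2 - 1)];

x = [-2; -1];                          % one of the three initial points
for k = 1:2000
    g = gradf(x);
    if norm(g) < 1e-10, break; end
    p = -g;                            % steepest descent direction
    alpha = 1; c = 1e-4; rho = 0.5;    % backtracking parameters (assumed)
    while f(x + alpha*p) > f(x) + c*alpha*(g'*p)
        alpha = rho*alpha;             % shrink until the Armijo condition holds
    end
    x = x + alpha*p;
end
fprintf('Converged to (%.4f, %.4f) after %d iterations, f(x) = %.4e\n', x(1), x(2), k, f(x));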


3 Local convergence in Newton’s method


$$\min_x f(x) = \frac{x_1^4}{4} - x_1^2 + 2x_1 + (x_2 - 1)^2$$

a. Gradient and Hessian


The gradient is found as:

$$\nabla f(x_1, x_2) = \begin{pmatrix} f'_{x_1}(x_1, x_2) \\ f'_{x_2}(x_1, x_2) \end{pmatrix} = \begin{pmatrix} x_1^3 - 2x_1 + 2 \\ 2x_2 - 2 \end{pmatrix}$$

The Hessian is found as:

$$\nabla^2 f(x_1, x_2) = \begin{pmatrix} f''_{x_1,x_1}(x_1, x_2) & f''_{x_1,x_2}(x_1, x_2) \\ f''_{x_2,x_1}(x_1, x_2) & f''_{x_2,x_2}(x_1, x_2) \end{pmatrix} = \begin{pmatrix} 3x_1^2 - 2 & 0 \\ 0 & 2 \end{pmatrix}$$

b. Contour plot of f on the given interval

[Figure: Contour plot of the function f on the given interval; x1 and x2 on the axes.]

c. 10 iterations of Newton’s method with given conditions


In the 10 iterations the function value oscillates between two values. To use Newton's method, the eigenvalues of the Hessian matrix need to be positive, since the Hessian must be positive definite. Already in the first iteration one of the eigenvalues is −2, hence the pure Newton's method cannot be used here. A modification was made to newton.m in which the eigenvalues are stored and printed; one of them is negative in every second iteration.
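A minimal sketch of such a modified pure Newton iteration is shown below (this is an illustration, not the actual newton.m; the starting point (0, 0) is an assumption, chosen because it reproduces the eigenvalue −2 in the first iteration):

% Pure Newton iterations, printing the eigenvalues of the Hessian at every step.
gradf = @(x) [x(1)^3 - 2*x(1) + 2; 2*x(2) - 2];
hessf = @(x) [3*x(1)^2 - 2, 0; 0, 2];

x = [0; 0];                            % assumed starting point
for k = 1:10
    H = hessf(x);
    fprintf('iteration %2d: eig(H) = [%g, %g]\n', k, eig(H));
    x = x - H \ gradf(x);              % full Newton step, no line search
end

With this assumed starting point the x1-component of the iterates alternates between 0 and 1, so the eigenvalue 3*x1^2 - 2 alternates between -2 and 1, which is consistent with the behaviour described above.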


d. Backtracking line search

[Figure: Iteration history of Newton's method with backtracking line search, including f(x_k) and ||f'(x_k)||, over roughly 120 iterations.]

This method converges, and we obtain an approximate minimum. When running newton_line.m, the α value stays small for many iterations, and then the method suddenly finds its way to the minimum. In these early iterations it is not possible to find a function value smaller than the current one, so the while loop keeps shrinking alpha until α = 0.01 is reached, where the algorithm is forced to take a step even though the new function value is not smaller than the current one. This forces the iterates onwards and closer to the minimum, where larger steps can then be taken to find the minimum.
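The mechanism described above can be sketched as follows (an illustrative reconstruction, not the actual newton_line.m; the halving of alpha and the simple decrease test are assumptions, only the lower bound 0.01 is taken from the text):

% Newton's method where backtracking enforces a minimum step length of 0.01,
% so that a step is taken even when no decrease in f can be found.
f     = @(x) x(1)^4/4 - x(1)^2 + 2*x(1) + (x(2) - 1)^2;
gradf = @(x) [x(1)^3 - 2*x(1) + 2; 2*x(2) - 2];
hessf = @(x) [3*x(1)^2 - 2, 0; 0, 2];

x = [0; 0];                            % assumed starting point
for k = 1:200
    g = gradf(x);
    if norm(g) < 1e-10, break; end
    p = -(hessf(x) \ g);               % Newton direction
    alpha = 1;
    while f(x + alpha*p) >= f(x) && alpha > 0.01
        alpha = max(alpha/2, 0.01);    % shrink, but never below the floor 0.01
    end
    x = x + alpha*p;                   % at the floor, the step is forced anyway
end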

e. Steepest descent

[Figure: Iteration history of the steepest descent method with backtracking line search, including f(x_k) and ||f'(x_k)||, over roughly 30 iterations.]

With the same tolerance and maximum number of iterations as before, the steepest descent method with backtracking converges after 27 iterations, compared to Newton's method, which converges in 107 iterations.


f. Repeating with new starting point (-1,0)

[Figure: Iteration histories from the new starting point (−1, 0): Newton's method with backtracking line search (top, about 5 iterations) and steepest descent with backtracking line search (bottom, about 30 iterations), showing f(x_k) and ||f'(x_k)||.]

Now Newton's method only requires 4 iterations and steepest descent 28, both with backtracking line search. The new starting point is closer to the minimum, which is an advantage for Newton's method. The steepest descent method, on the other hand, takes many small steps when close to the minimum, so the closer starting point does not reduce its iteration count.
