optimization
(SI 416) – lecture 9
Harsha Hutridurga
IIT Bombay
recap
♣ Take a continuously differentiable function f : R^n → R
♣ Gradient descent algorithm reads as follows:
    Begin with an x^(0) ∈ R^n
    Build iterates using x^(n+1) = x^(n) − δ∇f(x^(n)) for n = 0, 1, 2, . . .
♣ If f is strongly convex, then there is a unique global minimizer x∗
♣ If f is further assumed to be β-smooth, then picking δ ∈ (0, β^{−1})
  yields a minimizing sequence, i.e. f(x^(n+1)) ≤ f(x^(n))
♣ Furthermore, we have the estimate:
    ‖x^(n) − x∗‖ ≤ (1/(1 + 2δλ))^{n/2} ‖x^(0) − x∗‖ for n = 0, 1, . . .
♣ Hence we deduced that
    n = O(ln(ε^{−1})) =⇒ ‖x^(n) − x∗‖ = O(ε)
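♣ A minimal Python sketch of this iteration (the quadratic objective used below is an illustrative choice, not part of the lecture):

```python
import numpy as np

def gradient_descent(grad_f, x0, delta, n_iters):
    """Gradient descent iterates: x^(n+1) = x^(n) - delta * grad f(x^(n))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - delta * grad_f(x)
    return x

# Illustrative strongly convex example: f(x) = 0.5 * ||x||^2, so grad f(x) = x
# and the unique global minimizer is x* = 0.
print(gradient_descent(grad_f=lambda x: x, x0=[1.0, -2.0], delta=0.1, n_iters=100))
```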
recap (contd.)
♣ Take a twice continuously differentiable function f : R^n → R
♣ Further assume that ∇²f(x) is invertible for all x ∈ R^n
♣ Newton’s algorithm reads as follows:
    Begin with an x^(0) ∈ R^n
    Build iterates using x^(n+1) = x^(n) − δ [∇²f(x^(n))]^{−1} ∇f(x^(n))
    for n = 0, 1, 2, . . .
♣ Suppose f is strongly convex. Then
    ▸ there is a unique global minimizer x∗
    ▸ the Hessian ∇²f(x) is positive definite
♣ Picking δ ≪ 1 small enough yields a minimizing sequence:
    f(x^(n+1)) ≤ f(x^(n))
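♣ A minimal Python sketch of this iteration (the quadratic test function below is illustrative, not from the lecture):

```python
import numpy as np

def newton(grad_f, hess_f, x0, delta=1.0, n_iters=20):
    """Damped Newton iterates: x^(n+1) = x^(n) - delta * [hess f(x^(n))]^{-1} grad f(x^(n))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        # Solve hess f(x) p = grad f(x) rather than forming the inverse explicitly.
        p = np.linalg.solve(hess_f(x), grad_f(x))
        x = x - delta * p
    return x

# Illustrative quadratic: f(x) = 0.5 <Ax, x> - <b, x> with A positive definite,
# so grad f(x) = Ax - b, hess f(x) = A, and the minimizer solves A x* = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(newton(lambda x: A @ x - b, lambda x: A, x0=[0.0, 0.0]))
print(np.linalg.solve(A, b))  # agrees with the iterate above
```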
recap (contd.)
♣ Assume further that f is smooth in the following sense:
    ‖∇²f(x)v − ∇²f(y)v‖ ≤ γ ‖x − y‖ ‖v‖ for all x, y, v ∈ R^n
  for some γ > 0
♣ For δ = 1, we have the estimate:
    ‖x^(n) − x∗‖ ≤ (γ/(2λ))^{2^n − 1} ‖x^(0) − x∗‖^{2^n}
♣ Note that
    ‖x^(0) − x∗‖ ≤ λ/γ =⇒ ‖x^(n) − x∗‖ ≤ (2λ/γ) 2^{−2^n}
♣ Hence we deduced that
    n = O(ln(ln(ε^{−1}))) =⇒ ‖x^(n) − x∗‖ = O(ε)
rates of convergence
♣ If a sequence {x^(n)} ⊂ R^n converges to a point x∗ ∈ R^n, then
    lim_{n→∞} ‖x^(n) − x∗‖ = 0
♣ For a convergent sequence, we can talk about the rate of convergence
    ▸ The convergence is linear if there exists a θ ∈ (0, 1) such that
        ‖x^(n+1) − x∗‖ ≤ θ ‖x^(n) − x∗‖
      for all n sufficiently large
    ▸ The convergence is superlinear if
        lim_{n→∞} ‖x^(n+1) − x∗‖ / ‖x^(n) − x∗‖ = 0
    ▸ The convergence is quadratic if there exists a C > 0 such that
        ‖x^(n+1) − x∗‖ ≤ C ‖x^(n) − x∗‖²
      for all n sufficiently large
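♣ These definitions can be checked numerically on an error sequence; the sequences below are made-up illustrations, not from the lecture:

```python
# Illustrative error sequences e_n = ||x^(n) - x*||:
linear    = [0.5 ** n for n in range(10)]        # satisfies e_{n+1} = 0.5 * e_n  (theta = 0.5)
quadratic = [0.5 ** (2 ** n) for n in range(5)]  # satisfies e_{n+1} = e_n^2      (C = 1)

# Ratios e_{n+1}/e_n: roughly constant in (0, 1) for linear convergence,
# tending to 0 for superlinear (and in particular quadratic) convergence.
print([e1 / e0 for e0, e1 in zip(linear, linear[1:])])
print([e1 / e0 for e0, e1 in zip(quadratic, quadratic[1:])])
```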
rate of convergence (contd.)
♣ Recall: For the gradient descent algorithm to minimize a strongly
  convex β-smooth function, we had
    ‖x^(n+1) − x∗‖ ≤ (1/(1 + 2δλ))^{1/2} ‖x^(n) − x∗‖
♣ Hence the convergence here is linear
♣ Recall: For Newton’s algorithm to minimize a smooth strongly
  convex function, we had
    ‖x^(n+1) − x∗‖ ≤ (γ/(2λ)) ‖x^(n) − x∗‖²
♣ Hence the convergence here is quadratic
line search algorithms
♣ Start with an initial vector x^(0) ∈ R^n and a direction p^(0) ∈ R^n
♣ Find the next iterate x^(1) along the line x^(0) + αp^(0) with α > 0 s.t.
    f(x^(1)) ≤ f(x^(0))
♣ At the point x^(1), pick a new direction p^(1) ∈ R^n
♣ Find the next iterate x^(2) along the line x^(1) + αp^(1) with α > 0 s.t.
    f(x^(2)) ≤ f(x^(1))
♣ General principle of line search algorithms:
    ▸ At the current iterate x^(n), choose a direction p^(n)
    ▸ Pick the next iterate x^(n+1) along the line x^(n) + αp^(n) with α > 0
      such that
        f(x^(n+1)) ≤ f(x^(n))
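♣ The general principle, written as a small Python template; choose_direction and choose_step are placeholder hooks of my own naming, to be filled in by the specific choices discussed on the next slides:

```python
import numpy as np

def line_search_method(f, x0, choose_direction, choose_step, n_iters=50):
    """Generic line search: at x^(n) pick a direction p^(n), then a step alpha > 0
    along x^(n) + alpha * p^(n) so that f(x^(n+1)) <= f(x^(n))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        p = choose_direction(x)        # search direction p^(n)
        alpha = choose_step(f, x, p)   # step length alpha > 0
        x = x + alpha * p              # next iterate x^(n+1)
    return x
```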
line search algorithms (contd.)
♣ At each iteration step, we may solve a one-dimensional
  minimization problem:
    min_{α>0} f(x^(n) + αp^(n))
♣ But, in practice, we are content with finding a candidate that
  comes close to solving the above one-dimensional problem
♣ The direction p^(n) is referred to as the search direction
♣ Recall the steepest descent algorithm:
    x^(n+1) = x^(n) − δ∇f(x^(n))
♣ So, here the search direction at the nth iteration step is
    p^(n) = −∇f(x^(n))
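♣ One crude way to come close to solving the one-dimensional problem is backtracking: start from some α and shrink it until f decreases. This is a standard heuristic, sketched here with arbitrary constants, not something derived in the lecture:

```python
def backtracking_step(f, x, p, alpha0=1.0, shrink=0.5, max_tries=30):
    """Shrink alpha geometrically until f(x + alpha * p) <= f(x);
    returns 0.0 (i.e. stay put) if no such alpha is found."""
    alpha = alpha0
    for _ in range(max_tries):
        if f(x + alpha * p) <= f(x):
            return alpha
        alpha *= shrink
    return 0.0
```

♣ With the steepest descent direction p^(n) = −∇f(x^(n)) and this step rule plugged into the template above, one recovers a basic steepest descent line search.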
line search algorithms (contd.)
♣ At the iterate x^(n) and for any search direction p^(n), we have
    f(x^(n) + αp^(n)) = f(x^(n)) + α ⟨∇f(x^(n)), p^(n)⟩
                        + (α²/2) ⟨∇²f(x^(n) + sp^(n)) p^(n), p^(n)⟩
  for some s ∈ (0, α), thanks to Taylor’s theorem.
♣ Define a function g : [0, ∞) → R as follows:
    g(α) := f(x^(n) + αp^(n)) for α ∈ [0, ∞).
♣ Observe that
    g′(0) = ⟨∇f(x^(n)), p^(n)⟩
♣ That is, the rate of change of f at the point x^(n) in the direction
  p^(n) is given by ⟨∇f(x^(n)), p^(n)⟩
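♣ A quick numerical check of g′(0) = ⟨∇f(x^(n)), p^(n)⟩; the function, point, and direction below are illustrative choices of mine:

```python
import numpy as np

f = lambda x: x[0] ** 2 + 3 * x[1] ** 2              # illustrative smooth function
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, -1.0])    # stands in for the iterate x^(n)
p = np.array([0.5, 2.0])     # some search direction p^(n)
h = 1e-6

print((f(x + h * p) - f(x)) / h)   # finite-difference slope g'(0), approx -11
print(np.dot(grad_f(x), p))        # <grad f(x), p> = 2*0.5 + (-6)*2 = -11
```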
line search algorithms (contd.)
♣ If we are interested in finding a unit direction of maximum
  decrease at the point x^(n), we should understand
    min_{p ∈ R^n, ‖p‖ = 1} ⟨∇f(x^(n)), p⟩
♣ Recall that, if θ_n denotes the angle between ∇f(x^(n)) and p, then
    ⟨∇f(x^(n)), p⟩ = ‖p‖ ‖∇f(x^(n))‖ cos(θ_n) = ‖∇f(x^(n))‖ cos(θ_n)
♣ So, the minimum possible value of ⟨∇f(x^(n)), p⟩ is obtained when
  cos(θ_n) = −1
♣ Observe that the unit vector p which realises that is
    p = − ∇f(x^(n)) / ‖∇f(x^(n))‖
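♣ A small sanity check that this unit vector does minimize the inner product; the gradient value and the random comparison directions are purely illustrative:

```python
import numpy as np

g = np.array([2.0, -6.0])            # stands in for grad f(x^(n))
p_star = -g / np.linalg.norm(g)      # the claimed minimizer over unit vectors

# Compare <g, p_star> = -||g|| against random unit directions:
# none of them should give a smaller inner product.
rng = np.random.default_rng(0)
directions = rng.normal(size=(1000, 2))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
print(np.dot(g, p_star), (directions @ g).min())
```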
line search algorithms (contd.)
♣ We have seen that steepest descent is a line search algorithm
  where we take the search direction
    p^(n) = −∇f(x^(n))
♣ Taylor’s theorem says
    f(x^(n) + αp^(n)) = f(x^(n)) + α ⟨∇f(x^(n)), p^(n)⟩
                        + (α²/2) ⟨∇²f(x^(n) + sp^(n)) p^(n), p^(n)⟩
♣ Hence, if we take 0 < α ≪ 1, and if we ensure that
    ⟨∇f(x^(n)), p^(n)⟩ < 0
  then we find that f(x^(n+1)) < f(x^(n))
♣ Any such direction p^(n) is referred to as a descent direction
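♣ A tiny illustration (with a toy convex function of my own choosing): a direction with ⟨∇f(x^(n)), p^(n)⟩ < 0 decreases f once α is small enough, though not necessarily for large α:

```python
import numpy as np

f = lambda x: x[0] ** 2 + 3 * x[1] ** 2              # illustrative smooth convex function
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, -1.0])
p = -grad_f(x)    # a descent direction: <grad f(x), p> = -||grad f(x)||^2 < 0
for alpha in [1.0, 0.1, 0.01]:
    print(alpha, f(x + alpha * p) < f(x))   # decrease holds once alpha is small enough
```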
line search algorithms (contd.)
♣ For any search direction p, we have by Taylor’s theorem:
    f(x^(n) + p) = f(x^(n)) + ⟨∇f(x^(n)), p⟩ + (1/2) ⟨∇²f(x^(n) + sp) p, p⟩
  for some s ∈ (0, 1).
♣ Let us assume that ∇²f(x^(n) + sp) ≈ ∇²f(x^(n))
♣ Hence we obtain
    f(x^(n) + p) ≈ f(x^(n)) + ⟨∇f(x^(n)), p⟩ + (1/2) ⟨∇²f(x^(n)) p, p⟩ =: F(p)
♣ Observe that F is a quadratic function in p
♣ If ∇²f(x^(n)) is positive definite, then F(p) has a unique global minimizer
♣ Recall: that global minimizer p∗ is a critical point of F, i.e.
    ∇F(p∗) = 0 =⇒ p∗ = −[∇²f(x^(n))]^{−1} ∇f(x^(n))
♣ This is the search direction in Newton’s algorithm
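♣ A minimal sketch of this computation: build the quadratic model F(p) and minimize it by solving ∇²f(x^(n)) p = −∇f(x^(n)); the gradient and Hessian data below are illustrative stand-ins:

```python
import numpy as np

# Illustrative data standing in for grad f(x^(n)) and hess f(x^(n)) (positive definite):
g = np.array([1.0, -2.0])
H = np.array([[4.0, 1.0], [1.0, 3.0]])

# Quadratic model F(p), dropping the constant f(x^(n)):
F = lambda p: np.dot(g, p) + 0.5 * np.dot(H @ p, p)

p_star = -np.linalg.solve(H, g)   # critical point: grad F(p*) = g + H p* = 0
# Perturbing p_star in any direction does not decrease F (since H is positive definite):
print(F(p_star))
print(F(p_star + np.array([0.01, 0.0])), F(p_star + np.array([0.0, 0.01])))
```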
line search algorithms (contd.)
♣ Newton’s algorithm is also a line search algorithm
♣ The search direction in Newton’s algorithm is
    p^(n) = −[∇²f(x^(n))]^{−1} ∇f(x^(n))
♣ If ∇²f is strictly positive definite, then
    ⟨∇f(x^(n)), p^(n)⟩ = −⟨∇f(x^(n)), [∇²f(x^(n))]^{−1} ∇f(x^(n))⟩ < 0
  whenever ∇f(x^(n)) ≠ 0
♣ Hence the above p^(n) is a descent direction
end of lecture 9
thank you for your attention