
Clnote Oct12

The document discusses optimization techniques in nonlinear programming, focusing on multi-dimensional minimization methods and the rate of convergence for quadratic objective functions. It explains the steepest descent algorithm, step length calculations, and the Kantorovich inequality, which relates eigenvalues of a symmetric positive definite matrix to convergence rates. Additionally, it highlights the significance of the condition number in determining convergence speed, with implications for the shape of contours in optimization problems.


Optimization

Nonlinear programming:
Multi-dimensional minimization methods
Rate of Convergence
• Consider the quadratic objective function
  $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T H \mathbf{x} + \mathbf{C}^T \mathbf{x}$
• Optimal solution: $\mathbf{x}^* = -H^{-1}\mathbf{C}$
• So, $f(\mathbf{x}^*) = -\frac{1}{2}\mathbf{C}^T H^{-1} \mathbf{C}$
• Let $\mathbf{x}_k$ be the $k$th point of the steepest descent algorithm, i.e.
  $f(\mathbf{x}_k) = \frac{1}{2}\mathbf{x}_k^T H \mathbf{x}_k + \mathbf{C}^T \mathbf{x}_k$
• The current direction is $\mathbf{d}_k = -\mathbf{g}(\mathbf{x}_k) = -H\mathbf{x}_k - \mathbf{C}$
• Let us compute the next iterate:
  $\phi(\alpha) = f(\mathbf{x}_k + \alpha\mathbf{d}_k) = \frac{1}{2}(\mathbf{x}_k + \alpha\mathbf{d}_k)^T H (\mathbf{x}_k + \alpha\mathbf{d}_k) + \mathbf{C}^T(\mathbf{x}_k + \alpha\mathbf{d}_k)$
Step Length
$$
\begin{aligned}
\phi(\alpha) = f(\mathbf{x}_k + \alpha\mathbf{d}_k)
&= \frac{1}{2}(\mathbf{x}_k + \alpha\mathbf{d}_k)^T H (\mathbf{x}_k + \alpha\mathbf{d}_k) + \mathbf{C}^T(\mathbf{x}_k + \alpha\mathbf{d}_k) \\
&= \frac{1}{2}\mathbf{x}_k^T H \mathbf{x}_k + \alpha\,\mathbf{d}_k^T H \mathbf{x}_k + \frac{1}{2}\alpha^2\,\mathbf{d}_k^T H \mathbf{d}_k + \mathbf{C}^T\mathbf{x}_k + \alpha\,\mathbf{C}^T\mathbf{d}_k \\
&= f(\mathbf{x}_k) + \alpha\,\mathbf{d}_k^T H \mathbf{x}_k + \alpha\,\mathbf{C}^T\mathbf{d}_k + \frac{1}{2}\alpha^2\,\mathbf{d}_k^T H \mathbf{d}_k \\
&= f(\mathbf{x}_k) - \alpha\,\mathbf{d}_k^T\mathbf{d}_k + \frac{1}{2}\alpha^2\,\mathbf{d}_k^T H \mathbf{d}_k
\qquad\text{(using } \mathbf{d}_k = -(H\mathbf{x}_k + \mathbf{C})\text{)}
\end{aligned}
$$
Setting the derivative with respect to $\alpha$ to zero,
$$\frac{\partial\phi}{\partial\alpha} = -\mathbf{d}_k^T\mathbf{d}_k + \alpha\,\mathbf{d}_k^T H \mathbf{d}_k = 0$$
So,
$$\alpha = \frac{\mathbf{d}_k^T\mathbf{d}_k}{\mathbf{d}_k^T H \mathbf{d}_k}$$
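To make the result concrete, here is a minimal sketch (assuming NumPy; the matrix H, vector C, and starting point are arbitrary illustrative choices) of a single steepest-descent step using the exact step length derived above:

import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T H x + C^T x
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
C = np.array([-1.0, -2.0])

def f(x):
    return 0.5 * x @ H @ x + C @ x

x_k = np.array([0.0, 0.0])               # current iterate
d_k = -(H @ x_k + C)                      # steepest-descent direction d_k = -g(x_k)
alpha = (d_k @ d_k) / (d_k @ H @ d_k)     # exact step length from the derivation above
x_next = x_k + alpha * d_k

print("alpha =", alpha)                   # 0.25 for these values
print("f(x_k) =", f(x_k), " f(x_{k+1}) =", f(x_next))   # the objective decreases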
Rate of Convergence
$$f(\mathbf{x}_{k+1}) = f(\mathbf{x}_k + \alpha\mathbf{d}_k) = f(\mathbf{x}_k) - \alpha\,\mathbf{d}_k^T\mathbf{d}_k + \frac{1}{2}\alpha^2\,\mathbf{d}_k^T H \mathbf{d}_k = f(\mathbf{x}_k) - \frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T H \mathbf{d}_k}$$
Now,
$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)}
= \frac{f(\mathbf{x}_k) - \frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T H \mathbf{d}_k} - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)}
= 1 - \frac{\frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T H \mathbf{d}_k}}{\frac{1}{2}\mathbf{x}_k^T H \mathbf{x}_k + \mathbf{C}^T\mathbf{x}_k + \frac{1}{2}\mathbf{C}^T H^{-1}\mathbf{C}}$$
Rate of Convergence
$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)}
= 1 - \frac{\frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T H \mathbf{d}_k}}{\frac{1}{2}\mathbf{x}_k^T H \mathbf{x}_k + \mathbf{C}^T\mathbf{x}_k + \frac{1}{2}\mathbf{C}^T H^{-1}\mathbf{C}}
= 1 - \frac{\frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T H \mathbf{d}_k}}{\frac{1}{2}(H\mathbf{x}_k + \mathbf{C})^T H^{-1}(H\mathbf{x}_k + \mathbf{C})}$$
$$= 1 - \frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{(\mathbf{d}_k^T H \mathbf{d}_k)(\mathbf{d}_k^T H^{-1}\mathbf{d}_k)} = 1 - \frac{1}{\beta}$$
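A quick numerical check of this identity (assuming NumPy; H, C, and the iterate $\mathbf{x}_k$ are arbitrary illustrative values): one exact-line-search step is taken and the observed gap ratio is compared with $1 - 1/\beta$:

import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
C = np.array([-1.0, -2.0])
f = lambda x: 0.5 * x @ H @ x + C @ x

x_star = -np.linalg.solve(H, C)            # optimal solution x* = -H^{-1} C
x_k = np.array([2.0, -1.0])                # arbitrary current iterate
d = -(H @ x_k + C)
x_next = x_k + (d @ d) / (d @ H @ d) * d   # one exact-line-search step

ratio = (f(x_next) - f(x_star)) / (f(x_k) - f(x_star))
beta = (d @ H @ d) * (d @ np.linalg.solve(H, d)) / (d @ d) ** 2
print(ratio, 1 - 1 / beta)                 # the two numbers agree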
Rate of Convergence
Kantorovich inequality:
Let $H \in \mathbb{R}^{N\times N}$ be a symmetric positive definite matrix. Let $\lambda_1$ and $\lambda_n$ be respectively the largest and smallest eigenvalues of $H$. Then, for any vector $\mathbf{y} \neq 0$,
$$\frac{(\mathbf{y}^T\mathbf{y})^2}{(\mathbf{y}^T H \mathbf{y})(\mathbf{y}^T H^{-1}\mathbf{y})} \;\geq\; \frac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2}$$
Using the above inequality,
$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} = 1 - \frac{1}{\beta} \;\leq\; 1 - \frac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2} = \left(\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\right)^2$$
$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} \;\leq\; \left(\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\right)^2 = \delta \;\Rightarrow\; \text{linear convergence}$$
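As a quick worked example (the eigenvalues are arbitrary illustrative values), taking $\lambda_1 = 9$ and $\lambda_n = 1$ gives
$$\delta = \left(\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\right)^2 = \left(\frac{9 - 1}{9 + 1}\right)^2 = 0.64,$$
so in the worst case each steepest-descent step reduces the optimality gap $f(\mathbf{x}_k) - f(\mathbf{x}^*)$ to at most 64% of its previous value.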
Rate of Convergence
• The ratio $\kappa(H) = \lambda_1/\lambda_n$ is called the condition number of $H$. The larger the condition number, the closer the ratio $\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}$ is to 1, and therefore the slower the convergence.
• If $\lambda_1 = \lambda_n$ then only one iteration is needed; in this case $H$ is a multiple of the identity matrix, and the contours are concentric circles, which means that the steepest descent direction points straight at the solution.
• As the condition number $\kappa(H) = \lambda_1/\lambda_n$ increases, the contours become more elongated, which increases the amount of zig-zagging (see the experiment sketched below).
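A short experiment (a sketch assuming NumPy; the diagonal matrices, starting point, and tolerance are illustrative choices) that counts steepest-descent iterations for a well-conditioned and an ill-conditioned H:

import numpy as np

def steepest_descent_iterations(H, C, x0, tol=1e-8, max_iter=100000):
    """Run steepest descent with the exact step length; return the iteration count."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = H @ x + C
        if np.linalg.norm(g) <= tol:
            return k
        d = -g
        x = x + (d @ d) / (d @ H @ d) * d
    return max_iter

C = np.array([1.0, 1.0])
x0 = np.array([5.0, 5.0])

H_well = np.diag([1.0, 2.0])     # condition number kappa = 2
H_ill  = np.diag([1.0, 50.0])    # condition number kappa = 50

print("kappa =  2:", steepest_descent_iterations(H_well, C, x0), "iterations")
print("kappa = 50:", steepest_descent_iterations(H_ill, C, x0), "iterations")

With these illustrative values the ill-conditioned problem needs far more iterations, in line with the bound $\delta = ((\kappa - 1)/(\kappa + 1))^2$ above.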
Newton’s method
Newton’s method presented in One Dimensional Minimisation Methods can be extended for the minimization of multivariable functions.
Consider $\min_{\mathbf{X}} f(\mathbf{X})$.
The quadratic approximation of the function $f(\mathbf{x})$ at $\mathbf{x} = \mathbf{x}_i$ using the Taylor’s series expansion is
$$f(\mathbf{x}) \approx f_q(\mathbf{x}) = f(\mathbf{x}_i) + \nabla f_i^T(\mathbf{x} - \mathbf{x}_i) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_i)^T [H_i] (\mathbf{x} - \mathbf{x}_i)$$
where $[H_i] = [H]\big|_{\mathbf{x}_i}$ is the matrix of second partial derivatives (Hessian matrix) of $f$ evaluated at the point $\mathbf{x}_i$. By setting the partial derivative of the above equation equal to zero for the minimum of $f(\mathbf{x})$, we obtain:
Newton’s method
The equations
$$f(\mathbf{x}) \approx f_q(\mathbf{x}) = f(\mathbf{x}_i) + \nabla f_i^T(\mathbf{x} - \mathbf{x}_i) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_i)^T [H_i] (\mathbf{x} - \mathbf{x}_i)$$
and
$$\frac{\partial f(\mathbf{x})}{\partial x_j} = 0, \quad \text{for } j = 1, 2, \dots, n$$
give:
$$\nabla f_q = \nabla f_i + [H_i](\mathbf{x} - \mathbf{x}_i) = \mathbf{g}_i + [H_i](\mathbf{x} - \mathbf{x}_i) = 0$$
If $[H_i]$ is nonsingular, the above equation can be solved to obtain an improved approximation ($\mathbf{x} = \mathbf{x}_{i+1}$) as
$$\mathbf{x}_{i+1} = \mathbf{x}_i - [H_i]^{-1}\mathbf{g}_i$$
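A minimal sketch of this update (assuming NumPy; the quadratic objective, H, c, and the starting point are illustrative choices). In practice the linear system $[H_i]\,\mathbf{d} = -\mathbf{g}_i$ is solved directly rather than forming the inverse:

import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T H x - c^T x, whose gradient is g(x) = H x - c
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
c = np.array([1.0, 1.0])

x_i = np.array([10.0, -4.0])              # arbitrary starting point
g_i = H @ x_i - c                          # gradient at x_i
step = np.linalg.solve(H, -g_i)            # solve [H_i] step = -g_i (avoids the explicit inverse)
x_next = x_i + step                        # x_{i+1} = x_i - [H_i]^{-1} g_i

print("x_{i+1} =", x_next)
print("x* = H^{-1} c =", np.linalg.solve(H, c))   # identical: one step suffices for a quadratic

Because the objective here is quadratic, this single step lands exactly on $\mathbf{x}^* = H^{-1}\mathbf{c}$, which is the one-step behaviour noted on a later slide.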
Newton’s method
Since the function is approximated as a quadratic function at every iteration, and higher order terms have been neglected in the equation
$$f(\mathbf{x}) \approx f_q(\mathbf{x}) = f(\mathbf{x}_i) + \nabla f_i^T(\mathbf{x} - \mathbf{x}_i) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_i)^T [H_i] (\mathbf{x} - \mathbf{x}_i),$$
the equation $\mathbf{x}_{i+1} = \mathbf{x}_i - [H_i]^{-1}\mathbf{g}_i$ is to be used iteratively to find the optimum solution $\mathbf{x}^*$.
The sequence of points $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{i+1}$ can be shown to converge to the actual solution $\mathbf{x}^*$ from any initial point $\mathbf{x}_1$ sufficiently close to the solution $\mathbf{x}^*$, provided that $[H_i]$ is nonsingular.
Newton’s Method
• The update equation $\mathbf{x}_{i+1} = \mathbf{x}_i - [H_i]^{-1}\mathbf{g}_i$ is of the form $\mathbf{x}_{i+1} = \mathbf{x}_i + \lambda_i\mathbf{d}_i$
• Classical Newton Method:
  • Newton direction: $\mathbf{d}_i^N = -[H_i]^{-1}\mathbf{g}_i$
  • Step length: $\lambda_i = 1$
• Is $\mathbf{d}_i^N$ a descent direction?
  $\mathbf{g}_i^T\mathbf{d}_i^N = -\mathbf{g}_i^T[H_i]^{-1}\mathbf{g}_i < 0$ if $H_i$ is positive definite
• Consider minimizing $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T H \mathbf{x} - \mathbf{c}^T\mathbf{x}$
  Analytically, $\nabla f = \mathbf{g} = H\mathbf{x} - \mathbf{c} = 0 \;\Rightarrow\; \mathbf{x}^* = H^{-1}\mathbf{c}$
  Using the classical Newton’s method, starting from $\mathbf{x}_0$:
  $\mathbf{x}_1 = \mathbf{x}_0 - H^{-1}(H\mathbf{x}_0 - \mathbf{c}) = H^{-1}\mathbf{c} = \mathbf{x}^* \;\Rightarrow\;$ optimum in one step
Classical Newton Method Algorithm
1) Initialize $\mathbf{x}_0$ and $\varepsilon$, set $k = 0$
2) while $\|\mathbf{g}_k\| > \varepsilon$
   a) $\mathbf{d}_k = -(H_k)^{-1}\mathbf{g}_k$
   b) $\lambda_k = 1$
   c) $\mathbf{x}_{k+1} = \mathbf{x}_k + \lambda_k\mathbf{d}_k$
   d) $k = k + 1$
   end while
output: $\mathbf{x} = \mathbf{x}^*$, a stationary point of $f(\mathbf{x})$
** $\lambda_k$ is normally evaluated by either an exact or an inexact method (Armijo-Goldstein condition)
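A runnable sketch of the loop above (assuming NumPy; the test function, its gradient and Hessian, and the use of Armijo backtracking as the inexact step-length rule are illustrative choices, not part of the original algorithm statement):

import numpy as np

def newton(f, grad, hess, x0, eps=1e-8, max_iter=100):
    """Newton's method; the step length is chosen by Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:                      # stopping test ||g_k|| <= eps
            break
        d = np.linalg.solve(hess(x), -g)                  # d_k = -(H_k)^{-1} g_k
        lam = 1.0                                          # start from the classical step length
        while f(x + lam * d) > f(x) + 1e-4 * lam * (g @ d):
            lam *= 0.5                                     # Armijo sufficient-decrease backtracking
        x = x + lam * d
    return x

# Illustrative test function with an everywhere positive definite Hessian:
# f(x1, x2) = x1^4 + x1^2 + (x2 - 2)^2, minimum at (0, 2)
f = lambda x: x[0] ** 4 + x[0] ** 2 + (x[1] - 2.0) ** 2
grad = lambda x: np.array([4 * x[0] ** 3 + 2 * x[0], 2 * (x[1] - 2.0)])
hess = lambda x: np.array([[12 * x[0] ** 2 + 2, 0.0],
                           [0.0, 2.0]])

print(newton(f, grad, hess, [2.0, 0.0]))   # converges to approximately (0, 2)

The backtracking loop relies on $\mathbf{d}_k$ being a descent direction, i.e. on $H_k$ being positive definite, as discussed on the previous slide.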
Newton’s Method: order of convergence
Consider the one-dimensional problem $\min f(x)$.
Let $x^*$ be the stationary point such that $g(x^*) = 0$ and $g'(x^*) > 0$.
Using Newton’s method, at the $k$th iteration
$$x_{k+1} = x_k - [H_k]^{-1} g_k = x_k - \frac{g_k}{g'_k}$$
$$\therefore\; x_{k+1} - x^* = x_k - x^* - \frac{g_k - g(x^*)}{g'_k} = -\frac{g_k - g(x^*) + g'_k\,(x^* - x_k)}{g'_k}$$
If we approximate $g(x^*)$ around $x_k$ using a Taylor series,
$$g(x^*) = g(x_k) + g'(x_k)(x^* - x_k) + \frac{1}{2} g''(x_k)(x^* - x_k)^2$$
Newton’s Method: order of convergence
So, $g(x^*) = g_k + g'_k\,(x^* - x_k) + \frac{1}{2} g''_k\,(x^* - x_k)^2$
Or, $g_k - g(x^*) + g'_k\,(x^* - x_k) = -\frac{1}{2} g''_k\,(x^* - x_k)^2$
Therefore,
$$x_{k+1} - x^* = \frac{1}{2}\frac{g''_k}{g'_k}(x_k - x^*)^2$$
So the order of convergence is two.
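A small one-dimensional check of this result (the function $f(x) = x - \ln x$, with $g(x) = 1 - 1/x$ and $g'(x) = 1/x^2$, is an illustrative choice; its minimizer is $x^* = 1$). The printed error roughly squares at each iteration, i.e. the number of correct digits roughly doubles:

# Minimize f(x) = x - ln(x):  g(x) = 1 - 1/x,  g'(x) = 1/x^2,  minimizer x* = 1
x, x_star = 0.5, 1.0
for k in range(6):
    print(f"k = {k}:  |x_k - x*| = {abs(x - x_star):.3e}")
    g = 1.0 - 1.0 / x
    g_prime = 1.0 / x ** 2
    x = x - g / g_prime          # Newton update x_{k+1} = x_k - g_k / g'_k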

Limitations of the classical Newton method:
• Requires $O(n^3)$ computational effort for every iteration
• No guarantee that $\mathbf{d}_k$ is a descent direction (see the small example below)
• No guarantee that $f(\mathbf{x}_{k+1}) < f(\mathbf{x}_k)$ (no line search)
• Sensitive to the initial point (for non-quadratic functions)
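As a small illustration of the last three points (the function $f(x) = \cos x$ and the starting point are illustrative choices), the Newton step below is an ascent direction, increases $f$, and converges to a maximizer rather than a minimizer:

import math

# Illustrative failure case: minimize f(x) = cos(x).  Near x = 0 we have f''(x) < 0,
# so the Newton direction -g/f'' points uphill, and the iterates converge to the
# maximizer x = 0 instead of the minimizer x = pi.
x = 0.1
for k in range(5):
    g = -math.sin(x)        # f'(x)
    h = -math.cos(x)        # f''(x), negative near x = 0
    x = x - g / h           # classical Newton update
print(x, math.cos(x))       # x is approximately 0 and f has increased to about 1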
Steepest Descent vs Newton’s method
