Optimization
Nonlinear programming:
Multi-dimensional minimization methods
Rate of Convergence
• Consider the quadratic objective function
$$f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{H}\mathbf{x} + \mathbf{C}^T\mathbf{x}$$
• Optimal solution: $\mathbf{x}^* = -\mathbf{H}^{-1}\mathbf{C}$
• So, $f(\mathbf{x}^*) = -\frac{1}{2}\mathbf{C}^T\mathbf{H}^{-1}\mathbf{C}$
• Let $\mathbf{x}_k$ be the $k$th point of the steepest descent algorithm, i.e.,
$$f(\mathbf{x}_k) = \frac{1}{2}\mathbf{x}_k^T\mathbf{H}\mathbf{x}_k + \mathbf{C}^T\mathbf{x}_k$$
• The current direction is $\mathbf{d}_k = -\mathbf{g}(\mathbf{x}_k) = -\mathbf{H}\mathbf{x}_k - \mathbf{C}$
• Let us compute the next iterate:
$$\phi(\alpha) = f(\mathbf{x}_k + \alpha\mathbf{d}_k) = \frac{1}{2}(\mathbf{x}_k + \alpha\mathbf{d}_k)^T\mathbf{H}(\mathbf{x}_k + \alpha\mathbf{d}_k) + \mathbf{C}^T(\mathbf{x}_k + \alpha\mathbf{d}_k)$$
Step Length
$$\begin{aligned}
\phi(\alpha) = f(\mathbf{x}_k + \alpha\mathbf{d}_k)
&= \frac{1}{2}(\mathbf{x}_k + \alpha\mathbf{d}_k)^T\mathbf{H}(\mathbf{x}_k + \alpha\mathbf{d}_k) + \mathbf{C}^T(\mathbf{x}_k + \alpha\mathbf{d}_k)\\
&= \frac{1}{2}\mathbf{x}_k^T\mathbf{H}\mathbf{x}_k + \alpha\,\mathbf{d}_k^T\mathbf{H}\mathbf{x}_k + \frac{1}{2}\alpha^2\,\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k + \mathbf{C}^T\mathbf{x}_k + \alpha\,\mathbf{C}^T\mathbf{d}_k\\
&= f(\mathbf{x}_k) + \alpha\,\mathbf{d}_k^T\mathbf{H}\mathbf{x}_k + \alpha\,\mathbf{C}^T\mathbf{d}_k + \frac{1}{2}\alpha^2\,\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k\\
&= f(\mathbf{x}_k) - \alpha\,\mathbf{d}_k^T\mathbf{d}_k + \frac{1}{2}\alpha^2\,\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k
\end{aligned}$$
since $\mathbf{d}_k^T\mathbf{H}\mathbf{x}_k + \mathbf{C}^T\mathbf{d}_k = \mathbf{d}_k^T(\mathbf{H}\mathbf{x}_k + \mathbf{C}) = -\mathbf{d}_k^T\mathbf{d}_k$. Setting
$$\frac{\partial\phi}{\partial\alpha} = -\mathbf{d}_k^T\mathbf{d}_k + \alpha\,\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k = 0$$
So,
$$\alpha = \frac{\mathbf{d}_k^T\mathbf{d}_k}{\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k}$$
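As a minimal runnable sketch of the method (in Python, with an illustrative $\mathbf{H}$ and $\mathbf{C}$ of our choosing, not from the slides), each iteration moves along the negative gradient with the exact step length just derived:

```python
import numpy as np

# Steepest descent with exact line search for f(x) = 0.5 x^T H x + C^T x.
def steepest_descent_quadratic(H, C, x0, tol=1e-8, max_iter=10_000):
    x = x0.astype(float)
    for k in range(max_iter):
        d = -(H @ x + C)                  # d_k = -g(x_k) = -(H x_k + C)
        if np.linalg.norm(d) < tol:       # stop when the gradient is small
            break
        alpha = (d @ d) / (d @ H @ d)     # exact step: d_k^T d_k / d_k^T H d_k
        x = x + alpha * d
    return x, k

H = np.array([[4.0, 1.0], [1.0, 3.0]])    # symmetric positive definite
C = np.array([-1.0, -2.0])
x_min, iters = steepest_descent_quadratic(H, C, np.array([5.0, 5.0]))
print(iters, x_min)
print(-np.linalg.solve(H, C))             # analytic optimum x* = -H^{-1} C
```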
Rate of Convergence
Substituting the optimal step length $\alpha = \frac{\mathbf{d}_k^T\mathbf{d}_k}{\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k}$:
$$f(\mathbf{x}_k + \alpha\mathbf{d}_k) = f(\mathbf{x}_k) - \alpha\,\mathbf{d}_k^T\mathbf{d}_k + \frac{1}{2}\alpha^2\,\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k = f(\mathbf{x}_k) - \frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k}$$
Now,
$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} = \frac{f(\mathbf{x}_k) - \frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k} - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)}
= 1 - \frac{\frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k}}{\frac{1}{2}\mathbf{x}_k^T\mathbf{H}\mathbf{x}_k + \mathbf{C}^T\mathbf{x}_k + \frac{1}{2}\mathbf{C}^T\mathbf{H}^{-1}\mathbf{C}}$$
Rate of Convergence
$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)}
= 1 - \frac{\frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k}}{\frac{1}{2}\mathbf{x}_k^T\mathbf{H}\mathbf{x}_k + \mathbf{C}^T\mathbf{x}_k + \frac{1}{2}\mathbf{C}^T\mathbf{H}^{-1}\mathbf{C}}
= 1 - \frac{\frac{1}{2}\frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k}}{\frac{1}{2}(\mathbf{H}\mathbf{x}_k + \mathbf{C})^T\mathbf{H}^{-1}(\mathbf{H}\mathbf{x}_k + \mathbf{C})}$$
Since $\mathbf{d}_k = -(\mathbf{H}\mathbf{x}_k + \mathbf{C})$,
$$= 1 - \frac{(\mathbf{d}_k^T\mathbf{d}_k)^2}{(\mathbf{d}_k^T\mathbf{H}\mathbf{d}_k)(\mathbf{d}_k^T\mathbf{H}^{-1}\mathbf{d}_k)} = 1 - \frac{1}{\beta}$$
Rate of Convergence
Kantorovich inequality
Let $\mathbf{H} \in \mathbb{R}^{N \times N}$ be a symmetric positive definite matrix. Let $\lambda_1$ and $\lambda_n$ be respectively the largest and smallest eigenvalues of $\mathbf{H}$. Then, for any vector $\mathbf{y} \neq 0$,
$$\frac{(\mathbf{y}^T\mathbf{y})^2}{(\mathbf{y}^T\mathbf{H}\mathbf{y})(\mathbf{y}^T\mathbf{H}^{-1}\mathbf{y})} \geq \frac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2}$$
Using the above inequality,
$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} = 1 - \frac{1}{\beta} \leq 1 - \frac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2} = \left(\frac{\lambda_n/\lambda_1 - 1}{\lambda_n/\lambda_1 + 1}\right)^2$$
$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} \leq \left(\frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}\right)^2 = \delta \Rightarrow \textbf{linear convergence}$$
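A quick numerical sanity check of this bound (a sketch; the SPD matrix below is randomly generated, not from the slides): each exact-line-search step should reduce the error $f(\mathbf{x}_k) - f(\mathbf{x}^*)$ by at least the factor $\delta$.

```python
import numpy as np

# Verify per-iteration ratio <= delta = ((l1 - ln)/(l1 + ln))^2
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A @ A.T + 4 * np.eye(4)          # symmetric positive definite
C = rng.standard_normal(4)

eig = np.linalg.eigvalsh(H)          # ascending eigenvalues
l1, ln = eig[-1], eig[0]             # largest and smallest
delta = ((l1 - ln) / (l1 + ln)) ** 2

f = lambda x: 0.5 * x @ H @ x + C @ x
f_star = f(-np.linalg.solve(H, C))   # f(x*) at x* = -H^{-1} C

x = rng.standard_normal(4)
for k in range(5):
    d = -(H @ x + C)
    x_new = x + ((d @ d) / (d @ H @ d)) * d
    ratio = (f(x_new) - f_star) / (f(x) - f_star)
    print(f"step {k}: ratio = {ratio:.4f}  (bound delta = {delta:.4f})")
    x = x_new
```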
Rate of Convergence
• The ratio $\kappa(\mathbf{H}) = \frac{\lambda_1}{\lambda_n}$ is called the condition number of $\mathbf{H}$. The larger the condition number, the closer the ratio $\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}$ is to 1, and therefore the slower the convergence.
• If $\lambda_1 = \lambda_n$, then only one iteration is needed. In this case $\mathbf{H}$ is a multiple of the identity matrix, and the contours are concentric circles, which means that the steepest descent direction points straight at the solution.
• As the condition number $\kappa(\mathbf{H}) = \frac{\lambda_1}{\lambda_n}$ increases, the contours become more elongated, which increases the amount of zig-zagging. (A small experiment illustrating this appears below.)
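A small experiment (our construction, with diagonal test matrices) showing how the iteration count of steepest descent grows with $\kappa(\mathbf{H})$:

```python
import numpy as np

# Count steepest descent iterations on f(x) = 0.5 x^T H x (so C = 0, x* = 0).
def sd_iterations(H, x0, tol=1e-8, max_iter=100_000):
    x, k = x0.astype(float), 0
    while np.linalg.norm(H @ x) > tol and k < max_iter:
        d = -H @ x                           # steepest descent direction
        x += ((d @ d) / (d @ H @ d)) * d     # exact step length
        k += 1
    return k

x0 = np.array([1.0, 1.0])
for kappa in (1.0, 10.0, 100.0, 1000.0):
    H = np.diag([kappa, 1.0])                # eigenvalues kappa and 1
    print(f"kappa = {kappa:6.0f}: {sd_iterations(H, x0)} iterations")
```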
Newton’s method
Newton’s method presented in One Dimensional Minimisation Methods
can be extended for the minimization of multivariable functions.
Consider $\min_{\mathbf{x}} f(\mathbf{x})$.
The quadratic approximation of the function $f(\mathbf{x})$ at $\mathbf{x} = \mathbf{x}_i$ using the Taylor series expansion is
$$f(\mathbf{x}) \approx f_q(\mathbf{x}) = f(\mathbf{x}_i) + \nabla f_i^T(\mathbf{x} - \mathbf{x}_i) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_i)^T[\mathbf{H}_i](\mathbf{x} - \mathbf{x}_i)$$
where $[\mathbf{H}_i] = [\mathbf{H}]|_{\mathbf{x}_i}$ is the matrix of second partial derivatives (Hessian matrix) of $f$ evaluated at the point $\mathbf{x}_i$. By setting the partial derivatives of the above equation equal to zero for the minimum of $f(\mathbf{x})$, we obtain
Newton’s method
The equations
$$f(\mathbf{x}) \approx f_q(\mathbf{x}) = f(\mathbf{x}_i) + \nabla f_i^T(\mathbf{x} - \mathbf{x}_i) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_i)^T[\mathbf{H}_i](\mathbf{x} - \mathbf{x}_i)$$
and
$$\frac{\partial f(\mathbf{x})}{\partial x_j} = 0, \quad \text{for } j = 1, 2, \ldots, n$$
give:
$$\nabla f_q = \nabla f_i + [\mathbf{H}_i](\mathbf{x} - \mathbf{x}_i) = \mathbf{g}_i + [\mathbf{H}_i](\mathbf{x} - \mathbf{x}_i) = 0$$
If $[\mathbf{H}_i]$ is nonsingular, the above equation can be solved to obtain an improved approximation $(\mathbf{x} = \mathbf{x}_{i+1})$ as
$$\mathbf{x}_{i+1} = \mathbf{x}_i - [\mathbf{H}_i]^{-1}\mathbf{g}_i$$
Newton’s method
Since the function is approximated by a quadratic at every iteration, and higher-order terms have been neglected in the equation
$$f(\mathbf{x}) \approx f_q(\mathbf{x}) = f(\mathbf{x}_i) + \nabla f_i^T(\mathbf{x} - \mathbf{x}_i) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_i)^T[\mathbf{H}_i](\mathbf{x} - \mathbf{x}_i)$$
the equation $\mathbf{x}_{i+1} = \mathbf{x}_i - [\mathbf{H}_i]^{-1}\mathbf{g}_i$ is to be used iteratively to find the optimum solution $\mathbf{x}^*$.
The sequence of points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{i+1}$ can be shown to converge to the actual solution $\mathbf{x}^*$ from any initial point $\mathbf{x}_1$ sufficiently close to $\mathbf{x}^*$, provided that $[\mathbf{H}_i]$ is nonsingular.
Newton’s Method
• The update equation $\mathbf{x}_{i+1} = \mathbf{x}_i - [\mathbf{H}_i]^{-1}\mathbf{g}_i$ is of the form $\mathbf{x}_{i+1} = \mathbf{x}_i + \lambda_i\mathbf{d}_i$
• Classical Newton Method:
• Newton direction: $\mathbf{d}_i^N = -[\mathbf{H}_i]^{-1}\mathbf{g}_i$
• Step length: $\lambda_i = 1$
• Is $\mathbf{d}_i^N$ a descent direction?
$$\mathbf{g}_i^T\mathbf{d}_i^N = -\mathbf{g}_i^T[\mathbf{H}_i]^{-1}\mathbf{g}_i < 0 \quad \text{if } \mathbf{H}_i \text{ is positive definite}$$
• Consider minimizing $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{H}\mathbf{x} - \mathbf{c}^T\mathbf{x}$
Analytically, $\nabla f = \mathbf{g} = \mathbf{H}\mathbf{x} - \mathbf{c} = 0 \Rightarrow \mathbf{x}^* = \mathbf{H}^{-1}\mathbf{c}$
Using the classical Newton's method, starting from $\mathbf{x}_0$:
$$\mathbf{x}_1 = \mathbf{x}_0 - \mathbf{H}^{-1}(\mathbf{H}\mathbf{x}_0 - \mathbf{c}) = \mathbf{H}^{-1}\mathbf{c} = \mathbf{x}^* \Rightarrow \text{optimum in one step}$$
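This one-step property is easy to verify numerically (a sketch; the values of $\mathbf{H}$, $\mathbf{c}$, and $\mathbf{x}_0$ below are illustrative):

```python
import numpy as np

# One classical Newton step on f(x) = 0.5 x^T H x - c^T x lands exactly
# on x* = H^{-1} c, regardless of the starting point.
H = np.array([[3.0, 1.0], [1.0, 2.0]])    # symmetric positive definite
c = np.array([1.0, 1.0])
x0 = np.array([10.0, -7.0])               # arbitrary starting point

g0 = H @ x0 - c                           # gradient at x0
x1 = x0 - np.linalg.solve(H, g0)          # Newton step with lambda = 1
print(x1, np.linalg.solve(H, c))          # both print the same point x*
```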
Classical Newton Method Algorithm
1) Initialize $\mathbf{x}_0$ and $\varepsilon$; set $k = 0$
2) while $\|\mathbf{g}_k\| > \varepsilon$
a) $\mathbf{d}_k = -[\mathbf{H}_k]^{-1}\mathbf{g}_k$
b) $\lambda_k = 1$
c) $\mathbf{x}_{k+1} = \mathbf{x}_k + \lambda_k\mathbf{d}_k$
d) $k = k + 1$
end while
output: $\mathbf{x} = \mathbf{x}^*$, a stationary point of $f(\mathbf{x})$
** $\lambda_k$ is normally evaluated by either an exact or an inexact method (Armijo-Goldstein condition)
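A runnable Python sketch of the loop above, applied to the Rosenbrock function (our choice of test problem, not the slides'); for this non-quadratic function the full-step iterates are not monotone in $f$, but they still reach the minimizer:

```python
import numpy as np

# Classical Newton loop on the Rosenbrock function f = (1-x)^2 + 100(y-x^2)^2.
def grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])

def hess(x):
    return np.array([
        [2 - 400 * (x[1] - 3 * x[0] ** 2), -400 * x[0]],
        [-400 * x[0], 200.0],
    ])

x = np.array([-1.2, 1.0])                    # standard starting point
k, eps = 0, 1e-10
while np.linalg.norm(grad(x)) > eps and k < 100:   # 2) while ||g_k|| > eps
    d = -np.linalg.solve(hess(x), grad(x))   # a) Newton direction
    x = x + 1.0 * d                          # b), c) full step, lambda_k = 1
    k += 1
print(k, x)                                  # converges to x* = (1, 1)
```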
Newton’s Method: order of convergence
Consider the problem $\min f(x)$.
Let $x^*$ be the stationary point, so that $g(x^*) = 0$ and $g'(x^*) > 0$.
Using Newton's method, at the $k$th iteration
$$x_{k+1} = x_k - [H_k]^{-1}g_k = x_k - \frac{g_k}{g'_k}$$
$$\therefore\; x_{k+1} - x^* = x_k - x^* - \frac{g_k - g(x^*)}{g'_k} = -\frac{g_k - g(x^*) + g'_k(x^* - x_k)}{g'_k}$$
If we approximate $g(x^*)$ around $x_k$ using a Taylor series,
$$g(x^*) = g(x_k) + g'(x_k)(x^* - x_k) + \frac{1}{2}g''(x_k)(x^* - x_k)^2$$
Newton’s Method: order of convergence
So,
$$g(x^*) = g_k + g'_k(x^* - x_k) + \frac{1}{2}g''_k(x^* - x_k)^2$$
or,
$$g_k - g(x^*) + g'_k(x^* - x_k) = -\frac{1}{2}g''_k(x^* - x_k)^2$$
Therefore,
$$x_{k+1} - x^* = \frac{1}{2}\frac{g''_k}{g'_k}(x_k - x^*)^2$$
So the order of convergence is two.
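To see the second-order convergence numerically, a small 1D sketch (the choice $g(x) = x^3 - 2$ is ours, playing the role of the gradient of some $f$): the number of correct digits roughly doubles at each step.

```python
# Newton's method on g(x) = x^3 - 2, whose root x* = 2**(1/3) satisfies
# g(x*) = 0 and g'(x*) > 0, i.e. a stationary point with order-two convergence.
g = lambda x: x ** 3 - 2
gp = lambda x: 3 * x ** 2          # g'(x)

x_star = 2 ** (1 / 3)
x = 1.5
for k in range(5):
    x = x - g(x) / gp(x)           # Newton update
    print(f"k={k + 1}: error = {abs(x - x_star):.3e}")
# Errors: ~3.6e-2, 9.8e-4, 7.5e-7, ... -- each roughly the square of the last.
```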
• Requires $O(n^3)$ computational effort for every iteration
• No guarantee that $\mathbf{d}_k$ is a descent direction
• No guarantee that $f(\mathbf{x}_{k+1}) < f(\mathbf{x}_k)$ (no line search)
• Sensitive to the initial point (for non-quadratic functions)
Steepest Descent vs Newton’s method