
Optimization

Nonlinear programming:
Multi-dimensional minimization methods
Classification of unconstrained minimization methods

Direct search methods:
• Univariate method
• Box evolution method
• Simplex method
• Pattern search methods
• Powell's method
• Hooke-Jeeves method
• Rosenbrock's method

Descent methods:
• Steepest descent (Cauchy method)
• Fletcher-Reeves method
• Newton's method
• Marquardt method
• Quasi-Newton methods
  • Davidon-Fletcher-Powell method
  • Broyden-Fletcher-Goldfarb-Shanno method
Indirect search (descent method)
Gradient of a function
The gradient of a function is an n-component vector given by:

$$\nabla f = \mathbf{g} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\[4pt] \dfrac{\partial f}{\partial x_2} \\[4pt] \vdots \\[4pt] \dfrac{\partial f}{\partial x_n} \end{bmatrix}_{n \times 1}$$
Indirect search (descent method)

The gradient has a very important property: if we move along the gradient direction from any point in n-dimensional space, the function value increases at the fastest rate. Hence the gradient direction is called the direction of steepest ascent. Unfortunately, the direction of steepest ascent is a local property, not a global one.
Indirect search (descent method)
• The gradient vectors g evaluated at points 1, 2, 3, and 4 lie along the directions 11′, 22′, 33′, and 44′, respectively.
• Thus the function value increases at the fastest rate in the direction 11′ at point 1, but not at point 2. Similarly, the function value increases at the fastest rate in the direction 22′ at point 2, but not at point 3.
• In other words, the direction of steepest ascent generally varies from point to point, and if we make infinitely small moves along the direction of steepest ascent, the path will be a curved line like the curve 1-2-3-4 in the figure.
Indirect search (descent method)
• Since the gradient vector represents the direction of steepest ascent, the negative of the gradient vector denotes the direction of steepest descent.

• Thus, any method that makes use of the gradient vector can be expected to find the minimum point faster than one that does not make use of the gradient vector.

• All the descent methods make use of the gradient vector, either directly or indirectly, in finding the search directions.

Theorem 1: The gradient vector represents the direction of steepest ascent.

Theorem 2: The maximum rate of change of f at any point X is equal to the magnitude of the gradient vector at the same point.
Indirect search (descent method)
• In general, if $df/ds = \nabla f^T \mathbf{u} > 0$ along a vector dX (where u is the unit vector along dX), it is called a direction of ascent, and if df/ds < 0, it is called a direction of descent.

Evaluation of the gradient

The evaluation of the gradient requires the computation of the partial derivatives ∂f/∂xᵢ, i = 1, 2, ..., n. There are three situations where the evaluation of the gradient poses certain problems:

1. The function is differentiable at all points, but the calculation of the components of the gradient, ∂f/∂xᵢ, is either impractical or impossible.
2. The expressions for the partial derivatives ∂f/∂xᵢ can be derived, but they require large computational time for evaluation.
3. The gradient ∇f is not defined at all points.
Indirect search (descent method)
1st case: The function is differentiable at all points, but the calculation of the components of the gradient, ∂f/∂xᵢ, is either impractical or impossible.
• In the first case, we can use the forward finite-difference formula

$$\left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{X}_m} \simeq \frac{f(\mathbf{X}_m + \Delta x_i \mathbf{u}_i) - f(\mathbf{X}_m)}{\Delta x_i}, \quad i = 1, 2, \ldots, n$$

to approximate the partial derivative ∂f/∂xᵢ at Xₘ. If the function value at the base point Xₘ is known, this formula requires one additional function evaluation to find (∂f/∂xᵢ)|Xₘ. Thus, it requires n additional function evaluations to evaluate the approximate gradient ∇f|Xₘ. For better results, we can use the central finite-difference formula to find the approximate partial derivative (∂f/∂xᵢ)|Xₘ:

$$\left.\frac{\partial f}{\partial x_i}\right|_{\mathbf{X}_m} \simeq \frac{f(\mathbf{X}_m + \Delta x_i \mathbf{u}_i) - f(\mathbf{X}_m - \Delta x_i \mathbf{u}_i)}{2\,\Delta x_i}, \quad i = 1, 2, \ldots, n$$
Indirect search (descent method)
• In these two equations, ∆xᵢ is a small scalar quantity and uᵢ is a vector of order n whose ith component has a value of 1 and all other components have a value of zero.

• In practical computations, the value of ∆xᵢ has to be chosen with some care. If ∆xᵢ is too small, the difference between the values of the function evaluated at (Xₘ + ∆xᵢuᵢ) and (Xₘ − ∆xᵢuᵢ) may be very small and numerical round-off errors may dominate. On the other hand, if ∆xᵢ is too large, the truncation error may predominate in the calculation of the gradient.

• If the expressions for the partial derivatives can be derived but they require large computational time for evaluation (Case 2), the finite-difference formulas are to be preferred whenever the exact gradient evaluation requires more computational time.
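To make the bookkeeping concrete, here is a minimal sketch of both formulas in Python, assuming NumPy is available; the names grad_forward and grad_central are illustrative, not from the notes:

```python
import numpy as np

def grad_forward(f, x, dx=1e-6):
    """Forward-difference gradient: n extra evaluations beyond f(x)."""
    n = len(x)
    g = np.zeros(n)
    fx = f(x)                          # function value at the base point X_m
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0    # unit vector u_i
        g[i] = (f(x + dx * e) - fx) / dx
    return g

def grad_central(f, x, dx=1e-6):
    """Central-difference gradient: 2n extra evaluations, better accuracy."""
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0
        g[i] = (f(x + dx * e) - f(x - dx * e)) / (2.0 * dx)
    return g

# Example: f(x1, x2) = x1 - x2 + 2*x1^2 + 2*x1*x2 + x2^2
f = lambda x: x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2
print(grad_central(f, np.array([0.0, 0.0])))   # ≈ [ 1. -1.]
```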
Indirect search (descent method)
• If the gradient is not defined at all points (Case 3), we cannot use the finite-difference formulas.
• For example, consider the function with a kink shown in the figure. If the central-difference formula

$$\left.\frac{\partial f}{\partial x}\right|_{X_m} \simeq \frac{f(X_m + \Delta x) - f(X_m - \Delta x)}{2\,\Delta x}$$

is used to evaluate the derivative df/dx at Xₘ, we obtain a value of α₁ for a step size ∆x₁ and a value of α₂ for a step size ∆x₂. Since, in reality, the derivative does not exist at the point Xₘ, the use of the finite-difference formulas might lead to a complete breakdown of the minimization process. In such cases, the minimization can be done only by one of the direct search techniques discussed earlier.
Steepest descent (Cauchy method)
• The use of the negative of the gradient vector as a direction for
minimization was first made by Cauchy in 1847.

• In this method, we start from an initial trial point X₁ and iteratively move along the steepest descent directions until the optimum point is found.

• The steepest descent method can be summarized by the following steps:

1. Start with an arbitrary initial point X₁. Set the iteration number as i = 1.
2. Find the search direction dᵢ as
$$\mathbf{d}_i = -\nabla f(\mathbf{X}_i) = -\mathbf{g}_i$$
3. Determine the optimal step length λᵢ* in the direction dᵢ and set
$$\mathbf{X}_{i+1} = \mathbf{X}_i + \lambda_i^* \mathbf{d}_i = \mathbf{X}_i - \lambda_i^* \mathbf{g}_i$$
Steepest descent (Cauchy method)
4. Test the new point, X_{i+1}, for optimality. If X_{i+1} is optimum, stop the process. Otherwise, go to step 5.

5. Set the new iteration number i = i + 1 and go to step 2.

The method of steepest descent may appear to be the best unconstrained minimization technique, since each one-dimensional search starts in the 'best direction'. However, owing to the fact that the steepest descent direction is a local property, the method is not really effective in most problems.
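A minimal sketch of these five steps in Python, assuming NumPy and SciPy are available; minimize_scalar stands in for the one-dimensional minimization of f(Xᵢ + λdᵢ), and all names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x1, eps=1e-6, max_iter=1000):
    """Cauchy's method: step along -g with an exact 1-D line search."""
    x = np.asarray(x1, dtype=float)
    for i in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:   # optimality test (step 4)
            break
        d = -g                         # steepest descent direction (step 2)
        # optimal step length lambda* along d (step 3)
        lam = minimize_scalar(lambda t: f(x + t * d)).x
        x = x + lam * d                # step 5: next iterate
    return x
```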
Example
Minimize $f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1 x_2 + x_2^2$ starting from the point $\mathbf{X}_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

Solution:
Iteration 1: The gradient of f is given by:

$$\nabla f = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} = \begin{bmatrix} 1 + 4x_1 + 2x_2 \\ -1 + 2x_1 + 2x_2 \end{bmatrix}, \qquad \mathbf{g}_1 = \nabla f(\mathbf{X}_1) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

Therefore,

$$\mathbf{d}_1 = -\mathbf{g}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
• To find X₂, we need to find the optimal step length λ₁*. For this, we minimize

$$f(\mathbf{X}_1 + \lambda_1 \mathbf{d}_1) = f(-\lambda_1, \lambda_1) = \lambda_1^2 - 2\lambda_1$$

with respect to λ₁. Since df/dλ₁ = 0 at λ₁* = 1, we obtain:

$$\mathbf{X}_2 = \mathbf{X}_1 + \lambda_1^* \mathbf{d}_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + 1\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

• As

$$\mathbf{g}_2 = \nabla f(\mathbf{X}_2) = \begin{bmatrix} -1 \\ -1 \end{bmatrix} \neq \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

X₂ is not optimum.

Iteration 2:

$$\mathbf{d}_2 = -\mathbf{g}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

To minimize

$$f(\mathbf{X}_2 + \lambda_2 \mathbf{d}_2) = f(-1 + \lambda_2,\; 1 + \lambda_2) = 5\lambda_2^2 - 2\lambda_2 - 1,$$

we set df/dλ₂ = 0. This gives λ₂* = 1/5, and hence

$$\mathbf{X}_3 = \mathbf{X}_2 + \lambda_2^* \mathbf{d}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \frac{1}{5}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -0.8 \\ 1.2 \end{bmatrix}$$

Since the components of the gradient at X₃,

$$\mathbf{g}_3 = \begin{bmatrix} 0.2 \\ -0.2 \end{bmatrix},$$

are not zero, we proceed to the next iteration.

Iteration 3:

$$\mathbf{d}_3 = -\mathbf{g}_3 = \begin{bmatrix} -0.2 \\ 0.2 \end{bmatrix}$$

As

$$f(\mathbf{X}_3 + \lambda_3 \mathbf{d}_3) = f(-0.8 - 0.2\lambda_3,\; 1.2 + 0.2\lambda_3) = 0.04\lambda_3^2 - 0.08\lambda_3 - 1.20,$$

we set df/dλ₃ = 0. This gives λ₃* = 1.0. Therefore,

$$\mathbf{X}_4 = \mathbf{X}_3 + \lambda_3^* \mathbf{d}_3 = \begin{bmatrix} -0.8 \\ 1.2 \end{bmatrix} + 1.0\begin{bmatrix} -0.2 \\ 0.2 \end{bmatrix} = \begin{bmatrix} -1.0 \\ 1.4 \end{bmatrix}$$

The gradient at X₄ is given by:

$$\mathbf{g}_4 = \begin{bmatrix} -0.2 \\ -0.2 \end{bmatrix} \neq \mathbf{0}$$

Since the components of the gradient at X₄ are not equal to zero, X₄ is not optimum and hence we have to proceed to the next iteration. This process has to be continued until the optimum point,

$$\mathbf{X}^* = \begin{bmatrix} -1.0 \\ 1.5 \end{bmatrix},$$

is found.
Convergence Criteria
The following criteria can be used to terminate the iterative process:

1. When the change in function value in two consecutive iterations is small:

$$\left|\frac{f(\mathbf{X}_{i+1}) - f(\mathbf{X}_i)}{f(\mathbf{X}_i)}\right| \leq \varepsilon_1$$

2. When the partial derivatives (components of the gradient) of f are small:

$$\left|\frac{\partial f}{\partial x_i}\right| \leq \varepsilon_2, \quad i = 1, 2, \ldots, n$$

3. When the change in the design vector in two consecutive iterations is small:

$$\|\mathbf{X}_{i+1} - \mathbf{X}_i\| \leq \varepsilon_3$$
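A sketch of how the three criteria might be combined in practice, assuming NumPy; the tolerance names eps1, eps2, eps3 correspond to ε₁, ε₂, ε₃ and are illustrative:

```python
import numpy as np

def converged(f_old, f_new, x_old, x_new, g_new,
              eps1=1e-8, eps2=1e-6, eps3=1e-8):
    """Return True if any of the three termination criteria is met."""
    c1 = abs(f_new - f_old) / max(abs(f_old), 1e-30) <= eps1  # relative change in f
    c2 = np.all(np.abs(g_new) <= eps2)                        # small gradient components
    c3 = np.linalg.norm(np.asarray(x_new) - np.asarray(x_old)) <= eps3  # small design change
    return c1 or c2 or c3
```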
Conjugate Gradient Methods
• The convergence characteristics of the steepest descent method can be improved greatly by modifying it into a conjugate gradient method, which can be considered as a conjugate-directions method involving the use of the gradient of the function.

• We saw that any minimization method that makes use of conjugate directions is quadratically convergent. This property of quadratic convergence is very useful because it ensures that the method will minimize a quadratic function in n steps or fewer.

• Since any general function can be approximated reasonably well by a quadratic near the optimum point, any quadratically convergent method is expected to find the optimum point in a finite number of iterations.
Conjugate Gradient Method
• We have seen that Powell's conjugate direction method requires n single-variable minimizations per iteration and sets up a new conjugate direction at the end of each iteration.

• Thus, it requires in general n² single-variable minimizations to find the minimum of a quadratic function.

• On the other hand, if we can evaluate the gradients of the objective function, we can set up a new conjugate direction after every one-dimensional minimization, and hence we can achieve faster convergence.
Development of Conjugate Gradient Methods
• Consider the development of an algorithm by modifying the steepest descent method applied to a quadratic function

$$f(\mathbf{X}) = \tfrac{1}{2}\mathbf{X}^T \mathbf{A} \mathbf{X} + \mathbf{B}^T \mathbf{X} + C$$

by imposing the condition that the successive directions be mutually conjugate.

• Let X₁ be the starting point for the minimization and let the first search direction be the steepest descent direction:

$$\mathbf{d}_1 = -\nabla f_1 = -(\mathbf{A}\mathbf{X}_1 + \mathbf{B})$$

$$\mathbf{X}_2 = \mathbf{X}_1 + \lambda_1^* \mathbf{d}_1 \quad \text{or} \quad \mathbf{d}_1 = \frac{\mathbf{X}_2 - \mathbf{X}_1}{\lambda_1^*}$$

where λ₁* is the minimizing step length in the direction d₁, so that

$$\mathbf{d}_1^T \left.\nabla f\right|_{\mathbf{X}_2} = \mathbf{d}_1^T \mathbf{g}_2 = 0$$
Development of the Fletcher-Reeves Method
The equation $\mathbf{d}_1^T \mathbf{g}_2 = 0$ can be expanded as

$$\mathbf{d}_1^T \left[\mathbf{A}(\mathbf{X}_1 + \lambda_1^* \mathbf{d}_1) + \mathbf{B}\right] = 0$$

from which the value of λ₁* can be found as:

$$\lambda_1^* = -\frac{\mathbf{d}_1^T(\mathbf{A}\mathbf{X}_1 + \mathbf{B})}{\mathbf{d}_1^T \mathbf{A} \mathbf{d}_1} = -\frac{\mathbf{d}_1^T \mathbf{g}_1}{\mathbf{d}_1^T \mathbf{A} \mathbf{d}_1}$$

Now express the second search direction as a linear combination of d₁ and −g₂:

$$\mathbf{d}_2 = -\mathbf{g}_2 + \beta_1 \mathbf{d}_1$$

where β₁ is to be chosen so as to make d₁ and d₂ conjugate. This requires that:

$$\mathbf{d}_1^T \mathbf{A} \mathbf{d}_2 = 0$$

Substituting $\mathbf{d}_2 = -\mathbf{g}_2 + \beta_1 \mathbf{d}_1$ into the above equation leads to:

$$\mathbf{d}_1^T \mathbf{A}(-\mathbf{g}_2 + \beta_1 \mathbf{d}_1) = 0 \quad \Rightarrow \quad \beta_1 = \frac{\mathbf{g}_2^T \mathbf{A} \mathbf{d}_1}{\mathbf{d}_1^T \mathbf{A} \mathbf{d}_1}$$
Development of the Conjugate Gradient Methods
The difference of the gradients (g₂ − g₁) can be expressed as:

$$\mathbf{g}_2 - \mathbf{g}_1 = (\mathbf{A}\mathbf{X}_2 + \mathbf{B}) - (\mathbf{A}\mathbf{X}_1 + \mathbf{B}) = \mathbf{A}(\mathbf{X}_2 - \mathbf{X}_1) = \lambda_1^* \mathbf{A} \mathbf{d}_1$$

• So,

$$\beta_1 = \frac{\mathbf{g}_2^T \mathbf{A} \mathbf{d}_1}{\mathbf{d}_1^T \mathbf{A} \mathbf{d}_1} = \frac{\mathbf{g}_2^T(\mathbf{g}_2 - \mathbf{g}_1)}{\mathbf{d}_1^T(\mathbf{g}_2 - \mathbf{g}_1)} \qquad (1)$$

• Also, we know $\mathbf{d}_1^T \mathbf{g}_2 = \mathbf{g}_2^T \mathbf{d}_1 = 0$ and $\mathbf{d}_1^T = -\mathbf{g}_1^T$, so

$$\beta_1 = \frac{\mathbf{g}_2^T(\mathbf{g}_2 - \mathbf{g}_1)}{\mathbf{d}_1^T(\mathbf{g}_2 - \mathbf{g}_1)} = \frac{\mathbf{g}_2^T(\mathbf{g}_2 - \mathbf{g}_1)}{\mathbf{g}_1^T \mathbf{g}_1} \qquad (2)$$

• Finally, $\mathbf{g}_1^T \mathbf{g}_2 = -\mathbf{d}_1^T \mathbf{g}_2 = 0$ gives

$$\beta_1 = \frac{\mathbf{g}_2^T(\mathbf{g}_2 - \mathbf{g}_1)}{\mathbf{g}_1^T \mathbf{g}_1} = \frac{\mathbf{g}_2^T \mathbf{g}_2}{\mathbf{g}_1^T \mathbf{g}_1} \qquad (3)$$
Development of the Conjugate Gradient Methods
The same methodology can be extended easily to obtain equations for the βₖ values. So equations (1), (2), and (3) can be written in general form as:

$$\beta_k = \frac{\mathbf{g}_{k+1}^T(\mathbf{g}_{k+1} - \mathbf{g}_k)}{\mathbf{d}_k^T(\mathbf{g}_{k+1} - \mathbf{g}_k)} \qquad (1)\ \text{Hestenes-Stiefel formula}$$

$$\beta_k = \frac{\mathbf{g}_{k+1}^T(\mathbf{g}_{k+1} - \mathbf{g}_k)}{\mathbf{g}_k^T \mathbf{g}_k} \qquad (2)\ \text{Polak-Ribière formula}$$

$$\beta_k = \frac{\mathbf{g}_{k+1}^T \mathbf{g}_{k+1}}{\mathbf{g}_k^T \mathbf{g}_k} \qquad (3)\ \text{Fletcher-Reeves formula}$$
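The three formulas differ only in how βₖ is assembled from gₖ, gₖ₊₁, and dₖ; a minimal side-by-side sketch, assuming NumPy 1-D arrays (names are illustrative):

```python
import numpy as np

def beta_hestenes_stiefel(g_new, g_old, d_old):
    """Eq. (1): Hestenes-Stiefel."""
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)

def beta_polak_ribiere(g_new, g_old):
    """Eq. (2): Polak-Ribiere."""
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

def beta_fletcher_reeves(g_new, g_old):
    """Eq. (3): Fletcher-Reeves."""
    return (g_new @ g_new) / (g_old @ g_old)
```

For a quadratic function with exact line searches, all three give identical values; on general nonlinear functions they can behave quite differently.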
Fletcher-Reeves Method
The iterative procedure of the Fletcher-Reeves method can be stated as follows (a sketch in code is given after the list):
1. Start with an arbitrary initial point X₁.
2. Set the first search direction $\mathbf{d}_1 = -\nabla f(\mathbf{X}_1) = -\mathbf{g}_1$.
3. Find the point X₂ according to the relation
$$\mathbf{X}_2 = \mathbf{X}_1 + \lambda_1^* \mathbf{d}_1$$
where λ₁* is the optimal step length in the direction d₁. Set i = 2 and go to the next step.
4. Find $\mathbf{g}_i = \nabla f(\mathbf{X}_i)$, and set
$$\mathbf{d}_i = -\mathbf{g}_i + \frac{\|\mathbf{g}_i\|^2}{\|\mathbf{g}_{i-1}\|^2}\,\mathbf{d}_{i-1}$$
5. Compute the optimum step length λᵢ* in the direction dᵢ, and find the new point
$$\mathbf{X}_{i+1} = \mathbf{X}_i + \lambda_i^* \mathbf{d}_i$$
6. Test for the optimality of the point X_{i+1}. If X_{i+1} is optimum, stop the process. Otherwise, set the value of i = i + 1 and go to step 4.
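A compact sketch of this procedure, assuming NumPy and SciPy; the names are illustrative. The periodic restart discussed in the remarks that follow is included through the restart parameter, with the Fletcher-Reeves recommendation m = n + 1 as the default:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(f, grad, x1, eps=1e-6, max_iter=1000, restart=None):
    x = np.asarray(x1, dtype=float)
    n = len(x)
    m = restart if restart is not None else n + 1   # restart interval, m = n + 1
    g = grad(x)
    d = -g                                          # step 2: first direction is steepest descent
    for i in range(max_iter):
        if np.linalg.norm(g) <= eps:                # step 6: optimality test
            break
        lam = minimize_scalar(lambda t: f(x + t * d)).x   # steps 3/5: optimal step length
        x = x + lam * d
        g_new = grad(x)
        if (i + 1) % m == 0:
            d = -g_new                              # periodic restart with steepest descent
        else:
            beta = (g_new @ g_new) / (g @ g)        # step 4: Fletcher-Reeves beta
            d = -g_new + beta * d
        g = g_new
    return x
```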
Fletcher-Reeves Method
Remarks:
1. Since the directions dᵢ used in this method are A-conjugate, the process should converge in n cycles or fewer for a quadratic function. However, for ill-conditioned quadratics (whose contours are highly eccentric and distorted), the method may require much more than n cycles for convergence. The reason for this has been found to be the cumulative effect of rounding errors. Since dᵢ is given by

$$\mathbf{d}_i = -\mathbf{g}_i + \frac{\|\mathbf{g}_i\|^2}{\|\mathbf{g}_{i-1}\|^2}\,\mathbf{d}_{i-1},$$

any error resulting from the inaccuracies involved in the determination of λᵢ*, and from the round-off error involved in accumulating the successive $\|\mathbf{g}_i\|^2 \mathbf{d}_{i-1} / \|\mathbf{g}_{i-1}\|^2$ terms, is carried forward through the vector dᵢ.
Fletcher-Reeves Method
Thus, the search directions dᵢ will be progressively contaminated by these errors. Hence it is necessary, in practice, to restart the method periodically after every, say, m steps by taking the new search direction as the steepest descent direction. That is, after every m steps, d_{m+1} is set equal to −∇f_{m+1} instead of the usual form. Fletcher and Reeves have recommended a value of m = n + 1, where n is the number of design variables.

2. Despite the limitations indicated above, the Fletcher-Reeves method is vastly superior to the steepest descent method and the pattern search methods, but it turns out to be rather less efficient than the Newton and quasi-Newton (variable metric) methods discussed in later sections.
Example
Minimize $f(x_1, x_2) = x_1 - x_2 + 2x_1^2 + 2x_1 x_2 + x_2^2$ starting from the point $\mathbf{X}_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

Solution:
Iteration 1

The search direction is taken as the steepest descent direction:

$$\mathbf{S}_1 = -\nabla f(\mathbf{X}_1) = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

• To find the optimal step length λ₁* along S₁, we minimize

$$f(\mathbf{X}_1 + \lambda_1 \mathbf{S}_1) = f(-\lambda_1, \lambda_1) = \lambda_1^2 - 2\lambda_1$$

with respect to λ₁. Here df/dλ₁ = 2λ₁ − 2 = 0 at λ₁* = 1.

• Therefore,

$$\mathbf{X}_2 = \mathbf{X}_1 + \lambda_1^* \mathbf{S}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

Iteration 2: Since $\nabla f_2 = \nabla f(\mathbf{X}_2) = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$, the equation

$$\mathbf{S}_i = -\nabla f_i + \frac{\|\nabla f_i\|^2}{\|\nabla f_{i-1}\|^2}\,\mathbf{S}_{i-1}$$

gives the next search direction as

$$\mathbf{S}_2 = -\nabla f_2 + \frac{\|\nabla f_2\|^2}{\|\nabla f_1\|^2}\,\mathbf{S}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{2}{2}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}$$

• To find λ₂*, we minimize

$$f(\mathbf{X}_2 + \lambda_2 \mathbf{S}_2) = f(-1,\; 1 + 2\lambda_2) = 4\lambda_2^2 - 2\lambda_2 - 1$$

with respect to λ₂. As df/dλ₂ = 8λ₂ − 2 = 0 at λ₂* = 1/4, we obtain:

$$\mathbf{X}_3 = \mathbf{X}_2 + \lambda_2^* \mathbf{S}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} + \frac{1}{4}\begin{bmatrix} 0 \\ 2 \end{bmatrix} = \begin{bmatrix} -1.0 \\ 1.5 \end{bmatrix}$$

Thus the optimum point is reached in two iterations. Even if we do not know this point to be optimum, we will not be able to move from this point in the next iteration. This can be verified as follows:

Iteration 3:
Now

$$\nabla f_3 = \nabla f(\mathbf{X}_3) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Thus,

$$\mathbf{S}_3 = -\nabla f_3 + \frac{\|\nabla f_3\|^2}{\|\nabla f_2\|^2}\,\mathbf{S}_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

This shows that there is no search direction to reduce f further, and hence X₃ is optimum.
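Running the fletcher_reeves sketch from the previous section on this function reproduces this two-iteration convergence (usage only; reuses the illustrative names defined there):

```python
import numpy as np  # fletcher_reeves() is the sketch defined above

f    = lambda x: x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2
grad = lambda x: np.array([1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]])

print(fletcher_reeves(f, grad, [0.0, 0.0]))   # ≈ [-1.0, 1.5] after two line searches
```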
Rate of Convergence
• Consider the quadratic objective function
$$f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T \mathbf{H} \mathbf{x} + \mathbf{C}^T \mathbf{x}$$
• Optimal solution: $\mathbf{x}^* = -\mathbf{H}^{-1}\mathbf{C}$
• So, $f(\mathbf{x}^*) = -\tfrac{1}{2}\mathbf{C}^T \mathbf{H}^{-1} \mathbf{C}$
• Let xₖ be the kth point of the steepest descent algorithm, i.e.,
$$f(\mathbf{x}_k) = \tfrac{1}{2}\mathbf{x}_k^T \mathbf{H} \mathbf{x}_k + \mathbf{C}^T \mathbf{x}_k$$
• The current direction is $\mathbf{d}_k = -\mathbf{g}(\mathbf{x}_k) = -\mathbf{H}\mathbf{x}_k - \mathbf{C}$.
• Let us compute the next iterate:
$$\phi(\alpha) = f(\mathbf{x}_k + \alpha \mathbf{d}_k) = \tfrac{1}{2}(\mathbf{x}_k + \alpha \mathbf{d}_k)^T \mathbf{H} (\mathbf{x}_k + \alpha \mathbf{d}_k) + \mathbf{C}^T(\mathbf{x}_k + \alpha \mathbf{d}_k)$$
Expanding,
$$\phi(\alpha) = \tfrac{1}{2}\mathbf{x}_k^T \mathbf{H} \mathbf{x}_k + \alpha\,\mathbf{d}_k^T \mathbf{H} \mathbf{x}_k + \tfrac{1}{2}\alpha^2 \mathbf{d}_k^T \mathbf{H} \mathbf{d}_k + \mathbf{C}^T \mathbf{x}_k + \alpha\,\mathbf{C}^T \mathbf{d}_k$$
$$= f(\mathbf{x}_k) + \alpha\,\mathbf{d}_k^T(\mathbf{H}\mathbf{x}_k + \mathbf{C}) + \tfrac{1}{2}\alpha^2 \mathbf{d}_k^T \mathbf{H} \mathbf{d}_k$$
$$= f(\mathbf{x}_k) - \alpha\,\mathbf{d}_k^T \mathbf{d}_k + \tfrac{1}{2}\alpha^2 \mathbf{d}_k^T \mathbf{H} \mathbf{d}_k$$
Setting
$$\frac{\partial \phi}{\partial \alpha} = -\mathbf{d}_k^T \mathbf{d}_k + \alpha\,\mathbf{d}_k^T \mathbf{H} \mathbf{d}_k = 0$$
gives
$$\alpha = \frac{\mathbf{d}_k^T \mathbf{d}_k}{\mathbf{d}_k^T \mathbf{H} \mathbf{d}_k}$$
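Since the optimal step length has this closed form for a quadratic, no numerical line search is needed; a minimal sketch, assuming NumPy, applied to the quadratic form of the earlier example function (its Hessian H and linear term C are written out below):

```python
import numpy as np

def exact_step_quadratic(H, d):
    """Optimal step alpha = (d^T d)/(d^T H d) for f = 0.5 x^T H x + C^T x."""
    return (d @ d) / (d @ H @ d)

# One steepest descent step on a quadratic:
H = np.array([[4.0, 2.0], [2.0, 2.0]])    # Hessian of f = x1 - x2 + 2x1^2 + 2x1x2 + x2^2
C = np.array([1.0, -1.0])
x = np.zeros(2)
d = -(H @ x + C)                          # d_k = -g(x_k)
x = x + exact_step_quadratic(H, d) * d    # exact line search step
print(x)                                  # [-1.  1.], matching lambda_1* = 1 in the example
```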
Rate of Convergence
Substituting this α gives the decrease per step:

$$f(\mathbf{x}_k + \alpha \mathbf{d}_k) = f(\mathbf{x}_k) - \alpha\,\mathbf{d}_k^T \mathbf{d}_k + \tfrac{1}{2}\alpha^2 \mathbf{d}_k^T \mathbf{H} \mathbf{d}_k = f(\mathbf{x}_k) - \frac{1}{2}\frac{(\mathbf{d}_k^T \mathbf{d}_k)^2}{\mathbf{d}_k^T \mathbf{H} \mathbf{d}_k}$$

Now,

$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} = 1 - \frac{\dfrac{1}{2}\dfrac{(\mathbf{d}_k^T \mathbf{d}_k)^2}{\mathbf{d}_k^T \mathbf{H} \mathbf{d}_k}}{\tfrac{1}{2}\mathbf{x}_k^T \mathbf{H} \mathbf{x}_k + \mathbf{C}^T \mathbf{x}_k + \tfrac{1}{2}\mathbf{C}^T \mathbf{H}^{-1} \mathbf{C}}$$

Since the denominator can be written as

$$\tfrac{1}{2}(\mathbf{H}\mathbf{x}_k + \mathbf{C})^T \mathbf{H}^{-1} (\mathbf{H}\mathbf{x}_k + \mathbf{C}) = \tfrac{1}{2}\mathbf{d}_k^T \mathbf{H}^{-1} \mathbf{d}_k,$$

we obtain

$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} = 1 - \frac{(\mathbf{d}_k^T \mathbf{d}_k)^2}{(\mathbf{d}_k^T \mathbf{H} \mathbf{d}_k)(\mathbf{d}_k^T \mathbf{H}^{-1} \mathbf{d}_k)} = 1 - \frac{1}{\beta}$$
Rate of Convergence
Kantorovich inequality
Let H ∈ ℝ^{N×N} be a symmetric positive definite matrix. Let λ₁ and λₙ be, respectively, the largest and smallest eigenvalues of H. Then, for any y ≠ 0,

$$\frac{(\mathbf{y}^T \mathbf{y})^2}{(\mathbf{y}^T \mathbf{H} \mathbf{y})(\mathbf{y}^T \mathbf{H}^{-1} \mathbf{y})} \geq \frac{4\lambda_1 \lambda_n}{(\lambda_1 + \lambda_n)^2}$$

Using the above inequality,

$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} = 1 - \frac{1}{\beta} \leq 1 - \frac{4\lambda_1 \lambda_n}{(\lambda_1 + \lambda_n)^2} = \left(\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\right)^2$$

so

$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} \leq \left(\frac{\lambda_1 - \lambda_n}{\lambda_1 + \lambda_n}\right)^2 \leq \delta < 1 \quad \Rightarrow \quad \textbf{linear convergence}$$
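For instance, for a quadratic with λ₁ = 100 and λₙ = 1 (condition number 100), the per-step bound evaluates to

$$\frac{f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*)}{f(\mathbf{x}_k) - f(\mathbf{x}^*)} \leq \left(\frac{100 - 1}{100 + 1}\right)^2 \approx 0.9606,$$

so, in the worst case, each steepest descent step is guaranteed to remove only about 4% of the remaining error in f.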
Rate of Convergence
• The ratio $\kappa(\mathbf{H}) = \lambda_1/\lambda_n$ is called the condition number of H. The larger the condition number, the closer the ratio $(\lambda_1 - \lambda_n)/(\lambda_1 + \lambda_n)$ is to 1, and therefore the slower the convergence.

• If λ₁ = λₙ, then only one iteration is needed: in this case H is a multiple of the identity matrix, the contours are concentric circles, and the steepest descent direction points straight at the solution.

• As the condition number $\kappa(\mathbf{H}) = \lambda_1/\lambda_n$ increases, the contours become more elongated, which increases the amount of zig-zagging.
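This effect can be observed numerically; the sketch below (illustrative, assuming NumPy) counts exact-line-search steepest descent steps on 2-D quadratics f = ½xᵀHx with increasing condition number:

```python
import numpy as np

def sd_iterations(H, x0, eps=1e-8, max_iter=100000):
    """Count exact-line-search steepest descent steps on f = 0.5 x^T H x."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        d = -(H @ x)                              # negative gradient
        if np.linalg.norm(d) <= eps:
            return k
        x = x + ((d @ d) / (d @ H @ d)) * d       # exact step alpha
    return max_iter

for kappa in [1, 10, 100, 1000]:
    H = np.diag([float(kappa), 1.0])              # eigenvalues kappa and 1
    print(kappa, sd_iterations(H, [1.0, 1.0]))
# The iteration count grows sharply with the condition number,
# reflecting the increased zig-zagging of the iterates.
```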
