

Solutions Chapter 2

1. SECTION 2.1

2.1.9 www

From Prop. 2.1.2(a), if x* is a local minimum, then

∇f(x*)'(x − x*) ≥ 0, ∀ x ∈ X,

or, equivalently,

Σ_{i=1}^n [∂f(x*)/∂xi] (xi − xi*) ≥ 0.
If xi* = αi, then xi ≥ xi* for all feasible xi. Letting xj = xj* for j ≠ i, we have

∂f(x*)/∂xi ≥ 0.

Similarly, if xi* = βi, then xi ≤ xi* for all feasible xi. Letting xj = xj* for j ≠ i, we have

∂f(x*)/∂xi ≤ 0.

If αi < xi* < βi, let xj = xj* for j ≠ i. Letting xi = αi, we obtain

∂f(x*)/∂xi ≤ 0,

and letting xi = βi, we obtain

∂f(x*)/∂xi ≥ 0.

Combining these inequalities, we see that we must have

∂f(x*)/∂xi = 0.

Assume that f is convex. To show that Eqs. (1.6)-(1.8) are sufficient for x* to be a global minimum, let I1 = {i | xi* = αi}, I2 = {i | xi* = βi}, I3 = {i | αi < xi* < βi}. Then

∇f(x*)'(x − x*) = Σ_{i=1}^n [∂f(x*)/∂xi] (xi − xi*)
   = Σ_{i∈I1} [∂f(x*)/∂xi] (xi − αi) + Σ_{i∈I2} [∂f(x*)/∂xi] (xi − βi) + Σ_{i∈I3} [∂f(x*)/∂xi] (xi − xi*).

Since ∂f(x*)/∂xi ≥ 0 and xi − αi ≥ 0 for i ∈ I1, ∂f(x*)/∂xi ≤ 0 and xi − βi ≤ 0 for i ∈ I2, and ∂f(x*)/∂xi = 0 for i ∈ I3, each term in the above sum is greater than or equal to zero. Therefore

∇f(x*)'(x − x*) ≥ 0, ∀ x ∈ X.

From Prop. 2.1.2(b), it follows that x∗ is a global minimum.
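
To make the componentwise conditions concrete, here is a small numerical check (a sketch only: the quadratic f, the box bounds, and the tolerances are made-up illustration data, and scipy's bounded minimizer stands in for an exact solver):

```python
import numpy as np
from scipy.optimize import minimize

# Illustration data (not from the exercise): a convex quadratic over a box.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([-4.0, 5.0])
alpha, beta = np.zeros(2), np.ones(2)

f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c

res = minimize(f, x0=np.zeros(2), jac=grad, bounds=list(zip(alpha, beta)))
x_star, g = res.x, grad(res.x)

# Verify the sign pattern of Eqs. (1.6)-(1.8) at the computed minimum.
for i in range(2):
    if np.isclose(x_star[i], alpha[i]):
        assert g[i] >= -1e-6        # xi* = alpha_i: df/dxi >= 0
    elif np.isclose(x_star[i], beta[i]):
        assert g[i] <= 1e-6         # xi* = beta_i: df/dxi <= 0
    else:
        assert abs(g[i]) <= 1e-6    # alpha_i < xi* < beta_i: df/dxi = 0
print(x_star, g)
```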

2.1.10 www

For any x ∈ X such that ∇f(x*)'(x − x*) = 0, we have by the second order expansion of Prop. A.23, for all α ∈ [0, 1] and some α̃ ∈ [0, α],

f(x* + α(x − x*)) − f(x*) = (1/2) α² (x − x*)' ∇²f(x* + α̃(x − x*)) (x − x*).

For all sufficiently small α, the left-hand side is nonnegative, since x∗ is a local minimum. Hence
the same is true for the right-hand side, and by taking the limit as α → 0 (and also α̃ → 0), we
obtain
(x − x*)'∇²f(x*)(x − x*) ≥ 0.
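
A quick numerical illustration of this necessary condition (a sketch with made-up data: f, X, and the direction d below are chosen only so that ∇f(x*)'(x − x*) = 0 along the feasible direction):

```python
import numpy as np

# Illustration: f(x) = x1^2 - x2^2 restricted to X = {x | x2 = 0}, with
# x* = 0 a local minimum over X and grad f(x*) = 0, so the hypothesis
# grad f(x*)'(x - x*) = 0 holds for every feasible x.
f = lambda x: x[0] ** 2 - x[1] ** 2
hessian = np.array([[2.0, 0.0], [0.0, -2.0]])
x_star = np.zeros(2)
d = np.array([1.0, 0.0])                    # feasible direction in X

# (f(x* + a d) - f(x*)) / a^2 should approach (1/2) d'Hess d >= 0.
for a in [1e-1, 1e-2, 1e-3]:
    print(a, (f(x_star + a * d) - f(x_star)) / a ** 2, 0.5 * d @ hessian @ d)
```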

2.1.11 www

Proof under condition (1): Assume, to arrive at a contradiction, that x∗ is not a local
minimum. Then there exists a sequence {xk } ⊆ X converging to x∗ such that f (xk ) < f (x∗ ) for
all k. We have

f(xk) = f(x*) + ∇f(x*)'(xk − x*) + (1/2)(xk − x*)'∇²f(x*)(xk − x*) + o(||xk − x*||²).

Introducing the vector

pk = (xk − x*)/||xk − x*||,
and using the relation f (xk ) < f (x∗ ), we obtain

∇f(x*)'pk + (1/2) pk'∇²f(x*)pk ||xk − x*|| + o(||xk − x*||²)/||xk − x*|| < 0.   (1)

This together with the hypothesis ∇f(x*)'pk ≥ 0 implies

(1/2) pk'∇²f(x*)pk ||xk − x*|| + o(||xk − x*||²)/||xk − x*|| < 0.   (2)

Let us call feasible direction at x* any vector p of the form α(x − x*), where α > 0 and x ∈ X, x ≠ x* (see also Section 2.2). The sequence {pk} is a sequence of feasible directions at


x* that lie on the surface of the unit sphere. Therefore, a subsequence {pk}K converges to a vector p, which, because X is polyhedral, must be a feasible direction at x* (this is easily seen by expressing the polyhedral set X in terms of linear equalities and inequalities). Hence, by the hypothesis of the exercise, we have ∇f(x*)'p ≥ 0. By letting k → ∞, k ∈ K in (1), we have

∇f(x*)'p = 0.

The hypothesis of the exercise then implies that

p'∇²f(x*)p > 0.   (3)

Dividing by ||xk − x*|| and taking the limit in Eq. (2) as k → ∞, k ∈ K, we obtain

(1/2) p'∇²f(x*)p + lim_{k→∞, k∈K} o(||xk − x*||²)/||xk − x*||² ≤ 0.

This contradicts Eq. (3).

Proof under condition (2): Here we argue in a similar way as under condition (1). Suppose that all the given assumptions hold and x* is not a local minimum. Then there is a sequence {xk} ⊆ X converging to x* such that f(xk) < f(x*) for all k. By using the second order expansion of f at x* and introducing the vector pk = (xk − x*)/||xk − x*||, we have that both Eqs. (1) and (2) hold for all k. Since {pk} consists of feasible directions at x* that lie on the surface of the unit sphere, there is a subsequence {pk}K converging to a vector p with ||p|| = 1. By the assumption given in the exercise, we have that

∇f(x*)'pk ≥ 0, ∀ k.

Hence ∇f(x*)'p ≥ 0. By letting k → ∞, k ∈ K in (1), we obtain ∇f(x*)'p ≤ 0. Consequently, ∇f(x*)'p = 0. Since the vector p is in the closure of the set of feasible directions at x*, the condition given in part (2) implies that p'∇²f(x*)p > 0. Dividing by ||xk − x*|| and taking the limit in Eq. (2) as k → ∞, k ∈ K, we obtain p'∇²f(x*)p ≤ 0, which is a contradiction. Therefore, x* must be a local minimum.

Proof under condition (3): We have

f(x) = f(x*) + ∇f(x*)'(x − x*) + (1/2)(x − x*)'∇²f(x*)(x − x*) + o(||x − x*||²),

so that, by using the hypotheses ∇f(x*)'(x − x*) ≥ 0 and (x − x*)'∇²f(x*)(x − x*) ≥ γ||x − x*||²,

f(x) − f(x*) ≥ (γ/2)||x − x*||² + o(||x − x*||²).

The expression on the right-hand side is nonnegative for x ∈ X close enough to x*, and it is strictly positive if in addition x ≠ x*. Hence x* is a strict local minimum.


Example: [Why the assumption that X is a polyhedral set was important under
condition (1)] A polyhedral set X has the property that for any point x ∈ X, the set V (x) of
the feasible directions at x is closed. This was crucial for proving that the conditions

∇f(x*)'(x − x*) ≥ 0, ∀ x ∈ X,   (1)

(x − x*)'∇²f(x*)(x − x*) > 0, for all x ∈ X with x ≠ x* and ∇f(x*)'(x − x*) = 0,   (2)

are sufficient for local optimality of x∗ .

Consider the set X = {(x1, x2) | (x1)² ≤ x2} and the point (0, 0) ∈ X. Let the cost function be f(x1, x2) = −2(x1)² + x2. Note that the gradient of f at 0 is [0, 1]'. It is easy to see that

∇f(0)'(x − 0) = x2 > 0, ∀ x ∈ X, x ≠ 0.

Thus the point x* = 0 satisfies conditions (1) and (2) (condition (2) is trivially satisfied, since in our example ∇f(0)'(x − 0) = 0 simply never occurs for x ∈ X, x ≠ 0). On the other hand, x* = 0 is not a local minimum of f over X. Consider the points xn = (1/n, 1/n²) ∈ X for n ≥ 1. Since xn → x* as n → ∞, for any δ > 0 there is an index nδ such that ||xn − x*|| < δ for all n ≥ nδ. By evaluating the cost function, we have f(xn) = −1/n² < 0 = f(x*). Hence, in any δ-neighborhood of x* = 0, there are points xn ∈ X with a better objective value, i.e., x* is not a local minimum.

This happens because the set V(x*) of the feasible directions at the point x* is not closed in this case. The set V(x*) is given by

V(x*) = {d = (d1, d2) | d2 > 0, ||d|| = 1},

and is not closed: the vectors (1, 0)' and (−1, 0)' belong to the closure of V(x*) but they are not in the set V(x*).
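
The counterexample is easy to reproduce numerically (a throwaway sketch; the points xn are the ones used above):

```python
import numpy as np

# f(x1, x2) = -2 x1^2 + x2 over X = {(x1, x2) | x1^2 <= x2}; candidate x* = 0.
f = lambda x: -2.0 * x[0] ** 2 + x[1]

for n in [10, 100, 1000]:
    xn = np.array([1.0 / n, 1.0 / n ** 2])
    assert xn[0] ** 2 <= xn[1] + 1e-15         # xn is feasible
    # ||xn|| -> 0 while f(xn) = -1/n^2 < 0 = f(0): no local minimum at 0.
    print(n, np.linalg.norm(xn), f(xn))
```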

2.1.18 www

The assumption on ∇²f(x) guarantees that f is strictly convex and coercive, so it has a unique global minimum over any closed convex set (using Weierstrass' theorem, Prop. A.8). By the second order expansion of Prop. A.23, we have for all x and y in ℝⁿ

f(y) = f(x) + ∇f(x)'(y − x) + (1/2)(y − x)'∇²f(ỹ)(y − x)


for some ỹ on the line segment connecting x and y. It follows, using the hypothesis, that

∇f(x)'(y − x) + (M/2)||y − x||² ≥ f(y) − f(x) ≥ ∇f(x)'(y − x) + (m/2)||y − x||².

Taking the minimum in this inequality over y ∈ X, and changing sign, we obtain

−min_{y∈X} {∇f(x)'(y − x) + (M/2)||y − x||²} ≤ f(x) − f(x*) ≤ −min_{y∈X} {∇f(x)'(y − x) + (m/2)||y − x||²},

which is the desired relation.
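
For a strongly convex quadratic over a box, both inner minimizations can be done in closed form (the minimizer is a projection), so the two-sided bound is easy to check numerically. A sketch with illustration data (Q, c, the box, and the grid search for f(x*) are all ad hoc):

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([-1.0, 2.0])
lo, hi = np.zeros(2), np.ones(2)

f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c
m, M = np.linalg.eigvalsh(Q)[[0, -1]]        # m I <= Hessian = Q <= M I

def inner_min(x, gamma):
    # min over y in the box of grad(x)'(y - x) + (gamma/2)||y - x||^2;
    # the minimizer is the projection of x - grad(x)/gamma onto the box.
    y = np.clip(x - grad(x) / gamma, lo, hi)
    return grad(x) @ (y - x) + 0.5 * gamma * np.sum((y - x) ** 2)

# Crude grid estimate of f(x*) over the box, for illustration only.
g = np.linspace(0.0, 1.0, 201)
f_star = min(f(np.array([a, b])) for a in g for b in g)

x = np.array([0.9, 0.1])
print(-inner_min(x, M), "<=", f(x) - f_star, "<=", -inner_min(x, m))
```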

2.1.19 of 2nd Printing (Existence of Solutions of Nonconvex Quadratic Programming Problems) www

Let {γk} be a decreasing sequence with γk ↓ f*, and denote

Sk = {x ∈ X | x'Qx + c'x ≤ γk}.

Then the set of optimal solutions of the problem is ∩_{k=0}^∞ Sk, so by Prop. 2.1.4, it will suffice to show that for each asymptotic direction of {Sk}, all corresponding asymptotic sequences are retractive. Let d be an asymptotic direction and let {xk} be a corresponding asymptotic sequence. Similar to the proof of Prop. 2.1.5, we have d'Qd ≤ 0. Also, in case (i), similar to the proof of Prop. 2.1.5, we have aj'd ≤ 0 for all j, while in case (ii) it is seen that d ∈ N, where X = B + N, B is compact, and N is a polyhedral cone. For any x ∈ X, consider the vectors x̃k = x + kd. Then, in both cases (i) and (ii), it can be seen that x̃k ∈ X [in case (i) by using the argument in the proof of Prop. 2.1.5, and in case (ii) by using the definition X = B + N]. Thus, the cost function value corresponding to x̃k satisfies

f* ≤ (x + kd)'Q(x + kd) + c'(x + kd)
   = x'Qx + c'x + k²d'Qd + k(c + 2Qx)'d
   ≤ x'Qx + c'x + k(c + 2Qx)'d,

where the last inequality follows from the fact d'Qd ≤ 0. From the finiteness of f*, it follows that

(c + 2Qx)'d ≥ 0, ∀ x ∈ X.

We now show that {xk} is retractive, so that we can use Prop. 2.1.4. Indeed, for any α > 0, since ||xk|| → ∞, it follows that for k sufficiently large, we have xk − αd ∈ X [this follows similarly to the proof of Prop. 2.1.5 in case (i), and because d ∈ N in case (ii)]. Furthermore, we have

f(xk − αd) = (xk − αd)'Q(xk − αd) + c'(xk − αd)
   = xk'Qxk + c'xk − α(c + 2Qxk)'d + α²d'Qd
   ≤ xk'Qxk + c'xk
   ≤ γk,


where the first inequality follows from the facts d'Qd ≤ 0 and (c + 2Qxk)'d ≥ 0 shown earlier.
Thus for sufficiently large k, we have xk − αd ∈ S k , so that {xk } is retractive. The existence of
an optimal solution now follows from Prop. 2.1.4.
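
The algebraic expansion used twice above is worth a quick machine check (a sketch; random symmetric Q and random vectors):

```python
import numpy as np

# Check: (x + k d)'Q(x + k d) + c'(x + k d)
#        = x'Qx + c'x + k^2 d'Qd + k (c + 2Qx)'d   for symmetric Q.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 3)); Q = 0.5 * (Q + Q.T)
c, x, d = rng.standard_normal((3, 3))
q = lambda z: z @ Q @ z + c @ z

for k in [1.0, 2.0, 5.0]:
    assert np.isclose(q(x + k * d),
                      q(x) + k ** 2 * (d @ Q @ d) + k * (c + 2 * Q @ x) @ d)
print("expansion verified")
```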

2.1.20 of 2nd Printing www

We proceed as in the proof of Prop. 2.1.5. By decomposing dk into the sum of a vector in the nullspace of A and a vector in its orthogonal complement, and using an argument like the one in the proof of Prop. 2.1.5, we can show that

Ad = 0, c'd ≤ 0.

Similarly, we can show that

aj'd ≤ 0, j = 1, . . . , r.

Using the finiteness of f*, we can also show that c'd = 0, and we can conclude the proof similarly to the proof of Prop. 2.1.5.

2.1.21 of 2nd Printing www

Note that the cone N in this exercise must be assumed polyhedral (see the errata sheet). Let Sk = {x ∈ X | f(x) ≤ γk}, let d be an asymptotic direction of {Sk}, and let {xk} be a corresponding asymptotic sequence. We will show that {xk} is retractive, so by applying Prop. 2.1.4, it follows that the intersection of {Sk}, the set of minima of f over X, is nonempty.

Since d is an asymptotic direction of {Sk}, d is also an asymptotic direction of {x | f(x) ≤ γk}, and by hypothesis, for some bounded positive sequence {αk} and some positive integer k̄, we have f(xk − αk d) ≤ γk for all k ≥ k̄.

Let X = X̄ + N, where X̄ is compact and N is the polyhedral cone

N = {y | aj'y ≤ 0, j = 1, . . . , r},

where a1, . . . , ar are some vectors. We can represent xk as

xk = x̄k + yk, ∀ k = 0, 1, . . . ,

where x̄k ∈ X̄ and yk ∈ N, so that

aj'xk = aj'(x̄k + yk), ∀ k = 0, 1, . . . , j = 1, . . . , r.

Dividing both sides by ||xk|| and taking the limit as k → ∞, we obtain

aj'd = lim_{k→∞} aj'yk / ||xk||.


Since aj'yk ≤ 0 for all k and j, we obtain aj'd ≤ 0 for all j, so that d ∈ N.

For each j, we consider two cases:

(1) aj'd = 0. In this case, aj'(yk − ᾱd) = aj'yk ≤ 0 for all k, since yk ∈ N.

(2) aj'd < 0. In this case, we have

(1/||xk||) aj'(yk − ᾱd) = (1/||xk||) aj'(xk − x̄k − ᾱd),

so that, since xk/||xk|| → d, {||xk||} is unbounded, and {x̄k} is bounded, we obtain

lim_{k→∞} (1/||xk||) aj'(yk − ᾱd) = aj'd < 0.

Hence aj'(yk − ᾱd) < 0 for k greater than some k̄.

Thus, for k ≥ k̄ and α ∈ (0, ᾱ], we have aj'(yk − αd) ≤ aj'(yk − ᾱd) ≤ 0 for all j, so that yk − αd ∈ N and xk − αd ∈ X.

Thus {xk} is retractive, and by applying Prop. 2.1.4, we have that {Sk} has nonempty intersection.

2.1.22 of 2nd Printing www

We follow the hint. Let {yk} be a sequence of points in A S converging to some y ∈ ℝⁿ. We will prove that A S is closed by showing that y ∈ A S.

We introduce the sets

Wk = {z | ||z − y|| ≤ ||yk − y||},

and

Sk = {x ∈ S | Ax ∈ Wk}.

To show that y ∈ A S, it is sufficient to prove that the intersection ∩_{k=0}^∞ Sk is nonempty, since every x ∈ ∩_{k=0}^∞ Sk satisfies x ∈ S and Ax = y (because yk → y). The asymptotic directions of {Sk} are asymptotic directions of S that are also in the nullspace of A, and it can be seen that every corresponding asymptotic sequence is retractive for {Sk}. Hence, by Prop. 2.1.4, ∩_{k=0}^∞ Sk is nonempty.

2. SECTION 2.2


2.2.7 www

Since the number of extreme points of X is finite, some extreme point must be repeated within a finite number of iterations, i.e., for some k and some i ∈ {0, 1, . . . , k − 1}, we have

xi = arg min_{x∈X} ∇f(xk)'(x − xk).

Since xk minimizes f(x) over X^{k−1}, we must have

∇f(xk)'(xi − xk) ≥ 0, ∀ i = 0, 1, . . . , k − 1.

Combining the above two equations, we see that

∇f(xk)'(x − xk) ≥ 0, ∀ x ∈ X,

which implies that xk is a stationary point of f over X.
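
For intuition, here is a minimal sketch of the iteration just analyzed, assuming X is a polytope given by an explicit vertex list (so the linearized subproblem is a search over vertices) and using a generic solver for the inner minimization over X^{k−1}; all data and names are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

V = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])  # vertices of X
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-1.0, -3.0])
f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c

def min_over_hull(pts):
    # Minimize f over conv(pts) via weights lam >= 0 with sum(lam) = 1.
    k = len(pts)
    res = minimize(lambda lam: f(lam @ pts), np.ones(k) / k,
                   bounds=[(0.0, 1.0)] * k,
                   constraints={'type': 'eq', 'fun': lambda lam: lam.sum() - 1.0})
    return res.x @ pts

x, hull = V[0], [V[0]]
for _ in range(10):
    v = V[np.argmin(V @ grad(x))]        # extreme point minimizing grad'(x - xk)
    if grad(x) @ (v - x) >= -1e-9:
        break                            # stationary over X, as argued above
    hull.append(v)
    x = min_over_hull(np.array(hull))
print(x, f(x))                           # expect the minimizer (0.5, 1.5)
```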

3. SECTION 2.3

2.3.4 www

We assume here that the unscaled version of the method (Hk = I) is used and that the stepsize sk is a constant s > 0.

(a) If xk is nonstationary, there exists a feasible descent direction x̂k − xk for the original problem, where x̂k ∈ X. Since x̂k ∈ Xk, we have

∇f(xk)'(x̃k − xk) + (1/2s)||x̃k − xk||² ≤ ∇f(xk)'(x̂k − xk) + (1/2s)||x̂k − xk||² < 0,

where x̃k is defined by the algorithm. Thus,

∇f(xk)'(x̃k − xk) ≤ −(1/2s)||x̃k − xk||² < 0,

so that x̃k − xk is a descent direction at xk. It is also a feasible direction, since aj'(x̃k − xk) ≤ 0 for all j such that aj'xk = bj.
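
The subproblem defining x̃k is a projection-type quadratic program over the ε-active constraints. A throwaway sketch (A, b, f, ε, s, and xk are made-up illustration data; scipy's SLSQP stands in for a QP solver):

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])     # X = {x | A x <= b}
b = np.array([1.0, 0.0, 0.0])
grad_f = lambda x: np.array([2 * x[0] - 1.0, 2 * x[1]])  # f = x1^2 - x1 + x2^2
eps, s = 0.1, 0.5
xk = np.array([0.6, 0.4])                                # feasible, nonstationary

# X^k keeps only the epsilon-active constraints: b_j - eps <= a_j'xk <= b_j.
act = (A @ xk >= b - eps) & (A @ xk <= b)
g = grad_f(xk)
sub = lambda x: g @ (x - xk) + np.sum((x - xk) ** 2) / (2.0 * s)
res = minimize(sub, xk, constraints={'type': 'ineq',
                                     'fun': lambda x: b[act] - A[act] @ x})
x_tilde = res.x
print(x_tilde, g @ (x_tilde - xk))   # the inner product is negative: descent
```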

(b) As in the proof of Prop. 2.3.1, we will show that the direction sequence {x̄k − xk} is gradient-related, where

x̄k = γk x̃k + (1 − γk)xk


and

γk = max{γ ∈ [0, 1] | γ x̃k + (1 − γ)xk ∈ X}.

Indeed, suppose that {xk}k∈K converges to a nonstationary point x̃. We must prove that

lim sup_{k→∞, k∈K} ||x̄k − xk|| < ∞,   (*)

lim sup_{k→∞, k∈K} ∇f(xk)'(x̄k − xk) < 0.   (**)

Since ||x̄k − xk|| ≤ ||x̃k − xk|| ≤ s||∇f(xk)||, Eq. (*) clearly holds, so we concentrate on proving Eq. (**). The key to this is showing that γk is bounded away from 0, so that the inner product ∇f(xk)'(x̄k − xk) is bounded away from 0 when ∇f(xk)'(x̃k − xk) is.

For each k, we either have γk = 1, or else we must have, for some j with aj'xk < bj − ε,

aj'(γk x̃k + (1 − γk)xk) = bj,

so that

γk aj'(x̃k − xk) = bj − aj'xk > ε,

from which

γk > ε / (||aj|| · ||x̃k − xk||).

It follows that for all k, we have

min{1, min_j ε / (||aj|| · ||x̃k − xk||)} ≤ γk ≤ 1.

Since the subsequence {xk}K converges, the subsequence {x̃k − xk}K is bounded, implying also that the subsequence {γk}K is bounded away from 0.

For sufficiently large k, the set

Xk = {x | aj'x ≤ bj for all j with bj − ε ≤ aj'xk ≤ bj}

is equal to the set

X̃ = {x | aj'x ≤ bj for all j with bj − ε ≤ aj'x̃ ≤ bj},

so, proceeding as in the proof of Prop. 2.3.1, we obtain

lim sup_{k→∞, k∈K} ∇f(xk)'(x̃k − xk) ≤ −(1/s)||x̃ − [x̃ − s∇f(x̃)]⁺||²,

where [·]⁺ denotes projection on the set X̃. Since x̃ is nonstationary, the right-hand side of the above inequality is negative, so that

lim sup_{k→∞, k∈K} ∇f(xk)'(x̃k − xk) < 0.


We have x̄k − xk = γk(x̃k − xk), and since γk is bounded away from 0, it follows that

lim sup_{k→∞, k∈K} ∇f(xk)'(x̄k − xk) < 0,

proving Eq. (**).

(c) Here we consider the variant of the method that uses a constant stepsize, which, however, is reduced if necessary to ensure that the next iterate is feasible. If the stepsize is sufficiently small to ensure convergence to the unique local minimum x* of the positive definite quadratic cost function, then xk will be arbitrarily close to x* for sufficiently large k, so that x̄k = x̃k. Thus the convergence rate estimate of the text applies.

2.3.7 www

The key idea is to show that xk stays in the bounded set

A = {x ∈ X | f(x) ≤ f(x0)}

and to use a constant stepsize sk = s that depends on the constant L corresponding to this bounded set. Let

R = max{||x|| | x ∈ A},

G = max{||∇f(x)|| | x ∈ A},

and

B = {x | ||x|| ≤ R + 2G}.

Using condition (i) in the exercise, there exists some constant L such that ||∇f(x) − ∇f(y)|| ≤ L||x − y|| for all x, y ∈ B. Suppose the stepsize s satisfies 0 < s < 2 min{1, 1/L}. We will show by induction on k that, with this stepsize, we have xk ∈ A and

f(xk+1) ≤ f(xk) − (1/s − L/2)||xk+1 − xk||² ≤ f(xk),   (*)

for all k ≥ 0.

To start the induction, we note that x0 ∈ A, by the definition of A. Suppose that xk ∈ A. We have xk+1 = [xk − s∇f(xk)]⁺, so by using the nonexpansiveness of the projection mapping,

||xk+1 − xk|| ≤ ||(xk − s∇f(xk)) − xk|| ≤ s||∇f(xk)|| ≤ 2G.

Thus,

||xk+1|| ≤ ||xk|| + 2G ≤ R + 2G,


implying that xk+1 ∈ B. Since B is convex, we conclude that the entire line segment connecting xk and xk+1 belongs to B. In order to prove Eq. (*), we now proceed as in the proof of Prop. 2.3.2. A difficulty arises because Prop. A.24 assumes that the inequality ||∇f(x) − ∇f(y)|| ≤ L||x − y|| holds for all x, y, whereas in this exercise this inequality holds only for x, y ∈ B. However, using the fact that the Lipschitz condition holds along the line segment connecting xk and xk+1 (which belongs to B as argued earlier), the proof of Prop. A.24 can be repeated to obtain

f(xk+1) − f(xk) ≤ ∇f(xk)'(xk+1 − xk) + (L/2)||xk+1 − xk||².

Using this relation, and the relation

∇f(xk)'(xk+1 − xk) ≤ −(1/s)||xk+1 − xk||²

[which is Eq. (3.27) of the text], we obtain Eq. (*) [as in the text, cf. Eq. (3.29)]. It follows that xk+1 ∈ A, completing the induction. The remainder of the proof is the same as in Prop. 2.3.2.
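
A minimal sketch of the iteration analyzed above, for a made-up quadratic over a box (so the projection is a coordinatewise clip, and the largest eigenvalue of Q serves as L):

```python
import numpy as np

Q = np.array([[3.0, 0.5], [0.5, 1.0]])
c = np.array([-2.0, 1.0])
lo, hi = np.zeros(2), np.ones(2)

f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c
L = np.linalg.eigvalsh(Q)[-1]              # Lipschitz constant of grad f
s = 0.9 * min(1.0, 1.0 / L)                # within 0 < s < 2 min{1, 1/L}

x = np.array([1.0, 1.0])
for _ in range(200):
    x_next = np.clip(x - s * grad(x), lo, hi)   # [x - s grad f(x)]^+
    assert f(x_next) <= f(x) + 1e-12            # monotone descent, cf. Eq. (*)
    x = x_next
print(x, f(x))
```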

2.3.8 www

(a) The expression for f given in the hint is verified by straightforward calculation. Based on this expression, the method takes the form

xk+1 = arg min_{x∈X} {∇f(xk)'(x − xk) + (1/2)(x − xk)'Q(x − xk) + (1/2ck)||x − xk||²},

or

xk+1 = arg min_{x∈X} {∇f(xk)'(x − xk) + (1/2)(x − xk)'(Q + (1/ck)I)(x − xk)}.

This is recognized as the scaled gradient projection method with scaling matrix Hk = Q + (1/ck)I and stepsizes sk = 1, αk = 1.

(b) Similar to part (a), we have

x̄k = arg min_{x∈X} {∇f(xk)'(x − xk) + (1/2)(x − xk)'(Q + Mk)(x − xk)},

and x̄k − xk is recognized as the direction of the scaled gradient projection method with scaling matrix Hk = Q + Mk and stepsize sk = 1.

(c) If X = ℝⁿ and Mk = Q, we have

x̄k = xk − (Q + Mk)⁻¹∇f(xk) = xk − (1/2)Q⁻¹∇f(xk),

so for a stepsize αk = 2, we have

xk+1 = xk + αk(x̄k − xk) = xk − Q⁻¹∇f(xk).

Thus the method reduces to the pure form of Newton’s method for unconstrained minimization
of f , which for a quadratic function converges in a single step to the optimal solution.
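
A small numerical sketch of part (a) in the unconstrained case X = ℝⁿ (illustration data only): setting the gradient of the proximal objective to zero shows that each iteration is the scaled step with matrix Q + (1/ck)I.

```python
import numpy as np

Q = np.array([[2.0, 0.3], [0.3, 1.0]])
c = np.array([1.0, -1.0])
grad = lambda x: Q @ x + c

x, ck = np.array([5.0, -5.0]), 1.0
for _ in range(50):
    # argmin_x { f(x) + ||x - xk||^2 / (2 ck) }  solves
    # Q x + c + (x - xk)/ck = 0, i.e. a step with H = Q + (1/ck) I:
    x = x - np.linalg.solve(Q + np.eye(2) / ck, grad(x))
print(x, np.linalg.solve(Q, -c))   # both print the minimizer -Q^{-1} c
```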
