ORF 522: Lecture 16
Nonlinear Optimization
Robert J. Vanderbei
November 14, 2013
Slides last edited at 4:14pm on Tuesday 3rd December, 2013
Operations Research and Financial Engineering, Princeton University
http://www.princeton.edu/~rvdb
The Interior-Point Algorithm
Introduce Slack Variables
• Start with an optimization problem—for now, the simplest NLP:
minimize    f(x)
subject to  h_i(x) ≥ 0,   i = 1, …, m
• Introduce slack variables to make all inequality constraints into nonnegativities:
minimize    f(x)
subject to  h(x) − w = 0,
            w ≥ 0
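For concreteness, here is a minimal Python sketch of the slack reformulation; the problem data f, h, and the starting point are invented for illustration:

    import numpy as np

    # Hypothetical toy problem: minimize f(x) subject to h(x) >= 0.
    def f(x):
        return x[0]**2 + x[1]**2

    def h(x):
        # Two inequality constraints h_i(x) >= 0.
        return np.array([x[0] + x[1] - 1.0,          # x0 + x1 >= 1
                         4.0 - x[0]**2 - x[1]**2])   # ||x||^2 <= 4

    x = np.array([1.0, 1.0])
    w = np.maximum(h(x), 0.1)   # slacks w = h(x), kept strictly positive
    print("equality residual h(x) - w:", h(x) - w)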
Associated Log-Barrier Problem
• Replace nonnegativity constraints with logarithmic barrier terms in the objective:
minimize    f(x) − µ Σ_{i=1}^m log(w_i)
subject to  h(x) − w = 0
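In code, the barrier objective is a one-liner; this sketch reuses f, h, x, and w from the snippet above and assumes w > 0:

    def barrier_objective(x, w, mu):
        # f(x) - mu * sum_i log(w_i); valid only at interior points, w > 0
        return f(x) - mu * np.sum(np.log(w))

    print(barrier_objective(x, w, mu=0.1))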
First-Order Optimality Conditions
• Incorporate the equality constraints into the objective using Lagrange multipliers:
L(x, w, y) = f(x) − µ Σ_{i=1}^m log(w_i) − yᵀ(h(x) − w)
• Set all derivatives to zero:
∇f(x) − ∇h(x)ᵀy = 0
−µW⁻¹e + y = 0
h(x) − w = 0
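A sketch of these three residuals in Python; grad_f and jac_h are hypothetical callables returning ∇f(x) and A(x) = ∇h(x):

    def kkt_residuals(x, w, y, mu, grad_f, jac_h, h):
        A = jac_h(x)                  # Jacobian A(x), shape (m, n)
        r_stat = grad_f(x) - A.T @ y  # stationarity: ∇f(x) − ∇h(x)ᵀy = 0
        r_cent = -mu / w + y          # centrality: −µW⁻¹e + y = 0
        r_prim = h(x) - w             # primal feasibility: h(x) − w = 0
        return r_stat, r_cent, r_prim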
Symmetrize Complementarity Conditions
• Multiply the second equation through by W to obtain the symmetric form of the system:
∇f(x) − ∇h(x)ᵀy = 0
WYe = µe
h(x) − w = 0
Apply Newton’s Method
• Apply Newton’s method to compute search directions, ∆x, ∆w, ∆y:
[ H(x, y)    0     −Aᵀ(x) ] [ ∆x ]   [ −∇f(x) + Aᵀ(x)y ]
[    0       Y       W    ] [ ∆w ] = [     µe − WYe     ]
[  A(x)     −I       0    ] [ ∆y ]   [    −h(x) + w     ]
Here,
H(x, y) = ∇²f(x) − Σ_{i=1}^m y_i ∇²h_i(x)
and
A(x) = ∇h(x)
• Note: H(x, y) is positive semidefinite if f is convex, each h_i is concave, and each y_i ≥ 0.
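Below is a dense NumPy sketch of assembling and solving this system; hess_f and hess_h are hypothetical callables returning ∇²f(x) and the list of ∇²h_i(x), and a production solver would exploit sparsity rather than call np.linalg.solve on the full matrix:

    import numpy as np

    def newton_step(x, w, y, mu, grad_f, hess_f, h, jac_h, hess_h):
        n, m = x.size, w.size
        A = jac_h(x)                                     # A(x), shape (m, n)
        H = hess_f(x) - sum(yi * Hi for yi, Hi in zip(y, hess_h(x)))
        K = np.block([[H,                np.zeros((n, m)), -A.T            ],
                      [np.zeros((m, n)), np.diag(y),       np.diag(w)      ],
                      [A,                -np.eye(m),       np.zeros((m, m))]])
        rhs = np.concatenate([-grad_f(x) + A.T @ y,
                              mu * np.ones(m) - w * y,    # µe − WYe
                              -h(x) + w])
        d = np.linalg.solve(K, rhs)
        return d[:n], d[n:n + m], d[n + m:]              # ∆x, ∆w, ∆y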
Reduced KKT System
• Use the second equation to solve for ∆w. The result is the reduced KKT system:
" #" # " #
−H(x, y) AT (x) ∆x ∇f (x) − AT (x)y
=
A(x) W Y −1 ∆y −h(x) + µY −1e
• Iterate:
x^(k+1) = x^(k) + α^(k) ∆x^(k)
w^(k+1) = w^(k) + α^(k) ∆w^(k)
y^(k+1) = y^(k) + α^(k) ∆y^(k)
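A sketch of the reduced solve, recovering ∆w by back-substitution from Y∆w + W∆y = µe − WYe (H is assembled as on the previous slide; the helper names are assumptions):

    import numpy as np

    def reduced_step(x, w, y, mu, H, grad_f, h, jac_h):
        n = x.size
        A = jac_h(x)
        K = np.block([[-H, A.T],
                      [ A, np.diag(w / y)]])        # WY⁻¹ in the (2,2) block
        rhs = np.concatenate([grad_f(x) - A.T @ y,
                              -h(x) + mu / y])      # −h(x) + µY⁻¹e
        d = np.linalg.solve(K, rhs)
        dx, dy = d[:n], d[n:]
        dw = mu / y - w - (w / y) * dy              # back-substitute for ∆w
        return dx, dw, dy

    # One damped iteration:
    # dx, dw, dy = reduced_step(x, w, y, mu, H, grad_f, h, jac_h)
    # x, w, y = x + alpha * dx, w + alpha * dw, y + alpha * dy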
Convex vs. Nonconvex Optimization Problems
Nonlinear Programming (NLP) Problem:
minimize    f(x)
subject to  h_i(x) = 0,   i ∈ E,
            h_i(x) ≥ 0,   i ∈ I.
NLP is convex if
• the h_i's in equality constraints are affine;
• the h_i's in inequality constraints are concave;
• f is convex.
NLP is smooth if
• f and all h_i's are twice continuously differentiable.
Modifications for Convex Optimization
For convex nonquadratic optimization, it does not suffice to choose the steplength α simply
to maintain positivity of nonnegative variables.
• Consider, e.g., minimizing
f(x) = (1 + x²)^(1/2).
• The iterates can be computed explicitly:
x^(k+1) = −(x^(k))³
• Newton's method converges if and only if |x^(0)| < 1.
• Reason: away from 0, the function is nearly linear, so the full Newton step overshoots.
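The behavior is easy to reproduce; this snippet iterates the explicit formula from two starting points (the starting points and step count are arbitrary):

    # Newton's method on f(x) = (1 + x^2)^(1/2):
    # f'(x) = x(1 + x^2)^(-1/2), f''(x) = (1 + x^2)^(-3/2), so the full
    # Newton step is x - f'(x)/f''(x) = x - x(1 + x^2) = -x**3.
    for x0 in (0.9, 1.1):
        x = x0
        for _ in range(6):
            x = -x**3
        print(f"x0 = {x0}: after 6 full Newton steps, x = {x:.3e}")
    # 0.9 -> ~4e-34 (converges); 1.1 -> ~2e+30 (diverges), even though f is convex.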
Step-Length Control
A filter-type method is used to guide the choice of steplength α.
Define the dual normal matrix:
N(x, y, w) = H(x, y) + Aᵀ(x) W⁻¹ Y A(x).
Theorem. Suppose that N(x, y, w) is positive definite.
1. If the current solution is primal infeasible, then (∆x, ∆w) is a descent direction for the infeasibility ‖h(x) − w‖.
2. If the current solution is primal feasible, then (∆x, ∆w) is a descent direction for the barrier function.
Shorten α until (∆x, ∆w) produces a decrease in either the infeasibility or the barrier function.
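A simplified backtracking sketch of this acceptance test; the halving factor, iteration cap, and exact acceptance rule are stand-ins, not LOQO's actual filter logic:

    import numpy as np

    def shorten_step(x, w, dx, dw, f, h, mu, alpha0=1.0, beta=0.5, max_halvings=30):
        def infeasibility(x, w):
            return np.linalg.norm(h(x) - w)
        def barrier(x, w):
            return f(x) - mu * np.sum(np.log(w))
        alpha = alpha0
        for _ in range(max_halvings):
            xt, wt = x + alpha * dx, w + alpha * dw
            # accept once the step keeps w > 0 and decreases either merit measure
            if np.all(wt > 0) and (infeasibility(xt, wt) < infeasibility(x, w)
                                   or barrier(xt, wt) < barrier(x, w)):
                break
            alpha *= beta
        return alpha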
Nonconvex Optimization: Diagonal Perturbation
• If H(x, y) is not positive semidefinite, then N(x, y, w) might fail to be positive definite.
• In that case, we lose the descent properties given in the previous theorem.
• To regain those properties, we perturb the Hessian: H̃(x, y) = H(x, y) + λI.
• And compute search directions using H̃ instead of H.
Notation: let Ñ denote the dual normal matrix associated with H̃.
Theorem. If Ñ is positive definite, then (∆x, ∆w, ∆y) is a descent direction for
1. the primal infeasibility ‖h(x) − w‖, and
2. the noncomplementarity wᵀy.
Notes:
• Not necessarily a descent direction for dual infeasibility.
• A line search is performed to find a value of λ within a factor of 2 of the smallest
permissible value.
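One way to implement the search is to test positive definiteness with a Cholesky factorization and double λ until it succeeds; the starting λ, the doubling rule, and the dense factorization below are assumptions, not LOQO's exact heuristic:

    import numpy as np

    def perturbed_hessian(H, A, w, y, lam0=1e-4):
        def pos_def(M):
            try:
                np.linalg.cholesky(M)
                return True
            except np.linalg.LinAlgError:
                return False
        AtWinvYA = A.T @ np.diag(y / w) @ A       # Aᵀ W⁻¹ Y A
        if pos_def(H + AtWinvYA):
            return H                              # N is already positive definite
        lam = lam0
        while not pos_def(H + lam * np.eye(H.shape[0]) + AtWinvYA):
            lam *= 2.0                            # doubling search: final lam is
        return H + lam * np.eye(H.shape[0])       # within a factor of 2 of smallest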
Nonconvex Optimization: Jamming
Theorem. If the problem is convex and the current solution is not optimal and ..., then
for any slack variable, say w_i, we have w_i = 0 implies ∆w_i ≥ 0.
• To paraphrase: for convex problems, as slack variables get small they tend to get large
again. This is an antijamming theorem.
• An example of Wächter and Biegler shows that for nonconvex problems, jamming really
can occur.
• Recent modification:
– if a slack variable gets small and
– its component of the step direction contributes to making a very short step,
– then increase this slack variable to the average size of the “mainstream”
slack variables (see the sketch after this list).
• This modification corrects all examples of jamming that we know about.
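A sketch of that safeguard; the thresholds tiny and short_step are invented for illustration:

    def unjam_slacks(w, dw, alpha, tiny=1e-8, short_step=1e-6):
        # A slack is "jammed" if it is essentially zero and its step component
        # is negative, so it is what forces the very short step.
        jammed = (w < tiny) & (dw < 0)
        mainstream = ~jammed
        if alpha < short_step and jammed.any() and mainstream.any():
            w = w.copy()
            w[jammed] = w[mainstream].mean()   # reset to average mainstream size
        return w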
Modifications for General Problem Formulations
• Bounds, ranges, and free variables are all treated implicitly as described in Linear Programming: Foundations and Extensions (LP:F&E).
• Net result is following reduced KKT system:
" #" # " #
−(H(x, y) + D) AT (x) ∆x Φ1
=
A(x) E ∆y Φ2
• Here, D and E are positive definite diagonal matrices.
• Note that D helps reduce the frequency of diagonal perturbation.
• Choice of barrier parameter µ and initial solution, if none is provided, is described in the
book.
• Stopping rules, matrix reordering heuristics, etc. are as described in LP:F&E.
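A sketch of one solve of this system, taking D and E as vectors of positive diagonal entries and Φ1, Φ2 as given right-hand sides:

    import numpy as np

    def general_reduced_step(H, D, E, A, phi1, phi2):
        n = H.shape[0]
        K = np.block([[-(H + np.diag(D)), A.T       ],
                      [A,                 np.diag(E)]])
        d = np.linalg.solve(K, np.concatenate([phi1, phi2]))
        return d[:n], d[n:]            # ∆x, ∆y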