Polynomial Optimization
Simone Naldi
    Université de Limoges
      Master ACSYON
         2017-2018
   These are the notes for the course “Semialgebraic Optimization” given jointly by
Simone Naldi (Université de Limoges) and Didier Henrion (CNRS LAAS, Toulouse
and Czech University, Prague) for the M2 of the Master ACSYON, Université de
Limoges. In case of errors or typos, please send feedback to Simone Naldi at
                         simone.naldi@unilim.fr
   We closely follow the book [5]. Other good references are [1] (treating different
topics in conic and semialgebraic optimization), the book [4] focusing on the gener-
alized moment problem, the original papers by J-B. Lasserre [3] and the PhD thesis
of P. Parrilo [6]. A more general introduction to convex algebraic geometry is [2].
Contents
Lecture I
   1.1  Introduction
   1.2  Multivariate polynomials
   1.3  Semialgebraic optimization
   1.4  Standard examples
   1.5  Equivalence with positivity
   1.6  Nonnegative polynomials and sums of squares
References
                                      Lecture I
1.1. Introduction
The general goal of the course is to address a special class of optimization problems,
defined with polynomial data. For instance, the objective function and the feasible
set may be defined by multivariate real polynomials such as
    f = 3x^3 y^2 − 5z^2 + yz − √2.
We use the standard notation N, Z, Q, R and C for the sets of natural, integer, rational, real and complex numbers, respectively.
Definition 1 (Optimization Problem). Let f, g_1, ..., g_m : R^n → R be functions. The general Optimization Problem (OP) associated to f, g_1, ..., g_m is

    f* := inf f(x)   s.t. x ∈ S                                                (1)

where S = {x ∈ R^n : g_i(x) ≥ 0, ∀ i = 1, ..., m}. The set S is called the feasible set
and f is called the objective function of problem (1).
   We first remark that other definitions of the feasible set S in the optimization literature may involve different types of constraints, but that our description with inequalities of type ≥ is general enough. Indeed:
    • g(x) ≤ 0 if and only if −g(x) ≥ 0
    • g(x) = 0 if and only if g(x) ≥ 0 and g(x) ≤ 0.
In particular, we will not distinguish between equality and inequality constraints.
We also remark that, in full generality, S can be
    • non-linear, or non-convex with nodal points on the boundary
    • finite or infinite
    • bounded or unbounded
Example 2. Suppose that g_1, ..., g_m are concave functions, that is, for i = 1, ..., m,

    g_i(tx + (1 − t)y) ≥ t g_i(x) + (1 − t) g_i(y)   for all x, y ∈ R^n and all t ∈ [0, 1].

Then S is a convex set: if x, y ∈ S and t ∈ [0, 1], then g_i(tx + (1 − t)y) ≥ t g_i(x) + (1 − t) g_i(y) ≥ 0 for every i. (This will, however, generally not be the case in this course.)
    Typically, in optimization, we distinguish between local and global solutions.
Example 3. Let f = (x + 3)(x + 1)(x − 1)(x − 2) and S = R. The goal is to minimize f over the whole real line. In this case f has one global minimizer between −3 and −1, a local maximizer between −1 and 1, and a local minimizer between 1 and 2. It has no global maximizer.
[Figure: graph of f(x) = (x + 3)(x + 1)(x − 1)(x − 2) on the interval [−4, 3].]
   Local minima and maxima can be computed via first-order conditions, that is, the vanishing of the gradient vector

    ∇f(x) = 0,

for functions f ∈ C^1(S) (the space of continuously differentiable functions). However, one cannot distinguish between minima and maxima by looking at the first derivative alone. Second-order conditions on the Hessian of f,

    H_f(x) ⪰ 0   or   H_f(x) ⪯ 0

(where ⪰, resp. ⪯, stands for positive semidefinite, resp. negative semidefinite, see the Appendix), give local convexity, resp. concavity, of f; hence one can distinguish minima and maxima via second-order conditions (which are sufficient, not necessary).
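   As an illustration, the critical points of the polynomial f of Example 3 can be computed and classified with a few lines of computer algebra (a minimal sketch, assuming the sympy library is available):

    import sympy as sp

    x = sp.Symbol('x')
    f = sp.expand((x + 3) * (x + 1) * (x - 1) * (x - 2))   # the polynomial of Example 3

    df, d2f = sp.diff(f, x), sp.diff(f, x, 2)
    # First-order condition: real roots of f'(x) = 0; the second derivative classifies them.
    for r in sp.real_roots(df):
        r_num, f_num, h_num = float(r), float(f.subs(x, r)), float(d2f.subs(x, r))
        kind = "local minimizer" if h_num > 0 else "local maximizer"
        print(f"x = {r_num:+.4f}   f(x) = {f_num:+.4f}   ({kind})")

The three critical points found this way lie in the intervals (−3, −1), (−1, 1) and (1, 2), in agreement with Example 3.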
   In the special case of semialgebraic optimization, that is, when the data f, g_1, ..., g_m are multivariate real polynomials (or possibly rational functions, a case we will not treat in this course), we will see that one can approximate the value f* as closely as desired (at any fixed precision), and often compute it exactly. This course describes a method to address this special optimization problem.
1.2. Multivariate polynomials
Let n ∈ N. Throughout these notes, we use x to denote the vector of n variables x_1, ..., x_n. For small n we will often use the letters x, y, z, w, ... to denote variables.
   The monomial in x with exponent α is denoted by

    x^α = x_1^{α_1} · · · x_n^{α_n},   for α = (α_1, ..., α_n) ∈ N^n.

For instance, when α = (1, 3, 2) ∈ N^3 one has x^α = x_1 x_2^3 x_3^2. A polynomial is a finite linear combination of monomials, with coefficients in R:

    f = ∑_{α∈I} f_α x^α,   f_α ∈ R,

where I is a finite subset of N^n. The set of all real polynomials in x is denoted by R[x]. The degree of x^α is |α| = α_1 + · · · + α_n. The degree of a polynomial f = ∑_{α∈I} f_α x^α is deg(f) = max_{α∈I} |α|. In particular, deg(x^α) = |α|.
   We denote R[x]_d = {f ∈ R[x] : deg(f) ≤ d}, for d ∈ N. Both R[x] and R[x]_d are real vector spaces. The dimension of R[x] is infinite, while the dimension of R[x]_d equals the number of monomials x^α of degree at most d, that is, the binomial coefficient

    dim R[x]_d = C(n + d, d).
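   This count is easy to check by brute force for small n and d (a small sketch, using only the Python standard library):

    from itertools import product
    from math import comb

    def num_monomials(n, d):
        """Number of exponent vectors a in N^n with |a| = a_1 + ... + a_n <= d."""
        return sum(1 for a in product(range(d + 1), repeat=n) if sum(a) <= d)

    for n in range(1, 5):
        for d in range(5):
            assert num_monomials(n, d) == comb(n + d, d)   # dim R[x]_d = C(n+d, d)
    print("monomial count agrees with the binomial coefficient")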
1.3. Semialgebraic optimization
We define the semialgebraic optimization problem. We start from the feasible set.
   Definition 4 (Semialgebraic set). Let g_1, ..., g_m ∈ R[x]. The semialgebraic set associated to g = (g_1, ..., g_m) is the set S(g) = {x ∈ R^n : g_i(x) ≥ 0, ∀ i = 1, ..., m}.
   Examples of semialgebraic¹ sets are hyperplanes, polyhedra, circles, ellipsoids and spheres (cf. Figure 1.1).
                            Figure 1.1. Some semialgebraic sets
    As remarked above for general functions, equality constraints are allowed, and in this case the feasible set is called a real algebraic set: {x ∈ R^n : g_i(x) = 0, ∀ i = 1, ..., m}.
   Semialgebraic sets are the feasible sets of the semialgebraic optimization problem, which we now define.
   Definition 5 (Semialgebraic Optimization). Let f, g_1, ..., g_m ∈ R[x]. The Semialgebraic Optimization Problem (SAO) associated to f, g_1, ..., g_m is

        f* := inf f(x)   s.t. x ∈ S(g).                                        (2)
1.4. Standard examples
Problem (2) models many important classes of optimization problems:
   1. Linear Programming (LP) is the special case of Problem (2) in which all polynomials have degree 1:

          f* := inf c^T x   s.t. Ax = b, x ≥ 0.                                (3)

      Above, x ≥ 0 means entry-wise nonnegativity (x_i ≥ 0 for all i). The feasible set is a polyhedron (a special semialgebraic set).
   ¹ In real algebraic geometry, the sets defined in Definition 4 are called basic semialgebraic sets, and semialgebraic sets are finite unions of basic semialgebraic sets. In this course, we consider only basic semialgebraic sets, and we avoid the term basic.
   2. Quadratic Programming (QP), involving polynomials of degree 2:

          f* := inf x^T C x + c^T x   s.t. x^T P_j x + q_j^T x + r_j ≥ 0, j = 1, ..., m,      (4)

      where C and the P_j are real symmetric matrices, the q_j are real vectors and r_j ∈ R.
   3. Boolean Programming (BP), where S(g) = {0, 1}^n. This set is semialgebraic since it is defined by the polynomial equalities x_i(x_i − 1) = 0, i = 1, ..., n.
   4. Mixed-Integer Programming (MIP), that is, LP with the additional integrality constraint x ∈ [−N, N]^n ∩ Z^n, which is the solution set of the polynomial equalities

          (x_i + N)(x_i + N − 1) · · · (x_i − N) = 0,   i = 1, ..., n.

      (Both polynomial encodings, in items 3 and 4, are checked in the sketch after this list.)
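   The polynomial encodings used in items 3 and 4 can be verified directly (a small sketch, assuming sympy; N = 2 is an arbitrary choice):

    import sympy as sp

    x = sp.Symbol('x')

    # Item 3: x(x - 1) = 0 has exactly the real solutions {0, 1}.
    assert sp.solveset(x * (x - 1), x, domain=sp.S.Reals) == sp.FiniteSet(0, 1)

    # Item 4 with N = 2: (x + 2)(x + 1)x(x - 1)(x - 2) = 0 has exactly
    # the real solutions {-2, -1, 0, 1, 2}, i.e. the integers in [-2, 2].
    N = 2
    p = sp.Mul(*[x - k for k in range(-N, N + 1)])
    assert sp.solveset(p, x, domain=sp.S.Reals) == sp.FiniteSet(*range(-N, N + 1))

    print("Boolean and integrality encodings verified")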
    Hence (SAO) can be non-linear and non-convex, and its feasible set can be unbounded. It is in general a hard problem, more precisely NP-hard, which means that (unless P = NP) there is no polynomial-time algorithm solving all (SAO) problems.
1.5. Equivalence with positivity
Suppose that the polynomial f in (2) is bounded from below on S(g) (f* > −∞) and that S(g) is non-empty (f* < +∞). This is equivalent to the fact that f* ∈ R. Remark that in this case the polynomial g(x) = f(x) − f* takes only nonnegative values on S(g). The next figure shows the polynomial f of Example 3 together with f − f*.
[Figure: graphs of f(x) and f(x) − f* for the polynomial of Example 3.]
Definition 6 (Positivity). Let f, g_1, ..., g_m ∈ R[x]. We say that f is nonnegative on S := S(g) if f(x) ≥ 0 for all x ∈ S. We say that f is positive (or strictly positive) on S if f(x) > 0 for all x ∈ S. The set of polynomials nonnegative on S is denoted by P(S), and for d ∈ N, P_d(S) = P(S) ∩ R[x]_d.
   For instance, let S = {(x, y) ∈ R^2 : x^2 + y^2 ≤ 1} be the disk of radius 1 centered at the origin. Then the polynomial f = x^2(1 − x^2 − y^2) is nonnegative on S. Indeed, it is the product of the “clearly globally nonnegative” polynomial x^2 with a polynomial which is “clearly nonnegative over S” by the very definition of S (namely 1 − x^2 − y^2).
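   A quick numerical sanity check of this claim (a rough sketch, assuming numpy; random sampling is of course not a proof):

    import numpy as np

    rng = np.random.default_rng(0)
    # Sample points of the unit disk S = {(x, y) : x^2 + y^2 <= 1} by rejection.
    pts = rng.uniform(-1.0, 1.0, size=(100_000, 2))
    pts = pts[(pts ** 2).sum(axis=1) <= 1.0]

    x, y = pts[:, 0], pts[:, 1]
    f = x ** 2 * (1.0 - x ** 2 - y ** 2)
    # Nonnegative on S, up to floating-point round-off.
    assert f.min() >= -1e-12
    print("smallest sampled value of f on S:", f.min())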
Proposition 7 (Equivalence). Let f, g_1, ..., g_m ∈ R[x]_d. Then

    inf { f(x) : x ∈ S(g) }  =  sup { λ : f − λ ∈ P_d(S(g)) }.                 (5)
Proof. Let f* denote the left-hand side of the equality and s* the right-hand side. If f* = −∞, then f is unbounded from below on S, hence {λ : f − λ ∈ P_d(S)} = ∅, which implies that s* = −∞ = f*. Analogously, if f* = +∞ then S = ∅, every polynomial is nonnegative on S, and hence s* = +∞.
   Suppose now that f* ∈ R. Then f − f* is nonnegative on S, which gives f* ≤ s*. Conversely, let λ be such that f − λ is nonnegative on S. Then f(x) ≥ λ for all x ∈ S, hence f* ≥ λ, and passing to the supremum one gets s* ≤ f*. Therefore f* = s*. □
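   For the unconstrained problem of Example 3 (S = R), the equivalence (5) can be illustrated numerically (a rough sketch with numpy; a grid search is not a certified method, but it shows that the largest λ with f − λ nonnegative is precisely the infimum of f):

    import numpy as np

    # Example 3: f has its minimizers inside [-4, 3], so a fine grid on that
    # interval approximates f* = inf f over R.
    xs = np.linspace(-4.0, 3.0, 700_001)
    f = (xs + 3) * (xs + 1) * (xs - 1) * (xs - 2)
    f_star = f.min()

    # f - lambda is nonnegative on the grid exactly when lambda <= f*, as in (5).
    assert (f - f_star >= 0).all()
    assert (f - (f_star + 1e-3) < 0).any()
    print("approximate value of f*:", f_star)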
  Hence we have the equivalence between polynomial positivity on S and polyno-
mial optimization. We will typically distinguish two cases:
   • S(g) = R^n (global case of SAO)
   • S(g) ⊊ R^n (local case of SAO)
    In the local case, compactness or almost equivalent algebraic assumptions can
be made in order to obtain characterizations of positivity (classically known as Pos-
itivstellensätze).
Proposition 8. If f ∈ P(R^n), then its degree is even.
Proof. Suppose that f has odd degree d, and write f = f_d + g, where f_d ≠ 0 is the homogeneous part of f of degree d (in each of its monomials, of the form c x_j^{2k+1} ∏_{i≠j} x_i^{k_i}, some variable appears with an odd power) and deg(g) < d. Choose v ∈ R^n with f_d(v) ≠ 0. Then the univariate polynomial t ↦ f(tv) = f_d(v) t^d + (terms of lower degree in t) is non-zero and of odd degree. Hence it must change sign, and the same holds for f. □
Definition 9. A polynomial f is a sum of squares (SOS for short) if f = f_1^2 + · · · + f_m^2 for some polynomials f_i. The set of sum-of-squares polynomials in n variables is denoted by Σ_n, and its degree-d part by Σ_{n,d} = Σ_n ∩ R[x]_d.
   Two of the exercises ask you to prove that the cone of nonnegative polynomials, the cone of sums of squares and their degree-d parts are closed convex cones with non-empty interiors.
1.6. Nonnegative polynomials and sums of squares
Every sum-of-squares polynomial is nonnegative by construction. What about the
converse? Given a nonnegative polynomial, is it a sum of squares of polynomials?
If yes, how can such a decomposition be computed?
   Hilbert solved the first question in a celebrated result of 1888, inspired by a
question posed by Minkowski in his doctoral dissertation. In this section we will
partially prove Hilbert’s theorem. We start with the easiest case, that of univariate polynomials.
Proposition 10. A univariate polynomial is globally nonnegative if and only if it is a sum of squares. That is, P(R) = Σ_1.
Proof. Since f has real coefficients, its non-real roots come in conjugate pairs (exercise). Moreover, by nonnegativity the real roots of f have even multiplicity (2, 4, 6, ...). If C denotes a set of representatives of the conjugate pairs of non-real roots of f (counted with multiplicity) and R the set of its real roots (counted with half their multiplicity), then

    f = c · ∏_{α∈C} (x − α)(x − ᾱ) · ∏_{β∈R} (x − β)^2,

where c ≥ 0 is the leading coefficient. Then f = h h̄ g^2 by properly defining polynomials h ∈ C[x] and g ∈ R[x]. Hence we deduce f = (ℜ(h)^2 + ℑ(h)^2) g^2 = (g ℜ(h))^2 + (g ℑ(h))^2, where ℜ and ℑ denote the real and imaginary parts. □
   Remark that the proof of Proposition 10 shows that two squares always suffice to represent a nonnegative univariate polynomial.
Example 11. The polynomial f = (x^2 + 1)(x − 2)^2 is nonnegative, and indeed its unique real root has multiplicity 2. Factorizing f = (x − i)(x + i)(x − 2)^2 yields the polynomials h and g of the proof of Proposition 10, for instance h = x + i and g = x − 2, and the corresponding sum of squares decomposition:

    f = (x(x − 2))^2 + (x − 2)^2.
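   The decomposition can be verified symbolically in one line (assuming sympy):

    import sympy as sp

    x = sp.Symbol('x')
    # Example 11: the two squares expand back to (x^2 + 1)(x - 2)^2.
    assert sp.expand((x * (x - 2)) ** 2 + (x - 2) ** 2 - (x ** 2 + 1) * (x - 2) ** 2) == 0
    print("decomposition of Example 11 verified")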
   We consider a second class of multivariate polynomials, those of degree 2. A quadratic form f ∈ R[x]_2 is identified with its defining symmetric matrix M, for which

    f = x^T M x,   x = (x_1, ..., x_n) ∈ R^n.
Proposition 12. A quadratic form f ∈ R[x]_2 is nonnegative if and only if it is a sum of squares (of linear forms). That is, P_2(R^n) = Σ_{n,2}.
Proof. Suppose that f ∈ P_2(R^n) is a quadratic form. Then f = x^T M x, where M is a positive semidefinite symmetric matrix. By the Spectral Theorem there exist an orthogonal matrix P and a diagonal matrix D (with nonnegative diagonal entries d_i) such that M = P^T D P. Hence one gets

    f = x^T M x = (Px)^T D (Px) = ∑_i d_i (ℓ_i(x))^2 = ∑_i (√d_i ℓ_i(x))^2,

where the ℓ_i(x) are the entries of the vector Px. □
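   The proof is constructive and can be mirrored numerically (a sketch, assuming numpy; the matrix M below is an arbitrary example, made positive semidefinite by construction):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    A = rng.standard_normal((n, n))
    M = A.T @ A                      # a positive semidefinite symmetric matrix

    # Spectral theorem: M = P^T D P with P orthogonal and D = diag(d_i), d_i >= 0.
    d, U = np.linalg.eigh(M)         # columns of U are orthonormal eigenvectors
    P = U.T
    d = np.clip(d, 0.0, None)        # clear tiny negative round-off

    x = rng.standard_normal(n)
    ell = P @ x                      # the linear forms l_i(x), i.e. the entries of Px
    assert np.isclose(x @ M @ x, np.sum(d * ell ** 2))
    print("x^T M x = sum_i (sqrt(d_i) l_i(x))^2 verified at a random point")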
    We finally state (without complete proof) Hilbert’s 1888 theorem.
Theorem 13 (Hilbert 1888). The equality P_{2d}(R^n) = Σ_{n,2d} holds if and only if one of the following holds:
    • n = 1 (univariate polynomials)
    • d = 1 (quadratic forms)
    • n = 2 and d = 2 (plane quartics)
   The original proof by Hilbert was not totally constructive. Nevertheless, he succeeded in proving that in all the remaining cases (n ≥ 2 and d ≥ 3, or n ≥ 3 and d ≥ 2) there exist nonnegative polynomials that are not sums of squares. The first explicit examples were given only decades later, by Motzkin and by Robinson:

    Motzkin polynomial:     M(x, y) = x^4 y^2 + x^2 y^4 + 1 − 3x^2 y^2
    Robinson polynomial:    R(x, y, z) = y^2(1 − x^2)^2 + x^2(1 − y^2)^2 + x^2 y^2(x^2 + y^2 − 2)^2
   We often call a sum of squares decomposition of a nonnegative polynomial an algebraic certificate of positivity, or simply a certificate. We will also give certificates for local positivity, that is, for polynomials that take nonnegative values when restricted to a given semialgebraic set.
   While not every nonnegative polynomial is a sum of squares of polynomials, by generalizing the certificate to rational functions (hence embedding R[x] in R(x)) one gets the following powerful result.
       Theorem 14 (Artin 1927). Every nonnegative polynomial is a sum of squares of
       rational functions.
          For instance, the Motzkin polynomial is not SOS but has the following Artin
       representation (as sum of squares of rational functions):
    M(x, y) = ( xy(1 − x^2) / (x^2 + y^2) )^2 + ( y^2(1 − x^2) / (x^2 + y^2) )^2 + ( x^2(1 − y^2) / (x^2 + y^2) )^2
             + ( xy(1 − y^2) / (x^2 + y^2) )^2 + ( x^2 y(x^2 + y^2 − 2) / (x^2 + y^2) )^2 + ( xy^2(x^2 + y^2 − 2) / (x^2 + y^2) )^2.
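   Clearing the common denominator (x^2 + y^2)^2, this identity says that (x^2 + y^2)^2 · M(x, y) equals the sum of the squares of the six numerators, which is easy to verify symbolically (a short sketch, assuming sympy):

    import sympy as sp

    x, y = sp.symbols('x y')
    M = x**4 * y**2 + x**2 * y**4 + 1 - 3 * x**2 * y**2          # Motzkin polynomial

    numerators = [
        x * y * (1 - x**2),
        y**2 * (1 - x**2),
        x**2 * (1 - y**2),
        x * y * (1 - y**2),
        x**2 * y * (x**2 + y**2 - 2),
        x * y**2 * (x**2 + y**2 - 2),
    ]
    assert sp.expand(sum(p**2 for p in numerators) - (x**2 + y**2)**2 * M) == 0
    print("Artin representation of the Motzkin polynomial verified")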
References
[1] Anjos, M. F., and Lasserre, J.-B. Handbook on Semidefinite, Conic and Polynomial Optimization. International Series in Operations Research & Management Science, vol. 166. Springer US, 2012.
[2] Blekherman, G., Parrilo, P., and Thomas, R. Semidefinite Optimization and Convex Algebraic Geometry, vol. 13. SIAM, 2013.
[3] Lasserre, J.-B. Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11, 3 (2001), 796–817.
[4] Lasserre, J.-B., Ed. Moments, Positive Polynomials and Their Applications, vol. 1 of Optimization Series. Imperial College Press, London, 2010.
[5] Lasserre, J.-B. An Introduction to Polynomial and Semi-Algebraic Optimization, vol. 52. Cambridge University Press, 2015.
[6] Parrilo, P. A. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.