
Chapter IX: The Lasserre Hierarchy for Polynomial and Combinatorial Optimization
The purpose of this chapter is to give an introduction on the topic of polynomial optimization via
semidefinite programming and sums of squares relaxations. This material is partly based on lecture notes
and review papers from H. Fawzi [1], M. Laurent [2], M. Mevissen [4], T. Rothvoß [6] and Y. de Castro [5],
as well as the book of J.-B. Lasserre [3].
In this chapter, we will study a polynomial optimization problem of the form

    minimize_{x∈R^n}   p(x)                                         (P)
    s.t.   g_i(x) ≥ 0,   ∀i ∈ [m],

where p, g_1, ..., g_m ∈ R[x_1, ..., x_n] are polynomials. Clearly, Problem (P) is NP-hard in general, as binary
variables x_i ∈ {0, 1} can be encoded by introducing the constraints x_i^2 = x_i (which are equivalent to the
pair of polynomial inequalities x_i^2 − x_i ≥ 0 and x_i − x_i^2 ≥ 0).
Another reduction shows that it is NP-hard to minimize a quartic polynomial (i.e., of degree 4) over
R^n. In the previous lecture, we have seen that it is NP-hard to separate the copositive cone C_n
(as otherwise we could optimize efficiently over C_n, and solve e.g. the maximum stable set problem in
polytime). So, given a symmetric matrix Q ∈ S^n, it is NP-hard to decide whether Q ∈ C_n, or to output a
separating hyperplane H such that ⟨H, Q⟩ < 0 and ⟨H, M⟩ ≥ 0, ∀M ∈ C_n, i.e., such that H ∈ C_n^* is completely positive.
Now, define the quartic polynomial p(x) = (x ◦ x)^T Q (x ◦ x) = Σ_{i,j} Q_{ij} x_i^2 x_j^2. Clearly, inf_{x∈R^n} p(x) = 0 iff
y^T Q y ≥ 0 for all y ≥ 0, that is, iff Q ∈ C_n. On the other hand, any x such that p(x) < 0 yields a separating
hyperplane H = (x ◦ x)(x ◦ x)^T ∈ C_n^*.

1 Nonnegative Polynomials of one variable

Definition 1 (Nonnegative Polynomial). We say that a polynomial p ∈ R[x] is nonnegative if

    p(x) ≥ 0,   ∀x ∈ R.

The set of nonnegative polynomials of degree ≤ d can be identified with the cone

    P_d^+ := {p ∈ R^{d+1} : Σ_{i=0}^d p_i x^i ≥ 0, ∀x ∈ R} ⊂ R^{d+1}.

Note that the problem of minimizing a polynomial p ∈ R_d[x] over R can be written as a conic program over
P_d^+:

    inf_{x∈R} p(x) = sup {γ ∈ R : p − γ is nonnegative} = sup_{γ∈R}  γ
                                                          s.t.  p − γe_0 ⪰_{P_d^+} 0.


Definition 2 (Sum of squares). We say that a polynomial p ∈ R[x] is a sum of squares if there exist
polynomials p_1, ..., p_m ∈ R[x] such that p = Σ_{i=1}^m p_i^2. We denote by P_d^{SOS} ⊂ R^{d+1} the set of (vectors
of coefficients of) sum of squares polynomials of degree at most d.
From the definition, it is clear that P_d^{SOS} ⊆ P_d^+. The proof of the next proposition is left to the reader.

Proposition 1. For all d ∈ N, the cones P_{2d}^{SOS} and P_{2d}^+ are proper.

Note that a polynomial of odd degree cannot be nonnegative on R, so for odd d it holds P_d^+ = P_{d−1}^+
and P_d^{SOS} = P_{d−1}^{SOS}. We can therefore restrict our attention to polynomials of even degree. In fact,
in the case of univariate polynomials, equality holds between P_{2d}^{SOS} and P_{2d}^+:

Theorem 2. All nonnegative polynomials of one variable can be written as the sum of two squares.
Hence, it holds:

    P_{2d}^+ = P_{2d}^{SOS}.

Proof. Let a_1, ..., a_{2d} ∈ C be the (complex-valued) roots of p ∈ R_{2d}[x], counted with multiplicity, so that
p(x) = p_{2d} ∏_{i=1}^{2d} (x − a_i). Since p has real-valued coefficients, p(z̄) is the complex conjugate of p(z) for
every z ∈ C, hence z is a root of p iff z̄ is a root. Also, if x ∈ R is a real root, then it must have even
multiplicity, because p is nonnegative on the whole real line. Hence, after reindexing the roots, we can write

    p(x) = p_{2d} ∏_{i=1}^d (x − a_i)(x − ā_i).

Now, we recognize that for real x this expression equals |q(x)|^2, where q(x) := √(p_{2d}) ∏_{i=1}^d (x − a_i)
(note that p_{2d} > 0, since p is nonnegative of degree 2d). Finally, we have p(x) = p_1(x)^2 + p_2(x)^2, where
the polynomials p_1 and p_2 correspond to the real and imaginary parts of q, respectively.

While checking whether a polynomial is nonnegative basically amounts to solving a polynomial opti-
mization problem (or, as in the above proof, to computing all its complex roots), we can easily check whether
a given polynomial is a sum of squares, by solving a linear matrix inequality:

Theorem 3. The polynomial p(x) = Σ_{k=0}^{2d} p_k x^k is a sum of squares if and only if there exists a matrix
M ⪰ 0 such that the sum of the k-th antidiagonal of M equals p_k, for each k = 0, ..., 2d:

    s_k(M) := Σ_{0≤i,j≤d: i+j=k} M_{ij} = p_k,   ∀k ∈ {0, ..., 2d}.

Proof. Let x ∈ R and denote by v(x) = [1, x, x^2, ..., x^d]^T ∈ R^{d+1} the vector of the first (d + 1) powers of x. Direct
calculation shows that

    v(x)^T M v(x) = Σ_{0≤i,j≤d} x^i M_{ij} x^j = Σ_{k=0}^{2d} Σ_{i,j: i+j=k} M_{ij} x^k = Σ_{k=0}^{2d} s_k(M) x^k.

Hence, it holds p(x) = v(x)^T M v(x) iff the matrix M satisfies s_k(M) = p_k, for all k ∈ {0, ..., 2d}.
Now, assume that p(x) = v(x)^T M v(x) for some positive semidefinite matrix M. This means that M = P^T P for
some matrix P ∈ R^{m×(d+1)}, and p(x) = v(x)^T P^T P v(x) = ‖P v(x)‖^2 = Σ_{i=1}^m (p_i^T v(x))^2, where p_i^T is the i-th row of
P. We have thus shown that p is a sum of squares: p(x) = Σ_{i=1}^m p_i(x)^2, where p_i(x) := p_i^T v(x) is a polynomial of
degree ≤ d.
Conversely, if p is a sum of squares, then we have p(x) = ‖P v(x)‖^2 for some matrix P, that is, p(x) =
v(x)^T P^T P v(x). So, we obtain the result of the theorem by setting M = P^T P ⪰ 0.


By combining the results of Theorems 2 and 3, we see that polynomial minimization problems over R
can be formulated as an SDP.
Example:
We formulate the problem of minimizing the polynomial p(x) = x^6 − 17x^4 + 2x^3 − 2x + 1 over R as an
SDP. By Theorem 2, this problem is equivalent to solving sup {γ ∈ R : p − γe_0 ∈ P_6^{SOS}}, where p =
[1, −2, 0, 2, −17, 0, 1]^T. Then, we can use Theorem 3 to obtain the following SDP formulation:

    maximize_{γ∈R, M∈S^4}   γ
    s.t.   M_{00} = 1 − γ
           M_{10} + M_{01} = −2
           M_{20} + M_{11} + M_{02} = 0
           M_{30} + M_{21} + M_{12} + M_{03} = 2
           M_{31} + M_{22} + M_{13} = −17
           M_{32} + M_{23} = 0
           M_{33} = 1
           M ⪰ 0.
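For concreteness, here is a minimal numerical sketch of this SDP in Python; it assumes the CVXPY modeling package and an installed SDP solver, neither of which is prescribed by the lecture notes:

```python
import cvxpy as cp

# Coefficients p_0, ..., p_6 of p(x) = x^6 - 17x^4 + 2x^3 - 2x + 1.
p = [1, -2, 0, 2, -17, 0, 1]
d = 3  # the Gram matrix M is indexed by the monomials 1, x, x^2, x^3

M = cp.Variable((d + 1, d + 1), symmetric=True)
gamma = cp.Variable()

constraints = [M >> 0]
for k in range(2 * d + 1):
    # s_k(M): sum of the k-th antidiagonal of M (Theorem 3)
    s_k = sum(M[i, k - i] for i in range(d + 1) if 0 <= k - i <= d)
    constraints.append(s_k == (p[k] - gamma if k == 0 else p[k]))

cp.Problem(cp.Maximize(gamma), constraints).solve()
print(gamma.value)  # lower bound on inf_x p(x); tight here by Theorem 2
```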

2 Multivariate Polynomials
A polynomial p ∈ R_d[x_1, ..., x_n] can be written compactly as

    p(x) = Σ_{α∈∆(n,d)} p_α x^α,

where ∆(n, d) is the set of nonnegative integer vectors with sum ≤ d, ∆(n, d) := {α ∈ Z_{≥0}^n : Σ_{i=1}^n α_i ≤ d},
and x^α is a compact notation for

    x^α := x_1^{α_1} x_2^{α_2} ··· x_n^{α_n}.

We say that x^α is a monomial of total degree |α| := Σ_{i=1}^n α_i.

The cone P_{n,d}^+ of n-variate nonnegative polynomials of total degree ≤ d can be defined as in the univariate
case, as well as the cone P_{n,d}^{SOS} of n-variate sums of squares of degree ≤ d. Note that the dimension of these
cones is s(n, d) := |∆(n, d)| = (n+d choose d). As in the univariate case, it is also easy to see that a polynomial can
only be nonnegative if its degree is even, so we will often write P_{n,2d}^+.
Unlike in the univariate case, P_{n,d}^+ ≠ P_{n,d}^{SOS} in general. A famous counter-example is the Motzkin polynomial:

Proposition 4. The polynomial p(x, y) = x^4 y^2 + x^2 y^4 + 1 − 3x^2 y^2 ∈ R_6[x, y] is nonnegative, but is not
a sum of squares.

Proof. Nonnegativity follows from the arithmetic-geometric mean inequality applied to x^4 y^2, x^2 y^4 and 1:

    (1/3)(x^4 y^2 + x^2 y^4 + 1) ≥ (x^4 y^2 · x^2 y^4 · 1)^{1/3} = (x^6 y^6)^{1/3} = x^2 y^2.

A certificate for p ∉ P_{2,6}^{SOS} will be given after Theorem 6.

In fact, the study of nonnegative polynomials and sums of squares dates back to Hilbert, who characterized
the cases in which equality holds:


Theorem 5 (Hilbert).

    (P_{n,d}^+ = P_{n,d}^{SOS})  ⟺  (n = 1) or (d = 2) or ((n, d) = (2, 4)).

However, the inclusion P_{n,d}^{SOS} ⊆ P_{n,d}^+ always holds, so we obtain relaxations of polynomial
optimization problems by optimizing over P_{n,d}^{SOS} instead of P_{n,d}^+. The next theorem shows that this can be
done by semidefinite programming:

Theorem 6. The polynomial p ∈ R_{2d}[x_1, ..., x_n], where p(x) = Σ_{α∈∆(n,2d)} p_α x^α, is a sum of squares
if and only if there exists a matrix M ∈ S^{s(n,d)} (indexed by multi-indices α, β ∈ ∆(n, d)) such that M ⪰ 0
and

    Σ_{α,β∈∆(n,d): α+β=γ} M_{α,β} = p_γ,   ∀γ ∈ ∆(n, 2d).                    (1)

Proof. The proof is completely similar to that of Theorem 3: Take any x ∈ R^n and introduce the vector v(x) =
[x^α]_{α∈∆(n,d)} ∈ R^{s(n,d)}, which contains all monomials of degree ≤ d. Direct calculation shows that v(x)^T M v(x) =
p(x) if and only if the equality conditions (1) are satisfied.
Then, when (1) holds, we have

    M ⪰ 0 ⟺ M = P^T P for some P ⟺ p(x) = v(x)^T P^T P v(x) = ‖P v(x)‖^2 ⟺ p is a sum of squares.

Example:
We sketch how to use the above theorem to establish a certificate that the Motzkin polynomial

    p(x_1, x_2) = x_1^4 x_2^2 + x_1^2 x_2^4 + 1 − 3x_1^2 x_2^2

of Proposition 4 is not a sum of squares. In that case, we have 2d = 6 and n = 2, so the matrix M of
Theorem 6 is of size s(2, 3) = 10. The Motzkin polynomial is a sum of squares iff ∃M ∈ S_+^{10} such that

    M_{00,00} = 1                                              [p_{00} = 1]
    2M_{11,00} + 2M_{10,01} = 0                                [p_{11} = 0]
    2M_{21,01} + 2M_{20,02} + 2M_{12,10} + M_{11,11} = −3      [p_{22} = −3]
    2M_{21,03} + M_{12,12} = 1                                 [p_{24} = 1]
    (...) there are s(2, 6) = 28 such constraints, one for each p_γ (...)

This is, in fact, an SDP feasibility problem. Let P_γ denote the matrix such that the above constraints read
⟨M, P_γ⟩ = p_γ, for all γ ∈ ∆(2, 6). To show that this problem is infeasible, we apply the Farkas lemma for
cones (cf. Chapter 2):

    p ∉ P_{2,6}^{SOS} ⟺ ∃y : Σ_γ y_γ P_γ ⪰ 0 and ⟨p, y⟩ < 0.

Such a certificate can be obtained by using an SDP solver. For example, one can check numerically that the
vector y defined by

    y_{00} = 1, y_{20} = y_{02} = 1.1660, y_{40} = y_{04} = 1.8484, y_{60} = y_{06} = 9.5289,
    y_{24} = y_{42} = 0.8523, y_{22} = 0.9348,

and y_γ = 0 for all the other multi-indices γ ∈ ∆(2, 6), is a valid certificate of infeasibility. Hence, the Motzkin
polynomial is not a sum of squares.
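The certificate is straightforward to verify numerically: the matrix Σ_γ y_γ P_γ has entries y_{α+β} for α, β ∈ ∆(2, 3), so it suffices to check that this 10×10 matrix is positive semidefinite and that ⟨p, y⟩ < 0. A short sketch in Python (NumPy assumed):

```python
import itertools
import numpy as np

# Index set Delta(2,3): the 10 monomials of degree <= 3 in two variables.
basis = [a for a in itertools.product(range(4), repeat=2) if sum(a) <= 3]

# Dual certificate from the text; all unlisted y_gamma are zero.
y = {(0, 0): 1.0, (2, 0): 1.1660, (0, 2): 1.1660, (4, 0): 1.8484,
     (0, 4): 1.8484, (6, 0): 9.5289, (0, 6): 9.5289,
     (2, 4): 0.8523, (4, 2): 0.8523, (2, 2): 0.9348}

# (sum_gamma y_gamma P_gamma)[alpha, beta] = y_{alpha + beta}.
M = np.array([[y.get((a[0] + b[0], a[1] + b[1]), 0.0) for b in basis]
              for a in basis])

# Motzkin coefficients: p_00 = 1, p_42 = p_24 = 1, p_22 = -3.
p = {(0, 0): 1.0, (4, 2): 1.0, (2, 4): 1.0, (2, 2): -3.0}

print(min(np.linalg.eigvalsh(M)))   # nonnegative, up to numerical tolerance
print(sum(p[g] * y[g] for g in p))  # ~ -0.0998 < 0: infeasibility certified
```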

Page 4 of 14
G. Sagnol Convex Optimization: Chapter IX - Lasserre Hierarchy WS 2019, TU Berlin

In fact, one can show that, while the Motzkin polynomial p(x, y) is not a sum of squares, multiplying
this polynomial by (1 + x^2 + y^2) yields a sum of squares. Indeed, one can verify that

    (1 + x^2 + y^2)(x^4 y^2 + x^2 y^4 + 1 − 3x^2 y^2)
        = y^2 (1 − x^2)^2 + x^2 (1 − y^2)^2 + (x^2 y^2 − 1)^2 + (3/4) x^2 y^2 (x^2 + y^2 − 2)^2 + (1/4) x^2 y^2 (x^2 − y^2)^2.
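This identity can be checked by expanding both sides with a computer algebra system; for instance, the following short sketch uses SymPy (an assumed tool, not part of the lecture):

```python
import sympy as sp

x, y = sp.symbols('x y')
motzkin = x**4 * y**2 + x**2 * y**4 + 1 - 3 * x**2 * y**2

lhs = sp.expand((1 + x**2 + y**2) * motzkin)
rhs = sp.expand(y**2 * (1 - x**2)**2 + x**2 * (1 - y**2)**2
                + (x**2 * y**2 - 1)**2
                + sp.Rational(3, 4) * x**2 * y**2 * (x**2 + y**2 - 2)**2
                + sp.Rational(1, 4) * x**2 * y**2 * (x**2 - y**2)**2)

print(sp.simplify(lhs - rhs) == 0)  # True: the SOS identity holds
```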
For the purpose of polynomial optimization over R^n, this motivates the study of the following hierarchy
of semidefinite programming problems:

    v_r^* := sup {γ ∈ R : (1 + x_1^2 + ... + x_n^2)^r (p(x) − γ) is a sum of squares}.

Proposition 7. We have:

    v_0^* ≤ v_1^* ≤ v_2^* ≤ ... ≤ inf_{x∈R^n} p(x).

Proof. The inequality v_r^* ≤ v_{r+1}^* follows from the fact that the product of two sums of squares is a sum of squares.
So, p ∈ P_{n,d}^{SOS} ⟹ q ∈ P_{n,d+2}^{SOS}, where q(x) = (1 + x_1^2 + ... + x_n^2) · p(x). The inequality v_r^* ≤ inf_x p(x) follows from
the implication

    (1 + x_1^2 + ... + x_n^2)^r (p(x) − γ) is a sum of squares ⟹ p(x) ≥ γ, ∀x ∈ R^n.

Under some mild conditions, it can be shown that this hierarchy converges. In the next section, we
are going to study the dual cones of P_{n,2d}^+ and P_{n,2d}^{SOS}, which will lead to another hierarchy –the Lasserre
hierarchy– for the general polynomial problem (P) presented in the introduction.

3 The moment problem


Denote by M^+(R^n) the set of all nonnegative measures over R^n. Given a nonnegative measure µ ∈ M^+(R^n),
define its (infinite) sequence of moments (y_α)_{α∈Z_{≥0}^n} by

    y_α := ∫_{R^n} x^α µ(dx),   ∀α ∈ Z_{≥0}^n.

For example, if µ is the probability measure corresponding to a random vector Y ∈ R^n, then the y_α's
correspond to the raw moments of Y: we have y_0 = ∫ µ(dx) = 1, and for all i it holds

    y_{e_i} = ∫ x_i µ(dx) = E[Y_i],     y_{2e_i} = ∫ x_i^2 µ(dx) = E[Y_i^2] = V[Y_i] + (E[Y_i])^2.

More generally, y_α = E[Y_1^{α_1} ··· Y_n^{α_n}].


Conversely, given a vector y ∈ Rs(n,d) , we may ask ourselves whether y has a representing measure, i.e.,
if there exists a measure µ ∈ M+ (Rn ) whose moments of total degree ≤ d correspond to the coordinates
of y. This is the moment problem. Those vectors y with a representing measure are in the moment cone
 Z  
M+ d (R n
) := x α
µ(dx) : µ ∈ M +
(R n
) .
Rn α∈∆(n,d)

+
As it turns out, we now show that Pn,2d and M+ n
2d (R ) are dual from each other.


Proposition 8.

    (P_{n,2d}^+)^* = M_{2d}^+(R^n).

Proof. Let p ∈ P_{n,2d}^+ and y ∈ M_{2d}^+(R^n). Then, y has a representing measure µ ∈ M^+(R^n), so

    ⟨p, y⟩ = Σ_{α∈∆(n,2d)} p_α y_α = Σ_{α∈∆(n,2d)} p_α ∫_{R^n} x^α µ(dx) = ∫_{R^n} (Σ_{α∈∆(n,2d)} p_α x^α) µ(dx) = ∫_{R^n} p(x) µ(dx) ≥ 0,

where the last inequality holds because p(x) ≥ 0 for all x. This already proves P_{n,2d}^+ ⊆ (M_{2d}^+(R^n))^*. Conversely,
assume p ∉ P_{n,2d}^+, that is, ∃z ∈ R^n : p(z) < 0. Let y be the vector of moments of degree ≤ 2d corresponding to the
Dirac measure µ = δ_z at z. Then, y_α = ∫_{R^n} x^α δ_z(dx) = z^α, so it holds ⟨p, y⟩ = Σ_α p_α z^α = p(z) < 0, hence
p ∉ (M_{2d}^+(R^n))^*. This shows (M_{2d}^+(R^n))^* ⊆ P_{n,2d}^+. Finally, the statement of the proposition follows because the
considered cones are proper, so they are equal to their bi-duals.

As for the case of nonnegative polynomials, there is no simple condition which ensures that y ∈ M_{2d}^+(R^n).
However, there is a simple linear matrix inequality that is satisfied by all y ∈ M_{2d}^+(R^n). Note that the
situation is reversed compared to the case of polynomials: while an LMI gave a sufficient condition for a
polynomial to be nonnegative (namely, being an SOS), for the case of moments we obtain an LMI that is a
necessary condition for y to have a representing measure. Given a (possibly infinite) sequence (y_α) containing
all elements indexed by some α of total degree |α| := Σ_i α_i ≤ 2r, denote by M_r(y) ∈ S^{s(n,r)} the moment matrix
with elements

    (M_r(y))_{α,β} = y_{α+β},   ∀α, β ∈ ∆(n, r).

It can be seen that M_r(y) = Σ_{γ∈∆(n,2r)} y_γ P_γ, where P_γ ∈ S^{s(n,r)} is the matrix with a 1 at each coordinate
(α, β) such that α + β = γ, and zeros elsewhere.

Proposition 9. Let y ∈ M_{2d}^+(R^n), and let r ∈ N be such that r ≤ d. Then, M_r(y) ⪰ 0.

Proof. As M_{2d}^+(R^n) and {y : M_r(y) ⪰ 0} are cones, we can rescale y and assume w.l.o.g. that y_0 = 1. Let µ be a
representing probability measure for y, and let X be a random vector with distribution µ. For an arbitrary coefficient
vector c ∈ R^{s(n,r)}, define the polynomial c(x) = Σ_{|α|≤r} c_α x^α. Then, we have

    0 ≤ E[c(X)^2] = E[(Σ_α c_α X^α)(Σ_β c_β X^β)] = Σ_{α,β} c_α c_β E[X^{α+β}] = Σ_{α,β} c_α c_β (M_r(y))_{α,β} = c^T M_r(y) c,

where we used E[X^{α+β}] = y_{α+β}. The above inequality holds for all vectors c, which proves M_r(y) ⪰ 0.

Now, let us define

    M_{2d}^{SDP}(R^n) := {y ∈ R^{s(n,2d)} : M_r(y) ⪰ 0, ∀r ≤ d} = {y ∈ R^{s(n,2d)} : M_d(y) ⪰ 0},

where the equality follows from the observation that M_r(y) is a principal submatrix of M_{r+1}(y). Then, the
above proposition simply rewrites as M_{2d}^+(R^n) ⊆ M_{2d}^{SDP}(R^n).
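In the univariate case, M_d(y) is simply the Hankel matrix of the sequence y, and the inclusion M_{2d}^+(R) ⊆ M_{2d}^{SDP}(R) is easy to observe numerically. A small sketch in Python with made-up data (the atoms a, b are arbitrary):

```python
import numpy as np

# Moments y_k = (a^k + b^k)/2 of the two-atom measure (delta_a + delta_b)/2.
a, b, d = -1.0, 2.0, 3
y = [(a**k + b**k) / 2 for k in range(2 * d + 1)]

# Univariate moment matrix M_d(y): the Hankel matrix (y_{i+j})_{0<=i,j<=d}.
M = np.array([[y[i + j] for j in range(d + 1)] for i in range(d + 1)])

# PSD as guaranteed by Proposition 9 (here of rank 2, one per atom).
print(np.linalg.eigvalsh(M))  # all eigenvalues >= 0 up to round-off
```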

We also leave the following result as an exercise:

Proposition 10. (P_{n,2d}^{SOS})^* = M_{2d}^{SDP}(R^n).


4 Polynomial optimization: The point of view of moments


We return to the polynomial optimization problem given in the introduction of this chapter:

    p^* = inf_{x∈R^n}   p(x)                                        (P)
          s.t.   g_i(x) ≥ 0,   ∀i ∈ [m].

Problem (P) asks to minimize the polynomial p(x) over K := {x ∈ R^n : g_i(x) ≥ 0, ∀i ∈ [m]}. We say
that K, which is defined by polynomial inequalities, is a semi-algebraic set. It is possible to reformulate (P)
as a moment problem over K:

    p^* = inf_{µ∈M^+(K)}   ∫_K p(x) µ(dx)                           (2)
          s.t.   µ(K) = 1.

The decision variable is a probability measure µ supported on K: we have µ ∈ M^+(K), where

    M^+(K) := {µ ∈ M^+(R^n) : µ(R^n \ K) = 0}.

We also define M_d^+(K) := { (∫_{R^n} x^α µ(dx))_{α∈∆(n,d)} : µ ∈ M^+(K) }, i.e., the set of truncated mo-
ment sequences (up to degree d) of nonnegative measures over K. Then, denote by y the trun-
cated moment sequence of µ ∈ M^+(K), and observe that µ(K) = µ(R^n) = y_0 and ∫_K p(x) µ(dx) =
∫_K Σ_{α∈∆(n,d)} p_α x^α µ(dx) = ⟨p, y⟩, so we obtain:

    p^* = inf_{y∈M_d^+(K)}   ⟨p, y⟩                                 (3)
          s.t.   y_0 = 1.

Thus, the problem is now to derive a tractable approximation of M_d^+(K). As in the previous section, we
first obtain necessary conditions. Consider a polynomial g = Σ_{|γ|≤2u} g_γ x^γ of degree ≤ 2u and an integer
r ∈ N. Then, for a (possibly infinite) sequence (y_α) containing all elements indexed by some α of total
degree |α| ≤ 2(u + r), define the localizing matrix M_r(gy) ∈ S^{s(n,r)} with elements

    (M_r(gy))_{α,β} = Σ_{|γ|≤2u} g_γ y_{α+β+γ},   ∀α, β ∈ ∆(n, r).

Proposition 11. Let y be the moment sequence of a measure µ ∈ M^+(K). Then, for all r ∈ N we have

    M_r(y) ⪰ 0   and   M_r(g_i y) ⪰ 0, ∀i ∈ [m].

Proof. The fact that M_r(y) ⪰ 0 follows from Proposition 9, because the truncated vector (y_α)_{|α|≤2r} is a member
of M_{2r}^+(K) ⊆ M_{2r}^+(R^n). Then, we proceed similarly as in the proof of Proposition 9 to show M_r(g_i y) ⪰ 0: For
an arbitrary coefficient vector c ∈ R^{s(n,r)}, define the polynomial c(x) = Σ_{|α|≤r} c_α x^α. The polynomial x ↦ g_i(x) c(x)^2 is
nonnegative over K, so it holds ∫_K g_i(x) c(x)^2 µ(dx) ≥ 0, where µ ∈ M^+(K) is a representing measure for y. Finally,
it can be seen that ∫_K g_i(x) c(x)^2 µ(dx) = c^T M_r(g_i y) c ≥ 0, and since c is arbitrary we obtain the desired result.

This proposition can also be rewritten as follows: Define

    M_{2r}^{SDP}(K) := {y ∈ R^{s(n,2r)} : M_r(y) ⪰ 0 and M_{r−u_i}(g_i y) ⪰ 0, ∀i ∈ [m]},

where u_i = ⌈deg(g_i)/2⌉. Then, M_{2r}^+(K) ⊆ M_{2r}^{SDP}(K).


In fact, we can also construct more precise outer approximations of M_{2r}^+(K) by considering the necessary
condition from Proposition 11 for moment sequences truncated at a higher degree, say 2(r + δ):

    M_{2r,δ}^{SDP}(K) := { y ∈ R^{s(n,2r)} : ∃ỹ ∈ R^{s(n,2(r+δ))} such that ỹ_α = y_α, ∀|α| ≤ 2r;
                           M_{r+δ}(ỹ) ⪰ 0;
                           M_{r+δ−u_i}(g_i ỹ) ⪰ 0, ∀i ∈ [m] }.

This definition basically says that y ∈ M_{2r,δ}^{SDP}(K) if we can extend the sequence (y_α)_{|α|≤2r} to obtain a
sequence (ỹ_α)_{|α|≤2(r+δ)} for which the moment matrix and the localizing matrices are positive semidefinite.
Proposition 11 implies that M_{2r}^+(K) ⊆ M_{2r,δ}^{SDP}(K) holds for all δ ∈ N. Moreover, these outer approximations
are nested by construction, hence

Corollary 12.

    M_{2r}^+(K) ⊆ ··· ⊆ M_{2r,2}^{SDP}(K) ⊆ M_{2r,1}^{SDP}(K) ⊆ M_{2r,0}^{SDP}(K) = M_{2r}^{SDP}(K).

Under some technical conditions, it can be shown that the converse of Proposition 11 is valid; this result
is a consequence of Putinar's Positivstellensatz, which we will mention at the end of Section 5. For the sake
of this lecture, we will simply say that K satisfies the Archimedean condition if

    R^2 − ‖x‖^2 = Σ_{i=1}^m σ_i(x) g_i(x)

for some R and SOS polynomials σ_1, ..., σ_m. This condition can be interpreted as an algebraic certificate of
compactness for K. Indeed, for x ∈ K we have g_i(x) ≥ 0, ∀i, so Σ_i σ_i(x) g_i(x) ≥ 0 for all SOS polynomials
σ_1, ..., σ_m, and the Archimedean condition implies ‖x‖^2 ≤ R^2. Note that if we know that K is included
in the ball of center 0 and radius R, we can simply add the constraint g_{m+1}(x) = R^2 − ‖x‖^2 ≥ 0 to
the problem, so that the Archimedean condition is automatically satisfied (take the SOS polynomials
σ_i(x) = 0, ∀i ∈ [m], and σ_{m+1}(x) = 1).

Theorem 13. Assume that K satisfies the Archimedean condition. Then, an infinite moment sequence y
has a representing measure µ ∈ M^+(K) if and only if for all r ∈ N, M_r(y) ⪰ 0 and M_r(g_i y) ⪰ 0, ∀i ∈ [m].
In this case, the hierarchy of Corollary 12 is convergent:

    M_{2r}^+(K) = ∩_{δ∈N} M_{2r,δ}^{SDP}(K).

As a consequence, we obtain a hierarchy of SDPs which converges to the polynomial optimization prob-
lem (P). Let u = max_i u_i and r = max{⌈deg(p)/2⌉, u}, and replace the constraint y ∈ M_d^+(K) in (3) by
y ∈ M_{2r,δ}^{SDP}(K) for some δ ∈ N:

    minimize_{y∈R^{s(n,2(r+δ))}}   ⟨p, y⟩                           (Las_δ)
    s.t.   y_0 = 1
           M_{r+δ}(y) ⪰ 0
           M_{r+δ−u_i}(g_i y) ⪰ 0, ∀i ∈ [m].


Note that in the above problem, p was implicitly extended to a vector in R^{s(n,2(r+δ))} (so that its scalar product
with y is well defined), i.e., we set p_α = 0 for all |α| > deg(p). By construction, the sequence of optimal
values p_δ^* of (Las_δ) is nondecreasing, and Theorem 13 guarantees that p_δ^* converges to the optimal value
p^* of (P):

    p_0^* ≤ p_1^* ≤ p_2^* ≤ ... ≤ p^*   and   lim_{δ→∞} p_δ^* = p^*.

Moreover, under certain conditions (one of them being that the problem contains only equality constraints;
we will prove a special case of this result in Section 6), it can be shown that the convergence occurs after a finite
number of steps, i.e., ∃δ ∈ N : p_δ^* = p^*. The following condition can be used to check whether convergence
took place after δ rounds of the hierarchy:

Theorem 14. Let y^* be an optimal solution of (Las_δ), so that p_δ^* = ⟨p, y^*⟩. If y^* satisfies

    rank M_{r+δ}(y^*) = rank M_{r+δ−u}(y^*) (=: r_δ),

where u = max_i u_i, then p_δ^* = p^*, and y^* has a representing measure µ^* ∈ M^+(K) that solves Problem (2).
Moreover, µ^* is r_δ-atomic, which means that µ^* can be decomposed as a convex combination of r_δ Dirac
measures: µ^* = Σ_{i=1}^{r_δ} w_i δ_{x_i^*}, with w ≥ 0, Σ_i w_i = 1, and the support points x_i^* of µ^* are global
minimizers of (P).

We also point out that there is an algorithm that can be used to extract the support points x_i^* of the
optimal measure µ^*, cf. [3]. This is particularly easy when r_δ = rank M_{r+δ}(y^*) = 1, because in this case
we have M_{r+δ}(y^*) = v(x^*) v(x^*)^T, where v(x^*) = (x^{*α})_{|α|≤r+δ} is the vector of monomials of x^*. So an
optimal solution x^* to Problem (P) is simply recovered by reading the coordinates of y^* corresponding to
monomials of degree 1: x_i^* = y_{e_i}^*, ∀i ∈ [n].
Example:
Consider the polynomial optimization problem

    min_{x,y}   p(x, y) = 11x^2 + 16xy + x − y^2 + 2y + 1   s.t.   x^2 + y^2 ≤ 1.

If there is a solution to this problem in the interior of the unit ball (x^2 + y^2 < 1), then it must be a
stationary point, i.e., the gradient ∇p = [22x + 16y + 1, 16x − 2y + 2]^T vanishes. This yields the point
[x_0, y_0] ≈ [−0.1133, 0.0933] as a candidate, but the analysis of the eigenvalues of the Hessian matrix
shows that this point is in fact a saddle point of p. So the optimum must be attained on the boundary of the
unit ball. Plotting p(cos θ, sin θ) over θ ∈ (−π, π] reveals that the minimum is attained at θ^* ≈ −1.15043,
hence [x^*, y^*] = [0.4080, −0.9129] is the solution to the above polynomial optimization problem.
The polynomial p has n = 2 variables and is of degree 2r = 2, and the constraint is of degree 2u = 2, so
r = u = 1. For a vector (z_α)_{|α|≤2ρ}, recall that the moment matrix is (M_ρ(z))_{α,β} = z_{α+β}, (|α| ≤ ρ, |β| ≤ ρ),
so if we order the monomials of degree ≤ 1 as (1, x, y), corresponding to the multi-indices (00, 10, 01), at the
level δ = 0 of the hierarchy we have

    M_{r+δ}(z) = M_1(z) = [ z_00  z_10  z_01
                            z_10  z_20  z_11
                            z_01  z_11  z_02 ].

And the localizing matrix M_ρ(gz) has coordinates (M_ρ(gz))_{α,β} = Σ_γ g_γ z_{α+β+γ} for |α| ≤ ρ, |β| ≤ ρ. Hence,
for g(x, y) = 1 − x^2 − y^2,

    M_{r+δ−u}(gz) = M_0(gz) = (1 − z_20 − z_02).

In this example, the localizing matrix is scalar. But if we need to go to the next level (δ = 1) of the
hierarchy, we will need to consider the moment matrix M_2(z), indexed over |α| ≤ 2, |β| ≤ 2, of dimension
s(n, 2) = (n+2 choose 2) = (4 choose 2) = 6, and the localizing matrix M_1(gz) of dimension s(n, 1) = (n+1 choose 1) = (3 choose 1) = 3.


Example (continued):
The SDP for the hierarchy at level δ = 0 is therefore:

    minimize_z   11 z_20 + 16 z_11 + z_10 − z_02 + 2 z_01 + 1
    s.t.   z_00 = 1

           [ z_00  z_10  z_01
             z_10  z_20  z_11
             z_01  z_11  z_02 ] ⪰ 0

           1 − z_20 − z_02 ≥ 0.

If we solve this SDP, we get the optimal moment matrix

    M_1(z^*) = [  1.           0.40809016  −0.91294164
                  0.40809016   0.16653757  −0.37256249
                 −0.91294164  −0.37256249   0.83346242 ].

This matrix has rank 1, so the certificate of global optimality of Theorem 14 is satisfied, as obviously
M_{r+δ−u}(z^*) = M_0(z^*) = (1) has rank one, too. This is the easy case (r_δ = 1), where the optimal solution of
the polynomial optimization problem can be read directly from the vector of optimal moments z^*:

    x^* = z_10^* = 0.40809016   and   y^* = z_01^* = −0.91294164.
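This computation is easy to reproduce; a minimal sketch in Python follows, again assuming CVXPY and an installed SDP solver. The moments z_α are taken directly as the entries of the symmetric matrix variable representing M_1(z):

```python
import cvxpy as cp

# Moment matrix M_1(z), indexed by the monomials (1, x, y); its entries
# are the moments z_00, z_10, z_01, z_20, z_11, z_02.
M = cp.Variable((3, 3), symmetric=True)
z10, z01 = M[0, 1], M[0, 2]
z20, z11, z02 = M[1, 1], M[1, 2], M[2, 2]

objective = 11 * z20 + 16 * z11 + z10 - z02 + 2 * z01 + 1

constraints = [M[0, 0] == 1,        # z_00 = 1: probability measure
               M >> 0,              # moment matrix PSD
               1 - z20 - z02 >= 0]  # scalar localizing matrix M_0(gz)

cp.Problem(cp.Minimize(objective), constraints).solve()
print(z10.value, z01.value)  # ~ 0.4081, -0.9129: the minimizer (x*, y*)
```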

5 Polynomial Optimization and Sum of Squares


Now, we derive the dual optimization problem of (Las_δ), which will give an alternative interpretation of
the hierarchy. For ρ ≥ 0, recall that the moment matrix M_ρ(y) can be written as M_ρ(y) = Σ_{|γ|≤2ρ} y_γ P_γ,
where P_γ is a {0,1}-symmetric matrix of size s(n, ρ) × s(n, ρ) with a 1 at all coordinates (α, β) such that
α + β = γ. Similarly, it can be seen that the localizing matrix of a polynomial g of degree ≤ 2u, which has
elements (M_ρ(gy))_{α,β} = Σ_{|τ|≤2u} g_τ y_{α+β+τ}, can be written as M_ρ(gy) = Σ_γ y_γ Q_γ^g, where Q_γ^g is the matrix in
S^{s(n,ρ)} such that (Q_γ^g)_{α,β} = g_τ whenever α + β + τ = γ.


Let us now derive the dual of (Las_δ):

    p_δ^* = inf_y  sup_{λ∈R, Λ⪰0, Ω_i⪰0 ∀i∈[m]}  ⟨p, y⟩ + λ(1 − y_0) − ⟨M_{r+δ}(y), Λ⟩ − Σ_{i∈[m]} ⟨M_{r+δ−u_i}(g_i y), Ω_i⟩

    d_δ^* = sup_{λ∈R, Λ⪰0, Ω_i⪰0 ∀i∈[m]}  inf_y  ⟨p, y⟩ + λ(1 − y_0) − ⟨Σ_{|γ|≤2(r+δ)} y_γ P_γ, Λ⟩ − Σ_i ⟨Σ_{|γ|≤2(r+δ−u_i)} y_γ Q_γ^{g_i}, Ω_i⟩

          = sup_{λ∈R, Λ⪰0, Ω_i⪰0 ∀i∈[m]}  λ + inf_y Σ_γ y_γ ( p_γ − λ(e_0)_γ − ⟨P_γ, Λ⟩ − Σ_i ⟨Q_γ^{g_i}, Ω_i⟩ )

          = sup_{λ, Λ, (Ω_i)_{i∈[m]}}  λ                                       (Sos_δ)
            s.t.   (p − λe_0)_γ = ⟨P_γ, Λ⟩ + Σ_{i∈[m]} ⟨Q_γ^{g_i}, Ω_i⟩,   ∀|γ| ≤ 2(r + δ)
                   Λ ⪰ 0
                   Ω_i ⪰ 0, ∀i ∈ [m].


Now, let us try to understand the meaning of this dual formulation. The equality constraint tells us that
the polynomial x ↦ p(x) − λ is the sum of a polynomial σ_0 with coefficients ⟨P_γ, Λ⟩ and some polynomials
q_1, ..., q_m with coefficients ⟨Q_γ^{g_i}, Ω_i⟩, ∀i ∈ [m]. We know from Theorem 6 that Λ ⪰ 0 is equivalent to σ_0
being a sum of squares. Similarly, we can observe that

    q_i(x) = Σ_γ x^γ ⟨Q_γ^{g_i}, Ω_i⟩ = Σ_γ x^γ Σ_{α+β+τ=γ} (g_i)_τ (Ω_i)_{α,β} = (Σ_τ (g_i)_τ x^τ) · (Σ_{α,β} x^α (Ω_i)_{α,β} x^β) = g_i(x) · σ_i(x),

and Ω_i ⪰ 0 ⟺ σ_i is a sum of squares, cf. the proof of Theorem 6. The dual problem of the hierarchy can thus
also be interpreted as follows:

    d_δ^* = sup_{λ, σ_0, (σ_i)_{i∈[m]}}  λ                                    (Sos_δ)
            s.t.   p(x) − λ = σ_0(x) + Σ_{i∈[m]} g_i(x) · σ_i(x)
                   σ_0 is an SOS polynomial of degree ≤ 2(r + δ)
                   σ_i is an SOS polynomial of degree ≤ 2(r + δ − u_i), ∀i ∈ [m].
It is easy to see that p(x) − λ = σ_0(x) + Σ_{i∈[m]} g_i(x) · σ_i(x) for some SOS polynomials is an algebraic
certificate for p − λ to be nonnegative over K, i.e., p(x) ≥ λ, ∀x ∈ K. Therefore, the optimal value λ^* = d_δ^*
is an underestimator of the optimal value p^* of the polynomial optimization problem, which we already
knew from weak duality: d_δ^* ≤ p_δ^* ≤ p^*.
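To make this concrete, the dual (Sos_0) of the running example of Section 4 matches coefficients in p(x, y) − λ = σ_0(x, y) + (1 − x^2 − y^2)·σ_1, with σ_0 = v^T Λ v for v = (1, x, y) and σ_1 = ω ≥ 0 a constant (an SOS of degree 0). A sketch in Python (CVXPY assumed):

```python
import cvxpy as cp

lam = cp.Variable()
omega = cp.Variable(nonneg=True)         # sigma_1: an SOS of degree 0
L = cp.Variable((3, 3), symmetric=True)  # Gram matrix of sigma_0, basis (1, x, y)

# Coefficient matching for p(x,y) - lam = v^T L v + omega * (1 - x^2 - y^2),
# with p(x,y) = 11x^2 + 16xy + x - y^2 + 2y + 1:
constraints = [L >> 0,
               L[0, 0] == 1 - lam - omega,  # constant term
               2 * L[0, 1] == 1,            # coefficient of x
               2 * L[0, 2] == 2,            # coefficient of y
               L[1, 1] == 11 + omega,       # coefficient of x^2
               2 * L[1, 2] == 16,           # coefficient of x y
               L[2, 2] == -1 + omega]       # coefficient of y^2

cp.Problem(cp.Maximize(lam), constraints).solve()
print(lam.value)  # d_0^* <= p_0^*; numerically it matches p(0.4081, -0.9129)
```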
Putinar’s Positivstellensatz, the result which can be used to prove convergence of the Lasserre / sum-of-
squares hierarchy, has a very similar flavour indeed:

Theorem 15 (Putinar's Positivstellensatz). Let p be a positive polynomial over a set K = {x ∈ R^n :
g_i(x) ≥ 0, ∀i ∈ [m]} that has the Archimedean property. Then, p can be written as p = σ_0 + Σ_{i∈[m]} g_i · σ_i
for some SOS polynomials σ_0, ..., σ_m.

Note that we require p to be positive on K (nonnegative is not enough), and the result tells us nothing
about the degrees of the σ_i's. The convergence of the sum-of-squares hierarchy (Sos_δ) –and hence of
the Lasserre hierarchy (Las_δ)– simply follows from this theorem: p − p^* + ε is positive over K for all ε > 0,
so p − p^* + ε = σ_0 + Σ_{i∈[m]} g_i · σ_i for some SOS polynomials σ_i of high enough degree. This means that
there exists δ ∈ N such that the value p^* − ε is feasible for (Sos_δ).

6 The Lasserre Hierarchy in Combinatorial Optimization


In this section we will review a few properties of the Lasserre hierarchy applied to the integer program

    minimize   c^T x   s.t.   Ax ≥ b,  x ∈ {0,1}^n.                 (IP)

The integer constraints can be handled by the equalities x_i^2 = x_i, for all i ∈ [n]. Instead of considering
the pair of polynomial inequalities x_i^2 − x_i ≥ 0 and x_i − x_i^2 ≥ 0, we observe that we can simplify the Lasserre
hierarchy by carrying out some moment substitutions: for every nonnegative measure µ supported on the
feasible set K = {x ∈ {0,1}^n : Ax ≥ b}, it holds ∫ x_i^2 µ(dx) = ∫ x_i µ(dx). More generally, for all α ∈ Z_{≥0}^n
we have

    ∫_K x^α µ(dx) = ∫_K (∏_{i∈I} x_i) µ(dx),   where I = {i ∈ [n] : α_i ≥ 1}.


This indicates that the moments y_α do not depend on the actual values of the α_i, but only on the sparsity
pattern I of α. Therefore, we can simplify the hierarchy and consider a vector of moments y indexed by
subsets I ⊆ [n]: we identify y_I with y_{e_I}, where e_I = Σ_{k∈I} e_k is the incidence vector of I. For example,
the moment matrix has coordinates

    (M_ρ(y))_{I,J} = ∫_K (∏_{i∈I} x_i)(∏_{j∈J} x_j) µ(dx) = ∫_K (∏_{i∈I∪J} x_i) µ(dx) = y_{I∪J}.

In the moment formulation of the problem, we are searching for a probability measure µ over K, correspond-
ing to a random variable X. Since K is finite, the distribution of X is discrete, and it can be interpreted as
a randomized algorithm that outputs x ∈ K with probability P[X = x]. Then, we have

    y_I = ∫_K (∏_{i∈I} x_i) µ(dx) = E[∏_{i∈I} X_i] = P[⋀_{i∈I} (X_i = 1)].

In particular, y_∅ = 1 and y_{i} = P[X_i = 1]. When x^* is an optimal solution of the LP relaxation min{c^T x :
Ax ≥ b, 0 ≤ x ≤ 1}, it is customary to interpret x_i^* as the probability with which x_i should be set
to 1. However, rounding all variables independently from each other will most likely result in an infeasible
solution x ∉ K. So information on joint events is required; for example, we need to know P[X_i = X_j = 1].
One interpretation of the Lasserre hierarchy is that it gives information on the correlation structure of the
solution, by introducing a set of variables y_I giving information on joint events of bounded cardinality |I|.
Let us now have a look at the problem (Las_δ) applied to (IP). We only consider the linear constraints
g_i(x) = Σ_j a_{ij} x_j − b_i ≥ 0, ∀i ∈ [m], since we are already handling the equality constraints x_i^2 = x_i thanks to
the aforementioned moment substitutions. The objective function and all constraints of (IP) are linear, so
u_i = u = r = 1. The level δ ≥ 0 of the hierarchy depends on the variables y_I for all subsets I of cardinality
≤ 2(δ + 1) (thanks to our moment substitutions, as opposed to all variables y_α for α ∈ ∆(n, 2(δ + r))).

Definition 3. A vector y = (y_I)_{|I|≤2(δ+1)} is said to be in the δ-th level of the Lasserre hierarchy, and
we write y ∈ L_δ, if the following LMIs hold:

    y_∅ = 1
    M_{δ+1}(y) := (y_{I∪J})_{|I|,|J|≤δ+1} ⪰ 0
    M_δ(g_i y) := (Σ_{j∈[n]} a_{ij} y_{I∪J∪{j}} − b_i y_{I∪J})_{|I|,|J|≤δ} ⪰ 0,   ∀i ∈ [m].

Define further the set L_δ^{proj} := { [y_{1}, ..., y_{n}]^T : y ∈ L_δ }, i.e., the projection of L_δ onto the set of
original coordinates.

Example:
The level δ = 0 of the hierarchy corresponds to the general recipe for constructing the SDP relaxation of
problems with binary variables seen in the previous chapter: Denote by z ∈ R^n the vector with elements
z_i = y_{i} and by Z ∈ S^n the matrix with coordinates Z_{ij} = y_{i,j}. Then, if we order the sets I of
cardinality |I| ≤ 1 as (∅, {1}, ..., {n}), we have

    M_1(y) = [ y_∅  z^T ; z  Z ] = [ 1  z^T ; z  Z ],

and for all i it holds Z_{ii} = y_{i,i} = y_{i} = z_i, that is, diag(Z) = z. The matrix M_0(g_i y) is scalar; its only
element is indexed by (∅, ∅) and is equal to Σ_{j∈[n]} a_{ij} y_{j} − b_i y_∅ = a_i^T z − b_i. This shows:

    y ∈ L_0^{proj} ⟺ Ay ≥ b and ∃Z ∈ S^n : [ 1  y^T ; y  Z ] ⪰ 0,  diag(Z) = y.
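As an illustration, the level δ = 0 relaxation is easy to set up for a concrete instance; the following sketch in Python (CVXPY assumed) uses made-up toy data A, b, c:

```python
import cvxpy as cp
import numpy as np

# Toy instance (illustrative data): min c^T x  s.t.  Ax >= b, x in {0,1}^n.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])
c = np.array([1.0, 2.0, 3.0])
n = len(c)

# Moment matrix M_1(y), indexed by (emptyset, {1}, ..., {n}).
Y = cp.Variable((n + 1, n + 1), symmetric=True)
z = Y[0, 1:]    # z_i = y_{i}
Z = Y[1:, 1:]   # Z_ij = y_{i,j}

constraints = [Y[0, 0] == 1,     # y_emptyset = 1
               Y >> 0,           # moment matrix M_1(y) PSD
               cp.diag(Z) == z,  # moment substitution y_{i,i} = y_{i}
               A @ z >= b]       # scalar localizing matrices M_0(g_i y)

cp.Problem(cp.Minimize(c @ z), constraints).solve()
print(z.value)  # a point of L_0^proj; its objective lower-bounds the IP value
```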


By construction, the L_δ^{proj} are nested and they contain K (as in Corollary 12):

    {x ∈ R^n : Ax ≥ b} ⊇ L_0^{proj} ⊇ L_1^{proj} ⊇ ... ⊇ K = {x ∈ {0,1}^n : Ax ≥ b}.

Since the L_δ^{proj} are convex (they are SDP-representable), we can be a bit more precise and observe that the
L_δ^{proj} contain the integer hull H of Problem (IP), that is, the convex hull of the integer points of K:

    L_δ^{proj} ⊇ H := conv {x ∈ {0,1}^n : Ax ≥ b}.

In fact, we will now show that the hierarchy converges to H after at most δ = n rounds, that is, L_n^{proj} = H.
In other words, there exists a δ ≤ n such that solving the SDP relaxation over L_δ^{proj} is equivalent to solving
the original problem (IP).

Lemma 16. Let y ∈ L_δ. Then, for all |I| ≤ δ + 1 it holds y_I ∈ [0, 1].

Proof. The principal submatrix of M_{δ+1}(y) corresponding to the coordinates ∅ and I is

    [ 1    y_I
      y_I  y_I ].

It must be positive semidefinite, so y_I ≥ 0 and, by the Schur complement lemma, y_I ≥ y_I^2, which gives y_I ≤ 1.

Lemma 17. Let y ∈ L_δ and assume that 0 < y_{k} < 1 for some k ∈ [n]. Define the vectors z^(1) and z^(2)
as follows:

    (z^(1))_I = y_{I∪{k}} / y_{k}   and   (z^(2))_I = (y_I − y_{I∪{k}}) / (1 − y_{k}),   ∀|I| ≤ 2δ.

Then, we have z^(1), z^(2) ∈ L_{δ−1}, (z^(1))_{k} = 1, (z^(2))_{k} = 0, and the vector ȳ = (y_I)_{|I|≤2δ} satisfies

    ȳ = y_{k} z^(1) + (1 − y_{k}) z^(2).

Proof. We have (z^(1))_{k} = y_{k}/y_{k} = 1, (z^(2))_{k} = (y_{k} − y_{k})/(1 − y_{k}) = 0, and for all |I| ≤ 2δ it holds

    y_k (z^(1))_I + (1 − y_k)(z^(2))_I = y_{I∪{k}} + y_I − y_{I∪{k}} = y_I = (ȳ)_I,

so the only thing left to show is z^(1), z^(2) ∈ L_{δ−1}. The first easy thing to check is that (z^(1))_∅ = (z^(2))_∅ = 1. Then, the
matrix M_{δ+1}(y) is positive semidefinite, so there exist vectors (v_I)_{|I|≤δ+1} such that (M_{δ+1}(y))_{I,J} = y_{I∪J} = ⟨v_I, v_J⟩,
∀|I|, |J| ≤ δ + 1. Now, for all |I| ≤ δ, define the vectors

    v̄_I^(1) = v_{I∪{k}} / √(y_k)   and   v̄_I^(2) = (v_I − v_{I∪{k}}) / √(1 − y_k).

We have ⟨v̄_I^(1), v̄_J^(1)⟩ = (1/y_k) ⟨v_{I∪{k}}, v_{J∪{k}}⟩ = (1/y_k) y_{I∪J∪{k}}, which is also the element of coordinates (I, J) of the matrix
M_δ(z^(1)). Hence, M_δ(z^(1)) ⪰ 0.
Similarly, ⟨v̄_I^(2), v̄_J^(2)⟩ = (1/(1 − y_k)) ⟨v_I − v_{I∪{k}}, v_J − v_{J∪{k}}⟩ = (1/(1 − y_k)) (y_{I∪J} − 2y_{I∪J∪{k}} + y_{I∪J∪{k}}) = (y_{I∪J} − y_{I∪J∪{k}})/(1 − y_k),
which is the element (I, J) of M_δ(z^(2)). This shows M_δ(z^(2)) ⪰ 0.
Finally, for all i ∈ [m], M_{δ−1}(g_i z^(1)) ⪰ 0 and M_{δ−1}(g_i z^(2)) ⪰ 0 can be proved in a similar manner: Let (w_I)_{|I|≤δ}
be such that (M_δ(g_i y))_{I,J} = ⟨w_I, w_J⟩, and for all |I| ≤ δ − 1 define the vectors

    w̄_I^(1) = w_{I∪{k}} / √(y_k)   and   w̄_I^(2) = (w_I − w_{I∪{k}}) / √(1 − y_k).

It can be verified that (M_{δ−1}(g_i z^(1)))_{I,J} = ⟨w̄_I^(1), w̄_J^(1)⟩ and (M_{δ−1}(g_i z^(2)))_{I,J} = ⟨w̄_I^(2), w̄_J^(2)⟩, which proves the claim
and concludes the proof.


Corollary 18. The projection of L_δ onto the subset of coordinates |I| ≤ 2δ, L_{δ|δ−1} := {(y_I)_{|I|≤2δ} : y ∈ L_δ},
satisfies

    L_{δ|δ−1} ⊆ conv( {z ∈ L_{δ−1} : z_k = 0} ∪ {z ∈ L_{δ−1} : z_k = 1} ).

Iterating the above result, we also see that for all subsets S ⊆ [n], the projection L_{δ|δ−|S|} := {(y_I)_{|I|≤2(δ−|S|+1)} : y ∈ L_δ}
is contained in the convex hull of all elements of L_{δ−|S|} with {0, 1}-valued coordinates in S:

    L_{δ|δ−|S|} ⊆ conv {z ∈ L_{δ−|S|} : z_i ∈ {0, 1}, ∀i ∈ S}.

In particular, if we take δ = n and S = [n], we obtain L_{n|0} ⊆ conv {z ∈ L_0 : z ∈ {0,1}^n}. Then, by projecting
onto the subset of original coordinates ({1}, ..., {n}), we obtain L_n^{proj} ⊆ H := conv {x ∈ {0,1}^n : Ax ≥ b}.
Therefore, we have shown:

Proposition 19.

    {x ∈ R^n : Ax ≥ b} ⊇ L_0^{proj} ⊇ L_1^{proj} ⊇ ... ⊇ L_n^{proj} = conv {x ∈ {0,1}^n : Ax ≥ b}.

We conclude this chapter by mentioning that the study of these hierarchies is an active field of research.
In particular, the SDP relaxation of (IP) over L_δ^{proj} can be solved in polynomial time for any fixed δ ∈ N.
Many of the tightest known polytime approximation algorithms for certain NP-hard optimization problems
involve solving δ = O(1/ε) rounds of the Lasserre hierarchy.

References
[1] Hamza Fawzi. Topics in Convex Optimisation. Lecture notes available at http://www.damtp.cam.ac.uk/user/hf323/lent2017_topics_convex_optimisation/topics_convex_optimisation.html, 2017.
[2] Monique Laurent. Sums of squares, moment matrices and optimization over polynomials. In Emerging applications of algebraic geometry, pages 157–270. Springer, 2009.
[3] Jean-Bernard Lasserre. Moments, positive polynomials and their applications. World Scientific, 2010.
[4] Martin Mevissen. Introduction to concepts and advances in polynomial optimization. Review available at https://inf.ethz.ch/personal/fukudak/semi/optpast/FS07/opt_abs/PolynomialOptimization.pdf, Institute for Operations Research, ETH Zurich, 2007.
[5] Yohann de Castro. A short introduction to Moment-SOS hierarchies. Review paper available at https://ydecastro.github.io/research/paper24.pdf, Deuxième Congrès National de la SMF, 2018.
[6] Thomas Rothvoß. The Lasserre hierarchy in approximation algorithms. Lecture notes for the MAPSP 2013 tutorial, available at https://sites.math.washington.edu/~rothvoss/lecturenotes/lasserresurvey.pdf, 2013.
