EE364a Homework 5
Summer 2022
Exercise A7.3
 (a) When c(x) = x, the associated family of densities is the family of exponential distributions. The parameter θ is valid iff θ < 0, since
\[
\int_0^\infty e^{\theta x}\,dx = \begin{cases} -\frac{1}{\theta} & \text{if } \theta < 0; \\ \infty & \text{if } \theta \ge 0. \end{cases}
\]
 (b) For θ ∈ R, the associated distribution is
\[
p_\theta(0) = \frac{1}{1+e^\theta}, \qquad p_\theta(1) = \frac{e^\theta}{1+e^\theta}.
\]
     As θ runs over R, the probability $e^\theta/(1+e^\theta)$ runs over (0, 1), and so the distribution runs over the family of Bernoulli(p) distributions with p ∈ (0, 1). All θ ∈ R are valid.
 (c) Let D = Rⁿ and c₁(x) = x, c₂(x) = −(1/2)xxᵀ. For the parameter θ = (Σ⁻¹μ, Σ⁻¹), we have
\begin{align*}
\exp(\theta^T c(x)) &= \exp\left(-\frac{1}{2}\operatorname{tr}\left(\Sigma^{-1} x x^T\right) + (\Sigma^{-1}\mu)^T x\right) \\
&\propto \exp\left(-\frac{1}{2} x^T \Sigma^{-1} x + x^T \Sigma^{-1} \mu - \frac{1}{2}\mu^T \Sigma^{-1} \mu\right) \\
&\propto \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right),
\end{align*}
     which is proportional to the N(μ, Σ) density.
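     As a quick numeric spot check of this identification (not part of the original solution), we can compare likelihood ratios of the unnormalized density exp(θᵀc(x)) against ratios of the Gaussian density; the normalizing constants cancel, so the two ratios should agree. The values of μ, Σ, and the test points below are arbitrary placeholders:

     import numpy as np

     mu = np.array([1.0, -0.5])                  # placeholder mean
     Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])  # placeholder covariance
     Si = np.linalg.inv(Sigma)

     def unnormalized(x):
         # exp(theta^T c(x)) with theta = (Si @ mu, Si) and c(x) = (x, -x x^T / 2)
         return np.exp(Si @ mu @ x - 0.5 * x @ Si @ x)

     def gaussian_pdf(x):
         d = x - mu
         return np.exp(-0.5 * d @ Si @ d) / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(Sigma))

     x1, x2 = np.array([0.2, 0.4]), np.array([-1.0, 0.7])
     print(unnormalized(x1) / unnormalized(x2))  # the two printed ratios agree
     print(gaussian_pdf(x1) / gaussian_pdf(x2))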
 (d) In the finite case, we have
\[
\log p_\theta(x) = \theta^T c(x) - \log\left(\sum_{x' \in D} \exp\left(\theta^T c(x')\right)\right).
\]
     This is an affine function of θ minus a log-sum-exp of affine functions of θ; since log-sum-exp is convex and composition with an affine map preserves convexity, log pθ(x) is concave in θ. The continuous case can be handled by approximating the integral with finite sums and taking limits.
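     As a quick numeric illustration of the concavity claim (not part of the original solution; it uses the Bernoulli family from part (b)), we can evaluate log pθ(1) on a grid of θ values and verify that its second differences are nonpositive:

     import numpy as np

     # log p_theta(1) = theta - log(1 + exp(theta)) for the Bernoulli family of part (b)
     theta = np.linspace(-5, 5, 1001)
     logp1 = theta - np.log1p(np.exp(theta))

     # Second differences of a concave function are nonpositive (up to float error).
     print(np.all(np.diff(logp1, 2) <= 1e-12))   # True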
 (e) Using the formula from the previous part and assuming that integration and differentiation can be interchanged,
\begin{align*}
\frac{1}{K}\nabla_\theta\, l_\theta(x_1, \dots, x_K) &= \frac{1}{K}\sum_{k=1}^{K} \nabla_\theta \log p_\theta(x_k) \\
&= \frac{1}{K}\sum_{k=1}^{K} c(x_k) - \left(\int_D c(x)\exp(\theta^T c(x))\,dx\right)\left(\int_D \exp(\theta^T c(x))\,dx\right)^{-1} \\
&= \frac{1}{K}\sum_{k=1}^{K} c(x_k) - \mathop{\mathbf{E}}_{x\sim p_\theta} c(x).
\end{align*}
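     As a small numeric illustration of this moment-matching condition (not part of the original solution), consider the exponential family of part (a), where c(x) = x and E_{x∼pθ} x = −1/θ; setting the gradient to zero gives the ML estimate θ̂ = −1/x̄. The true parameter and sample size below are made-up:

     import numpy as np

     rng = np.random.default_rng(0)
     true_theta = -0.5                           # density |theta| e^{theta x} on x > 0
     x = rng.exponential(scale=-1 / true_theta, size=100_000)

     theta_hat = -1.0 / x.mean()                 # solves (1/K) sum_k c(x_k) = E c(x)
     print(theta_hat)                            # close to -0.5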
Exercise 6.2
 (a) For the ℓ₁-norm, order the entries of b as $b_{(1)} \le b_{(2)} \le \cdots \le b_{(n)}$. For any x ∈ R,
\[
\|x\mathbf{1} - b\|_1 = \sum_{k=1}^{n} |b_{(k)} - x| = \frac{1}{2}\sum_{k=1}^{n}\left(|b_{(k)} - x| + |b_{(n+1-k)} - x|\right) \ge \frac{1}{2}\sum_{k=1}^{n} \left|b_{(k)} - b_{(n+1-k)}\right|,
\]
     using the triangle inequality on each pair. On the other hand, if we choose $x^\star = \mathrm{median}(b)$, then $b_{(k)} \le x^\star \le b_{(n+1-k)}$ holds for every $k \le \frac{n+1}{2}$, and hence $(b_{(k)} - x^\star)(b_{(n+1-k)} - x^\star) \le 0$ for every $k \in [n]$. Therefore
\[
\|x^\star\mathbf{1} - b\|_1 = \sum_{k=1}^{n} |b_{(k)} - x^\star| = \frac{1}{2}\sum_{k=1}^{n}\left(|b_{(k)} - x^\star| + |b_{(n+1-k)} - x^\star|\right) = \frac{1}{2}\sum_{k=1}^{n} \left|b_{(k)} - b_{(n+1-k)}\right|.
\]
     In conclusion, the problem attains its minimum at $x^\star = \mathrm{median}(b)$, with an optimal value of
\[
\frac{1}{2}\sum_{k=1}^{n} \left|b_{(k)} - b_{(n+1-k)}\right|.
\]
 (b) For the ℓ₂-norm, the problem is in fact a special case of least-squares approximation. Since $\mathbf{1}$ has full column rank, the optimal point is $x^\star = (\mathbf{1}^T\mathbf{1})^{-1}\mathbf{1}^T b = \frac{1}{n}(b_1 + \cdots + b_n) =: \bar{b}$, under which the objective attains its minimum value $\left(\sum_{k=1}^{n} (b_k - \bar{b})^2\right)^{1/2}$.
 (c) For the ℓ∞-norm, we simply have
\begin{align*}
\|x\mathbf{1} - b\|_\infty &= \max_{1\le k\le n} |x - b_k| \\
&= \max\left\{ \max_{1\le k\le n}(x - b_k),\; \max_{1\le k\le n}(b_k - x) \right\} \\
&= \max\left\{ x - b_{(1)},\; b_{(n)} - x \right\},
\end{align*}
     which is clearly minimized at $x^\star = \frac{1}{2}(b_{(1)} + b_{(n)})$ with minimal value $\frac{1}{2}(b_{(n)} - b_{(1)})$.
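     Python code for a quick numeric check of all three closed-form solutions (not part of the original solution; the vector b is a made-up example, and cvxpy solves each norm-approximation problem directly):

     import numpy as np
     import cvxpy as cp

     b = np.array([0.3, 1.7, 2.2, 4.0, 5.1])    # placeholder data
     for norm, closed_form in [(1, np.median(b)),
                               (2, b.mean()),
                               ("inf", 0.5 * (b.min() + b.max()))]:
         x = cp.Variable()
         cp.Problem(cp.Minimize(cp.norm(x * np.ones(len(b)) - b, norm))).solve()
         print(norm, float(x.value), closed_form)   # solver and closed form agree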
Exercise 6.5
It’s clear that ∥Ax − b∥ − ϵ is a convex function, so the constraint set is convex. It suffices to prove that length(·) is a quasiconvex function, i.e., that its sublevel sets are convex. For any fixed α, if x, y ∈ Rⁿ satisfy length(x) ≤ α and length(y) ≤ α, then xₖ = yₖ = 0 for all k > α, so the same holds for any convex combination θx + (1 − θ)y of x and y, and hence
\[
\operatorname{length}(\theta x + (1-\theta)y) \le \alpha.
\]
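Since length(·) is quasiconvex, the underlying problem (minimizing length(x) subject to ∥Ax − b∥ ≤ ϵ) can be solved by sweeping (or bisecting) over the candidate length k and checking feasibility of a convex problem at each step. A minimal sketch, where A, b, and ϵ are made-up placeholders:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 20, 10
A = rng.standard_normal((m, n))             # placeholder problem data
b = rng.standard_normal(m)
eps = 5.0

for k in range(n + 1):                      # candidate length k = 0, ..., n
    x = cp.Variable(n)
    constraints = [cp.norm(A @ x - b) <= eps]
    if k < n:
        constraints.append(x[k:] == 0)      # enforce length(x) <= k
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    if prob.status == cp.OPTIMAL:           # first feasible k is the minimum
        print("minimum length:", k)
        break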
Exercise A17.16
Python code:
import numpy as np
import cvxpy as cp

S = np.linspace(0.5, 2, 200)
y = cp.Variable(200)
pn = cp.Variable()
p = np.array([1, 1, 0.06, 0.03, 0.02, 0.01])
V = np.array([[1.05] * 200,
              S,
              np.maximum(S - 1.1, 0),
              np.maximum(S - 1.2, 0),
              np.maximum(0.8 - S, 0),
              np.maximum(0.7 - S, 0)])
Vn = np.minimum(np.maximum(S - 1, -0.1), 0.15)
constraints = [V @ y == p,
               Vn @ y == pn,
               y >= 0]
# Lower and upper arbitrage-free price bounds for the collar.
cp.Problem(cp.Minimize(pn), constraints).solve()
print(pn.value)
cp.Problem(cp.Maximize(pn), constraints).solve()
print(pn.value)
As the program above reports, the range of arbitrage-free prices of the collar is [0.03262, 0.06495].
Exercise 6.10
We first deal with the constraint that f is nondecreasing on B. Note that f is nondecreasing
on B if and only if
                          for all u ⪰ 0 and x ∈ B, ⟨∇f (x), u⟩ ≥ 0.
Since having nonnegative inner product with every u ∈ Rn+ is equivalent to being in Rn+ itself
(the dual cone of Rn+ is Rn+ ), we can further simplify the above constraint into
                                   ∇f (x) ⪰ 0 for all x ∈ B.
Now, plugging in the formula for f, we have ∇f(x) = P x + q. For each k ∈ [n], the kth entry satisfies $(Px + q)_k \ge 0$ for all x ∈ B if and only if it holds under the worst-case choice of each $x_j$, i.e.,
\[
\sum_{j=1}^{n} P_{kj} \cdot \begin{cases} l & \text{if } P_{kj} \ge 0 \\ u & \text{if } P_{kj} < 0 \end{cases} \;+\; q_k \ge 0.
\]
Combining all the above observations, we can finally state the constraint that f is nondecreasing on B as follows:
\[
\min(l \cdot P,\, u \cdot P)\,\mathbf{1} + q \succeq 0,
\]
where min denotes the entrywise minimum of two matrices. Since $\min(lz, uz)$ is a concave function of $z \in \mathbf{R}$, this is a system of convex constraints in the variables (P, q), which take values in $\mathbf{S}^n \times \mathbf{R}^n$.
    Having formulated the nondecreasing constraint, we can reduce the constraint that f is nonnegative on B to nonnegativity at the lowest corner $x = l\mathbf{1}$:
\[
\frac{l^2}{2}\mathbf{1}^T P \mathbf{1} + l \cdot \mathbf{1}^T q + r \ge 0.
\]
    Finally, f being concave is the same as its Hessian satisfying P ⪯ 0.
    In conclusion, we can write the optimization problem as
\[
\begin{array}{ll}
\text{minimize} & \displaystyle\sum_{i=1}^{N} \left(\tfrac{1}{2} x_i^T P x_i + q^T x_i + r - y_i\right)^2 \\
\text{subject to} & \min(l \cdot P,\, u \cdot P)\,\mathbf{1} + q \succeq 0, \\
& \dfrac{l^2}{2}\mathbf{1}^T P \mathbf{1} + l \cdot \mathbf{1}^T q + r \ge 0, \\
& P \preceq 0.
\end{array}
\]
   The objective is a sum of squares of affine functions in the variables (P, q, r), and is hence
convex.
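    A minimal cvxpy sketch of this fitting problem (the dimensions, box bounds l, u, and data (xᵢ, yᵢ) below are made-up placeholders):

import numpy as np
import cvxpy as cp

n, N = 2, 50
l, u = 0.0, 1.0                                 # B = [l, u]^n
rng = np.random.default_rng(0)
xs = rng.uniform(l, u, (N, n))                  # placeholder data points
ys = rng.uniform(0.0, 1.0, N)                   # placeholder values

P = cp.Variable((n, n), symmetric=True)
q = cp.Variable(n)
r = cp.Variable()

resid = cp.hstack([0.5 * cp.quad_form(x, P) + q @ x + r - y
                   for x, y in zip(xs, ys)])
constraints = [cp.sum(cp.minimum(l * P, u * P), axis=1) + q >= 0,  # nondecreasing on B
               (l ** 2 / 2) * cp.sum(P) + l * cp.sum(q) + r >= 0,  # f(l 1) >= 0
               P << 0]                                             # concavity
prob = cp.Problem(cp.Minimize(cp.sum_squares(resid)), constraints)
prob.solve()
print(prob.value)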
Exercise A5.21
 (a) f is convex because it’s the supremum of a family of convex (linear) functions.
 (b) The Lagrangian is
                                      L(c, λ) = cT x + λT (g − F c),
     and so the dual problem is
                                           minimize λT g
                                          subject to λT F = xT ,
                                                     λ ⪰ 0.
     The dual optimal value equals the primal optimal value f(x), since strong duality holds for feasible linear programs.
 (c) Plugging the dual problem into the original robust LP yields
\[
\begin{array}{ll}
\text{minimize} & \lambda^T g \\
\text{subject to} & Ax \succeq b, \\
& F^T \lambda = x, \\
& \lambda \succeq 0.
\end{array}
\]
     Or, eliminating $x = F^T\lambda$, more succinctly,
\[
\begin{array}{ll}
\text{minimize} & \lambda^T g \\
\text{subject to} & AF^T \lambda \succeq b, \\
& \lambda \succeq 0.
\end{array}
\]
     Python code:
     import numpy as np
     import cvxpy as cp

     np.random.seed(10)
     (m, n) = (30, 10)
     A = np.random.rand(m, n)
     b = np.random.rand(m, 1)
     c_nom = np.ones((n, 1)) + np.random.rand(n, 1)
     average_nom = np.sum(c_nom) / n

     lambd = cp.Variable((2 * n + 2, 1))
     F = np.block([[np.eye(n)],
                   [-np.eye(n)],
                   [np.ones((1, n)) / n],
                   [-np.ones((1, n)) / n]])
     g = np.block([[1.25 * c_nom],
                   [-0.75 * c_nom],
                   [np.array([[1.1 * average_nom]])],
                   [np.array([[-0.9 * average_nom]])]])
     constraints = [A @ F.T @ lambd >= b,
                    lambd >= 0]
     prob = cp.Problem(cp.Minimize(lambd.T @ g), constraints)
     prob.solve()
     print(prob.value)

     # Nominal LP for comparison.
     x = cp.Variable((n, 1))
     prob_nom = cp.Problem(cp.Minimize(c_nom.T @ x), [A @ x >= b])
     prob_nom.solve()
     print(prob_nom.value)
     As the program above reports, the optimal value of f(x) is 3.166, while the optimal value of $c_{\mathrm{nom}}^T x$ is 2.109. The rather large gap between the two values suggests that the nominal LP solution is not very robust.