L. Vandenberghe ECE236C (Spring 2020)
2. Subgradients
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
2.1
Basic inequality
recall the basic inequality for differentiable convex functions:
f (y) ≥ f (x) + ∇ f (x)T (y − x) for all y ∈ dom f
[figure: graph of f with the first-order approximation touching at (x, f (x)); the supporting hyperplane has normal direction (∇ f (x), −1)]
• the first-order approximation of f at x is a global lower bound
• ∇ f (x) defines non-vertical supporting hyperplane to epigraph of f at (x, f (x)):
(∇ f (x), −1)T ((y, t) − (x, f (x))) ≤ 0 for all (y, t) ∈ epi f
Subgradients 2.2
Subgradient
g is a subgradient of a convex function f at x ∈ dom f if
f (y) ≥ f (x) + gT (y − x) for all y ∈ dom f
[figure: graph of f (y) with affine lower bounds f (x1) + g1T (y − x1) and f (x1) + g2T (y − x1) touching at x1, and f (x2) + g3T (y − x2) touching at x2]
g1, g2 are subgradients at x1; g3 is a subgradient at x2
Subgradients 2.3
Subdifferential
the subdifferential ∂ f (x) of f at x is the set of all subgradients:
∂ f (x) = {g | gT (y − x) ≤ f (y) − f (x), ∀y ∈ dom f }
Properties
• ∂ f (x) is a closed convex set (possibly empty)
this follows from the definition: ∂ f (x) is an intersection of halfspaces
• if x ∈ int dom f then ∂ f (x) is nonempty and bounded
proof on next two pages
Subgradients 2.4
Proof: we show that ∂ f (x) is nonempty when x ∈ int dom f
• (x, f (x)) is in the boundary of the convex set epi f
• therefore there exists a supporting hyperplane to epi f at (x, f (x)):
∃(a, b) ≠ 0 such that (a, b)T ((y, t) − (x, f (x))) ≤ 0 for all (y, t) ∈ epi f
• b > 0 gives a contradiction as t → ∞
• b = 0 gives a contradiction for y = x + εa with small ε > 0
• therefore b < 0, and g = (1/|b|) a is a subgradient of f at x
Subgradients 2.5
Proof: ∂ f (x) is bounded when x ∈ int dom f
• for small r > 0, define a set of 2n points
B = {x ± r ek | k = 1, . . . , n} ⊂ dom f
and define M = max { f (y) | y ∈ B} < ∞
• for every g ∈ ∂ f (x), there is a point y ∈ B with
r ‖g‖∞ = gT (y − x)
(choose an index k with |gk| = ‖g‖∞, and take y = x + r sign(gk) ek)
• since g is a subgradient, this implies that
f (x) + r ‖g‖∞ = f (x) + gT (y − x) ≤ f (y) ≤ M
• we conclude that ∂ f (x) is bounded:
‖g‖∞ ≤ (M − f (x))/r for all g ∈ ∂ f (x)
Subgradients 2.6
Example
f (x) = max { f1(x), f2(x)} with f1, f2 convex and differentiable
[figure: graph of f (y) = max { f1(y), f2(y)} together with the graphs of f1(y) and f2(y)]
• if f1( x̂) = f2( x̂), subdifferential at x̂ is line segment [∇ f1( x̂), ∇ f2( x̂)]
• if f1( x̂) > f2( x̂), subdifferential at x̂ is {∇ f1( x̂)}
• if f1( x̂) < f2( x̂), subdifferential at x̂ is {∇ f2( x̂)}
Subgradients 2.7
Examples
Absolute value f (x) = |x|
∂ f (x) = {−1} if x < 0, ∂ f (x) = [−1, 1] if x = 0, ∂ f (x) = {1} if x > 0
[figure: graphs of f (x) and of the set-valued map ∂ f (x)]
Euclidean norm f (x) = ‖x‖2
∂ f (x) = {x/‖x‖2} if x ≠ 0, ∂ f (x) = {g | ‖g‖2 ≤ 1} if x = 0
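A small numerical check (not from the slides; random data assumed): the formula above gives one subgradient of ‖x‖2, and the subgradient inequality can be verified at random test points.

    # one subgradient of f(x) = ||x||_2: x/||x||_2 away from 0, and 0 at x = 0
    import numpy as np

    def subgrad_l2norm(x):
        nrm = np.linalg.norm(x)
        return x / nrm if nrm > 0 else np.zeros_like(x)   # any g with ||g||_2 <= 1 works at 0

    rng = np.random.default_rng(0)
    x = rng.standard_normal(4)
    g = subgrad_l2norm(x)
    for _ in range(5):                                    # f(y) >= f(x) + g^T (y - x)
        y = rng.standard_normal(4)
        assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-12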
Subgradients 2.8
Monotonicity
the subdifferential of a convex function is a monotone operator:
(u − v)T (x − y) ≥ 0 for all x , y , u ∈ ∂ f (x), v ∈ ∂ f (y)
Proof: by definition
f (y) ≥ f (x) + uT (y − x), f (x) ≥ f (y) + v T (x − y)
combining the two inequalities shows monotonicity
Subgradients 2.9
Examples of non-subdifferentiable functions
the following functions are not subdifferentiable at x = 0
• f : R → R, dom f = R+
f (x) = 1 if x = 0, f (x) = 0 if x > 0
• f : R → R, dom f = R+
f (x) = −√x
the only supporting hyperplane to epi f at (0, f (0)) is vertical
Subgradients 2.10
Subgradients and sublevel sets
if g is a subgradient of f at x , then
f (y) ≤ f (x) =⇒ gT (y − x) ≤ 0
[figure: the sublevel set {y | f (y) ≤ f (x)} with a supporting hyperplane at x defined by a subgradient]
the nonzero subgradients at x define supporting hyperplanes to the sublevel set
{y | f (y) ≤ f (x)}
Subgradients 2.11
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Subgradient calculus
Weak subgradient calculus: rules for finding one subgradient
• sufficient for most nondifferentiable convex optimization algorithms
• if you can evaluate f (x), you can usually compute a subgradient
Strong subgradient calculus: rules for finding ∂ f (x) (all subgradients)
• some algorithms, optimality conditions, etc., need entire subdifferential
• can be quite complicated
we will assume that x ∈ int dom f
Subgradients 2.12
Basic rules
Differentiable functions: ∂ f (x) = {∇ f (x)} if f is differentiable at x
Nonnegative linear combination
if f (x) = α1 f1(x) + α2 f2(x) with α1, α2 ≥ 0, then
∂ f (x) = α1 ∂ f1(x) + α2 ∂ f2(x)
(right-hand side is addition of sets)
Affine transformation of variables: if f (x) = h(Ax + b), then
∂ f (x) = AT ∂h(Ax + b)
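As a sketch of the affine rule (the choice h = ‖·‖2 and the random A, b are assumptions for illustration), a subgradient of f (x) = h(Ax + b) is AT u for any u ∈ ∂h(Ax + b).

    # subgradient of f(x) = ||A x + b||_2 via the affine transformation rule
    import numpy as np

    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)

    def subgrad(x):
        r = A @ x + b
        u = r / np.linalg.norm(r) if np.linalg.norm(r) > 0 else np.zeros_like(r)
        return A.T @ u                                    # A^T times a subgradient of ||.||_2

    f = lambda z: np.linalg.norm(A @ z + b)
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    g = subgrad(x)
    print(f(y) >= f(x) + g @ (y - x) - 1e-12)             # subgradient inequality holds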
Subgradients 2.13
Pointwise maximum
f (x) = max { f1(x), . . . , fm (x)}
define I(x) = {i | fi (x) = f (x)}, the ‘active’ functions at x
Weak result
to compute a subgradient at x , choose any k ∈ I(x), any subgradient of fk at x
Strong result
∂ f (x) = conv ( ∪_{i ∈ I(x)} ∂ fi (x) )
• the convex hull of the union of subdifferentials of ‘active’ functions at x
• if fi ’s are differentiable, ∂ f (x) = conv {∇ fi (x) | i ∈ I(x)}
Subgradients 2.14
Example: piecewise-linear function
f (x) = max_{i=1,...,m} (aiT x + bi )
[figure: graph of a piecewise-linear f (x) with the affine pieces aiT x + bi ]
the subdifferential at x is a polyhedron
∂ f (x) = conv {ai | i ∈ I(x)}
with I(x) = {i | aiT x + bi = f (x)}
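A minimal sketch of the weak rule for this example (random ai, bi assumed): evaluate all affine pieces, pick an active index, and return the corresponding ai.

    # one subgradient of f(x) = max_i (a_i^T x + b_i)
    import numpy as np

    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((6, 3)), rng.standard_normal(6)   # rows of A are a_i^T

    def f_and_subgrad(x):
        vals = A @ x + b
        k = int(np.argmax(vals))          # an index in I(x)
        return vals[k], A[k]              # f(x) and the subgradient a_k

    fx, g = f_and_subgrad(rng.standard_normal(3))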
Subgradients 2.15
Example: ℓ1-norm
f (x) = ‖x‖1 = max_{s ∈ {−1,1}^n} sT x
the subdifferential is a product of intervals ∂ f (x) = J1 × · · · × Jn, with
Jk = [−1, 1] if xk = 0, Jk = {1} if xk > 0, Jk = {−1} if xk < 0
[figure: ∂ f at the three points below, drawn as a square, a vertical segment, and a single point]
∂ f (0, 0) = [−1, 1] × [−1, 1],  ∂ f (1, 0) = {1} × [−1, 1],  ∂ f (1, 1) = {(1, 1)}
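In code, np.sign picks one element of this product of intervals (it selects 0 ∈ [−1, 1] at zero coordinates); a quick sketch:

    # one subgradient of f(x) = ||x||_1
    import numpy as np

    def subgrad_l1(x):
        return np.sign(x)                 # any value in [-1, 1] is valid where x_k = 0

    print(subgrad_l1(np.array([1.0, 0.0, -2.0])))   # [ 1.  0. -1.]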
Subgradients 2.16
Pointwise supremum
f (x) = sup_{α ∈ A} fα (x), with fα (x) convex in x for every α
Weak result: to find a subgradient at x̂ ,
• find any β for which f ( x̂) = f β ( x̂) (assuming maximum is attained)
• choose any g ∈ ∂ f β ( x̂)
(Partial) strong result: define I(x) = {α ∈ A | fα (x) = f (x)}
conv ( ∪_{α ∈ I(x)} ∂ fα (x) ) ⊆ ∂ f (x)
equality requires extra conditions (for example, A compact, fα continuous in α)
Subgradients 2.17
Exercise: maximum eigenvalue
Problem: explain how to find a subgradient of
f (x) = λmax(A(x)) = sup_{‖y‖2=1} yT A(x)y
where A(x) = A0 + x1 A1 + · · · + xn An with symmetric coefficients Ai
Solution: to find a subgradient at x̂ ,
• choose any unit eigenvector y with eigenvalue λmax(A( x̂))
• the gradient of yT A(x)y at x̂ is a subgradient of f :
(yT A1 y, . . . , yT An y) ∈ ∂ f ( x̂)
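A numerical sketch of this exercise (random symmetric matrices assumed), using an eigenvector for the largest eigenvalue:

    # subgradient of f(x) = lambda_max(A0 + x1 A1 + ... + xn An)
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 3, 4
    mats = [rng.standard_normal((p, p)) for _ in range(n + 1)]
    A = [(M + M.T) / 2 for M in mats]                 # symmetric A0, ..., An

    A_of = lambda x: A[0] + sum(xk * Ak for xk, Ak in zip(x, A[1:]))
    f = lambda x: np.linalg.eigh(A_of(x))[0][-1]      # largest eigenvalue

    x_hat = rng.standard_normal(n)
    y = np.linalg.eigh(A_of(x_hat))[1][:, -1]         # unit eigenvector for lambda_max
    g = np.array([y @ Ak @ y for Ak in A[1:]])        # element of the subdifferential

    z = rng.standard_normal(n)
    print(f(z) >= f(x_hat) + g @ (z - x_hat) - 1e-9)  # subgradient inequality holds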
Subgradients 2.18
Minimization
f (x) = inf_y h(x, y), with h jointly convex in (x, y)
Weak result: to find a subgradient at x̂ ,
• find ŷ that minimizes h( x̂, y) (assuming minimum is attained)
• find subgradient (g, 0) ∈ ∂h( x̂, ŷ)
Proof: for all x , y ,
h(x, y) ≥ h( x̂, ŷ) + gT (x − x̂) + 0T (y − ŷ)
= f ( x̂) + gT (x − x̂)
therefore
f (x) = inf_y h(x, y) ≥ f (x̂) + gT (x − x̂)
Subgradients 2.19
Exercise: Euclidean distance to convex set
Problem: explain how to find a subgradient of
f (x) = inf_{y ∈ C} ‖x − y‖2
where C is a closed convex set
Solution: to find a subgradient at x̂ ,
• if f ( x̂) = 0 (that is, x̂ ∈ C), take g = 0
• if f ( x̂) > 0, find projection ŷ = P( x̂) on C and take
g = (1/‖ŷ − x̂‖2) (x̂ − ŷ) = (1/‖x̂ − P(x̂)‖2) (x̂ − P(x̂))
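A sketch of this solution with C taken to be the unit Euclidean ball (an assumption made here so that the projection has the closed form P(x) = x/max{1, ‖x‖2}):

    # subgradient of the distance to the unit Euclidean ball
    import numpy as np

    proj = lambda x: x / max(1.0, np.linalg.norm(x))  # projection onto C

    def dist_subgrad(x):
        p = proj(x)
        if np.allclose(x, p):                         # x in C: f(x) = 0, take g = 0
            return np.zeros_like(x)
        return (x - p) / np.linalg.norm(x - p)

    print(dist_subgrad(np.array([3.0, 4.0])))         # [0.6 0.8]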
Subgradients 2.20
Composition
f (x) = h( f1(x), . . . , fk (x)), h convex and nondecreasing, fi convex
Weak result: to find a subgradient at x̂ ,
• find z ∈ ∂h( f1( x̂), . . . , fk ( x̂)) and gi ∈ ∂ fi ( x̂)
• then g = z1 g1 + · · · + z k gk ∈ ∂ f ( x̂)
reduces to standard formula for differentiable h, fi
Proof:
f (x) ≥ h ( f1(x̂) + g1T (x − x̂), . . . , fk (x̂) + gkT (x − x̂) )
≥ h ( f1(x̂), . . . , fk (x̂)) + zT ( g1T (x − x̂), . . . , gkT (x − x̂) )
= h ( f1(x̂), . . . , fk (x̂)) + (z1 g1 + · · · + zk gk )T (x − x̂)
= f (x̂) + gT (x − x̂)
Subgradients 2.21
Optimal value function
define f (u, v) as the optimal value of convex problem
minimize f0(x)
subject to fi (x) ≤ ui, i = 1, . . . , m
Ax = b + v
(functions fi are convex; optimization variable is x )
Weak result: suppose f (û, v̂) is finite and strong duality holds with the dual
maximize  inf_x ( f0(x) + Σ_i λi ( fi (x) − ûi ) + νT (Ax − b − v̂) )
subject to  λ ⪰ 0
if λ̂, ν̂ are optimal dual variables (for right-hand sides û, v̂ ) then (−λ̂, −ν̂) ∈ ∂ f (û, v̂)
Subgradients 2.22
Proof: by weak duality for problem with right-hand sides u, v
f (u, v) ≥ inf_x ( f0(x) + Σ_i λ̂i ( fi (x) − ui ) + ν̂T (Ax − b − v) )
= inf_x ( f0(x) + Σ_i λ̂i ( fi (x) − ûi ) + ν̂T (Ax − b − v̂) ) − λ̂T (u − û) − ν̂T (v − v̂)
= f (û, v̂) − λ̂T (u − û) − ν̂T (v − v̂)
Subgradients 2.23
Expectation
f (x) = E h(x, u), with u random and h convex in x for every u
Weak result: to find a subgradient at x̂ ,
• choose a function u ↦ g(u) with g(u) ∈ ∂x h(x̂, u)
• then, g = E g(u) ∈ ∂ f (x̂)
Proof: by convexity of h and definition of g(u),
f (x) = E h(x, u)
≥ E ( h(x̂, u) + g(u)T (x − x̂) )
= f ( x̂) + gT (x − x̂)
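A Monte Carlo sketch (the distribution of u and the choice h(x, u) = |x − u| are assumptions for illustration): averaging pointwise subgradients over samples estimates a subgradient of the expectation.

    # f(x) = E |x - u| with u uniform on [0, 1]; the exact derivative on (0, 1) is 2x - 1
    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.uniform(0.0, 1.0, size=100000)

    avg_subgrad = lambda x: np.mean(np.sign(x - u))   # estimate of E sign(x - u)
    print(avg_subgrad(0.3), 2 * 0.3 - 1)              # both approximately -0.4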
Subgradients 2.24
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Optimality conditions — unconstrained
x⋆ minimizes f (x) if and only if
0 ∈ ∂ f (x⋆)
[figure: a nondifferentiable convex function with a horizontal supporting line at the minimizer x⋆]
this follows directly from the definition of subgradient:
f (y) ≥ f (x⋆) + 0T (y − x⋆) for all y ⇐⇒ 0 ∈ ∂ f (x⋆)
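A small worked check of this condition (the function f (x) = |x| + (x − 0.4)²/2 is an assumption chosen for illustration): x⋆ = 0 is optimal because 0 ∈ ∂ f (0) = [−1, 1] + (0 − 0.4) = [−1.4, 0.6], and a grid search agrees.

    # verify that x* = 0 minimizes f(x) = |x| + (x - 0.4)^2 / 2
    import numpy as np

    f = lambda x: np.abs(x) + 0.5 * (x - 0.4) ** 2
    xs = np.linspace(-2.0, 2.0, 400001)
    print(xs[np.argmin(f(xs))])                       # approximately 0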
Subgradients 2.25
Example: piecewise-linear minimization
f (x) = max_{i=1,...,m} (aiT x + bi )
Optimality condition
0 ∈ conv {ai | i ∈ I(x⋆)} where I(x) = {i | aiT x + bi = f (x)}
• in other words, x⋆ is optimal if and only if there is a λ with
λ ⪰ 0,  1T λ = 1,  Σ_{i=1}^m λi ai = 0,  λi = 0 for i ∉ I(x⋆)
• these are the optimality conditions for the equivalent linear program
primal:  minimize t, subject to Ax + b ⪯ t1
dual:  maximize bT λ, subject to AT λ = 0, λ ⪰ 0, 1T λ = 1
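A numerical sketch of these conditions (random data and the use of scipy.optimize.linprog and nnls are assumptions here): solve the epigraph LP, find the active pieces at the solution, and check that 0 lies in the convex hull of the active ai by a nonnegative least-squares fit.

    import numpy as np
    from scipy.optimize import linprog, nnls

    rng = np.random.default_rng(0)
    m, n = 20, 3
    A, b = rng.standard_normal((m, n)), rng.standard_normal(m)   # rows of A are a_i^T

    # variables z = (x, t): minimize t subject to A x + b <= t 1
    c = np.r_[np.zeros(n), 1.0]
    res = linprog(c, A_ub=np.hstack([A, -np.ones((m, 1))]), b_ub=-b,
                  bounds=[(None, None)] * (n + 1))
    x_star, f_star = res.x[:n], res.x[n]

    active = np.flatnonzero(A @ x_star + b > f_star - 1e-7)      # I(x*)

    # 0 in conv{a_i : i in I(x*)}: find lam >= 0 with sum lam = 1, sum lam_i a_i = 0
    M = np.vstack([A[active].T, np.ones(len(active))])
    lam, resid = nnls(M, np.r_[np.zeros(n), 1.0])
    print(resid)                                                 # ~0 confirms optimality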
Subgradients 2.26
Optimality conditions — constrained
minimize f0(x)
subject to fi (x) ≤ 0, i = 1, . . . , m
assume dom fi = Rn, so functions fi are subdifferentiable everywhere
Karush–Kuhn–Tucker conditions
if strong duality holds, then x⋆, λ⋆ are primal, dual optimal if and only if
1. x⋆ is primal feasible
2. λ⋆ ⪰ 0
3. λi⋆ fi (x⋆) = 0 for i = 1, . . . , m
4. x⋆ is a minimizer of L(x, λ⋆) = f0(x) + Σ_{i=1}^m λi⋆ fi (x):
0 ∈ ∂ f0(x⋆) + Σ_{i=1}^m λi⋆ ∂ fi (x⋆)
Subgradients 2.27
Outline
• definition
• subgradient calculus
• duality and optimality conditions
• directional derivative
Directional derivative
Definition (for general f ): the directional derivative of f at x in the direction y is
f ′(x; y) = lim_{α↘0} ( f (x + αy) − f (x) ) / α
= lim_{t→∞} t ( f (x + y/t) − f (x) )
(if the limit exists)
• f ′(x; y) is the right derivative of g(α) = f (x + αy) at α = 0
• f ′(x; y) is homogeneous in y :
f ′(x; λy) = λ f ′(x; y) for λ ≥ 0
Subgradients 2.28
Directional derivative of a convex function
Equivalent definition (for convex f ): replace lim with inf
f ′(x; y) = inf_{α>0} ( f (x + αy) − f (x) ) / α
= inf_{t>0} t ( f (x + y/t) − f (x) )
Proof
• the function h(y) = f (x + y) − f (x) is convex in y , with h(0) = 0
• its perspective th(y/t) is nonincreasing in t (ECE236B ex. A2.5); hence
f ′(x; y) = lim_{t→∞} t h(y/t) = inf_{t>0} t h(y/t)
Subgradients 2.29
Properties
consequences of the expressions (for convex f )
f ′(x; y) = inf_{α>0} ( f (x + αy) − f (x) ) / α
= inf_{t>0} t ( f (x + y/t) − f (x) )
• f ′(x; y) is convex in y (partial minimization of a convex function of (y, t))
• f ′(x; y) defines a lower bound on f in the direction y :
f (x + αy) ≥ f (x) + α f ′(x; y) for all α ≥ 0
Subgradients 2.30
Directional derivative and subgradients
for convex f and x ∈ int dom f
f ′(x; y) = sup_{g ∈ ∂ f (x)} gT y
f ′(x; y) is the support function of ∂ f (x)
[figure: the set ∂ f (x), a direction y , and the maximizing subgradient ĝ with ĝT y = f ′(x; y)]
• generalizes f ′(x; y) = ∇ f (x)T y for differentiable functions
• implies that f ′(x; y) exists for all x ∈ int dom f , all y (see page 2.4)
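A quick numerical check of this identity for f = ‖·‖1 at x = (1, 0) (this specific instance is chosen for illustration): there ∂ f (x) = {1} × [−1, 1], so the support function is y1 + |y2|, which matches the difference quotient.

    # directional derivative of ||.||_1 at (1, 0) versus the support function of its subdifferential
    import numpy as np

    f = lambda z: np.abs(z).sum()
    x, y, a = np.array([1.0, 0.0]), np.array([-0.3, 0.7]), 1e-6
    print((f(x + a * y) - f(x)) / a, y[0] + abs(y[1]))   # both approximately 0.4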
Subgradients 2.31
Proof: if g ∈ ∂ f (x) then from page 2.29
f ′(x; y) ≥ inf_{α>0} ( f (x) + αgT y − f (x) ) / α = gT y
it remains to show that f ′(x; y) = ĝT y for at least one ĝ ∈ ∂ f (x)
• f ′(x; y) is convex in y with domain Rn, hence subdifferentiable at all y
• let ĝ be a subgradient of f ′(x; ·) at y : then for all v , λ ≥ 0,
λ f ′(x; v) = f ′(x; λv) ≥ f ′(x; y) + ĝT (λv − y)
• taking λ → ∞ shows that f ′(x; v) ≥ ĝT v ; from the lower bound on page 2.30,
f (x + v) ≥ f (x) + f ′(x; v) ≥ f (x) + ĝT v for all v
hence ĝ ∈ ∂ f (x)
• taking λ = 0 we see that f ′(x; y) ≤ ĝT y
Subgradients 2.32
Descent directions and subgradients
y is a descent direction of f at x if f ′(x; y) < 0
• the negative gradient of a differentiable f is a descent direction (if ∇ f (x) ≠ 0)
• negative subgradient is not always a descent direction
Example: f (x1, x2) = |x1 | + 2|x2 |
[figure: the point (1, 0) and the subgradient g = (1, 2), drawn in the (x1, x2)-plane]
g = (1, 2) ∈ ∂ f (1, 0), but y = (−1, −2) is not a descent direction at (1, 0)
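A one-line check of this example: moving from (1, 0) along y = −g increases f for every positive step size, since f ′((1, 0); y) = −1 + 2·2 = 3 > 0.

    # f(x1, x2) = |x1| + 2|x2| increases along y = (-1, -2) from x = (1, 0)
    import numpy as np

    f = lambda z: abs(z[0]) + 2 * abs(z[1])
    x, y = np.array([1.0, 0.0]), np.array([-1.0, -2.0])
    for t in [1e-3, 1e-2, 1e-1]:
        print(t, f(x + t * y) - f(x))                    # equals 3t > 0 for small t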
Subgradients 2.33
Steepest descent direction
Definition: (normalized) steepest descent direction at x ∈ int dom f is
∆xnsd = argmin_{‖y‖2 ≤ 1} f ′(x; y)
∆xnsd is the primal solution y of the pair of dual problems (BV §8.1.3)
primal:  minimize (over y ) f ′(x; y), subject to ‖y‖2 ≤ 1
dual:  maximize (over g ) −‖g‖2, subject to g ∈ ∂ f (x)
• dual optimal g⋆ is the subgradient with least norm
• f ′(x; ∆xnsd) = −‖g⋆‖2
• if 0 ∉ ∂ f (x), ∆xnsd = −g⋆/‖g⋆‖2
• ∆xnsd can be expensive to compute
[figure: the set ∂ f (x), its least-norm element g⋆, and the direction ∆xnsd with g⋆T ∆xnsd = f ′(x; ∆xnsd)]
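Continuing the example on the previous page (a sketch; the box form of the subdifferential is specific to that example): at x = (1, 0), ∂ f (x) = {1} × [−2, 2] for f (x1, x2) = |x1| + 2|x2|, so the least-norm subgradient is the projection of 0 onto that box.

    # least-norm subgradient and steepest descent direction at (1, 0)
    import numpy as np

    lo, hi = np.array([1.0, -2.0]), np.array([1.0, 2.0])  # interval ends of the box
    g_star = np.clip(0.0, lo, hi)                          # projection of 0 onto the box
    dx_nsd = -g_star / np.linalg.norm(g_star)
    print(g_star, dx_nsd)                                  # g* = (1, 0), dx_nsd = (-1, 0)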
Subgradients 2.34
Subgradients and distance to sublevel sets
if f is convex, f (y) < f (x), g ∈ ∂ f (x), then for small t > 0,
‖x − tg − y‖2² = ‖x − y‖2² − 2t gT (x − y) + t² ‖g‖2²
≤ ‖x − y‖2² − 2t ( f (x) − f (y)) + t² ‖g‖2²
< ‖x − y‖2²
• −g is descent direction for ‖x − y‖2, for any y with f (y) < f (x)
• in particular, −g is descent direction for distance to any minimizer of f
Subgradients 2.35
References
• A. Beck, First-Order Methods in Optimization (2017), chapter 3.
• D. P. Bertsekas, A. Nedić, A. E. Ozdaglar, Convex Analysis and Optimization
(2003), chapter 4.
• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization
Algorithms (1993), chapter VI.
• Yu. Nesterov, Lectures on Convex Optimization (2018), section 3.1.
• B. T. Polyak, Introduction to Optimization (1987), section 5.1.
Subgradients 2.36