Indian Institute of Technology Madras
Optimization
Nirav Bhatt
Email: niravbhatt@iitm.ac.in
Office: BT 307 Block II
Biotechnology Department
September 27, 2024
September 27, 2024 1/44
Content
I Unconstrained vs Constrained optimization
I Types of optimization problems:
I Linear programming (LP)
I Quadratic programming (QP)
I Nonlinear programming (NLP)
I Dynamic optimization as an NLP
I Overview of Numerical solution approaches
I Book: Nocedal J, Wright SJ. Numerical Optimization. New York:
Springer; 1999. Reference chapters: 1, 2, and 12
I Reference Book: Chong EKP, Zak SH. An Introduction to Optimization.
September 27, 2024 2/44
Learning outcomes
Introduction to optimization and different forms of optimization
I The students are expected to learn
I Different types of optimization problems and their applications
I KKT conditions for finding an optimal solution
I Overview of line search and trust-region methods for numerical
optimization
September 27, 2024 3/44
Optimization
Elements
I Mathematical Optimization (or Mathematical Programming):
  Select a best option or a set of options from the available set
I Consider an optimization problem with decision variables
  x = [x1, x2, ..., xn]^T :
      max_x f(x)
      s.t. Ax ≤ b
           1^T x ≤ 1
  or
      max_{x1,x2,...,xn} f(x1, x2, ..., xn)
I f(x) is called the cost function, objective function, or loss function
I Ax ≤ b and ∑_{i=1}^{n} xi ≤ 1 are constraints
September 27, 2024 4/44
Optimization
Some Problems in ML/DL
I Regression Analysis: Given a dataset (y, X), fit a function of the form
  f(X, θ) : R^n × R^p → R
      min_θ ‖y − f(X, θ)‖_2^2
I Classification Problems: Given (y, X), fit a function of the form
  f(X, θ) : R^n × R^p → {0, 1}
      min_θ − (1/m) ∑_{i=1}^{m} [ yi log(pi(x, θ)) + (1 − yi) log(1 − pi(x, θ)) ]
  where pi(·) is the probability of class 1.
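As an illustration, a minimal Python sketch of the classification objective above, assuming a linear model pi(x, θ) = σ(xi^T θ); the data, learning rate, and iteration count are made up.

```python
import numpy as np

# Minimal sketch: minimize the average cross-entropy loss by gradient descent,
# assuming p_i(x, theta) = sigmoid(x_i^T theta). All values are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 samples, 3 features
theta_true = np.array([1.0, -2.0, 0.5])
y = (X @ theta_true + rng.normal(size=100) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(3)
eta = 0.5                                      # learning rate
for _ in range(1000):
    p = sigmoid(X @ theta)                     # p_i(x, theta)
    grad = X.T @ (p - y) / len(y)              # gradient of the average cross-entropy
    theta -= eta * grad
print(theta)                                   # roughly aligned with theta_true
```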
September 27, 2024 5/44
Optimization
Constraints
I Types of constraints
I Linear constraints
      Ax ≤ b        (inequality)
      Aeq x = beq   (equality)
I Nonlinear constraints
      g(x) ≤ 0      (inequality)
      h(x) = 0      (equality)
I Bounds: xlb ≤ x ≤ xub
I Set (e.g., integer) constraints: x ∈ S, for example S = {−1, 0, 1}
September 27, 2024 6/44
Optimization
Types
I Types based on constraints: (i) unconstrained optimization and
  (ii) constrained optimization; (i) static optimization and (ii)
  dynamic optimization
I Types based on the smoothness of the functions or the variable set:
  (i) continuous optimization and (ii) integer optimization
I Types based on the form of the objective function and constraints
I Linear Programming (LP)
I Quadratic Programming (QP)
I Nonlinear Programming (NLP) or Nonlinear Optimization
I Mixed Integer LP or NLP
September 27, 2024 7/44
Optimization
I Unconstrained Optimization: x are the decision variables
      max_x  or  min_x  f(x)                               (1)
I Constrained Optimization
      max_x f(x)
      s.t. gi(x) ≤ bi ,  i = 1, . . . , p                  (2)
           aj^T x = cj ,  j = 1, . . . , q
           x^LB ≤ x ≤ x^UB
I x* denotes an optimal solution
September 27, 2024 8/44
Motivation
Dynamic Optimization
I The current profit is a function of the current production x(t)
  and the rate of change of production ẋ(t)
I The continuous-time problem can be defined as:
      max_x J[x] = ∫_{t=0}^{T} f(t, x(t), ẋ(t)) dt          (3)
      s.t. x(t) ≥ 0,  x(0) = x0
I Objective: Find the function x(t) that maximizes the functional J[x]
I x*(t): Optimal trajectory
September 27, 2024 9/44
Optimization
I Linear Programming (LP)
      min_x c^T x
      s.t. Ax ≤ b,                                         (4)
           Cx = d,
           x^LB ≤ x ≤ x^UB
I Quadratic Programming (QP)
      min_x x^T H x + c^T x
      s.t. Ax ≤ b,                                         (5)
           Cx = d,
           x^LB ≤ x ≤ x^UB
  H: symmetric matrix
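As an illustration, a minimal sketch of solving a small LP of the form (4) with SciPy; the data c, A, b, C, d and the bounds below are made up.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data for an LP of the form (4); all values are made up.
c = np.array([1.0, 2.0])                  # objective: min c^T x
A = np.array([[1.0, 1.0], [-1.0, 2.0]])   # inequality constraints A x <= b
b = np.array([4.0, 2.0])
C = np.array([[1.0, -1.0]])               # equality constraint C x = d
d = np.array([0.5])
bounds = [(0.0, 3.0), (0.0, 3.0)]         # x^LB <= x <= x^UB

res = linprog(c, A_ub=A, b_ub=b, A_eq=C, b_eq=d, bounds=bounds)
print(res.x, res.fun)                     # optimal point and objective value
```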
September 27, 2024 10/44
Optimization
I Nonlinear Programming (NLP)
      min_x f(x)
      s.t. gi(x) ≤ 0,  i = 1, . . . , M,                   (6)
           hj(x) = 0,  j = 1, . . . , N,
           x^LB ≤ x ≤ x^UB
I Integer Quadratic Programming
      min_x x^T H x + c^T x
      s.t. Ax ≤ b,                                         (7)
           Cx = d,
           some xi are integers
September 27, 2024 11/44
Motivation
Static Optimization
I Static Optimization: the solution is an optimal number or a finite set
  of numbers
I Static Optimization:
      max_x f(x)                                           (8)
I Assumption: f(x) is continuously differentiable
I First-order necessary condition: ∂f/∂x (x*) = 0
I Second-order necessary condition: ∂²f/∂x² (x*) ≤ 0
I Example: The operating point x* that maximizes the profit f(x),
  where x: # of units
September 27, 2024 12/44
Motivation
Static Optimization
I Example
      max_x 1000000 + 4000x − x²                           (9)
I ∂f/∂x (x*) = 0 ⇒ 4000 − 2x* = 0 ⇒ x* = 2000
I ∂²f/∂x² = −2 < 0
September 27, 2024 13/44
Optimization
Static Optimization
I Static Optimization with several variables
      max_{x1,...,xn} f(x1, . . . , xn)                    (10)
I Example: A plant can produce n items. Find the operating point
  (x1, . . . , xn) that maximizes the profit f(x1, . . . , xn)
September 27, 2024 14/44
Motivation
Static Optimization: Detour to Some Problems in Linear Algebra
I Recall: Ax = b has no exact solution when b is not in the column
  space of A
I Project b onto the subspace spanned by the columns of A
I Error e = Ax − b; find the closest point to b in the column space of A
I Optimization Problem
      min_x ‖Ax − b‖_p ,   p = 1, 2, 3, . . . , ∞
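A small numerical illustration of the p = 2 case (least squares); the matrix A and vector b below are made up.

```python
import numpy as np

# Project b onto the column space of A by solving min_x ||Ax - b||_2.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution
p = A @ x_star                                  # projection of b onto col(A)
e = p - b                                       # error vector

print(x_star, p)
print(A.T @ e)   # ~0: the error is orthogonal to the columns of A
```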
September 27, 2024 15/44
Motivation
Static Optimization
I Example
      max_x 1000000 + 300x1 + 500x2 − x1² − x2²            (11)
I ∂f/∂x1 (x*) = 0 ⇒ 300 − 2x1 = 0 ⇒ x1* = 150
I ∂f/∂x2 (x*) = 0 ⇒ 500 − 2x2 = 0 ⇒ x2* = 250
I ∂²f/∂x1² = ∂²f/∂x2² = −2 < 0
I Are these conditions correct?
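As a quick check, the stationary point and Hessian of the example above can be computed symbolically; a minimal SymPy sketch:

```python
import sympy as sp

# Find the stationary point of f(x1, x2) and inspect the Hessian;
# for a maximum the Hessian should be negative definite.
x1, x2 = sp.symbols('x1 x2', real=True)
f = 1000000 + 300*x1 + 500*x2 - x1**2 - x2**2

grad = [sp.diff(f, v) for v in (x1, x2)]
stationary = sp.solve(grad, [x1, x2])        # {x1: 150, x2: 250}
H = sp.hessian(f, (x1, x2))                  # [[-2, 0], [0, -2]]
print(stationary, H.eigenvals())             # both eigenvalues are -2 < 0 -> maximum
```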
September 27, 2024 16/44
Static Optimization
Local vs Global Solutions
I Local Minimizer
  A point x* is a local minimizer if there is a neighborhood N of
  x* such that f(x*) ≤ f(x) for all x ∈ N.
  A point x* is a strict local minimizer (also called a strong local
  minimizer) if there exists a neighborhood N of x* such that
  f(x*) < f(x) for all x ∈ N with x ≠ x*.
I Global Minimizer
  A point x* is a global minimizer if f(x*) ≤ f(x) for all x.
I x* is a stationary point if ∇f(x*) = 0.
September 27, 2024 17/44
Static Optimization
Unconstrained Optimization
I First-order Necessary Conditions
  If x* is a local minimizer and f is continuously differentiable in
  an open neighborhood of x*, then ∇f(x*) = 0.
I Second-order Necessary Conditions
  If x* is a local minimizer of f and ∇²f exists and is continuous
  in an open neighborhood of x*, then ∇f(x*) = 0 and ∇²f(x*) is
  positive semidefinite.
I Second-order Sufficient Conditions
  Suppose that ∇²f is continuous in an open neighborhood of x*,
  ∇f(x*) = 0, and ∇²f(x*) is positive definite. Then x* is a strict
  local minimizer of f.
September 27, 2024 18/44
Constrained Optimization
Equality Constraints
I Consider the following optimization problem
      min f(x)
      s.t. hi(x) = 0,  i = 1, 2, . . . , m                 (12)
I Converting a constrained problem into an unconstrained one
I Lagrange function with Lagrange multipliers λ1, . . . , λm:
      L(x, λ) = f(x) + ∑_{i=1}^{m} λi hi(x)
September 27, 2024 19/44
Constrained Optimization
I First-order necessary conditions
      ∇f(x*) + λ1* ∇h1(x*) + . . . + λm* ∇hm(x*) = 0
      hi(x*) = 0,  i = 1, 2, . . . , m
I Second-order necessary condition
      w^T Lxx(x*, λ*) w ≥ 0,  ∀ w such that ∇hi(x*)·w = 0, i = 1, 2, . . . , m
  where
      Lxx(x*, λ*) = ∇²f(x*) + ∑_{i=1}^{m} λi* ∇²hi(x*)
I Second-order sufficient condition: Lxx(x*, λ*) is positive definite on
  the tangent space defined by ∇hi(x*)·w = 0, i = 1, 2, . . . , m
September 27, 2024 20/44
Example: Constrained Optimization
I Problem:
      min x1 + x2
      s.t. x1² + x2² − 2 = 0
I ∇f = [1, 1]^T ,  ∇h = [2x1, 2x2]^T
[Figure: the constraint circle x1² + x2² = 2 in the (x1, x2) plane, with ∇f and ∇h
drawn at the candidate points (1, 1) and (−1, −1), where the two gradients are parallel]
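As an illustration, a minimal SymPy sketch that solves the first-order (Lagrange) conditions for this example:

```python
import sympy as sp

# Solve the Lagrange conditions for  min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0.
x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
f = x1 + x2
h = x1**2 + x2**2 - 2
L = f + lam * h

eqs = [sp.diff(L, x1), sp.diff(L, x2), h]
for s in sp.solve(eqs, [x1, x2, lam], dict=True):
    print(s, 'f =', f.subs(s))   # candidates (1, 1) and (-1, -1); (-1, -1) is the minimizer
```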
September 27, 2024 21/44
Constrained Optimization: Example
KKT Conditions
I Constrained optimization problem:
      min f(x)
      s.t. hi(x) = 0,  i = 1, . . . , m
           gj(x) ≤ 0,  j = 1, . . . , n
  where f, hi, and gj are smooth, real-valued functions on a subset of R^n
I x is a feasible point if it satisfies the equality and inequality constraints
I A constraint j is said to be active at a point x if gj(x) = 0
I A(x): the set of all constraints active at the point x
I For a feasible point x with active set A(x), if the gradients ∇hi (all i)
  and ∇gj, ∀ j ∈ A(x), are linearly independent, then x satisfies the
  Linear Independence Constraint Qualification (LICQ)
September 27, 2024 22/44
Constrained Optimization
KKT Conditions
I Lagrangian for the problem with Lagrange multipliers λ and µ:
      L(x, λ, µ) = f(x) + ∑_{i=1}^{m} λi hi(x) + ∑_{j=1}^{n} µj gj(x)
I First-order necessary conditions (Karush-Kuhn-Tucker conditions):
  for a local solution x*,
      ∇x L(x*, λ*, µ*) = 0
      hi(x*) = 0,  i = 1, . . . , m
      gj(x*) ≤ 0,  j = 1, . . . , n
      µj* ≥ 0,  j = 1, . . . , n   (the λi* are unrestricted in sign)
      µj* gj(x*) = 0   (complementarity condition)
I Complementarity condition: ensures µj = 0, ∀ j ∉ A(x*)
September 27, 2024 23/44
Constrained Optimization
KKT Conditions
I Complementarity condition: ensures µj = 0, ∀ j ∉ A(x*)
I Second-order necessary condition
      w^T ∇²xx L(x*, λ*, µ*) w ≥ 0,  ∀ w ∈ T(x*)
      T(x*) = {w | ∇hi(x*)^T w = 0, ∀ i, and ∇gj(x*)^T w = 0, ∀ j ∈ A(x*)}
I Geometry of the multipliers (figure, panels (a)-(d)):
  (a) µj > 0, ∀ j ∈ A(x*): ∇f points inward (otherwise the problem would be
      a maximization) and the active ∇gj point outward (otherwise f could
      still be decreased), so that −∇f(x*) = ∑_j µj ∇gj(x*)
  (b) µ1, µ2 < 0;  (c) µ1 > 0, µ2 < 0;  (d) µ1, µ2 > 0
September 27, 2024 24/44
Constrained Optimization
KKT Conditions: Problem
I Optimization problem
      min x1² + 2x2²
      s.t. x1 + x2 ≥ 3
           x2 − x1² ≥ 1
I The Lagrangian function:
      l(x, µ1, µ2) = x1² + 2x2² − µ1(x1 + x2 − 3) − µ2(x2 − x1² − 1),   µ1, µ2 ≥ 0
I KKT Conditions
  1. 2x1 − µ1 + 2µ2 x1 = 0
  2. 4x2 − µ1 − µ2 = 0
  3. x1 + x2 ≥ 3
  4. x2 − x1² ≥ 1
  5. µ1(x1 + x2 − 3) = 0
  6. µ2(x2 − x1² − 1) = 0
  7. µ1, µ2 ≥ 0
September 27, 2024 25/44
Constrained Optimization
KKT Conditions: Problem
KKT Conditions
  1. 2x1 − µ1 + 2µ2 x1 = 0
  2. 4x2 − µ1 − µ2 = 0
  3. x1 + x2 ≥ 3
  4. x2 − x1² ≥ 1
  5. µ1(x1 + x2 − 3) = 0
  6. µ2(x2 − x1² − 1) = 0
  7. µ1, µ2 ≥ 0
I Case 1: µ1 = µ2 = 0 (no active constraint), then (x1*, x2*) = (0, 0)
I Case 2: µ1 > 0, µ2 = 0 (first constraint active), then (x1*, x2*) = (2, 1)
  and µ1 = 4 > 0; does not satisfy Condition 4
I Case 3: µ1 = 0, µ2 > 0 (second constraint active), then (x1*, x2*) = (0, 1)
  and µ2 = 4 > 0; does not satisfy Condition 3
I Case 4: µ1 > 0, µ2 > 0 (both constraints active), then (x1*, x2*) = (−2, 5)
  or (x1*, x2*) = (1, 2)
I (x1*, x2*) = (−2, 5) =⇒ µ1 + 4µ2 = −4
I (x1*, x2*) = (1, 2) =⇒ µ1 = 6 and µ2 = 2
September 27, 2024 26/44
Constrained Optimization
KKT Conditions: Problem
KKT Conditions
  1. 2x1 − µ1 + 2µ2 x1 = 0
  2. 4x2 − µ1 − µ2 = 0
  3. x1 + x2 ≥ 3
  4. x2 − x1² ≥ 1
  5. µ1(x1 + x2 − 3) = 0
  6. µ2(x2 − x1² − 1) = 0
  7. µ1, µ2 ≥ 0
I Case 1: µ1 = µ2 = 0 (no active constraint), then (x1*, x2*) = (0, 0)
  =⇒ violates 3 and 4
I Case 2: µ1 > 0, µ2 = 0 (first constraint active), then (x1*, x2*) = (2, 1)
  and µ1 = 4 > 0 =⇒ violates 4
I Case 3: µ1 = 0, µ2 > 0 (second constraint active), then (x1*, x2*) = (0, 1)
  and µ2 = 4 > 0 =⇒ violates 3
I Case 4: µ1 > 0, µ2 > 0 (both constraints active), then (x1*, x2*) = (−2, 5)
  or (x1*, x2*) = (1, 2)
I (x1*, x2*) = (−2, 5) =⇒ µ1 = 28, µ2 = −8 < 0: not possible, violates 7
I (x1*, x2*) = (1, 2) =⇒ µ1 = 6 and µ2 = 2: all KKT conditions satisfied
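As a numerical cross-check, the same problem can be handed to SciPy's SLSQP solver (which expects inequality constraints in the form g(x) ≥ 0); a minimal sketch:

```python
import numpy as np
from scipy.optimize import minimize

# min x1^2 + 2 x2^2  s.t.  x1 + x2 >= 3,  x2 - x1^2 >= 1
obj = lambda x: x[0]**2 + 2 * x[1]**2
cons = [{'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 3},
        {'type': 'ineq', 'fun': lambda x: x[1] - x[0]**2 - 1}]

res = minimize(obj, x0=np.array([0.0, 0.0]), method='SLSQP', constraints=cons)
print(res.x)   # approximately (1, 2), matching the KKT analysis
```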
September 27, 2024 27/44
Example: Constrained Optimization
I Find the semi-major and semi-minor axes of the ellipse
defined by
      (x1 + x2)² + 2(x1 − x2)² − 8 = 0
I Hint: Calculate the farthest (nearest) point on the ellipse from the origin
I Problem formulation:
      min (or max)  x1² + x2²
      s.t. (x1 + x2)² + 2(x1 − x2)² − 8 = 0                (13)
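A minimal SymPy sketch solving (13) via the Lagrange conditions; the stationary points give both the nearest and the farthest points on the ellipse:

```python
import sympy as sp

# Stationary points of x1^2 + x2^2 on the ellipse give the semi-axes.
x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
f = x1**2 + x2**2
h = (x1 + x2)**2 + 2*(x1 - x2)**2 - 8

L = f + lam * h
eqs = [sp.diff(L, x1), sp.diff(L, x2), h]
for s in sp.solve(eqs, [x1, x2, lam], dict=True):
    print(s, 'distance^2 =', sp.simplify(f.subs(s)))
# distance^2 = 4 (semi-major axis 2) and 2 (semi-minor axis sqrt(2))
```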
September 27, 2024 28/44
Numerical Optimization
Different Types of Algorithms
I Optimization algorithms generate a sequence of iterates {xk}_{k=0}^{∞}
I Termination criteria:
  I No more progress can be made
  I A solution has been approximated with sufficient accuracy
I How to generate a new point xk+1 from xk?
  Use information about f at xk and/or at previous iterates
I xk+1 must be such that f(xk+1) < f(xk)
I Strategies to find a new xk+1:
  I Line search
  I Trust region
September 27, 2024 29/44
Numerical Optimization
Different Types of Algorithms
I Strategies
  I Line search
  I Trust region
I Line search strategy: choose a direction pk from xk so that
  f(xk+1) < f(xk)
  I xk+1 = xk + α pk ,  α > 0
  I Choose a direction pk and find a step length α
I Objective function:
      min_{α>0} f(xk + α pk)                               (14)
September 27, 2024 30/44
Numerical Optimization
Different Types of Algorithms
I Line Search: Objective function:
      min_{α>0} f(xk + α pk)                               (15)
I The best direction is pk = −∇f(xk) = −(∂f/∂x)|_{xk} , the steepest-descent
  direction
[Figures: the steepest-descent direction; a general descent direction in a line search]
September 27, 2024 31/44
Numerical Optimization
Different Types of Algorithms
I Line Search: Objective function:
      min_{α>0} f(xk + α pk)                               (16)
I The best direction: pk = −∇f(xk) = −(∂f/∂x)|_{xk}
I Newton's direction:
      pk = −(∇²f(xk))^{−1} ∇f(xk)
  When a quadratic approximation is sufficient to represent f(x),
  Newton's direction is reliable.
I Conjugate gradient directions:
      pk = −∇f(xk) + βk pk−1 ,   βk: a scalar
  Conjugate vectors: pi^T A pj = 0 for i ≠ j, where A is a symmetric
  positive definite matrix
September 27, 2024 32/44
Numerical Optimization
Line Search Strategy
I Why conjugate gradient directions?*
*: Honkela, A. et al., Journal of Machine Learning Research, 11, 2010.
September 27, 2024 33/44
Numerical Optimization
Line Search Strategy
I Step size α
  I pk : provides the direction
  I αk : determines how far to move along pk , i.e., how much f is reduced
I Challenge: many evaluations of f (and sometimes ∇f)
I Simplest requirement on αk : f(xk + αk pk) < f(xk)
I Practical conditions: Wolfe conditions, Goldstein conditions
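A minimal backtracking line-search sketch enforcing the Armijo (sufficient-decrease) part of the Wolfe conditions; the test function and constants are illustrative.

```python
import numpy as np

# Backtracking line search: shrink alpha until
# f(x + alpha p) <= f(x) + c1 * alpha * grad_f(x)^T p  (sufficient decrease).
def backtracking(f, grad_f, x, p, alpha0=1.0, rho=0.5, c1=1e-4):
    alpha = alpha0
    fx, gx = f(x), grad_f(x)
    while f(x + alpha * p) > fx + c1 * alpha * gx @ p:
        alpha *= rho
    return alpha

# Example: one steepest-descent step on f(x) = x1^2 + 10 x2^2
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])

x = np.array([1.0, 1.0])
p = -grad_f(x)                            # descent direction
alpha = backtracking(f, grad_f, x, p)
print(alpha, x + alpha * p, f(x + alpha * p))
```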
September 27, 2024 34/44
Numerical Optimization
Trust Region
I Trust-region method
  I Idea: construct a model function mk of f around xk using information
    about f
  I mk may not be a good approximation of f far from xk
  I Limit the search for xk+1 to a region around xk : the trust region
I Objective function (subproblem)
      min_{pk} mk(xk + pk)   s.t. ‖pk‖ ≤ ∆k                 (17)
September 27, 2024 35/44
Numerical Optimization
Trust Region
I If the candidate step does not give a sufficient decrease in f, shrink
  the trust region and re-solve
I Trust-region subproblem
      min_{pk} mk(xk + pk)   s.t. ‖pk‖ ≤ ∆k                 (18)
I mk : a quadratic approximation of f around xk
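A minimal trust-region sketch, assuming a quadratic model mk(p) = f(xk) + g^T p + ½ p^T B p, a Cauchy-point step, and the usual radius update based on the ratio of actual to predicted reduction; it is an illustration, not a production implementation.

```python
import numpy as np

def trust_region(f, grad, hess, x, delta=1.0, delta_max=10.0, tol=1e-6, max_iter=200):
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        gBg = g @ B @ g
        tau = 1.0 if gBg <= 0 else min(1.0, np.linalg.norm(g)**3 / (delta * gBg))
        p = -tau * delta * g / np.linalg.norm(g)          # Cauchy point
        pred = -(g @ p + 0.5 * p @ B @ p)                 # predicted reduction m(0) - m(p)
        rho = (f(x) - f(x + p)) / pred                    # agreement between model and f
        if rho < 0.25:
            delta *= 0.25                                 # poor model: shrink the region
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2 * delta, delta_max)             # good step at the boundary: expand
        if rho > 0.1:
            x = x + p                                     # accept the step
    return x

# Illustrative use on f(x) = x1^2 + 10 x2^2 (minimizer at the origin)
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(trust_region(f, grad, hess, np.array([1.0, 1.0])))
```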
September 27, 2024 36/44
Optimization for Machine and Deep Learning: Terminology
I Central Idea to DS/ML/AI: Minimize or maximize an objective
function
I Deep learning objective: minimize the loss function or
objective function
I Direct solution & Iterative solution
I Arthur Samuel's paradigm:
  a "mechanism" to improve performance by tweaking the weights
  (parameters)
I The loss function is tweaked by varying the model parameters:
  a million-dimensional space!
September 27, 2024 37/44
Gradient Descent
[Figure: three sketches of the loss function versus a weight w]
      ∇w Loss < 0: increment w (w = w + ∆w)
      ∇w Loss > 0: decrement w (w = w − ∆w)
      ∇w Loss = 0: optimum w (∆w = 0, a stationary point)
What should ∆w be?
      ∆w = −η ∇w Loss  ⇒  w ← w − η ∇w Loss,   where η: learning rate
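A bare-bones gradient-descent loop implementing the update w ← w − η ∇w Loss on a simple one-dimensional loss; the function and constants are illustrative.

```python
# Gradient descent on Loss(w) = (w - 3)^2, minimized at w = 3.
loss = lambda w: (w - 3.0)**2
grad = lambda w: 2.0 * (w - 3.0)

w, eta = 0.0, 0.1                      # initial weight and learning rate
for step in range(100):
    w = w - eta * grad(w)              # the update rule above
print(w, loss(w))                      # w is close to 3, loss close to 0
```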
September 27, 2024 38/44
Gradient Descent
Learning Rate
I Learning rate too big ⇒ oscillation, possibly divergence
I Learning rate too small ⇒ long time to convergence
[Figure: loss function versus w for the two cases]
I The learning rate should adapt:
  I Near a local solution: proceed carefully with a small learning rate
  I Far from a local solution: proceed quickly with a large learning rate
September 27, 2024 39/44
Gradient Descent
Three types
I Batch GD
  I Use the entire data set for computing the gradient
  I Update rule: w = w − η ∇w L(w)
  I Slow; a large data set may not fit in memory
I Stochastic GD (online)
  I Use one data point (xi, yi) for computing the gradient
  I Update rule: w = w − η ∇w L(w, xi, yi)
  I Much faster; updates have high variance and L fluctuates heavily
I Mini-batch GD
  I Use a subset of data points for computing the gradient
  I Update rule: w = w − η ∇w L(w, x_{i:i+n}, y_{i:i+n})
  I Best of both methods; reduces the variance of the parameter updates
Algorithm of choice: mini-batch gradient descent for NN and DL
Mini-batch GD is often referred to as SGD
Typical batch sizes of 50 or 256 are used
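A minimal mini-batch SGD sketch on a linear least-squares loss L(w) = (1/2m) ‖Xw − y‖²; the data, batch size, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=1000)

w, eta, batch = np.zeros(5), 0.05, 50
for epoch in range(20):
    idx = rng.permutation(len(y))                 # shuffle once per epoch
    for start in range(0, len(y), batch):
        j = idx[start:start + batch]              # indices of one mini-batch
        Xb, yb = X[j], y[j]
        grad = Xb.T @ (Xb @ w - yb) / len(j)      # gradient on the mini-batch
        w -= eta * grad                           # SGD update
print(np.linalg.norm(w - w_true))                 # small: w approaches w_true
```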
September 27, 2024 40/44
Gradient Descent
Momentum SGD
I SGD: sometimes slow
I How do we accelerate learning?
I Momentum approaches:
  I Use the concept of momentum from physics
  I v: a velocity variable that accumulates the past gradients
  I Update rule
      v ← β v − η ∇w L(wk, xi, yi)
      wk+1 = wk + v
  I β ∈ [0, 1): momentum parameter
  I The larger β is, the more the current direction is affected by past
    gradients
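A sketch of the momentum update above on an illustrative quadratic loss; consistent gradient directions build up velocity.

```python
import numpy as np

loss_grad = lambda w: np.array([2 * w[0], 20 * w[1]])   # gradient of w1^2 + 10 w2^2

w = np.array([1.0, 1.0])
v = np.zeros(2)
eta, beta = 0.02, 0.9
for _ in range(200):
    v = beta * v - eta * loss_grad(w)   # velocity update (accumulates past gradients)
    w = w + v                           # parameter update
print(w)                                # close to the minimizer (0, 0)
```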
September 27, 2024 41/44
Gradient Descent
Stochastic GD
I Learning rate: an important hyperparameter
I Adaptive learning rates: adapt to the direction and magnitude of the
  gradients of the different parameters
I AdaGrad learning rate:
  scale the learning rate of each parameter by the square root of the sum
  of all the past squared gradients
      vk+1 = vk + ∇w L(wk) ⊙ ∇w L(wk)
      wk+1 = wk − (η / √(δ + vk+1)) ⊙ ∇w L(wk)
I RMSProp learning rate: an exponential moving average instead of the sum
      vk+1 = ρ vk + (1 − ρ) ∇w L(wk) ⊙ ∇w L(wk)
      wk+1 = wk − (η / √(δ + vk+1)) ⊙ ∇w L(wk)
I Other algorithm: adaptive moments ("Adam"), a combination of RMSProp
  and momentum SGD
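A sketch of the RMSProp update above (AdaGrad is the same with a plain running sum of squared gradients instead of the exponential moving average); the loss and constants are illustrative.

```python
import numpy as np

loss_grad = lambda w: np.array([2 * w[0], 20 * w[1]])   # gradient of w1^2 + 10 w2^2

w = np.array([1.0, 1.0])
v = np.zeros(2)
eta, rho, delta = 0.01, 0.9, 1e-8
for _ in range(2000):
    g = loss_grad(w)
    v = rho * v + (1 - rho) * g * g          # exponential moving average of g ⊙ g
    w = w - eta / np.sqrt(delta + v) * g     # per-parameter scaled step
print(w)                                     # ends up near the minimizer (0, 0)
```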
September 27, 2024 42/44
Gradient Descent
Stochastic GD: Conclusions
I Which algorithm should I choose for a particular application?
I Which is the best algorithm to apply?
I Answer: none, or all
I Robust performance: RMSProp and the AdaDelta family (Schaul et al., 2013)
I Hyperparameter tuning does not matter as much with adaptive learning rates
I The choice of algorithm largely depends on the user's familiarity with
  the algorithm
Schaul et al. "No more pesky learning rates." International Conference on Machine Learning, 2013.
September 27, 2024 43/44
Gradient Descent
Stochastic GD: Summary
I The elements needed for applying stochastic GD
  I The form of the loss function
  I A way to compute gradients with respect to the parameters
  I Initialization of the parameters, the learning rate, and other
    hyperparameters
September 27, 2024 44/44