Reinforcement Learning and Optimization-based
Control
                   Assoc. Prof. Dr. Emre Koyuncu
                    Department of Aeronautics Engineering
                        Istanbul Technical University
                        Lecture 1: Introduction
Table of Contents
1   Optimal Control and RL
2   Adaptive Control
3   Reinforcement Learning
4   RL Applications
5   About this Course
Adaptive and Optimal Control
Optimal Control
  • Minimizes a prescribed performance function
  • Usually designed offline by solving the HJB equation
  • Uses complete knowledge of the system
  • Solving the nonlinear HJB equation is often hard or impossible

Adaptive Control
  • Learns online via a feedback function
  • Not usually designed to be optimal
  • First identifies the system, then uses the model
MPC and RL
   • Both are frameworks to solve sequential decision making problems
   • Both automatically design controllers based on desired outcomes
     (reward/cost, constraints, etc.)
Reinforcement Learning
  • Controller directly learned from data via exploration and exploitation
  • Both continuous and binary/sparse rewards
  • Constraints imposed via penalties
  • Mostly parameterized controllers; Deep Learning integrated cheaply
  • Usually history included in the definition of the state

Model Predictive Control
  • System identification precedes control implementation; the model is fixed during execution
  • Typically convex stage costs
  • Constraints imposed explicitly
  • Online optimization over the prediction horizon - expensive?
  • Usually combined with a state estimator
Linear Quadratic Regulators (LQR)
The most basic optimal controller for LTI systems. Consider the following system

    ẋ = Ax(t) + Bu(t)

where the state x(t) ∈ R^n and the control input u(t) ∈ R^m. The system is associated with the infinite-horizon quadratic cost function

    V(x(t_0), t_0) = ∫_{t_0}^{∞} (x^T(τ) Q x(τ) + u^T(τ) R u(τ)) dτ

with weighting matrices Q ≥ 0 and R > 0.
  • it is assumed that (A, B) is stabilizable - there exists a control input that makes the system stable
  • (A, √Q) is detectable - unstable modes are observable through the output y = √Q x
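As a quick numerical check of these assumptions (not part of the slides), the stabilizability of (A, B) and the detectability of (A, √Q) can be tested with the PBH rank test; the double-integrator matrices below are a hypothetical example.

```python
# Minimal sketch: PBH rank tests for stabilizability/detectability.
# The matrices are a hypothetical double-integrator example.
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
C = np.real(sqrtm(Q))        # "output" y = sqrt(Q) x used in the detectability condition

n = A.shape[0]
for lam in np.linalg.eigvals(A):
    if lam.real >= 0:        # only unstable/marginal modes need to be checked
        stab = np.linalg.matrix_rank(np.hstack([lam * np.eye(n) - A, B])) == n
        detect = np.linalg.matrix_rank(np.vstack([lam * np.eye(n) - A, C])) == n
        print(f"mode {lam:+.2f}: stabilizable={stab}, detectable={detect}")
```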
Linear Quadratic Regulators (LQR)
The LQR optimal control problem requires finding the policy that minimizes the cost

    u*(t) = arg min_{u(t), t_0 ≤ t ≤ ∞} V(t_0, x(t_0), u(t))

The solution is given by u(t) = −Kx(t), where the gain matrix is

    K = R^{-1} B^T P

and P is the positive definite solution of the Algebraic Riccati Equation (ARE)

    A^T P + PA + Q − PBR^{-1}B^T P = 0

  • under the stabilizability and detectability conditions there is a unique positive semi-definite solution
  • the closed-loop system A − BK is asymptotically stable
  • this is an offline solution that requires complete knowledge of the system dynamics
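The ARE above is an offline computation, so it is straightforward to reproduce numerically. A minimal sketch (not from the slides) using SciPy's ARE solver on the same hypothetical double-integrator example:

```python
# Minimal sketch: offline LQR design by solving A^T P + P A + Q - P B R^{-1} B^T P = 0.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                      # state weighting, Q >= 0
R = np.array([[1.0]])              # control weighting, R > 0

P = solve_continuous_are(A, B, Q, R)      # positive definite ARE solution
K = np.linalg.solve(R, B.T @ P)           # optimal gain, control law u = -K x

# The closed-loop matrix A - B K should be Hurwitz (asymptotically stable).
print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```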
Linear Quadratic Zero-sum Games
The LQ zero-sum game has the following linear dynamics

    ẋ = Ax(t) + Bu(t) + Dd(t)

where the state x(t) ∈ R^n, the control input u(t) ∈ R^m, and the disturbance d(t) ∈ R^k. The system is associated with the infinite-horizon quadratic cost function

    V(x(t), u, d) = (1/2) ∫_t^∞ (x^T Qx + u^T Ru − γ²‖d‖²) dτ ≡ ∫_t^∞ r(x, u, d) dτ

with the control weighting matrix R = R^T > 0 and a scalar γ > 0.
Linear Quadratic Zero-sum Games
The LQ zero-sum game requires finding the control policy that minimizes the cost with respect to the control and maximizes it with respect to the disturbance

    V*(x(0)) = min_u max_d J(x(0), u, d)
             = min_u max_d ∫_0^∞ (x^T Qx + u^T Ru − γ²‖d‖²) dt

The solution of this optimal control problem is given by

    u(x) = −R^{-1} B^T P x = −Kx
    d(x) = (1/γ²) D^T P x = Lx

where P is the solution to the game ARE

    0 = A^T P + PA + Q − PBR^{-1}B^T P + (1/γ²) PDD^T P
Linear Quadratic Zero-sum Games
  • There exists a solution P > 0 if (A, B) is stabilizable, (A, √Q) is observable, and γ > γ*, the H-infinity gain.
  • this is an offline solution that requires complete knowledge of the system dynamics (A, B, D)
  • if the system dynamics (A, B, D) change or the performance index (Q, R, γ) varies, a new optimal control solution is needed.
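One standard offline way to obtain P (a sketch, not the method prescribed by the slides) is through the stable invariant subspace of the associated Hamiltonian matrix. The matrices below are hypothetical, and γ is assumed to be above the H-infinity level γ* so that a stabilizing solution exists.

```python
# Minimal sketch: solve the game ARE
#   0 = A^T P + P A + Q - P B R^{-1} B^T P + (1/gamma^2) P D D^T P
# via the stable invariant subspace of the Hamiltonian matrix.
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 5.0                                  # assumed > gamma*

n = A.shape[0]
S = B @ np.linalg.solve(R, B.T) - (1.0 / gamma**2) * (D @ D.T)

H = np.block([[A, -S], [-Q, -A.T]])          # Hamiltonian matrix
T, Z, sdim = schur(H, sort="lhp")            # order stable eigenvalues first
X, Y = Z[:n, :sdim], Z[n:, :sdim]
P = Y @ np.linalg.inv(X)                     # stabilizing solution P = Y X^{-1}
P = 0.5 * (P + P.T)                          # symmetrize against round-off

K = np.linalg.solve(R, B.T @ P)              # minimizing player: u = -K x
L = (1.0 / gamma**2) * (D.T @ P)             # maximizing player: d =  L x
print("ARE residual:", np.linalg.norm(A.T @ P + P @ A + Q - P @ S @ P))
```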
Model Reference Adaptive Controller (MRAC)
Consider the simple scalar case

    ẋ = ax + bu

where the state x(t) ∈ R, the control input u(t) ∈ R, and the input gain b > 0. It is desired for the plant state to follow the state of a reference model given by

    ẋ_m = −a_m x_m + b_m r

where r(t) ∈ R is the reference input signal. Take the controller structure as

    u = −kx + dr

which has a feedback term and a feedforward term. The gains k and d are unknown and are to be determined so that the state tracking error e(t) = x(t) − x_m(t) goes to zero.
Model Reference Adaptive Controller (MRAC)
Tune the controller parameters online. E.g., using Lyapunov techniques, the parameters are tuned according to

    k̇ = αex,    ḋ = −βer

where α, β > 0 are tuning parameters; then the tracking error e(t) goes to zero with time.
  • the feedback gain k is tuned by the product of the state x(t) and the tracking error e(t)
  • the feedforward gain d is tuned by the product of the reference input r(t) and the tracking error e(t)
  • the plant dynamics (a, b) are not needed in the tuning laws!
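The scalar MRAC above is easy to simulate. A minimal sketch (not from the slides; the plant, reference model, and adaptation gains are hypothetical) integrates the tuning laws k̇ = αex and ḋ = −βer with forward Euler; note that the plant parameters (a, b) never enter the adaptation:

```python
# Minimal sketch: scalar MRAC with the tuning laws k_dot = alpha*e*x, d_dot = -beta*e*r.
import numpy as np

a, b = 1.0, 2.0            # plant x_dot = a x + b u (unknown to the controller)
am, bm = 4.0, 4.0          # reference model xm_dot = -am xm + bm r
alpha, beta = 10.0, 10.0   # adaptation gains
dt, T = 1e-3, 20.0

x = xm = k = d = 0.0
for step in range(int(T / dt)):
    r = np.sign(np.sin(0.5 * step * dt))   # square-wave reference keeps the signals exciting
    e = x - xm                              # state tracking error
    u = -k * x + d * r                      # feedback + feedforward controller
    x += dt * (a * x + b * u)               # plant
    xm += dt * (-am * xm + bm * r)          # reference model
    k += dt * (alpha * e * x)               # tuning laws: no knowledge of (a, b) needed
    d += dt * (-beta * e * r)

print(f"final tracking error e = {x - xm:.4f}")
print(f"gains: k = {k:.3f} (matching value (a+am)/b = {(a+am)/b:.3f}), "
      f"d = {d:.3f} (matching value bm/b = {bm/b:.3f})")
```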
Reinforcement Learning
RL has close connections to both optimal and adaptive control.
  • allows the design of adaptive controllers that learn online, in real time
  • provides solutions to user-prescribed optimal control problems
E.g., the actor-critic structure:
  • policy evaluation, executed by the critic, determines how close to optimal the current action is
  • policy improvement, performed by the actor, modifies the control policy using the learned value function
AI/RL vs Control Terminology
RL uses max value, Control uses min cost
  • Reward of a stage → Cost of a stage
  • State value → State cost
  • Value function → Cost function
System terminology
  • Agent → Controller or decision maker
  • Action → Control or decision
  • Environment → Dynamic system
Learning/Planning terminology
  • Learning → Solving a problem with simulation
  • Self-learning → Solving a problem with simulation-based policy iteration
  • Planning vs Learning → Solving a problem with model-based vs. model-free simulation
Value Functions
  • Value functions measure the goodness of a particular state or state/action pair: how good it is for the agent to be in a particular state, or to execute a particular action at a particular state, for a given policy.
  • Optimal value functions measure the best possible goodness of states
    or state/action pairs under all possible policies.
  • Prediction: For a given policy, estimate state and state/action value
    functions
  • Control (Optimal): Estimate the optimal state and state/action value
    functions
Sequential Decision
Optimal decision
  • At the current state, apply the decision that minimizes
    Current stage cost + J*(Next state)
    where J*(Next state) is the optimal future cost, starting from the next state
  • This defines the optimal policy - an optimal control to apply at each state
Principle of Optimality
Principle of optimality
Let {u_0*, ..., u_{N−1}*} be an optimal control sequence with corresponding state sequence {x_0*, ..., x_N*}. Consider the tail subproblem that starts at x_k* at time k and minimizes over {u_k, ..., u_{N−1}} the cost-to-go from k to N

    g_k(x_k*, u_k) + Σ_{m=k+1}^{N−1} g_m(x_m, u_m) + g_N(x_N)

Then the tail optimal control sequence {u_k*, ..., u_{N−1}*} is optimal for the tail subproblem.
Dynamic Programming
Solve all the tail subproblems of a given time length using the solution of all the tail subproblems of shorter time length.

By the principle of optimality
  • Consider every possible u_k and solve the tail subproblem that starts at the next state x_{k+1} = f_k(x_k, u_k)
  • Optimize over all u_k

The DP algorithm
Start with

    J_N*(x_N) = g_N(x_N),    for all x_N

and for k = 0, ..., N − 1, let

    J_k*(x_k) = min_{u_k ∈ U_k(x_k)} [ g_k(x_k, u_k) + J_{k+1}*(f_k(x_k, u_k)) ],    for all x_k.

The optimal cost J*(x_0) is obtained at the last step: J_0*(x_0) = J*(x_0).
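The backward recursion is mechanical once the dynamics, stage costs, and terminal cost are tabulated. A minimal sketch (a hypothetical toy problem, not from the slides) with five integer states and three actions:

```python
# Minimal sketch: exact DP backward recursion
#   J_N(x) = g_N(x),   J_k(x) = min_u [ g_k(x, u) + J_{k+1}(f_k(x, u)) ]
N = 5
states = range(5)
controls = (-1, 0, 1)

def f(x, u):              # dynamics, saturated at the state-space boundary
    return min(max(x + u, 0), 4)

def g(x, u):              # stage cost: distance from target state 2 plus control effort
    return (x - 2) ** 2 + abs(u)

def gN(x):                # terminal cost
    return 10 * (x - 2) ** 2

J = [[0.0] * 5 for _ in range(N + 1)]   # tail costs J[k][x]
mu = [[0] * 5 for _ in range(N)]        # an optimal policy mu[k][x]
for x in states:
    J[N][x] = gN(x)
for k in reversed(range(N)):
    for x in states:
        costs = {u: g(x, u) + J[k + 1][f(x, u)] for u in controls}
        mu[k][x] = min(costs, key=costs.get)
        J[k][x] = costs[mu[k][x]]

print("optimal cost from x0 = 0:", J[0][0])
print("first optimal action at x0 = 0:", mu[0][0])
```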
Constraints via Infinite Cost Values
Can assign infinite cost to infeasible points, using the extended reals R̄ := R ∪ {∞, −∞}.

Constrained Optimal Control Problem

    min_{s,a}  Σ_{k=0}^{N−1} c(s_k, a_k) + E(s_N)
    s.t.  s_0 = s̄_0
          s_{k+1} = f(s_k, a_k)
          0 ≥ h(s_k, a_k),  k = 0, ..., N − 1
          0 ≥ r(s_N)

Equivalent Unconstrained Formulation

    min_{s,a}  Σ_{k=0}^{N−1} c̄(s_k, a_k) + Ē(s_N)
    s.t.  s_0 = s̄_0
          s_{k+1} = f(s_k, a_k),  k = 0, ..., N − 1

with

    c̄(s, a) = c(s, a) if h(s, a) ≤ 0, ∞ otherwise
    Ē(s) = E(s) if r(s) ≤ 0, ∞ otherwise
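In code, the reformulation amounts to wrapping the stage and terminal costs so that infeasible points return +∞. A minimal sketch (the functions c, h, E, r below are placeholders for the problem data, not names from the slides):

```python
# Minimal sketch: extended-real cost wrappers for the unconstrained reformulation.
import math

def c_bar(s, a, c, h):
    """Stage cost on the extended reals: c(s, a) if h(s, a) <= 0, else +inf."""
    return c(s, a) if h(s, a) <= 0 else math.inf

def E_bar(s, E, r):
    """Terminal cost on the extended reals: E(s) if r(s) <= 0, else +inf."""
    return E(s) if r(s) <= 0 else math.inf
```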
Model-free vs Model-based
George Box
"All models are wrong but some models are useful"
  • Due to model error, model-free methods often achieve better policies, though they are more time consuming
  • (Adaptivity) We will examine the use of (inaccurate) learned models and ways to accelerate learning without hindering the final policy
Bellman’s curse of dimensionality
  • Exact Dynamic Programming is an elegant and powerful way to solve any optimal control problem to global optimality, independent of convexity. It can be interpreted as an efficient implementation of an exhaustive search that explores all possible control actions for all possible circumstances.
  • However, it requires the tabulation of cost-to-go functions for all possible states s ∈ S. Thus, it is exactly implementable only for discrete state and action spaces, and otherwise requires a discretization of the state space. Its computational complexity grows exponentially in the state dimension. This "curse of dimensionality", a phrase coined by Richard Bellman, unfortunately makes exact DP impossible to apply to systems with larger state dimensions.
  • Classical MPC circumvents this problem by restricting itself to finding only the optimal trajectory that starts at the current state s_0.
  • Explicit MPC suffers from the same curse of dimensionality as DP.
Reinforcement Learning History
Historical highlights
  • Exact DP, Optimal Control - Bellman, Shannon, others 1950s
  • AI/RL and Decision Making ideas - late 80s and early 90s
  • Backgammon programs - Tesauro, 1992
  • Algorithm era, analysis, applications, books - mid 90s
  • Machine Learning, Big Data, Neural Networks - mid 2000s
  • AlphaGo and AlphaZero - DeepMind, 2016, 2017
  • DARPA AlphaDogfight Trials against experienced F-16 pilots - 2019, 2020
Multiagent Reinforcement Learning
OpenAI Hide and Seek game with emergent behaviours
          https://openai.com/blog/emergent-tool-use
           https://www.youtube.com/watch?v=kopoLzvh5jY
RL-based Strategical War Gaming
  • Baspinar, B., Koyuncu, E., Survivability based Optimal Air Combat Mission Planning with Reinforcement Learning, IEEE Conference on Control Technology and Applications (CCTA), Copenhagen, Denmark, August 21-24, 2018.
RL-based Tactical Air Combat
  • Baspinar, B., Koyuncu, E., Assessment of Aerial Combat Game via Optimization-Based Receding Horizon Control, IEEE Access, vol. 8, pp. 35853-35863, 2020, doi: 10.1109/ACCESS.2020.2974792.
  • Baspinar, B., Koyuncu, E., Evaluation of Two-vs-One Air Combats Using Hybrid Maneuver-Based Framework and Security Strategy Approach, Journal of Aeronautics and Space Technologies, vol. 12, no. 1, pp. 95-107, January 2019.
  • Baspinar, B., Koyuncu, E., Differential Flatness-based Optimal Air Combat Maneuver Strategy Generation, AIAA Science and Technology Forum and Exposition (AIAA SciTech 2019), San Diego, California, 7-11 January 2019.
  • Baspinar, B., Koyuncu, E., Aerial Combat Simulation Environment for One-on-One Engagement, AIAA SciTech Forum and Exposition: Modelling and Simulation Technologies, Gaylord Palms, Kissimmee, FL, 8-12 January 2018.
RL-based Fast Flight Replanning
https://www.youtube.com/watch?v=8IiLQFQ3V0E
  • Hasanzade, M., Koyuncu, E., A Dynamically Feasible Fast Replanning Strategy with Deep Reinforcement Learning, Journal of Intelligent and Robotic Systems, vol. 101, issue 1, 2021.
Course Topics
  • Introduction; Optimal Control; Adaptive Control and RL
  • RL and Optimal Control of Discrete Systems
  • RL-based Optimal Adaptive Control for Linear Systems
  • RL-based Optimal Adaptive Control for Nonlinear Systems
  • Policy iteration for continuous-time systems
  • Value iteration for continuous-time systems
  • RL-based Optimal Adaptive Control with Online Learning
  • Online Learning for Zero-sum Games and H-infinity Control
  • Online Learning for Multiplayer Non-zero-sum Games
  • RL for Zero-sum Games
Grading Policy
  • 20% Paper abstract - problem selection and presentation, in class,
    Due date is April 15.
  • 40% Submission ready paper - 6 pages, including coding
    implementation, in IFAC CPHS template - Due date is May 15, strict.
  • 40% Paper presentation, including coding implementation - online, in
    final exam week.
  • Groups of 1 to 3 people
IFAC CPHS 2024, Antalya, Turkey