
Policy Iterations on the Hamilton–Jacobi–Isaacs Equation for $H_{\infty}$ State Feedback Control With Input Saturation

Murad Abu-Khalaf, Frank L. Lewis, and Jie Huang

IEEE Transactions on Automatic Control, vol. 51, no. 12, December 2006. DOI: 10.1109/TAC.2006.884959


IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 51, NO. 12, DECEMBER 2006 1989

Policy Iterations on the Hamilton–Jacobi–Isaacs Equation for $H_{\infty}$ State Feedback Control With Input Saturation

Murad Abu-Khalaf, Frank L. Lewis, and Jie Huang

Abstract—An $H_{\infty}$ suboptimal state feedback controller for constrained input systems is derived using the Hamilton–Jacobi–Isaacs (HJI) equation of a corresponding zero-sum game that uses a special quasi-norm to encode the constraints on the input. The unique saddle point in feedback strategy form is derived. Using policy iterations on both players, the HJI equation is broken into a sequence of differential equations linear in the cost, for which closed-form solutions are easier to obtain. Policy iterations on the disturbance are shown to converge to the available storage function of the associated $L_2$-gain dissipative dynamics. The resulting constrained optimal control feedback strategy has the largest domain of validity within which $L_2$-performance for a given $\gamma$ is guaranteed.

Index Terms—Controller saturation, $H_{\infty}$ control, policy iterations, zero-sum games.

I. INTRODUCTION

In this note, we derive the Hamilton–Jacobi–Isaacs (HJI) equation for systems with input constraints and then develop an algorithm based on policy iterations to solve the obtained HJI equation. Although the formulation of nonlinear $H_{\infty}$ control theory has been well developed [4], [5], [7], [11], [17], [19], solving the corresponding HJI equation remains a challenge. Several methods have been proposed to solve the HJI equation.

Manuscript received May 25, 2005; revised December 4, 2005, May 2, 2006, and June 2, 2006. Research supported by the National Science Foundation under Grant ECS-0501451 and by the Army Research Office under Grant W91NF-05-1-0314.
M. Abu-Khalaf and F. L. Lewis are with the Automation and Robotics Research Institute, The University of Texas at Arlington, Fort Worth, TX 76118 USA (e-mail: abukhalaf@arri.uta.edu; lewis@uta.edu).
J. Huang is with the Department of Automation and Computer-Aided Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong (e-mail: jhuang@acae.cuhk.edu.hk).
Digital Object Identifier 10.1109/TAC.2006.884959

0018-9286/$20.00 © 2006 IEEE


When its solution is smooth, the HJI equation can be solved directly for the coefficients of the Taylor series expansion of the value function, as proposed in [10]. In [17], it was proven that there exists a sequence of policy iterations on the control input that pursues the smooth solution of the HJI equation. Later, in [8], policy iterations on the disturbance input were suggested in addition to policy iterations on the control input. However, the existence and stability of the disturbance policy iterations were not proven.

In this note, we have three objectives. First, we prove the existence of policy iterations on the disturbance input under certain assumptions and show their convergence to the available storage function of the associated dissipative closed-loop dynamics. Second, we give a formal solution to the suboptimal $H_{\infty}$ control problem for dynamical systems with input constraints, using a special quasi-norm to obtain a nonquadratic zero-sum game and derive the corresponding HJI equation. Third, policy iterations on both players, the control and disturbance inputs, are used to solve for the optimal strategies of the nonquadratic zero-sum game. The policy iterations method results in a sequence of linear partial differential equations whose solutions are shown to converge to the game value function that solves the HJI equation. Two scalar examples are presented to illustrate the theory.

A major contribution of this note is that the two-player policy iterations scheme generates equations that are easier to solve than the original HJI equation of the corresponding constrained input zero-sum game. In [2], a neural network method is presented to solve the linear partial differential equations resulting from the two-player policy iterations.

The two-player policy iterations scheme we present in this note is a significant improvement on our earlier work in [1], where one-player policy iterations were used to solve the HJB equation appearing in constrained input optimal control theory. The role of this note is to rigorously examine two-player policy iterations for zero-sum games that, in addition, have saturation constraints. Together, both papers rigorously connect policy iterations, a machine-learning approximate dynamic programming scheme well established in computer science [13], to optimal control theory and zero-sum game theory.

Remark 1: Necessary conditions for the existence of smooth solutions of the HJI equation for systems with no input constraints were studied earlier in [11] and [17]. Other lines of research study nonsmooth solutions of the HJI equation using the theory of viscosity solutions [5]; this notion of solution was studied for the $H_{\infty}$ control problem in [4]. The results in this note are derived under regularity assumptions, as in [11] and [17], and as in [1] for the HJB case.

II. POLICY ITERATIONS AND THE AVAILABLE STORAGE

Consider the system described by

$\dot{x} = f(x) + k(x)d, \qquad z = h(x)$   (1)

where $f(0) = 0$, $d(t)$ is a disturbance, and $z(t)$ is a fictitious output; $x = 0$ is assumed to be an equilibrium point of the system. It is said that (1) has an $L_2$-gain $\le \gamma$, $\gamma \ge 0$, if

$\int_0^T \|z(t)\|^2\,dt \le \gamma^2 \int_0^T \|d(t)\|^2\,dt$   (2)

for all $T \ge 0$ and all $d \in L_2(0,T)$, with $x(0) = 0$. Dynamical systems that are finite $L_2$-gain stable are said to be dissipative [19]. The existence of the so-called available storage function is essential in determining whether or not a system is dissipative.

Definition 1: The available storage $V_a$, when it exists, is the solution of the optimal control problem

$V_a(x) = \sup_{d(\cdot),\,T} \int_0^T \left( \|z(t)\|^2 - \gamma^2 \|d(t)\|^2 \right) dt.$

When the available storage $V_a \ge 0$ is smooth, $V_a \in C^1$, and $T \to \infty$, it solves the HJ equation

$V_{a,x}' f + \frac{1}{4\gamma^2} V_{a,x}' k k' V_{a,x} + h'h = 0, \qquad V_a(0) = 0.$   (3)

If, in addition, zero-state observability is assumed, then $V_a > 0$ and has a certain domain of validity.

Definition 2: The domain of validity (DOV) $\Omega$ of $V_a$ is the set of all $x$ satisfying (3), [9].

The next lemma is taken from [14] and [17] and is used later in the proof of Theorem 1.

Lemma 1: If (1) with $d = 0$ is asymptotically stable and in addition has an $L_2$-gain $< \gamma$, and if the available storage is smooth, then the closed-loop dynamics

$\dot{x} = f + \frac{1}{2\gamma^2} k k' V_{a,x}$   (4)

are asymptotically stable. Moreover, one can find $P(x) > 0$ and $\varepsilon(x) > 0$ such that

$P_x' f + \frac{1}{4\gamma^2} P_x' k k' P_x + h'h + \varepsilon(x) = 0$   (5)

is satisfied locally around the origin.

Proof: See [14] and [17, eq. (85)].

Equation (3) is nonlinear in $V_a(x)$; therefore, in general it is hard, if not impossible, to solve. In Theorem 1, policy iterations on $d$ are used to break (3) into a sequence of equations that are linear in $V(x)$. This type of policy iteration becomes Newton's method for solving the Riccati equation

$A'P + PA + \frac{1}{\gamma^2} P K K' P + H'H = 0$   (6)

that appears in the bounded real lemma for linear systems [15], [20].

Theorem 1: Let the system (1) be zero-state observable and locally asymptotically stable with $d = 0$, and in addition have an $L_2$-gain $< \gamma$. Assume that the available storage is a smooth function $V^\ast > 0 \in C^1$ with a DOV $\Omega^\ast$. Then, starting with $d^0 = 0$ and assuming that $V^i \in C^1$ for all $i$, there exists a sequence of policies resulting from iterations between (7) and (8)

$V_x^{i\,\prime} (f + k d^i) + h'h - \gamma^2 \|d^i\|^2 = 0$   (7)

$d^i = \frac{1}{2\gamma^2} k' V_x^{i-1}$   (8)

such that $\dot{x} = f + k d^i$ is locally asymptotically stable for all $i$. Moreover

$i \to \infty \;\Rightarrow\; \sup_{x \in \Omega^\ast} |V^i - V^\ast| \to 0$

with $0 < V^i(x) \le V^{i+1}(x)$ for all $x \in \Omega^\ast$ and $\Omega^i \supseteq \Omega^{i-1}$.
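Before the proof, the iteration (7)–(8) can be exercised on a scalar linear plant, where it reduces to Newton's method for the Riccati equation (6). The sketch below is our own illustrative check, not an example from the note: the plant $\dot{x} = ax + d$, $z = hx$ and the values $a = -1$, $h = 1$, $\gamma = 2$ are assumed. With the quadratic ansatz $V^i(x) = p^i x^2$, (7) becomes a scalar recursion in $p^i$.

```python
import math

# Illustrative scalar plant (our own choice, not from the note):
#   x' = a*x + d,  z = h*x,  with a < 0 so that x' = f(x) is stable.
a, h, gamma = -1.0, 1.0, 2.0

# With V^i(x) = p_i x^2 and d^i(x) = (p_{i-1}/gamma^2) x from (8),
# equation (7) reduces to a scalar recursion in p_i, which is Newton's
# method for the Riccati equation (6): 2*a*p + p^2/gamma^2 + h^2 = 0.
p, history = 0.0, [0.0]
for _ in range(25):
    d_gain = p / gamma**2                # disturbance feedback gain from (8)
    assert a + d_gain < 0                # closed loop x' = (a + d_gain) x stays stable
    p = (p**2 / gamma**2 - h**2) / (2.0 * (a + d_gain))   # solve (7) for p_i
    history.append(p)

# Stabilizing root of the Riccati equation, for comparison.
p_star = gamma**2 * (-a - math.sqrt(a**2 - h**2 / gamma**2))
assert all(history[i] <= history[i + 1] + 1e-12 for i in range(len(history) - 1))
assert abs(p - p_star) < 1e-12
```

The monotone nondecreasing sequence $p^0 \le p^1 \le \cdots$ mirrors $0 < V^i \le V^{i+1}$ in Theorem 1, and the stability assertion mirrors the claim that $\dot{x} = f + k d^i$ remains stable at every iteration.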
Proof: Assume that there is a $d^i$ such that $\dot{x} = f + k d^i$ is asymptotically stable. Then

$V^i(x_0) = \int_0^\infty \left( h'h - \frac{1}{4\gamma^2} V_x^{i-1\,\prime} k k' V_x^{i-1} \right) dt$   (9)

is well defined, and its infinitesimal version is

$V_x^{i\,\prime} \left( f + \frac{1}{2\gamma^2} k k' V_x^{i-1} \right) = -h'h + \frac{1}{4\gamma^2} V_x^{i-1\,\prime} k k' V_x^{i-1}$   (10)

from which one may note that $\Omega^i \supseteq \Omega^{i-1}$. Adding and subtracting terms to (5) in Lemma 1, one has

$P_x' \left( f + \frac{1}{2\gamma^2} k k' V_x^{i-1} \right) = -h'h - \varepsilon(x) + \frac{1}{4\gamma^2} V_x^{i-1\,\prime} k k' V_x^{i-1} - \frac{1}{4\gamma^2} (P_x - V_x^{i-1})' k k' (P_x - V_x^{i-1}).$   (11)

Combining (10) and (11), it follows that

$(P_x - V_x^i)' \left( f + \frac{1}{2\gamma^2} k k' V_x^{i-1} \right) = -\varepsilon(x) - \frac{1}{4\gamma^2} (P_x - V_x^{i-1})' k k' (P_x - V_x^{i-1}) < 0.$   (12)

Since the vector field $\dot{x} = f + k d^i$ is locally asymptotically stable, it follows that $P - V^i > 0$ is a Lyapunov function. To show local asymptotic stability of $\dot{x} = f + k d^{i+1}$, differentiate $V^i$ over the trajectories of $\dot{x} = f + k d^{i+1}$; noting (10), one has

$V_x^{i\,\prime}\left( f + \frac{1}{2\gamma^2} k k' V_x^i \right) = -h'h + \frac{1}{4\gamma^2} V_x^{i\,\prime} k k' V_x^i + \frac{1}{4\gamma^2} (V_x^i - V_x^{i-1})' k k' (V_x^i - V_x^{i-1})$   (13)

and, similar to (11), one has

$P_x' \left( f + \frac{1}{2\gamma^2} k k' V_x^i \right) = -h'h - \varepsilon(x) + \frac{1}{4\gamma^2} V_x^{i\,\prime} k k' V_x^i - \frac{1}{4\gamma^2} (P_x - V_x^i)' k k' (P_x - V_x^i).$   (14)

Using $P - V^i > 0$ as a Lyapunov function candidate and differentiating it along the trajectories of $\dot{x} = f + k d^{i+1}$, one obtains by combining (13) and (14)

$(P_x - V_x^i)' \left( f + \frac{1}{2\gamma^2} k k' V_x^i \right) = -\varepsilon(x) - \frac{1}{4\gamma^2} (P_x - V_x^i)' k k' (P_x - V_x^i) - \frac{1}{4\gamma^2} (V_x^i - V_x^{i-1})' k k' (V_x^i - V_x^{i-1}) < 0.$

Hence, $\dot{x} = f + k d^{i+1}$ is locally asymptotically stable. Starting with $d^0 \equiv 0$, and by asymptotic stability of $\dot{x} = f$, it follows by induction that $\dot{x} = f + k d^i$ is stable for all $i$. To show uniform convergence of $V^i$ to $V^\ast$, note that

$V_x^{i+1\,\prime}(f + k d^{i+1}) = -h'h + \gamma^2 \|d^{i+1}\|^2$
$V_x^{i\,\prime} f = -V_x^{i\,\prime} k d^i - h'h + \gamma^2 \|d^i\|^2$
$V_x^{i\,\prime} k = 2\gamma^2 d^{i+1\,\prime}.$

Integrating $\dot{V}^i$ and $\dot{V}^{i+1}$ over the state trajectory of $\dot{x} = f + k d^{i+1}$ for $x_0 \in \Omega^i \cap \Omega^{i+1}$, it follows that

$V^{i+1}(x_0) - V^i(x_0) = -\int_0^\infty \left( \dot{V}^{i+1} - \dot{V}^i \right) dt$
$= -\int_0^\infty \left( V_x^{i+1\,\prime}(f + k d^{i+1}) - V_x^{i\,\prime}(f + k d^{i+1}) \right) dt$
$= \int_0^\infty \gamma^2 \left( \|d^i\|^2 + 2 d^{i+1\,\prime}(d^{i+1} - d^i) - \|d^{i+1}\|^2 \right) dt$
$= \int_0^\infty \gamma^2 \|d^{i+1} - d^i\|^2 \, dt \ge 0$

and hence pointwise convergence to the solution of (3) follows. Since $\Omega^\ast$ is compact, uniform convergence of $V^i$ to $V^\ast$ on $\Omega^\ast$ follows from Dini's theorem [3]. Finally, zero-state observability guarantees that $V^0(x) > 0$.

Note that for linear systems, $V^i(x)$ is a quadratic function for all $i$.

Lemma 2: Let the $L_2$-gain of (1) be $\gamma^\ast$ with $\gamma_1 \ge \gamma_2 > \gamma^\ast$. If the available storages $V_{\gamma_1}^\ast \in C^1$ and $V_{\gamma_2}^\ast \in C^1$ solve (3) with $\gamma_1$ and $\gamma_2$, respectively, then $V_{\gamma_1}^\ast \le V_{\gamma_2}^\ast$ with $\Omega_{\gamma_1}^\ast \supseteq \Omega_{\gamma_2}^\ast$.

Proof: For $\gamma_2$, the available storage $V_{\gamma_2}^\ast$ satisfies

$V_{\gamma_2,x}^{\ast\,\prime} f + \frac{1}{4\gamma_2^2} V_{\gamma_2,x}^{\ast\,\prime} k k' V_{\gamma_2,x}^\ast + h'h = 0.$

Replacing $\gamma_2$ with $\gamma_1$, one has

$V_{\gamma_2,x}^{\ast\,\prime} f + \frac{1}{4\gamma_1^2} V_{\gamma_2,x}^{\ast\,\prime} k k' V_{\gamma_2,x}^\ast + h'h \le 0.$

$V_{\gamma_2}^\ast$ is now a possible storage function for (1) with gain $\gamma_1$. Therefore, $V_{\gamma_1}^\ast \le V_{\gamma_2}^\ast$ and $\Omega_{\gamma_2}^\ast \subseteq \Omega_{\gamma_1}^\ast$.

The following example illustrates the policy iterations theory for solving for the available storage.

Example 1: Consider the nonlinear system

$\dot{x} = -x^3 + d, \qquad z = x^3.$   (15)

The corresponding HJ equation is

$V_x(-x^3) + \frac{1}{4\gamma^2} V_x^2 + x^6 = 0.$   (16)

The available storage is $V(x) = 2\gamma^2 \left( 1 - (1 - \gamma^{-2})^{1/2} \right) x^4 / 4$. Note that the available storage ceases to exist for $\gamma < 1$.
Hence, the $L_2$-gain is equal to 1. Note that the closed-loop dynamics with $d = (1 - (1 - \gamma^{-2})^{1/2}) x^3$ are

$\dot{x} = -(1 - \gamma^{-2})^{1/2} x^3$   (17)

and hence asymptotically stable for $\gamma > 1$.

To solve the HJ equation (16) by policy iterations, note that $V^i(x) = p^i x^4$ with $p^i$ a constant. Hence, (7) gives

$4 p^i x^3 \left( -x^3 + \frac{2}{\gamma^2} p^{i-1} x^3 \right) + x^6 - \gamma^2 \left( \frac{2}{\gamma^2} p^{i-1} x^3 \right)^2 = 0$   (18)

which is equivalent to

$p^i \left( -4 + 8\gamma^{-2} p^{i-1} \right) + 1 - 4\gamma^{-2} (p^{i-1})^2 = 0.$   (19)

For the case when $\gamma = 2$, $V(x) = (2 - \sqrt{3}) x^4$. Iterating on (19) with $p^0 = 0$ converges to $p^\infty = 2 - \sqrt{3}$.

Note that for arbitrary $f(x)$, $k(x)$, and $h(x)$, an analytical solution of the HJ equation (3) is not possible in general. However, one may use techniques such as neural networks to obtain a closed-form approximation of the exact solution to (7) over a domain of the state space [2].

III. $L_2$-GAIN OF NONLINEAR CONTROL SYSTEMS WITH INPUT SATURATION

Consider the following nonlinear system:

$\Sigma : \quad \dot{x} = f(x) + g(x)u + k(x)d, \qquad \|z\|^2 = \|h\|^2 + \|u\|^2$   (20)

where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $d \in \mathbb{R}^q$, $f(0) = 0$, $x = 0$ is an equilibrium point of the system, $z(t)$ is a fictitious output, $d(t) \in L_2[0,\infty)$ is the disturbance, and $u(t) \in U$ is the control, with $U$ defined as

$U = \left\{ u(t) \in L_2[0,\infty) : |u_i| \le \bar{u}_i,\; i = 1,\ldots,m \right\}.$

In the $L_2$-gain problem, one is interested in a $u$ which, for some prescribed $\gamma$ and $x(0) = 0$, renders

$\int_0^\infty \left( h'h + \|u\|^2 - \gamma^2 \|d\|^2 \right) dt$   (21)

nonpositive for all $d(t) \in L_2(0,\infty)$; in other words

$\int_0^\infty \|z(t)\|^2\,dt \le \gamma^2 \int_0^\infty \|d(t)\|^2\,dt.$   (22)

It is well known [7] that this problem is equivalent to the solvability of the zero-sum game

$V^\ast(x_0) = \min_{u \in U} \max_{d} \int_0^\infty \left( h'h + \|u(t)\|^2 - \gamma^2 \|d\|^2 \right) dt.$   (23)

Note that this is a challenging constrained optimization, since the minimization of the Hamiltonian with respect to $u$ is constrained, $u \in U$. To confront this constrained optimization problem, we propose the use of a quasi-norm to transform (23) into

$V^\ast(x_0) = \min_{u} \max_{d} \int_0^\infty \left( h'h + \|u\|_q^2 - \gamma^2 \|d\|^2 \right) dt$   (24)

where applying the stationarity conditions for minimizing over $u$ becomes direct. In this case, $\|u\|_q^2$ is defined for $u \in U$. See [1] and [16] for similar work done in the framework of HJB equations.

Definition 3: A quasi-norm $\|\cdot\|_q$ on a vector space $X$ has the following properties:

$\|x\|_q = 0 \Leftrightarrow x = 0; \qquad \|x + y\|_q \le \|x\|_q + \|y\|_q; \qquad \|x\|_q = \|{-x}\|_q.$

This definition is weaker than that of a norm, in which the third property is replaced by homogeneity, $\|\alpha x\|_q = |\alpha| \|x\|_q$ for all $\alpha \in \mathbb{R}$, [3]. A suitable quasi-norm to confront input saturation is

$\|u\|_q^2 = 2 \int_0^u \phi^{-1}(v)\,dv = 2 \sum_{k=1}^m \int_0^{u_k} \phi^{-1}(v)\,dv$   (25)

where $\|u\|_q^2 \in C^1$, $\phi$ is one-to-one, and $\phi^{-1}$ is assumed to be monotonically increasing. This implies the following lemma.

Lemma 3: Let $a$ and $b$ belong to the domain of $\phi^{-1}(\cdot)$. If $\phi^{-1}(\cdot)$ is monotonically increasing, then [1]

$\int_b^a \phi^{-1}(v)\,dv - \phi^{-1}(b)'(a - b) > 0 \qquad \forall a \ne b.$

An example is the use of $\phi(\cdot) = \tanh(\cdot)$ when $|u| \le 1$. In this case, the range of $\phi(\cdot)$ and the domain of $\phi^{-1}(u)$ is $(-1, 1)$, therefore satisfying the constraints.

Substituting (25) in (24) implies

$\int_0^\infty \left( h'h + 2 \int_0^u \phi^{-1}(v)\,dv \right) dt \le \gamma^2 \int_0^\infty \|d\|^2\,dt.$   (26)

IV. THE HJI EQUATION AND THE SADDLE POINT

Equation (24) is a zero-sum game with feedback strategy information structure for both players [7]. It is shown in Lemma 4 that Isaacs's condition is satisfied and there is a unique saddle point solving the finite-horizon zero-sum game

$V^\ast(x_0, T) = \min_{u} \max_{d} \int_0^T \left( h'h + 2 \int_0^u \phi^{-1}(v)\,dv - \gamma^2 \|d\|^2 \right) dt.$   (27)
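For $\phi(\cdot) = \tanh(\cdot)$, the integrals in (25) have the closed form $\int_0^u \tanh^{-1}(v)\,dv = u \tanh^{-1}(u) + \frac{1}{2}\ln(1 - u^2)$, which makes the quasi-norm and the strict inequality of Lemma 3 easy to check numerically. A minimal sketch (the sample points are our own choice):

```python
import math

def F(u):
    # F(u) = integral_0^u atanh(v) dv = u*atanh(u) + 0.5*ln(1 - u^2), for |u| < 1
    return u * math.atanh(u) + 0.5 * math.log(1.0 - u * u)

def quasi_norm_sq(u):
    # ||u||_q^2 from (25) with phi = tanh, applied componentwise
    return 2.0 * sum(F(uk) for uk in u)

# Properties from Definition 3: zero only at the origin, symmetric under negation.
assert quasi_norm_sq([0.0, 0.0]) == 0.0
assert quasi_norm_sq([0.3, -0.7]) == quasi_norm_sq([-0.3, 0.7])

# Lemma 3: integral_b^a atanh(v) dv - atanh(b)*(a - b) > 0 for all a != b,
# i.e., strict convexity of F, since F''(u) = 1/(1 - u^2) > 0.
for a in (-0.9, -0.2, 0.4, 0.8):
    for b in (-0.5, 0.1, 0.6):
        if a != b:
            assert F(a) - F(b) - math.atanh(b) * (a - b) > 0.0
```

The Lemma 3 inequality is exactly the Bregman-type gap of the strictly convex antiderivative $F$, which is what makes the completing-the-squares argument of the next section work.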
The Hamiltonian of the game (27) is

$H(x, p, u, d) = p'(f + gu + kd) + h'h + 2 \int_0^u \phi^{-1}(v)\,dv - \gamma^2 \|d\|^2.$   (28)

From Lemma 3, Isaacs's condition follows, as shown in the next lemma.

Lemma 4: For the Hamiltonian (28), Isaacs's condition is satisfied: $\min_u \max_d H = \max_d \min_u H$.

Proof: Applying the stationarity conditions $\partial H / \partial u = 0$ and $\partial H / \partial d = 0$ to (28) gives

$2\phi^{-1}(u^\ast) + g(x)'p = 0 \;\Rightarrow\; u^\ast(x) = -\phi\!\left( \tfrac{1}{2} g(x)' p \right), \qquad d^\ast(x) = \frac{1}{2\gamma^2} k(x)' p.$   (29)

Defining

$H^\ast(x, p, u^\ast, d^\ast) = p'f - 2\phi^{-1}(u^\ast)' u^\ast + h'h + 2 \int_0^{u^\ast} \phi^{-1}(v)\,dv + \frac{1}{4\gamma^2} p' k k' p$   (30)

and rewriting (28) in terms of (30) gives

$H(x, p, u, d) = H^\ast(x, p, u^\ast, d^\ast) - \gamma^2 \|d - d^\ast\|^2 + 2\left( \int_{u^\ast}^{u} \phi^{-1}(v)\,dv - \phi^{-1}(u^\ast)'(u - u^\ast) \right)$

which is a valid expression for all $d$ and all $u \in U$. From Lemma 3, one has

$H(x_0, u^\ast, d) \le H(x_0, u^\ast, d^\ast) \le H(x_0, u, d^\ast)$   (31)

and Isaacs's condition follows.

Under regularity assumptions, from [7, Th. 2.6], there exists $V^\ast(x_0) \in C^1$ solving the HJI equation; then $V(x_0; u^\ast, d) \le V(x_0; u^\ast, d^\ast) \le V(x_0; u, d^\ast)$, the zero-sum game has a value, and the pair of policies (29) is in saddle point equilibrium.

For the infinite horizon game, as $T \to \infty$ in (27), one obtains the following Isaacs equation:

$H^\ast(x, V_x, u^\ast, d^\ast) = V_x'(f + g u^\ast + k d^\ast) + h'h + 2 \int_0^{u^\ast} \phi^{-1}(v)\,dv - \gamma^2 \|d^\ast\|^2 = 0, \qquad V(0) = 0.$   (32)

On substitution of (29) in (32), the HJI equation for constrained input systems is obtained

$V_x' f - V_x' g\, \phi\!\left( \tfrac{1}{2} g' V_x \right) + h'h + 2 \int_0^{-\phi\left( \frac{1}{2} g' V_x \right)} \phi^{-1}(v)\,dv + \frac{1}{4\gamma^2} V_x' k k' V_x = 0, \qquad V(0) = 0.$   (33)

Next, it is shown that (29) remains a saddle point equilibrium as $T \to \infty$ if the policies are sought among finite energy strategies. See [6] and [12] for unconstrained policies.

Theorem 2: Suppose that there exists a $V(x) \in C^1$ satisfying the HJI equation (33) and that

$\dot{x} = f - g\, \phi\!\left( \tfrac{1}{2} g' V_x \right) + \frac{1}{2\gamma^2} k k' V_x$   (34)

is locally asymptotically stable. Then

$u^\ast(x) = -\phi\!\left( \tfrac{1}{2} g' V_x \right), \qquad d^\ast(x) = \frac{1}{2\gamma^2} k' V_x$   (35)

are in saddle point equilibrium for the infinite horizon game among strategies $u \in U$, $d \in L_2[0,\infty)$.

Proof: The proof is made by completing the squares

$J_T(u, d; x_0) = \int_0^T \left( h'h + \|u(t)\|_q^2 - \gamma^2 \|d\|^2 \right) dt$
$= \int_0^T \left( h'h + \|u(t)\|_q^2 - \gamma^2 \|d\|^2 \right) dt + V^\ast(x_0) - V^\ast(x_T) + \int_0^T \dot{V}^\ast\,dt$
$= \int_0^T \left( h'h + \|u\|_q^2 - \gamma^2 \|d\|^2 \right) dt + V^\ast(x_0) - V^\ast(x_T) + \int_0^T V_x^{\ast\,\prime}(f + gu + kd)\,dt$
$= \int_0^T \left( 2 \int_{u^\ast}^{u} \phi^{-1}(v)\,dv - 2\phi^{-1}(u^\ast)'(u - u^\ast) - \gamma^2 \|d - d^\ast\|^2 \right) dt + V^\ast(x_0) - V^\ast(x_T)$   (36)

where $V^\ast$ solves (33). Since $u(t), d(t) \in L_2[0,\infty)$, and since the game has a finite value as $T \to \infty$, this implies that $x(t) \in L_2[0,\infty)$; therefore $x(t) \to 0$, $V^\ast(x(\infty)) = 0$, and

$J_\infty(u, d; x_0) = V^\ast(x_0) + \int_0^\infty \left( 2 \int_{u^\ast}^{u} \phi^{-1}(v)\,dv - 2\phi^{-1}(u^\ast)'(u - u^\ast) - \gamma^2 \|d - d^\ast\|^2 \right) dt.$   (37)

Using Lemma 3, $u^\ast$ and $d^\ast$ are in saddle point equilibrium in the class of finite energy strategies.

Since (35) satisfies the Isaacs equation, it can be shown that the feedback saddle point is unique in the sense that it is strongly time consistent and noise insensitive [6].

Example 2: Consider the following nonlinear system:

$\dot{x} = -x^3 + u + d, \qquad -1 \le u \le 1$
$\|z\|^2 = -\ln\!\left( 1 - \tanh^2(2x^3) \right) + 2 \int_0^u \tanh^{-1}(v)\,dv.$   (38)
Note that $h'(x)h(x) = -\ln[1 - \tanh^2(2x^3)] \ge 0$, vanishing only at $x = 0$, and is monotonically increasing in $|x|$. It follows that the HJI equation (33) in this case is given by

$0 = V_x(-x^3) + V_x \tanh(-0.5 V_x) + 2 \int_0^{\tanh(-0.5 V_x)} \tanh^{-1}(v)\,dv + \frac{1}{4\gamma^2} V_x^2 - \ln\!\left[ 1 - \tanh^2(2x^3) \right]$

$0 = V_x(-x^3) + V_x \tanh(-0.5 V_x) + 2 \tanh(-0.5 V_x) \tanh^{-1}\!\left( \tanh(-0.5 V_x) \right) + \ln\!\left[ 1 - \tanh^2(-0.5 V_x) \right] + \frac{1}{4\gamma^2} V_x^2 - \ln\!\left[ 1 - \tanh^2(2x^3) \right]$

$0 = V_x(-x^3) + \ln\!\left[ 1 - \tanh^2(-0.5 V_x) \right] + \frac{1}{4\gamma^2} V_x^2 - \ln\!\left[ 1 - \tanh^2(2x^3) \right].$   (39)

Assume that $\gamma = 1$; then the available storage of the HJI equation exists and is given by $V(x) = x^4$, and the closed-loop dynamics

$\dot{x} = f - g\, \phi\!\left( \tfrac{1}{2} g' V_x \right) + \frac{1}{2\gamma^2} k k' V_x = x^3 - \tanh(2x^3)$   (40)

are locally asymptotically stable; hence, the $L_2$-gain is $< 1$.

Note that for arbitrary $f(x)$, $g(x)$, $k(x)$, and $h(x)$, obtaining an analytical solution to the HJI equation (33) is not possible in general. In the next section, a policy iterations technique, as done in Section II, is proposed that reduces the solution of the HJI equation to an easier-to-solve iterative equation similar to (7).

V. SOLVING THE HJI USING POLICY ITERATIONS

To solve (33) by policy iterations, we start by showing the existence and convergence of control policy iterations on the constrained input, similar to work done on systems with no input constraints in [17]. Then, policy iterations on both players are performed on the constrained control policy and the disturbance policy.

Lemma 5: Assume that the closed-loop dynamics for the constrained stabilizing controller $u_j$

$\dot{x} = f(x) + g(x)u_j + k(x)d \triangleq f_j(x) + k(x)d$

have an $L_2$-gain $< \gamma$ with the associated available storage $V_j \in C^1$ solving

$V_{j,x}' f_j + h'h + 2 \int_0^{u_j} \phi^{-1}(v)\,dv + \frac{1}{4\gamma^2} V_{j,x}' k k' V_{j,x} = 0.$   (41)

Furthermore, assume that (20) is zero-state observable. Then, the updated control policy $u_{j+1} = -\phi\!\left( \tfrac{1}{2} g' V_{j,x} \right)$ guarantees that the closed-loop dynamics $\dot{x} = f_{j+1} + kd$ will have an $L_2$-gain $\le \gamma$ and that $\dot{x} = f_{j+1}$ is asymptotically stable. It also implies that if $V_{j+1} \in C^1$, then $V_{j+1} \le V_j$.

Proof: Note that, since $V_{j,x}' g = -2\phi^{-1}(u_{j+1})'$,

$V_{j,x}' f_{j+1} = -h'h - 2 \int_0^{u_{j+1}} \phi^{-1}(v)\,dv - \frac{1}{4\gamma^2} V_{j,x}' k k' V_{j,x} + 2 \left( \int_{u_j}^{u_{j+1}} \phi^{-1}(v)\,dv - \phi^{-1}(u_{j+1})'(u_{j+1} - u_j) \right).$

From Lemma 3 (with $a = u_j$ and $b = u_{j+1}$), the bracketed term is negative, so it follows that

$V_{j,x}' f_{j+1} + h'h + 2 \int_0^{u_{j+1}} \phi^{-1}(v)\,dv + \frac{1}{4\gamma^2} V_{j,x}' k k' V_{j,x} \le 0$

with $V_j$ a possible storage for $\dot{x} = f_{j+1}$, which by zero-state observability is asymptotically stable, and the available storage for $\dot{x} = f_{j+1}$ is such that $V_{j+1} \le V_j$.

Theorem 3: Assume that the value function of the game is smooth, $V^\ast \in C^1$, and solves (33) with the property that $\dot{x} = f - g\,\phi\!\left( \tfrac{1}{2} g' V_x^\ast \right) + \frac{1}{2\gamma^2} k k' V_x^\ast$ is asymptotically stable. Assume also that for all $j$, $\dot{x} = f_j$ is asymptotically stable with $V_j \in C^1$ solving (41), and $\dot{x} = f + g u_j + \frac{1}{2\gamma^2} k k' V_{j,x}$ is asymptotically stable. Then $j \to \infty \Rightarrow \sup_{x \in \Omega^\ast} |V_j - V^\ast| \to 0$. Moreover, $V^\ast$ has the largest DOV of any constrained controller that has an $L_2$-gain $< \gamma$.

Proof: From Lemma 5, $V_{j+1} \le V_j$. Hence, $V_j$ converges pointwise to $V^\ast$, and since $\Omega^\ast$ is compact, uniform convergence of $V_j$ to $V^\ast$ on $\Omega^\ast$ follows by Dini's theorem [3]. $V_{j+1}$ is valid on $\Omega_j$ and, hence, valid on $\Omega_0$. Therefore, $V^\ast$ is valid for any $\Omega_0$.

The last part of Theorem 3 implies that $u^\ast$ has the largest region of asymptotic stability of any constrained controller that is finite $L_2$-gain stable for a prescribed $\gamma$.

Combining Theorem 1 with Theorem 3, one obtains a two-loop policy iterations solution method for the HJI equation (33). Specifically, select $u_j$ and find the $V_j$ that solves (41) by inner-loop policy iterations on the disturbance, as in Theorem 1, until $V_j^i \to V_j$, by solving

$V_{j,x}^{i\,\prime} (f_j + k d^i) + h'h + 2 \int_0^{u_j} \phi^{-1}(v)\,dv - \gamma^2 \|d^i\|^2 = 0.$   (42)

Then, by Theorem 3, use $u_{j+1} = -\phi\!\left( \tfrac{1}{2} g' V_{j,x}^\infty \right)$ in outer-loop policy iterations on the constrained control.

It is important to note that one may use techniques such as neural networks to obtain a closed-form approximation of the exact solution to (42) over a domain of the state space. See [2] for a successful implementation on the nonlinear benchmark problem.

Controllers derived using (33) for a fixed $\gamma$ are suboptimal $H_\infty$ controllers. Optimal $H_\infty$ control is achieved for the lowest possible $\gamma^\ast$ for which the HJI equation is solvable. It is straightforward to show that the DOVs of the game value functions $V_{\gamma_1}^\ast$ and $V_{\gamma_2}^\ast$ are such that $\Omega_{\gamma_1}^\ast \supseteq \Omega_{\gamma_2}^\ast$ for $\gamma_1 \ge \gamma_2 > \gamma^\ast$, with $\gamma^\ast$ being the smallest gain for which a stabilizing solution of the HJI equation (33) exists.

VI. CONCLUSION

The constrained input HJI equation, along with two-player policy iterations, provides a sequence of differential equations for which approximate closed-form solutions are easier to obtain. The presented method can be combined with neural networks to obtain least-squares solutions of the HJI equation, therefore providing a practical method to derive $L_2$-gain optimal, or suboptimal $H_\infty$, controllers for nonlinear systems that are affine in the input and have actuator saturation. The method requires the problem to possess a smooth solution of the HJI equation. This is an extension of our earlier work on HJB equations [1].

REFERENCES

[1] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, no. 5, pp. 779–791, 2005.
[2] M. Abu-Khalaf, F. L. Lewis, and J. Huang, "Neural network $H_\infty$ state feedback control with actuator saturation: The nonlinear benchmark problem," in Proc. 5th Int. Conf. Control and Automation, Budapest, Hungary, Jun. 2005, pp. 1–9.
[3] T. Apostol, Mathematical Analysis. Reading, MA: Addison-Wesley, 1974.
[4] J. Ball and W. Helton, "Viscosity solutions of Hamilton-Jacobi equations arising in nonlinear $H_\infty$-control," J. Math. Syst., Estimat., Control, vol. 6, no. 1, pp. 1–22, 1996.
[5] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations. Boston, MA: Birkhäuser, 1997.
[6] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd ed. Philadelphia, PA: SIAM, 1999.
[7] T. Başar and P. Bernhard, $H_\infty$ Optimal Control and Related Minimax Design Problems. Boston, MA: Birkhäuser, 1995.
[8] R. Beard and T. McLain, "Successive Galerkin approximation algorithms for nonlinear optimal and robust control," Int. J. Control, vol. 71, no. 5, pp. 717–743, 1998.
[9] G. Bianchini, R. Genesio, A. Parenti, and A. Tesi, "Global $H_\infty$ controllers for a class of nonlinear systems," IEEE Trans. Autom. Control, vol. 49, no. 2, pp. 244–249, Feb. 2004.
[10] J. Huang and C. F. Lin, "Numerical approach to computing nonlinear $H_\infty$ control laws," J. Guid., Control, Dyn., vol. 18, no. 5, pp. 989–994, 1995.
[11] A. Isidori and A. Astolfi, "Disturbance attenuation and $H_\infty$-control via measurement feedback in nonlinear systems," IEEE Trans. Autom. Control, vol. 37, no. 9, pp. 1283–1293, Sep. 1992.
[12] D. Jacobson, "On values and strategies for infinite-time linear quadratic games," IEEE Trans. Autom. Control, vol. 22, no. 3, pp. 490–491, Mar. 1977.
[13] J. Si, A. Barto, W. Powell, and D. Wunsch, Handbook of Learning and Approximate Dynamic Programming. New York: Wiley-IEEE Press, 2004.
[14] H. Knobloch, A. Isidori, and D. Flockerzi, Topics in Control Theory. Boston, MA: Springer-Verlag, 1993.
[15] P. Lancaster and L. Rodman, Algebraic Riccati Equations. New York: Oxford Univ. Press, 1995.
[16] S. E. Lyshevski, "Role of performance functionals in control laws design," in Proc. Amer. Control Conf., 2001, pp. 2400–2405.
[17] A. J. van der Schaft, "$L_2$-gain analysis of nonlinear systems and nonlinear state feedback $H_\infty$ control," IEEE Trans. Autom. Control, vol. 37, no. 6, pp. 770–784, Jun. 1992.
[18] ——, $L_2$-Gain and Passivity Techniques in Nonlinear Control. London, U.K.: Springer-Verlag, 1999.
[19] J. C. Willems, "Dissipative dynamical systems part I-II: Linear systems with quadratic supply rates," Arch. Rational Mech. Anal., vol. 45, no. 1, pp. 321–393, 1972.
[20] K. Zhou and J. Doyle, Essentials of Robust Control. New York: Prentice-Hall, 1997.