
Nash Equilibria for Stochastic Games with Asymmetric Information - Part 1: Finite Games

Ashutosh Nayyar, Abhishek Gupta, Cédric Langbort and Tamer Başar
arXiv:1209.3549v1 [cs.GT] 17 Sep 2012

Abstract

A model of stochastic games where multiple controllers jointly control the evolution of the state of a dynamic system but have access to different information about the state and action processes is considered. The asymmetry of information among the controllers makes it difficult to compute or characterize Nash equilibria. Using the common information among the controllers, the game with asymmetric information is shown to be equivalent to another game with symmetric information. Further, under certain conditions, a Markov state is identified for the equivalent symmetric information game and its Markov perfect equilibria are characterized. This characterization provides a backward induction algorithm to find Nash equilibria of the original game with asymmetric information in pure or behavioral strategies. Each step of this algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class of Nash equilibria of the original game that can be characterized in this backward manner is named common information based Markov perfect equilibria.

Index Terms

Stochastic Games, Nash equilibrium, Markov Perfect Equilibrium, Backward Induction

I. INTRODUCTION

Stochastic games model situations where multiple players jointly control the evolution of the state of a stochastic dynamic system, with each player trying to minimize its own costs. Stochastic games where all players have perfect state observation are well-studied [1]-[5]. In such games, the symmetry of information among players implies that they all share the same uncertainty about the future states and future payoffs. However, a number of games arising in

The authors are with Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. Email:
{anayyar,gupta54,langbort,basar1}@illinois.edu


communication systems, queuing systems, economics, and in models of adversarial interactions in control and communication systems involve players with different information about the state and action processes. Due to the asymmetry of information, the players have different beliefs about the current state and different uncertainties about future states and payoffs. As a result, the analytical tools for finding Nash equilibria for stochastic games with perfect state observation cannot be directly employed for games with asymmetric information.
In the absence of a general framework for stochastic games with asymmetric information, several special models have been studied in the literature. In particular, zero-sum differential games with linear dynamics and quadratic payoffs where the two players have different observation processes were studied in [6], [7], [8]. A zero-sum differential game where one player's observation at any time includes the other player's observation was considered in [9]. A zero-sum differential game where one player has a noisy observation of the state while the other controller has no observation of the state was considered in [10]. Discrete-time non-zero-sum LQG games with one-step delayed sharing of observations were studied in [11], [12]. A one-step delayed observation and action sharing game was considered in [13]. A two-player finite game in which the players do not obtain each other's observations and control actions was considered in [14], and a necessary and sufficient condition for Nash equilibrium in terms of two coupled dynamic programs was presented.
Obtaining equilibrium solutions for stochastic games when players make independent noisy observations of the state and do not share all of their information (or even when they have access to the same noisy observation, as in [15]) has remained a challenging problem for general classes of games. Identifying classes of game structures that lead to tractable solutions or feasible solution methods is therefore an important goal in this area. In this paper, we identify one such class of nonzero-sum stochastic games and obtain a characterization of a class of Nash equilibrium strategies.
In stochastic games with perfect state observation, a subclass of Nash equilibria, namely the Markov perfect equilibria, can be obtained by backward induction. The advantage of this technique is that instead of searching for equilibrium in the (large) space of strategies, we only need to find Nash equilibria in a succession of static games of complete information.
Can a backward inductive decomposition be extended to games of asymmetric information?
The general answer to this question is negative. However, we show that there is a class of


asymmetric information games that are amenable to such a decomposition. The basic conceptual observation underlying our results is the following: the essential impediment to applying backward induction in asymmetric information games is the fact that a player's posterior beliefs about the system state and about other players' information may depend on the strategies used by the players in the past. If the nature of the system dynamics and the information structure of the game ensure that the players' posterior beliefs are strategy independent, then a backward induction argument is feasible. We formalize this conceptual argument in this paper.
We first use the common information among the controllers to show that the game with
asymmetric information is equivalent to another game with symmetric information. Further, under
the assumption of strategy independence of posterior beliefs, we identify a Markov state for the
equivalent symmetric information game and characterize its Markov perfect equilibria using
backward induction arguments. This characterization provides a backward induction algorithm
to find Nash equilibria of the original game with asymmetric information. Each step of this
algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class
of Nash equilibria of the original game that can be characterized in this backward manner is named common information based Markov perfect equilibria. For notational convenience, we
consider games with only two controllers. Our results extend to games with n > 2 controllers
in a straightforward manner.
Our work is conceptually similar to the work in [16]. The authors in [16] considered a model of finite stochastic games with a discounted infinite-horizon cost function where each player has a privately observed state. Under the assumption that player i's belief about other players' states depends only on the current state of player i and does not depend on player i's strategy, [16] presented a recursive algorithm to compute Nash equilibria. Both our model and our main assumptions differ from those in [16].

A. Notation

Random variables are denoted by upper case letters and their realizations by the corresponding lower case letters. Random vectors are denoted by upper case bold letters and their realizations by lower case bold letters. Unless otherwise stated, the state, action and observation processes are assumed to be vector valued. Subscripts are used as time indices. $X_{a:b}$ is shorthand for the vector $(X_a, X_{a+1}, \ldots, X_b)$; if $a > b$, then $X_{a:b}$ is empty. $\mathbb{P}(\cdot)$ is the probability of an event and $\mathbb{E}(\cdot)$ is the expectation of a random variable. For a collection of functions $g$, $\mathbb{P}^{g}(\cdot)$ and $\mathbb{E}^{g}(\cdot)$ denote that the probability/expectation depends on the choice of functions in $g$. Similarly, for a probability distribution $\pi$, $\mathbb{E}^{\pi}(\cdot)$ denotes that the expectation is with respect to the distribution $\pi$. The notation $\mathbb{1}_{\{a=b\}}$ denotes 1 if the equality in the subscript is true and 0 otherwise. For a finite set $A$, $\Delta(A)$ is the set of all probability mass functions over $A$. For two random variables (or random vectors) $X$ and $Y$, $\mathbb{P}(X = x \mid Y)$ denotes the conditional probability of the event $\{X = x\}$ given $Y$. This is a random variable whose realization depends on the realization of $Y$.

When dealing with collections of random variables, we will at times treat the collection as a random vector of appropriate dimension. At other times, it will be convenient to think of different collections of random variables as sets on which one can define the usual set operations. For example, consider random vectors $\mathbf{A} = (A_1, A_2, A_3)$ and $\mathbf{A}' = (A_1, A_2)$. Then, treating $\mathbf{A}$ and $\mathbf{A}'$ as sets allows us to write $\mathbf{A} \setminus \mathbf{A}' = \{A_3\}$.

B. Organization

The rest of this paper is organized as follows. We present our model of a stochastic game
with asymmetric information in Section II. We present several special cases of our model in
Section III. We prove our main results in Section IV. We extend our arguments to consider
behavioral strategies in Section V. We examine the importance of our assumptions in Section VI.
Finally, we conclude in Section VII.

II. THE BASIC GAME G1

A. The Primitive Random Variables and the Dynamic System

We consider a collection of finitely-valued, mutually independent random vectors $(X_1, W^0_1, W^0_2, \ldots, W^0_{T-1}, W^1_1, W^1_2, \ldots, W^1_T, W^2_1, W^2_2, \ldots, W^2_T)$ with known probability mass functions. These random variables are called the primitive random variables.
We consider a discrete-time dynamic system with 2 controllers. For any time $t$, $t = 1, 2, \ldots, T$, $X_t \in \mathcal{X}_t$ denotes the state of the system at time $t$ and $U^i_t \in \mathcal{U}^i_t$ denotes the control action of controller $i$, $i = 1, 2$, at time $t$. The state of the system evolves according to

$$X_{t+1} = f_t(X_t, U^1_t, U^2_t, W^0_t). \quad (1)$$

There are two observation processes: $Y^1_t \in \mathcal{Y}^1_t$, $Y^2_t \in \mathcal{Y}^2_t$, where

$$Y^i_t = h^i_t(X_t, W^i_t), \quad i = 1, 2. \quad (2)$$
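As a concrete illustration of (1) and (2), the following minimal sketch simulates one run of such a finite dynamic system. All specific choices here (binary state, actions and observations, the particular transition and channel functions, the noise probabilities) are hypothetical placeholders of our own, not taken from the paper.

```python
import random

T = 5  # horizon (illustrative)

def f(t, x, u1, u2, w0):
    # A placeholder state transition f_t as in (1); w0 is the driving noise.
    return (x + u1 + u2 + w0) % 2

def h(i, t, x, wi):
    # A placeholder observation function h^i_t as in (2): a binary channel
    # that flips the state when the noise bit wi is 1.
    return x if wi == 0 else 1 - x

x = random.randint(0, 1)                     # initial state X_1
for t in range(1, T + 1):
    w1 = int(random.random() < 0.2)          # controller 1's observation noise
    w2 = int(random.random() < 0.3)          # controller 2's observation noise
    y1, y2 = h(1, t, x, w1), h(2, t, x, w2)  # observations per (2)
    u1, u2 = y1, y2                          # placeholder control laws g^i_t
    w0 = random.randint(0, 1)                # system noise W^0_t
    x = f(t, x, u1, u2, w0)                  # state update per (1)
```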


B. The Data Available to Controllers

At any time $t$, the vector $I^i_t$ denotes the total data available to controller $i$ at time $t$. The vector $I^i_t$ is a subset of the collection of potential observables of the system at time $t$, that is, $I^i_t \subset \{Y^1_{1:t}, Y^2_{1:t}, U^1_{1:t-1}, U^2_{1:t-1}\}$. We divide the total data into two components: the private information $P^i_t$ and the common information $C_t$. Thus, $I^i_t = (P^i_t, C_t)$. As their names suggest, the common information is available to both controllers whereas private information is available only to one controller. Clearly, this separation of information into private and common parts can always be done. In some cases, the common or private information may even be empty. For example, if $I^1_t = I^2_t = \{Y^1_{1:t}, Y^2_{1:t}, U^1_{1:t-1}, U^2_{1:t-1}\}$, that is, if both controllers have access to all observations and actions, then $C_t = I^1_t = I^2_t$ and $P^1_t = P^2_t = \emptyset$. On the other hand, if $I^i_t = Y^i_{1:t}$ for $i = 1, 2$, then $C_t = \emptyset$ and $P^i_t = I^i_t$. Games where all information is common to both controllers are referred to as symmetric information games.
We denote the set of possible realizations of $P^i_t$ by $\mathcal{P}^i_t$ and the set of possible realizations of $C_t$ by $\mathcal{C}_t$. Controller $i$ chooses action $U^i_t$ as a function of the total data $(P^i_t, C_t)$ available to it. Specifically, for each controller $i$,

$$U^i_t = g^i_t(P^i_t, C_t), \quad (3)$$

where $g^i_t$, referred to as the control law at time $t$, can be any function of the private and common information. The collection $g^i = (g^i_1, \ldots, g^i_T)$ is called the control strategy of controller $i$, and the pair of control strategies for the two controllers $(g^1, g^2)$ is called a strategy profile. For a given strategy profile, the overall cost of controller $i$ is given as

$$J^i(g^1, g^2) := \mathbb{E}\Big[\sum_{t=1}^{T} c^i(X_t, U^1_t, U^2_t)\Big], \quad (4)$$

where the expectation on the right hand side of (4) is with respect to the probability measure on the state and action processes induced by the choice of strategies $g^1, g^2$ on the left hand side of (4). A strategy profile $(g^{1*}, g^{2*})$ is called a Nash equilibrium if no controller can lower its total expected cost by unilaterally changing its strategy, that is,

$$J^1(g^{1*}, g^{2*}) \le J^1(g^1, g^{2*}), \quad \text{and} \quad J^2(g^{1*}, g^{2*}) \le J^2(g^{1*}, g^2), \quad (5)$$

for all strategies $g^1, g^2$. We refer to the above game as game G1.


Remark 1 The system dynamics and the observation model (that is, the functions $f_t, h^1_t, h^2_t$ in (1) and (2)), the statistics of the primitive random variables, the information structure of the game and the cost functions are assumed to be common knowledge among the controllers.

C. Evolution of Common and Private Information

Assumption 1 We assume that the common and private information evolve over time as follows:
1) The common information $C_t$ is increasing with time, that is, $C_t \subset C_{t+1}$ for all $t$. Let $Z_{t+1} = C_{t+1} \setminus C_t$ be the increment in common information from time $t$ to $t+1$. Thus, $C_{t+1} = \{C_t, Z_{t+1}\}$. Further,

$$Z_{t+1} = \zeta_{t+1}(P^1_t, P^2_t, U^1_t, U^2_t, Y^1_{t+1}, Y^2_{t+1}), \quad (6)$$

where $\zeta_{t+1}$ is a fixed transformation.
2) The private information evolves according to the equation

$$P^i_{t+1} = \xi^i_{t+1}(P^i_t, U^i_t, Y^i_{t+1}), \quad (7)$$

where $\xi^i_{t+1}$, $i = 1, 2$, are fixed transformations.

Equation (6) states that the increment in common information is a function of the new variables generated between $t$ and $t+1$, that is, the actions taken at $t$ and the observations made at $t+1$, and the old variables that were part of the private information at time $t$. Equation (7) implies that the evolution of private information at the two controllers is influenced by different observations and actions.

D. Common Information Based Conditional Beliefs

A key concept in our analysis is the belief about the state and the private informations conditioned on the common information of both controllers. Formally, at any time $t$, given the control laws from time 1 to $t-1$, we define the common information based conditional belief as follows:

$$\Pi_t(x_t, p^1_t, p^2_t) := \mathbb{P}^{g^1_{1:t-1}, g^2_{1:t-1}}(X_t = x_t, P^1_t = p^1_t, P^2_t = p^2_t \mid C_t) \quad \text{for all } x_t, p^1_t, p^2_t, \quad (8)$$

where we use the superscript $g^1_{1:t-1}, g^2_{1:t-1}$ on the RHS of (8) to emphasize that the conditional belief depends on the past control laws. Note that $\Pi_t(\cdot, \cdot, \cdot)$ is a $|\mathcal{X}_t \times \mathcal{P}^1_t \times \mathcal{P}^2_t|$-dimensional random vector whose realization depends on the realization of $C_t$. A realization of $\Pi_t$ is denoted by $\pi_t$.
Given control laws $g^1_t, g^2_t$, we define the following partial functions:

$$\gamma^1_t = g^1_t(\cdot, C_t), \qquad \gamma^2_t = g^2_t(\cdot, C_t).$$

These partial functions are functions from the private information of a controller to its control action. These are random functions whose realizations depend on the realization of the random vector $C_t$. The following lemma describes the evolution of the common information based conditional belief using these partial functions.

Lemma 1 Consider any choice of control laws $g^1_{1:t}, g^2_{1:t}$. Let $\pi_t$ be the realization of the common information based conditional belief at time $t$, let $c_t$ be the realization of the common information at time $t$, let $\gamma^i_t = g^i_t(\cdot, c_t)$, $i = 1, 2$, be the corresponding realizations of the partial functions at time $t$, and let $z_{t+1}$ be the realization of the increment in common information (see Assumption 1). Then, the realization of the conditional belief at time $t+1$ is given as

$$\pi_{t+1} = F_t(\pi_t, \gamma^1_t, \gamma^2_t, z_{t+1}), \quad (9)$$

where $F_t$ is a fixed transformation that does not depend on the control strategies.

Proof: See Appendix A.
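To make the update $F_t$ in (9) concrete, the following sketch computes it by brute-force Bayes' rule for one instance: the one-step delayed sharing pattern of Section III-A below, where the private information is the current observation and $z_{t+1} = (y^1_t, y^2_t, u^1_t, u^2_t)$. The data structures and the assumption of finite sets are our own illustrative choices. Note that in this instance the prescriptions drop out of the update once $z_{t+1}$ is given, which is exactly the content of Assumption 2 below.

```python
from itertools import product

def belief_update(pi, gamma1, gamma2, z, P_trans, P_obs1, P_obs2, X, Y1, Y2):
    """A sketch of F_t in (9) for the one-step delayed sharing pattern,
    where P^i_t = Y^i_t and z = (y1, y2, u1, u2).

    pi:       dict (x, y1, y2) -> prob, the current belief pi_t
    gamma_i:  prescriptions mapping y_i -> u_i
    P_trans:  dict (x, u1, u2) -> dict x' -> prob
    P_obs_i:  dict x' -> dict y_i' -> prob
    Assumes z has non-zero probability under (gamma1, gamma2)."""
    y1, y2, u1, u2 = z
    # Actions must be consistent with the prescriptions; otherwise z is
    # impossible under (gamma1, gamma2).
    assert gamma1[y1] == u1 and gamma2[y2] == u2
    # Condition the belief over x_t on the revealed observations (y1, y2).
    post = {x: pi[(x, y1, y2)] for x in X}
    total = sum(post.values())
    post = {x: p / total for x, p in post.items()}
    # Propagate through the dynamics (1) and the observation channels (2).
    new_pi = {}
    for xp, y1p, y2p in product(X, Y1, Y2):
        new_pi[(xp, y1p, y2p)] = sum(
            post[x] * P_trans[(x, u1, u2)][xp] for x in X
        ) * P_obs1[xp][y1p] * P_obs2[xp][y2p]
    return new_pi
```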


Lemma 1 states that the evolution of the conditional belief $\Pi_t$ is governed by the partial functions of the control laws at time $t$. This lemma relies on Assumption 1 made earlier about the evolution of common and private information. We now introduce the following critical assumption that eliminates the dependence of $\Pi_t$ on the control laws.

Assumption 2 (Strategy Independence of Beliefs) Consider any time $t$, any choice of control laws $g^1_{1:t-1}, g^2_{1:t-1}$, and any realization of common information $c_t$ that has a non-zero probability under $g^1_{1:t-1}, g^2_{1:t-1}$. Consider any other choice of control laws $\tilde{g}^1_{1:t-1}, \tilde{g}^2_{1:t-1}$ which also gives non-zero probability to $c_t$. Then, we assume that

$$\mathbb{P}^{g^1_{1:t-1}, g^2_{1:t-1}}(X_t = x_t, P^1_t = p^1_t, P^2_t = p^2_t \mid c_t) = \mathbb{P}^{\tilde{g}^1_{1:t-1}, \tilde{g}^2_{1:t-1}}(X_t = x_t, P^1_t = p^1_t, P^2_t = p^2_t \mid c_t),$$

for all $x_t, p^1_t, p^2_t$.


Equivalently, the evolution of the common information based conditional belief described in Lemma 1 depends only on the increment in common information; that is, (9) can be written as

$$\pi_{t+1} = F_t(\pi_t, z_{t+1}), \quad (10)$$

where $F_t$ is a fixed transformation that does not depend on the control strategies.

Remark 2 Assumption 2 is somewhat related to the notion of one-way separation in stochastic control, that is, the estimation (of the state in standard stochastic control, and of the state and private information in Assumption 2) is independent of the control strategy.

III. GAMES SATISFYING ASSUMPTIONS 1 AND 2

Before proceeding with further analysis, we first describe some instances of G1 where the
nature of the dynamic system and the private and common information implies that Assumptions
1 and 2 hold.

A. One-Step Delayed Information Sharing Pattern

Consider the instance of G1 where the common information at any time $t$ is given as $C_t = \{Y^1_{1:t-1}, Y^2_{1:t-1}, U^1_{1:t-1}, U^2_{1:t-1}\}$ and the private information is given as $P^i_t = Y^i_t$. Thus, $Z_{t+1} := C_{t+1} \setminus C_t = \{Y^1_t, Y^2_t, U^1_t, U^2_t\}$. This information structure can be interpreted as the case where all observations and actions are shared among the controllers with a one-step delay.

Lemma 2 The game with the one-step delayed sharing information pattern described above satisfies Assumptions 1 and 2.

Proof: See Appendix F¹.

A special case of the above information structure is the situation where the state $X_t = (X^1_t, X^2_t)$ and controller $i$'s observation is $Y^i_t = X^i_t$. A game with this information structure was considered in [13]. It is interesting to note that Assumption 2 is not true if information is shared with delays larger than one time step [17].

¹Appendices F-J are included in the Supplementary Material section at the end of the paper.


B. Information Sharing with One-Directional One-Step Delay

Similar to the one-step delay case, we consider the situation where all observations of controller 1 are available to controller 2 with no delay, while the observations of controller 2 are available to controller 1 with a one-step delay. All past control actions are available to both controllers. That is, in this case, $C_t = \{Y^1_{1:t}, Y^2_{1:t-1}, U^1_{1:t-1}, U^2_{1:t-1}\}$, $Z_{t+1} = \{Y^1_{t+1}, Y^2_t, U^1_t, U^2_t\}$, controller 1 has no private information, and the private information of controller 2 is $P^2_t = Y^2_t$.

Lemma 3 The game with the one-directional one-step delayed sharing information pattern described above satisfies Assumptions 1 and 2.

Proof: See Appendix G.

C. State Controlled by One Controller with Asymmetric Delay Sharing

Case A: Consider the special case of G1 where the state dynamics are controlled only by controller 1, that is,

$$X_{t+1} = f_t(X_t, U^1_t, W^0_t).$$

Assume that the information structure is given as:

$$C_t = \{Y^1_{1:t}, Y^2_{1:t-d}, U^1_{1:t-1}\}, \quad P^1_t = \emptyset, \quad P^2_t = Y^2_{t-d+1:t}.$$

That is, controller 1's observations are available to controller 2 instantly, while controller 2's observations are available to controller 1 with a delay of $d \ge 1$ time steps.
Case B: Similar to the above case, consider the situation where the state dynamics are still controlled only by controller 1 but the information structure is:

$$C_t = \{Y^1_{1:t-1}, Y^2_{1:t-d}, U^1_{1:t-1}\}, \quad P^1_t = Y^1_t, \quad P^2_t = Y^2_{t-d+1:t}.$$

Lemma 4 The games described in Cases A and B satisfy Assumptions 1 and 2.

Proof: See Appendix H.


D. An Information Structure with Global and Local States

Noiseless Observations: We now consider the information structure described in [18]. In this example, the state $X_t$ has three components: a global state $X^0_t$ and a local state $X^i_t$ for each controller. The state evolution is given by the following equation:

$$X_{t+1} = f_t(X^0_t, U^1_t, U^2_t, W^0_t). \quad (11)$$

Note that the dynamics depend on the current global state $X^0_t$ but not on the current local states. Each controller has access to the global state process $X^0_{1:t}$ and its current local state $X^i_t$. In addition, each controller knows the past actions of all controllers. Thus, the common and private information in this case are:

$$C_t = \{X^0_{1:t}, U^1_{1:t-1}, U^2_{1:t-1}\}, \quad P^i_t = \{X^i_t\}.$$

It is straightforward to verify that Assumption 1 holds for this case.
For a realization $\{x^0_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}\}$ of the common information, the common information based belief in this case is given as

$$\pi_t(x^0, x^1, x^2) = \mathbb{P}^{g^1_{1:t-1}, g^2_{1:t-1}}(X^0_t = x^0, X^1_t = x^1, X^2_t = x^2 \mid x^0_{1:t}, u^1_{1:t-1}, u^2_{1:t-1})$$
$$= \mathbb{1}_{\{x^0 = x^0_t\}} \, \mathbb{P}(X^1_t = x^1, X^2_t = x^2 \mid x^0_{t-1:t}, u^1_{t-1}, u^2_{t-1}). \quad (12)$$

It is easy to verify that the above belief depends only on the statistics of $W^0_{t-1}$ and is therefore independent of the control laws. Thus, Assumption 2 also holds for this case.
Noisy Observations: We can also consider a modification of the above scenario where both controllers have a common, noisy observation $Y^0_t = h_t(X^0_t, W^1_t)$ of the global state. That is,

$$C_t = \{Y^0_{1:t}, U^1_{1:t-1}, U^2_{1:t-1}\}, \quad P^i_t = \{X^i_t\}, \quad Z_{t+1} = \{Y^0_{t+1}, U^1_t, U^2_t\}.$$

Lemma 5 The game with the information pattern described above satisfies Assumptions 1 and 2.

Proof: See Appendix I.


E. Uncontrolled State Process

Consider a state process whose evolution does not depend on the control actions, that is, the system state evolves as

$$X_{t+1} = f_t(X_t, W^0_t). \quad (13)$$

Further, the common and private information evolve as follows:
1) $C_{t+1} = \{C_t, Z_{t+1}\}$ and

$$Z_{t+1} = \zeta_{t+1}(P^1_t, P^2_t, Y^1_{t+1}, Y^2_{t+1}), \quad (14)$$

where $\zeta_{t+1}$ is a fixed transformation.
2) The private information evolves according to the equation

$$P^i_{t+1} = \xi^i_{t+1}(P^i_t, Y^i_{t+1}), \quad (15)$$

where $\xi^i_{t+1}$, $i = 1, 2$, are fixed transformations.
Note that while the control actions do not affect the state evolution, they still affect the costs.

Lemma 6 The game G1 with an uncontrolled state process described above satisfies Assumptions 1 and 2.

Proof: See Appendix J.

As an example of this case, consider the information structure where the two controllers share their observations about an uncontrolled state process with a delay of $d \ge 1$ time steps. In this case, the common information is $C_t = \{Y^1_{1:t-d}, Y^2_{1:t-d}\}$ and the private information is $P^i_t = Y^i_{t-d+1:t}$.

F. Symmetric Information Game

Consider the case when all observations and actions are available to both controllers, that is, $I^1_t = I^2_t = C_t = \{Y^1_{1:t}, Y^2_{1:t}, U^1_{1:t-1}, U^2_{1:t-1}\}$ and there is no private information. The common information based belief in this case is $\pi_t(x_t) = \mathbb{P}^{g^1_{1:t-1}, g^2_{1:t-1}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1})$. Here $\pi_t$ is the same as the information state in centralized stochastic control, which is known to be control-strategy independent and which satisfies an update equation of the form required in Assumption 2 [19]. A related case with perfect state observations is the situation where $I^1_t = I^2_t = X_{1:t}$.


G. Symmetrically Observed Controlled State and Asymmetrically Observed Uncontrolled State

A combination of the previous two scenarios is the situation when the state $X_t$ consists of two independent components: a controlled component $X^a_t$ and an uncontrolled component $X^b_t$. Both components are observed through noisy channels. The observations about the controlled state as well as the past actions are common to both controllers, whereas the information about the uncontrolled state satisfies the model of Section III-E. The common information based conditional belief can then be factored into two independent components, each of which satisfies an update equation of the form required by Assumption 2.

IV. MAIN RESULTS

Our goal in this section is to show that, under Assumptions 1 and 2, a class of equilibria of the game G1 can be characterized in a backward inductive manner that resembles the backward inductive characterization of Markov perfect equilibria of symmetric information games with perfect state observation. However, in order to do so, we have to view our asymmetric information game as a symmetric information game by introducing virtual players that make decisions based on the common information. This section describes this change of perspective and how it can be used to characterize a class of Nash equilibria.
We reconsider the model of game G1. We assume that controller $i$ is replaced by a virtual player $i$ (VP $i$). The system operates as follows: At time $t$, the data available to each virtual player is the common information $C_t$. The virtual player $i$ selects a function $\Gamma^i_t$ from $\mathcal{P}^i_t$ to $\mathcal{U}^i_t$ according to a decision rule $\chi^i_t$,

$$\Gamma^i_t = \chi^i_t(C_t).$$

Note that under a given decision rule $\chi^i_t$, $\Gamma^i_t$ is a random function since $C_t$ is a random vector. We will use $\gamma^i_t$ to denote a realization of $\Gamma^i_t$. We will refer to $\Gamma^i_t$ as the prescription selected by virtual player $i$ at time $t$. Once the virtual player has chosen $\Gamma^i_t$, the control action $U^i_t = \Gamma^i_t(P^i_t)$ is applied to the system. $\chi^i := (\chi^i_1, \chi^i_2, \ldots, \chi^i_T)$ is called the strategy of the virtual player $i$. The total cost of the virtual player $i$ is given as

$$J^i(\chi^1, \chi^2) := \mathbb{E}\Big[\sum_{t=1}^{T} c^i(X_t, U^1_t, U^2_t)\Big], \quad (16)$$


where the expectation on the right hand side of (16) is with respect to the probability measure on the state and action processes induced by the choice of strategies $\chi^1, \chi^2$ on the left hand side of (16). We refer to the game among the virtual players as game G2.

Remark 3 In case there is no private information, the function $\Gamma^i_t$ from $\mathcal{P}^i_t$ to $\mathcal{U}^i_t$ is interpreted as simply a value in the set $\mathcal{U}^i_t$.

A. Equivalence with Game G1

Theorem 1 Let $(g^1, g^2)$ be a Nash equilibrium of game G1. Define $\chi^i$ for $i = 1, 2$, $t = 1, 2, \ldots, T$ as

$$\chi^i_t(c_t) := g^i_t(\cdot, c_t), \quad (17)$$

for each possible realization $c_t$ of the common information at time $t$. Then $(\chi^1, \chi^2)$ is a Nash equilibrium of game G2. Conversely, if $(\chi^1, \chi^2)$ is a Nash equilibrium of game G2, then define $g^i$ for $i = 1, 2$, $t = 1, 2, \ldots, T$ as

$$g^i_t(\cdot, c_t) := \chi^i_t(c_t), \quad (18)$$

for each possible realization $c_t$ of the common information at time $t$. Then $(g^1, g^2)$ is a Nash equilibrium of game G1.

Proof: It is clear that using (17), any controller strategy profile in game G1 can be transformed to a corresponding virtual player strategy profile in game G2 without altering the behavior of the dynamic system and, in particular, the values of the expected costs. If a virtual player can reduce its cost by unilaterally deviating from $\chi^i$, then such a deviation must also exist for the corresponding controller in G1. Therefore, an equilibrium of the controllers' strategies implies an equilibrium of the corresponding virtual players' strategies. The converse can be shown using similar arguments.
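In code terms, the transformation (17) is nothing more than partial application of the control law; a two-line sketch (names are our own):

```python
def to_virtual(g_t, c_t):
    # (17): fix the common-information argument of the controller's law g_t;
    # the result is the prescription, a function of private information only.
    return lambda p: g_t(p, c_t)
```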
The game between the virtual players is a symmetric information game since they both make their decisions based only on the common information $C_t$. In the next section, we identify a Markov state for this symmetric information game and characterize Markov perfect equilibria for this game.


B. Markov Perfect Equilibrium of G2

We want to establish that the common information based conditional beliefs $\Pi_t$ (defined in (8)) can serve as a Markov state for the game G2. Firstly, note that because of Assumption 2, $\Pi_t$ depends only on the common information $C_t$, and since both virtual players know the common information, the belief $\Pi_t$ is common knowledge among them. The following lemma shows that $\Pi_t$ evolves as a controlled Markov process.

Lemma 7 From the virtual players' perspective, the process $\Pi_t$, $t = 1, 2, \ldots, T$, is a controlled Markov process with the virtual players' prescriptions $\Gamma^1_t, \Gamma^2_t$, $t = 1, 2, \ldots, T$, as the controlling actions, that is,

$$\mathbb{P}(\Pi_{t+1} \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = \mathbb{P}(\Pi_{t+1} \mid \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = \mathbb{P}(\Pi_{t+1} \mid \pi_t, \gamma^1_t, \gamma^2_t). \quad (19)$$

Proof: See Appendix B.

Following the development in [20], we next show that if one virtual player is using a strategy that is measurable with respect to $\Pi_t$, then the other virtual player can select an optimal response strategy measurable with respect to $\Pi_t$ as well.

Lemma 8 If virtual player $i$ is using a decision strategy that selects prescriptions only as a function of the belief $\Pi_t$, that is,

$$\Gamma^i_t = \psi^i_t(\Pi_t),$$

$t = 1, \ldots, T$, then virtual player $j$ can also choose its prescriptions only as a function of the belief $\Pi_t$ without any loss of performance.

Proof: See Appendix C.

Lemmas 7 and 8 establish $\Pi_t$ as the Markov state for the game G2. We now define a Markov perfect equilibrium for game G2.

Definition 1 A strategy profile $(\psi^1, \psi^2)$ is said to be a Markov perfect equilibrium of game G2 if (i) at each time $t$, the strategies select prescriptions only as a function of the common information based belief $\pi_t$, and (ii) the strategies form a Nash equilibrium for every sub-game of G2 [3].


Given a Markov perfect equilibrium of G2, we can construct a corresponding Nash equilibrium of game G1 using Theorem 1. We refer to the class of Nash equilibria of G1 that can be constructed from the Markov perfect equilibria of G2 as the common information based Markov perfect equilibria of G1.

Definition 2 A strategy profile $(g^1, g^2)$ of the form $U^i_t = g^i_t(P^i_t, \Pi_t)$, $i = 1, 2$, is called a common information based Markov perfect equilibrium for game G1 if the corresponding strategies of game G2, defined as

$$\psi^i_t(\pi_t) := g^i_t(\cdot, \pi_t),$$

form a Markov perfect equilibrium of G2.

The following theorem provides a necessary and sufficient condition for a strategy profile to be a Markov perfect equilibrium of G2.

Theorem 2 Consider a strategy pair $(\psi^1, \psi^2)$ such that at each time $t$, the strategies select prescriptions based only on the realization of the common information based belief $\pi_t$, that is,

$$\gamma^i_t = \psi^i_t(\pi_t), \quad i = 1, 2.$$

A necessary and sufficient condition for $(\psi^1, \psi^2)$ to be a Markov perfect equilibrium of G2 is that they satisfy the following conditions:
1) For each possible realization $\pi$ of $\Pi_T$, define the value function for virtual player 1:

$$V^1_T(\pi) := \min_{\gamma^1} \mathbb{E}\big[c^1(X_T, \Gamma^1_T(P^1_T), \Gamma^2_T(P^2_T)) \,\big|\, \Pi_T = \pi, \Gamma^1_T = \gamma^1, \Gamma^2_T = \psi^2_T(\pi)\big]. \quad (20)$$

Then, $\psi^1_T(\pi)$ must be a minimizing $\gamma^1$ in the definition of $V^1_T(\pi)$. Similarly, define the value function for virtual player 2:

$$V^2_T(\pi) := \min_{\gamma^2} \mathbb{E}\big[c^2(X_T, \Gamma^1_T(P^1_T), \Gamma^2_T(P^2_T)) \,\big|\, \Pi_T = \pi, \Gamma^1_T = \psi^1_T(\pi), \Gamma^2_T = \gamma^2\big]. \quad (21)$$

Then, $\psi^2_T(\pi)$ must be a minimizing $\gamma^2$ in the definition of $V^2_T(\pi)$.
2) For $t = T-1, \ldots, 1$ and for each possible realization $\pi$ of $\Pi_t$, define recursively the value functions for virtual player 1:

$$V^1_t(\pi) := \min_{\gamma^1} \mathbb{E}\big[c^1(X_t, \Gamma^1_t(P^1_t), \Gamma^2_t(P^2_t)) + V^1_{t+1}(\Pi_{t+1}) \,\big|\, \Pi_t = \pi, \Gamma^1_t = \gamma^1, \Gamma^2_t = \psi^2_t(\pi)\big], \quad (22)$$

where $\Pi_{t+1} = F_t(\pi, Z_{t+1})$. Then, $\psi^1_t(\pi)$ must be a minimizing $\gamma^1$ in the definition of $V^1_t(\pi)$. Similarly, define recursively the value functions for virtual player 2:

$$V^2_t(\pi) := \min_{\gamma^2} \mathbb{E}\big[c^2(X_t, \Gamma^1_t(P^1_t), \Gamma^2_t(P^2_t)) + V^2_{t+1}(\Pi_{t+1}) \,\big|\, \Pi_t = \pi, \Gamma^1_t = \psi^1_t(\pi), \Gamma^2_t = \gamma^2\big], \quad (23)$$

where $\Pi_{t+1} = F_t(\pi, Z_{t+1})$. Then, $\psi^2_t(\pi)$ must be a minimizing $\gamma^2$ in the definition of $V^2_t(\pi)$.

Proof: See Appendix D.


Theorem 2 suggests that one could follow a backward inductive procedure to find equilibrium strategies for the virtual players. Before describing this backward procedure in detail, we make a simple but useful observation. In (20)-(23), since $\gamma^i$ enters the expectation only through $\gamma^i(P^i_t)$, we may be able to carry out the minimization over $\gamma^i$ by separately minimizing over $\gamma^i(p^i)$ for all possible $p^i$. This observation leads us to the backward induction procedure described in the next section.

Remark 4 Note that if Assumption 2 were not true, then according to Lemma 1, $\Pi_{t+1} = F_t(\pi_t, \gamma^1_t, \gamma^2_t, Z_{t+1})$. In this case, the entire prescription $\gamma^i$ would affect the second term in the expectations in (22)-(23), and we could not hope to carry out the minimization over $\gamma^i$ by separately minimizing over $\gamma^i(p^i)$ for all possible $p^i$.

C. Backward Induction Algorithm for Finding Equilibrium

We can now describe a backward inductive procedure to find a Markov perfect equilibrium of game G2 using a sequence of one-stage Bayesian games. We proceed as follows:
Algorithm 1:
1) At the terminal time $T$, for each realization $\pi$ of the common information based belief at time $T$, we define a one-stage Bayesian game $SG_T(\pi)$ where
a) The probability distribution on $(X_T, P^1_T, P^2_T)$ is $\pi$.
b) Agent² $i$ observes $P^i_T$ and chooses action $U^i_T$, $i = 1, 2$.
c) Agent $i$'s cost is $c^i(X_T, U^1_T, U^2_T)$, $i = 1, 2$.
A Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^i$, $i = 1, 2$, for the agents, which map their observation $P^i_T$ to their action $U^i_T$, such that for any realization $p^i$, $\gamma^i(p^i)$ is a solution of the minimization problem

$$\min_{u^i} \mathbb{E}^{\pi}\big[c^i(X_T, u^i, \gamma^j(P^j_T)) \,\big|\, P^i_T = p^i\big],$$

where $j \ne i$ and the superscript $\pi$ denotes that the expectation is with respect to the distribution $\pi$. (See [21], [22] for a definition of Bayesian Nash equilibrium.) If a Bayesian Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of $SG_T(\pi)$ exists, denote the corresponding expected equilibrium costs as $V^i_T(\pi)$, $i = 1, 2$, and define $\psi^i_T(\pi) := \gamma^{i*}$, $i = 1, 2$.
2) At time $t < T$, for each realization $\pi$ of the common information based belief at time $t$, we define the one-stage Bayesian game $SG_t(\pi)$ where
a) The probability distribution on $(X_t, P^1_t, P^2_t)$ is $\pi$.
b) Agent $i$ observes $P^i_t$ and chooses action $U^i_t$, $i = 1, 2$.
c) Agent $i$'s cost is $c^i(X_t, U^1_t, U^2_t) + V^i_{t+1}(F_t(\pi, Z_{t+1}))$, $i = 1, 2$.
Recall that the belief for the next time step is $\pi_{t+1} = F_t(\pi, Z_{t+1})$ and $Z_{t+1}$ is given by (6). A Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^i$, $i = 1, 2$, for the agents, which map their observation $P^i_t$ to their action $U^i_t$, such that for any realization $p^i$, $\gamma^i(p^i)$ is a solution of the minimization problem

$$\min_{u^i} \mathbb{E}^{\pi}\big[c^i(X_t, u^i, \gamma^j(P^j_t)) + V^i_{t+1}(F_t(\pi, Z_{t+1})) \,\big|\, P^i_t = p^i\big],$$

where $j \ne i$, $i, j = 1, 2$, and $Z_{t+1}$ is the increment in common information generated according to (6), (2) and (1) when control actions $U^i_t = u^i$ and $U^j_t = \gamma^j(P^j_t)$ are used. The expectation is with respect to the distribution $\pi$. If a Bayesian Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of $SG_t(\pi)$ exists, denote the corresponding expected equilibrium costs as $V^i_t(\pi)$, $i = 1, 2$, and define $\psi^i_t(\pi) := \gamma^{i*}$, $i = 1, 2$.

²Agent $i$ can be thought of as the same as controller $i$. We use a different name here in order to maintain the distinction between games G1 and $SG_T(\pi)$.
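As an illustration of one stage of Algorithm 1, the sketch below searches for a pure-strategy Bayesian Nash equilibrium of $SG_t(\pi)$ by exhaustive enumeration over prescription pairs; this is feasible for small finite sets, though a pure equilibrium need not exist (see Section V). The data-structure conventions (dict-valued beliefs, callable costs that already include the continuation value) are our own illustrative choices.

```python
from itertools import product

def solve_stage(pi, P1, P2, U1, U2, cost1, cost2):
    """Search for a pure-strategy Bayesian Nash equilibrium of SG_t(pi).
    pi: dict (x, p1, p2) -> probability.
    cost_i(x, u1, u2) should already include V^i_{t+1}(F_t(pi, Z_{t+1}))
    when t < T. Returns (gamma1, gamma2, V1, V2) or None."""
    gammas1 = [dict(zip(P1, ch)) for ch in product(U1, repeat=len(P1))]
    gammas2 = [dict(zip(P2, ch)) for ch in product(U2, repeat=len(P2))]

    def cond(cost, which, p, u, g_other):
        # E^pi[cost | P^i = p] when agent `which` plays u, other plays g_other.
        num = den = 0.0
        for (x, p1, p2), pr in pi.items():
            if (p1 if which == 1 else p2) != p:
                continue
            u1 = u if which == 1 else g_other[p1]
            u2 = u if which == 2 else g_other[p2]
            num += pr * cost(x, u1, u2)
            den += pr
        return num / den if den > 0 else 0.0

    for g1, g2 in product(gammas1, gammas2):
        br1 = all(cond(cost1, 1, p, g1[p], g2)
                  <= min(cond(cost1, 1, p, u, g2) for u in U1) for p in P1)
        br2 = all(cond(cost2, 2, p, g2[p], g1)
                  <= min(cond(cost2, 2, p, u, g1) for u in U2) for p in P2)
        if br1 and br2:
            V1 = sum(pr * cost1(x, g1[p1], g2[p2])
                     for (x, p1, p2), pr in pi.items())
            V2 = sum(pr * cost2(x, g1[p1], g2[p2])
                     for (x, p1, p2), pr in pi.items())
            return g1, g2, V1, V2
    return None
```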

Theorem 3 The strategies $\psi^1, \psi^2$ defined by the backward induction procedure described in Algorithm 1 form a Markov perfect equilibrium of game G2. Consequently, the strategies $g^1, g^2$ defined as

$$g^i_t(\cdot, \pi_t) := \psi^i_t(\pi_t),$$

$i = 1, 2$, $t = 1, 2, \ldots, T$, form a common information based Markov perfect equilibrium of game G1.

Proof: To prove the result, we just need to observe that the strategies defined by the backward induction procedure of Algorithm 1 satisfy the conditions of Theorem 2 and hence form a Markov perfect equilibrium of game G2. See Appendix E for a more detailed proof.

D. An Example Illustrating Algorithm 1

We consider an example of game G1 where the (scalar) state $X_t$ and the (scalar) control actions $U^1_t, U^2_t$ take values in the set $\{0, 1\}$. The state evolves as a controlled Markov chain depending on the two control actions according to the state transition probabilities:

$$\mathbb{P}\big(X_{t+1} = 0 \,\big|\, X_t = 0, U^1_t = U^2_t\big) = \tfrac{1}{4},$$
$$\mathbb{P}\big(X_{t+1} = 0 \,\big|\, X_t = 1, U^1_t = U^2_t\big) = \tfrac{1}{2},$$
$$\mathbb{P}\big(X_{t+1} = 0 \,\big|\, X_t = 0, U^1_t \ne U^2_t\big) = \mathbb{P}\big(X_{t+1} = 0 \,\big|\, X_t = 1, U^1_t \ne U^2_t\big) = \tfrac{2}{5}. \quad (24)$$

The initial state is assumed to be equi-probable, i.e., $\mathbb{P}(X_1 = 0) = \mathbb{P}(X_1 = 1) = 1/2$. The first controller observes the state perfectly, while the second controller observes the state through a binary symmetric channel with probability of error $1/3$. Thus,

$$Y^1_t = X_t, \qquad Y^2_t = \begin{cases} X_t & \text{with probability } 2/3, \\ 1 - X_t & \text{with probability } 1/3. \end{cases}$$

The controllers share the observations and actions with a delay of one time step. Thus, the common information and private informations at time step $t$ are given as

$$C_t = \{X_{1:t-1}, Y^2_{1:t-1}, U^1_{1:t-1}, U^2_{1:t-1}\}, \quad P^1_t = \{X_t\}, \quad P^2_t = \{Y^2_t\}.$$

In the equivalent game with virtual players, the decision of the $i$th virtual player, $\Gamma^i_t$, is a function that maps the private information set $\{0, 1\}$ to the action set $\{0, 1\}$.
The common information based belief for this case is the belief on $(X_t, Y^2_t)$ given the common information $x_{1:t-1}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1}$, that is,

$$\pi_t(x, y^2) = \mathbb{P}\big(X_t = x, Y^2_t = y^2 \,\big|\, x_{1:t-1}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1}\big)$$
$$= \mathbb{P}\big(X_t = x \,\big|\, x_{t-1}, u^1_{t-1}, u^2_{t-1}\big)\Big(\tfrac{2}{3}\mathbb{1}_{\{y^2 = x\}} + \tfrac{1}{3}\mathbb{1}_{\{y^2 \ne x\}}\Big). \quad (25)$$

The above equation implies that the distribution $\pi_t$ is completely specified by $x_{t-1}, u^1_{t-1}, u^2_{t-1}$. That is,

$$\pi_t = F_{t-1}(x_{t-1}, u^1_{t-1}, u^2_{t-1}). \quad (26)$$

(Note that $F_{t-1}$ is a vector-valued function whose components are given by (25) for all $x, y^2 \in \{0, 1\}$.) The cost functions $c^i(x, u^1, u^2)$ for the various values of the state and actions are described by the following matrices:

  x_t = 0:                        x_t = 1:
             u2 = 0    u2 = 1                u2 = 0    u2 = 1
  u1 = 0      1, 0      0, 1      u1 = 0      0, 0      1, 1
  u1 = 1      0, 1      0, 0      u1 = 1      0, 1      1, 0

where the rows in each matrix correspond to controller 1's actions and the columns correspond to controller 2's actions. The first entry in each element of the cost matrix is controller 1's cost and the second entry is controller 2's cost.
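As a numerical companion, and assuming the stage-2 equilibrium derived below (both agents play action 1, so $V^1_2(\pi) = \pi(X_2 = 1)$ by (27)), the following sketch encodes the transition law (24), the observation channel, and the cost matrices, and reproduces the continuation values $3/4$, $1/2$ and $3/5$ used in the stage-1 computation. All variable names are our own.

```python
from fractions import Fraction as F

# Transition law (24): probability that X_{t+1} = 0 given x_t and the actions.
def p_next0(x, u1, u2):
    if u1 == u2:
        return F(1, 4) if x == 0 else F(1, 2)
    return F(2, 5)

# Belief map (26): the belief over (x, y2) at t+1 given (x_t, u1_t, u2_t),
# with the channel term from (25).
def F_belief(x_prev, u1, u2):
    p0 = p_next0(x_prev, u1, u2)
    px = {0: p0, 1: 1 - p0}
    return {(x, y2): px[x] * (F(2, 3) if y2 == x else F(1, 3))
            for x in (0, 1) for y2 in (0, 1)}

# Cost matrices c^i(x, u1, u2): c[x][u1][u2] -> (c1, c2).
c = {0: {0: {0: (1, 0), 1: (0, 1)}, 1: {0: (0, 1), 1: (0, 0)}},
     1: {0: {0: (0, 0), 1: (1, 1)}, 1: {0: (0, 1), 1: (1, 0)}}}

# V^1_2(pi) = pi(X_2 = 1) under the stage-2 equilibrium (both play 1), eq (27).
def V2_1(pi):
    return sum(p for (x, _), p in pi.items() if x == 1)

print(V2_1(F_belief(0, 0, 0)))  # 3/4  (u1 == u2, x1 = 0)
print(V2_1(F_belief(1, 1, 1)))  # 1/2  (u1 == u2, x1 = 1)
print(V2_1(F_belief(0, 0, 1)))  # 3/5  (u1 != u2)
```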
Applying Algorithm 1:
We now use Algorithm 1 for a two-stage version of the game described above.
1) At the terminal time step $T = 2$, for a realization $\pi$ of the common information based belief at time 2, we define a one-stage game $SG_2(\pi)$ where
a) The probability distribution on $(X_2, Y^2_2)$ is $\pi$.
b) Agent 1 observes $X_2$ and selects an action $U^1_2$; Agent 2 observes $Y^2_2$ and selects $U^2_2$.
c) Agent $i$'s cost is $c^i(X_2, U^1_2, U^2_2)$, given by the matrices defined above.
A Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^{1*}, \gamma^{2*}$ such that
For $x = 0, 1$, $\gamma^{1*}(x)$ is a solution of $\min_{u^1} \mathbb{E}^{\pi}[c^1(X_2, u^1, \gamma^{2*}(Y^2_2)) \mid X_2 = x]$.
For $y = 0, 1$, $\gamma^{2*}(y)$ is a solution of $\min_{u^2} \mathbb{E}^{\pi}[c^2(X_2, \gamma^{1*}(X_2), u^2) \mid Y^2_2 = y]$.
It is easy to verify that

$$\gamma^{1*}(x) := 1, \quad \gamma^{2*}(y) := 1 \quad \text{for all } x, y \in \{0, 1\}$$

is a Bayesian Nash equilibrium of $SG_2(\pi)$. The expected equilibrium cost for agent $i$ is

$$V^i_2(\pi) = \mathbb{E}^{\pi}[c^i(X_2, 1, 1)] = \begin{cases} \pi(X_2 = 1) & \text{for } i = 1, \\ 0 & \text{for } i = 2, \end{cases} \quad (27)$$

where $\pi(X_2 = 1)$ is the probability that $X_2 = 1$ under the distribution $\pi$. From the above Bayesian equilibrium strategies, we define the virtual players' decision rules for time $T = 2$ as $\psi^i_2(\pi) = \gamma^{i*}$, $i = 1, 2$.
2) At time $t = 1$, since there is no common information, the common information based belief $\pi_1$ is simply the prior belief on $(X_1, Y^2_1)$. Since the initial state is equally likely to be 0 or 1,

$$\pi_1(x, y^2) = \frac{1}{2}\Big(\tfrac{2}{3}\mathbb{1}_{\{y^2 = x\}} + \tfrac{1}{3}\mathbb{1}_{\{y^2 \ne x\}}\Big).$$

We define the one-stage Bayesian game $SG_1(\pi_1)$ where
a) The probability distribution on $(X_1, Y^2_1)$ is $\pi_1$.
b) Agent 1 observes $X_1$ and selects an action $U^1_1$; Agent 2 observes $Y^2_1$ and selects $U^2_1$.
c) Agent $i$'s cost is given by $c^i(X_1, U^1_1, U^2_1) + V^i_2(F_1(X_1, U^1_1, U^2_1))$, where $F_1$, defined by (26) and (25), gives the common information based belief at time 2 as a function of $X_1, U^1_1, U^2_1$, and $V^i_2$, defined in (27), gives the expected equilibrium cost for time 2 as a function of the common information based belief at time 2.
For example, if $U^1_1 \ne U^2_1$, then (25), (26) and (27) imply $V^1_2(F_1(X_1, U^1_1, U^2_1)) = 3/5$. Similarly, if $U^1_1 = U^2_1$, then (25), (26) and (27) imply $V^1_2(F_1(0, U^1_1, U^2_1)) = 3/4$ and $V^1_2(F_1(1, U^1_1, U^2_1)) = 1/2$. Also, (27) implies that $V^2_2$ is identically 0.
A Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^{1*}, \gamma^{2*}$ such that
For $x = 0, 1$, $\gamma^{1*}(x)$ is a solution of

$$\min_{u^1} \mathbb{E}^{\pi_1}\big[c^1(X_1, u^1, \gamma^{2*}(Y^2_1)) + V^1_2(F_1(X_1, u^1, \gamma^{2*}(Y^2_1))) \,\big|\, X_1 = x\big].$$

For $y = 0, 1$, $\gamma^{2*}(y)$ is a solution of

$$\min_{u^2} \mathbb{E}^{\pi_1}\big[c^2(X_1, \gamma^{1*}(X_1), u^2) + V^2_2(F_1(X_1, \gamma^{1*}(X_1), u^2)) \,\big|\, Y^2_1 = y\big].$$

It is easy to verify that

$$\gamma^{1*}(x) = 1 - x, \qquad \gamma^{2*}(y) = 1 - y$$

is a Bayesian Nash equilibrium of $SG_1(\pi_1)$. The expected equilibrium costs are

$$V^i_1(\pi_1) = \mathbb{E}^{\pi_1}\big[c^i(X_1, \gamma^{1*}(X_1), \gamma^{2*}(Y^2_1)) + V^i_2(F_1(X_1, \gamma^{1*}(X_1), \gamma^{2*}(Y^2_1)))\big],$$

which gives $V^1_1(\pi_1) = 47/60$ and $V^2_1(\pi_1) = 1/3$. From the above Bayesian equilibrium strategies, we define the virtual players' decision rules for time $t = 1$ as $\psi^i_1(\pi_1) = \gamma^{i*}$, $i = 1, 2$.
Since we now know the equilibrium decision rules $\psi^i_t$, $i = 1, 2$, $t = 1, 2$, for the virtual players, we can construct the corresponding control laws for the controllers using Theorem 3. Thus, a common information based Markov perfect equilibrium for the game in this example is given by the strategies:

$$g^1_1(x_1, \pi_1) = \begin{cases} 1 & \text{if } x_1 = 0, \\ 0 & \text{if } x_1 = 1, \end{cases} \qquad g^2_1(y^2_1, \pi_1) = \begin{cases} 1 & \text{if } y^2_1 = 0, \\ 0 & \text{if } y^2_1 = 1, \end{cases}$$

and

$$g^1_2(x_2, \pi_2) = 1, \qquad g^2_2(y^2_2, \pi_2) = 1.$$
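Under the same assumptions as the previous sketch (and reusing `F_belief`, `c` and `V2_1` from it), the following lines verify the stage-1 equilibrium costs $V^1_1(\pi_1) = 47/60$ and $V^2_1(\pi_1) = 1/3$ by direct enumeration.

```python
# Prior pi_1 over (x, y2): X_1 uniform, BSC with error probability 1/3.
pi1 = {(x, y2): F(1, 2) * (F(2, 3) if y2 == x else F(1, 3))
       for x in (0, 1) for y2 in (0, 1)}

g1 = lambda x: 1 - x   # stage-1 equilibrium prescriptions
g2 = lambda y: 1 - y

V11 = sum(p * (c[x][g1(x)][g2(y)][0] + V2_1(F_belief(x, g1(x), g2(y))))
          for (x, y), p in pi1.items())
V12 = sum(p * c[x][g1(x)][g2(y)][1] for (x, y), p in pi1.items())  # V^2_2 = 0
print(V11, V12)  # 47/60, 1/3
```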

V. BEHAVIORAL STRATEGIES AND EXISTENCE OF EQUILIBRIUM

The results of Theorems 2 and 3 provide sufficient conditions for a pair of strategies to be an equilibrium of game G2. Neither of these results addresses the question of existence of equilibrium. In particular, the result of Theorem 3 states that the (pure strategy) Bayesian Nash equilibria of the one-stage Bayesian games $SG_t(\pi)$, $t = T, \ldots, 1$, may be used to find a Markov perfect equilibrium of game G2 and hence a common information based Markov perfect equilibrium of G1. However, the games $SG_t(\pi)$ may not have any (pure strategy) Bayesian Nash equilibrium.
As is common in finite games, we need to allow for behavioral strategies in order to ensure the existence of equilibria. Toward that end, we now reconsider the model of game G1. At each time $t$, each controller is now allowed to select a probability distribution $D^i_t$ over the (finite) set of actions $\mathcal{U}^i_t$, $i = 1, 2$, according to a control law of the form:

$$D^i_t = g^i_t(P^i_t, C_t). \quad (28)$$

The rest of the model is the same as in Section II. We denote the set of probability distributions over $\mathcal{U}^i_t$ by $\Delta(\mathcal{U}^i_t)$.
Following exactly the same arguments as in Section IV, we can define an equivalent game where virtual players select prescriptions that are functions from the set of private information $\mathcal{P}^i_t$ to the set $\Delta(\mathcal{U}^i_t)$, and establish the result of Theorem 1 for this case. A sufficient condition for a Markov perfect equilibrium of this game is given by Theorem 2, where the $\gamma^i$ are now interpreted as mappings from $\mathcal{P}^i_t$ to $\Delta(\mathcal{U}^i_t)$ (instead of mappings from $\mathcal{P}^i_t$ to $\mathcal{U}^i_t$). Given a Markov perfect equilibrium $(\psi^1, \psi^2)$ of the virtual players' game, the equivalent strategies $g^i_t(\cdot, \pi) := \psi^i_t(\pi)$ form a common information based Markov perfect equilibrium of game G1 in behavioral strategies. Further, we can follow a backward induction procedure identical to the one used in Section IV-C (Algorithm 1), but now consider mixed strategy Bayesian Nash equilibria of the one-stage Bayesian games $SG_t(\pi)$ constructed there. We proceed as follows:
Algorithm 2:
1) At the terminal time $T$, for each realization $\pi$ of the common information based belief at time $T$, consider the one-stage Bayesian game $SG_T(\pi)$ defined in Algorithm 1. A mixed strategy $\gamma^i$ for the game $SG_T(\pi)$ is a mapping from $\mathcal{P}^i_T$ to $\Delta(\mathcal{U}^i_T)$. A mixed strategy Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^{1*}, \gamma^{2*}$ such that for any realization $p^i$, $\gamma^{i*}(p^i)$ assigns zero probability to any action that is not a solution of the minimization problem

$$\min_{u^i} \mathbb{E}\big[c^i(X_T, u^i, U^j_T) \,\big|\, P^i_T = p^i\big],$$

where $U^j_T$ is distributed according to $\gamma^{j*}(P^j_T)$. Since $SG_T(\pi)$ is a finite Bayesian game, a mixed strategy equilibrium is guaranteed to exist [22]. For any mixed strategy Bayesian Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of $SG_T(\pi)$, denote the expected equilibrium costs as $V^i_T(\pi)$ and define $\psi^i_T(\pi) := \gamma^{i*}$, $i = 1, 2$.
2) At time $t < T$, for each realization $\pi$ of the common information based belief at time $t$, consider the one-stage Bayesian game $SG_t(\pi)$ defined in Algorithm 1. A mixed strategy Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^{1*}, \gamma^{2*}$ such that for any realization $p^i$, $\gamma^{i*}(p^i)$ assigns zero probability to any action that is not a solution of the minimization problem

$$\min_{u^i} \mathbb{E}\big[c^i(X_t, u^i, U^j_t) + V^i_{t+1}(F_t(\pi, Z_{t+1})) \,\big|\, P^i_t = p^i\big],$$

where $U^j_t$ is distributed according to $\gamma^{j*}(P^j_t)$ and $Z_{t+1}$ is the increment in common information generated according to (6), (2) and (1) when the control action $U^i_t = u^i$ and $U^j_t$ distributed according to $\gamma^{j*}(P^j_t)$ are used. Since $SG_t(\pi)$ is a finite Bayesian game, a mixed strategy equilibrium is guaranteed to exist [22]. For any mixed strategy Bayesian Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of $SG_t(\pi)$, denote the expected equilibrium costs as $V^i_t(\pi)$ and define $\psi^i_t(\pi) := \gamma^{i*}$, $i = 1, 2$.
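Computationally, a mixed strategy equilibrium of $SG_t(\pi)$ can be found by passing the game's induced normal form (one pure strategy per map from private information to actions) to any off-the-shelf bimatrix equilibrium solver, e.g., support enumeration or Lemke-Howson as implemented in libraries such as nashpy; we leave the solver abstract. The helper below, with names of our own choosing, builds that normal form.

```python
from itertools import product

def induced_normal_form(pi, P1, P2, U1, U2, cost1, cost2):
    """Expected-cost matrices of SG_t(pi) over pure prescription pairs.
    Row r indexes a map gamma1: P1 -> U1; column c indexes gamma2: P2 -> U2.
    A bimatrix solver applied to (A, B) returns a mixed equilibrium, which
    corresponds to a behavioral-strategy equilibrium of the one-stage game."""
    gammas1 = [dict(zip(P1, ch)) for ch in product(U1, repeat=len(P1))]
    gammas2 = [dict(zip(P2, ch)) for ch in product(U2, repeat=len(P2))]
    A = [[sum(p * cost1(x, g1[p1], g2[p2]) for (x, p1, p2), p in pi.items())
          for g2 in gammas2] for g1 in gammas1]
    B = [[sum(p * cost2(x, g1[p1], g2[p2]) for (x, p1, p2), p in pi.items())
          for g2 in gammas2] for g1 in gammas1]
    return gammas1, gammas2, A, B
```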
We can now state the following theorem.

Theorem 4 For the finite game G1, a common information based Markov perfect equilibrium in behavioral strategies always exists. Further, this equilibrium can be found by first constructing strategies $\psi^1, \psi^2$ according to the backward inductive procedure of Algorithm 2 and then defining behavioral strategies $g^1, g^2$ in G1 as

$$g^i_t(\cdot, \pi_t) := \psi^i_t(\pi_t),$$

$i = 1, 2$, $t = 1, 2, \ldots, T$.

VI. DISCUSSION

A. Importance of Assumption 2

The most restrictive assumption in our analysis of game G1 is Assumption 2, which states that the common information based belief is independent of the control strategies. It is instructive to consider why our analysis does not work in the absence of this assumption. Let us consider the model of Section II with Assumption 1 as before but without Assumption 2. Lemma 1, which follows from Assumption 1, is still true. For this version of game G1 without Assumption 2, we can construct an equivalent game with virtual players similar to game G2. Further, it is easy to show that Theorem 1, which relates equilibria of G2 to those of G1, is still true.
The key result for our analysis of game G2 in Section IV was Lemma 8, which allowed us to use $\Pi_t$ as a Markov state and to define and characterize Markov perfect equilibria for the game G2. Lemma 8 essentially states that the set of Markov decision strategy pairs (that is, strategies that select prescriptions as a function of $\Pi_t$) is closed with respect to the best response mapping. In other words, if we start with any pair of Markov strategies $(\psi^1, \psi^2)$ for the virtual players and define $\psi^{i*}$ to be the best response of virtual player $i$ to $\psi^j$, then, for at least one choice of best response strategies, the pair $(\psi^{1*}, \psi^{2*})$ belongs to the set of Markov strategy pairs. This is true not just for strategies $(\psi^1, \psi^2)$ that form an equilibrium but for any choice of Markov strategies. We will now argue that this is not necessarily true without Assumption 2.
Recall that due to Lemma 1, the belief $\pi_t$ evolves as

$$\pi_t = F_{t-1}(\pi_{t-1}, \gamma^1_{t-1}, \gamma^2_{t-1}, z_t).$$

Thus, in order to evaluate the current realization of $\pi_t$, a virtual player must know the prescriptions used by both virtual players. However, the virtual players do not observe each other's past prescriptions, since the only data they have available is $c_t$. Thus, a virtual player cannot evaluate the belief $\pi_t$ without knowing (or assuming) how the other player selects its prescriptions.
Consider now decision strategies $(\psi^1, \psi^2)$ for the two virtual players which operate as follows: At each time $t$, the prescriptions chosen by the virtual players are

$$\gamma^i_t = \psi^i_t(\pi_t) \quad (29)$$

and the belief at the next time $t+1$ is

$$\pi_{t+1} = F_t(\pi_t, \psi^1_t(\pi_t), \psi^2_t(\pi_t), z_{t+1}). \quad (30)$$

Assume that the above strategies are not a Nash equilibrium for the virtual players' game. Therefore, one virtual player, say virtual player 2, can benefit by deviating from its strategy. Given that virtual player 1 continues to operate according to (29) and (30), is it possible for virtual player 2 to reduce its cost by using a non-Markov strategy, that is, a strategy that selects prescriptions based on more data than just $\pi_t$? Consider any time $t$: if virtual player 2 has deviated to some other choice of Markov decision rules $\tilde{\psi}^2_{1:t-1}$ in the past, then the true belief on the state and private information given the common information,

$$\theta_t = \mathbb{P}^{\psi^1_{1:t-1}, \tilde{\psi}^2_{1:t-1}}(x_t, p^1_t, p^2_t \mid c_t),$$

is different from the belief $\pi_t$ evaluated by the first player according to (30). (Note that since past prescriptions are not observed and virtual player 1's operation is fixed by (29) and (30), virtual player 1 continues to use $\pi_t$ evolving according to (30) as its belief.) Even though $\pi_t$ is no longer the true belief, virtual player 2 can still track its evolution using (30). Using arguments similar to those in the proofs of Lemmas 7 and 8, it can be established that an optimal strategy for virtual player 2, given that virtual player 1 operates according to (29) and (30), is of the form $\gamma^2_t = \tilde{\psi}^2_t(\theta_t, \pi_t)$, where $\theta_t$ is the true conditional belief on the state and private information given the common information, whereas $\pi_t$ is given by (30). Thus, the best response of player 2 may not necessarily be a Markov strategy and hence Lemma 8 may no longer hold. Without Lemma 8, we cannot define a Markov perfect equilibrium of game G2 using $\Pi_t$ as the state.


B. The Case of Team Problems

The game G1 is referred to as a team problem if the two controllers have the same cost function, that is, $c^1(\cdot) = c^2(\cdot) = c^{team}(\cdot)$. Nash equilibrium strategies can then be interpreted as person-by-person optimal strategies [23]. Clearly, the results of Sections IV and V apply to person-by-person optimal strategies for team problems as well.
For team problems, our results can be strengthened in two ways. Firstly, we can find globally optimal strategies for the controllers in the team using the virtual player approach, and secondly, we no longer need to make Assumption 2. Let us retrace our steps in Section IV for the team problem without Assumption 2:
1) We can once again introduce virtual players that observe the common information and select prescriptions for the controllers. The two virtual players have the same cost function. So game G2 is now a team problem and we will refer to it as T2. It is straightforward to establish that globally optimal strategies for the virtual players can be translated to globally optimal strategies for the controllers in the team in a manner identical to Theorem 1.
2) Since we are no longer making Assumption 2, the common information based belief evolves according to

$$\pi_t = F_{t-1}(\pi_{t-1}, \gamma^1_{t-1}, \gamma^2_{t-1}, z_t). \quad (31)$$

Virtual player 1 does not observe $\gamma^2_{t-1}$, so it cannot carry out the update described in (31). However, we will now increase the information available to the virtual players and assume that each virtual player can indeed observe all past prescriptions $\gamma^1_{1:t-1}, \gamma^2_{1:t-1}$. We refer to this team with expanded information for the virtual players as T2'.
It should be noted that the globally optimal expected cost for T2' can be no larger than the globally optimal cost of T2, since we have only added information in going from T2 to T2'. We will later show that the globally optimal strategies we find for T2' can be translated to equivalent strategies for T2 with the same expected cost.
3) For T2', since all past prescriptions are observed, both virtual players can evaluate $\pi_t$ using (31) without knowing the past decision rules $\chi^1_{1:t-1}, \chi^2_{1:t-1}$. We can now repeat the arguments in the proof of Lemma 7 to show that an analogous result is true for team T2' as well. The team problem for the virtual players is now a Markov decision problem with $\pi_t$, evolving according to (31), as the Markov state and the prescription pair $(\gamma^1_t, \gamma^2_t)$ as the decision. We can then write a dynamic program for this Markov decision problem.

Theorem 5 For the team problem T2' with virtual players, for each realization $\pi$ of $\Pi_t$, the optimal prescriptions are the minimizers in the following dynamic program:

$$V^{team}_T(\pi) := \min_{\gamma^1, \gamma^2} \mathbb{E}\big[c^{team}(X_T, \gamma^1(P^1_T), \gamma^2(P^2_T)) \,\big|\, \Pi_T = \pi, \Gamma^1_T = \gamma^1, \Gamma^2_T = \gamma^2\big], \quad (32)$$

$$V^{team}_t(\pi) := \min_{\gamma^1, \gamma^2} \mathbb{E}\big[c^{team}(X_t, \gamma^1(P^1_t), \gamma^2(P^2_t)) + V^{team}_{t+1}(\Pi_{t+1}) \,\big|\, \Pi_t = \pi, \Gamma^1_t = \gamma^1, \Gamma^2_t = \gamma^2\big], \quad (33)$$

where $\Pi_{t+1} = F_t(\pi, \gamma^1_t, \gamma^2_t, Z_{t+1})$.

4) Let $(\psi^1_t(\pi), \psi^2_t(\pi))$ be the minimizing prescription pair on the right hand side of the definition of $V^{team}_t(\pi)$ in the above dynamic program. The globally optimal virtual players' operation can be described as: At each $t$, evaluate

$$\pi_t = F_{t-1}(\pi_{t-1}, \gamma^1_{t-1}, \gamma^2_{t-1}, z_t) \quad (34)$$

and then select the prescriptions

$$\gamma^i_t = \psi^i_t(\pi_t), \quad i = 1, 2. \quad (35)$$

Now, instead of operating according to (34) and (35), assume that the virtual players operate as follows: At each $t$, evaluate

$$\pi_t = F_{t-1}(\pi_{t-1}, \psi^1_{t-1}(\pi_{t-1}), \psi^2_{t-1}(\pi_{t-1}), z_t) \quad (36)$$

and then select the prescriptions

$$\gamma^i_t = \psi^i_t(\pi_t), \quad i = 1, 2. \quad (37)$$

It should be clear that virtual players operating according to (36) and (37) will achieve the same globally optimal performance as virtual players operating according to (34) and (35). Furthermore, the virtual players in T2 can follow (36) and (37) and thus achieve the same globally optimal performance as in T2'.
Thus, to find globally optimal strategies for the team of virtual players in the absence of Assumption 2, we first increased their information to include past prescriptions and then mapped the globally optimal strategies with increased information to equivalent strategies with the original information.
For the game G2 in the absence of Assumption 2, we cannot follow the above approach of first increasing the virtual players' information to include past prescriptions, finding an equilibrium with the added information and then mapping the equilibrium strategies to equivalent strategies with the original information. To see the reason, let us denote the virtual player operation given by (34) and (35) by the strategy $\bar{\chi}^i$, $i = 1, 2$, and the virtual player operation given by (36) and (37) by the strategy $\chi^i$, $i = 1, 2$. Then, while it is true that $J^i(\bar{\chi}^1, \bar{\chi}^2) = J^i(\chi^1, \chi^2)$, $i = 1, 2$, for some other strategies $\tilde{\chi}^1, \tilde{\chi}^2$ it is not necessarily true that $J^i(\tilde{\chi}^i, \bar{\chi}^j) = J^i(\tilde{\chi}^i, \chi^j)$, $i, j = 1, 2$, $i \ne j$. Therefore, the equilibrium conditions for $\bar{\chi}^1, \bar{\chi}^2$:

$$J^1(\bar{\chi}^1, \bar{\chi}^2) \le J^1(\tilde{\chi}^1, \bar{\chi}^2), \quad \text{and} \quad J^2(\bar{\chi}^1, \bar{\chi}^2) \le J^2(\bar{\chi}^1, \tilde{\chi}^2), \quad (38)$$

do not necessarily imply the equilibrium conditions for $\chi^1, \chi^2$:

$$J^1(\chi^1, \chi^2) \le J^1(\tilde{\chi}^1, \chi^2), \quad \text{and} \quad J^2(\chi^1, \chi^2) \le J^2(\chi^1, \tilde{\chi}^2). \quad (39)$$

Remark 5 Our dynamic program for the team problem is similar to the dynamic program for
teams obtained in [24] using a slightly different but conceptually similar approach.

VII. CONCLUDING REMARKS

We considered the problem of finding Nash equilibria for a general model of stochastic games with asymmetric information. Our analysis relied on the nature of the common and private information among the controllers. Crucially, we assumed that the common information among the controllers is increasing with time and that a common information based belief on the system state and private information is independent of the control strategies. Under these assumptions, the game with asymmetric information was shown to be equivalent to another game with symmetric information for which we obtained a characterization of Markov perfect equilibria. This characterization allowed us to provide a backward induction algorithm to find Nash equilibria of the original game. Each step of this algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class of Nash equilibria of the original game that can be characterized in this backward manner is named common information based Markov perfect equilibria.


The class of common information based Markov perfect equilibria for asymmetric information games bears conceptual similarities with Markov perfect equilibria of symmetric information games with perfect state observation. In symmetric information games with perfect state observation, a controller may be using past state information only because the other controller is using that information. Therefore, if one controller restricts itself to Markov strategies, the other controller can do the same. This observation justifies focusing only on Markov perfect equilibria for such games. Our results show that a similar observation can be made in our model of games with asymmetric information: a controller may be using the entire common information only because the other controller is using that information. If one controller chooses to use only the common information based belief on the state and private information, the other controller can do the same. Thus, it is reasonable to focus on the class of common information based Markov perfect equilibria for our model of games with asymmetric information.
Further, for zero-sum games, the uniqueness of the value of the game implies that the equilibrium cost of a common information based Markov perfect equilibrium is the same as the equilibrium cost of any other Nash equilibrium [21].
For finite games, it is always possible to find pure strategy Nash equilibria (if they exist) by a brute-force search over the set of possible strategy profiles. The number of strategy choices for controller $i$ is $|\mathcal{U}^i_1|^{|\mathcal{P}^i_1 \times \mathcal{C}_1|} \times \cdots \times |\mathcal{U}^i_T|^{|\mathcal{P}^i_T \times \mathcal{C}_T|}$. For simplicity, assume that the set of possible realizations of private information $\mathcal{P}^i_t$ does not change with time. However, because the common information is required to be increasing with time (see Assumption 1), the cardinality of the set of possible realizations of common information $\mathcal{C}_t$ grows exponentially with time. Thus, the number of possible control strategies exhibits double exponential growth with time.
Algorithm 1 provides an alternative way of finding an equilibrium by solving a succession of one-stage Bayesian games. But how many such games need to be solved? At each time $t$, we need to solve a Bayesian game for each possible realization of the belief $\Pi_t$. Let $\mathcal{R}_t$ denote the set of possible realizations of the belief $\Pi_t$. Since the belief is simply a function of the common information, we must have $|\mathcal{R}_t| \le |\mathcal{C}_t|$. Thus, the total number of one-stage games that need to be solved is no larger than $\sum_{t=1}^{T} |\mathcal{C}_t|$. Recalling the exponential growth of $|\mathcal{C}_t|$, the number of one-stage games to solve grows exponentially with time. This is clearly better than the double exponential growth of the brute-force search.
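
To make the two growth rates concrete, the short computation below contrasts them for purely illustrative cardinalities; the specific numbers (two actions and two private-information realizations per stage, and $|\mathcal{C}_t| = 4^t$) are assumptions chosen for the sake of the example, not values taken from the model.

```python
# Illustrative comparison, assuming |U| actions, |P| private-information
# realizations per stage, and common information with |C_t| = b**t realizations.
U, P, b, T = 2, 2, 4, 5

# Brute force: each controller has prod_t |U|**(|P| * |C_t|) pure strategies,
# i.e., double exponential growth in T.
strategies = 1
for t in range(1, T + 1):
    strategies *= U ** (P * b ** t)

# Algorithm 1: at most sum_t |C_t| one-stage Bayesian games to solve.
one_stage_games = sum(b ** t for t in range(1, T + 1))

print("pure strategies per controller: ~10^%d" % (len(str(strategies)) - 1))
print("one-stage games in Algorithm 1:", one_stage_games)  # 1364 for these numbers
```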
Two possible reasons may further reduce the complexity of Algorithm 1. Firstly, the set $\mathcal{R}_t$ may not grow exponentially with time (as in the case of the information structure in Section IV-D, where $|\mathcal{R}_t| = 3$ for all $t > 1$). Secondly, the one-stage games $SG_t(\pi)$ at time $t$ may possess enough structure that an equilibrium can be found for a generic $\pi$ and then used to construct equilibria for all choices of $\pi$. For finite games, it is not clear what additional features need to be present in game G1 for the resulting one-stage games $SG_t(\pi)$ to be solvable for a generic $\pi$. In the sequel to this paper, we will extend the approach used here to linear quadratic Gaussian games and show that in those games it is possible to solve the one-stage games for a generic belief $\pi$.
Conceptually, the approach adopted in this paper can be extended to infinite time horizon
games with discounted costs under suitable stationarity conditions. However, in infinite horizon
games, the number of possible realizations of the common information based belief would, in
general, be infinite. Establishing the existence of common information based Markov perfect
equilibria for infinite horizon games would be an interesting direction for future work in this
area.

VIII. ACKNOWLEDGMENTS

This work was supported in part by the AFOSR MURI Grant FA9550-10-1-0573. The second author thanks the Bharti Center for Telecommunications, IIT Bombay, for infrastructural support, and the Research Internship in Science and Engineering program of the Indo-US Science and Technology Forum for supporting the visit to the Indian Institute of Technology Bombay.

APPENDIX A
PROOF OF LEMMA 1

Consider a realization $c_t$ of the common information $C_t$ at time $t$. Let $\gamma^1_t, \gamma^2_t$ be the corresponding realizations of the partial functions of the control laws at time $t$, that is, $\gamma^i_t = g^i_t(\cdot, c_t)$. Given the realization $\pi_t$ of the common information based belief and the partial functions $\gamma^1_t, \gamma^2_t$, we can find the joint conditional distribution of $(X_t, P^1_t, P^2_t, X_{t+1}, P^1_{t+1}, P^2_{t+1}, Z_{t+1})$ conditioned on the common information at time $t$ as follows:

$$P^{g^1_{1:t}, g^2_{1:t}}(x_t, p^1_t, p^2_t, x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1} \mid c_t)$$
$$= \sum_{y^1_{t+1}, y^2_{t+1}, u^1_t, u^2_t} P^{g^1_{1:t}, g^2_{1:t}}(x_t, p^1_t, p^2_t, x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1}, y^1_{t+1}, y^2_{t+1}, u^1_t, u^2_t \mid c_t)$$
$$= \sum_{y^1_{t+1}, y^2_{t+1}, u^1_t, u^2_t} \mathbb{1}_{\{\zeta_{t+1}(p^1_t, p^2_t, u^1_t, u^2_t, y^1_{t+1}, y^2_{t+1}) = z_{t+1}\}} \, \mathbb{1}_{\{\xi^1_{t+1}(p^1_t, u^1_t, y^1_{t+1}) = p^1_{t+1}\}} \, \mathbb{1}_{\{\xi^2_{t+1}(p^2_t, u^2_t, y^2_{t+1}) = p^2_{t+1}\}}$$
$$\times P(y^1_{t+1}, y^2_{t+1} \mid x_{t+1}) \, P(x_{t+1} \mid x_t, u^1_t, u^2_t) \, \mathbb{1}_{\{\gamma^1_t(p^1_t) = u^1_t\}} \, \mathbb{1}_{\{\gamma^2_t(p^2_t) = u^2_t\}} \, \pi_t(x_t, p^1_t, p^2_t), \qquad (40)$$

where $\zeta_{t+1}$ and $\xi^i_{t+1}$ denote the fixed functions that generate the common information increment $Z_{t+1}$ and the private information $P^i_{t+1}$, respectively.

Note that, in addition to the arguments on the left side of the conditioning in (40), we only need $\pi_t$ and $\gamma^1_t, \gamma^2_t$ to evaluate the right hand side of (40). That is, the joint conditional distribution of $(X_t, P^1_t, P^2_t, X_{t+1}, P^1_{t+1}, P^2_{t+1}, Z_{t+1})$ depends only on $\pi_t$, $\gamma^1_t$ and $\gamma^2_t$, with no dependence on the control strategies.

We can now consider the common information based belief at time $t+1$:

$$\pi_{t+1}(x_{t+1}, p^1_{t+1}, p^2_{t+1}) = P(x_{t+1}, p^1_{t+1}, p^2_{t+1} \mid c_{t+1}) = P(x_{t+1}, p^1_{t+1}, p^2_{t+1} \mid c_t, z_{t+1}) = \frac{P(x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1} \mid c_t)}{P(z_{t+1} \mid c_t)}. \qquad (41)$$

The numerator and denominator of (41) are both marginals of the probability in (40). Using (40) in (41) gives $\pi_{t+1}$ as a function of $\pi_t, \gamma^1_t, \gamma^2_t, z_{t+1}$.
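
For concreteness, a direct (unoptimized) implementation of the update (40)–(41) for a finite model might look as follows. The callbacks `trans`, `obs`, `z_fn`, `p1_next`, `p2_next` are hypothetical stand-ins for the transition kernel, the observation kernel, and the fixed functions $\zeta_{t+1}, \xi^1_{t+1}, \xi^2_{t+1}$; they are not part of the paper's notation.

```python
from collections import defaultdict

def update_belief(pi, g1, g2, z_obs, X, Y1, Y2, trans, obs, z_fn, p1_next, p2_next):
    """Common information belief update implementing eqs. (40)-(41).

    pi            : dict mapping (x, p1, p2) -> probability (belief pi_t)
    g1, g2        : prescriptions, dicts mapping private information -> action
    z_obs         : realized common-information increment z_{t+1}
    trans(xn, x, u1, u2) : P(X_{t+1} = xn | X_t = x, u1, u2)
    obs(y1, y2, xn)      : P(Y_{t+1}^1 = y1, Y_{t+1}^2 = y2 | X_{t+1} = xn)
    z_fn, p1_next, p2_next : the fixed maps generating z_{t+1} and p_{t+1}^i
    Returns the belief pi_{t+1} as a dict mapping (x', p1', p2') -> probability.
    Assumes z_obs has positive probability under pi (otherwise division fails).
    """
    joint = defaultdict(float)  # unnormalized numerator of (41)
    for (x, p1, p2), prob in pi.items():
        u1, u2 = g1[p1], g2[p2]          # actions = prescriptions at private info
        for xn in X:
            px = prob * trans(xn, x, u1, u2)
            for y1 in Y1:
                for y2 in Y2:
                    if z_fn(p1, p2, u1, u2, y1, y2) != z_obs:
                        continue         # indicator functions in (40)
                    joint[(xn, p1_next(p1, u1, y1), p2_next(p2, u2, y2))] += \
                        px * obs(y1, y2, xn)
    total = sum(joint.values())          # = P(z_{t+1} | c_t), denominator of (41)
    return {key: val / total for key, val in joint.items()}
```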

APPENDIX B
PROOF OF LEMMA 7

Consider a realization $c_t$ of the common information at time $t$ and realizations $\pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}$ of beliefs and prescriptions till time $t$. Because of (10) in Assumption 2, we have

$$\pi_{t+1} = F_t(\pi_t, Z_{t+1}).$$

Hence, in order to establish the lemma, it suffices to show that

$$P(Z_{t+1} \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = P(Z_{t+1} \mid \pi_t, \gamma^1_t, \gamma^2_t). \qquad (42)$$

Recall that

$$Z_{t+1} = \zeta_{t+1}(P^1_t, P^2_t, U^1_t, U^2_t, Y^1_{t+1}, Y^2_{t+1}) = \zeta_{t+1}(P^1_t, P^2_t, \gamma^1_t(P^1_t), \gamma^2_t(P^2_t), Y^1_{t+1}, Y^2_{t+1}), \qquad (43)$$


where we used the fact that the control actions are simply the prescriptions evaluated at the private information. Therefore,

$$P(Z_{t+1} = z \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t})$$
$$= \sum_{x_t, x_{t+1}, y^1_{t+1}, y^2_{t+1}, p^1_t, p^2_t} P(Z_{t+1} = z, x_t, x_{t+1}, y^1_{t+1}, y^2_{t+1}, p^1_t, p^2_t \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t})$$
$$= \sum_{x_t, x_{t+1}, y^1_{t+1}, y^2_{t+1}, p^1_t, p^2_t} \mathbb{1}_{\{\zeta_{t+1}(p^1_t, p^2_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t), y^1_{t+1}, y^2_{t+1}) = z\}} \, P(y^1_{t+1}, y^2_{t+1} \mid x_{t+1}) \, P(x_{t+1} \mid x_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t)) \, P(x_t, p^1_t, p^2_t \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t})$$
$$= \sum_{x_t, x_{t+1}, y^1_{t+1}, y^2_{t+1}, p^1_t, p^2_t} \mathbb{1}_{\{\zeta_{t+1}(p^1_t, p^2_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t), y^1_{t+1}, y^2_{t+1}) = z\}} \, P(y^1_{t+1}, y^2_{t+1} \mid x_{t+1}) \, P(x_{t+1} \mid x_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t)) \, \pi_t(x_t, p^1_t, p^2_t), \qquad (44)$$

where we used the fact that $P(x_t, p^1_t, p^2_t \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = P(x_t, p^1_t, p^2_t \mid c_t)$, since $\pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}$ are all functions of $c_t$, and the fact that $P(x_t, p^1_t, p^2_t \mid c_t) =: \pi_t(x_t, p^1_t, p^2_t)$. The right hand side of (44) depends only on $\pi_t$ and $\gamma^1_t, \gamma^2_t$. Thus, the conditional probability of $Z_{t+1} = z$ conditioned on $c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}$ depends only on $\pi_t$ and $\gamma^1_t, \gamma^2_t$. This establishes (42) and hence the lemma.

APPENDIX C
PROOF OF LEMMA 8

Assume that virtual player 1 is using a fixed strategy of the form $\Gamma^1_t = \chi^1_t(\Pi_t)$, $t = 1, 2, \ldots, T$. We now want to find a strategy for virtual player 2 that is a best response to this given strategy of virtual player 1. Lemma 7 established that $\Pi_t$ is a controlled Markov process with the prescriptions $\Gamma^1_t, \Gamma^2_t$ as the controlling actions. Since $\Gamma^1_t$ has been fixed to $\chi^1_t(\Pi_t)$, it follows that, under the fixed strategy of virtual player 1, $\Pi_t$ can be viewed as a controlled Markov process with virtual player 2's decisions $\Gamma^2_t$ as the controlling actions.

At time $t$, if $c_t$ is the realization of the common information and $\pi_t$ is the corresponding realization of the common information belief, then $\gamma^1_t = \chi^1_t(\pi_t)$ is the prescription selected by virtual player 1.


If virtual player 2 selects $\gamma^2_t$, the expected instantaneous cost for virtual player 2 is

$$\mathbb{E}[c^2(X_t, U^1_t, U^2_t) \mid c_t] = \mathbb{E}[c^2(X_t, \gamma^1_t(P^1_t), \gamma^2_t(P^2_t)) \mid c_t] = \sum_{x_t, p^1_t, p^2_t} c^2(x_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t)) \, P(x_t, p^1_t, p^2_t \mid c_t)$$
$$= \sum_{x_t, p^1_t, p^2_t} c^2(x_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t)) \, \pi_t(x_t, p^1_t, p^2_t) =: \tilde{c}^2(\pi_t, \gamma^2_t). \qquad (45)$$

Thus, given the fixed strategy of virtual player 1, the instantaneous expected cost for virtual player 2 depends only on the belief $\pi_t$ and the prescription selected by virtual player 2. Given the controlled Markov nature of $\Pi_t$, it follows that virtual player 2's optimization problem is a Markov decision problem with $\Pi_t$ as the state; hence virtual player 2 can optimally select its prescription as a function of $\Pi_t$. This completes the proof of the lemma.

APPENDIX D
PROOF OF THEOREM 2

Consider a strategy pair $(\chi^1, \chi^2)$ that satisfies the conditions of the theorem. For any $1 \le k \le T$ and any realization $c_k$ of the common information at time $k$, we want to show that the strategies form a Nash equilibrium of the sub-game starting from time $k$ with the costs given as

$$\mathbb{E}\Big[\sum_{t=k}^{T} c^i(X_t, U^1_t, U^2_t) \,\Big|\, c_k\Big], \quad i = 1, 2. \qquad (46)$$

If the strategy of player $j$ is fixed to $\chi^j_t$, $t = k, k+1, \ldots, T$, then, by arguments similar to those in the proof of Lemma 8, the optimization problem for player $i$ from time $k$ onwards with the objective given by (46) is a Markov decision problem, which we denote by $MDP^i_k$. Since $\chi^i_t$, $t = k, k+1, \ldots, T$, satisfy the conditions of Theorem 2 for player $i$, they satisfy the dynamic programming conditions of $MDP^i_k$. Thus, $\chi^i_t$, $t = k, k+1, \ldots, T$, is a best response to $\chi^j_t$, $t = k, k+1, \ldots, T$, in the sub-game starting from time $k$. Interchanging the roles of $i$ and $j$ implies that the strategies $\chi^1_t, \chi^2_t$, $t = k, k+1, \ldots, T$, form an equilibrium of the sub-game starting from time $k$. Since $k$ was arbitrary, this completes the proof of the sufficiency part of the theorem. The converse follows a similar MDP-based argument.


APPENDIX E
PROOF OF THEOREM 3

Consider any realization $\pi$ of the common information based belief and consider a Bayesian Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of the game $SG_T(\pi)$. We will show that $\gamma^{1*}, \gamma^{2*}$ satisfy the value function conditions for time $T$ in Theorem 2. By definition of Bayesian Nash equilibrium, for every realization $p^1$ of $P^1_T$,

$$\mathbb{E}^{\pi}\big[c^1(X_T, \gamma^{1*}(P^1_T), \gamma^{2*}(P^2_T)) \,\big|\, P^1_T = p^1\big] \le \mathbb{E}^{\pi}\big[c^1(X_T, \gamma^{1}(P^1_T), \gamma^{2*}(P^2_T)) \,\big|\, P^1_T = p^1\big], \qquad (47)$$

for any choice of $\gamma^1$. Averaging over $p^1$, we get

$$\mathbb{E}^{\pi}\big[\mathbb{E}[c^1(X_T, \gamma^{1*}(P^1_T), \gamma^{2*}(P^2_T)) \mid P^1_T]\big] \le \mathbb{E}^{\pi}\big[\mathbb{E}[c^1(X_T, \gamma^{1}(P^1_T), \gamma^{2*}(P^2_T)) \mid P^1_T]\big],$$
$$\text{that is,} \quad \mathbb{E}^{\pi}\big[c^1(X_T, \gamma^{1*}(P^1_T), \gamma^{2*}(P^2_T))\big] \le \mathbb{E}^{\pi}\big[c^1(X_T, \gamma^{1}(P^1_T), \gamma^{2*}(P^2_T))\big], \qquad (48)$$

where all the expectations are with respect to the belief $\pi$ on $(X_T, P^1_T, P^2_T)$. Similarly,

$$\mathbb{E}^{\pi}\big[c^2(X_T, \gamma^{1*}(P^1_T), \gamma^{2*}(P^2_T))\big] \le \mathbb{E}^{\pi}\big[c^2(X_T, \gamma^{1*}(P^1_T), \gamma^{2}(P^2_T))\big], \qquad (49)$$

for any choice of $\gamma^2$. Thus, $\chi^i_T(\pi) := \gamma^{i*}$, $i = 1, 2$, satisfy the conditions in (20) and (21) when $\Pi_T = \pi$.

Similarly, for any time $t < T$, consider any realization $\pi$ of the common information based belief at $t$ and consider a Bayesian Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of the game $SG_t(\pi)$. Then, by definition of Bayesian Nash equilibrium, for every realization $p^1$ and any choice of $\gamma^1$, the expression

$$\mathbb{E}^{\pi}\big[c^1(X_t, \gamma^{1*}(P^1_t), \gamma^{2*}(P^2_t)) + V^1_{t+1}(F_t(\pi, Z_{t+1})) \,\big|\, P^1_t = p^1\big]$$

(where $Z_{t+1}$ is the increment in common information generated according to (6), (2) and (1) when the control actions $U^1_t = \gamma^{1*}(p^1)$ and $U^2_t = \gamma^{2*}(P^2_t)$ are used) can be no larger than

$$\mathbb{E}^{\pi}\big[c^1(X_t, \gamma^{1}(P^1_t), \gamma^{2*}(P^2_t)) + V^1_{t+1}(F_t(\pi, Z_{t+1})) \,\big|\, P^1_t = p^1\big]$$

(where $Z_{t+1}$ is the increment in common information generated according to (6), (2) and (1) when the control actions $U^1_t = \gamma^{1}(p^1)$ and $U^2_t = \gamma^{2*}(P^2_t)$ are used). Similar conditions hold for player 2. Averaging over $p^1$ (respectively, $p^2$) establishes that $\chi^i_t(\pi) := \gamma^{i*}$, $i = 1, 2$, satisfy the conditions in (22) and (23) when $\Pi_t = \pi$.

Thus, the strategies $\chi^i$, $i = 1, 2$, defined by the backward induction procedure of Algorithm 1 satisfy the conditions of Theorem 2 and hence form a Markov perfect equilibrium for game G2.
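
Each backward induction step in Algorithm 1 thus reduces to finding a Bayesian Nash equilibrium of a finite one-stage game. For small games this can be done by brute-force search over pure prescription pairs, checking the averaged best-response conditions as in (48)–(49); a minimal sketch is below, where the callbacks `cost1`, `cost2` are hypothetical and assumed to already fold in the continuation values $V^i_{t+1}(F_t(\pi, Z_{t+1}))$.

```python
import itertools

def pure_bayesian_nash(pi, P1, P2, U1, U2, cost1, cost2):
    """Brute-force search for a pure-strategy Bayesian Nash equilibrium of a
    one-stage game SG_t(pi). cost_i(x, u1, u2) is player i's stage cost with
    the continuation value folded in. Returns (gamma1, gamma2), or None if no
    pure-strategy equilibrium exists (behavioral strategies are then needed).
    """
    def prescriptions(P, U):
        return [dict(zip(P, acts)) for acts in itertools.product(U, repeat=len(P))]

    def J(cost, g1, g2):  # expected cost under the belief pi
        return sum(p * cost(x, g1[p1], g2[p2]) for (x, p1, p2), p in pi.items())

    G1, G2 = prescriptions(P1, U1), prescriptions(P2, U2)
    for g1, g2 in itertools.product(G1, G2):
        if (all(J(cost1, g1, g2) <= J(cost1, d1, g2) for d1 in G1)
                and all(J(cost2, g1, g2) <= J(cost2, g1, d2) for d2 in G2)):
            return g1, g2
    return None
```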


REFERENCES

[1] L. S. Shapley, "Stochastic games," Proc. Natl. Acad. Sci. USA, vol. 39, pp. 1095–1100, 1953.
[2] M. J. Sobel, "Noncooperative stochastic games," The Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1930–1935, 1971. [Online]. Available: http://www.jstor.org/stable/2240119
[3] D. Fudenberg and J. Tirole, Game Theory. MIT Press, 1991.
[4] T. Basar and G. J. Olsder, Dynamic Non-cooperative Game Theory. SIAM Series in Classics in Applied Mathematics, Philadelphia, 1999.
[5] J. Filar and K. Vrieze, Competitive Markov Decision Processes. Springer, 1996.
[6] R. Behn and Y.-C. Ho, "On a class of linear stochastic differential games," IEEE Trans. Autom. Contr., vol. 13, no. 3, pp. 227–240, Jun. 1968.
[7] I. Rhodes and D. Luenberger, "Differential games with imperfect state information," IEEE Trans. Autom. Contr., vol. 14, no. 1, pp. 29–38, Feb. 1969.
[8] W. Willman, "Formal solutions for a class of stochastic pursuit-evasion games," IEEE Trans. Autom. Contr., vol. 14, no. 5, pp. 504–509, Oct. 1969.
[9] Y. C. Ho, "On the minimax principle and zero-sum stochastic differential games," Journal of Optimization Theory and Applications, vol. 13, no. 3, pp. 343–361, 1974.
[10] T. Basar and M. Mintz, "A multistage pursuit-evasion game that admits a Gaussian random process as a maximin control policy," Stochastics, vol. 1:1-4, pp. 25–69, 1973.
[11] T. Basar, "Two-criteria LQG decision problems with one-step delay observation sharing pattern," Information and Control, vol. 38, pp. 21–50, 1978.
[12] ——, "Decentralized multicriteria optimization of linear stochastic systems," IEEE Trans. Autom. Contr., vol. 23, no. 2, pp. 233–243, Apr. 1978.
[13] E. Altman, V. Kambley, and A. Silva, "Stochastic games with one step delay sharing information pattern with application to power control," in Proceedings of the International Conference on Game Theory for Networks, GameNets '09, May 2009, pp. 124–129.
[14] J. Hespanha and M. Prandini, "Nash equilibria in partial-information games on Markov chains," in Proc. of the 40th IEEE Conference on Decision and Control, 2001, pp. 2102–2107.
[15] T. Basar, "On the saddle-point solution of a class of stochastic differential games," Journal of Optimization Theory and Applications, vol. 33, no. 4, pp. 539–556, 1981.
[16] H. Cole and N. Kocherlakota, "Dynamic games with hidden actions and hidden states," Journal of Economic Theory, vol. 98, no. 1, pp. 114–126, 2001.
[17] A. Nayyar, A. Mahajan, and D. Teneketzis, "Optimal control strategies in delayed sharing information structures," IEEE Transactions on Automatic Control, vol. 57, no. 7, pp. 1606–1620, July 2011.
[18] A. Nayyar and T. Basar, "Dynamic stochastic games with asymmetric information," accepted in 51st IEEE Conference on Decision and Control, 2012.
[19] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice Hall, Englewood Cliffs, NJ, 1986.
[20] E. Maskin and J. Tirole, "Markov perfect equilibrium: I. Observable actions," Journal of Economic Theory, vol. 100, no. 2, pp. 191–219, 2001. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0022053100927856
[21] M. J. Osborne and A. Rubinstein, A Course in Game Theory. MIT Press, 1994.
[22] R. B. Myerson, Game Theory: Analysis of Conflict. Harvard University Press, Cambridge, MA, 1997.
[23] Y.-C. Ho, "Team decision theory and information structures," Proc. IEEE, vol. 68, no. 6, pp. 644–654, 1980.
[24] A. Nayyar, A. Mahajan, and D. Teneketzis, "Decentralized stochastic control with partial sharing information structures: A common information approach," IEEE Transactions on Automatic Control, Dec. 2011, submitted.


Supplementary Material

APPENDIX F
PROOF OF LEMMA 2

Proof: It is straightforward to verify that the structure of common and private information satisfies Assumption 1. We focus on the proof for Assumption 2. For a realization $y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}$ of the common information at time $t+1$, the common information based belief can be written as

$$\pi_{t+1}(x_{t+1}, y^1_{t+1}, y^2_{t+1}) = P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1}, Y^1_{t+1} = y^1_{t+1}, Y^2_{t+1} = y^2_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})$$
$$= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \, P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \, P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})$$
$$= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \, P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \sum_{x_t} \Big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) \Big], \qquad (50)$$

where we used the dynamics and the observation model to get the expression in (50). It can now be argued that, in the last term in (50), we can remove $u^1_t, u^2_t$ from the conditioning, since they are functions of the remaining terms $y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}$ in the conditioning. The last term in (50) would then be

$$P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}),$$

which is known to be independent of the choice of control laws $g^1_{1:t}, g^2_{1:t}$ [19]. Thus, $\pi_{t+1}$ is independent of the choice of control laws. For the sake of completeness, we provide a more detailed argument below.

The last term in (50) can be written as

$$P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) = P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1})$$
$$= \frac{P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t, Y^1_t = y^1_t, Y^2_t = y^2_t \mid y^1_{1:t-1}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1})}{P^{g^1_{1:t}, g^2_{1:t}}(Y^1_t = y^1_t, Y^2_t = y^2_t \mid y^1_{1:t-1}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1})} = \frac{\pi_t(x_t, y^1_t, y^2_t)}{\sum_{x} \pi_t(x, y^1_t, y^2_t)}. \qquad (51)$$


Combining (50) and (51) establishes that $\pi_{t+1}$ is a function only of $\pi_t$ and $z_{t+1} = (y^1_t, y^2_t, u^1_t, u^2_t)$. Further, the transformation from $(\pi_t, z_{t+1})$ to $\pi_{t+1}$ does not depend on the choice of control strategies.

APPENDIX G
PROOF OF LEMMA 3

Proof: It is straightforward to verify that the structure of common and private information satisfies Assumption 1. We focus on the proof for Assumption 2. For a realization $y^1_{1:t+1}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}$ of the common information at time $t+1$, the common information based belief can be written as

$$\pi_{t+1}(x_{t+1}, y^2_{t+1}) = P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1}, Y^2_{t+1} = y^2_{t+1} \mid y^1_{1:t+1}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})$$
$$= P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \, P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1} \mid y^1_{1:t+1}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})$$
$$= P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \, \frac{P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1}, Y^1_{t+1} = y^1_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})}{\sum_{x} P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x, Y^1_{t+1} = y^1_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})}. \qquad (52)$$

The numerator of the second term in (52) can be written as

$$P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \, P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})$$
$$= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \sum_{x_t} \Big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}) \Big]$$
$$= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \sum_{x_t} \Big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, \frac{P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t, y^2_t \mid y^1_{1:t}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1})}{P^{g^1_{1:t}, g^2_{1:t}}(y^2_t \mid y^1_{1:t}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1})} \Big]$$
$$= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \sum_{x_t} \Big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, \frac{\pi_t(x_t, y^2_t)}{\pi_t(y^2_t)} \Big], \qquad (53)$$

where $\pi_t(y^2_t) = \sum_{x} \pi_t(x, y^2_t)$ denotes the marginal of $\pi_t$. Similar expressions can be obtained for the denominator of the second term in (52), which gives

$$\pi_{t+1}(x_{t+1}, y^2_{t+1}) = P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \, \frac{P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \sum_{x_t} \big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, \pi_t(x_t, y^2_t) \big]}{\sum_{x} P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x) \sum_{x_t} \big[ P(X_{t+1} = x \mid X_t = x_t, u^1_t, u^2_t) \, \pi_t(x_t, y^2_t) \big]}$$
$$=: F_t(\pi_t, y^1_{t+1}, y^2_t, u^1_t, u^2_t) = F_t(\pi_t, z_{t+1}). \qquad (54)$$


APPENDIX H
PROOF OF LEMMA 4

Assumption 1 is clearly satisfied. We focus on Assumption 2.

Case A: For a realization $y^1_{1:t}, y^2_{1:t-d}, u^1_{1:t-1}$ of the common information, the common information based belief in this case can be written as:

$$\pi_t(x_t, y^2_{t-d+1:t}) = P^{g^1_{1:t-1}}(X_t = x_t, Y^2_{t-d+1:t} = y^2_{t-d+1:t} \mid y^1_{1:t}, y^2_{1:t-d}, u^1_{1:t-1})$$
$$= \sum_{x_{t-d:t-1}} \Big[ P(Y^2_{t-d+1:t} = y^2_{t-d+1:t} \mid X_{t-d+1:t-1} = x_{t-d+1:t-1}, X_t = x_t) \, P^{g^1_{1:t-1}}(X_t = x_t, X_{t-d:t-1} = x_{t-d:t-1} \mid y^1_{1:t}, y^2_{1:t-d}, u^1_{1:t-1}) \Big]. \qquad (55)$$

The first term in (55) depends only on the noise statistics. To see that the second term in (55) is strategy independent, consider a centralized stochastic control problem with controller 1 as the only controller, where the state process is $\tilde{X}_t := X_{t-d:t}$ and the observation process is $\tilde{Y}_t := (Y^1_t, Y^2_{t-d})$. The second term in (55) is simply the information state $P(\tilde{x}_t \mid \tilde{y}_{1:t}, u^1_{1:t-1})$ of this centralized stochastic control problem, which is known to be strategy independent and to satisfy an update equation of the form required by Lemma 4 [19].

Case B: Using arguments similar to those in Case A, the common information based belief $\pi_t$ for a realization $y^1_{1:t-1}, y^2_{1:t-d}, u^1_{1:t-1}$ of the common information can be written as:

$$\pi_t(x_t, y^1_t, y^2_{t-d+1:t}) = \sum_{x_{t-d:t-1}} \Big[ P(Y^1_t = y^1_t, Y^2_{t-d+1:t} = y^2_{t-d+1:t} \mid X_{t-d+1:t-1} = x_{t-d+1:t-1}, X_t = x_t) \, P^{g^1_{1:t-1}}(X_t = x_t, X_{t-d:t-1} = x_{t-d:t-1} \mid y^1_{1:t-1}, y^2_{1:t-d}, u^1_{1:t-1}) \Big]. \qquad (56)$$

The second term in (56) is

$$\frac{P(y^2_{t-d} \mid x_{t-d}) \, P^{g^1_{1:t-1}}(X_t = x_t, X_{t-d:t-1} = x_{t-d:t-1} \mid y^1_{1:t-1}, y^2_{1:t-d-1}, u^1_{1:t-1})}{P^{g^1_{1:t-1}}(Y^2_{t-d} = y^2_{t-d} \mid y^1_{1:t-1}, y^2_{1:t-d-1}, u^1_{1:t-1})}. \qquad (57)$$

Both the numerator and the denominator can be shown to be strategy independent using the transformation to the centralized stochastic control problem described in Case A.


APPENDIX I
PROOF OF LEMMA 5

For a realization $y^0_{1:t+1}, u^1_{1:t}, u^2_{1:t}$ of the common information at time $t+1$, the belief $\pi_{t+1}$ is given as

$$\pi_{t+1}(x^0, x^1, x^2) = P^{g^1_{1:t-1}, g^2_{1:t-1}}(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid y^0_{1:t+1}, u^1_{1:t}, u^2_{1:t}) \qquad (58)$$
$$= \frac{P^{g^1_{1:t-1}, g^2_{1:t-1}}(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2, Y^0_{t+1} = y^0_{t+1} \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t})}{P^{g^1_{1:t-1}, g^2_{1:t-1}}(Y^0_{t+1} = y^0_{t+1} \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t})}$$
$$= \frac{P(Y^0_{t+1} = y^0_{t+1} \mid X^0_{t+1} = x^0) \, P^{g^1_{1:t-1}, g^2_{1:t-1}}(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t})}{\sum_{x} P(Y^0_{t+1} = y^0_{t+1} \mid X^0_{t+1} = x) \, P^{g^1_{1:t-1}, g^2_{1:t-1}}(X^0_{t+1} = x \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t})}. \qquad (59)$$

The control strategy dependent term in the numerator of (59) can be written as

$$P^{g^1_{1:t-1}, g^2_{1:t-1}}(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t})$$
$$= \sum_{x'} \Big[ P(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid X^0_t = x', u^1_t, u^2_t) \, P^{g^1_{1:t-1}, g^2_{1:t-1}}(X^0_t = x' \mid y^0_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}) \Big]$$
$$= \sum_{x'} P(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid X^0_t = x', u^1_t, u^2_t) \, \pi_t(x'), \qquad (60)$$

where $\pi_t(x')$ denotes the marginal of $\pi_t$ on $X^0_t$. Similarly, the control strategy dependent term in the denominator of (59) can be written as

$$P^{g^1_{1:t-1}, g^2_{1:t-1}}(X^0_{t+1} = x \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t}) = \sum_{x'} P(X^0_{t+1} = x \mid X^0_t = x', u^1_t, u^2_t) \, \pi_t(x'). \qquad (61)$$

Substituting (60) and (61) in (59) establishes the lemma.

APPENDIX J
PROOF OF LEMMA 6

Consider a realization $c_t$ of the common information $C_t$ at time $t$. Given the realization $\pi_t$ of the common information based belief, we can find the joint conditional distribution of $(X_t, P^1_t, P^2_t, X_{t+1}, P^1_{t+1}, P^2_{t+1}, Z_{t+1})$ conditioned on the common information at time $t$ as follows:

$$P(x_t, p^1_t, p^2_t, x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1} \mid c_t) = \sum_{y^1_{t+1}, y^2_{t+1}} P(x_t, p^1_t, p^2_t, x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1}, y^1_{t+1}, y^2_{t+1} \mid c_t)$$
$$= \sum_{y^1_{t+1}, y^2_{t+1}} \Big[ \mathbb{1}_{\{\zeta_{t+1}(p^1_t, p^2_t, y^1_{t+1}, y^2_{t+1}) = z_{t+1}\}} \, \mathbb{1}_{\{\xi^1_{t+1}(p^1_t, y^1_{t+1}) = p^1_{t+1}\}} \, \mathbb{1}_{\{\xi^2_{t+1}(p^2_t, y^2_{t+1}) = p^2_{t+1}\}} \, P(y^1_{t+1}, y^2_{t+1} \mid x_{t+1}) \, P(x_{t+1} \mid x_t) \, \pi_t(x_t, p^1_t, p^2_t) \Big]. \qquad (62)$$

Note that, in addition to the arguments on the left side of the conditioning in (62), we only need $\pi_t$ to evaluate the right hand side of (62).

We can now consider the common information based belief at time $t+1$:

$$\pi_{t+1}(x_{t+1}, p^1_{t+1}, p^2_{t+1}) = P(x_{t+1}, p^1_{t+1}, p^2_{t+1} \mid c_{t+1}) = P(x_{t+1}, p^1_{t+1}, p^2_{t+1} \mid c_t, z_{t+1}) = \frac{P(x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1} \mid c_t)}{P(z_{t+1} \mid c_t)}. \qquad (63)$$

The numerator and denominator of (63) are both marginals of the probability in (62). Using (62) in (63) gives $\pi_{t+1}$ as a function of $\pi_t$ and $z_{t+1}$.
