Mixed and Behavioral Strategies
Game Theory Course: Jackson, Leyton-Brown & Shoham
Randomized Strategies
There are two meaningfully different kinds of randomized strategies in imperfect-information extensive-form games:
mixed strategies
behavioral strategies
Mixed strategy: randomize over pure strategies
Behavioral strategy: independent coin toss every time an
information set is encountered
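
The operational difference can be made concrete with a short sketch. The Python below is illustrative only (the dictionary-based representation and the function names are assumptions, not from the course): a mixed strategy draws one pure strategy before play and then follows it everywhere, while a behavioral strategy draws a fresh, independent action at each information set.

```python
import random

# Assumed representation (not from the course):
#  - a pure strategy maps every information set to an action,
#  - a mixed strategy is a list of (pure_strategy, probability) pairs,
#  - a behavioral strategy maps every information set to an action distribution.

def sample_mixed(mixed_strategy):
    """Mixed: draw ONE pure strategy up front, then follow it at every node."""
    pures, probs = zip(*mixed_strategy)
    return random.choices(pures, weights=probs)[0]

def sample_behavioral(behavioral_strategy):
    """Behavioral: an independent coin toss at each information set.
    Drawing all the coins in advance is equivalent, since they are independent."""
    return {info_set: random.choices(list(dist), weights=list(dist.values()))[0]
            for info_set, dist in behavioral_strategy.items()}
```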
Randomized strategies example

Excerpt from Multiagent Systems (draft of September 19, 2006):

Notice that the definition contains a subtlety. An agent's strategy requires a decision at each choice node, regardless of whether or not it is possible to reach that node given the other choice nodes. In the Sharing game above the situation is straightforward: player 1 has three pure strategies, and player 2 has eight (why?). But now consider the game shown in Figure 5.2.

Figure 5.2: A perfect-information game in extensive form. (Game tree not reproduced here; its leaf payoffs are (3,8), (8,3), (5,5), (2,10), and (1,0).)

In order to define a complete strategy for this game, each of the players must choose an action at each of his two choice nodes. Thus we can enumerate the pure strategies of the players as follows.

S1 = {(A, G), (A, H), (B, G), (B, H)}
S2 = {(C, E), (C, F), (D, E), (D, F)}

It is important to note that we have to include the strategies (A, G) and (A, H), even though once A is chosen the G-versus-H choice is moot.

The definition of best response and Nash equilibria in this game are exactly as they are for normal-form games. Indeed, this example illustrates how every perfect-information game can be converted to an equivalent normal-form game. For example, the perfect-information game of Figure 5.2 can be converted into the normal-form image of the game, shown in Figure 5.3. Clearly, the strategy spaces of the two games are ...

Example of a behavioral strategy: A with probability .5 and G with probability .3.

Example of a mixed strategy that is not a behavioral strategy: (.6(A, G), .4(B, H)) (why not?)

In this game every behavioral strategy corresponds to a mixed strategy...
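
To make the last point concrete, here is a small Python sketch (the node labels n1 and n2 for player 1's two decision points, and the helper name, are my own) that computes the mixed strategy induced by the behavioral strategy on the slide, and shows why (.6(A, G), .4(B, H)) has no behavioral counterpart.

```python
from itertools import product

# Behavioral strategy from the slide: A with probability .5, G with probability .3.
# "n1" is player 1's first decision point (A vs. B), "n2" his second (G vs. H);
# these labels are assumptions made for this sketch.
behavioral = {"n1": {"A": 0.5, "B": 0.5}, "n2": {"G": 0.3, "H": 0.7}}

def induced_mixed(beh):
    """The mixed strategy this behavioral strategy corresponds to: each pure
    strategy gets the product of its action probabilities (independent coins)."""
    nodes = list(beh)
    mixed = {}
    for actions in product(*(beh[n] for n in nodes)):
        prob = 1.0
        for node, action in zip(nodes, actions):
            prob *= beh[node][action]
        mixed[actions] = prob
    return mixed

print(induced_mixed(behavioral))
# {('A', 'G'): 0.15, ('A', 'H'): 0.35, ('B', 'G'): 0.15, ('B', 'H'): 0.35}

# The mixed strategy (.6 (A, G), .4 (B, H)) correlates the two choices:
# it has P(A) = 0.6 and P(G) = 0.6, yet P(A and G) = 0.6 rather than 0.36,
# so no independent per-node coin tosses can reproduce it -- the "why not?".
```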
Games of imperfect recall

Imagine that player 1 sends two proxies to the game with the same strategies. When one arrives, he doesn't know if the other has arrived before him, or if he's the first one.

Figure 5.12: A game with imperfect recall. (Game tree not reproduced here: both of player 1's decision nodes, with actions L and R, lie in a single information set; playing L twice yields (1,0), L then R yields (100,100); after R, player 2 chooses U, yielding (5,1), or D, yielding (2,2).)

What is the space of pure strategies in this game?
1: (L, R); 2: (U, D)

What is the mixed strategy equilibrium?
Observe that D is dominant for 2. (R, D) is better for 1 than (L, D), so (R, D) is an equilibrium.
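
As a cross-check on the pure- and mixed-strategy analysis, the sketch below builds the induced normal form of the Figure 5.12 game (the payoff table is my transcription of the tree described above) and searches for pure-strategy equilibria; only (R, D) survives.

```python
# Payoff table read off Figure 5.12 (transcribed here; see the figure note above):
# playing L at the information set means L at both of player 1's nodes, giving (1, 0)
# regardless of player 2's plan, since player 2 is never reached.
payoff = {
    ("L", "U"): (1, 0), ("L", "D"): (1, 0),
    ("R", "U"): (5, 1), ("R", "D"): (2, 2),
}

def is_pure_nash(s1, s2):
    """Neither player can gain by deviating unilaterally."""
    u1, u2 = payoff[(s1, s2)]
    best1 = all(payoff[(a, s2)][0] <= u1 for a in ("L", "R"))
    best2 = all(payoff[(s1, a)][1] <= u2 for a in ("U", "D"))
    return best1 and best2

print([(s1, s2) for s1 in ("L", "R") for s2 in ("U", "D") if is_pure_nash(s1, s2)])
# [('R', 'D')]  -- D weakly dominates U for player 2, and R beats L against D.
```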
What is an equilibrium in behavioral strategies?
Again, D is strongly dominant for 2.
If 1 uses the behavioral strategy (p, 1 - p), his expected utility is p² + 100p(1 - p) + 2(1 - p),
which simplifies to -99p² + 98p + 2,
with maximum at p = 98/198 (setting the derivative -198p + 98 to zero).
Thus the equilibrium is ((98/198, 100/198), (0, 1)).

Thus, we can have equilibria in behavioral strategies that are different from equilibria in mixed strategies.

Excerpt from Multiagent Systems, Section 5.2 (Imperfect-information extensive-form games):

[...] Note in particular that in a mixed strategy, agent 1 decides probabilistically whether to play L or R in his information set, but once he decides he plays that pure strategy consistently. Thus the payoff of 100 is irrelevant in the context of mixed strategies. On the other hand, with behavioral strategies agent 1 gets to randomize afresh each time he finds himself in the information set. Noting that the pure strategy D is weakly dominant for agent 2 (and in fact is the unique best response to all strategies of agent 1 other than the pure strategy L), agent 1 computes the best response to D as follows. If he uses the behavioral strategy (p, 1 - p) (that is, choosing L with probability p each time he finds himself in the information set), his expected payoff is

1·p² + 100·p(1 - p) + 2·(1 - p).

The expression simplifies to -99p² + 98p + 2, whose maximum is obtained at p = 98/198. Thus (R, D) = ((0, 1), (0, 1)) is no longer an equilibrium in behavioral strategies, and instead we get the equilibrium ((98/198, 100/198), (0, 1)).

There is, however, a broad class of imperfect-information games in which the expressive power of mixed and behavioral strategies coincides. This is the class of games of perfect recall. Intuitively speaking, in these games no player forgets any information he knew about moves made so far; in particular, he remembers precisely all his own moves. Formally:

Definition 5.2.3 Player i has perfect recall in an imperfect-information game G if for any two nodes h, h' that are in the same information set for player i, for any path ...
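
A quick numerical check of the behavioral-strategy computation above (a minimal sketch; the coarse grid search is only for illustration):

```python
# Player 1's expected payoff against D when he plays L with probability p
# at each visit to his information set (the expression derived above).
def u1(p):
    return 1 * p**2 + 100 * p * (1 - p) + 2 * (1 - p)   # = -99p^2 + 98p + 2

best_p = max((i / 10000 for i in range(10001)), key=u1)  # grid search over [0, 1]
print(best_p, 98 / 198)        # both are approximately 0.4949
print(u1(98 / 198), u1(0.0))   # ~26.25 vs. 2.0: pure R (p = 0) is not a best response
```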