0% found this document useful (0 votes)
26 views14 pages

Lect27 4up

This document discusses competition between agents and game theory. It provides an example of a congestion game where agents can choose to implement TCP correctly or defectively. This situation can be modeled as a game where the payoffs depend on the choices of both agents. Game theory offers strategies to analyze how individual agents should act and how the game may reach an equilibrium state between the agents. Several types of games are discussed including coordination games, common payoff games, and zero-sum games.

Uploaded by

alialataby
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views14 pages

Lect27 4up

This document discusses competition between agents and game theory. It provides an example of a congestion game where agents can choose to implement TCP correctly or defectively. This situation can be modeled as a game where the payoffs depend on the choices of both agents. Game theory offers strategies to analyze how individual agents should act and how the game may reach an equilibrium state between the agents. Several types of games are discussed including coordination games, common payoff games, and zero-sum games.

Uploaded by

alialataby
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Today

Robotics and Autonomous Systems • In the last lecture we started to look at competition between agents
Lecture 27: More on self-interest

Richard Williams

Department of Computer Science


University of Liverpool

• Today we look more into this.

1 / 53 2 / 53

Competition between agents Game theory?

• Game theory is a framework for analysing interactions between a set


of agents.
• Abstract specification of interactions.
• Describes each agent’s preferences in terms of their utility.
• Assume agents want to maximise utility.

• Give us a range of solution strategies with which we can make some


predictions about how agents will/should interact.

• Situation is more like this.

3 / 53 4 / 53
Congestion Game Congestion Game

• Capture this as:


• Agents using TCP to communicate.
i
• If packets collide, should back-off.
defect correct
• Works if everyone does this. defect -3 -4
• But what if agents could choose a defective implementation that j -3 0
doesn’t back-off? correct 0 -1
• In a collision, their message would get sent quicker. -4 -1
• But what if everyone did this?
• Outcome depends on what other agents do. • Agent i is the column player.
• Agent j is the row player.

5 / 53 6 / 53

Congestion Game Congestion Game

• Two obvious questions we can ask in this scenario: • What should an individual agent do?
• What should an individual agent do? • Depends on what the other agent does.
• How does the game get played — how do both agents together act? • How does the game get played — how do both agents together act?
• Game theory offers some ideas about how to answer these questions. • Equilibrium.

7 / 53 8 / 53
Congestion Game Normal form games

• As with all good games, the congestion game captures some


underlying truths about the world at an abstract level:

• An n-person, finite, normal form game is a tuple pN , A , uq, where


• N is a finite set of players.
• A “ A1 ˆ . . . ˆ An where Ai is a finite set of actions available to i.
Each a “ pa1 , . . . , an q P A is an action profile.
• u “ pu1 , . . . , un q where ui : A ÞÑ < is a real-valued utility function for i.
• Naturally represented by an n-dimensional matrix

• (Though you might want to alter the payoffs somewhat.)

9 / 53 10 / 53

Normal form games Strategies

• We analyze games in terms of strategies, that is what agents decide


to do.
• Combined with what the other agent(s) do(es) this jointly determines
the payoff.
• An agent’s strategy set is its set of available choices.
• Can just be the set of actions — pure strategies.
• In the congestion game, the set of pure strategies is:
tcorrect , defect u
• We need more than just pure strategies in many cases.
• Will discuss this later

11 / 53 12 / 53
Payoff matrix Common payoff games

• Here is the payoff matrix from the “choose which side” (of the road) • “Choose which side” game
game: left right
i left 1 0
left right 1 0
left 1 0 right 0 1
j 1 0 0 1
right 0 1 Also called the coordination game
0 1 • Any game with ui pa q “ uj pa q for all a P Ai ˆ Aj is a common payoff
• We can classify games by the form of the payoff matrix. game.

13 / 53 14 / 53

Common payoff games Common payoff games

• In between is the El Farol bar problem:

• The misanthropes’ (un)coordination game:


left right
left 0 1
0 1
right 1 0
1 0

Here we try to avoid each other.


• If everyone goes to the bar it is no fun, but if only some people go
then everyone who goes has a good time.
Should you go or not?

15 / 53 16 / 53
Constant sum games Zero-sum games

• Matching pennies
heads tails
heads -1 1 • A particular category of constant sum games are zero-sum games.
1 -1 • Where utilities sum to zero:
tails 1 -1
-1 1 u1 pai q ` uj pωq “ 0 for all a P Ai ˆ Aj

• Any game with ui pa q ` uj pa q “ c for all a P Ai ˆ Aj is a constant sum


game.

17 / 53 18 / 53

Zero-sum games Zero-sum games

• Where preferences of agents are diametrically opposed we have • Rock, paper, scissors:
strictly competitive scenarios.
• Zero sum implies strictly competitive.

• Zero sum encounters in real life are very rare . . . but people tend to
act in many scenarios as if they were zero sum.
• Most games have some room in the set of outcomes for agents to find is another constant/zero sum game.
(somewhat) mutually beneficial outcomes. • Game in two senses.
19 / 53 20 / 53
The rules Rock, paper, scissors

• Rules for “rock, paper, scissors”.


• How would you play?
i
rock paper scissors
rock 0 1 -1
0 -1 1
j paper -1 0 1
1 0 -1
scissors 1 -1 0
-1 1 0

21 / 53 22 / 53

Mixed strategy Mixed strategy

• A mixed strategy is just a probability distribution across a set of pure


strategies.
• Chances are you would play a mixed strategy. • More formally, for a game with two actions a1 and a2 , i picks a vector
• You would: of probabilities:
• sometimes play rock,
x “ px1 , x2 q
• sometimes play paper; and where
• sometimes play scissors.
ÿ
xk “ 1
• A fixed/pure strategy is easy for an adaptive player to beat. k

and
xk ě 0
• i then picks action a1 with probability x1 and a2 with probability a2 .

23 / 53 24 / 53
Mixed strategy Mixed strategy

• To determine the mixed strategy, i needs then to compute the best • Let’s consider the payoff matrix:
values of x1 and x2 . i
• These will be the values which give i the highest payoff given the a1 a2
options that j can choose and the joint payoffs that result. a1 -3 1
• In the next lecture we will get into the computation of expected values, j 3 -1
which is one way to analyse this. a2 0 -1
• But for now we will look at a simple graphical method 0 1
• Only works for very simple cases. and compute mixed strategies to be used by the players.

25 / 53 26 / 53

Mixed strategy Mixed strategy

• i’s analysis of this game would be something like this:

1 j picks first row 1


0 j picks second row 0 • j can analyse the problem in terms of a probability vector
−1 −1
y “ py1 , y2 q
−2 −2
−3 −3 and come up with a similar picture.
c1= 0.4

0 c1 1
1 c2 0
• i picks the probability of a1 so that it is indifferent to what j picks.

27 / 53 28 / 53
Mixed strategy Mixed strategy

• This analysis will help i and j choose a mixed strategy for the specific
• j’s analysis would be something like this:
case in which the payoffs to the two agents are equal and opposite for
3 3 each outcome.
i picks first column
2 2
1 1
i picks second column
0 0
r1 = 0.2
−1 −1

0 r1 1
1 r2 0
• Application to zero sum games is due to von Neumann.

29 / 53 30 / 53

General sum games Nash equilibrium

• Battle of the Outmoded Gender Stereotypes


• aka Battle of the Sexes

this that • Last time we introduced the notion of Nash equilibrium as a solution
this 1 0 concept for general sum games.
2 0 • (We didn’t describe it in exactly those terms.)
that 0 2 • Looked at pure strategy Nash equilibrium.
0 1
• Issue was that not every game has a pure strategy Nash equilibrium.

• Game contains elements of cooperation and competition.


• The interplay between these is what makes general sum games
interesting.

31 / 53 32 / 53
Nash equilibrium Nash equilibrium

• For example:
i
D C • The notion of Nash equilibrium extends to mixed strategies.
D 2 1
• And every game has at least one mixed strategy Nash equilibrium.
j 1 2
C 0 1
2 1
• Has no pure strategy NE.

33 / 53 34 / 53

Nash equilibrium Nash equilibrium

• For a game with payoff matrices A (to i) and B (to j), a mixed strategy
px ˚ , y ˚ q is a Nash equilibrium solution if:

@x , x ˚ Ay ˚T ě xAy ˚T
@y , x ˚ By ˚T ě xBy ˚T

• In other words, x ˚ gives a higher expected value to i than any other


strategy when j plays y ˚ .
• Similarly, y ˚ gives a higher expected value to j than any other strategy
when i plays x ˚ .

35 / 53 36 / 53
Nash equilibrium The Prisoner’s Dilemma

• Unfortunately, this doesn’t solve the problem of which Nash


equilibrium you should play.

37 / 53 38 / 53

The Prisoner’s Dilemma The Prisoner’s Dilemma

Two men are collectively charged with a crime and held in • Payoff matrix for prisoner’s dilemma:
separate cells, with no way of meeting or communicating.
i
They are told that:
defect coop
• if one confesses and the other does not, the confessor will defect 2 1
be freed, and the other will be jailed for three years; j 2 4
• if both confess, then each will be jailed for two years. coop 4 3
Both prisoners know that if neither confesses, then they will each 1 3
be jailed for one year. • What should each agent do?

39 / 53 40 / 53
What Should You Do? What Did You Do?

• The individually rational action is defect.


This guarantees a payoff of no worse than 2, whereas cooperating
guarantees a payoff of at most 1. • The Prisoner’s Dilemma is the same game as the “grade game”.
Just has a different back story.
• So defection is the best response to all possible strategies: both
agents defect, and get payoff = 2. • When you played that,
18 of you chose “defect”.
• But intuition says this is not the best outcome:
6 of you chose “cooperate”.
Surely they should both cooperate and each get payoff of 3!
• This is why the PD game is interesting — the analysis seems to give
us a paradoxical answer.

41 / 53 42 / 53

Solution Concepts The Paradox

• This apparent paradox is the fundamental problem of multi-agent


• Payoff matrix for prisoner’s dilemma: interactions.
It appears to imply that cooperation will not occur in societies of
i
self-interested agents.
defect coop
defect 2 1 • Real world examples:
j 2 4 • nuclear arms reduction/proliferation
coop 4 3 • free rider systems — public transport, file sharing;
1 3 • in the UK — television licenses.
• climate change — to reduce or not reduce emissions
• There is no dominant strategy (in the game theory sense). • doping in sport
• pD , D q is the only Nash equilibrium. • resource depletion

• All outcomes except pD , D q are Pareto optimal. • The prisoner’s dilemma is ubiquitous.
• pC , C q maximises social welfare. • Can we recover cooperation?

43 / 53 44 / 53
The Shadow of the Future Backwards Induction

• Play the game more than once.


If you know you will be meeting your opponent again, then the • Suppose you both know that you will play the game exactly n times.
incentive to defect appears to evaporate. On round n ´ 1, you have an incentive to defect, to gain that extra bit
• If you defect, you can be punished (compared to the co-operation of payoff.
reward.) But this makes round n ´ 2 the last “real”, and so you have an
• If you get suckered, then what you lose can be amortised over the rest incentive to defect there, too.
of the iterations, making it a small loss.
This is the backwards induction problem.
• Cooperation is (provably) the rational choice in the infinitely repeated • Playing the prisoner’s dilemma with a fixed, finite, pre-determined,
prisoner’s dilemma. commonly known number of rounds, defection is the best strategy.
(Hurrah!)
• But what if there are a finite number of repetitions?

45 / 53 46 / 53

Agh! Axelrod’s Tournament

• That seems to suggest that you should never cooperate.


• So how does cooperation arise? Why does it make sense? • Suppose you play iterated prisoner’s dilemma (IPD) against a range
• After all, there does seem to be such a thing as society, and even in a of opponents.
big city like New York, people don’t behave so badly. • What approach should you choose, so as to maximise your overall
Or, maybe more accurately, they don’t behave badly all the time. payoff?
• Turns out that:
• Is it better to defect, and hope to find suckers to rip-off?
• As long as you have some probability of repeating the interaction,
co-operation can have a better expected payoff. • Or is it better to cooperate, and try to find other friendly folk to
• As long as there are enough co-operative folk out there, you can come cooperate with?
out ahead by co-operating.
• But is always co-operating the best approach?

47 / 53 48 / 53
Axelrod’s Tournament Example Strategies in Axelrod’s Tournament

• ALLD:
• Robert Axelrod (1984) investi-
“Always defect” — the hawk strategy;
gated this problem.
• TIT-FOR-TAT:
• He ran a computer tournament for
programs playing the iterated pris- 1 On round u “ 0, cooperate.
2 On round u ą 0, do what your opponent did on round u ´ 1.
oner’s dilemma.
• Axelrod hosted the tournament • TESTER:
and various researchers sent in On 1st round, defect. If the opponent retaliated, then play
approaches for playing the game. TIT-FOR-TAT. Otherwise intersperse cooperation & defection.
• JOSS:
As TIT-FOR-TAT, except periodically defect.

49 / 53 50 / 53

Axelrod’s Tournament Recipes for Success

• Surprisingly TIT-FOR-TAT for won. Axelrod suggests the following rules for succeeding in his tournament:
• But don’t read too much into this. • Don’t be envious:
• Turns out that TIT-FOR-TWO-TATS would have done better. Don’t play as if it were zero sum!
• In scenarios like the IPD tournament, the best approach depends • Be nice:
heavily on what the full set of approaches is. Start by cooperating, and reciprocate cooperation.
• TIT-FOR-TAT did well because there were other players it could • Retaliate appropriately:
co-operate with. Always punish defection immediately, but use “measured” force —
• In scenarios with different strategy mixes it would not win. don’t overdo it.
• Suggests that there is some value in cooperating, at least some of the • Don’t hold grudges:
time. Always reciprocate cooperation immediately.

51 / 53 52 / 53
Summary

• Have looked a bit further at game theory and what it can do for us.
• Lots more we haven’t covered...
• Game theory helps us to get a handle on some of the aspects of
cooperation between self-interested agents.
• Rarely any definitive answers.
• Given human interactions, that should not surprise us.

53 / 53

You might also like