Today
Robotics and Autonomous Systems                        • In the last lecture we started to look at competition between agents
                     Lecture 27: More on self-interest
                              Richard Williams
                          Department of Computer Science
                              University of Liverpool
                                                                     • Today we look more into this.
                                                           1 / 53                                                                             2 / 53
Competition between agents                                          Game theory?
                                                                     • Game theory is a framework for analysing interactions between a set
                                                                       of agents.
                                                                     • Abstract specification of interactions.
                                                                     • Describes each agent’s preferences in terms of their utility.
                                                                            • Assume agents want to maximise utility.
                                                                     • Give us a range of solution strategies with which we can make some
                                                                       predictions about how agents will/should interact.
  • Situation is more like this.
                                                           3 / 53                                                                             4 / 53
Congestion Game                                                                   Congestion Game
                                                                                   • Capture this as:
 • Agents using TCP to communicate.
                                                                                                                          i
     • If packets collide, should back-off.
                                                                                                                       defect    correct
 • Works if everyone does this.                                                                              defect         -3        -4
 • But what if agents could choose a defective implementation that                                       j             -3        0
   doesn’t back-off?                                                                                         correct        0         -1
     • In a collision, their message would get sent quicker.                                                           -4        -1
 • But what if everyone did this?
     • Outcome depends on what other agents do.                                    • Agent i is the column player.
                                                                                   • Agent j is the row player.
                                                                         5 / 53                                                                    6 / 53
Congestion Game                                                                   Congestion Game
 • Two obvious questions we can ask in this scenario:                              • What should an individual agent do?
     • What should an individual agent do?                                             • Depends on what the other agent does.
     • How does the game get played — how do both agents together act?             • How does the game get played — how do both agents together act?
 • Game theory offers some ideas about how to answer these questions.                  • Equilibrium.
                                                                         7 / 53                                                                    8 / 53
Congestion Game                                                          Normal form games
 • As with all good games, the congestion game captures some
   underlying truths about the world at an abstract level:
                                                                           • An n-person, finite, normal form game is a tuple pN , A , uq, where
                                                                               • N is a finite set of players.
                                                                               • A “ A1 ˆ . . . ˆ An where Ai is a finite set of actions available to i.
                                                                                 Each a “ pa1 , . . . , an q P A is an action profile.
                                                                               • u “ pu1 , . . . , un q where ui : A ÞÑ < is a real-valued utility function for i.
                                                                           • Naturally represented by an n-dimensional matrix
 • (Though you might want to alter the payoffs somewhat.)
                                                                9 / 53                                                                                          10 / 53
Normal form games                                                        Strategies
                                                                           • We analyze games in terms of strategies, that is what agents decide
                                                                             to do.
                                                                                • Combined with what the other agent(s) do(es) this jointly determines
                                                                                  the payoff.
                                                                           • An agent’s strategy set is its set of available choices.
                                                                           • Can just be the set of actions — pure strategies.
                                                                           • In the congestion game, the set of pure strategies is:
                                                                                  tcorrect , defect u
                                                                           • We need more than just pure strategies in many cases.
                                                                                • Will discuss this later
                                                               11 / 53                                                                                          12 / 53
Payoff matrix                                                                        Common payoff games
  • Here is the payoff matrix from the “choose which side” (of the road)              • “Choose which side” game
    game:                                                                                                                  left    right
                                            i                                                                     left         1       0
                                          left     right                                                                  1        0
                                left          1        0                                                          right        0       1
                          j              1         0                                                                      0        1
                                right         0        1                                Also called the coordination game
                                         0         1                                  • Any game with ui pa q “ uj pa q for all a P Ai ˆ Aj is a common payoff
  • We can classify games by the form of the payoff matrix.                             game.
                                                                           13 / 53                                                                               14 / 53
Common payoff games                                                                  Common payoff games
                                                                                      • In between is the El Farol bar problem:
  • The misanthropes’ (un)coordination game:
                                         left     right
                              left           0        1
                                        0         1
                              right          1        0
                                        1         0
    Here we try to avoid each other.
                                                                                      • If everyone goes to the bar it is no fun, but if only some people go
                                                                                        then everyone who goes has a good time.
                                                                                        Should you go or not?
                                                                           15 / 53                                                                               16 / 53
Constant sum games                                                                  Zero-sum games
 • Matching pennies
                                    heads       tails
                          heads        -1           1                                • A particular category of constant sum games are zero-sum games.
                                    1         -1                                     • Where utilities sum to zero:
                          tails        1            -1
                                    -1        1                                                        u1 pai q ` uj pωq “ 0 for all a P Ai ˆ Aj
 • Any game with ui pa q ` uj pa q “ c for all a P Ai ˆ Aj is a constant sum
   game.
                                                                          17 / 53                                                                        18 / 53
Zero-sum games                                                                      Zero-sum games
 • Where preferences of agents are diametrically opposed we have                     • Rock, paper, scissors:
   strictly competitive scenarios.
 • Zero sum implies strictly competitive.
 • Zero sum encounters in real life are very rare . . . but people tend to
   act in many scenarios as if they were zero sum.
 • Most games have some room in the set of outcomes for agents to find                 is another constant/zero sum game.
   (somewhat) mutually beneficial outcomes.                                          • Game in two senses.
                                                                          19 / 53                                                                        20 / 53
The rules                                                                     Rock, paper, scissors
  • Rules for “rock, paper, scissors”.
                                                                                • How would you play?
                                                                                                                         i
                                                                                                                 rock         paper   scissors
                                                                                                     rock           0            1         -1
                                                                                                                0            -1       1
                                                                                                j    paper          -1           0         1
                                                                                                                1            0        -1
                                                                                                     scissors       1            -1        0
                                                                                                                -1           1        0
                                                                    21 / 53                                                                                 22 / 53
Mixed strategy                                                                Mixed strategy
                                                                                • A mixed strategy is just a probability distribution across a set of pure
                                                                                  strategies.
  • Chances are you would play a mixed strategy.                                • More formally, for a game with two actions a1 and a2 , i picks a vector
  • You would:                                                                    of probabilities:
      • sometimes play rock,
                                                                                                               x “ px1 , x2 q
      • sometimes play paper; and                                                 where
      • sometimes play scissors.
                                                                                                                  ÿ
                                                                                                                       xk “ 1
  • A fixed/pure strategy is easy for an adaptive player to beat.                                                  k
                                                                                  and
                                                                                                                    xk ě 0
                                                                                • i then picks action a1 with probability x1 and a2 with probability a2 .
                                                                    23 / 53                                                                                 24 / 53
Mixed strategy                                                                           Mixed strategy
  • To determine the mixed strategy, i needs then to compute the best                      • Let’s consider the payoff matrix:
    values of x1 and x2 .                                                                                                            i
  • These will be the values which give i the highest payoff given the                                                             a1          a2
    options that j can choose and the joint payoffs that result.                                                        a1           -3             1
  • In the next lecture we will get into the computation of expected values,                                       j           3          -1
    which is one way to analyse this.                                                                                   a2           0              -1
  • But for now we will look at a simple graphical method                                                                      0          1
       • Only works for very simple cases.                                                   and compute mixed strategies to be used by the players.
                                                                               25 / 53                                                                    26 / 53
Mixed strategy                                                                           Mixed strategy
  • i’s analysis of this game would be something like this:
     1         j picks first row                                 1
     0                                  j picks second row       0                         • j can analyse the problem in terms of a probability vector
    −1                                                       −1
                                                                                                                             y “ py1 , y2 q
    −2                                                       −2
    −3                                                           −3                          and come up with a similar picture.
                      c1= 0.4
         0                         c1                        1
         1                         c2                        0
  • i picks the probability of a1 so that it is indifferent to what j picks.
                                                                               27 / 53                                                                    28 / 53
Mixed strategy                                                               Mixed strategy
                                                                               • This analysis will help i and j choose a mixed strategy for the specific
  • j’s analysis would be something like this:
                                                                                 case in which the payoffs to the two agents are equal and opposite for
     3                                                        3                  each outcome.
                           i picks first column
     2                                                        2
     1                                                        1
                                  i picks second column
     0                                                        0
                    r1 = 0.2
    −1                                                        −1
         0                        r1                      1
         1                        r2                      0
                                                                               • Application to zero sum games is due to von Neumann.
                                                                   29 / 53                                                                            30 / 53
General sum games                                                            Nash equilibrium
  • Battle of the Outmoded Gender Stereotypes
      • aka Battle of the Sexes
                         this      that                                        • Last time we introduced the notion of Nash equilibrium as a solution
             this            1        0                                          concept for general sum games.
                        2         0                                            • (We didn’t describe it in exactly those terms.)
             that            0        2                                        • Looked at pure strategy Nash equilibrium.
                        0         1
                                                                               • Issue was that not every game has a pure strategy Nash equilibrium.
  • Game contains elements of cooperation and competition.
  • The interplay between these is what makes general sum games
    interesting.
                                                                   31 / 53                                                                            32 / 53
Nash equilibrium                                                     Nash equilibrium
  • For example:
                                           i
                                       D           C                   • The notion of Nash equilibrium extends to mixed strategies.
                               D           2           1
                                                                       • And every game has at least one mixed strategy Nash equilibrium.
                         j         1           2
                               C           0           1
                                   2           1
  • Has no pure strategy NE.
                                                           33 / 53                                                                               34 / 53
Nash equilibrium                                                     Nash equilibrium
                                                                       • For a game with payoff matrices A (to i) and B (to j), a mixed strategy
                                                                         px ˚ , y ˚ q is a Nash equilibrium solution if:
                                                                                                 @x , x ˚ Ay ˚T   ě xAy ˚T
                                                                                                 @y , x ˚ By ˚T   ě xBy ˚T
                                                                       • In other words, x ˚ gives a higher expected value to i than any other
                                                                         strategy when j plays y ˚ .
                                                                       • Similarly, y ˚ gives a higher expected value to j than any other strategy
                                                                         when i plays x ˚ .
                                                           35 / 53                                                                               36 / 53
Nash equilibrium                                                                   The Prisoner’s Dilemma
  • Unfortunately, this doesn’t solve the problem of which Nash
    equilibrium you should play.
                                                                         37 / 53                                                         38 / 53
The Prisoner’s Dilemma                                                             The Prisoner’s Dilemma
    Two men are collectively charged with a crime and held in                        • Payoff matrix for prisoner’s dilemma:
    separate cells, with no way of meeting or communicating.
                                                                                                                           i
    They are told that:
                                                                                                                         defect   coop
      • if one confesses and the other does not, the confessor will                                             defect       2       1
        be freed, and the other will be jailed for three years;                                             j            2        4
      • if both confess, then each will be jailed for two years.                                                coop         4       3
    Both prisoners know that if neither confesses, then they will each                                                   1        3
    be jailed for one year.                                                          • What should each agent do?
                                                                         39 / 53                                                         40 / 53
What Should You Do?                                                                What Did You Do?
  • The individually rational action is defect.
    This guarantees a payoff of no worse than 2, whereas cooperating
    guarantees a payoff of at most 1.                                               • The Prisoner’s Dilemma is the same game as the “grade game”.
                                                                                      Just has a different back story.
  • So defection is the best response to all possible strategies: both
    agents defect, and get payoff = 2.                                              • When you played that,
                                                                                      18 of you chose “defect”.
  • But intuition says this is not the best outcome:
                                                                                      6 of you chose “cooperate”.
    Surely they should both cooperate and each get payoff of 3!
  • This is why the PD game is interesting — the analysis seems to give
    us a paradoxical answer.
                                                                         41 / 53                                                                            42 / 53
Solution Concepts                                                                  The Paradox
                                                                                    • This apparent paradox is the fundamental problem of multi-agent
  • Payoff matrix for prisoner’s dilemma:                                             interactions.
                                                                                      It appears to imply that cooperation will not occur in societies of
                                           i
                                                                                      self-interested agents.
                                         defect   coop
                               defect        2       1                              • Real world examples:
                          j              2        4                                      •   nuclear arms reduction/proliferation
                               coop          4       3                                   •   free rider systems — public transport, file sharing;
                                         1        3                                      •   in the UK — television licenses.
                                                                                         •   climate change — to reduce or not reduce emissions
  • There is no dominant strategy (in the game theory sense).                            •   doping in sport
  • pD , D q is the only Nash equilibrium.                                               •   resource depletion
  • All outcomes except pD , D q are Pareto optimal.                                • The prisoner’s dilemma is ubiquitous.
  • pC , C q maximises social welfare.                                              • Can we recover cooperation?
                                                                         43 / 53                                                                            44 / 53
The Shadow of the Future                                                               Backwards Induction
  • Play the game more than once.
    If you know you will be meeting your opponent again, then the                        • Suppose you both know that you will play the game exactly n times.
    incentive to defect appears to evaporate.                                              On round n ´ 1, you have an incentive to defect, to gain that extra bit
       • If you defect, you can be punished (compared to the co-operation                  of payoff.
         reward.)                                                                          But this makes round n ´ 2 the last “real”, and so you have an
       • If you get suckered, then what you lose can be amortised over the rest            incentive to defect there, too.
         of the iterations, making it a small loss.
                                                                                           This is the backwards induction problem.
  • Cooperation is (provably) the rational choice in the infinitely repeated             • Playing the prisoner’s dilemma with a fixed, finite, pre-determined,
    prisoner’s dilemma.                                                                    commonly known number of rounds, defection is the best strategy.
    (Hurrah!)
  • But what if there are a finite number of repetitions?
                                                                             45 / 53                                                                              46 / 53
Agh!                                                                                   Axelrod’s Tournament
  • That seems to suggest that you should never cooperate.
  • So how does cooperation arise? Why does it make sense?                               • Suppose you play iterated prisoner’s dilemma (IPD) against a range
  • After all, there does seem to be such a thing as society, and even in a                of opponents.
    big city like New York, people don’t behave so badly.                                • What approach should you choose, so as to maximise your overall
    Or, maybe more accurately, they don’t behave badly all the time.                       payoff?
  • Turns out that:
                                                                                         • Is it better to defect, and hope to find suckers to rip-off?
       • As long as you have some probability of repeating the interaction,
         co-operation can have a better expected payoff.                                 • Or is it better to cooperate, and try to find other friendly folk to
       • As long as there are enough co-operative folk out there, you can come             cooperate with?
         out ahead by co-operating.
  • But is always co-operating the best approach?
                                                                             47 / 53                                                                              48 / 53
Axelrod’s Tournament                                                              Example Strategies in Axelrod’s Tournament
                                                                                    • ALLD:
  • Robert Axelrod (1984) investi-
                                                                                      “Always defect” — the hawk strategy;
    gated this problem.
                                                                                    • TIT-FOR-TAT:
  • He ran a computer tournament for
    programs playing the iterated pris-                                                  1   On round u “ 0, cooperate.
                                                                                         2   On round u ą 0, do what your opponent did on round u ´ 1.
    oner’s dilemma.
  • Axelrod hosted the tournament                                                   • TESTER:
    and various researchers sent in                                                   On 1st round, defect. If the opponent retaliated, then play
    approaches for playing the game.                                                  TIT-FOR-TAT. Otherwise intersperse cooperation & defection.
                                                                                    • JOSS:
                                                                                      As TIT-FOR-TAT, except periodically defect.
                                                                        49 / 53                                                                            50 / 53
Axelrod’s Tournament                                                              Recipes for Success
  • Surprisingly TIT-FOR-TAT for won.                                             Axelrod suggests the following rules for succeeding in his tournament:
  • But don’t read too much into this.                                              • Don’t be envious:
      • Turns out that TIT-FOR-TWO-TATS would have done better.                       Don’t play as if it were zero sum!
  • In scenarios like the IPD tournament, the best approach depends                 • Be nice:
    heavily on what the full set of approaches is.                                    Start by cooperating, and reciprocate cooperation.
  • TIT-FOR-TAT did well because there were other players it could                  • Retaliate appropriately:
    co-operate with.                                                                  Always punish defection immediately, but use “measured” force —
       • In scenarios with different strategy mixes it would not win.                 don’t overdo it.
  • Suggests that there is some value in cooperating, at least some of the          • Don’t hold grudges:
    time.                                                                             Always reciprocate cooperation immediately.
                                                                        51 / 53                                                                            52 / 53
Summary
 • Have looked a bit further at game theory and what it can do for us.
 • Lots more we haven’t covered...
 • Game theory helps us to get a handle on some of the aspects of
   cooperation between self-interested agents.
 • Rarely any definitive answers.
 • Given human interactions, that should not surprise us.
                                                                         53 / 53