EC 005 Part II: Dynamic Games of Complete
Information
              Soumendu Sarkar
           soumendu@econdse.org
              February 18, 2025
Syllabus: PART B
    ▶ Sequential games preliminaries (Tadelis Chapter 7) [2 lectures]
    ▶ Nash Equilibrium of dynamic games of complete information
      (Tadelis Chapter 7) [1 lecture]
    ▶ Sequential rationality and subgame perfection (Tadelis
      Chapter 8) [1 lecture]
    ▶ One stage deviation and backward induction algorithm
      (Tadelis Chapter 8,9) [2 lectures]
    ▶ How good are SPNE predictions? [1 lecture]
    ▶ Repeated games (Tadelis Chapter 10) [3 lectures]
    ▶ Bargaining (Tadelis, Chapter 11) [4 lectures]
Logistics
    ▶ Classes:
        ▶ Mondays 1015-1130 (B); 1415-1530 (A)
        ▶ Wednesdays 1130-1245 (B); 1530-1645 (A)
        ▶ Fridays 0900-1015 (B); 1300-1415 (A)
    ▶ Structure: 5 minutes recap; 1 hour primary lecture with only
      clarificatory questions allowed; 10 minutes sum-up/substantial
      Q&A
    ▶ Attendance in classes and tutorials crucial
    ▶ Second midsem: sometime in late March/early April
    ▶ Office hours (Room 120): Mondays 1245-1345 (A); Fridays
      1200-1300 (B)
Introduction: recap
    ▶ Games in strategic/normal form
    ▶ Examples of static/One-shot simultaneous-move games
      (complete information): BoS, PD, Meeting point, Matching
      Pennies, Stag hunt
    ▶ Nash equilibrium in pure and mixed strategies (actions?)
    ▶ Dominance, iterated elimination of strictly dominated
      strategies
    ▶ Rationality and common knowledge
Introduction
    ▶ Dynamic games: multiple time periods
    ▶ Sequence of moves rather than simultaneous moves
    ▶ Each component of sequence corresponds to a period/interval
      of time
    ▶ Different sets of players may get to make decisions at different
      points of time
    ▶ Set of actions a player may choose from may differ over time
Introduction contd.
    ▶ Now strategies are not the same as actions: a strategy is a plan
      of actions, one for each period in which a player makes a decision
    ▶ Dynamic games of complete information: history of past
      actions is observed by players when making decision in current
      period
    ▶ Payoffs may be realized in multiple periods
    ▶ Discounting of payoffs may be essential to allow for
      intertemporal comparison of payoffs
Introduction contd.
    ▶ The structure of dynamic games adds new layers to
      interpretation of rationality
    ▶ Players are supposed to be sequentially rational, assuming
      that opponents are sequentially rational as well
    ▶ Some Nash equilibria no longer remain reasonable, new ideas
      of perfection or refinements are introduced
Our tasks
    ▶ Writing a dynamic game: the extensive form
    ▶ Strategy in dynamic games: relating the extensive form to the
      normal form
    ▶ Sequential rationality and equilibrium in dynamic games of
      complete information:
        ▶   Nash
        ▶   Backward induction
        ▶   Subgame perfection
        ▶   One stage deviation principle
    ▶ Simple applications; repeated games; bargaining
Example 1: a model of entry
    ▶ A firm called Entrant is facing the choice of entering a market
    ▶ If it enters the market, another firm, called Incumbent,
      observes its entry and chooses to either accommodate or fight
    ▶ Accommodate means the incumbent maintaining a moderately
      high price, allowing the entrant to earn positive profit
    ▶ Fight means the incumbent slashing prices to make the entrant
      suffer losses
Example 2: Stackelberg Leadership
    ▶ A simple dynamic version of Cournot duopoly
    ▶ Two firms L and F with identical marginal cost c
    ▶ Inverse demand function P = 1 − Q where Q is total output
    ▶ In period 1 firm L chooses output level qL
    ▶ In period 2 firm F observes qL and chooses qF
    ▶ Both firms realize profits at the end of period 2
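The backward-induction logic of this example can be sketched in a few lines of Python (the numeric value of c below is an illustrative assumption, not from the slides):

```python
# Sketch: solving the Stackelberg game of Example 2 by backward induction,
# with inverse demand P = 1 - Q and common marginal cost c (here c = 0.1,
# an assumed value).
c = 0.1

# Period 2: the follower's best response to any leader output qL
# maximizes qF * (1 - qL - qF - c), giving qF = (1 - qL - c) / 2.
def follower_br(qL):
    return max(0.0, (1 - qL - c) / 2)

# Period 1: the leader maximizes qL * (1 - qL - follower_br(qL) - c).
# Substituting the best response gives profit qL * (1 - qL - c) / 2,
# which is maximized at qL = (1 - c) / 2.
qL = (1 - c) / 2
qF = follower_br(qL)
print(round(qL, 6), round(qF, 6))  # 0.45 0.225
```

Note the leader produces more than the follower: observing qL first lets firm F only react, which is the leader's first-mover advantage.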
Example 3: investment followed by quantity competition
    ▶ Two firms with known identical marginal costs
    ▶ In period 1 Firm 1 faces a choice between making a fixed
      cost-reducing investment and not making an investment
    ▶ After this choice has been made, firms simultaneously
      compete in quantities in period 2
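A minimal numerical sketch of this two-stage game, under assumed numbers (linear demand P = 1 - Q; the cost levels and the fixed cost F are illustrative assumptions, not from the slides):

```python
# Sketch of Example 3: firm 1 decides in period 1 whether to pay a fixed
# cost F that lowers its marginal cost from c to c_low; Cournot competition
# follows in period 2. All numbers are assumed for illustration.
def cournot_profits(c1, c2):
    # Standard Cournot equilibrium with inverse demand P = 1 - Q:
    # q_i = (1 - 2*c_i + c_j) / 3, and equilibrium profit_i = q_i**2
    # (since the FOC implies 1 - Q - c_i = q_i at equilibrium).
    q1 = (1 - 2 * c1 + c2) / 3
    q2 = (1 - 2 * c2 + c1) / 3
    return q1 ** 2, q2 ** 2

F, c, c_low = 0.02, 0.3, 0.1
pi_no, _ = cournot_profits(c, c)        # firm 1's period-2 profit, no investment
pi_inv, _ = cournot_profits(c_low, c)   # firm 1's period-2 profit after investing
invest = pi_inv - F > pi_no             # firm 1's period-1 decision
print(invest)
```

With these numbers the cost reduction raises firm 1's Cournot profit by more than F, so investing is sequentially rational.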
Example 4: Bargaining
    ▶ Two players are bargaining over the share of a rupee
    ▶ If player 1 makes an offer in period 1 player 2 can accept or
      reject
    ▶ If player 2 accepts then the game ends with each player
      getting the agreed shares
    ▶ If player 2 rejects, then he makes an offer in period 2 that player
      1 can accept or reject
    ▶ If player 1 accepts the game ends with each player getting
      agreed shares, else the game proceeds to period 3 where 1
      makes a new offer...
Example 5: Repeated Prisoner’s Dilemma
    ▶ Two players are playing a Prisoner’s Dilemma game
      repeatedly, choosing between cooperating and
      non-cooperation
    ▶ Cooperation is better than non-cooperation, but
      non-cooperation is better when the opponent is cooperating
    ▶ Payoffs are realized at the end of each period
    ▶ Variants: finite vs infinite repetitions
Example 6: Repeated Cournot
    ▶ Firms 1 and 2 with identical marginal costs c
    ▶ They face the same market in every period repeatedly
    ▶ Inverse demand function P = 1 − Q where Q is total output
    ▶ Both firms realize profits at the end of each period according
      to their profit functions
    ▶ Variants: finite vs infinite repetitions
Example 7: job market signalling
    ▶ A firm may be uncertain about the productivity of its
      prospective employee, which determines its profits
    ▶ A prospective employee can be of high productivity or low
      productivity
    ▶ He can acquire costly education to signal his productivity
    ▶ Education can be of two qualities– high or low
    ▶ Once an employee chooses an education level, the firm
      observes it and decides on a wage level
    ▶ The game ends with the firm earning profits net of wages, and
      the employee earning wages net of cost of education acquired
Extensive form representation
    ▶ A game in extensive form is represented as
                         ⟨I , χ, A, p, α, H, H, ι, ρ, u⟩           (1)
    ▶ Alternatively, graphically, in the form of a game tree, provided
      the sets of actions available to a player are finite
Note aside: graph theory
    ▶ A graph is a collection of two sets, V , a set of vertices, and
      E , a set of edges
    ▶ The set of edges E is essentially a subset of V × V , the
      Cartesian product of V with itself
    ▶ If a, b ∈ V and (a, b) ∈ E , then it means that a and b are
      directly connected
    ▶ A path is a sequence of directly connected nodes
    ▶ A graph is connected if there exists a path between every pair
      of nodes
    ▶ A graph is complete if every pair of nodes is directly connected
    ▶ Every path beginning and ending at the same node is called a
      cycle
    ▶ Every graph containing a cycle is cyclical, every graph which is
      not cyclical is acyclical
    ▶ A connected acyclical graph is a tree
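The definitions above can be checked mechanically; a small sketch (using the fact that a connected graph on n vertices is acyclic exactly when it has n - 1 edges):

```python
# Sketch: testing whether a graph (V, E) is a tree, i.e., connected
# and acyclical, per the definitions above.
def is_tree(vertices, edges):
    if len(edges) != len(vertices) - 1:
        return False          # wrong edge count: must be cyclic or disconnected
    # search from an arbitrary vertex to test connectedness
    adj = {v: [] for v in vertices}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, frontier = {vertices[0]}, [vertices[0]]
    while frontier:
        v = frontier.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return len(seen) == len(vertices)

print(is_tree([1, 2, 3, 4], [(1, 2), (1, 3), (3, 4)]))  # True: a tree
print(is_tree([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))     # False: a cycle
```

Game trees in the next slides are exactly such objects, with a distinguished initial node.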
Game Tree of Example 1
    [Game tree: E chooses not enter, ending at payoff (0, 2), or enter;
    I then chooses accommodate, ending at (1, 1), or fight, ending at
    (-1, -1).]
▶ Two of the five nodes are assigned to players E and I
  respectively
▶ Each of these nodes is connected to two more nodes via
  edges
▶ Each of these edges is assigned an action available to the
  player
▶ Remaining nodes are not connected to any nodes further
  down, but are assigned payoff numbers
▶ The first set of nodes are decision nodes– as the player
  assigned to such a node must make a decision that leads to
  either a payoff or another decision node
▶ The second set of nodes are called terminal nodes
Game Tree of Example 5
    [Game tree: the twice-repeated Prisoner's Dilemma. Each player
    chooses C or N in each period; ellipses group the decision nodes
    of a player that lie in the same information set, capturing the
    simultaneity of moves within a period.]
▶ In this diagram, apart from decision nodes, terminal nodes,
  and edges representing actions, we see ellipses containing two
  decision nodes assigned to a player
▶ The choices available to a player at each of the decision nodes
  inside an ellipse are the same
▶ Such ellipses represent information sets of players
▶ If there are two or more nodes in an information set, the fact
  that the same set of actions is available at each of these
  nodes denotes that the corresponding player is unable to
  distinguish between these nodes given the information about
  the sequence of actions observed so far
▶ Every decision node not enclosed in an ellipse is a singleton
  information set — depicting that the corresponding player
  perfectly knows the sequence of previous play
Game tree of Example 7
    [Game tree: nature draws the employee's type, tH with probability
    q and tL with probability 1 − q; the employee chooses education
    eH or eL; the firm F, observing the education level but not the
    type, chooses a wage wH or wL.]
Game tree = extensive form
    ▶ I : (finite) set of players which may include nature
    ▶ χ: (finite) set of nodes
    ▶ A: (finite) set of all possible actions in the game
    ▶ p : χ \ {x0 } → χ assigns a predecessor node to every node
      except the initial node x0
    ▶ If x = p(y ), then y = s(x), a successor of x
    ▶ T : the set of terminal nodes, i.e., nodes without a successor
    ▶ α : χ \ {x0 } → A assigns to every non-initial node the action
      that links it to its predecessor; if p(x) = p(x ′ ) and x ̸= x ′ ,
      then α(x) ̸= α(x ′ )
Extensive form contd.
    ▶ H: collection of information sets
    ▶ H : χ \ T → H: assignment of non-terminal nodes to
      information sets such that if c(x) is the set of actions leading
      from node x to its successor, then c(x) = c(x ′ ) for all
      x ′ ∈ H(x)
    ▶ ι : H → I : assignment of information sets to players
    ▶ ρ: assignment of probabilities to nature’s actions
    ▶ u : T → R^I : assignment of payoff vectors to terminal nodes
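The tuple above can be encoded directly as a data structure; a minimal sketch for the entry game of Example 1 (the node labels x0, x1, t0, t1, t2 are our own):

```python
# Sketch: the components of the extensive-form tuple for Example 1,
# encoded as plain dictionaries. Node labels are our own.
game = {
    "players": ["E", "I"],                        # I, the player set
    "nodes": ["x0", "x1", "t0", "t1", "t2"],      # chi, the node set
    "predecessor": {"x1": "x0", "t0": "x0",       # p(.)
                    "t1": "x1", "t2": "x1"},
    "action": {"x1": "enter", "t0": "not enter",  # alpha(.)
               "t1": "accommodate", "t2": "fight"},
    "info_set": {"x0": "H_E", "x1": "H_I"},       # H(.), all singletons here
    "assigned": {"H_E": "E", "H_I": "I"},         # iota(.)
    "payoff": {"t0": (0, 2), "t1": (1, 1), "t2": (-1, -1)},  # u(.)
}

# The terminal set T consists of the nodes that are never a predecessor:
T = [x for x in game["nodes"] if x not in game["predecessor"].values()]
print(sorted(T))  # ['t0', 't1', 't2']
```

Nature and its move probabilities rho are absent here because Example 1 has no chance moves.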
Information sets
    ▶ Information sets are the most important components of an
      extensive form game
    ▶ These are collections of decision nodes such that
        ▶ the same set of actions is available at every node in an
          information set (requirement)
        ▶ perfect recall (assumption):
             ▶ no node in an information set is a predecessor or successor of
               another node in the set
             ▶ Consider a pair of nodes x, x ′ in an information set of a player
               i, Hi (x). If a is a past action taken by the player at decision
               node x ′′ on the path to x, then there must be x ′′′ ∈ Hi (x ′′ )
               such that a is on the path from x ′′′ to x ′ .
Perfect vs imperfect information
    ▶ A game in extensive form having only singleton information
      sets is a game of perfect information
    ▶ Examples 1, 2, and 4 are games of perfect information
    ▶ A game having at least one non-singleton information set is a
      game of imperfect information
    ▶ Examples 3, 5, and 6 are games of imperfect information
    ▶ Every one-shot/static/simultaneous move game is a game of
      imperfect information, e.g., the one-shot prisoner’s dilemma
Game tree for one-shot Prisoner’s dilemma
    [Game tree: player 1 chooses C or N; player 2 then chooses C or
    N without observing player 1's choice; player 2's two decision
    nodes lie in a single information set.]
Complete vs Incomplete Information
    ▶ A game of incomplete information involves payoff uncertainty
      for players
    ▶ These are usually modelled by nature selecting a state of the
      world using a probability distribution, e.g., one shot auction
      games
    ▶ A game of complete information may be a game of imperfect
      information (e.g., the one-shot Prisoner’s dilemma)
    ▶ A game of incomplete information is always a game of
      imperfect information
A static game of incomplete information in extensive form
    [Game tree: nature N draws type t1 with probability p and t2 with
    probability 1 − p; player 1 chooses C or N; player 2 then chooses
    C or N without observing nature's draw, so all of player 2's
    decision nodes lie in one information set.]
Strategies in extensive form
    ▶ Recall that a strategy of a player is a complete contingent
      action plan
    ▶ In extensive form, the information sets of a player represent
      these contingencies where he must take a decision
    ▶ Consequently, a (pure) strategy for a player in an extensive
      form game, denoted si , maps his information sets to the set of
      actions that are available to him in these sets:
                        si : Hi → A,    si (Hi ) ∈ c(Hi ).
    ▶ The collection of all possible strategies of a player i is referred
      to as his strategy set, denoted Si
Mixed and Behavioral Strategies
    ▶ A mixed strategy is a probability distribution over pure
      strategies
    ▶ A behavioral strategy specifies probability distributions over
      feasible actions for each information set
    ▶ A mixed strategy can induce a behavioral strategy and
      vice-versa, if the game is one of perfect recall
Equivalence of normal and extensive forms
    ▶ Every tuple of strategies in an extensive form game induces a
      path of play originating at the initial node and ending at a
      terminal node
    ▶ Therefore, we can associate the payoff vector assigned to this
      terminal node to the tuple of strategies and present the game
      in the familiar normal/matrix form
    ▶ As long as each strategy in normal form clearly assigns an
      action to each contingency, we can convert it into an
      extensive form
Normal form of Example 1
                           Player I
                          accommodate    fight
   Player E     enter          1, 1     -1, -1
              not enter        0, 2       0, 2
Normal form of Example 5
    ▶ Both players have 5 information sets
    ▶ Both players have choice of two actions in each of their
      information sets
    ▶ Consequently, there are 2^5 = 32 strategies for each player in
      this game
Nash equilibrium
    ▶ The idea of Nash equilibrium generalizes easily from static to
      dynamic games after acknowledging the general definition of
      strategies
    ▶ A Nash equilibrium for an extensive form game is a collection
      of strategies s∗ = (s1∗, . . . , sI∗) such that for all i ∈ I and
      all si ∈ Si,
                          ui(s∗) ≥ ui(si, s−i∗)
    ▶ Nash equilibria of example 1 are displayed in red
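Since the entry game's normal form is small, both pure-strategy Nash equilibria can be found by brute-force best-response checks; a minimal sketch:

```python
# Sketch: finding the pure-strategy Nash equilibria of Example 1's
# normal form by checking the no-profitable-deviation condition.
payoffs = {  # (E's strategy, I's strategy) -> (u_E, u_I)
    ("enter", "accommodate"): (1, 1),
    ("enter", "fight"): (-1, -1),
    ("not enter", "accommodate"): (0, 2),
    ("not enter", "fight"): (0, 2),
}
E_strats = ["enter", "not enter"]
I_strats = ["accommodate", "fight"]

def is_nash(sE, sI):
    uE, uI = payoffs[(sE, sI)]
    no_dev_E = all(payoffs[(s, sI)][0] <= uE for s in E_strats)
    no_dev_I = all(payoffs[(sE, s)][1] <= uI for s in I_strats)
    return no_dev_E and no_dev_I

equilibria = [(sE, sI) for sE in E_strats for sI in I_strats
              if is_nash(sE, sI)]
print(equilibria)  # [('enter', 'accommodate'), ('not enter', 'fight')]
```

Both profiles survive the Nash test, even though (not enter, fight) will turn out to be unreasonable once sequential rationality is imposed.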
Exercise 7.1
    ▶ Imagine an extensive-form game in which player i has K
      information sets.
        1. If the player has an identical number of m possible actions in
           each information set, how many pure strategies does he have?
        2. If the player has mk actions in information set
           k ∈ {1, 2, . . . , K }, how many pure strategies does the player
           have?
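The counting logic behind this exercise: a pure strategy picks one action per information set, so the number of strategies is a product over information sets. A small sketch:

```python
# Sketch of the counting in Exercise 7.1: a pure strategy assigns one
# action to each of a player's information sets, so the total count is
# the product of the action counts m_k across information sets.
from math import prod

def n_pure_strategies(actions_per_set):
    # actions_per_set[k] = number of actions m_k at information set k
    return prod(actions_per_set)

print(n_pure_strategies([2] * 5))    # 2^5 = 32, as in Example 5
print(n_pure_strategies([3, 2, 4]))  # 24, with unequal action counts
```

With identical counts m at K information sets this reduces to m^K.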
Exercise 7.2
    ▶ Consider a two-player game in which player 1 can choose A or
      B. The game ends if he chooses A while it continues to player
      2 if he chooses B. Player 2 can then choose C or D, with the
      game ending after C and continuing again with player 1 after
      D. Player 1 can then choose E or F , and the game ends after
      each of these choices.
        1. Model this as an extensive-form game tree. Is it a game of
           perfect or imperfect information?
        2. How many terminal nodes does the game have? How many
           information sets?
        3. How many pure strategies does each player have?
        4. Imagine that the payoffs following choice A by player 1 are (2,
           0), those following C by player 2 are (3, 1), those following E
           by player 1 are (0, 0), and those following F by player 1 are (1,
           2). What are the Nash equilibria of this game? Does one strike
           you as more appealing than the other? If so, explain why.
Exercise 7.2
    [Game tree: player 1 chooses A, ending with (2, 0), or B; player 2
    then chooses C, ending with (3, 1), or D; player 1 then chooses E,
    ending with (0, 0), or F, ending with (1, 2).]
Exercise 7.2
                            Player 2
                               C     D
               Player 1   AE 2, 0 2, 0
                          AF 2, 0 2,0
                          BE 3, 1 0, 0
                          BF 3, 1 1, 2
Exercise 7.3 (Tic-tac-toe)
    ▶ The extensive-form representation of a game can be
      cumbersome even for very simple games. Consider the game of
      tic-tac-toe, in which two players mark X or O in a 3x3 matrix.
      Player 1 moves first, then player 2, and so on. If a player gets
      three of his kind in a row, column, or one of the diagonals
      then he wins, and otherwise it is a tie. For this question
      assume that even after a winner is declared, the players must
      completely fill the matrix before the game ends.
         1. Is this a game of perfect or imperfect information? Why?
         2. How many information sets does player 2 have after player 1’s
            first move?
         3. How many information sets does player 1 have after each of
            player 2’s moves for the first time?
          4. How many information sets does each player have in total?
         5. How many terminal nodes does the game have?
Exercise 7.4 (Centipede)
    ▶ Imagine a two-player game that proceeds as follows. A pot of
      money is created with $6 in it initially. Player 1 moves first,
      then player 2, then player 1 again, and finally player 2 again.
      At each player’s turn to move, he has two possible actions:
       grab (G) or share (S). If he grabs he gets 2/3 of the current pot
       of money, the other player gets 1/3 of the pot, and the game
       ends. If he shares then the size of the current pot is multiplied
       by 3/2 and the next player gets to move. At the last stage at
       which player 2 moves, if he chooses S then the pot is still
       multiplied by 3/2, player 2 gets 1/3 of the pot, and player 1
       gets 2/3 of the pot.
         1.   Model this as an extensive-form game tree.
         2.   How many pure strategies does each player have?
         3.   Find the Nash equilibria of this game.
          4.   Now imagine that at the last stage at which player 2 moves, if
               he chooses to share then the pot is equally split among the
               players. Does your answer to part 3 change?
Exercise 7.4
    [Game tree: 1 -S-> 2 -S-> 1 -S-> 2 -S-> (81/4, 81/8); grabbing
    (G) at the four stages ends the game with payoffs (4, 2), (3, 6),
    (9, 9/2), and (27/4, 27/2) respectively.]
Exercise 7.4
                                      Player 2
                        GG     GS           SG          SS
        Player 1   GG   4, 2   4, 2        4, 2        4, 2
                   GS   4, 2   4, 2        4, 2        4, 2
                   SG   3, 6   3, 6       9, 9/2      9, 9/2
                   SS   3, 6   3, 6    27/4, 27/2   81/4, 81/8
Exercise 7.4
                                     Player 2
                      GG     GS         SG             SS
      Player 1   GG   4, 2   4, 2       4, 2          4, 2
                 GS   4, 2   4, 2       4, 2          4, 2
                 SG   3, 6   3, 6     9, 9/2         9, 9/2
                 SS   3, 6   3, 6   27/4, 27/2   243/16, 243/16
Exercise 7.5 (Veto)
    ▶ Two players must choose among three alternatives, a, b, and
      c. Player 1’s preferences are given by a ≻1 b ≻1 c while player
      2’s preferences are given by c ≻2 b ≻2 a. The rules are that
      player 1 moves first and can veto one of the three alternatives.
      Then player 2 chooses one of the remaining two alternatives.
        1. Model this as an extensive-form game tree (choose payoffs
           that represent the preferences).
        2. How many pure strategies does each player have?
        3. Find all the Nash equilibria of this game.
Exercise 7.5
    [Game tree: player 1 vetoes a (va), b (vb), or c (vc); player 2
    then picks one of the two remaining alternatives. Terminal payoffs:
    after va, b gives (2, 2) and c gives (1, 3); after vb, c gives
    (1, 3) and a gives (3, 1); after vc, a gives (3, 1) and b gives
    (2, 2).]
Exercise 7.5
                                         Player   2
                    bca    bcb    baa     bab      cca    ccb    caa    cab
               va   2, 2   2, 2   2, 2    2, 2     1, 3   1, 3   1, 3   1, 3
    Player 1   vb   1, 3   1, 3   3, 1    3, 1     1, 3   1, 3   3, 1   3, 1
               vc   3, 1   2, 2   3, 1    2, 2     3, 1   2, 2   3, 1   2, 2
Exercise 7.7 (Roommates voting)
    ▶ Three roommates need to vote on whether they will adopt a
      new rule and clean their apartment once a week or stick to
      the current once-a-month rule. Each votes “yes” for the new
      rule or “no” for the current rule. Players 1 and 2 prefer the
      new rule while player 3 prefers the old rule.
        1. Suppose the players require a unanimous vote to adopt the
           new rule. Player 1 votes first, then player 2, and then player 3,
           the latter two observing the previous votes. Draw this as an
           extensive-form game and find the Nash equilibria.
        2. Suppose that the players require a majority vote now. Again
           player 1 votes first, then player 2, and then player 3, the latter
           two observing the previous votes. Draw this as an
           extensive-form game and find the Nash equilibria.
        3. Suppose the game is as in part 2, but the votes of earlier
           movers are not observed by the later movers — and the votes
           are counted after all have voted. Draw this as an
           extensive-form game and find the Nash equilibria. In what way
           is this result different from the result in 2?
Exercise 7.7 Part 1
    [Game tree: player 1 votes y or n, then player 2, then player 3,
    each observing the earlier votes; payoffs are (1, 1, 0) after yyy
    and (0, 0, 1) at every other terminal node.]
Exercise 7.7 Part 2
    [Game tree: same voting order as Part 1; payoffs are (1, 1, 0) at
    every terminal node where at least two players vote y, and
    (0, 0, 1) otherwise.]
Exercise 7.7 Part 3
    [Game tree: same votes and payoffs as Part 2, but players 2 and 3
    do not observe the earlier votes: player 2's two decision nodes
    form one information set and player 3's four decision nodes form
    another.]
Exercise 7.7 Part 3
    If Player 3 votes yes:
                                   Player 2
                                     yes       no
                  Player 1   yes   1, 1, 0   1, 1, 0
                             no    1, 1, 0   0, 0, 1

    If Player 3 votes no:
                                   Player 2
                                     yes       no
                  Player 1   yes   1, 1, 0   0, 0, 1
                             no    0, 0, 1   0, 0, 1
Exercise 7.8
    ▶ Consider the following two-stage game: In the first stage one
      brother (player 2) has two $10 bills and can choose one of two
      options: he can give his younger brother (player 1) $20 or give
      him one of the $10 bills. This money will then be used to buy
      snacks at the show they will see, and each $1 of snacks
      purchased yields one unit of payoff for a player who uses it. In
      the second stage the show they will see is determined by the
      following Battle of the Sexes game:
                        Player 1
                         O         F
       Player 2 O 16, 12          0, 0
                  F     0, 0     12, 16
        1.   Present the entire game in extensive form (a game tree).
        2.   Write the (pure) strategy sets for both players.
        3.   Present the entire game in one matrix.
        4.   Find the Nash equilibria of the entire game (pure and mixed
             strategies).
Exercise 7.8
    [Game tree: player 2 first gives $10 (O10) or $20 (O20); the
    Battle of the Sexes is then played, player 1 choosing O or F and
    player 2 choosing O or F without observing player 1's choice.
    Terminal payoffs (u1, u2): after O10: OO (26, 22), OF (10, 10),
    FO (10, 10), FF (22, 26); after O20: OO (16, 32), OF (0, 20),
    FO (0, 20), FF (12, 36).]
Exercise 7.8
                                         2
             O10 OO   O10 OF   O10 FO   O10 FF   O20 OO   O20 OF   O20 FO   O20 FF
        OO   26, 22   26, 22   10, 10   10, 10   16, 32    0, 20   16, 32    0, 20
    1   OF   26, 22   26, 22   10, 10   10, 10    0, 20   12, 36    0, 20   12, 36
        FO    10,10   10, 10   22, 26   22, 26   16, 32    0, 20   16, 32    0, 20
        FF    10,10   10, 10   22, 26   22, 26    0, 20   12, 36    0, 20   12, 36
Exercise 7.9
    ▶ A student stole the DVD player from the student lounge. The
      dean of students (player 1) suspects the student (player 2)
       and begins an investigation. Concrete evidence will be available
       to the dean only with probability 1/2. The student knows that
      the investigation is on but does not know whether the dean
      has collected evidence or not. The game proceeds as follows:
      The dean realizes whether or not he has evidence and can
      then choose whether to accuse the student (A) or bounce the
       case (B) and forget it. Once accused, the student can either
      confess (C) or deny (D). If the dean bounces the case then
      both players get 0. If the dean accuses the student and the
      student confesses, the dean gains 2 and the student loses 2. If
      the dean accuses the student and the student denies, then
      payoffs depend on the evidence: If the dean has no evidence it
      gives him a loss of 4, while the student gets a payoff of 4. If
      the dean has evidence he gains 4, while the student is put on
      probation and loses 4.
Exercise 7.9 contd.
    1. Draw the game tree that represents the extensive form of this
       game.
     2. Write down the matrix that represents the normal form of the
        extensive form you drew in part 1.
    3. Solve for the Nash equilibria of the game.
Exercise 7.9
    [Game tree: nature N draws evidence (te ) with probability 1/2
    and no evidence (tn ) with probability 1/2; the dean chooses A or
    B in each state; the student's information set S pools the two
    accusation nodes, where he chooses C or D. Payoffs (dean,
    student): B gives (0, 0); (A, C) gives (2, -2) in both states;
    (A, D) gives (4, -4) with evidence and (-4, 4) without.]
Exercise 7.9
    ▶ Entries list the dean's payoff with and without evidence,
      followed by the student's expected payoff:

                            Student
                            C           D
                    AA   2, 2; -2   4, -4; 0
             Dean   AB   2, 0; -1   4, 0; -2
                    BA   0, 2; -1   0, -4; 2
                    BB   0, 0; 0    0, 0; 0
Exercise 7.9
    ▶ Action A is dominant for type te ; no dominant action for tn
    ▶ Type tn mixes between A and B, playing A with probability p,
      to make the student indifferent between choosing C and D
    ▶ Solve (1/2)(-2) + (1/2)p(-2) = (1/2)(-4) + (1/2)p(4), giving
      p = 1/3
    ▶ The student mixes between C and D, playing C with probability
      q, to make tn indifferent between A and B
    ▶ Solve 2q + (1 - q)(-4) = 0, giving q = 2/3
    ▶ Nash equilibrium in behavioral strategies:
      ((1, 0), (1/3, 2/3); (2/3, 1/3))
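The two indifference conditions behind this equilibrium can be verified numerically; a minimal sketch (p is type tn's probability of A, q is the student's probability of C):

```python
# Sketch: verifying the indifference conditions of the Exercise 7.9
# equilibrium. The dean's type t_e plays A for sure; type t_n plays A
# with probability p; the student confesses with probability q.
p, q = 1 / 3, 2 / 3

# Student: expected payoff from C vs D (evidence occurs with prob 1/2;
# the student is only called to move when accused).
u_C = 0.5 * (-2) + 0.5 * p * (-2)
u_D = 0.5 * (-4) + 0.5 * p * 4
assert abs(u_C - u_D) < 1e-12   # student indifferent between C and D

# Type t_n dean: expected payoff from A vs B (B yields 0).
u_A = q * 2 + (1 - q) * (-4)
assert abs(u_A - 0.0) < 1e-12   # t_n indifferent between A and B
print("indifference conditions hold")
```

Since both indifference conditions hold, neither player can profit from shifting probability mass, confirming the behavioral-strategy equilibrium.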
Exercise 7.10 (Perfect and imperfect recall)
    ▶ Consider the following game:
    [Game tree: player 2 first chooses a or b; after a, player 1
    chooses L or R; after b, player 1 chooses W or E. Player 2 then
    moves again from two information sets, one pooling the nodes after
    L and R (actions t, z) and one pooling the nodes after W and E
    (actions x, y). Payoffs (u1, u2): Lt (2, 1), Lz (0, 0), Rt (0, 0),
    Rz (1, 2), Wx (2, 1), Wy (0, 0), Ex (0, 0), Ey (1, 2).]
Exercise 7.10 contd.
    1. What are the pure-strategy sets for each player?
    2. Show that for any behavioral strategy for player 1 there is a
       mixed strategy that leads to the same distribution over the
       terminal nodes regardless of the strategy chosen by player 2.
    3. Show that for any behavioral strategy for player 2 there is a
       mixed strategy that leads to the same distribution over the
       terminal nodes regardless of the strategy chosen by player 1.
    4. Now imagine that the game does not have perfect recall so
       that player 2’s bottom two information sets are now one large
       information set. Can you find an example showing that the
       claim in 3 is no longer true?
Are all Nash reasonable?
    ▶ (enter, accommodate) and (not enter, fight) are the two Nash
      equilibria of the entry game in Example 1
    ▶ Consider (not enter, fight); the latter describes I’s strategy in
      the information set reached when E plays enter
    ▶ In this information set, I’s optimal action is to play
      accommodate rather than fight
    ▶ Therefore, I cannot credibly commit to fighting if his
      information set is reached, so (not enter, fight) is unreasonable
    ▶ In contrast, E has first mover advantage/ power of
      commitment/ credibility when he chooses to enter
    ▶ E’s entering the market forces I to accommodate, therefore
      (enter, accommodate) is reasonable
Sequential Rationality
    ▶ The idea of reasonable equilibria in dynamic games is
      formalized as the notion of sequential rationality
    ▶ This notion generalizes the idea of best response strategies
      from static games
   Definition
   Given strategies s−i of other players, a strategy of player i is
   sequentially rational if it is a best response to s−i in each of i’s
   information sets
Backward Induction
    ▶ Backward induction is an algorithm to find sequentially
      rational equilibrium strategies in dynamic games of perfect
      information with finitely many terminal nodes
    ▶ Consider the decision nodes in the game tree which have the
      terminal nodes as successors
    ▶ At each such decision node, substitute the payoff vector that
      gives the player moving there the highest payoff over all
      available actions; remove the terminal nodes and the edges
      connecting the decision node to them
    ▶ Repeat above step for the reduced game tree
    ▶ Since there are finitely many terminal nodes, the process
      terminates after finitely many iterations, leaving a single
      payoff vector
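The steps above can be sketched as a short recursion; the tree below encodes the entry game from Example 1 (player 0 is E, player 1 is I, payoffs as in the game tree slide):

```python
def backward_induction(node, choices):
    """node is a payoff tuple (terminal) or (player, {action: subtree}).
    Returns the backward-induction payoff vector; `choices` collects
    the action chosen at each decision node."""
    if not (isinstance(node, tuple) and isinstance(node[1], dict)):
        return node  # terminal node: a payoff vector
    player, children = node
    payoffs = {a: backward_induction(child, choices)
               for a, child in children.items()}
    best = max(payoffs, key=lambda a: payoffs[a][player])
    choices.append((player, best))
    return payoffs[best]

# Entry game: E (player 0) moves first, I (player 1) moves after entry
tree = (0, {"not enter": (0, 2),
            "enter": (1, {"accommodate": (1, 1), "fight": (-1, -1)})})
choices = []
print(backward_induction(tree, choices))  # (1, 1)
print(choices)  # [(1, 'accommodate'), (0, 'enter')]
```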
Backward induction contd.
    ▶ The remaining payoff vector corresponds to a Nash
      equilibrium with sequentially rational strategies
    ▶ The collection of actions chosen at each decision node during
      these steps outlines, for every player, a sequentially rational
      Nash equilibrium strategy; the chosen actions trace the
      equilibrium path from the initial node to the corresponding
      terminal node of the original game tree
Game Tree of Example 1
   [Game tree: E chooses not enter, giving (0, 2), or enter; after
   enter, I chooses accommodate, giving (1, 1), or fight, giving
   (−1, −1).]
Game Tree of Example 1
   [Reduced game tree after one backward induction step: E chooses
   not enter, giving (0, 2), or enter, giving (1, 1).]
Example 2: Stackelberg leadership
    ▶ Profit of firm i ∈ {L, F } is
                       πi (qi , qj ) = (1 − qi − qj − c)qi
    ▶ A strategy of L is simply an action in R+
    ▶ A strategy of F maps possible values of qL to possible values
      of qF : qF (qL )
    ▶ Infinitely many such strategies can be defined
    ▶ Best response function of F is also a strategy: for each qL , it
      picks the qF which maximizes F ’s profit
    ▶ It is obtained by differentiating πF with respect to qF and
      setting the derivative to zero:
                        qF = (1 − qL − c)/2
    ▶ Question: how many Nash equilibria are there in this game?
Nash equilibria of Stackelberg game
    ▶ A strategy pair (q̂L , q̂F (·)) is a Nash equilibrium of the
      Stackelberg game if
          πL (q̂L , q̂F (q̂L )) ≥ πL (qL , q̂F (qL )), ∀qL ∈ R+            (2)
          πF (q̂L , q̂F (q̂L )) ≥ πF (q̂L , qF (q̂L )), for all strategies qF (·)
      or, πF (q̂L , q̂F (q̂L )) ≥ πF (q̂L , qF ), ∀qF ∈ R+                (3)
    ▶ The first inequality requires that when F plays the strategy
      q̂F (·), L does not get higher profit by choosing any output
      other than q̂L
    ▶ The second only requires that F plays a best response to q̂L —
      not necessarily to every qL
    ▶ Any qL ≤ q M (the monopoly output) can be sustained in a
      Nash equilibrium!
Cournot output in Stackelberg model Nash equilibrium
   [Figure: reaction curves qL∗ (qF ) and qF∗ (qL ) in (qL , qF ) space;
   F ’s equilibrium strategy qF (qL ) = q C supports the Cournot
   outcome on L’s isoprofit curve πL (qL , qF ) = πLC .]
Arbitrary output of leader in Stackelberg model
   [Figure: F ’s strategy qF (qL ) is constructed so that L’s best
   choice is an arbitrary output q̄L , with qF∗ (q̄L ) played on the
   path; L’s isoprofit curve πL (qL , qF ) = πL (q̄L , qF∗ (q̄L )) passes
   through this point.]
Sequentially rational equilibrium in Stackelberg model
   [Figure: the Stackelberg outcome (qLS , qFS ) lies on F ’s reaction
   curve qF∗ (qL ) at the point of tangency with L’s isoprofit curve
   πL (qL , qF ) = πLS .]
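The sequentially rational outcome can be sketched numerically by backward induction, using an illustrative cost c = 0.1 (an assumption for this sketch, not from the slides); the leader maximizes profit taking F's best response qF = (1 − qL − c)/2 into account:

```python
c = 0.1  # illustrative marginal cost (an assumption for this sketch)

def follower_br(qL):
    # F's best response: maximize (1 - qL - qF - c) * qF over qF >= 0
    return max((1 - qL - c) / 2, 0.0)

def leader_profit(qL):
    # L's profit anticipating F's reaction
    return (1 - qL - follower_br(qL) - c) * qL

# Grid search over the leader's output
grid = [i / 10000 for i in range(10001)]
qL_star = max(grid, key=leader_profit)

# Closed form: qL = (1 - c)/2 = 0.45, qF = (1 - c)/4 = 0.225
print(qL_star, follower_br(qL_star))
```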
Example: power of commitment
    ▶ Player 1 (government) wants to influence choice of player 2
    ▶ Player 2 chooses an action a2 ∈ {0, 1} and receives a transfer
      t ∈ {0, 1} from 1 who observes a2
    ▶ 2’s objective is to maximize expected value of transfer net of
      cost of action, which is 0 for a2 = 0 and 1/2 for a2 = 1
    ▶ Player 1’s objective is to minimize 2(a2 − 1)² + t
    ▶ Before 2 chooses his action, 1 can announce a transfer rule
      t(a2 )
    ▶ Result: 1’s announcement impacts 2’s action if and only if 1
      can commit to it!
Game tree with commitment
   [Game tree: 1 announces a transfer rule t(a2 ) ∈ {t00 , t01 , t10 , t11 }
   (where txy means t(0) = x, t(1) = y ); 2 then chooses a2 ∈ {0, 1}.
   Payoffs (1; 2) for a2 = 0 and a2 = 1 respectively: under t00 ,
   (−2; 0) and (0; −1/2); under t01 , (−2; 0) and (−1; 1/2); under
   t10 , (−3; 1) and (0; −1/2); under t11 , (−3; 1) and (−1; 1/2).]
Game tree without commitment
   [Game tree: 1’s announcement is not binding, so effectively 2
   chooses a2 ∈ {0, 1} first and 1 then chooses t ∈ {0, 1}. Payoffs
   (1; 2): after a2 = 0, (−2; 0) for t = 0 and (−3; 1) for t = 1;
   after a2 = 1, (0; −1/2) for t = 0 and (−1; 1/2) for t = 1.]
Subgame perfection
    ▶ Note that the backward induction algorithm is not
      well-defined for games of imperfect information
    ▶ Subgame perfection is a more general approach to finding
      sequentially rational Nash equilibria
    ▶ In games of perfect information, subgame perfection and
      backward induction are the same
Subgames
   ▶ Given an extensive form game G , consider any decision node
     x ∈ χ(G ) which belongs to a singleton information set
   ▶ Note: There is at least one such node in every extensive form
     game
   ▶ The subgame of G beginning at x, denoted G (x), contains x
     and all its successors
   ▶ If x ′ ∈ G (x) and x ′′ ∈ H(x ′ ) then x ′′ ∈ G (x)
   ▶ The assignments of actions, information sets, and payoffs on
     the terminal nodes for every player in G (x) are the same as in
     G but restricted to the nodes in G (x)
   ▶ Note: G is a subgame of itself
Subgame Perfect Nash Equilibrium
   Definition
   A strategy profile s in an extensive form game G is a subgame
   perfect Nash equilibrium if the corresponding restrictions of the
   strategy profile constitute Nash equilibria for each subgame of G
Subgame perfection algorithm
    ▶ Given an extensive form game, consider the subgames with
      the smallest number of nodes:
        ▶ If there is only one player with a move in any of these
          subgames, find an optimal action and replace the initial node
          of this subgame with the corresponding payoff vector; remove
          remaining nodes and edges of this subgame
        ▶ If there are multiple players with a move in any of these
          subgames, find a Nash equilibrium strategy profile for these
          players, and replace the initial node of this subgame with the
          corresponding payoff vector; remove remaining nodes and
          edges of this subgame
    ▶ Repeat the previous step in the reduced game
    ▶ As the game is finite, the algorithm terminates after finite
      iterations, yielding a subgame perfect Nash equilibrium payoff
      vector
    ▶ The strategies yielding this payoff vector constitute a
      subgame perfect Nash equilibrium
Remarks
   ▶ Note that in games of perfect information, subgame
     perfection is the same as backward induction
   ▶ In a game of imperfect information G where the only subgame
     is G itself, subgame perfection cannot refine Nash equilibria
     any further: all Nash equilibria are then subgame perfect
Example 3: investment followed by Cournot competition
    ▶ Firms 1 and 2 have identical constant marginal cost of 2 per
      unit
    ▶ 1 can install a new technology with a marginal cost of zero by
      paying a fixed cost f
    ▶ 2 observes whether 1 invests in new technology
    ▶ Once the investment decision is observed, the two firms will
      simultaneously choose output levels q1 and q2 as in Cournot
      competition
Example contd.
    ▶ Suppose demand is p(q1 , q2 ) = 14 − q1 − q2
    ▶ Firm 1’s payoff is (12 − q1 − q2 )q1 if it does not invest, and
      (14 − q1 − q2 )q1 − f if it invests
    ▶ Firm 2’s payoff is (12 − q1 − q2 )q2 regardless of firm 1’s
      decision to invest
    ▶ We apply subgame perfection algorithm
    ▶ There are three subgames, one for each investment decision
      by firm 1, and the entire game itself
Game tree
   [Game tree: firm 1 chooses invest or not invest; in each case firm
   1 picks q1 ∈ [0, ∞) and firm 2 simultaneously picks q2 ∈ [0, ∞).]
Example concl.
    ▶ In the subgame corresponding to 1 not investing, the reaction
      function of firm 1 is q1 = 6 − q2 /2, and that of firm 2 is
      q2 = 6 − q1 /2
    ▶ Solving gives the Nash equilibrium output levels for this
      subgame (4, 4) with payoffs (16, 16)
    ▶ In the subgame corresponding to 1 investing, the reaction
      function of firm 1 is q1 = 7 − q2 /2, and that of firm 2 is
      q2 = 6 − q1 /2
    ▶ Solving gives the Nash equilibrium output levels for this
      subgame (16/3, 10/3) with payoffs (256/9 − f , 100/9)
    ▶ Firm 1 invests if 256/9 − f > 16, or f < 112/9
    ▶ (invest, 4, 16/3; 4, 10/3) is a subgame perfect Nash
      equilibrium for any f ≤ 112/9
    ▶ (not invest, 4, 16/3; 4, 10/3) is a subgame perfect Nash
      equilibrium for any f ≥ 112/9
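The two Cournot subgames and the investment threshold can be solved exactly from the reaction functions above (a minimal sketch using exact rational arithmetic):

```python
from fractions import Fraction as F

def cournot(a1, a2):
    """Equilibrium of reaction functions q1 = a1 - q2/2, q2 = a2 - q1/2."""
    q1 = F(4 * a1 - 2 * a2, 3)
    q2 = F(4 * a2 - 2 * a1, 3)
    return q1, q2

# Subgame after "not invest": intercepts (6, 6)
q1n, q2n = cournot(6, 6)
profit_not = (12 - q1n - q2n) * q1n          # 16

# Subgame after "invest": intercepts (7, 6)
q1i, q2i = cournot(7, 6)
profit_inv_gross = (14 - q1i - q2i) * q1i    # 256/9, before the fixed cost f

f_threshold = profit_inv_gross - profit_not  # firm 1 invests iff f is below this
print(q1n, q2n, q1i, q2i, f_threshold)  # 4 4 16/3 10/3 112/9
```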
Exercise 8.1 (Mixed strategy SPNE)
   [Game tree: player 1 chooses Y or N; N ends the game with payoffs
   (1.5, 1.5); after Y , players 1 and 2 simultaneously choose O or F
   (player 1) and o or f (player 2), with payoffs (2, 1) at (O, o),
   (0, 0) at (O, f ) and (F , o), and (1, 2) at (F , f ).]
Exercise 8.1
                                Player 2
                                  o         f
               Player 1   YO     2, 1     0, 0
                          YF     0, 0     1, 2
                          NO   1.5, 1.5 1.5, 1.5
                          NF   1.5, 1.5 1.5, 1.5
Exercise 8.1
    ▶ Suppose 1 and 2 mix in the smallest subgame with
      probabilities (p, 1 − p) and (σ, 1 − σ) resp.
    ▶ In equilibrium of the subgame, p = 2/3 and σ = 1/3 with
      payoff (2/3, 2/3)
    ▶ Since 2/3 < 1.5, player 1 prefers N at the root; thus
      ((0, 0, 2/3, 1/3), (1/3, 2/3)) is the SPNE in mixed strategies
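The subgame indifference conditions and the choice of N at the root can be checked exactly (a minimal sketch with the payoffs from the matrix on the next slide):

```python
from fractions import Fraction as F

p = F(2, 3)   # prob. that 1 plays O in the subgame
s = F(1, 3)   # prob. that 2 plays o in the subgame (sigma)

# 2 is indifferent between o and f:  p*1 = (1 - p)*2
assert p * 1 == (1 - p) * 2
# 1 is indifferent between O and F:  s*2 = (1 - s)*1
assert s * 2 == (1 - s) * 1

# 1's subgame value is below 1.5, so 1 plays N at the root
subgame_value = s * 2
assert subgame_value < F(3, 2)
print(subgame_value)  # 2/3
```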
Past year question
    ▶ Player 1, the agenda-setter, chooses a point a1 ∈ [0, 1]. Player
      2, the voter, can either accept a1 or reject it, maintaining the
      status-quo, a0 ∈ [0, 1]. The end outcome, denoted x, is
      therefore, either a0 or a1 .
        1. If player 2’s utility from x is given by u2 (x) = −(x − 1/3)² ,
           and player 1’s utility is u1 (x) = x, find a subgame perfect
           Nash equilibrium of this 2-stage game.
        2. How will your answer change if player 1’s utility changes to
           u1 (x) = −(x − 2/5)² ? (7)
Past year question
    ▶ Maya and Elif approach a King with an infant, each claiming
      that the infant is her child. The true mother values the child
      at 100 whereas the impostor values the child at 50 – each of
      them knows the other’s value as well. The king knows the
      “values” that the true mother and the impostor ascribe to the
      child, but he cannot distinguish the true mother from the
      impostor. He announces the following steps:
        1. He asks Maya if the child is hers. If she answers “no”, the
           child is given to Elif. If she answers “yes”, he proceeds to
           the next step.
        2. He asks Elif if the child is hers. If she answers “no”, the child
           is given to Maya. If she answers “yes”, Elif will pay the king
           75, and receive the child, and Maya will pay the king 10.
    ▶ Show that the game designed by the king guarantees that in
      either state of the world, the only subgame perfect equilibrium
      is the one under which the true mother gets her child and
      neither woman pays anything at all.
One Stage Deviation Principle
    ▶ Given a finite extensive form game, sometimes we want to
      check if a particular strategy profile is a subgame perfect Nash
      equilibrium
    ▶ If we assume these games are multi-stage games with
      observed actions, the following result is useful for this purpose:
   Theorem
   A strategy profile s in a finite horizon multi-stage game with
   observed actions is subgame perfect if and only if no player can
   profitably deviate from s in a single stage and conform to s
   thereafter.
Multistage games with observed action
    ▶ At any stage, all players move simultaneously (this allows for
      an action like “doing nothing”)
    ▶ A history at any stage t is a profile of past actions that leads
      to the stage, observed by all players at t
    ▶ The history at stage zero, h0 , is the empty history ∅
    ▶ The profile of actions taken by the agents in the stage
      immediately following history at stage t, ht , is at
    ▶ For all t ≥ 1, ht = (ht−1 , at−1 ) = (∅, a0 , a1 , . . . , at−1 )
    ▶ The set of possible histories is denoted H
    ▶ Note: each history induces a subgame
    ▶ Strategies are now defined as mappings from history to
      actions si : H → A
Proof of OSD Principle: SPNE ⇒ OSD
    ▶ To prove that every subgame perfect Nash equilibrium
      strategy profile s must satisfy OSD
    ▶ Suppose not: there is a player i, a history ht̂ , and an action a
      such that the strategy si′ , defined as follows, is a profitable
      deviation for i at the subgame beginning at history ht̂ given
      s−i :
                 si′ (ht ) = si (ht ) for all ht ̸= ht̂ ;   si′ (ht̂ ) = a ̸= si (ht̂ )
    ▶ Then the restriction of s in the subgame following history ht̂
      cannot be a Nash equilibrium of the subgame
    ▶ This contradicts that s is a subgame perfect Nash equilibrium
      strategy profile
Game Tree example
   [Figure: a path of play; si (·) and si′ (·) coincide everywhere
   except at the subgame beginning at ht̂ .]
Proof of OSD Principle: OSD ⇒ SPNE
    ▶ To prove that every strategy profile s satisfying OSD must be
      subgame perfect
    ▶ Suppose not: there is a player i, a history ht , and a strategy
      si′ ̸= si such that the restriction of si′ to the subgame
      beginning at ht is a better response to s−i in this subgame
      than si
    ▶ Let t̂ be the largest period at which si′ (ht̂ ) ̸= si (ht̂ ) for some
      history ht̂ (well-defined since the horizon is finite)
    ▶ Notice that t̂ > t by OSD
Proof of OSD Principle: OSD ⇒ SPNE contd.
    ▶ Consider another strategy si′′ constructed as follows:
          si′′ (ht ) = si′ (ht ) for all t ≤ t̂ ;   si′′ (ht ) = si (ht ) for t > t̂
    ▶ Notice that for t > t̂, si (ht ) = si′ (ht ) = si′′ (ht ) and
      si′ (ht̂ ) = si′′ (ht̂ )
    ▶ Thus, si′ and si′′ are identical in the subgame beginning at
      history ht̂
    ▶ But si′′ is a one stage deviation from si in the subgame
      beginning at history ht̂ , so by OSD it cannot improve on si in
      this subgame; hence the deviation at t̂ adds nothing, and we
      may take si′ (ht̂ ) = si (ht̂ )
An example
   [Figure: si′ deviates from si from stage t onwards; si′′ agrees
   with si′ up to t̂ and with si thereafter.]
Proof of OSD Principle: OSD ⇒ SPNE concl.
    ▶ If t̂ = t + 1, then there was only one possible deviation to
      check and we are done
    ▶ If t̂ > t + 1, then we may need more iterations of the above
      procedure
    ▶ For instance, if t̂ = t + 2, then the first iteration shows
      si′ (ht+2 ) = si (ht+2 ), so then t̂ = t + 1, and the next iteration
      shows si′ (ht+1 ) = si (ht+1 ) and so on
An example
   [Figure: after one iteration, si′ agrees with si at ht̂ , so the
   latest possible deviation moves one stage earlier.]
Check
   ▶ A backward induction solution in an MSGOA satisfies OSD
   ▶ A subgame perfection solution in an MSGOA satisfies OSD
Games with infinite stages
    ▶ The OSD principle goes through in games with infinitely many
      stages provided the game is continuous at infinity:
          for all i,   sup over h, h̃ with ht = h̃t of |ui (h) − ui (h̃)| → 0
          as t → ∞
    ▶ Continuity at infinity is satisfied if overall payoffs are the sum
      of discounted per-period payoffs git (at ), and per-period
      payoffs are uniformly bounded, i.e. there is B such that
          for all i,   max over t and at of |git (at )| < B
Theorem
A strategy profile s in an infinite horizon multistage game with
observed actions satisfying continuity at infinity is subgame perfect
if and only if no player can profitably deviate from s in a single
stage and conform to s thereafter.
Proof sketch
    ▶ The argument of the previous proof shows that every subgame
      perfect strategy profile must satisfy OSD
    ▶ The argument also shows that a strategy profile satisfying
      OSD cannot be improved upon by any finite sequence of
      deviations in any subgame
    ▶ Suppose not: s is a strategy profile satisfying OSD which is
      not subgame perfect: there is stage t, history ht , and player i
      who could improve his payoff by ϵ > 0 using strategy si′
      different from si in the subgame beginning ht
    ▶ By continuity at infinity, there is a t ′ such that the strategy si′′
      which agrees with si′ until t ′ and with si from t ′ onwards
      improves on si by at least ϵ/2
    ▶ This contradicts that no strategy profile satisfying OSD can
      be improved upon by any finite sequence of deviations in any
      subgame
Critiques of backward induction: a centipede game
   [Game tree: players 1, 2, . . . , I move in sequence, each choosing
   A (across) or D (down); player i’s choice of D ends the game with
   payoffs (1/i, . . . , 1/i), while (A, . . . , A) yields (2, . . . , 2).]
Critiques of backward induction
    ▶ Consider the I-player centipede game above
    ▶ Backward induction leads to unique prediction (A, . . . , A)
    ▶ (A, . . . , A) looks like a plausible prediction if I is small
    ▶ The payoff 2 requires all I players to play A: if players choose
      A with independent probability p < 1, the probability that the
      other I − 1 players all choose A can be very small if I is large
    ▶ The prediction also relies heavily on the common knowledge
      assumption
Critiques of backward induction (Rosenthal 1981)
        1    A1        2   A2        1   A3        2   A4        1   A5
                                                                          (5, 5)
   D1             D2            D3            D4            D5
    (1, 0)        (0, 1)        (3, 0)        (2, 4)        (6, 3)
Critique of backward induction
    ▶ In the centipede game above, the unique backward induction
      strategy profile involves players playing D at every decision
      node
    ▶ If player 2 observes that 1 has chosen A instead, should he
      pick A or D?
    ▶ It seems that if there is a chance of 1 playing A again later,
      then 2 might want to gamble by playing A
    ▶ How should players form beliefs about opponents’ future
      behaviour?
    ▶ Critics argue that reasonable theory should not rule out any
      behaviour after observing an event which is theoretically
      impossible
    ▶ Another way to address this issue is to update beliefs about
      future events based on current observations
Critique of subgame perfection (Rabin 1988)
   [Game tree: player 1 chooses L, ending the game with (6, 0, 6), or
   R; player 2 then chooses R, ending with (8, 6, 8), or L; after L,
   players 3 and 1 play a coordination game in which each chooses F
   or G , with payoffs (7, 10, 7) if they choose different actions and
   (0, 0, 0) if they choose the same action.]
Critique of subgame perfection
    ▶ Subgame perfection supposes that players expect the same
      Nash equilibria in all subgames
    ▶ The coordination game between 1 and 3 in the third stage has
      three Nash equilibria: two where 1 and 3 coordinate
      successfully with a payoff of (7, 10, 7) and a mixed strategy
      equilibrium with a payoff of (3 1/2, 5, 3 1/2)
    ▶ If 1 and 3 coordinate, then 2 plays L and 1 plays R
    ▶ If 1 and 3 cannot coordinate, 2 plays R and 1 plays R
    ▶ But if 1 believes that he cannot coordinate in stage 3, but he
      believes that 2 thinks otherwise, 1 can play L!
Repeated games: Prisoner’s Dilemma
    ▶ Suppose the per-period payoffs depend only on current
      actions, gi (at ), and are given by
                                     Player 2
                                       C      N
                          Player 1 C 1, 1 -1, 2
                                   N 2, -1 0,0
Average discounted payoff
    ▶ Suppose player discount future payoffs with common discount
      factor δ
    ▶ Utility of a sequence {a0 , . . . , aT } is normalized to
          [(1 − δ)/(1 − δ T +1 )] Σt=0..T δ t gi (at )
    ▶ The above expression is the average discounted payoff:
      present value multiplied by a factor such that it would equal 1
      if per period payoffs were 1
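The normalization can be sketched as a small function (an illustrative helper, not from the text); a constant stream of 1s gets average discounted payoff 1:

```python
def avg_discounted(payoffs, delta):
    """Average discounted payoff of a stream g^0, ..., g^T:
    (1 - delta)/(1 - delta**(T + 1)) times the present value."""
    T = len(payoffs) - 1
    pv = sum(delta ** t * g for t, g in enumerate(payoffs))
    return (1 - delta) / (1 - delta ** (T + 1)) * pv

# Normalization check: a constant stream of 1s averages to 1
print(avg_discounted([1, 1, 1, 1], 0.9))  # approximately 1.0
```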
Finite repetitions
     ▶ If the game is played only once, then non-cooperation is the
       dominant strategy for both players
     ▶ Thus the unique Nash equilibrium of the stage game is (N, N)
     ▶ If the game is repeated a finite number of times, then playing
       (N, N) in every period is the unique subgame perfect Nash
       equilibrium
Infinite repetitions
     ▶ If the game is repeated infinitely many times, playing (N, N)
       in every stage remains a subgame perfect equilibrium
    ▶ Further, it is the only equilibrium where play in any stage
      does not depend on the history of previous play
    ▶ But if δ > 1/2 then the following describes a subgame perfect
      equilibrium strategy:
            play C in first period, and in every period where the action
            profile in the previous period was (C, C); if the action
            profile in previous period was different, play N from next
            period onwards
Proof
    ▶ Two classes of subgames: A, where both players play C, and
      B, where some player has played N
    ▶ A player conforming to the strategies in every subgame in A
      has an average discounted payoff 1
    ▶ A player deviating at some t and conforming to the (class B)
      strategies thereafter has average discounted payoff
          (1 − δ)(1 + δ + δ 2 + . . . + δ t−1 + 2δ t + 0 + . . .)
            = (1 − δ)[(1 − δ t )/(1 − δ) + 2δ t ]
            = 1 − δ t (2δ − 1)
    ▶ This expression is less than 1 when δ > 1/2
    ▶ Since no player can gain by deviating from the proposed
      strategy profile in a single period and conforming thereafter, it
      is a subgame perfect equilibrium
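The deviation payoff above can be cross-checked numerically (a minimal sketch; the deviator's stream is 1 in periods 0..t−1, 2 at the deviation period t, and 0 thereafter):

```python
def deviation_payoff(delta, t, horizon=200):
    """Average discounted payoff from deviating at stage t and being
    punished with (N, N) forever after."""
    stream = [1] * t + [2] + [0] * (horizon - t - 1)
    return (1 - delta) * sum(delta ** k * g for k, g in enumerate(stream))

for delta in (0.6, 0.75, 0.9):          # any delta > 1/2
    for t in (0, 3, 10):
        closed_form = 1 - delta ** t * (2 * delta - 1)
        assert abs(deviation_payoff(delta, t) - closed_form) < 1e-9
        assert closed_form < 1          # conforming (payoff 1) is better
print("no profitable one-stage deviation when delta > 1/2")
```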
Remarks
   ▶ Folk theorem: there can be many other subgame perfect
     equilibria in the game provided the discount factor is
     sufficiently large
   ▶ Note that the nature and range of subgame perfect equilibria
     can be different depending on the number of repetitions
A finitely repeated game
    ▶ Consider a game comprising two repetitions of the following
      stage game:
                                         Player 2
                                        L      M         R
                      Player 1 U 0, 0 3, 4 6,0
                                 M 4, 3 0, 0 0, 0
                                 D 0, 6 0, 0 5, 5
    ▶ This stage game has three Nash equilibria: (M, L), (U, M)
      and a mixed strategy equilibrium ((3/7, 4/7, 0), (3/7, 4/7, 0)),
      with payoffs (4, 3), (3, 4) and (12/7, 12/7)
    ▶ The efficient payoff (5, 5) does not arise in any equilibrium of
      the stage game
A finitely repeated game
    ▶ In the two-stage game, the following strategy profile is
      subgame perfect if δ > 7/9:
      play (D, R) in the first stage; if the first stage outcome
      is (D, R), play (M, L) in the second stage; if the first
      stage outcome is not (D, R), play ((3/7, 4/7, 0), (3/7, 4/7, 0))
      in the second stage
    ▶ Notice that the prescribed second stage actions are Nash
      equilibria of the stage game
    ▶ If player 1 deviates in the first stage, he gains 1 in the first
      stage but his second period payoff goes down from 4 to 12/7:
      so he will not deviate if 1 < δ(4 − 12/7), or δ > 7/16
    ▶ If player 2 deviates in the first stage, he gains 1 in the first
      stage but his second period payoff goes down from 3 to 12/7:
      so he will not deviate if 1 < δ(3 − 12/7), or δ > 7/9
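The two threshold computations can be verified with exact arithmetic (a minimal sketch; gains and continuation payoffs as derived above):

```python
from fractions import Fraction as F

mixed_payoff = F(12, 7)  # each player's payoff in the mixed equilibrium

# Player 1: deviating from (D, R) to U gains 6 - 5 = 1 in stage one,
# but the continuation drops from the (M, L) payoff 4 to 12/7
delta1 = F(1) / (4 - mixed_payoff)

# Player 2: deviating from (D, R) to L gains 6 - 5 = 1 in stage one,
# but the continuation drops from the (M, L) payoff 3 to 12/7
delta2 = F(1) / (3 - mixed_payoff)

print(delta1, delta2)  # 7/16 7/9
```

Player 2's constraint is the binding one, so the profile is subgame perfect when δ > 7/9.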
Infinitely repeated game with observed action
    ▶ The simultaneous move game which is repeated is called the
      stage game
    ▶ Suppose the stage game involves a finite set of players I , with
      finite action spaces Ai (typical element ai ), and payoff
      functions gi : A → R, where A = A1 × . . . × AI
    ▶ Mixed strategies in the stage game are denoted αi
    ▶ We assume that players discount future payoffs using common
      discount factor δ < 1
Aggregate and continuation payoffs
    ▶ For any strategy profile σ and history ht , a player’s expected
      payoff from period t onwards is his continuation payoff:
          Eσ (1 − δ) Στ =t..∞ δ τ −t gi (σ τ (hτ ))
    ▶ Player i’s objective is to maximize the normalized sum
          ui = Eσ (1 − δ) Σt=0..∞ δ t gi (σ t (ht ))
      where σ denotes a mixed strategy profile of the entire game
      and E an expectation operator
Observation
    ▶ If α∗ is a Nash equilibrium of the stage game, the strategies
      “play αi∗ in every period” constitute a subgame perfect Nash
      equilibrium
    ▶ If there are m Nash equilibria, then for any map j from the
      set of periods to equilibrium indices, “play αij(t)∗ in period t”
      is a subgame perfect Nash equilibrium
    ▶ Check the above statements using the OSD principle
Preliminaries
    ▶ Player i’s reservation utility or minmax value is
          v i = min over α−i of max over αi of gi (αi , α−i )
    ▶ v i is the lowest payoff that his opponents can force him to
      realize by any choice of α−i provided he foresees it and plays
      his best response
    ▶ Let m−i be the minmax strategy profile for i’s opponents that
      attains the minimum above
    ▶ Let mi be a strategy for player i such that gi (mi , m−i ) = v i
An example
    ▶ To find the minmax strategies of players in the following game:
                                     Player 2
                                         L       R
                      Player 1 U -2, 2 1, -2
                                 M 1, -2 -2, 2
                                  D 0, 1 0, 1
Example contd.
    ▶ Given any mixed strategy of player 2, (q, 1 − q), the payoffs of
      player 1 from his three strategies, U, M and D, are 1 − 3q,
      3q − 2 and 0
    ▶ Notice that
          max(1 − 3q, 3q − 2, 0) = 1 − 3q for 0 ≤ q < 1/3;
                                 = 0 for 1/3 ≤ q < 2/3;
                                 = 3q − 2 for 2/3 ≤ q ≤ 1
    ▶ Hence the minmax value of player 1 is 0
    ▶ The minmax strategy of player 2 is any q ∈ [1/3, 2/3]
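A quick grid check of this minmax computation (a minimal sketch using the best-reply payoffs derived above):

```python
# Player 1's best-reply payoff against 2's mixed strategy (q, 1 - q)
def best_reply_payoff(q):
    return max(1 - 3 * q, 3 * q - 2, 0)  # payoffs of U, M, D

grid = [i / 1000 for i in range(1001)]
minmax1 = min(best_reply_payoff(q) for q in grid)
minimizers = [q for q in grid if best_reply_payoff(q) == minmax1]

print(minmax1)                           # 0
print(min(minimizers), max(minimizers))  # approximately 1/3 and 2/3
```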
Example contd.
    ▶ Given any mixed strategy of player 1, (pU , pM , 1 − pU − pM ),
      the payoffs of player 2 from his two strategies, L and R, are
      2(pU − pM ) + (1 − pU − pM ) and (1 − pU − pM ) − 2(pU − pM )
    ▶ Notice
          max(2(pU − pM ) + (1 − pU − pM ), (1 − pU − pM ) − 2(pU − pM ))
            = 2(pU − pM ) + (1 − pU − pM ) if pU ≥ pM ;
            = (1 − pU − pM ) − 2(pU − pM ) if pU < pM
    ▶ The above function is minimized when pU = pM and
      1 − pU − pM = 0, which gives us pU = pM = 1/2
    ▶ Thus the minmax value of player 2 is also zero
    ▶ The minmax strategy of player 1 against 2 is (1/2, 1/2, 0)
An observation
    ▶ Player i’s payoff is at least v i in any Nash equilibrium of the
      stage game since Nash equilibrium strategies are best
      responses, and v i is the minimum best response payoff
    ▶ Player i’s payoff is at least v i in any Nash equilibrium of the
      repeated game as well
    ▶ Consider any Nash profile σ̂ of the repeated game: even the
      “naive” strategy of choosing ai (ht ) to maximize
      gi (ai , σ̂−i (ht )) (i.e., choosing the best response to the Nash
      strategy profile of opponents in each period) yields at least v i
Feasible payoffs
    ▶ The set of feasible payoffs includes the payoff vectors arising
      from pure strategies and from mixed strategies that are
      independent across players
    ▶ If we allow for correlated mixing probabilities as well, then the
      set of feasible payoffs is the convex hull of pure strategy
      payoffs, denoted V
    ▶ Recall that the convex hull of a set A is the smallest convex
      set containing A
    ▶ The feasible payoffs where every player gets at least his
      minmax value are called individually rational
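The correlated-mixing construction can be visualized by sampling: the sketch below draws random convex weights over the pure payoff vectors of the running example (assumed here to be (−2, 2), (1, −2) and (0, 1), as in the illustration) and keeps the individually rational ones (both minmax values are 0 in that example).

```python
import random

# Pure-strategy payoff vectors from the running example (an assumption
# reconstructed from the illustration: (-2, 2), (1, -2), (0, 1)).
pure = [(-2.0, 2.0), (1.0, -2.0), (0.0, 1.0)]

random.seed(0)
feasible, rational = [], []
for _ in range(5000):
    # Random convex weights = one draw of correlated mixing over profiles.
    w = [random.random() for _ in pure]
    s = sum(w)
    w = [x / s for x in w]
    v1 = sum(wi * p[0] for wi, p in zip(w, pure))
    v2 = sum(wi * p[1] for wi, p in zip(w, pure))
    feasible.append((v1, v2))
    if v1 >= 0 and v2 >= 0:          # both minmax values are 0 here
        rational.append((v1, v2))

print(len(rational), "of", len(feasible), "sampled payoffs are individually rational")
```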
Illustration: Set of feasible payoffs

    [Figure: the set of feasible payoffs V in the (v1 , v2 ) plane — the
    triangle with vertices (−2, 2), (0, 1) and (1, −2)]
Illustration: Feasible individually rational payoffs

    [Figure: the subset of the triangle with vertices (−2, 2), (0, 1) and
    (1, −2) where v1 ≥ 0 and v2 ≥ 0, both minmax values being 0]
Sidenote: independent vs correlated mixing
    ▶ If players 1 and 2 use independent mixed strategies (p, 1 − p)
      and (q, 1 − q) respectively in the Prisoners Dilemma game,
      then the probability distribution over the four outcomes
      possible is given by the following matrix:
                                    Player 2
                                 C              N
              Player 1   C      pq          p(1 − q)
                         N   (1 − p)q   (1 − p)(1 − q)

    ▶ For example, if p = 1/2 and q = 1/3 then the entries are 1/6,
      1/3, 1/6 and 1/3, reading row by row from the top left box
    ▶ The payoffs that can arise in the game are convex
      combinations of the four payoffs (1, 1), (−1, 2), (2, −1), and
      (0, 0) with probability weights as in the matrix above
Sidenote: independent vs correlated mixing
    ▶ But allowing only independent mixing probabilities implies that
      not all convex combinations are feasible
    ▶ For instance, no two independent mixing distributions
      (p, 1 − p), (q, 1 − q) can induce the following probability
      distribution over outcomes:

                                Player 2
                                 C     N
                 Player 1   C   1/2    0
                            N    0    1/2
    ▶ How do we allow players to coordinate in a
      simultaneous-move game over their mixing probabilities?
Sidenote: independent vs correlated mixing
    ▶ Suppose at the beginning of the game players are able to
      observe a public signal that turns green or red with equal
      probability
    ▶ If players condition their pure-strategy play on the signal (say,
      play C if green, and N if red), then we can obtain the above
      probability distribution over outcomes
    ▶ Note: There is a notion of correlated equilibrium (like Nash
      equilibrium) which is a probability distribution p : S → [0, 1]
      over strategies such that no individual player can gain from a
      permutation of his strategies given these probabilities, i.e. for
      all i and any permutation di : Si → Si ,

              Σ_{s∈S} p(s)ui (s) ≥ Σ_{s∈S} p(s)ui (di (si ), s−i )
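The public-signal device described above can be simulated directly. A sketch, assuming the Prisoner's Dilemma labels C and N: both players condition on the same green/red signal, which produces exactly the diagonal distribution over outcomes.

```python
import random

random.seed(1)

def play_with_public_signal():
    # Both players observe the same public signal and condition on it:
    # green -> both play C, red -> both play N.
    signal = random.choice(["green", "red"])
    action = "C" if signal == "green" else "N"
    return (action, action)

outcomes = [play_with_public_signal() for _ in range(100000)]
freq_CC = outcomes.count(("C", "C")) / len(outcomes)
freq_NN = outcomes.count(("N", "N")) / len(outcomes)

# (C, N) and (N, C) never occur; the diagonal entries are each about 1/2.
print(round(freq_CC, 2), round(freq_NN, 2))
```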
Assumptions required for convex feasible set
    ▶ We will assume that at the beginning of each period t,
      players are able to observe a public signal ω t
    ▶ We then redefine history at period t as
                      ht = (a0 , . . . , at−1 ; ω 0 , . . . , ω t )
    ▶ Pure strategies are now mappings from histories to actions,
      si : H → A
Two Folk Theorems
  Theorem (Nash Folk Theorem)
  For every feasible payoff vector v ∈ V such that vi > v̲i for all
  players i, there exists a δ̲ < 1 such that for all δ ∈ (δ̲, 1), there is a
  Nash equilibrium of G (δ) (the repeated game with discount factor
  δ) with payoffs v .
  Theorem (Perfect Folk Theorem)
  Let α∗ be an equilibrium of the stage game, with payoffs e. For
  every feasible payoff vector v ∈ V such that vi > ei for all players
  i, there exists a δ̲ < 1 such that for all δ ∈ (δ̲, 1), there is a
  subgame perfect Nash equilibrium of G (δ) (the repeated game
  with discount factor δ) with payoffs v .
Proof of Nash Folk Theorem
    ▶ Suppose there is a pure action profile a such that g (a) = v
    ▶ Consider the following strategies for each i:
          Play ai in period 0 and continue to play ai as long as either
          (i) the action profile realized in previous period is a, or (ii)
          the action profile realized in previous period differs from a
          in at least two components; if only one player i deviates
          from a in some period, every other player j plays his
          minmax strategy against i for the rest of the game
    ▶ If i deviates from a, he gets at most maxai gi (ai , a−i ) in that
      period and v̲i from next period onwards
Proof of Nash Folk Theorem
    ▶ Thus the maximum payoff that i gets from unilaterally
      deviating at t:

        (1 − δ)(vi + δvi + · · · + δᵗ⁻¹vi + δᵗ maxai gi (ai , a−i )
            + δᵗ⁺¹v̲i + δᵗ⁺²v̲i + · · · )
        = (1 − δ)[ (1 − δᵗ)/(1 − δ) · vi + δᵗ maxai gi (ai , a−i )
            + δᵗ⁺¹/(1 − δ) · v̲i ]
        = (1 − δᵗ)vi + δᵗ(1 − δ) maxai gi (ai , a−i ) + δᵗ⁺¹v̲i

    ▶ The value of δ for which this payoff is equal to vi is given by

        (1 − δᵗ)vi + δᵗ(1 − δ) maxai gi (ai , a−i ) + δᵗ⁺¹v̲i = vi
        or, −δᵗvi + δᵗ(1 − δ) maxai gi (ai , a−i ) + δᵗ⁺¹v̲i = 0
        or, (1 − δ) maxai gi (ai , a−i ) + δv̲i = vi
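The algebra above is easy to verify numerically for arbitrary parameter values. In the sketch below, M stands for maxai gi (ai , a−i ) and m for the minmax value (the particular numbers are placeholders), and the infinite tail is truncated at a point where the remainder is negligible.

```python
# Numerical check of the closed form for the deviator's payoff:
# (1-d)(v + d*v + ... + d^{t-1} v + d^t M + d^{t+1} m + d^{t+2} m + ...)
#   = (1 - d^t) v + d^t (1-d) M + d^{t+1} m,
# where M = max deviation payoff and m = the minmax value.
delta, v, M, m, t = 0.9, 1.0, 2.0, 0.3, 5

# Truncate the infinite tail far enough that the remainder is negligible.
series = sum(delta**k * v for k in range(t))
series += delta**t * M
series += sum(delta**k * m for k in range(t + 1, 2000))
lhs = (1 - delta) * series

rhs = (1 - delta**t) * v + delta**t * (1 - delta) * M + delta**(t + 1) * m

assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)
```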
Proof of Nash Folk Theorem
    ▶ Call this discount level δ̲i ; notice that δ̲i < 1
    ▶ For all δ̂ > δ̲i the deviation payoff is less than the conforming
      payoff
    ▶ Take δ̲ = maxi δ̲i to complete the argument
Proof of Nash Folk Theorem
    ▶ If v cannot be realized using a pure strategy, consider a
      strategy contingent on a public signal a(ω) with expected
      payoff v
    ▶ The maximum continuation payoff when deviating in period t
      is (1 − δ) maxai gi (ai , a−i ) + δv̲i
    ▶ The continuation payoff from conforming in period t is at
      least (1 − δ) minai gi (ai , a−i ) + δvi
    ▶ Call the value of δ at which these two expressions are equal δ̲i
    ▶ Take δ̲ = maxi δ̲i to complete the argument
Proof of Perfect Folk Theorem
    ▶ Suppose there is a pure action profile a such that g (a) = v
    ▶ Consider the following strategies for each i:
          Play ai in period 0 and continue to play ai as long as the
          action profile realized in previous period is a; if at least one
          player deviated from a in some period, every i plays αi∗ for
          the rest of the game
    ▶ This strategy profile is a Nash equilibrium for all δ such that

                      (1 − δ) maxai gi (ai , a−i ) + δei < vi
Proof of Perfect Folk Theorem
    ▶ Notice that the inequality above holds strictly for δ = 1, so it
      must hold for a range of δ < 1
    ▶ To check that the suggested profile is subgame perfect, notice
      that in any subgame off the equilibrium path players are
      playing α∗ which is a Nash equilibrium for the subgame by our
      earlier observation
    ▶ If there is no pure strategy a such that g (a) = v , we use
      correlated mixing strategies as in the last proof
Illustration: Prisoner’s dilemma
    ▶ In our last example, the minmax payoff for both players is 0,
      maxai gi (ai , a−i ) = 2, minai gi (ai , a−i ) = −1, and the unique
      Nash equilibrium payoff in the stage game is (0, 0)
    ▶ Thus the level of δ above which it is possible to sustain a
      payoff of (1, 1) every period in a Nash equilibrium of the
      repeated game is given by 2(1 − δ) = 1, or δ = 1/2
    ▶ Since the minmax payoffs are also the unique Nash
      equilibrium payoffs, any δ > 1/2 also sustains (1, 1) every
      period in a subgame perfect Nash equilibrium of the repeated
      game
    ▶ A per-period payoff (1/2, 1/2), which does not correspond to
      any pure strategy profile in the stage game, can be sustained
      in a Nash/subgame perfect equilibrium for levels of δ above
      the one defined by 2(1 − δ) = δ/2 − (1 − δ), or δ = 6/7
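The two critical values on this slide can be reproduced by solving the corresponding indifference conditions numerically, e.g. by bisection. A sketch; the payoff numbers are those of the Prisoner's Dilemma example.

```python
# Critical discount factors for the Prisoner's Dilemma example:
# deviation bound (1-d)*dev + d*minmax versus the target per-period payoff.
def critical_delta(dev, minmax, target, tol=1e-10):
    # Bisect on d for (1-d)*dev + d*minmax = target (dev > target > minmax).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (1 - mid) * dev + mid * minmax > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Sustaining (1, 1): deviation payoff 2, minmax 0, target 1  ->  d = 1/2.
d1 = critical_delta(2, 0, 1)

# Sustaining (1/2, 1/2) via correlated play: conforming is worth at least
# (1-d)*(-1) + d*(1/2), so solve 2(1-d) = d/2 - (1-d)  ->  d = 6/7.
def gap(delta):
    return 2 * (1 - delta) - (delta / 2 - (1 - delta))

lo, hi = 0.0, 1.0
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    if gap(mid) > 0:
        lo = mid
    else:
        hi = mid
d2 = (lo + hi) / 2

print(round(d1, 6), round(d2, 6))   # 0.5 and 6/7 ≈ 0.857143
```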
Folk Theorems for finitely repeated games
    ▶ Recall that in the Prisoner’s Dilemma example above, (N, N)
      was the unique Nash equilibrium in the stage game
    ▶ Playing (N, N) every period is the unique subgame perfect
      Nash equilibrium of the finitely repeated Prisoner’s Dilemma
    ▶ In fact, this is the only Nash equilibrium of the finitely
      repeated Prisoner’s Dilemma game
    ▶ However, in another example involving a stage game with
      multiple Nash equilibria, an outcome that Pareto-dominates
      the stage-game Nash equilibria was sustained in SPNE of the
      finitely repeated game through a scheme of rewards and
      punishments
    ▶ Can we generalize this insight further and sustain any feasible
      individually rational payoff in a finitely repeated game through
      such schemes?
(Limit) Folk Theorems for finitely repeated games
   Theorem (Nash Folk)
   Suppose for every player i there is a Nash equilibrium of the stage
   game α∗ (i) with gi (α∗ (i)) > v i . The set of Nash equilibrium
   payoffs of the T -period repeated game converges to the set of
   feasible individually rational payoffs as T → ∞ and δ → 1.
   Theorem (Perfect Folk)
   Suppose for some T ′ and δ ′ , there exists a set of I + 1 subgame
   perfect Nash equilibrium strategies for the stage game repeated T ′
   times, viz., α∗ , α∗ (1), . . . , α∗ (I ), such that for every i,
   ui (α∗ ) − ui (α∗ (i)) > 0 for δ > δ ′ . The set of subgame perfect
   Nash equilibrium payoffs of the T -period repeated game converges
   to the set of feasible individually rational payoffs as T → ∞ and
   δ → 1.
Exercise 9.1
    ▶ Consider the following simultaneous-move game that is played
      twice (the players observe the first-period outcome prior to
      the second-period play):

                              Player 2
                          L        C        R
        Player 1   T   10, 10    2, 12    0, 13
                   M   12, 2     5, 5     0, 0
                   B   13, 0     0, 0     1, 1
    ▶ Find all the pure-strategy subgame-perfect equilibria with no
      discounting (δ = 1).
    ▶ For each of the equilibria you found in (a), find the smallest
      discount factor that supports it.
Exercise 9.2
    ▶ Two players are playing two consecutive games. First, they
      play the following Centipede Game:
                  1 ──C── 2 ──c── 1 ──C── 2 ──c── (3, 3)
                  │N      │n      │N      │n
                (1, 1)  (0, 3)  (2, 2)  (1, 4)
    ▶ After the Centipede Game they play the following
      coordination game:
                              Player 2
                           a       b
         Player 1   A    1, 1    0, 0
                    B    0, 0    3, 3
Exercise 9.2 Contd.
    ▶ What are the Nash equilibria of each stage-game?
    ▶ How many pure strategies does each player have in the
      multistage game?
    ▶ Find all the pure-strategy subgame-perfect equilibria with
      extreme discounting (δ = 0).
    ▶ Now let δ = 1. Find a subgame-perfect equilibrium for the
      two-stage game in which the players receive the payoffs (2, 2)
      in the first stage-game.
    ▶ What is the lowest value of δ for which the subgame-perfect
      equilibrium you found above survives?
    ▶ For δ greater than the value you found above, are there other
      outcomes of the first-stage Centipede Game that can be
      supported as part of a subgame-perfect equilibrium?
Exercise 10.2
    ▶ Consider the following stage game being infinitely repeated
      with discount factor δ < 1:

                              Player 2
                           L        C        R
        Player 1   T    6, 6     -1, 7    -2, 8
                   M    7, -1     4, 4    -1, 5
                   B    8, -2     5, -1    0, 0
    ▶ For which values of δ can the players support the pair of
      actions (M, C) played in every period?
    ▶ For which values of δ can the players support the pair of
      actions (T , L) played in every period? Why is your answer
      different from the above part?
Exercise 10.3
    ▶ Consider the following stage game being infinitely repeated
      with discount factor δ < 1:

                             Player 2
                           m        f
        Player 1   M    4, 4     -1, 5
                   F    5, -1     1, 1
    ▶ Assume that the players wish to choose a “length-T
      punishment” strategy: if there is a deviation from (a1 , a2 )
      then players will play (F , f ) for T periods and then resume
      playing (a1 , a2 ).
    ▶ Let δT be the critical discount factor so that if δ > δT , then
      the adequately defined strategies will implement the desired
      path of play with length-T punishment as the threat.
Exercise 10.3 contd.
    ▶ Let T = 1. What is the critical value δ1 to support the pair of
      actions (M, m) played in every period?
    ▶ Let T = 2. What is the critical value δ2 to support the pair
      of actions (M, m) played in every period?
    ▶ Compare the two critical values obtained above and explain
      why they are different.
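One way to sanity-check answers to this exercise (not a substitute for the algebra) is to compare the payoff stream from conforming with the stream from a one-period deviation followed by T punishment periods; since (F, f) is a stage Nash equilibrium, the punishment phase itself needs no further check. A sketch using the stage payoffs shown above:

```python
def gain_from_deviation(delta, T, conform=4, deviate=5, punish=1):
    # Deviate today (5 instead of 4), then T periods of (F, f) worth 1 each,
    # then play returns to (M, m) worth 4: compare with conforming forever.
    # Continuations after period T+1 coincide, so they cancel.
    dev_stream = deviate + sum(delta**k * punish for k in range(1, T + 1))
    conf_stream = conform + sum(delta**k * conform for k in range(1, T + 1))
    return dev_stream - conf_stream

def critical_delta(T, tol=1e-10):
    # Bisect for the delta at which the deviation gain is exactly zero.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gain_from_deviation(mid, T) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

d1, d2 = critical_delta(1), critical_delta(2)
print(round(d1, 4), round(d2, 4))   # the longer punishment needs a smaller delta
```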
Exercise 10.8 (Tacit collusion)
    ▶ Two firms, which have zero marginal cost and no fixed cost,
      produce some good, each producing qi ≥ 0, i ∈ {1, 2}. The
      demand for this good is given by p = 200 − Q, where
      Q = q1 + q2 .
    ▶ Solve for the static stage-game Cournot-Nash equilibrium.
    ▶ Consider the case of Cournot competition, in which each firm
      chooses qi and this game is infinitely repeated with a discount
      factor δ < 1. For which values of δ can you support the firms
      equally splitting monopoly profits in each period as a
      subgame-perfect equilibrium that uses grim-trigger strategies
      (i.e., after one deviates from the proposed split, they resort to
      the static Cournot-Nash equilibrium thereafter)?
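A numerical sketch of the Cournot part, using only the primitives stated above (p = 200 − Q, zero costs); the grim-trigger threshold follows from comparing collusion, deviation and Cournot profits:

```python
# Cournot stage game with p = 200 - Q and zero costs.
def profit(qi, qj):
    return max(200 - qi - qj, 0) * qi

# Stage Cournot-Nash: best response qi = (200 - qj)/2; fixed point q = 200/3.
q_nash = 200 / 3
pi_nash = profit(q_nash, q_nash)

# Equal split of monopoly output Q = 100.
q_col = 50
pi_col = profit(q_col, q_col)

# Best one-shot deviation against q_col: qi = (200 - 50)/2 = 75.
q_dev = (200 - q_col) / 2
pi_dev = profit(q_dev, q_col)

# Grim trigger: collude iff pi_col/(1-d) >= pi_dev + d*pi_nash/(1-d),
# i.e. d >= (pi_dev - pi_col)/(pi_dev - pi_nash).
delta_star = (pi_dev - pi_col) / (pi_dev - pi_nash)
print(pi_nash, pi_col, pi_dev, delta_star)
```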
Exercise 10.8 contd.
    ▶ Consider the case of Bertrand competition, in which each firm
      chooses a price pi ≥ 0, where the lowest-price firm gets all the
      demand and in case of a tie they split the market. Solve for
      the static stage-game Bertrand-Nash equilibrium.
    ▶ For which values of δ can you support the firms equally
      splitting monopoly profits in each period as a subgame-perfect
      equilibrium that uses grim-trigger strategies (i.e., after one
      deviates from the proposed split, they resort to the static
      Bertrand-Nash equilibrium thereafter)?
    ▶ Now try to support the firms equally splitting monopoly profits
      as a subgame-perfect equilibrium in which after a deviation
      firms would resort to the static Bertrand competition for only
      two periods. For which values of δ will this work? Why is this
      answer different from your answer in the part above?
Exercise 10.12
    ▶ Suppose the following game is being infinitely repeated:

                              Player 2
                           C        D
        Player 1   N    0, 0     0, 0
                   T    1, 1    -1, 2
    ▶ Draw the convex hull of average payoffs.
    ▶ Are the average payoffs (v̄1 , v̄2 ) = (−0.4, 1.1) in the convex
      hull of average payoffs? Can they be supported by a pair of
      strategies that form a subgame-perfect equilibrium for a large
      enough discount factor δ?
    ▶ Show that there is a pair of subgame-perfect equilibrium
      strategies for the two players that yields average payoffs that
      approach (v̄1 , v̄2 ) = (1/3, 4/3) as δ approaches 1.
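For the hull-membership questions, one can solve directly for convex weights over the distinct stage payoffs (0, 0), (1, 1) and (−1, 2). A sketch:

```python
# Check whether a payoff pair lies in the convex hull of the stage payoffs
# (0,0), (1,1), (-1,2) by solving for convex weights a, b, c >= 0 with
#   a*(1,1) + b*(-1,2) + c*(0,0) = (v1, v2),   a + b + c = 1.
def hull_weights(v1, v2):
    # From a - b = v1 and a + 2b = v2:  b = (v2 - v1)/3, a = v1 + b.
    b = (v2 - v1) / 3
    a = v1 + b
    c = 1 - a - b
    return (a, b, c) if min(a, b, c) >= -1e-12 else None

print(hull_weights(-0.4, 1.1))   # ≈ (0.1, 0.5, 0.4) -> inside the hull
print(hull_weights(1/3, 4/3))    # ≈ (2/3, 1/3, 0)   -> inside the hull
```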
Bargaining
    ▶ Bargaining usually refers to a situation of conflicting interests
      of multiple agents
    ▶ Can be modelled as a cooperative as well as a non-cooperative
      game
    ▶ The one-period bargaining model is also known as the
      ultimatum game:
         ▶ Two players are bargaining over the share of a rupee
         ▶ If player 1 makes an offer in period 1 player 2 can accept or
           reject
         ▶ If player 2 accepts then the game ends with each player getting
           the agreed shares
         ▶ If player 2 rejects, then the game ends with zero payoffs for
           both
Equilibria of the ultimatum game
    ▶ For any x ∈ [0, 1], (offer x, accept any offer of x or more) is a
      Nash equilibrium
    ▶ (offer 0, reject all offers) is also a Nash equilibrium
    ▶ Unique subgame perfect equilibrium: (offer 0, accept every
      offer), with outcome: shares (1, 0)
2 periods
    ▶ If player 1 makes an offer in period 1 player 2 can accept or
      reject
    ▶ If player 2 accepts then the game ends with each player
      getting the agreed shares
    ▶ If player 2 rejects, then player 2 makes an offer in period 2,
      which player 1 can accept or reject
    ▶ If player 1 accepts then the game ends with each player
      getting the agreed shares
    ▶ If player 1 rejects, then the game ends with zero payoffs for
      both
SPNE of 2 period bargaining
    ▶ By an argument similar to that of the ultimatum game, the
      unique SPNE outcome of the period-2 game is (0, 1)
    ▶ In other words, player 2 can get 1 in period 2 by rejecting
      player 1’s period-1 offer
    ▶ Getting 1 in period 2 is worth δ < 1 in period 1
    ▶ Therefore, if 1 offers 2 a share of δ or more in period 1, 2
      must accept it
    ▶ The highest payoff that 1 can get therefore is 1 − δ
    ▶ The unique SPNE ((offer δ in period 1, accept any offer in
      period 2), (accept any share of δ or more in period 1,
      offer 0 in period 2)) yields the outcome (1 − δ, δ) in period 1
      itself
SPNE of finite period bargaining
    ▶ The unique SPNE outcome of the three-period bargaining
      game: (1 − δ(1 − δ), δ(1 − δ))
    ▶ The unique SPNE outcome of the four-period bargaining
      game: (1 − δ(1 − δ(1 − δ)), δ(1 − δ(1 − δ)))
    ▶ The unique SPNE outcome of the T -period bargaining game:

        (1 − δ + δ² − · · · + (−1)ᵀ⁻¹δᵀ⁻¹, δ − δ² + · · · + (−1)ᵀδᵀ⁻¹)
        = ( (1 − (−δ)ᵀ)/(1 + δ), δ(1 − (−δ)ᵀ⁻¹)/(1 + δ) )

    ▶ As T → ∞, this outcome approaches (1/(1 + δ), δ/(1 + δ))
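The T-period outcome can be generated by backward induction and checked against the closed form. A sketch:

```python
# Backward-induction recursion for T-period alternating-offers bargaining:
# the current proposer keeps 1 minus what the responder can secure by
# rejecting, i.e. proposer_share(T) = 1 - d * proposer_share(T - 1),
# with proposer_share(1) = 1 (the ultimatum game).
def proposer_share(T, delta):
    share = 1.0
    for _ in range(T - 1):
        share = 1.0 - delta * share
    return share

delta = 0.9
for T in (1, 2, 3, 10, 50):
    closed_form = (1 - (-delta)**T) / (1 + delta)
    assert abs(proposer_share(T, delta) - closed_form) < 1e-12
    print(T, round(proposer_share(T, delta), 6))

# As T grows the proposer's share approaches 1/(1 + d).
print(round(1 / (1 + delta), 6))
```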
SPNE of infinite period bargaining
    ▶ Question: Suppose bargaining extends over a potentially
      infinite number of periods. Since it is not possible to apply
      backward induction, how do we then figure out reasonable
      equilibrium strategies?
    ▶ Is the limiting equilibrium outcome of the finite period game
      obtainable?
SPNE strategies in the infinite period game
    ▶ Consider the following strategy:

          offer the other player δ/(1 + δ) whenever it is one’s turn
          to offer; accept any offer with a share of δ/(1 + δ) or
          more whenever the other player is making an offer

    ▶ It is straightforward to check that such a pair of strategies
      satisfies the one-stage-deviation (OSD) condition and hence
      constitutes an SPNE
    ▶ Rubinstein (1982): the above strategies constitute the unique
      SPNE of the infinite period bargaining game
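The OSD check for the stationary strategies can be illustrated numerically. The value functions used below (1/(1 + δ) when proposing, δ/(1 + δ) when responding) follow from the stated strategies, and the grid covers one-period deviations by the proposer:

```python
# One-stage-deviation check for the stationary Rubinstein strategies.
def check_osd(delta, grid=1000):
    V_prop = 1 / (1 + delta)        # value when making the offer
    V_resp = delta / (1 + delta)    # value when responding
    offer = delta / (1 + delta)     # equilibrium offer to the responder

    # Responder: accepting the equilibrium offer must beat rejecting
    # (rejection makes her the proposer next period, worth delta * V_prop).
    assert offer >= delta * V_prop - 1e-12

    # Proposer deviations: any one-period deviation to a different offer x.
    for k in range(grid + 1):
        x = k / grid
        if x >= offer:              # accepted; proposer keeps 1 - x
            payoff = 1 - x
        else:                       # rejected; proposer responds next period
            payoff = delta * V_resp
        assert payoff <= V_prop + 1e-12
    return True

for d in (0.1, 0.5, 0.9, 0.99):
    assert check_osd(d)
print("no profitable one-stage deviation found")
```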
Proof of uniqueness
   Lemma
   In no SPNE does 1 get more than 1/(1 + δ).

    ▶ If 1 gets more than 1/(1 + δ) when making an offer, 2 must be
      getting less than δ/(1 + δ).
    ▶ Player 2 can reject and get at least 1/(1 + δ) next period by
      playing the SPNE strategies.
    ▶ If 1 gets more than 1/(1 + δ) when 2 is making an offer, 2 can
      offer her slightly less, and 1 will accept the new offer since she
      can earn at most 1/(1 + δ) if she rejects and makes an offer
      herself.

   Corollary
   In no SPNE does 2 get less than δ/(1 + δ).
Proof of uniqueness contd.
   Lemma
   In no SPNE does 2 get more than δ/(1 + δ).

    ▶ If 2 gets more than δ/(1 + δ) when making an offer, 1 must be
      getting less than 1/(1 + δ).
    ▶ If 1 is getting less than δ/(1 + δ), she can reject and get at
      least 1/(1 + δ) next period by playing the SPNE strategies.
    ▶ If 1 gets more than δ/(1 + δ) when 2 is making an offer, 2 can
      offer her slightly less, and 1 will accept the new offer since she
      can earn at most 1/(1 + δ) if she rejects and makes an offer
      herself.
    ▶ If 2 gets more than δ/(1 + δ) when 1 is making an offer, 1 can
      offer her slightly less, and 2 will accept the new offer since he
      can earn at most 1/(1 + δ) if he rejects and makes an offer
      himself.

   Corollary
   In no SPNE does 1 get less than 1/(1 + δ).