Game Playing AI
(based on earlier lecture from Stephen Gould)
Structured Programming COMP1110/COMP1140/COMP6710
Origins
   • Schachtürke (Chess Turk), 1770
   • El Ajedrecista (The Chess Player), 1912
   • Plankalkül (Plan Calculus), 1941
Early History of AI
  John von Neumann    John McCarthy   Arthur Samuel
Games
A game consists of a set of two or more players, a set of moves for the
players, and a specification of payoffs (outcomes) for each combination
of strategies.
Many different types of games:
   •   two-person zero-sum
   •   multi-player
   •   perfect information games
   •   imperfect information games
   •   games of chance
Game Trees
A strategy defines a complete plan of action for a given player.
Given enough processing time, an optimal strategy can be found for games of
perfect information by enumerating the paths of a game tree. However, in
practice this can only be done for small games.
Minimax
Consider two players, MAX and MIN. Player MAX is trying to maximize
the score and player MIN is trying to minimize the score. We assume
that the players are rational.
Minimax
The minimax algorithm allows each player to compute their optimal
move on a game tree of alternating MAX and MIN nodes.
  max-value(s)                     min-value(s)
    if terminal(s) then              if terminal(s) then
      return v(s)                      return v(s)
    end if                           end if
    v = −∞                           v = ∞
    for each successor s’ do         for each successor s’ do
      v = max(v, min-value(s’))        v = min(v, max-value(s’))
    end for                          end for
    return v                         return v
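A minimal Java sketch of the minimax pseudocode above, written against a
hypothetical Node record (not from any library) whose leaves carry the
terminal payoff v(s); it is one possible structure, not the only one.

import java.util.List;

// Hypothetical game-tree node: an empty child list marks a terminal state.
record Node(double value, List<Node> children) {
    boolean isTerminal() { return children.isEmpty(); }
}

class Minimax {
    // MAX picks the child with the largest minimax value.
    static double maxValue(Node s) {
        if (s.isTerminal()) return s.value();           // v(s)
        double v = Double.NEGATIVE_INFINITY;
        for (Node succ : s.children()) {
            v = Math.max(v, minValue(succ));
        }
        return v;
    }

    // MIN picks the child with the smallest minimax value.
    static double minValue(Node s) {
        if (s.isTerminal()) return s.value();           // v(s)
        double v = Double.POSITIVE_INFINITY;
        for (Node succ : s.children()) {
            v = Math.min(v, maxValue(succ));
        }
        return v;
    }
}

Called on the example tree on the next slide (leaf values 3, 5, 7, 2, 2, 1, 4, 8),
maxValue at the root returns 5.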
Minimax Example
Game tree, level by level from the root (each internal node takes the max or
min of its children's values):
MAX (root):  5
MIN:         5   2
MAX:         5   7   2   8
Leaf values: 3   5   7   2   2   1   4   8
Alpha-beta Pruning
Minimax suffers from the problem that the number of game states it
has to examine is exponential in the number of moves.
Alpha-beta pruning is a method for reducing the number of nodes that
need to be evaluated by skipping subtrees that cannot affect the final
minimax decision.
Alpha-beta pruning places bounds on the values appearing anywhere
along a path:
   • 𝛼: the best (highest) value found so far for MAX
   • 𝛽: the best (lowest) value found so far for MIN
𝛼 and 𝛽 propagate down the game tree. The value v propagates up the
game tree.
Alpha-beta Pruning (2)
Initialize 𝛼 = −∞ and 𝛽 = ∞
 max-value(𝐬, 𝜶, 𝜷)                  min-value(𝐬, 𝜶, 𝜷)
   if terminal(s) then                 if terminal(s) then
     return v(s)                         return v(s)
   end if                              end if
   v = −∞                              v = ∞
   for each successor s’ do            for each successor s’ do
     v = max(v, min-value(s’,𝛼,𝛽))       v = min(v, max-value(s’,𝛼,𝛽))
     if v ≥ 𝛽 then                       if v ≤ 𝛼 then
       return v                            return v
     end if                              end if
     𝛼 = max(𝛼, v)                       β = min(𝛽, v)
   end for                             end for
   return v                            return v
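A minimal Java sketch of the same procedure, reusing the hypothetical Node
record from the minimax sketch above; the only change is that 𝛼 and 𝛽 are
threaded through the recursion and used to cut off branches early.

class AlphaBeta {
    // Entry point for a MAX node: start with 𝛼 = −∞ and 𝛽 = +∞.
    static double search(Node root) {
        return maxValue(root, Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY);
    }

    static double maxValue(Node s, double alpha, double beta) {
        if (s.isTerminal()) return s.value();
        double v = Double.NEGATIVE_INFINITY;
        for (Node succ : s.children()) {
            v = Math.max(v, minValue(succ, alpha, beta));
            if (v >= beta) return v;              // MIN will never let play reach here
            alpha = Math.max(alpha, v);
        }
        return v;
    }

    static double minValue(Node s, double alpha, double beta) {
        if (s.isTerminal()) return s.value();
        double v = Double.POSITIVE_INFINITY;
        for (Node succ : s.children()) {
            v = Math.min(v, maxValue(succ, alpha, beta));
            if (v <= alpha) return v;             // MAX will never let play reach here
            beta = Math.min(beta, v);
        }
        return v;
    }
}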
Alpha-beta Pruning Example
Same game tree as the minimax example, with subtrees that cannot affect the
result pruned:
MAX (root):       5
MIN:              5   2
MAX:              5   7   2
Leaves evaluated: 3   5   7   2   1   (remaining leaves pruned)
Multi-player Games
When we have more than two players we need to adapt the minimax
approach. The most conservative strategy is to assume that all of your
opponents are conspiring to minimize your score.
   • Treat your opponents as one big powerful player.
Big Games (e.g., Patchwork)
In the opening position (33 pieces available) there are >100 possible moves*,
and each subsequent position offers >400 possible moves*, so the game tree
explodes:
   • 1 ply: more than 100 moves
   • 2 plies: more than 100 × 400 = 40,000 moves
   • 3 plies: more than 100 × 400 × 400 = 16,000,000 moves
   • 4 plies: more than 100 × 400 × 400 × 400 = 6,400,000,000 moves
   • … and so on, down to the endgame with ~10 pieces remaining.
* Depending on available pieces.
Static Evaluation Function
For real-world games, even with alpha-beta pruning, we still can't
search the entire game tree. In these situations, instead of a terminal
test, we introduce a cut-off test that applies a heuristic value at some
intermediate game state.
The heuristic is called a static evaluation function and it returns an
estimate of the expected payoff from a given position.
Machine learning techniques are often used to find a good static
evaluation function based on a linear combination of features:
                     v̂(s) = w₁f₁(s) + ⋯ + wₙfₙ(s)
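As an illustration, here is a small Java sketch of such a linear evaluation
function; the weights and feature functions are placeholders to be supplied by
the game (and, typically, tuned by machine learning).

import java.util.List;
import java.util.function.ToDoubleFunction;

class LinearEvaluator<S> {
    private final double[] weights;                    // w1 … wn
    private final List<ToDoubleFunction<S>> features;  // f1 … fn

    LinearEvaluator(double[] weights, List<ToDoubleFunction<S>> features) {
        this.weights = weights;
        this.features = features;
    }

    // v̂(s) = w1·f1(s) + … + wn·fn(s)
    double evaluate(S state) {
        double v = 0.0;
        for (int i = 0; i < weights.length; i++) {
            v += weights[i] * features.get(i).applyAsDouble(state);
        }
        return v;
    }
}

In chess, for example, f1(s) might measure material balance and f2(s) mobility.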
Exploration versus Exploitation
Learning the static evaluation function is a classic reinforcement
learning problem.
   • Repeatedly play against yourself.
   • Reward board positions that lead to wins.
   • Punish board positions that lead to losses.
A crucial trade-off is choosing between exploration (trying unfamiliar moves to
learn more about them) and exploitation (playing the moves that currently look
best).
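A common way to manage this trade-off is ε-greedy action selection (a standard
reinforcement-learning device, not named on the slide): with small probability ε
play a random move to explore, otherwise exploit the current value estimates.
A minimal Java sketch, where the action type and value estimate are placeholders:

import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class EpsilonGreedy<A> {
    private final Random rng = new Random();
    private final double epsilon;   // probability of exploring

    EpsilonGreedy(double epsilon) { this.epsilon = epsilon; }

    // Assumes the list of legal actions is non-empty.
    A choose(List<A> actions, ToDoubleFunction<A> estimatedValue) {
        if (rng.nextDouble() < epsilon) {
            return actions.get(rng.nextInt(actions.size()));   // explore
        }
        A best = actions.get(0);                                // exploit
        for (A a : actions) {
            if (estimatedValue.applyAsDouble(a) > estimatedValue.applyAsDouble(best)) {
                best = a;
            }
        }
        return best;
    }
}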
Q-Learning
• Many games can be modelled as a Markov Decision Process:
   • At each discrete timestep, the game is in a particular state.
   • In each state, the player can choose from a set of actions.
   • Given a choice of action, the system transitions to a new state chosen
     at random according to some probability distribution.
• Q-Learning learns a ‘Quality’ value for each state-action combination,
  based on the expected reward of taking that action in that state.
• Parameters: learning rate, discount factor, initial Q-values.
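A minimal tabular Q-learning sketch in Java. The String state and action keys
are purely illustrative, and the constructor arguments correspond to the
parameters listed above (learning rate, discount factor, initial Q-value).

import java.util.HashMap;
import java.util.Map;

class QLearner {
    private final Map<String, Double> q = new HashMap<>(); // key: state + "|" + action
    private final double alpha;    // learning rate
    private final double gamma;    // discount factor
    private final double initialQ; // value assumed for unseen state-action pairs

    QLearner(double alpha, double gamma, double initialQ) {
        this.alpha = alpha;
        this.gamma = gamma;
        this.initialQ = initialQ;
    }

    double getQ(String state, String action) {
        return q.getOrDefault(state + "|" + action, initialQ);
    }

    // Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    void update(String state, String action, double reward,
                String nextState, Iterable<String> nextActions) {
        double best = Double.NEGATIVE_INFINITY;
        for (String a : nextActions) best = Math.max(best, getQ(nextState, a));
        if (best == Double.NEGATIVE_INFINITY) best = 0.0;   // terminal next state
        double old = getQ(state, action);
        q.put(state + "|" + action, old + alpha * (reward + gamma * best - old));
    }
}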
Cut-off Test
A cut-off test determines when to apply static evaluation. Searching to
a fixed depth is a simple cut-off policy, but this suffers from the horizon
problem: an unavoidable damaging move that can be pushed beyond
the depth of the search.
Another problem is stopping in the middle of a sequence of moves
(e.g., piece exchange in chess).
Some techniques exist to avoid these issues:
   • only apply static evaluation at quiescent positions, i.e., positions where
     the heuristic value is stable (see the sketch below).
   • killer heuristic – always consider the opponent’s most damaging (‘killer’)
     moves.
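A tiny sketch of what a depth-plus-quiescence cut-off test might look like in
Java; the isQuiescent flag stands in for a hypothetical, game-specific stability
check (e.g., no capture pending in chess).

class CutoffTest {
    private final int maxDepth;

    CutoffTest(int maxDepth) { this.maxDepth = maxDepth; }

    // Cut off (and apply static evaluation) only once the depth limit has been
    // reached AND the position is quiescent, so the heuristic value is stable.
    boolean shouldCutOff(int depth, boolean isQuiescent) {
        return depth >= maxDepth && isQuiescent;
    }
}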
Games that include an element of chance require that we calculate the
expected value of a position rather than the exact value.
Games with Chance
[Game tree with chance nodes: levels of RAND (chance) nodes are interleaved
with the usual MAX and MIN levels.]
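At a chance (RAND) node the value is the probability-weighted average of the
children's values rather than a max or a min. A minimal Java sketch, assuming
the child values have already been computed recursively:

class ChanceNodeValue {
    // values[i] is the value of outcome i, probs[i] its probability (probs sum to 1).
    static double expectedValue(double[] values, double[] probs) {
        double v = 0.0;
        for (int i = 0; i < values.length; i++) {
            v += probs[i] * values[i];
        }
        return v;
    }
}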
Games with Chance (e.g., Stratopolis)
   • green’s move
   • red’s (opponent) move
   • random player shuffles the remaining pieces
   • green’s next move
   • red’s (opponent) next move
   • …
Monte Carlo Simulation
Monte Carlo simulation is a randomized algorithm that can be used to
approximate the value of an intermediate game state.
   • Develop the game tree to some fixed depth or some fixed width
   • Run simulations from each leaf node
   • Use results of simulation to assign a value to the node
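A minimal Java sketch of one such simulation: run random playouts from a state
and average the payoffs. The GameState interface here is a hypothetical
stand-in for whatever representation the game actually uses.

import java.util.List;
import java.util.Random;

interface GameState {
    boolean isTerminal();
    double payoff();               // payoff for the player being evaluated
    List<GameState> successors();  // legal moves (non-empty if not terminal)
}

class MonteCarloEval {
    private static final Random RNG = new Random();

    // Average payoff over a number of random playouts from the given state.
    static double estimate(GameState state, int playouts) {
        double total = 0.0;
        for (int i = 0; i < playouts; i++) {
            GameState s = state;
            while (!s.isTerminal()) {
                List<GameState> next = s.successors();
                s = next.get(RNG.nextInt(next.size()));   // play a random move
            }
            total += s.payoff();
        }
        return total / playouts;
    }
}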
Opening Book and Endgame Databases
• Opening books can save computation at the beginning of a
  game by storing a good sequence of starting moves.
   • For variety, a player can randomly choose between the moves.
   • As soon as an opponent plays a move that is not encoded in the
     book, the player must resort to search or simulated game play.
• For some games, the state space shrinks near the end of the game. In such
  cases, an endgame database can be pre-computed by working backwards from
  different endings.
   • If an agent ever finds a game state that matches one in the endgame
     database, it can immediately determine whether it will win or lose.
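Both ideas amount to a precomputed table keyed on positions. Here is a minimal
Java sketch of an opening-book lookup (an endgame database would work the same
way, mapping late-game positions to known outcomes); the String position keys
and moves are placeholders for the game's real representation.

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;

class OpeningBook {
    private final Map<String, List<String>> book;   // position key -> known good moves
    private final Random rng = new Random();

    OpeningBook(Map<String, List<String>> book) { this.book = book; }

    // Return a random book move (for variety), or empty when the position is not
    // in the book and the player must fall back to search or simulated play.
    Optional<String> lookup(String positionKey) {
        List<String> moves = book.get(positionKey);
        if (moves == null || moves.isEmpty()) return Optional.empty();
        return Optional.of(moves.get(rng.nextInt(moves.size())));
    }
}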
Milestones in AI Game Playing
        1959 Arthur Samuel develops Checkers playing program
         1997 IBM’s Deep Blue chess machine beats Garry Kasparov
        2007 Checkers solved by University of Alberta
        2011 IBM’s Watson wins Jeopardy! requiring natural
             language understanding
        2015 Deep reinforcement learning algorithms learn to play
             Atari arcade games from scratch
        2016 Google DeepMind’s AlphaGo beats Lee Sedol, Korea