AAI Lecture 7 SP 25

The document discusses the application of artificial intelligence in games, focusing on game search problems, adversarial search, and algorithms like MiniMax and Alpha-Beta pruning. It covers the formulation of game problems, optimal strategies, and the complexities involved in implementing these algorithms. Additionally, it addresses nondeterministic games and the Expectiminimax algorithm for handling chance nodes.


AI 4007– Applied Artificial Intelligence

Dr. Atif Aftab Ahmed Jilani


Assistant Professor
atif.jilani@nu.edu.pk
AGENDA
• Games as Search Problem
• Adversarial Search
• MiniMax Algorithm
• Alpha Beta Pruning

2
Games vs. Search Problems
□ "Unpredictable" opponent
□ specifying a move for every
possible opponent reply
□ Time limits
□ unlikely to find goal,
must approximate

3
Types of games
                        Deterministic                       Chance

Perfect information     Chess, Checkers, Go, Othello        Backgammon, Monopoly

Imperfect information   Battleships, Kriegspiel,            Bridge, Poker, Scrabble,
                        Stratego, Rock–Paper–Scissors       Nuclear War

4
AI and Games
□ In AI, “games” have a special format:
□ deterministic, turn-taking, 2-player, zero-sum games of perfect information

□ Zero-sum describes a situation in which a participant’s gain or loss is exactly
balanced by the losses or gains of the other participant(s)

□ Equivalently, the total payoff to all players is the same for every instance of
the game (constant sum)

(Figure: a Go board – Go, 围棋)
5
Game Problem Formulation
□ A game with 2 players (MAX and MIN, MAX moves first, turn-
taking) can be defined as a search problem with:
□ initial state: board position

□ player: player to move

□ successor function: a list of legal (move, state) pairs

□ goal test: whether the game is over – terminal states

□ utility function: gives a numeric value for the terminal states (win, loss, draw)

□ Game tree = initial state + legal moves
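
A minimal sketch of this formulation as a Python interface (the class and method names here are illustrative, not part of the lecture):

# A minimal sketch of the game-problem formulation above.
# Class and method names are illustrative only.

class Game:
    """Abstract two-player, turn-taking game (MAX moves first)."""

    def initial_state(self):
        """The starting board position."""
        raise NotImplementedError

    def player(self, state):
        """Which player ('MAX' or 'MIN') is to move in this state."""
        raise NotImplementedError

    def actions(self, state):
        """The legal moves available in this state."""
        raise NotImplementedError

    def result(self, state, action):
        """Successor function: the state reached by playing action in state."""
        raise NotImplementedError

    def terminal_test(self, state):
        """Goal test: True when the game is over."""
        raise NotImplementedError

    def utility(self, state):
        """Numeric value of a terminal state from MAX's point of view,
        e.g. +1 for a win, 0 for a draw, -1 for a loss."""
        raise NotImplementedError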

6
Game Tree (2-player, deterministic, turns)

(Figure: a game tree whose terminal states are labelled with utility values)

7
Optimal Strategies - MiniMax
□ Perfect play for deterministic, perfect-information games
□ Idea: choose move to position with highest minimax value
= best achievable utility against best play
□ MAX must find a contingent strategy, specifying MAX’s move in:
□ the initial state

□ the states resulting from every possible response by MIN

□ E.g., a 2-ply game (the tree is one move deep, consisting of two half-moves,
each of which is called a ply):

(Figure: MAX at the root with minimax value 3; the three MIN nodes below have values
3, 2 and 2, backed up from the leaf utilities 3 12 8 | 2 4 6 | 14 5 2)

8
Minimax Value
□ Perfect play for deterministic game, assume both players play
optimally
□ Idea: choose move to position with highest minimax value =
best achievable payoff against best play

9
A Partial Game Tree for Tic-Tac-Toe

(Figure: the first plies of the Tic-Tac-Toe game tree, alternating X’s turn (MAX)
and O’s turn (MIN))

10
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  return the a in ACTIONS(state) maximizing MIN-VALUE(RESULT(a, state))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← ∞
  for a, s in SUCCESSORS(state) do v ← MIN(v, MAX-VALUE(s))
  return v
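
A direct Python transcription of the pseudocode above, written against the hypothetical Game interface sketched earlier (a minimal sketch, not the lecture's reference implementation):

# Minimax in Python, following the pseudocode above.
# Assumes a `game` object with actions/result/terminal_test/utility methods.

def minimax_decision(game, state):
    """Return the action for MAX with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a)))

def max_value(game, state):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('-inf')
    for a in game.actions(state):
        v = max(v, min_value(game, game.result(state, a)))
    return v

def min_value(game, state):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('inf')
    for a in game.actions(state):
        v = min(v, max_value(game, game.result(state, a)))
    return v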

11
Analysis of Minimax
□ Complete?? Yes, if tree is finite (chess has specific rules for this)
□ Optimal?? Yes, against an optimal opponent. Otherwise??
□ Optimal play for MAX assumes that MIN also plays optimally, what if
MIN does not play optimally?
□ A complete depth-first search?
□ Yes
□ Time complexity?
□ O(b^m)
□ Space complexity?
□ O(bm) (depth-first exploration)
□ For chess, b ≈ 35, m ≈ 100 for "reasonable" games
□ exact solution completely infeasible
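
A quick back-of-envelope check of why the exact solution is infeasible, using the figures quoted above (b ≈ 35, m ≈ 100):

# Rough size of the full minimax search tree for chess: about b**m nodes.
import math

b, m = 35, 100
print(f"b^m is on the order of 10^{m * math.log10(b):.0f}")  # roughly 10^154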
12
α-β Pruning
□ The number of game states examined by minimax search is exponential in the
number of moves

□ Is it possible to compute the correct minimax decision without looking at
every node in the game tree?

□ We need to prune away branches that cannot possibly influence the final
decision

13
Intuition Behind Alpha-Beta Pruning

The value of the root, and hence the minimax decision, is independent of the values of
the pruned leaves x and y

14
Definitions of α and β
□ α = the value of the best (highest-value) choice we have found so far at any
choice point along the path for MAX

□ β = the value of the best (lowest-value) choice we have found so far at any
choice point along the path for MIN

□ α-β search updates the values of α and β as it goes along, and prunes the
remaining branches at a node as soon as the value of the current node is worse
than the current α or β for MAX or MIN respectively

15
α-β Pruning Example

(Figure, slides 16–23: the same 2-ply tree as before, with leaf utilities
3 12 8 | 2 4 6 | 14 5 2, searched left to right; brackets show the current
lower/upper bounds on each node's value)

□ The root starts at [−∞, +∞]; after its first leaf (3), the left MIN node is bounded by [−∞, 3]
□ After leaves 12 and 8, the left MIN node is worth exactly [3, 3] and the root becomes [3, +∞]
□ Middle MIN node: its first leaf gives [−∞, 2]; since 2 ≤ 3, its remaining leaves are pruned
□ Right MIN node: leaf 14 gives [−∞, 14] (root [3, 14]), leaf 5 gives [−∞, 5] (root [3, 5]),
and leaf 2 settles it at [2, 2]
□ The root value settles at [3, 3]: MAX chooses the left move, with minimax value 3

16 – 23
α-β Pruning Exercise

(Figure: an exercise tree with MAX at the root, MIN nodes below, and leaf utilities
2 5 8 1 2 7 3 6 9 – trace α-β pruning on it by hand)

24
Basic Idea of α-β
Consider a node n somewhere in the tree, such that Player has a choice of
moving to n

If Player has a better choice m, either at the parent of n or at any choice
point further up, then n will never be reached in actual play

α-β pruning gets its name from the two parameters that describe bounds on the
backed-up values
25
The α–β Algorithm
function ALPHA-BETA-DECISION(state) returns an action
  return the a in ACTIONS(state) maximizing MIN-VALUE(RESULT(a, state), −∞, +∞)

function MAX-VALUE(state, α, β) returns a utility value
  inputs: state, current state in game
          α, the value of the best alternative for MAX along the path to state
          β, the value of the best alternative for MIN along the path to state
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  same as MAX-VALUE but with the roles of α, β reversed
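
The same algorithm as a Python sketch, again over the hypothetical Game interface used earlier; the top-level call starts with α = −∞ and β = +∞:

# Alpha-beta search in Python, following the pseudocode above.

def alpha_beta_decision(game, state):
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a),
                                       float('-inf'), float('inf')))

def max_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('-inf')
    for a in game.actions(state):
        v = max(v, min_value(game, game.result(state, a), alpha, beta))
        if v >= beta:            # MIN above would never allow this branch
            return v
        alpha = max(alpha, v)
    return v

def min_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = float('inf')
    for a in game.actions(state):
        v = min(v, max_value(game, game.result(state, a), alpha, beta))
        if v <= alpha:           # MAX above would never allow this branch
            return v
        beta = min(beta, v)
    return v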
26
Now, Trace The Behavior
(Figure: the same 2-ply tree – root A is a MAX node with value 3, the MIN nodes below it
have values 3, 2 and 2, and the leaf utilities are 3 12 8 | 2 4 6 | 14 5 2 – trace the
α-β algorithm on it)

27
Analysis of α-β
□ Pruning does not affect final result

□ The effectiveness of alpha-beta pruning is highly dependent on the order in which
successors are examined
□ It might be worthwhile to try to examine first the successors that are likely to be best
□ With "perfect ordering," time complexity = O(b^(m/2))
□ the effective branching factor becomes √b
□ For chess, about 6 instead of 35
□ it can look ahead roughly twice as far as minimax in the same amount of time
□ Ordering in chess: captures, threats, forward moves, and then backward moves

28
Random Ordering?
□ If successors are examined in random order rather than best-first, the
complexity will be roughly O(b^(3m/4))

□ Adding dynamic move-ordering schemes, such as


trying first the moves that were found to be best last
time, brings us close to the theoretical limit

□ The best moves are often called killer moves (killer


move heuristic)

29
Dealing with Repeated States
□ In games, repeated states occur frequently because of transpositions
□ different permutations of the move sequence end up in the same position,
e.g., [a1, b1, a2, b2] vs. [a1, b2, a2, b1]
□ It’s worthwhile to store the evaluation of this position
in a hash table the first time it is encountered
□ similar to the “explored set” in graph-search
□ Tradeoff:
□ Transposition table can be too big
□ Which to keep and which to discard
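
One way to sketch a transposition table in Python is to memoise the value of each position the first time it is computed, assuming states are hashable (e.g. tuples describing the board); in practice the table is bounded and entries must be evicted, which is the trade-off noted above:

# Sketch of a transposition table: cache each position's minimax value
# the first time it is seen. Assumes states are hashable.

transposition_table = {}

def minimax_value(game, state):
    if state in transposition_table:          # position seen before (transposition)
        return transposition_table[state]
    if game.terminal_test(state):
        v = game.utility(state)
    elif game.player(state) == 'MAX':
        v = max(minimax_value(game, game.result(state, a))
                for a in game.actions(state))
    else:
        v = min(minimax_value(game, game.result(state, a))
                for a in game.actions(state))
    transposition_table[state] = v
    return v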

30
Imperfect, Real-Time Decisions
□ Minimax generates the entire game search space
□ Alpha-beta prunes a large part of it, but still needs to search all the way to
terminal states
□ However, moves must be made in a reasonable amount of time

□ Standard approach: turn non-terminal nodes into terminal leaves
□ cutoff test: replaces the terminal test, e.g., a depth limit
□ heuristic evaluation function: estimated desirability or utility of a position

31
Evaluation Functions
□ The performance of a game-playing program depends heavily on the quality of its
evaluation function. A good evaluation function should:
□ order the terminal states in the same way as the true utility function
□ give evaluations of nonterminal states that correlate with the actual chance
of winning
□ not take too long to compute!

32
Evaluation Functions

(Figure: two chess positions – "Black to move, White slightly better" and
"White to move, Black winning")

For chess, the evaluation is typically a linear weighted sum of features:

Eval(s) = w1·f1(s) + w2·f2(s) + ... + wn·fn(s)

e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc.
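
As an illustration (not the lecture's exact function), a material-only linear evaluation for chess might look like the sketch below; state.count(piece, colour) is an assumed helper:

# A weighted linear evaluation Eval(s) = w1*f1(s) + ... + wn*fn(s).
# Weights are the usual material values; features count pieces by colour.

MATERIAL_WEIGHTS = {'queen': 9, 'rook': 5, 'bishop': 3, 'knight': 3, 'pawn': 1}

def evaluate(state):
    """Estimated desirability of `state` for White (positive favours White)."""
    score = 0
    for piece, weight in MATERIAL_WEIGHTS.items():
        # f_i(s) = (number of white pieces of this kind) - (number of black ones)
        score += weight * (state.count(piece, 'white') - state.count(piece, 'black'))
    return score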
33
Digression: Exact values don’t matter

(Figure: two minimax trees of the same shape – MIN values 1 and 2 over leaves 1 2 | 2 4,
and MIN values 1 and 20 over leaves 1 20 | 20 400 – MAX makes the same decision in both)

Behaviour is preserved under any monotonic transformation of EVAL

Only the order matters: an ordinal utility function suffices for deterministic games

34
Cutting off Search
□ Modify alpha-beta search so that:
□ TERMINAL-TEST is replaced by CUTOFF-TEST
□ UTILITY is replaced by EVAL
□ if CUTOFF-TEST(state, depth) then return EVAL(state)

□ The depth limit is chosen so that the amount of time used will not exceed
what the rules of the game allow

□ Iterative deepening search can be applied
□ when time runs out, return the move selected by the deepest completed search
(a sketch of the depth-limited version follows below)
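
A sketch of the depth-limited MAX-VALUE with the cutoff test described above (MIN-VALUE is symmetric); the depth limit of 4 is only an example:

# MAX-VALUE with TERMINAL-TEST replaced by CUTOFF-TEST and UTILITY by EVAL.
# min_value is the symmetric counterpart; evaluate() is the heuristic
# evaluation function from the previous slides.

DEPTH_LIMIT = 4   # illustrative; chosen so a move fits in the time allowed

def cutoff_test(game, state, depth):
    return depth >= DEPTH_LIMIT or game.terminal_test(state)

def max_value(game, state, alpha, beta, depth):
    if cutoff_test(game, state, depth):
        return evaluate(state)        # heuristic estimate, not the true utility
    v = float('-inf')
    for a in game.actions(state):
        v = max(v, min_value(game, game.result(state, a), alpha, beta, depth + 1))
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return v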
35
Result?
□ Does it work in practice?
□ Consider chess: assume we can generate and evaluate a million nodes per
second; this allows us to search roughly 200 million nodes per move under
standard time control (3 minutes per move)
□ With minimax this reaches only about 5 ply, but with alpha-beta about 10 ply

□ 4-ply lookahead is a hopeless chess player!


□ 4-ply ≈ human novice

□ 8-ply ≈ typical PC, human master

□ 12-ply ≈ Deep Blue, Kasparov

36
Deterministic Games in Practice
Checkers: Chinook ended 40-year-reign of human world champion Marion
Tinsley in 1994. Used a precomputed endgame database defining perfect play
for all positions involving 8 or fewer pieces on the board, a total of 444 billion
positions.
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-
game match in 1997. Deep Blue searches 200 million positions per second,
uses very sophisticated evaluation, and undisclosed methods for extending
some lines of search up to 40 ply.
Othello: human champions refuse to compete against computers, who are too
good.
Go: human champions refuse to compete against computers, who are too bad.
In Go, b > 300, so most programs use pattern knowledge bases to suggest
plausible moves.

37
Nondeterministic games: backgammon
(Figure: a backgammon board, with points numbered 0–25)

38
Nondeterministic games in general
• In nondeterministic games, chance introduced by dice, card-shuffling
• Simplified example with coin-flipping:
(Figure: MAX at the root; two CHANCE nodes with expected values 3 and −1, each branching
with probability 0.5 to MIN nodes valued 2, 4 and 0, −2; the leaf utilities are
2 4 | 7 4 | 6 0 | 5 −2)

39
Algorithm for nondeterministic games
EXPECTIMINIMAX gives perfect play
Just like MINIMAX, except we must also handle chance nodes:
...
if state is a MAX node then
  return the highest EXPECTIMINIMAX-VALUE of SUCCESSORS(state)
if state is a MIN node then
  return the lowest EXPECTIMINIMAX-VALUE of SUCCESSORS(state)
if state is a chance node then
  return the average of EXPECTIMINIMAX-VALUE of SUCCESSORS(state)
...
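
A Python sketch of expectiminimax; it assumes a hypothetical game.chance_outcomes(state) helper that yields (probability, successor) pairs at chance nodes, so the "average" above becomes a probability-weighted average (the two coincide when outcomes are equally likely):

# Expectiminimax sketch: MAX and MIN nodes as in minimax, plus chance nodes.
# game.chance_outcomes(state) is an assumed helper returning
# (probability, successor_state) pairs.

def expectiminimax(game, state):
    if game.terminal_test(state):
        return game.utility(state)
    player = game.player(state)
    if player == 'MAX':
        return max(expectiminimax(game, game.result(state, a))
                   for a in game.actions(state))
    if player == 'MIN':
        return min(expectiminimax(game, game.result(state, a))
                   for a in game.actions(state))
    # chance node: expected value over the possible dice rolls / deals
    return sum(p * expectiminimax(game, s)
               for p, s in game.chance_outcomes(state))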

40
Nondeterministic games in practice
Dice rolls increase b: there are 21 possible rolls with 2 dice. Backgammon has
≈ 20 legal moves per position (can be 6,000 with a 1-1 roll)
depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9
As depth increases, probability of reaching a given node shrinks
⇒ value of lookahead is diminished
α–β pruning is much less effective
TDGAMMON uses depth-2 search + very good EVAL
≈ world-champion level
41
Digression: Exact values DO matter

(Figure: two expectiminimax trees with the same leaf ordering but different magnitudes –
chance nodes with probabilities .9/.1 over MIN values 2, 3, 1, 4 give expected values
2.1 and 1.3, while MIN values 20, 30, 1, 400 give 21 and 40.9 – so MAX's best move changes)

Behaviour is preserved only by positive linear transformation of EVAL


Hence EVAL should be proportional to the expected utility

42
Games of imperfect information
E.g., card games, where opponent’s initial cards are unknown
Typically we can calculate a probability for each possible deal
Seems just like having one big dice roll at the beginning of the game∗
Idea: compute the minimax value of each action in each deal,
then choose the action with highest expected value over all deals∗
Special case: if an action is optimal for all deals, it’s optimal.∗
GIB, current best bridge program, approximates this idea by
1. generating 100 deals consistent with bidding information
2. picking the action that wins most tricks on average
43
Commonsense example
• Road A leads to a small heap of gold pieces
• Road B leads to a fork:
  • take the left fork and you’ll find a mound of jewels;
  • take the right fork and you’ll be run over by a bus.

• Road A leads to a small heap of gold pieces
• Road B leads to a fork:
  • take the left fork and you’ll be run over by a bus;
  • take the right fork and you’ll find a mound of jewels.

• Road A leads to a small heap of gold pieces
• Road B leads to a fork:
  • guess correctly and you’ll find a mound of jewels;
  • guess incorrectly and you’ll be run over by a bus.

44
Proper analysis
• The intuition that the value of an action is the average of its values in all actual states is WRONG
• With partial observability, the value of an action depends on the information state (belief
state) the agent is in
• Can generate and search a tree of information states

• Leads to rational behaviors such as


♦ Acting to obtain information
♦ Signalling to one’s partner
♦ Acting randomly to minimize information disclosure

45
Summary
• Games are fun to work on!
• They illustrate several important points about AI:
• perfection is unattainable, so we must approximate
• it is a good idea to think about what to think about
• uncertainty constrains the assignment of values to states
• optimal decisions depend on the information state, not the real state
• Games are to AI as grand prix racing is to automobile design
46
Exercise 1
• Prove that: for every game tree, the utility obtained by
MAX using minimax decisions against a suboptimal MIN
will never be lower than the utility obtained playing
against an optimal MIN.

47
Exercise 2
□ Consider the following game:
□ Draw the complete game tree, using the following convention:
□ State: (Sa, Sb) where Sa and Sb denote the token locations
□ Identify the terminal state in a square box and assign it a value
□ Put loop states in double square boxes
□ Since it’s not clear how to assign values to loop states, annotate each with
a “?”

(Figure: a row of four squares numbered 1–4, with token A at one end and token B at the other)

48
Answer to Ex1
• Consider a MIN node whose children are terminal nodes.
If MIN plays suboptimally, then the value of the node is
greater than or equal to the value it would have if MIN
played optimally. Hence, the value of the MAX node that
is the MIN node’s parent can only be increased. This
argument can be extended by a simple induction all the
way to the root.

49
Game Tree for Ex2

Think about how to get the values

50
Thanks

51
