CS 3600: Introduction to Artificial Intelligence
— MCTS + Logic —
By Munroe, CC BY-NC 2.5, https://xkcd.com/638/
Animesh Garg, Larry Heck, Weicheng Ma
Spring 2025
[These slides are partially adapted from cs6601 by Thomas Plötz, cs6601 by Thad Starner, cs 3600 by Mark Riedl, and the Berkeley cs188 course by Dan Klein and Pieter Abbeel]
Multi-Agent Utilities
What if the game is not zero-sum, or has multiple players?
Generalization of minimax:
Terminals have utility tuples
Node values are also utility tuples
Each player maximizes its own component
Can give rise to cooperation and competition dynamically…
(Terminal utility triples in the example tree: (1,6,6), (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5))
Mixed Layer Types
E.g. Backgammon
Expectiminimax
Environment is an extra “random agent” player that moves after each min/max agent
Each node computes the appropriate combination of its children
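The three node types can be sketched in one recursion; the tuple-based tree encoding and the example values are illustrative, not from the slides.

```python
# Sketch of expectiminimax (illustrative encoding): a node is either a
# number (terminal utility), ("max"|"min", [children]), or
# ("chance", [(probability, child), ...]).

def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average of child values.
    return sum(p * expectiminimax(c) for p, c in children)

# Max moves first; a fair coin flip then decides which min node is reached.
game = ("max", [
    ("chance", [(0.5, ("min", [3, 12])), (0.5, ("min", [8, 2]))]),
    ("chance", [(0.5, ("min", [4, 6])), (0.5, ("min", [14, 5]))]),
])
print(expectiminimax(game))  # 4.5: max(0.5*3 + 0.5*2, 0.5*4 + 0.5*5)
```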
Monte Carlo Tree Search
Methods based on alpha-beta search assume a fixed horizon
Pretty hopeless for Go, with b > 300
MCTS combines two important ideas:
Evaluation by rollouts – play multiple games to termination from a state s (using a simple, fast rollout policy) and count wins and losses
Selective search – explore parts of the tree that will help improve the decision at the root, regardless of depth
Rollouts
“Move 37”
For each rollout:
Repeat until terminal: play a move according to a fixed, fast rollout policy
Record the result
Fraction of wins correlates with the true value of the position!
Having a “better” rollout policy helps
MCTS Version 0
Do N rollouts from each child of the root, record fraction of wins
Pick the move that gives the best outcome by this metric
(Example win fractions for the three candidate moves: 57/100, 39/100, 65/100)
(A second example: 57/100, 0/100, 59/100; uniform allocation wastes 100 rollouts on a clearly losing move)
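Version 0 can be sketched on a toy game; the Nim variant (take 1 or 2 stones, whoever takes the last stone wins), function names, and rollout count are illustrative choices, not from the slides.

```python
import random

# MCTS "Version 0" sketch on a toy Nim game (take 1 or 2 stones; whoever
# takes the last stone wins). The game, rollout count, and names are
# illustrative choices.

def rollout(stones, my_turn):
    """Random play to the end; True iff the root player wins."""
    while stones > 0:
        stones -= random.choice([1, 2][:min(2, stones)])
        if stones == 0:
            return my_turn      # the mover just took the last stone
        my_turn = not my_turn
    return not my_turn          # called on an already-finished game

def version0(stones, n=500):
    """Do n rollouts from each child of the root; pick the best win fraction."""
    best_move, best_frac = None, -1.0
    for move in (1, 2):
        if move > stones:
            continue
        wins = sum(rollout(stones - move, my_turn=False) for _ in range(n))
        if wins / n > best_frac:
            best_move, best_frac = move, wins / n
    return best_move

random.seed(0)
print(version0(4))  # optimal play from 4 stones takes 1, leaving 3
```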
MCTS Version 0.9
Allocate rollouts to more promising nodes
(With rollouts reallocated: 77/140, 0/10, 90/150)
(Another example allocation: 61/100, 6/10, 48/100)
MCTS Version 1.0
Allocate rollouts to more promising nodes
Allocate rollouts to more uncertain nodes
(61/100, 6/10, 48/100: the 6/10 node looks promising but is still uncertain, so it deserves more rollouts)
UCB heuristics
UCB1 formula combines “promising” and “uncertain”:
UCB1(n) = U(n)/N(n) + C √( log N(Parent(n)) / N(n) )
N(n) = number of rollouts from node n
U(n) = total utility of rollouts (e.g., # wins) for Player(Parent(n))
C is an exploration constant trading off the two terms (C = √2 is a common choice)
A provably not-terrible heuristic for bandit problems
(which are not the same as the problem we face here!)
MCTS Version 2.0: UCT
Repeat until out of time:
Given the current search tree, recursively apply UCB to choose a path down to a leaf (not fully expanded) node n
Add a new child c to n and run a rollout from c
Update the win counts from c back up to the root
Choose the action leading to the child with highest N
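The loop above can be sketched on a toy Nim game (take 1 or 2 stones; whoever takes the last stone wins). The Node class, exploration constant C = √2, and iteration budget are illustrative choices, not prescribed by the slides.

```python
import math
import random

# Sketch of UCT (MCTS Version 2.0) on a toy Nim game: select with UCB1,
# expand one child, roll out, back up win counts, then pick the most-visited
# root child. All names and constants here are illustrative.

class Node:
    def __init__(self, stones, my_turn, parent=None):
        self.stones, self.my_turn, self.parent = stones, my_turn, parent
        self.children, self.U, self.N = [], 0, 0
        self.untried = [m for m in (1, 2) if m <= stones]  # unexpanded moves

def ucb1(node, c=math.sqrt(2)):
    # "Promising" (exploitation) term plus "uncertain" (exploration) term.
    return node.U / node.N + c * math.sqrt(math.log(node.parent.N) / node.N)

def rollout(stones, my_turn):
    """Random play to the end; True iff the root player wins."""
    while stones > 0:
        stones -= random.choice([1, 2][:min(2, stones)])
        if stones == 0:
            return my_turn
        my_turn = not my_turn
    return not my_turn

def uct(root_stones, iters=2000):
    root = Node(root_stones, my_turn=True)
    for _ in range(iters):
        node = root
        # Selection: apply UCB1 down to a not-fully-expanded (or terminal) node.
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
        # Expansion: add one new child c and roll out from c.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, not node.my_turn, parent=node)
            node.children.append(child)
            node = child
        win = rollout(node.stones, node.my_turn)
        # Backup: store each node's wins from its parent's perspective,
        # which is why no explicit min or max appears anywhere.
        while node:
            node.N += 1
            node.U += int(win) if not node.my_turn else 1 - int(win)
            node = node.parent
    best = max(root.children, key=lambda n: n.N)   # highest visit count N
    return root_stones - best.stones               # the move that was taken

random.seed(0)
print(uct(4))  # optimal play from 4 stones takes 1, leaving the lost position 3
```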
UCT Example
[UCT example tree with win/visit counts: 5/10, 4/9; below them 4/7, 1/2, 0/1, 0/1; leaves 2/3, 0/1, 2/2, 1/1]
Why is there no min or max?
“Value” of a node, U(n)/N(n), is a weighted sum of child values!
Idea: as N → ∞, the vast majority of rollouts are concentrated in the best child(ren), so the weighted average → max/min
Theorem: as N → ∞ UCT selects the minimax move
(but N never approaches infinity!)
MCTS + Machine Learning: AlphaGo
■ Monte Carlo Tree Search with additions including:
■ Rollout policy is a neural network trained with reinforcement learning and expert human moves
■ In combination with rollout outcomes, use a trained value function to better predict a node’s utility
[Mastering the game of Go with deep neural networks and tree search. Silver et al. Nature. 2016]
Summary
Games require decisions when optimality is impossible
Bounded-depth search and approximate evaluation functions
Games force efficient use of computation
Alpha-beta pruning, MCTS
Game playing has produced important research ideas
Reinforcement learning (checkers)
Iterative deepening (chess)
Rational metareasoning (Othello)
Monte Carlo tree search (chess, Go)
Solution methods for partial-information games in economics (poker)
Video games present much greater challenges – lots to do!
b = 10^500, |S| = 10^4000, m = 10,000, partially observable, often > 2 players
[These slides are partially adapted from cs6601 by Thomas Plötz, cs6601 by Thad Starner, cs 3600 by Mark Riedl, the Berkeley cs188 course by Dan Klein and Pieter Abbeel, and Stuart Russell]
Outline of the course
[Course map: atomic states: SEARCH (known, deterministic), MDPs (known, stochastic), RL (unknown); factored states: LOGIC, Bayes nets; structured states: first-order logic]
Logic Outline
1. Propositional Logic I
Basic concepts of knowledge, logic, reasoning
Propositional logic: syntax and semantics, Pacworld example
Inference by theorem proving
2. Propositional logic II
Inference by model checking
A Pac agent using propositional logic
3. First-order logic
Agents that know things
Agents acquire knowledge through perception, learning, language
Knowledge of the effects of actions (“transition model”)
Knowledge of how the world affects sensors (“sensor model”)
Knowledge of the current state of the world
Can keep track of a partially observable world
Can formulate plans to achieve goals
Check out OpenAI’s Operator (Web Agents)
Knowledge, contd.
Knowledge base = set of sentences in a formal language
Declarative approach to building an agent (or other system):
Tell it what it needs to know (or have it Learn the knowledge)
Then it can Ask itself what to do—answers should follow from the KB
Agents can be viewed at the knowledge level
i.e., what they know, regardless of how implemented
A single inference algorithm can answer any answerable question
[Diagram: Knowledge base (domain-specific facts) + Inference engine (generic code)]
Logic
Syntax: What sentences are allowed? (Example sentence: x > y)
Semantics: What are the possible worlds? Which sentences are true in which worlds? (i.e., the definition of truth)
(Example worlds: World1: x = 5, y = 2; World2: x = 2, y = 3)
[Figure: sentences α1, α2, α3 in Syntaxland map to sets of possible worlds in Semanticsland]
Different kinds of logic
Propositional logic
Syntax: P ∨ (¬Q ∧ R); X1 ⇔ (Raining ⇒ ¬Sunny)
Possible world: {P=true,Q=true,R=false,S=true} or 1101
Semantics: α ∧ β is true in a world iff is α true and β is true (etc.)
First-order logic
Syntax: ∀x ∃y P(x,y) ∧ ¬Q(Joe,f(x)) ⇒ f(x)=f(y)
Possible world: Objects o1, o2, o3; P holds for <o1,o2>; Q holds for <o3,o2>; f(o1)=o1; Joe=o3; etc.
Semantics: φ(σ) is true in a world if σ=oj and φ holds for oj; etc.
Different kinds of logic, contd.
Relational databases:
Syntax: ground relational sentences, e.g., Sibling(Ali,Bo)
Possible worlds: (typed) objects and (typed) relations
Semantics: sentences in the DB are true, everything else is false
Cannot express disjunction, implication, universals, etc.
Query language (SQL etc.) typically some variant of first-order logic
Often augmented by first-order rule languages, e.g., Datalog
Knowledge graphs (roughly: relational DB + ontology of types and relations)
Google Knowledge Graph: 5 billion entities, 500 billion facts, >30% of queries
Facebook network: 2.8 billion people, trillions of posts, maybe quadrillions of facts
Inference: entailment
Entailment: α |= β (“α entails β” or “β follows from α”) iff in
every world where α is true, β is also true
I.e., the α-worlds are a subset of the β-worlds [models(α) ⊆ models(β)]
In the example, α2 |= α1
(Say α2 is ¬Q ∧ R ∧ S ∧ W and α1 is ¬Q)
[Figure: the set of α2-worlds drawn inside the set of α1-worlds]
Inference: proofs
A proof is a demonstration of entailment between α and β
Sound algorithm: everything it claims to prove is in fact entailed
Complete algorithm: everything that is entailed can be proved
Inference: proofs
Method 1: model-checking
For every possible world, if α is true, make sure that β is true too
OK for propositional logic (finitely many worlds); not easy for first-order logic
Method 2: theorem-proving
Search for a sequence of proof steps (applications of inference rules) leading from α to β
E.g., from P ∧ (P ⇒ Q), infer Q by Modus Ponens
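Method 1 can be sketched for a handful of symbols by brute-force enumeration; here sentences are represented as Python functions of the model, an illustrative encoding rather than anything from the slides.

```python
from itertools import product

# Sketch of Method 1 (model checking) for propositional entailment:
# enumerate every possible world over the given symbols and check that
# beta holds wherever alpha holds.

def entails(alpha, beta, symbols):
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if alpha(model) and not beta(model):
            return False        # an alpha-world that is not a beta-world
    return True

# Modus Ponens as entailment: P ∧ (P ⇒ Q) |= Q
alpha = lambda m: m["P"] and ((not m["P"]) or m["Q"])   # P ∧ (P ⇒ Q)
beta = lambda m: m["Q"]
print(entails(alpha, beta, ["P", "Q"]))  # True
```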
Propositional logic syntax
Given: a set of proposition symbols {X1,X2,…, Xn}
(we often add True and False for convenience)
Xi is a sentence
If α is a sentence then ¬α is a sentence
If α and β are sentences then α ∧ β is a sentence
If α and β are sentences then α ∨ β is a sentence
If α and β are sentences then α ⇒ β is a sentence
If α and β are sentences then α ⇔ β is a sentence
And p.s. there are no other sentences!
Propositional logic semantics
Let m be a model assigning true or false to {X1,X2,…, Xn}
If α is a symbol then its truth value is given in m
¬α is true in m iff α is false in m
α ∧ β is true in m iff α is true in m and β is true in m
α ∨ β is true in m iff α is true in m or β is true in m
α ⇒ β is true in m iff α is false in m or β is true in m
α ⇔ β is true in m iff α ⇒ β is true in m and β ⇒ α is true in m
Propositional logic semantics in code
function PL-TRUE?(α,model) returns true or false
if α is a symbol then return Lookup(α, model)
if Op(α) = ¬ then return not(PL-TRUE?(Arg1(α),model))
if Op(α) = ∧ then return and(PL-TRUE?(Arg1(α),model),
PL-TRUE?(Arg2(α),model))
etc.
(Sometimes called “recursion over syntax”)
Sentence: P ∧ (¬Q ∨ R)
Model / possible world / assignment of values to variables:
{P=true,Q=true,R=false} or 110
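A minimal Python version of the PL-TRUE? recursion over syntax might look like this; the nested-tuple sentence encoding is an illustrative choice.

```python
# Sketch of PL-TRUE? ("recursion over syntax"). A sentence is a nested
# tuple: a symbol name, ("not", s), ("and", s1, s2), ("or", s1, s2),
# ("implies", s1, s2), or ("iff", s1, s2). Encoding is illustrative.

def pl_true(sentence, model):
    if isinstance(sentence, str):             # a proposition symbol
        return model[sentence]
    op, *args = sentence
    if op == "not":
        return not pl_true(args[0], model)
    a, b = (pl_true(s, model) for s in args)  # binary operators
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "implies":
        return (not a) or b
    if op == "iff":
        return a == b
    raise ValueError(f"unknown operator: {op}")

# P ∧ (¬Q ∨ R) in the model {P=true, Q=true, R=false}, i.e. 110:
s = ("and", "P", ("or", ("not", "Q"), "R"))
print(pl_true(s, {"P": True, "Q": True, "R": False}))  # False
```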
Example: Partially observable (near-sighted) Pacman
Pacman knows the map but perceives just wall/gap to NSEW
Formulation: what variables do we need?
Wall locations:
Wall_0,0: there is a wall at [0,0]
Wall_0,1: there is a wall at [0,1], etc. (N symbols for N locations)
Percepts:
Blocked_W: blocked by wall to my West, etc.
Blocked_W_0: blocked by wall to my West at time 0, etc. (4T symbols for T time steps)
Actions:
W_0: Pacman moves West at time 0; E_0, etc. (4T symbols)
Pacman’s location:
At_0,0_0: Pacman is at [0,0] at time 0; At_0,1_0, etc. (NT symbols)
How many possible worlds?
N locations, T time steps => N + 4T + 4T + NT = O(NT) variables
2^O(NT) possible worlds!
N=200, T=400 => ~10^24000 worlds
Each world is a complete “history”
But most of them are pretty weird!
Pacman’s knowledge base: Map
Pacman knows where the walls are:
Wall_0,0 ∧ Wall_0,1 ∧ Wall_0,2 ∧ Wall_0,3 ∧ Wall_0,4 ∧ Wall_1,4 ∧ …
Pacman knows where the walls aren’t!
¬Wall_1,1 ∧ ¬Wall_1,2 ∧ ¬Wall_1,3 ∧ ¬Wall_2,1 ∧ ¬Wall_2,2 ∧ …
Pacman’s knowledge base: Initial state
Pacman doesn’t know where he is
But he knows he’s somewhere!
At_1,1_0 ∨ At_1,2_0 ∨ At_1,3_0 ∨ At_2,1_0 ∨ …
And he knows he’s not in more than one place!
¬ (At_1,1_0 ∧ At_1,2_0) ∧ ¬ (At_1,1_0 ∧ At_1,3_0) …
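These initial-state sentences can be generated mechanically; the square list below is a hypothetical fragment of a map, and the symbol names follow the slides' conventions.

```python
from itertools import combinations

# Sketch of generating the "somewhere" and "not in two places" sentences
# as strings. The list of open squares is illustrative.

def initial_state_axioms(squares, t=0):
    at = [f"At_{x},{y}_{t}" for x, y in squares]
    somewhere = " ∨ ".join(at)                          # he's somewhere!
    one_place = " ∧ ".join(f"¬({a} ∧ {b})"             # at most one place
                           for a, b in combinations(at, 2))
    return somewhere, one_place

somewhere, one_place = initial_state_axioms([(1, 1), (1, 2), (1, 3)])
print(somewhere)   # At_1,1_0 ∨ At_1,2_0 ∨ At_1,3_0
print(one_place)
```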
Pacman’s knowledge base: Sensor model
State facts about how Pacman’s percepts arise…
<Percept variable at t> ⇔ <some condition on world at t>
Pacman perceives a wall to the West at time t if and only if he is in x,y and there is a wall at x-1,y
Blocked_W_0 ⇔ ((At_1,1_0 ∧ Wall_0,1) ∨ (At_1,2_0 ∧ Wall_0,2) ∨ (At_1,3_0 ∧ Wall_0,3) ∨ …)
4T sentences, each of size O(N)
Note: these are valid for any map
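One such sentence per direction per time step can be produced by a generator like this sketch; the set of interior squares is a hypothetical example, and symbol names follow the slides.

```python
# Sketch of generating a West-sensor axiom as a string for one time step.
# The list of interior squares is illustrative.

def blocked_west_axiom(t, interior_squares):
    disjuncts = [f"(At_{x},{y}_{t} ∧ Wall_{x-1},{y})"
                 for x, y in interior_squares]
    return f"Blocked_W_{t} ⇔ ({' ∨ '.join(disjuncts)})"

print(blocked_west_axiom(0, [(1, 1), (1, 2), (1, 3)]))
```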
Pacman’s knowledge base: Transition model
How does each state variable at each time get its value?
Here we care about location variables, e.g., At_3,3_17
A state variable X gets its value according to a successor-state axiom
X_t ⇔ [X_t-1 ∧ ¬(some action_t-1 made it false)] ∨ [¬X_t-1 ∧ (some action_t-1 made it true)]
For Pacman location:
At_3,3_17 ⇔ [At_3,3_16 ∧ ¬((¬Wall_3,4 ∧ N_16) ∨ (¬Wall_4,3 ∧ E_16) ∨ …)]
∨ [¬At_3,3_16 ∧ ((At_3,2_16 ∧ ¬Wall_3,3 ∧ N_16) ∨ (At_2,3_16 ∧ ¬Wall_3,3 ∧ E_16) ∨ …)]
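A generator for such a successor-state axiom might look like this sketch; only the N and E cases are shown (a full generator would cover all four moves), and all names are illustrative.

```python
# Sketch of generating one location successor-state axiom as a string,
# following the pattern on the slides. N/E cases only; illustrative names.

def at_axiom(x, y, t):
    # "Was here and no action moved me away" ...
    stay = (f"At_{x},{y}_{t-1} ∧ ¬((¬Wall_{x},{y+1} ∧ N_{t-1}) ∨ "
            f"(¬Wall_{x+1},{y} ∧ E_{t-1}))")
    # ... or "was adjacent and an action moved me here".
    arrive = (f"¬At_{x},{y}_{t-1} ∧ ((At_{x},{y-1}_{t-1} ∧ ¬Wall_{x},{y} ∧ N_{t-1}) ∨ "
              f"(At_{x-1},{y}_{t-1} ∧ ¬Wall_{x},{y} ∧ E_{t-1}))")
    return f"At_{x},{y}_{t} ⇔ [{stay}] ∨ [{arrive}]"

print(at_axiom(3, 3, 17))
```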
How many sentences?
Vast majority of KB occupied by O(NT) transition model sentences
Each about 10 lines of text
N=200, T=400 => ~800,000 lines of text, or 20,000 pages
This is because propositional logic has limited expressive power
Are we really going to write 20,000 pages of logic sentences???
No, but your code will generate all those sentences!
In first-order logic, we need O(1) transition model sentences
Entails vs. Implies
Entails: α |= β
Implies: α ⇒ β
Implies (α ⇒ β) is a well-formed sentence in propositional logic
Entails (α |= β) is a fact about sets of models where sentences are true
Intuitive connection?
KB is a set of sentences (or KB is one sentence with lots of ∧s)
If α ⇒ β ∈ KB, then α ∧ KB |= β (Modus Ponens)
If you want α ∧ KB |= β, it is a good idea to put α ⇒ β ∈ KB
Some reasoning tasks
Localization with a map and local sensing:
Given an initial KB, plus a sequence of percepts and actions, where am I?
Mapping with a location sensor:
Given an initial KB, plus a sequence of percepts and actions, what is the map?
Simultaneous localization and mapping:
Given …, where am I and what is the map?
Planning:
Given …, what action sequence is guaranteed to reach the goal?
ALL OF THESE USE THE SAME KB AND THE SAME ALGORITHM!!
Summary
One possible agent architecture: knowledge + inference
Logics provide a formal way to encode knowledge
A logic is defined by: syntax, set of possible worlds, truth condition
A simple KB for Pacman covers the initial state, sensor model, and transition model
Logical inference computes entailment relations among sentences, enabling a wide range of tasks to be solved