cs3600-sp25-lec08
Intro to AI lecture

CS 3600: Introduction to Artificial Intelligence

— MCTS + Logic —

By Munroe, CC BY-NC 2.5, https://xkcd.com/638/

Animesh Garg, Larry Heck, Weicheng Ma


Spring 2025
[These slides are partially adopted from cs6601 by Thomas Plötz, cs6601 by Thad Starner, cs 3600 by Mark Riedl, and Berkeley cs188 course by Dan Klein and Pieter Abbeel]
Multi-Agent Utilities
 What if the game is not zero-sum, or has multiple players?

 Generalization of minimax:
 Terminals have utility tuples
 Node values are also utility tuples
 Each player maximizes its own component
 Can give rise to cooperation and competition dynamically…

[Game-tree figure: three-player terminal utility tuples 1,6,6  7,1,2  6,1,2  7,2,1  5,1,7  1,5,2  7,7,1  5,2,5]

Mixed Layer Types
 E.g. Backgammon
 Expectiminimax
 Environment is an extra “random agent” player that moves after each min/max agent
 Each node computes the appropriate combination of its children
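As a concrete sketch of how each node “computes the appropriate combination of its children”, here is a minimal expectiminimax on a hand-built toy tree. The tuple encoding and the tree itself are illustrative, not from the slides:

```python
def expectiminimax(node):
    """Value of a node in a game tree with MAX, MIN, and CHANCE layers.

    Toy encoding (an assumption of this sketch): an interior node is
    (kind, children) with kind in {'max', 'min', 'chance'}; chance
    children are (probability, child) pairs; a terminal is a number.
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average of child values
    return sum(p * expectiminimax(c) for p, c in children)

# MAX moves, a coin-flip chance node follows, then MIN replies
tree = ('max', [
    ('chance', [(0.5, ('min', [3, 12])), (0.5, ('min', [8, 2]))]),
    ('chance', [(0.5, ('min', [4, 6])), (0.5, ('min', [14, 5]))]),
])
print(expectiminimax(tree))  # 4.5 = max(0.5*3 + 0.5*2, 0.5*4 + 0.5*5)
```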
Monte Carlo Tree Search
Monte Carlo Tree Search
 Methods based on alpha-beta search assume a fixed horizon
 Pretty hopeless for Go, with b > 300
 MCTS combines two important ideas:
 Evaluation by rollouts – play multiple games to termination from a state s (using a simple, fast rollout policy) and count wins and losses
 Selective search – explore parts of the tree that will help improve the decision at the root, regardless of depth
Rollouts
[Figure: Go board — AlphaGo’s “Move 37”]

 For each rollout:
  Repeat until terminal: play a move according to a fixed, fast rollout policy
  Record the result
 Fraction of wins correlates with the true value of the position!
 Having a “better” rollout policy helps
MCTS Version 0
 Do N rollouts from each child of the root, record fraction of wins
 Pick the move that gives the best outcome by this metric

Root children win fractions: 57/100  39/100  65/100


MCTS Version 0
 Do N rollouts from each child of the root, record fraction of wins
 Pick the move that gives the best outcome by this metric

Root children win fractions: 57/100  0/100  59/100
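The Version 0 procedure is simple enough to sketch directly. The toy game below (a small Nim variant: take 1 or 2 stones, whoever takes the last stone wins) and all helper names are stand-ins for a real game interface:

```python
import random

def legal_moves(pile):
    """Moves available in the toy game: take 1 or 2 stones."""
    return [m for m in (1, 2) if m <= pile]

def simulate(pile, first_move, rng):
    """One random playout where player 0 opens with first_move.
    Returns True if player 0 takes the last stone."""
    p, player, move = pile, 0, first_move
    while True:
        p -= move
        if p == 0:
            return player == 0
        player = 1 - player
        move = rng.choice(legal_moves(p))

def mcts_v0(pile, n_rollouts=200, seed=0):
    """MCTS Version 0: N random rollouts per child of the root,
    then pick the move with the best win fraction."""
    rng = random.Random(seed)
    win_frac = {m: sum(simulate(pile, m, rng)
                       for _ in range(n_rollouts)) / n_rollouts
                for m in legal_moves(pile)}
    return max(win_frac, key=win_frac.get)

# From a pile of 4, taking 1 stone leaves the opponent a losing pile of 3,
# so it wins clearly more random rollouts than taking 2
print(mcts_v0(4))  # 1
```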


MCTS Version 0.9
 Allocate rollouts to more promising nodes

Rollout allocation: 77/140  0/10  90/150


MCTS Version 0.9
 Allocate rollouts to more promising nodes

Rollout allocation: 61/100  6/10  48/100


MCTS Version 1.0
 Allocate rollouts to more promising nodes
 Allocate rollouts to more uncertain nodes

Rollout allocation: 61/100  6/10  48/100


UCB heuristics
 UCB1 formula combines “promising” and “uncertain”:

   UCB1(n) = U(n)/N(n) + C × √( log N(Parent(n)) / N(n) )

 N(n) = number of rollouts from node n
 U(n) = total utility of rollouts (e.g., # wins) for Player(Parent(n))
 A provably not terrible heuristic for bandit problems
 (which are not the same as the problem we face here!)
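The two terms of the formula can be seen numerically in a small sketch. C = 1.4 ≈ √2 is a common but not universal choice of exploration constant:

```python
import math

def ucb1(U, N, N_parent, C=1.4):
    """UCB1 score of a node with U wins out of N rollouts, whose parent
    has N_parent rollouts. U/N is the "promising" term (average utility);
    the square-root term is the "uncertain" bonus, large when the node
    is under-visited relative to its parent."""
    return U / N + C * math.sqrt(math.log(N_parent) / N)

# The 6/10 child outscores the 61/100 child despite a worse average,
# because it has been visited far less:
print(ucb1(6, 10, 110) > ucb1(61, 100, 110))  # True
```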
MCTS Version 2.0: UCT
 Repeat until out of time:
 Given the current search tree, recursively apply UCB to choose a path down to a leaf (not fully expanded) node n
 Add a new child c to n and run a rollout from c
 Update the win counts from c back up to the root
 Choose the action leading to the child with highest N
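A minimal sketch of the four UCT steps, using a toy Nim-like game (a pile of stones; players alternately take 1 or 2; whoever takes the last stone wins) as a stand-in for a real game. All names here are illustrative:

```python
import math
import random

class Node:
    def __init__(self, pile, to_move, parent=None):
        self.pile, self.to_move, self.parent = pile, to_move, parent
        self.children = {}   # move -> Node
        self.N = 0           # rollouts through this node
        self.U = 0           # wins for Player(Parent(n)), as in UCB1

    def untried(self):
        return [m for m in (1, 2) if m <= self.pile and m not in self.children]

def ucb(n, C=1.4):
    return n.U / n.N + C * math.sqrt(math.log(n.parent.N) / n.N)

def uct(root_pile, iters=400, seed=0):
    rng = random.Random(seed)
    root = Node(root_pile, to_move=0)
    for _ in range(iters):
        # 1. Selection: apply UCB down to a not-fully-expanded node
        node = root
        while not node.untried() and node.children:
            node = max(node.children.values(), key=ucb)
        # 2. Expansion: add one new child and continue from it
        if node.untried():
            m = rng.choice(node.untried())
            node.children[m] = Node(node.pile - m, 1 - node.to_move, parent=node)
            node = node.children[m]
        # 3. Rollout from the new node with a fast (here: random) policy
        p, player = node.pile, node.to_move
        winner = 1 - node.to_move   # if already terminal, the mover into node took last
        while p > 0:
            m = rng.choice([x for x in (1, 2) if x <= p])
            p -= m
            if p == 0:
                winner = player
            player = 1 - player
        # 4. Backup: U(n) counts wins for the player who moved into n
        while node is not None:
            node.N += 1
            if winner != node.to_move:
                node.U += 1
            node = node.parent
    # Choose the action leading to the child with highest N
    return max(root.children, key=lambda m: root.children[m].N)

print(uct(4))  # 1: take one stone, leaving the opponent a losing pile of 3
```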
UCT Example
[Search-tree figure annotated with U(n)/N(n) at each node: 5/10, 4/9; 4/7, 1/2, 0/1, 0/1; 2/3, 0/1, 2/2, 1/1]

Why is there no min or max?
 “Value” of a node, U(n)/N(n), is a weighted sum of child values!
 Idea: as N → ∞, the vast majority of rollouts are concentrated in the best child(ren), so the weighted average → max/min
 Theorem: as N → ∞, UCT selects the minimax move
 (but N never approaches infinity!)
MCTS + Machine Learning: AlphaGo
■ Monte Carlo Tree Search with additions including:
■ Rollout policy is a neural network trained with reinforcement learning
and expert human moves
■ In combination with rollout outcomes, use a trained value function to better predict a node’s utility

[Mastering the game of Go with deep neural networks and tree search. Silver et al. Nature. 2016]
Summary
 Games require decisions when optimality is impossible
 Bounded-depth search and approximate evaluation functions
 Games force efficient use of computation
 Alpha-beta pruning, MCTS
 Game playing has produced important research ideas
 Reinforcement learning (checkers)
 Iterative deepening (chess)
 Rational metareasoning (Othello)
 Monte Carlo tree search (chess, Go)
 Solution methods for partial-information games in economics (poker)
 Video games present much greater challenges – lots to do!
 b = 10^500, |S| = 10^4000, m = 10,000, partially observable, often > 2 players
[These slides are partially adopted from cs6601 by Thomas Plötz, cs6601 by Thad Starner, cs3600 by Mark Riedl, the Berkeley cs188 course by Dan Klein and Pieter Abbeel, and Stuart Russell]
Outline of the course

              known                      unknown
              deterministic  stochastic
  atomic      SEARCH         MDPs        RL
  factored    LOGIC          Bayes nets
  structured  First-order logic
Logic Outline
1. Propositional Logic I
 Basic concepts of knowledge, logic, reasoning
 Propositional logic: syntax and semantics, Pacworld example
 Inference by theorem proving
2. Propositional logic II
 Inference by model checking
 A Pac agent using propositional logic
3. First-order logic
Agents that know things
 Agents acquire knowledge through perception, learning, language
 Knowledge of the effects of actions (“transition model”)
 Knowledge of how the world affects sensors (“sensor model”)
 Knowledge of the current state of the world
 Can keep track of a partially observable world
 Can formulate plans to achieve goals
 Check out OpenAI’s Operator (Web Agents)
Knowledge, contd.
 Knowledge base = set of sentences in a formal language
 Declarative approach to building an agent (or other system):
 Tell it what it needs to know (or have it Learn the knowledge)
 Then it can Ask itself what to do—answers should follow from the KB
 Agents can be viewed at the knowledge level, i.e., what they know, regardless of how implemented
 A single inference algorithm can answer any answerable question

[Diagram: agent = Knowledge base (domain-specific facts) + Inference engine (generic code)]

Logic
 Syntax: What sentences are allowed? e.g., Sentence: x > y
 Semantics:
  What are the possible worlds? e.g., World1: x = 5, y = 2; World2: x = 2, y = 3
  Which sentences are true in which worlds? (i.e., definition of truth)

[Diagram: sentences α1, α2, α3 in “Syntaxland” map to sets of possible worlds in “Semanticsland”]
Different kinds of logic
 Propositional logic
 Syntax: P ∨ (¬Q ∧ R); X1 ⇔ (Raining ⇒ ¬Sunny)
 Possible world: {P=true,Q=true,R=false,S=true} or 1101
 Semantics: α ∧ β is true in a world iff is α true and β is true (etc.)
 First-order logic
 Syntax: ∀x ∃y P(x,y) ∧ ¬Q(Joe,f(x)) ⇒ f(x)=f(y)
 Possible world: Objects o1, o2, o3; P holds for <o1,o2>; Q holds for <o3,o2 >;
f(o1)=o1; Joe=o3; etc.
 Semantics: φ(σ) is true in a world if σ=oj and φ holds for oj; etc.
Different kinds of logic, contd.
 Relational databases:
 Syntax: ground relational sentences, e.g., Sibling(Ali,Bo)
 Possible worlds: (typed) objects and (typed) relations
 Semantics: sentences in the DB are true, everything else is false
 Cannot express disjunction, implication, universals, etc.
 Query language (SQL etc.) typically some variant of first-order logic
 Often augmented by first-order rule languages, e.g., Datalog
 Knowledge graphs (roughly: relational DB + ontology of types and relations)
 Google Knowledge Graph: 5 billion entities, 500 billion facts, >30% of queries
 Facebook network: 2.8 billion people, trillions of posts, maybe quadrillions of facts
Inference: entailment
 Entailment: α |= β (“α entails β” or “β follows from α”) iff in
every world where α is true, β is also true
 I.e., the α-worlds are a subset of the β-worlds [models(α) ⊆ models(β)]
 In the example, α2 |= α1
 (Say α2 is ¬Q ∧ R ∧ S ∧ W, and α1 is ¬Q)

[Venn diagram: the α2-worlds are a subset of the α1-worlds]
Inference: proofs
 A proof is a demonstration of entailment between α and β
 Sound algorithm: everything it claims to prove is in fact entailed
 Complete algorithm: everything that is entailed can be proved
Inference: proofs
 Method 1: model-checking
  For every possible world, if α is true then make sure that β is true too
  OK for propositional logic (finitely many worlds); not easy for first-order logic
 Method 2: theorem-proving
  Search for a sequence of proof steps (applications of inference rules) leading from α to β
  E.g., from P ∧ (P ⇒ Q), infer Q by Modus Ponens
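Method 1 can be sketched directly: enumerate every model and check that β holds wherever α does. Encoding sentences as Python predicates over a model dict is an assumption of this sketch:

```python
from itertools import product

def entails(symbols, alpha, beta):
    """alpha |= beta iff beta is true in every model where alpha is true,
    i.e., models(alpha) is a subset of models(beta)."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if alpha(model) and not beta(model):
            return False    # found an alpha-world that is not a beta-world
    return True

# Modus Ponens checked semantically: P ∧ (P ⇒ Q) |= Q
alpha = lambda m: m['P'] and ((not m['P']) or m['Q'])
print(entails(['P', 'Q'], alpha, lambda m: m['Q']))             # True
# ...whereas P alone does not entail Q:
print(entails(['P', 'Q'], lambda m: m['P'], lambda m: m['Q']))  # False
```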
Propositional logic syntax
 Given: a set of proposition symbols {X1,X2,…, Xn}
 (we often add True and False for convenience)
 Xi is a sentence
 If α is a sentence then ¬α is a sentence
 If α and β are sentences then α ∧ β is a sentence
 If α and β are sentences then α ∨ β is a sentence
 If α and β are sentences then α ⇒ β is a sentence
 If α and β are sentences then α ⇔ β is a sentence
 And p.s. there are no other sentences!
Propositional logic semantics
 Let m be a model assigning true or false to {X1,X2,…, Xn}
 If α is a symbol then its truth value is given in m
 ¬α is true in m iff α is false in m
 α ∧ β is true in m iff α is true in m and β is true in m
 α ∨ β is true in m iff α is true in m or β is true in m
 α ⇒ β is true in m iff α is false in m or β is true in m
 α ⇔ β is true in m iff α ⇒ β is true in m and β ⇒ α is true in m
Propositional logic semantics in code
function PL-TRUE?(α, model) returns true or false
  if α is a symbol then return Lookup(α, model)
  if Op(α) = ¬ then return not(PL-TRUE?(Arg1(α), model))
  if Op(α) = ∧ then return and(PL-TRUE?(Arg1(α), model),
                               PL-TRUE?(Arg2(α), model))
  etc.
(Sometimes called “recursion over syntax”)
 Sentence: P ∧ (¬Q ∨ R)
 Model/possible world/assignment of values to variables: {P=true, Q=true, R=false} or 110
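A runnable rendering of the pseudocode above, assuming (as an illustration) that a sentence is either a symbol name or a nested tuple (op, arg1[, arg2]):

```python
def pl_true(sentence, model):
    """Truth value of a propositional sentence in a model, by recursion
    over syntax. model maps symbol names to booleans."""
    if isinstance(sentence, str):          # proposition symbol: look it up
        return model[sentence]
    op, *args = sentence
    if op == 'not':
        return not pl_true(args[0], model)
    a, b = (pl_true(s, model) for s in args)
    if op == 'and':     return a and b
    if op == 'or':      return a or b
    if op == 'implies': return (not a) or b
    if op == 'iff':     return a == b
    raise ValueError(f'unknown operator {op!r}')

# P ∧ (¬Q ∨ R) in model {P=true, Q=true, R=false}, i.e. 110:
s = ('and', 'P', ('or', ('not', 'Q'), 'R'))
print(pl_true(s, {'P': True, 'Q': True, 'R': False}))  # False
```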
Example: Partially observable (near-sighted) Pacman
 Pacman knows the map but perceives just wall/gap to NSEW
 Formulation: what variables do we need?
 Wall locations
 Wall_0,0 there is a wall at [0,0]
 Wall_0,1 there is a wall at [0,1], etc. (N symbols for N locations)
 Percepts
 Blocked_W (blocked by wall to my West) etc.
 Blocked_W_0 (blocked by wall to my West at time 0) etc. (4T symbols for T time steps)
 Actions
 W_0 (Pacman moves West at time 0), E_0 etc. (4T symbols)
 Pacman’s location
 At_0,0_0 (Pacman is at [0,0] at time 0), At_0,1_0 etc. (NT symbols)
How many possible worlds?
 N locations, T time steps => N + 4T + 4T + NT = O(NT) variables
 2^O(NT) possible worlds!
 N=200, T=400 => ~10^24000 worlds
 Each world is a complete “history”
 But most of them are pretty weird!
Pacman’s knowledge base: Map
 Pacman knows where the walls are:
 Wall_0,0 ∧ Wall_0,1 ∧ Wall_0,2 ∧ Wall_0,3 ∧ Wall_0,4 ∧ Wall_1,4 ∧ …
 Pacman knows where the walls aren’t!
 ¬Wall_1,1 ∧ ¬Wall_1,2 ∧ ¬Wall_1,3 ∧ ¬Wall_2,1 ∧ ¬Wall_2,2 ∧ …
Pacman’s knowledge base: Initial state
 Pacman doesn’t know where he is
 But he knows he’s somewhere!
 At_1,1_0 ∨ At_1,2_0 ∨ At_1,3_0 ∨ At_2,1_0 ∨ …
 And he knows he’s not in more than one place!
 ¬ (At_1,1_0 ∧ At_1,2_0) ∧ ¬ (At_1,1_0 ∧ At_1,3_0) …
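These initial-state sentences are exactly the kind of thing code can generate mechanically. A sketch using ASCII connectives (~, ^, v) and an illustrative location list:

```python
from itertools import combinations

def initial_state_axioms(locations, t=0):
    """Generate the 'somewhere' disjunction and the pairwise
    'not in two places' negations for Pacman's location at time t.
    Symbol naming follows the slides; the location list is illustrative."""
    at = [f'At_{x},{y}_{t}' for x, y in locations]
    somewhere = ' v '.join(at)                                # he's somewhere!
    not_two = [f'~({a} ^ {b})' for a, b in combinations(at, 2)]  # at most one place
    return somewhere, not_two

somewhere, not_two = initial_state_axioms([(1, 1), (1, 2), (1, 3)])
print(somewhere)     # At_1,1_0 v At_1,2_0 v At_1,3_0
print(len(not_two))  # 3 pairwise constraints for 3 locations
```

Note the quadratic growth of the pairwise constraints: N locations give N(N−1)/2 "not in two places" sentences.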
Pacman’s knowledge base: Sensor model
 State facts about how Pacman’s percepts arise…
 <Percept variable at t> ⇔ <some condition on world at t>
 Pacman perceives a wall to the West at time t
if and only if he is in x,y and there is a wall at x-1,y
 Blocked_W_0 ⇔ ((At_1,1_0 ∧ Wall_0,1) v
(At_1,2_0 ∧ Wall_0,2) v
(At_1,3_0 ∧ Wall_0,3) v …. )
 4T sentences, each of size O(N)
 Note: these are valid for any map
Pacman’s knowledge base: Transition model
 How does each state variable at each time get its value?
 Here we care about location variables, e.g., At_3,3_17
 A state variable X gets its value according to a successor-state axiom
 X_t ⇔ [X_t-1 ∧ ¬(some action_t-1 made it false)] v [¬X_t-1 ∧ (some action_t-1 made it true)]
 For Pacman location:
  At_3,3_17 ⇔ [At_3,3_16 ∧ ¬((¬Wall_3,4 ∧ N_16) v (¬Wall_4,3 ∧ E_16) v …)]
            v [¬At_3,3_16 ∧ ((At_3,2_16 ∧ ¬Wall_3,3 ∧ N_16) v (At_2,3_16 ∧ ¬Wall_3,3 ∧ E_16) v …)]
How many sentences?
 Vast majority of KB occupied by O(NT) transition model sentences
 Each about 10 lines of text
 N=200, T=400 => ~800,000 lines of text, or 20,000 pages
 This is because propositional logic has limited expressive power
 Are we really going to write 20,000 pages of logic sentences???
 No, but your code will generate all those sentences!
 In first-order logic, we need O(1) transition model sentences
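A sketch of that generation step: emitting one successor-state axiom string per location per time step (O(NT) sentences, as counted above), with ASCII connectives ~, ^, v and an illustrative 10×20 grid:

```python
def successor_state_axiom(x, y, t):
    """Successor-state axiom for At_x,y_t as a string, following the
    slides' symbol naming. Grid bounds and formatting are illustrative."""
    # Stay: was here, and no unblocked move took Pacman away
    stay = (f'At_{x},{y}_{t-1} ^ ~((~Wall_{x},{y+1} ^ N_{t-1}) v '
            f'(~Wall_{x+1},{y} ^ E_{t-1}) v (~Wall_{x},{y-1} ^ S_{t-1}) v '
            f'(~Wall_{x-1},{y} ^ W_{t-1}))')
    # Arrive: was adjacent, and an unblocked move brought Pacman here
    arrive = (f'~At_{x},{y}_{t-1} ^ ((At_{x},{y-1}_{t-1} ^ ~Wall_{x},{y} ^ N_{t-1}) v '
              f'(At_{x-1},{y}_{t-1} ^ ~Wall_{x},{y} ^ E_{t-1}) v '
              f'(At_{x},{y+1}_{t-1} ^ ~Wall_{x},{y} ^ S_{t-1}) v '
              f'(At_{x+1},{y}_{t-1} ^ ~Wall_{x},{y} ^ W_{t-1}))')
    return f'At_{x},{y}_{t} <=> [{stay}] v [{arrive}]'

# One axiom per location per time step on a hypothetical 10x20 grid:
kb = [successor_state_axiom(x, y, t)
      for t in range(1, 400) for x in range(1, 11) for y in range(1, 21)]
print(len(kb))  # 200 locations x 399 steps = 79800 sentences
```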
Entails vs. Implies
 Entails: α |= β
 Implies: α ⇒ β
 α ⇒ β is a well-formed sentence in propositional logic
 α |= β is a fact about sets of models where sentences are true
 Intuitive connection?
 KB is a set of sentences (or KB is one sentence with lots of ∧s)
 If α ⇒ β ∈ KB, then α ∧ KB|= β (Modus ponens)
 If you want α ∧ KB|= β, good idea to put α ⇒ β ∈ KB
Some reasoning tasks
 Localization with a map and local sensing:
 Given an initial KB, plus a sequence of percepts and actions, where am I?
 Mapping with a location sensor:
 Given an initial KB, plus a sequence of percepts and actions, what is the map?
 Simultaneous localization and mapping:
 Given …, where am I and what is the map?
 Planning:
 Given …, what action sequence is guaranteed to reach the goal?
 ALL OF THESE USE THE SAME KB AND THE SAME ALGORITHM!!
Summary
 One possible agent architecture: knowledge + inference
 Logics provide a formal way to encode knowledge
 A logic is defined by: syntax, set of possible worlds, truth condition
 A simple KB for Pacman covers the initial state, sensor model, and
transition model
 Logical inference computes entailment relations among sentences,
enabling a wide range of tasks to be solved
