Informed Search
(Best-first search, Greedy best-first search, A* search, Local Search)
(Based on slides of Bart Selman)
Search
• Search strategies determined by choice of node (in
queue) to expand
• Uninformed search:
• Distance to goal not taken into account
• Informed search:
• Information about cost to goal taken into account
• Aside: “Cleverness” about what option to explore next,
almost seems a hallmark of intelligence. E.g., a sense of
what might be a good move in chess or what step to try
next in a mathematical proof. We don’t do blind
search…
Reminder: Perfect distance-to-goal information eliminates search.
[Figure: a breadth-first search tree from the start state (d = 5) to the goal (d = 0), where d = min. distance to goal; at each level the node whose d is exactly one less is selected, so the search walks straight to the goal.]
Practice:
Only have an estimate of the distance to goal (“heuristic information”).
Basic idea: A state evaluation function can effectively guide search.
Also in multi-agent settings. (Chess: board evaluation.)
Reinforcement learning: Learn the state evaluation function.
[Figure: the same breadth-first search tree, start state to goal.]
Perfect “heuristics”: eliminates search.
Approximate heuristics: significantly reduce search.
Best (provably) use of search heuristic info: Best-first / A* search.
Outline
• Depth First Search
• Breadth First Search
• Best-first search
• Greedy best-first search
• A* search
• Heuristics
General Tree Search Paradigm
[Figure: generic tree-search pseudocode]
General Graph Search Paradigm
[Figure: generic graph-search pseudocode, with an explored set]
How to take information into account? Best-first search.
• Idea: use an evaluation function for each node
• Estimate of “desirability” of node
• Expand most desirable unexpanded node first (“best-first search”)
• Heuristic functions:
• f: States → Numbers
• f(n): expresses the quality of the state n
• Allows us to express problem-specific knowledge
• Can be incorporated in a generic way into the algorithms
• Use uniform-cost search (see Figure 3.14), but with f(n) instead of the path cost g(n).
• Queuing based on f(n): order the nodes in the fringe in decreasing order of desirability (see the sketch below)
• Special cases:
• greedy best-first search
• A* search
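A minimal Python sketch of this scheme (not from the slides; the successors/is_goal interfaces are assumptions for illustration). A priority queue keeps the fringe ordered by f(n); plugging in different f’s gives the special cases above:

import heapq
import itertools

def best_first_search(start, successors, is_goal, h, use_g=True):
    # Expand the fringe node with the lowest f value.
    #   f(n) = g(n) + h(n) when use_g is True (A*)
    #   f(n) = h(n)        when use_g is False (greedy best-first)
    # successors(state) yields (step_cost, next_state) pairs (assumed API).
    tie = itertools.count()          # tie-breaker so heapq never compares states
    fringe = [(h(start), next(tie), 0, start, [start])]
    explored = set()                 # repeated-state elimination
    while fringe:
        _, _, g, state, path = heapq.heappop(fringe)
        if is_goal(state):           # goal test on expansion, not generation
            return g, path
        if state in explored:
            continue
        explored.add(state)
        for step, nxt in successors(state):
            if nxt not in explored:
                g2 = g + step
                f2 = (g2 if use_g else 0) + h(nxt)
                heapq.heappush(fringe, (f2, next(tie), g2, nxt, path + [nxt]))
    return None                      # no solution found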
Romanian path finding problem
[Figure: map of Romania, annotated with straight-line distances to Bucharest for Arad’s neighbors: Zerind 374, Sibiu 253, Timisoara 329.]
Straight-line distance to Bucharest can be based, e.g., on GPS info. No map needed.
Searching for a good path from Arad to Bucharest, what is a reasonable “desirability measure” to expand nodes on the fringe?
Greedy best-first search
• Evaluation function at node n: f(n) = h(n) (heuristic)
• = estimate of cost from n to goal
• e.g., hSLD(n) = straight-line distance from n to Bucharest
• Greedy best-first search expands the node that appears to have the shortest path to the goal.
• Idea: those nodes may lead to a solution quickly.
Similar to depth-first search: it prefers to follow a single path to the goal (guided by the heuristic), backing up when it hits a dead-end.
Greedy best-first search example
[Figure sequence: greedy best-first expansion from Arad, choosing minimum hSLD at each step: Arad → Sibiu → Fagaras → Bucharest.]
So: Arad --- Sibiu --- Fagaras --- Bucharest
140 + 99 + 211 = 450
Is it optimal? What are we ignoring?
Also, consider going from Iasi to Fagaras – what can happen?
Properties of greedy best-first search
• Complete? No – can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt…
• But complete in finite space with repeated-state elimination.
• Time? O(b^m) (imagine all nodes have the same distance estimate to the goal)
• but a good heuristic can give dramatic improvement → becomes more similar to depth-first search, with reduced branching.
• Space? O(b^m) – keeps all nodes in memory
• Optimal? No!
• How can we fix this?
(b: maximum branching factor of the search tree; d: depth of the least-cost solution; m: maximum depth of the state space, may be ∞)
A* search
Note: Greedy best-first search expands the node that appears to have the shortest path to the goal. But what about the cost of getting to that node? Take it into account!
• Idea: avoid expanding paths that are already expensive
• Evaluation function f(n) = g(n) + h(n)
• g(n) = cost so far to reach n
• h(n) = estimated cost from n to goal
• f(n) = estimated total cost of path through n to goal
Aside: do we still have the “looping problem”?
Iasi to Fagaras: Iasi → Neamt → Iasi → Neamt…
No! We’ll eventually get out of it: g(n) keeps going up.
A* search example
Using: f(n) = g(n) + h(n)
[Figure sequence: A* expansion from Arad; f values shown at each fringe node as g + h.]
A* search example
Using: f(n) = g(n) + h(n)
What happens if h(Pitesti) = 150?
Bucharest appears on the fringe but is not selected for expansion, since its cost (450) is higher than that of Pitesti (417).
Important to understand for the proof of optimality of A*.
A* search example
Using: f(n) = g(n) + h(n)
Claim: Optimal path found! Arad --- Sibiu --- Rimnicu --- Pitesti --- Bucharest
(Note: Best-first gave Arad --- Sibiu --- Fagaras --- Bucharest.)
1) Can it go wrong?
2) What’s special about “straight-line distance” to goal? It underestimates the true path distance!
3) What if all our estimates to goal are 0? E.g., h(n) = 0 → uniform-cost search.
4) What if we overestimate?
5) What if h(n) is the true distance (h*(n))? What is f(n)? Shortest distance through n --- perfect heuristics --- no search.
A* search algorithm
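As a worked illustration, here is the earlier best_first_search sketch run on the subset of the Romania map used in these examples (edge costs and straight-line distances as in AIMA; including only the cities needed here, with Timisoara and Zerind as dead ends, is a simplification):

graph = {
    'Arad':      [(140, 'Sibiu'), (118, 'Timisoara'), (75, 'Zerind')],
    'Sibiu':     [(99, 'Fagaras'), (80, 'Rimnicu'), (140, 'Arad')],
    'Fagaras':   [(211, 'Bucharest'), (99, 'Sibiu')],
    'Rimnicu':   [(97, 'Pitesti'), (80, 'Sibiu')],
    'Pitesti':   [(101, 'Bucharest'), (97, 'Rimnicu')],
    'Timisoara': [], 'Zerind': [], 'Bucharest': [],
}
h_sld = {'Arad': 366, 'Sibiu': 253, 'Fagaras': 176, 'Rimnicu': 193,
         'Pitesti': 100, 'Timisoara': 329, 'Zerind': 374, 'Bucharest': 0}

cost, path = best_first_search('Arad',
                               successors=lambda s: graph[s],
                               is_goal=lambda s: s == 'Bucharest',
                               h=lambda s: h_sld[s])
print(cost, path)
# 418 ['Arad', 'Sibiu', 'Rimnicu', 'Pitesti', 'Bucharest']
# With use_g=False (greedy), it returns the 450-cost Fagaras route instead.

Note that Bucharest first enters the fringe with f = 450 via Fagaras, but is only expanded once the cheaper 418 route surfaces – exactly the behavior on the previous slide.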
A* properties
• Under some reasonable conditions on the heuristics, we have:
• Complete
• Yes, unless there are infinitely many nodes with f(n) < f(Goal)
• Time
• Sub-exponential growth when |h(n) − h*(n)| ≤ O(log h*(n))
• So, a good heuristic can bring exponential search down significantly!
• Space
• Fringe nodes kept in memory. Often exponential. Solution: IDA*
• Optimal
• Yes (under admissible heuristics; discussed next)
• Also, optimal use of heuristic information. Provably: can’t do better!
• Widely used. After almost 40 yrs, still new applications found.
Heuristics: (1) Admissibility
• A heuristic h(n) is admissible if for every node n, h(n) ≤
h*(n), where h*(n) is the true cost to reach the goal state
from n.
• An admissible heuristic never overestimates the cost to
reach the goal, i.e., it is optimistic. (But no info of where
the goal is if set to 0.)
• Example: hSLD(n) (never overestimates the actual road
distance)
• Note: it follows that h(goal) = 0.
Note: less optimistic heuristics push nodes to be expanded later, and can prune a lot more.
Heuristics: (2) Consistency
A heuristic is consistent (or monotone) if for every node n and every successor n' of n generated by any action a,
h(n) ≤ c(n,a,n') + h(n')
(a form of the triangle inequality)
If h is consistent, we have
f(n') = g(n') + h(n')
      = g(n) + c(n,a,n') + h(n')
      ≥ g(n) + h(n)
      = f(n)
So f(n') ≥ f(n), i.e., f(n) is non-decreasing along any path.
→ The sequence of nodes expanded by A* is in nondecreasing order of f(n).
→ The first goal selected for expansion must be an optimal goal.
Note: Monotonicity is a stronger condition than admissibility. Any consistent heuristic is also admissible. (Exercise 3.29)
A*: Tree Search vs. Graph Search
TREE SEARCH (See Fig. 3.7; used in earlier
examples):
If h(n) is admissible, A* using tree search is
optimal.
GRAPH SEARCH (See Fig. 3.7) A modification of tree search
that includes an “explored set” (or “closed list”; list of expanded nodes to
avoid re-visiting the same state); if the current node matches a node on the
closed list, it is discarded instead of being expanded. In order to guarantee
optimality of A*, we need to make sure that the optimal path to any repeated
state is always the first one followed:
If h(n) is monotonic, A* using graph search is
optimal.
Intuition: Contours of A*
A* expands nodes in order of increasing f value.
A* expands all nodes with f(n) < C*.
It gradually adds “f-contours” of nodes: contour i contains all nodes with f ≤ f_i, where f_i < f_{i+1}.
Uniform-cost search (h(n) = 0) expands in circles.
Note: with uniform cost (h(n) = 0) the bands will be circular around the start state.
Completeness (intuition)
[Figure: f-contours at 380, 400, 420 around the start state.]
As we add bands of increasing f, we must eventually reach a band where f equals the cost of the path to a goal state (assuming b is finite and each step cost exceeds some positive finite ε).
Optimality (intuition)
The 1st solution found (goal node expanded) must be an optimal one, since goal nodes in subsequent contours will have higher f-cost and therefore higher g-cost (since h(goal) = 0).
A* Search: Optimality
• Theorem:
A* used with a consistent heuristic ensures optimality
with graph search.
A* Search: Pseudocode
• Open list: consists of nodes that have been visited but not expanded (meaning that successors have not been explored yet). This is the list of pending tasks.
• Closed list: consists of nodes that have been visited and expanded (successors have been explored already and included in the open list, if this was the case).
• Proof:
(1) If h(n) is consistent, then the values of f(n) along any path are non-decreasing. See the consistent-heuristics slide.
(2) Whenever A* selects a node n for expansion, the optimal path to that node has been found. Why? Assume not. Then the optimal path, P, must have some not-yet-expanded nodes. (*) Thus, on P, there must be an unexpanded node n' on the current frontier (because of graph separation; Fig. 3.9; the frontier separates the explored region from the unexplored region). But, because f is nondecreasing along any path, n' would have a lower f-cost than n and would have been selected for expansion before n. Contradiction.
From (1) and (2), it follows that the sequence of nodes expanded by A* using graph search is in non-decreasing order of f(n). Thus, the first goal node selected must have the optimal path, because f(n) is the true path cost for goal nodes (h(Goal) = 0), and all later goal nodes have paths that are at least as expensive. QED
(*) requires a bit of thought: one must argue that there cannot be a shorter path going only through expanded nodes (by contradiction).
Note: Termination / Completeness
• Termination is guaranteed when the number of nodes with f(n) ≤ f* is finite.
• Non-termination can only happen when
• there is a node with an infinite branching factor, or
• there is a path with a finite cost but an infinite number of nodes along it.
• This can be avoided by assuming that the cost of each action is larger than a positive constant δ.
A* Optimal in Another Way
• It has also been shown that A* makes optimal use of the heuristic, in the sense that there is no search algorithm that could expand fewer nodes using the heuristic information (and still find the optimal / least-cost solution).
• So, A* is “the best we can get” (in this setting).
• Note: We’re assuming a search based approach with
states/nodes, actions on them leading to other
states/nodes, start and goal states/nodes.
Heuristics
8-Puzzle
• Slide the tiles horizontally or vertically into the empty space until the configuration matches the goal configuration.
• What’s the branching factor? (we slide the “empty space”)
About 3, depending on the location of the empty tile: middle → 4; corner → 2; edge → 3.
The average solution cost for a randomly generated 8-puzzle instance is about 22 steps.
So, the search space to depth 22 is about 3^22 ≈ 3.1 × 10^10 states.
→ Reduced by a factor of about 170,000 by keeping track of repeated states (9!/2 = 181,440 distinct states; note: 2 sets of disjoint states, see Exercise 3.4).
But: the 15-puzzle has about 10^13 distinct states!
We’d better find a good heuristic to speed up search! Can you suggest one?
Note: “Clever” heuristics now allow us to solve the 15-puzzle in a few milliseconds!
Admissible heuristics
E.g., for the 8-puzzle:
h1(n) = number of misplaced tiles
h2(n) = total Manhattan distance (i.e., no. of steps of each tile from its desired location)
h1(Start) = ? 8
h2(Start) = ? 3+1+2+2+2+3+3+2 = 18
(True cost = 26.) Why are these heuristics admissible? Which is better?
How can we get the optimal heuristic? (Given h_opt(Start) = 26, how would we find the next board on the optimal path to the goal?)
Desired properties of heuristics:
(1) consistent (admissible)
(2) as close to optimal as we can get (sometimes go a bit over…)
(3) easy to compute! We want to explore many nodes.
Note: each empty-square move = 1 step tile move. (See the sketch below.)
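A small Python sketch of these two heuristics (the tuple board encoding, and the particular start/goal layout of the AIMA example these numbers come from, are assumptions):

def h1(board, goal):
    # Number of misplaced tiles (the blank, 0, is not counted).
    return sum(1 for b, g in zip(board, goal) if b != 0 and b != g)

def h2(board, goal):
    # Total Manhattan distance of each tile from its goal square (3x3 board).
    pos = {tile: i for i, tile in enumerate(board)}
    total = 0
    for square, tile in enumerate(goal):
        if tile != 0:
            r1, c1 = divmod(pos[tile], 3)
            r2, c2 = divmod(square, 3)
            total += abs(r1 - r2) + abs(c1 - c2)
    return total

start = (7, 2, 4, 5, 0, 6, 8, 3, 1)   # board[i] = tile on square i; 0 = blank
goal  = (0, 1, 2, 3, 4, 5, 6, 7, 8)
print(h1(start, goal), h2(start, goal))   # 8 18, matching the slide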
Comparing heuristics
• Effective branching factor, b*
• If A* generates N nodes to find the goal at depth d,
b* = the branching factor such that a uniform tree of depth d contains N + 1 nodes (we add one for the root node, which wasn’t included in N):
N + 1 = 1 + b* + (b*)^2 + … + (b*)^d
E.g., if A* finds a solution at depth 5 using 52 nodes, then the effective branching factor is 1.92.
• b* close to 1 is ideal
• because this means the heuristic guided the A* search close to the ideal (linear) case.
• If b* were 100, on average the heuristic had to consider 100 children for each node.
• Compare heuristics based on their b* (a numeric sketch follows below).
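Since the defining equation has no closed form for b*, it can be solved numerically; a minimal sketch (bisection on the monotone polynomial sum is an assumed, illustrative approach):

def effective_branching_factor(N, d, tol=1e-6):
    # Solve N + 1 = 1 + b + b^2 + ... + b^d for b by bisection.
    def total(b):
        return sum(b ** i for i in range(d + 1))
    lo, hi = 1.0, float(N)            # b* lies between 1 and N
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < N + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(effective_branching_factor(52, 5), 2))   # 1.92, as above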
Comparison of heuristics
[Table: nodes generated and effective branching factors b* for A*(h1) vs. A*(h2) at various depths d of the goal node.]
h2 is indeed significantly better than h1.
Dominating heuristics
• h2 is always better than h1
• because for any node n, h2(n) ≥ h1(n). (Why?)
We say h2 dominates h1.
It follows that h1 will expand at least as many nodes as h2. Because:
Recall that all nodes with f(n) < C* will be expanded.
This means all nodes with h(n) + g(n) < C* will be expanded.
So, all nodes n with h(n) < C* − g(n) will be expanded.
All nodes h2 expands will also be expanded by h1, and because h1 is smaller, others may be expanded as well.
Inventing admissible heuristics:
Relaxed Problems
• Can we generate h(n) automatically?
• Simplify the problem by reducing restrictions on actions.
• A problem with fewer restrictions on the actions is called a relaxed problem.
Examples of relaxed problems
• Original: A tile can move from square A to square B iff
• (1) A is horizontally or vertically adjacent to B and (2) B is blank
• Relaxed versions:
• A tile can move from A to B if A is adjacent to B (“overlap”; Manhattan
distance)
• A tile can move from A to B if B is blank (“teleport”)
• A tile can move from A to B (“teleport and overlap”)
• Key: Solutions to these relaxed problems can be computed without search, and therefore provide a heuristic that is easy/fast to compute.
This technique was used by ABSOLVER (1993) to invent heuristics for the 8-puzzle
better than existing ones and it also found a useful heuristic for famous Rubik’s
cube puzzle.
Inventing admissible heuristics:
Relaxed Problems
• The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem. Why?
1) The optimal solution of the original problem is also a solution to the relaxed problem (it satisfies, in addition, all of the relaxed constraints). So, the optimal relaxed solution costs at most as much as the original optimal solution.
2) The relaxed problem has fewer constraints. So, there may be other, less expensive solutions, giving a lower-cost (admissible) relaxed solution.
What if we have multiple heuristics available? I.e., h_1(n), h_2(n), …
h(n) = max {h_1(n), h_2(n), …, h_m(n)}
If the component heuristics are admissible, so is the composite (a sketch follows below).
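A one-line sketch of the composite (names are illustrative):

def h_composite(n, heuristics):
    # Max over admissible heuristics is admissible:
    # each h_i(n) <= h*(n), hence max_i h_i(n) <= h*(n).
    return max(h(n) for h in heuristics)

# e.g., for the 8-puzzle, using the earlier h1 and h2:
# h = lambda b: h_composite(b, [lambda x: h1(x, goal), lambda x: h2(x, goal)])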
Inventing admissible heuristics:
Sub-problem solutions as heuristic
• What is the optimal cost of solving some portion of original
problem?
• subproblem solution is heuristic of original problem
Pattern Databases
• Store optimal solutions to subproblems in database
• We use an exhaustive search to solve every
permutation of the 1,2,3,4-piece subproblem of the 8-
puzzle
• During solution of 8-puzzle, look up optimal cost to
solve the 1,2,3,4-piece sub-problem and use as
heuristic
• Other configurations can be considered
Inventing admissible heuristics:
Learning
Admissible heuristics can also be learned automatically, using machine learning techniques, e.g., inductive learning and reinforcement learning.
Generally, you try to learn a “state-evaluation” function or “board
evaluation” function. (How desirable is state in terms of getting to the
goal?) Key: What “features / properties” of state are most useful?
More later…
Local Search
(Hill-climbing, Simulated Annealing)
(Based on slides of Mausam, Padhraic Smyth, Stuart Russell,
Rao Kambhampati, Raj Rao, Dan Weld, Jim Martin…)
Outline
• Local search techniques and optimization
• Hill-climbing
• Simulated annealing
• Issues with local search
Local search and optimization
• Previous lecture: path to goal is solution to problem
– systematic exploration of search space.
• This lecture: a state is solution to problem
– for some problems path is irrelevant.
– E.g., 8-queens
• Different algorithms can be used
– Depth First Branch and Bound
– Local search
Goal
• Satisfaction: reach the goal node → constraint satisfaction
• Optimization: optimize (objective fn) → constraint optimization
You can go back and forth between the two problems.
Typically in the same complexity class.
Local search and optimization
• Sometimes referred to as iterative improvement or
local search.
• We’ll talk about three simple but effective
techniques:
• Hillclimbing
• Random Restart Hillclimbing
• Simulated Annealing
Optimization Framework
• Working with 1 state in memory
• No agenda/queue/fringe…
• Usually generating new states from this 1 state in an attempt to improve things
• The goal notion is slightly different
• Normally solutions are easy to find
• We can compare solutions and say one is better than another
• The goal is usually optimization of some function of the “solution” (cost).
Local search and optimization
• Local search
• Keep track of single current state
• Move only to neighboring states
• Ignore paths
• Advantages:
• Use very little memory
• Can often find reasonable solutions in large or infinite (continuous)
state spaces.
• “Pure optimization” problems
• All states have an objective function
• Goal is to find state with max (or min) objective value
• Does not quite fit into path-cost/goal-state formulation
• Local search can do quite well on these problems.
Trivial Algorithms
• Random Sampling
• Generate a state randomly
• Random Walk
• Randomly pick a neighbor of the current state
• Both algorithms are asymptotically complete.
Hill-climbing Search
• Generate nearby successor states to the current state based on
some knowledge of the problem.
• Pick the best of the bunch and replace the current state with that
one.
• Loop (until?)
Hill-climbing Search
• Implicit in this scheme is the notion of a neighborhood that in
some way preserves the cost behavior of the solution space.
• The successor function is where the intelligence lies in hill-
climbing search
• It has to be conservative enough to preserve significant “good”
portions of the current solution
Hill-climbing (Greedy Local Search)
max version
function HILL-CLIMBING(problem) returns a state that is a local maximum
  inputs: problem, a problem
  local variables: current, a node
                   neighbor, a node
  current ← MAKE-NODE(INITIAL-STATE[problem])
  loop do
    neighbor ← a highest-valued successor of current
    if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
    current ← neighbor

The min version reverses the inequality and looks for the lowest-valued successor.
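A direct Python rendering of this loop (a sketch; the neighbors/value interfaces are assumed):

def hill_climbing(initial, neighbors, value):
    # Steepest-ascent hill climbing, max version.
    current = initial
    while True:
        succs = neighbors(current)
        if not succs:
            return current
        best = max(succs, key=value)
        if value(best) <= value(current):   # no uphill move: local maximum
            return current
        current = best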
Hill-climbing search
• “a loop that continuously moves towards increasing value”
• terminates when a peak is reached
• Aka greedy local search
• Value can be either
• Objective function value
• Heuristic function value (minimized)
• Hill climbing does not look ahead of the immediate neighbors
• Can randomly choose among the set of best successors
• if multiple have the best value
• “climbing Mount Everest in a thick fog with amnesia”
“Landscape” of Hill-climbing search
[Figure: state-space landscape with local maxima, plateaus/shoulders, and a global maximum.]
Hill climbing gets stuck in local minima – depending on?
Example: n-queens
• Put n queens on an n x n board with no two queens on the same
row, column, or diagonal
• Is it a satisfaction problem or optimization?
Hill-climbing search: 8-queens problem
[Figure: an 8-queens board configuration.]
• Need to convert to an optimization problem
• h = number of pairs of queens that are attacking each other
• h = 17 for the above state
Search Space
• State
• All 8 queens on the board in some configuration
• Successor function
• move a single queen to another square in the same column.
• Example of a heuristic function h(n):
• the number of pairs of queens that are attacking each other
• (so we want to minimize this)
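A sketch of this h in Python (the encoding state[c] = row of the queen in column c follows the one-queen-per-column successor function above):

from itertools import combinations

def attacking_pairs(state):
    # Number of pairs of queens attacking each other (same row or same
    # diagonal; with one queen per column, column attacks cannot occur).
    h = 0
    for c1, c2 in combinations(range(len(state)), 2):
        same_row = state[c1] == state[c2]
        same_diagonal = abs(state[c1] - state[c2]) == abs(c1 - c2)
        if same_row or same_diagonal:
            h += 1
    return h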
Hill-climbing search: 8-queens problem
[Figure: another 8-queens board configuration.]
• Is this a solution? What is h?
Hill-climbing on 8-queens
• From randomly generated 8-queens starting states…
• 14% of the time it solves the problem
• 86% of the time it gets stuck at a local minimum
• However…
• It takes only 4 steps on average when it succeeds
• And 3 on average when it gets stuck
• (for a state space with 8^8 ≈ 17 million states)
Hill Climbing Drawbacks
• Local maxima
• Plateaus
• Diagonal ridges
Escaping Shoulders: Sideways Move
• If no downhill (uphill) moves, allow sideways
moves in hope that algorithm can escape
• Need to place a limit on the possible number of
sideways moves to avoid infinite loops
• For 8-queens
• Now allow sideways moves, with a limit of 100
• Raises the percentage of problem instances solved from 14% to 94%
• However….
• 21 steps on average for every successful solution
• 64 for each failure
Local Minima/Maxima
• Hill-climbing is subject to getting stuck in a variety of
local conditions…
• Solutions
• Random-walk hill-climbing
• Random-restart hill-climbing
• Hill-climbing with both
• Simulated Annealing
Hill-climbing: stochastic variations
• Stochastic hill-climbing
• Random selection among the uphill moves.
• The selection probability can vary with the steepness of the uphill
move.
• To avoid getting stuck in local minima/maxima
• Random-walk hill-climbing
• Random-restart hill-climbing
• Hill-climbing with both
Hill Climbing: stochastic variations
→When the state-space landscape has local minima, any search
that moves only in the greedy direction cannot be complete
→Random walk, on the other hand, is
asymptotically complete
Idea: Put random walk into greedy hill-climbing
Hill-climbing with random restarts
• If at first you don’t succeed, try, try again!
• Pretty obvious what this is….
• Generate a random start state
• Run hill-climbing and store answer
• Iterate, keeping the current best answer as you go
• Stopping… when?
• If you want to pick one local search algorithm, learn this one!!
Hill-climbing with random walk
• At each step do one of the two
• Greedy: With prob p move to the neighbor with largest
value
• Random: With prob 1-p move to a random neighbor
Hill-climbing with both
• At each step do one of the three
– Greedy: move to the neighbor with largest value
– Random Walk: move to a random neighbor
– Random Restart: Resample a new current state
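A sketch combining these ideas (interfaces assumed as before; the step budget and probabilities are illustrative):

import random

def hill_climbing_with_restarts(random_state, neighbors, value,
                                restarts=25, p_walk=0.1, max_steps=1000):
    # Random-restart hill climbing with occasional random-walk steps.
    best = None
    for _ in range(restarts):
        current = random_state()             # resample a new current state
        for _ in range(max_steps):
            succs = neighbors(current)
            if not succs:
                break
            if random.random() < p_walk:     # random-walk step
                current = random.choice(succs)
            else:                            # greedy step
                nxt = max(succs, key=value)
                if value(nxt) <= value(current):
                    break                    # local maximum reached
                current = nxt
        if best is None or value(current) > value(best):
            best = current                   # keep the best answer so far
    return best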
Simulated Annealing
• Based on a metallurgical metaphor
• Start with a temperature set very high and slowly reduce it.
• Run hillclimbing with the twist that you can occasionally
replace the current state with a worse state based on the
current temperature and how much worse the new state is.
Simulated Annealing
• Simulated Annealing = physics inspired twist on random walk
• Basic ideas:
• like hill-climbing identify the quality of the local improvements
• instead of picking the best move, pick one randomly
• say the change in objective function is Δ
• if Δ is positive, then move to that state
• otherwise:
• move to this state with probability proportional to e^(Δ/T)
• thus: worse moves (very large negative Δ) are executed less often
• however, there is always a chance of escaping from local maxima
• over time, make it less likely to accept locally bad moves
• (Can also make the size of the move random as well, i.e., allow “large” steps in state space)
Simulated annealing
function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to temperature
  local variables: current, a node
                   next, a node
                   T, a “temperature” controlling the probability of downward steps
  current ← MAKE-NODE(INITIAL-STATE[problem])
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ∆E ← VALUE[next] − VALUE[current]
    if ∆E > 0 then current ← next
    else current ← next only with probability e^(∆E/T)
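The same loop in Python (a sketch; the schedule and interfaces are assumptions, and the geometric cooling schedule shown is just one common choice):

import math
import random

def simulated_annealing(initial, neighbors, value, schedule):
    # Max version: always accept improvements; accept a worse move
    # with probability e^(dE/T), which shrinks as T cools.
    current = initial
    t = 0
    while True:
        t += 1
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random.choice(neighbors(current))
        delta_e = value(nxt) - value(current)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt

# e.g., geometric cooling: T starts at 100 and decays by 1% per step
# schedule = lambda t: 0 if t > 10_000 else 100 * (0.99 ** t)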
Temperature T
• high T: probability of “locally bad” move is higher
• low T: probability of “locally bad” move is lower
• typically, T is decreased as the algorithm runs longer
• i.e., there is a “temperature schedule”
Simulated Annealing in Practice
• method proposed in 1983 by IBM researchers for solving
VLSI layout problems (Kirkpatrick et al, Science,
220:671-680, 1983).
• theoretically will always find the global optimum (given a sufficiently slow cooling schedule)
• Other applications: Traveling salesman, Graph
partitioning, Graph coloring, Scheduling, Facility Layout,
Image Processing, …
• useful for some problems, but can be very slow
• slowness comes about because T must be decreased very
gradually to retain optimality