Python AI Programming Guide
https://aipython.org https://artint.info
    ©David L Poole and Alan K Mackworth 2017-2023.
    All code is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
    This document and all the code can be downloaded from
https://artint.info/AIPython/ or from https://aipython.org
    The authors and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
AIPython contains runnable code for the book Artificial Intelligence: Foundations of Computational Agents, 3rd Edition [Poole and Mackworth, 2023].
1.4       Pitfalls
It is important to know when side effects occur. Often AI programs consider
what would/might happen given certain conditions. In many such cases, we
don’t want side effects. When an agent acts in the world, side effects are ap-
propriate.
     In Python, you need to be careful to understand side effects. For example,
the inexpensive function to add an element to a list, namely append, changes the
list. In a functional language like Haskell or Lisp, adding a new element to a
list, without changing the original list, is a cheap operation. For example, if x is
a list containing n elements, adding an extra element to the list in Python (using
append) is fast, but it has the side effect of changing the list x. To construct a new
list that contains the elements of x plus a new element, without changing the
value of x, entails copying the list, or using a different representation for lists.
In the searching code, we will use a different representation for lists for this
reason.
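For example, the following minimal illustration (not from the AIPython distribution) shows the difference between mutating a list and constructing a new one:

x = [1, 2, 3]
y = x              # y is another name for the same list
x.append(4)        # fast, but changes the list that both x and y refer to
print(y)           # [1, 2, 3, 4]
z = x + [5]        # builds a new list by copying x; x is unchanged
print(x, z)        # [1, 2, 3, 4] [1, 2, 3, 4, 5]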
The comprehension [fe for e in iter if cond] enumerates the values fe for each e in iter for which cond is true. The “if cond” part is optional, but the “for” and “in” are not. Here e is a variable (or a pattern that can be on the left side of =), iter is an iterator, which can generate a stream of data, such as a list, a set, a range object (to enumerate integers in a range), or a file; cond is an expression that evaluates to either True or False for each e; and fe is an expression that is evaluated for each value of e for which cond returns True. (See https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries.)
    The result can go in a list, be used in another iteration, or be called
directly using next. The procedure next takes an iterator and returns the next
element (advancing the iterator); it raises a StopIteration exception if there is
no next element. The following shows a simple example, where user input is
prepended with >>>
>>>   [e*e for e in range(20) if e%2==0]
[0,   4, 16, 36, 64, 100, 144, 196, 256, 324]
>>>   a = (e*e for e in range(20) if e%2==0)
>>>   next(a)
0
>>> next(a)
4
>>> next(a)
16
>>> list(a)
[36, 64, 100, 144, 196, 256, 324]
>>> next(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Notice how list(a) continued the enumeration and consumed it to the end.
    Comprehensions can also be used for dictionaries. The following code cre-
ates an index for list a:
>>> a = ["a","f","bar","b","a","aaaaa"]
>>> ind = {a[i]:i for i in range(len(a))}
>>> ind
{'a': 4, 'f': 1, 'bar': 2, 'b': 3, 'aaaaa': 5}
>>> ind['b']
3
which means that 'b' is at index 3 of the list.
  The assignment of ind could also have been written as:
>>> ind = {val:i for (i,val) in enumerate(a)}
where enumerate is a built-in function that, given a list (or other iterable),
returns an iterator of (index, value) pairs.
The name of the file is given in the gray text above the listing. The numbers correspond to the line numbers in that file.
     Try to predict, and then test, the output of the following calls, remembering
     that the function uses the latest value of any variable that is not bound in
     the function call:
pythonDemo.py — (continued)
29   # in Shell do
30   ## ipython -i pythonDemo.py
31   # Try these (copy text after the comment symbol and paste in the Python
         prompt):
32   # print([f(10) for f in fun_list1])
33   # print([f(10) for f in fun_list2])
34   # print([f(10) for f in fun_list3])
35   # print([f(10) for f in fun_list4])
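     The definitions of fun_list1 to fun_list4 are elided above; the following is a
     minimal sketch consistent with the description below, not necessarily the
     distribution's exact code:

fun_list1 = []
for i in range(5):
    def fun1(e):
        return e + i            # i is looked up when fun1 is called (late binding)
    fun_list1.append(fun1)

fun_list2 = []
for i in range(5):
    def fun2(e, iv=i):          # the default value iv is computed when fun2
        return e + iv           # is defined (early binding)
    fun_list2.append(fun2)

fun_list3 = [lambda e: e + i for i in range(5)]          # late binding
fun_list4 = [lambda e, iv=i: e + iv for i in range(5)]   # early binding

print([f(10) for f in fun_list1])   # [14, 14, 14, 14, 14]; i is 4 after the loop
print([f(10) for f in fun_list2])   # [10, 11, 12, 13, 14]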
     In the first for-loop, the function fun1 uses i, whose value is the last value it
     was assigned. In the second loop, the function fun2 uses iv; there is a separate
     iv variable for each function, and its value is the value of i when the function
     was defined. Thus fun1 uses late binding and fun2 uses early binding. fun_list3
     and fun_list4 are equivalent to the first two (except that fun_list4 uses a
     different i variable).
         One of the advantages of using the embedded definitions (as in fun1 and
     fun2 above) over lambda is that it is possible to add a __doc__ string, which
     is the standard way to document functions in Python, to the embedded
     definitions.
     1.5.4 Generators
     Python has generators which can be used for a form of lazy evaluation – only
     computing values when needed.
         The yield command returns a value that is obtained with next. It is typi-
     cally used to enumerate the values for a for loop or in generators. (The yield
     command can also be used for coroutines, but AIPython only uses it for gener-
     ators.)
         A version of the built-in range, with 2 or 3 arguments (and positive steps)
     can be implemented as:
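     That listing is elided here; a sketch of such a generator (assuming only
     positive steps, as stated) is:

def myrange(start, stop, step=1):
    """enumerates the values from start, in steps of size step,
    that are less than stop"""
    assert step > 0, f"only positive steps implemented: {step}"
    i = start
    while i < stop:
        yield i
        i += step

print(list(myrange(2, 30, 3)))   # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]

     The generator ga below, from pythonDemo.py, yields the squares of the even
     nonnegative integers less than n: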
                                   pythonDemo.py — (continued)
49   def ga(n):
50       """generates square of even nonnegative integers less than n"""
51       for e in range(n):
52           if e%2==0:
53               yield e*e
54   a = ga(20)
     The sequence of next(a) and list(a) calls gives exactly the same results as the
     comprehension in Section 1.5.2.
        It is straightforward to write a version of the built-in enumerate called myenumerate:
                                   pythonDemo.py — (continued)
56   def myenumerate(enum):
57       for i in range(len(enum)):
58           yield i,enum[i]
Exercise 1.2 Write a version of enumerate where the only iteration is “for val in
enum”. Hint: keep track of the index.
     At the end of pythonDemo.py are some commented-out commands you should try in
     interactive mode. Cut from the file and paste into Python (and remember to
     remove the comment symbols and leading space).
     1.7        Utilities
     1.7.1 Display
     In this distribution, to keep things simple and to only use standard Python, we
     use a text-oriented tracing of the code. A graphical depiction of the code can
     override the definition of display (see SearcherGUI in Section 3.2.2 and ConsistencyGUI
     in Section 4.4.2).
         The method self.display is used to trace the program. Any call

         self.display(level, to print, ...)

     where the level is less than or equal to the value of max_display_level will be
     printed. The to print ... can be anything that is accepted by the built-in print
     (including any keyword arguments).
         The definition of display is:
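     The listing is elided here; a sketch of the Displayable class, consistent with
     the description below, is:

class Displayable(object):
    """Class that provides display. The amount of detail printed is
    controlled by max_display_level."""
    max_display_level = 1   # can be overridden in subclasses or instances

    def display(self, level, *args, **nargs):
        """print the arguments if level is less than or equal to the
        current max_display_level (on the instance or the class)"""
        if level <= self.max_display_level:
            print(*args, **nargs)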
     Note that args gets a tuple of the positional arguments, and nargs gets a
     dictionary of the keyword arguments. This will not work in Python 2, where it
     gives an error.
        Any class that wants to use display can be made a subclass of Displayable.
        To change the maximum display level to say 3, for a class do:
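     For a class named, say, ClassName (a stand-in name), that is:

ClassName.max_display_level = 3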
     which will make calls to display in that class print when the value of level is
     less than or equal to 3. The default display level is 1. It can also be changed
     for individual objects (the object value overrides the class value).
        The value of max_display_level by convention is:

0 display nothing

2 also display the values as they change (little detail through a loop)
display.py — (continued)
26   def visualize(func):
27       """A decorator for algorithms that do interactive visualization.
28       Ignored here.
29       """
30       return func
     1.7.2 Argmax
      Python has a built-in max function that takes a generator (or a list or set) and
      returns the maximum value. The argmax method returns the index of an element
      that has the maximum value. If there are multiple elements with the maximum
      value, one of the indexes to that value is returned at random. argmaxe
      assumes an enumeration: a generator of (element, value) pairs, such as is
      generated by the built-in enumerate(list) for lists or dict.items() for
      dictionaries.
                                 utilities.py — AIPython useful utilities
11   import random
12   import math
13
14   def argmaxall(gen):
15       """gen is a generator of (element,value) pairs, where value is a real.
16       argmaxall returns a list of all of the elements with maximal value.
17       """
18       maxv = -math.inf      # negative infinity
19       maxvals = []     # list of maximal elements
20       for (e,v) in gen:
21           if v>maxv:
22               maxvals,maxv = [e], v
23           elif v==maxv:
24               maxvals.append(e)
25       return maxvals
26
27   def argmaxe(gen):
28       """gen is a generator of (element,value) pairs, where value is a real.
29       argmaxe returns an element with maximal value.
30       If there are multiple elements with the max value, one is returned at
             random.
31       """
32       return random.choice(argmaxall(gen))
33
34   def argmax(lst):
35       """returns maximum index in a list"""
36       return argmaxe(enumerate(lst))
37   # Try:
38   # argmax([1,6,3,77,3,55,23])
39
40   def argmaxd(dct):
41      """returns the arg max of a dictionary dct"""
42      return argmaxe(dct.items())
43   # Try:
 44   # argmaxd({2:5,5:9,7:7})
     Exercise 1.3 Change argmax to have an optional argument that specifies whether
     you want the “first”, “last” or a “random” index of the maximum value returned.
     If you want the first or the last, you don’t need to keep a list of the maximum
     elements.
     1.7.3 Probability
     For many of the simulations, we want to make a variable True with some prob-
     ability. flip(p) returns True with probability p, and otherwise returns False.
                                     utilities.py — (continued)
45   def flip(prob):
46       """return true with probability prob"""
47       return random.random() < prob
         The pick_from_dist method takes an item:probability dictionary, and returns
      one of the items in proportion to its probability.
                                     utilities.py — (continued)
49   def pick_from_dist(item_prob_dist):
50       """ returns a value from a distribution.
51       item_prob_dist is an item:probability dictionary, where the
52           probabilities sum to 1.
53       returns an item chosen in proportion to its probability
54       """
55       ranreal = random.random()
56       for (it,prob) in item_prob_dist.items():
57           if ranreal < prob:
58               return it
59           else:
60               ranreal -= prob
61       raise RuntimeError(f"{item_prob_dist} is not a probability
             distribution")
      The dict_union(d1, d2) function returns a new dictionary containing the keys
      of both d1 and d2, without side effects on either:

utilities.py — (continued)
63   def dict_union(d1,d2):
64       """returns a dictionary that contains the keys of d1 and d2.
65      The value for each key that is in d2 is the value from d2,
66      otherwise it is the value from d1.
67      This does not have side effects.
68      """
69      d = dict(d1) # copy d1
70      d.update(d2)
71      return d
utilities.py — (continued)
73   def test():
74       """Test part of utilities"""
75       assert argmax([1,6,55,3,55,23]) in [2,4]
76       assert dict_union({1:4, 2:5, 3:4},{5:7, 2:9}) == {1:4, 2:9, 3:4, 5:7}
77       print("Passed unit test in utilities")
78
79   if __name__ == "__main__":
80       test()
      The following does a simple check of all of AIPython that has automatic checks.
      If you develop new algorithms or tests, add them here!
utilities.py — (continued)
82   def test_aipython():
83       # Agents: currently no tests
84       # Search:
85       print("***** testing Search *****")
86       import searchGeneric, searchBranchAndBound, searchExample, searchTest
87       searchGeneric.test(searchGeneric.AStarSearcher)
88       searchBranchAndBound.test(searchBranchAndBound.DF_branch_and_bound)
89       searchTest.run(searchExample.problem1,"Problem 1")
90       # CSP
91       print("\n***** testing CSP *****")
92       import cspExamples, cspDFS, cspSearch, cspConsistency, cspSLS
93       cspExamples.test_csp(cspDFS.dfs_solve1)
94       cspExamples.test_csp(cspSearch.solver_from_searcher)
95       cspExamples.test_csp(cspConsistency.ac_solver)
96       cspExamples.test_csp(cspConsistency.ac_search_solver)
 97        cspExamples.test_csp(cspSLS.sls_solver)
 98        cspExamples.test_csp(cspSLS.any_conflict_solver)
 99        # Propositions
100        print("\n***** testing Propositional Logic *****")
101        import logicBottomUp, logicTopDown, logicExplain, logicNegation
102        logicBottomUp.test()
103        logicTopDown.test()
104        logicExplain.test()
105        logicNegation.test()
106        # Planning
107        print("\n***** testing Planning *****")
108        import stripsHeuristic
109        stripsHeuristic.test_forward_heuristic()
110        stripsHeuristic.test_regression_heuristic()
111        # Learning
112        print("\n***** testing Learning *****")
113        import learnProblem, learnNoInputs, learnDT, learnLinear
114        learnNoInputs.test_no_inputs(training_sizes=[4])
115        data = learnProblem.Data_from_file('data/carbool.csv', target_index=-1,
               seed=123)
116        learnDT.testDT(data, print_tree=False)
117        learnLinear.test()
118        # Deep Learning: currently no tests
119        # Uncertainty
120        print("\n***** testing Uncertainty *****")
121        import probGraphicalModels, probRC, probVE, probStochSim
122        probGraphicalModels.InferenceMethod.testIM(probRC.ProbSearch)
123        probGraphicalModels.InferenceMethod.testIM(probRC.ProbRC)
124        probGraphicalModels.InferenceMethod.testIM(probVE.VE)
125        probGraphicalModels.InferenceMethod.testIM(probStochSim.RejectionSampling,
               threshold=0.1)
126        probGraphicalModels.InferenceMethod.testIM(probStochSim.LikelihoodWeighting,
               threshold=0.1)
127        probGraphicalModels.InferenceMethod.testIM(probStochSim.ParticleFiltering,
               threshold=0.1)
128        probGraphicalModels.InferenceMethod.testIM(probStochSim.GibbsSampling,
               threshold=0.1)
129        # Learning under uncertainty: currently no tests
130        # Causality: currently no tests
131        import decnNetworks
132        decnNetworks.test(decnNetworks.fire_dn)
 133        # Reinforcement Learning: currently no tests
134        # Multiagent systems: currently no tests
135        # Relational Learning: currently no tests
                                        23
     24                             2. Agent Architectures and Hierarchical Control
          In this implementation, the state of the agent and the state of the environ-
      ment are represented using standard Python variables, which are updated as
      the state changes. The percept and the actions are represented as variable:value
      dictionaries. When an agent has only a limited number of actions, the action
      can be a single value.
         In the following code raise NotImplementedError() is a way to specify
     an abstract method that needs to be overridden in any implemented agent or
     environment.
                                 agents.py — Agent and Controllers
11   from display import Displayable
12
13   class Agent(Displayable):
14
15        def initial_action(self, percept):
16            """return the initial action"""
 17            raise NotImplementedError("initial_action") # abstract method
18
19        def select_action(self, percept):
20            """return the next action (and update internal state) given percept
21            percept is variable:value dictionary
22            """
 23            raise NotImplementedError("select_action") # abstract method
25   class Environment(Displayable):
26       def initial_percept(self):
27           """returns the initial percept for the agent"""
28           raise NotImplementedError("initial_percept") # abstract method
29
 30        def do(self, action):
 31            """does the action in the environment
 32            returns the next percept"""
 33            raise NotImplementedError("do") # abstract method
         The simulator lets the agent and the environment take turns in updating
     their states and returning the action and the percept.
         The first implementation is a simple procedure to carry out n steps of the
     simulation and return the agent state and the environment state at the end.
agents.py — (continued)
35   class Simulate(Displayable):
36       """simulate the interaction between the agent and the environment
37       for n time steps.
38       Returns a pair of the agent state and the environment state.
39       """
40       def __init__(self,agent, environment):
41           self.agent = agent
42           self.env = environment
43           self.percept = self.env.initial_percept()
44           self.percept_history = [self.percept]
45           self.action_history = []
46
47      def go(self, n):
48          for i in range(n):
49              action = self.agent.select_action(self.percept)
50              self.display(2,f"i={i} action={action}")
51              self.percept = self.env.do(action)
52              self.display(2,f"    percept={self.percept}")
      The agent cannot access the price model; it just observes the prices and
      the amount in stock.

agentBuying.py — (continued)
55   class TP_agent(Agent):
56       def __init__(self):
57           self.spent = 0
58           percept = env.initial_percept()
59           self.ave = self.last_price = percept['price']
60           self.instock = percept['instock']
61           self.buy_history = []
62
63      def select_action(self, percept):
 64          """return next action to carry out
65          """
66          self.last_price = percept['price']
67          self.ave = self.ave+(self.last_price-self.ave)*0.05
68          self.instock = percept['instock']
69          if self.last_price < 0.9*self.ave and self.instock < 60:
70              tobuy = 48
71          elif self.instock < 12:
72              tobuy = 12
73          else:
74              tobuy = 0
75          self.spent += tobuy*self.last_price
76          self.buy_history.append(tobuy)
77          return {'buy': tobuy}
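      The update self.ave = self.ave + (self.last_price - self.ave)*0.05 maintains
      an exponential moving average of the price: each new price gets weight 0.05,
      and the weight of older prices decays by a factor of 0.95 per step. A quick
      illustration with made-up prices:

ave = 10.0
for price in [10, 10, 20, 20, 20]:
    ave = ave + (price - ave) * 0.05
print(round(ave, 3))   # 11.426: the average moves slowly towards 20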
     Set up an environment and an agent. Uncomment the last lines to run the agent
     for 90 steps, and determine the average amount spent.
                                   agentBuying.py — (continued)
79   env = TP_env()
80   ag = TP_agent()
81   sim = Simulate(ag,env)
82   #sim.go(90)
83   #ag.spent/env.time ## average spent per time period
     2.2.3 Plotting
     The following plots the price and number in stock history:
                                   agentBuying.py — (continued)
           • Give a controller that can work for many different price histories. An agent
             can use other local state variables, but does not have access to the environ-
             ment model.
           • Is it worthwhile trying to infer the amount of paper that the home uses?
             (Try your controller with the different paper consumption commented out
             in TP_env.do.)
           Figure 2.1: Percept and command traces for the paper-buying agent. The plot
           shows Value against Time for the Price, In stock, and Bought traces.
     In this implementation, each layer, including the top layer, implements the en-
     vironment class, because each layer is seen as an environment from the layer
     above.
         We arbitrarily divide the environment and the body, so that the environ-
     ment just defines the walls, and the body includes everything to do with the
     agent. Note that the named locations are part of the (top-level of the) agent,
     not part of the environment, although they could have been.
     2.3.1 Environment
     The environment defines the walls.
     2.3.2 Body
     The body defines everything about the agent body.
agentEnv.py — (continued)
22   import math
23   from agents import Environment
24   import matplotlib.pyplot as plt
25   import time
26
27   class Rob_body(Environment):
28       def __init__(self, env, init_pos=(0,0,90)):
29           """ env is the current environment
30           init_pos is a triple of (x-position, y-position, direction)
31               direction is in degrees; 0 is to right, 90 is straight-up, etc
32           """
33           self.env = env
34           self.rob_x, self.rob_y, self.rob_dir = init_pos
35           self.turning_angle = 18 # degrees that a left makes
36           self.whisker_length = 6 # length of the whisker
37           self.whisker_angle = 30 # angle of whisker relative to robot
38           self.crashed = False
39           # The following control how it is plotted
40           self.plotting = True    # whether the trace is being plotted
41           self.sleep_time = 0.05 # time between actions (for real-time
                 plotting)
42           # The following are data structures maintained:
43           self.history = [(self.rob_x, self.rob_y)] # history of (x,y)
                 positions
44           self.wall_history = [] # history of hitting the wall
45
46        def percept(self):
47            return {'rob_x_pos':self.rob_x, 'rob_y_pos':self.rob_y,
48                   'rob_dir':self.rob_dir, 'whisker':self.whisker(),
                         'crashed':self.crashed}
49        initial_percept = percept # use percept function for initial percept too
50
51        def do(self,action):
52            """ action is {'steer':direction}
53            direction is 'left', 'right' or 'straight'
54            """
55            if self.crashed:
56                return self.percept()
57            direction = action['steer']
58            compass_deriv =
                  {'left':1,'straight':0,'right':-1}[direction]*self.turning_angle
59            self.rob_dir = (self.rob_dir + compass_deriv +360)%360 # make in
                  range [0,360)
60            rob_x_new = self.rob_x + math.cos(self.rob_dir*math.pi/180)
61            rob_y_new = self.rob_y + math.sin(self.rob_dir*math.pi/180)
62            path = ((self.rob_x,self.rob_y),(rob_x_new,rob_y_new))
       The Boolean whisker method returns True when the whisker intersects a wall.
agentEnv.py — (continued)
 76      def whisker(self):
 77          """returns true whenever the whisker sensor intersects with a wall
 78          """
 79          whisk_ang_world = (self.rob_dir-self.whisker_angle)*math.pi/180
 80              # angle in radians in world coordinates
 81          wx = self.rob_x + self.whisker_length * math.cos(whisk_ang_world)
 82          wy = self.rob_y + self.whisker_length * math.sin(whisk_ang_world)
 83          whisker_line = ((self.rob_x,self.rob_y),(wx,wy))
 84          hit = any(line_segments_intersect(whisker_line,wall)
 85                      for wall in self.env.walls)
 86          if hit:
 87              self.wall_history.append((self.rob_x, self.rob_y))
 88              if self.plotting:
 89                  plt.plot([self.rob_x],[self.rob_y],"ro")
 90                  plt.draw()
 91          return hit
 92
 93   def line_segments_intersect(linea,lineb):
 94       """returns true if the line segments, linea and lineb intersect.
 95       A line segment is represented as a pair of points.
 96       A point is represented as a (x,y) pair.
 97       """
 98       ((x0a,y0a),(x1a,y1a)) = linea
 99       ((x0b,y0b),(x1b,y1b)) = lineb
100       da, db = x1a-x0a, x1b-x0b
101       ea, eb = y1a-y0a, y1b-y0b
102       denom = db*ea-eb*da
103       if denom==0: # line segments are parallel
104           return False
105       cb = (da*(y0b-y0a)-ea*(x0b-x0a))/denom # position along line b
106       if cb<0 or cb>1:
107           return False
 108       ca = (db*(y0b-y0a)-eb*(x0b-x0a))/denom # position along line a
 109       return 0 <= ca <= 1
     The following method determines how to steer depending on whether the goal
     is to the right or the left of where the robot is facing.
                                    agentMiddle.py — (continued)
44      def steer(self,target_pos):
45          if self.percept['whisker']:
46              self.display(3,'whisker on', self.percept)
47              return "left"
48          else:
49              return self.head_towards(target_pos)
50
51      def head_towards(self,target_pos):
52             """ given a target position, return the action that heads
                   towards that position
53             """
54             gx,gy = target_pos
55             rx,ry = self.percept['rob_x_pos'],self.percept['rob_y_pos']
56             goal_dir = math.acos((gx-rx)/math.sqrt((gx-rx)*(gx-rx)
57                                                +(gy-ry)*(gy-ry)))*180/math.pi
58             if ry>gy:
59                 goal_dir = -goal_dir
60             goal_from_rob = (goal_dir - self.percept['rob_dir']+540)%360-180
61             assert -180 < goal_from_rob <= 180
62             if goal_from_rob > self.straight_angle:
63                 return "left"
64             elif goal_from_rob < -self.straight_angle:
65                 return "right"
66             else:
67                 return "straight"
68
69      def close_enough(self,target_pos):
70          gx,gy = target_pos
71          rx,ry = self.percept['rob_x_pos'],self.percept['rob_y_pos']
72          return (gx-rx)**2 + (gy-ry)**2 <= self.close_threshold_squared
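      The expression (goal_dir - self.percept['rob_dir'] + 540) % 360 - 180 in
      head_towards normalizes the angle difference so it lies between -180 and 180
      degrees; its sign then says whether the goal is to the left (positive) or the
      right (negative). A quick check of the arithmetic:

for a in [-350, -170, 0, 170, 350]:
    print(a, (a + 540) % 360 - 180)
# -350 -> 10, -170 -> -170, 0 -> 0, 170 -> 170, 350 -> -10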
19           timeout is the number of steps the middle layer goes before giving
                 up
20           locations is a loc:pos dictionary
21               where loc is a named location, and pos is an (x,y) position.
22           """
23           self.middle = middle
24           self.timeout = timeout # number of steps before the middle layer
                 should give up
25           self.locations = locations
26
27        def do(self,plan):
 28            """carry out the plan.
 29            plan is of the form {'visit':list_of_locations}
30            It visits the locations in turn.
31            """
32            to_do = plan['visit']
33            for loc in to_do:
34                position = self.locations[loc]
35                arrived = self.middle.do({'go_to':position,
                      'timeout':self.timeout})
36                self.display(1,"Arrived at",loc,arrived)
     2.3.5 Plotting
     The following is used to plot the locations, the walls and (eventually) the move-
     ment of the robot. It can either plot the movement if the robot as it is go-
     ing (with the default env.plotting = True), or not plot it as it is going (setting
     env.plotting = False; in this case the trace can be plotted using pl.plot run()).
      Figure 2.2: A trace of the trajectory of the agent, showing the named locations
      mail, o103, o109, and storage. Red dots correspond to the whisker sensor being
      on; the green dot to the whisker sensor being off. The agent starts at position
      (0, 0) facing up.

agentTop.py — (continued)
57                plt.plot([x],[y],"k<")
58                plt.text(x+1.0,y+0.5,loc) # print the label above and to the
                      right
59            plt.plot([body.rob_x],[body.rob_y],"go")
60            plt.gca().figure.canvas.draw()
61            if self.body.history or self.body.wall_history:
62                self.plot_run()
63
64      def plot_run(self):
65          """plots the history after the agent has finished.
66          This is typically only used if body.plotting==False
67          """
68          if self.body.history:
69              xs,ys = zip(*self.body.history)
70              plt.plot(xs,ys,"go")
71          if self.body.wall_history:
72              wxs,wys = zip(*self.body.wall_history)
73              plt.plot(wxs,wys,"ro")
          The following code plots the agent as it acts in the world. Figure 2.2 shows
      the result of the top.do call below.
                                    agentTop.py — (continued)
      Figure 2.3: A robot trap.
83   #   pl=Plot_env(body,top)
84   #   top.do({'visit':['o109','storage','o109','o103']})
85   #   You can directly control the middle layer:
86   #   middle.do({'go_to':(30,-10), 'timeout':200})
87   #   Can you make it crash?
     Exercise 2.2 The following code implements a robot trap (Figure 2.3). Write a
     controller that can escape the “trap” and get to the goal. See Exercise 2.4 in the
     textbook for hints.
agentTop.py — (continued)
13
14   class Plot_follow(Plot_env):
15       def __init__(self, body, top, epsilon=2.5):
16           """plot the agent in the environment.
 17           epsilon is the threshold for how close someone needs to click
                  to select a location.
18           """
19           Plot_env.__init__(self, body, top)
20           self.epsilon = epsilon
21           self.canvas = plt.gca().figure.canvas
22           self.canvas.mpl_connect('button_press_event', self.on_press)
23           self.canvas.mpl_connect('button_release_event', self.on_release)
24           self.canvas.mpl_connect('motion_notify_event', self.on_move)
25           self.pressloc = None
26           self.pressevent = None
27           for loc in self.top.locations:
28               self.display(2,f" loc {loc} at {self.top.locations[loc]}")
29
30      def on_press(self, event):
31          self.display(2,'v',end="")
 32          self.display(2,f"Press at ({event.xdata},{event.ydata})")
33          for loc in self.top.locations:
34              lx,ly = self.top.locations[loc]
35              if abs(event.xdata- lx) <= self.epsilon and abs(event.ydata-
                    ly) <= self.epsilon :
36                  self.pressloc = loc
37                  self.pressevent = event
38                  self.display(2,"moving",loc)
39
40      def on_release(self, event):
 41          self.display(2,'^',end="")
42          if self.pressloc is not None: #and event.inaxes ==
                self.pressevent.inaxes:
43              self.top.locations[self.pressloc] = (event.xdata, event.ydata)
44              self.display(1,f"Placing {self.pressloc} at {(event.xdata,
                    event.ydata)}")
45          self.pressloc = None
46          self.pressevent = None
47
48      def on_move(self, event):
49          if self.pressloc is not None: # and event.inaxes ==
                self.pressevent.inaxes:
50              self.display(2,'-',end="")
51              self.top.locations[self.pressloc] = (event.xdata, event.ydata)
52              self.redraw()
53          else:
54              self.display(2,'.',end="")
55
56   # try:
57   # pl=Plot_follow(body,top)
 58   # top.do({'visit':['o109','storage','o109','o103']})
The abstract class Search_problem, in searchProblem.py, defines a search problem, which consists of:
 17        * a start node
 18        * a neighbors function that gives the neighbors of a node
 19        * a specification of a goal
 20        * an (optional) heuristic function.
 21        The methods must be overridden to define a search problem."""
22
23        def start_node(self):
24            """returns start node"""
25            raise NotImplementedError("start_node") # abstract method
26
27        def is_goal(self,node):
28            """is True if node is a goal"""
29            raise NotImplementedError("is_goal") # abstract method
30
31        def neighbors(self,node):
32            """returns a list (or enumeration) of the arcs for the neighbors of
                  node"""
33            raise NotImplementedError("neighbors") # abstract method
34
35        def heuristic(self,n):
36            """Gives the heuristic value of node n.
37            Returns 0 if not overridden."""
38            return 0
         The neighbors of a node form a list of arcs. A (directed) arc consists of a
      from_node and a to_node. The arc is the pair ⟨from_node, to_node⟩, but it can
      also contain a non-negative cost (which defaults to 1) and can be labeled with
      an action.
                                   searchProblem.py — (continued)
40   class Arc(object):
41       """An arc has a from_node and a to_node node and a (non-negative)
             cost"""
42       def __init__(self, from_node, to_node, cost=1, action=None):
43           self.from_node = from_node
44           self.to_node = to_node
45           self.action = action
46           self.cost = cost
47           assert cost >= 0, (f"Cost cannot be negative: {self}, cost={cost}")
48
49        def __repr__(self):
50            """string representation of an arc"""
51            if self.action:
52                return f"{self.from_node} --{self.action}--> {self.to_node}"
53            else:
54                return f"{self.from_node} --> {self.to_node}"
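      For example, the string representation of arcs without and with an action:

>>> Arc('a', 'b', cost=3)
a --> b
>>> Arc('a', 'b', 3, 'go')
a --go--> b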
     To define a search problem, we need to define the start node, the goal predicate,
     the neighbors function and the heuristic function.
                                   searchProblem.py — (continued)
56   class Search_problem_from_explicit_graph(Search_problem):
57       """A search problem from an explicit graph.
58       """
59
60      def __init__(self, title, nodes, arcs, start=None, goals=set(), hmap={},
61                      positions=None, show_costs = True):
62          """ A search problem consists of:
63          * list or set of nodes
64          * list or set of arcs
65          * start node
66          * list or set of goal nodes
67          * hmap: dictionary that maps each node into its heuristic value.
68          * positions: dictionary that maps each node into its (x,y) position
69          * show_costs is used for show()
70          """
71          self.title = title
72          self.neighs = {}
73          self.nodes = nodes
74          for node in nodes:
75              self.neighs[node]=[]
76          self.arcs = arcs
77          for arc in arcs:
78              self.neighs[arc.from_node].append(arc)
79          self.start = start
80          self.goals = goals
81          self.hmap = hmap
82          if positions is None:
83              self.positions = {node:(random.random(),random.random()) for
                    node in nodes}
84          else:
85              self.positions = positions
86          self.show_costs = show_costs
87
88
 89        def start_node(self):
 90            """returns start node"""
 91            return self.start
 92
 93        def is_goal(self,node):
 94            """is True if node is a goal"""
 95            return node in self.goals
 96
 97        def neighbors(self,node):
 98            """returns the neighbors of node (a list of arcs)"""
 99            return self.neighs[node]
100
101        def heuristic(self,node):
102            """Gives the heuristic value of node n.
103            Returns 0 if not overridden in the hmap."""
104            if node in self.hmap:
105                return self.hmap[node]
106            else:
107                return 0
108
109        def __repr__(self):
110            """returns a string representation of the search problem"""
111            res=""
112            for arc in self.arcs:
113                res += f"{arc}. "
114            return res
searchProblem.py — (continued)
134
135      def show_node(self, ax, node, node_color):
136             x,y = self.positions[node]
137             ax.text(x,y,node,bbox=dict(boxstyle="round4,pad=1.0,rounding_size=0.5",
138                                               facecolor=node_color),
                                                      ha='center',va='center',
139                         fontsize=self.fontsize)
140
141      def show_arc(self, ax, arc, arc_color='black', node_color='white'):
142             from_pos = self.positions[arc.from_node]
143             to_pos = self.positions[arc.to_node]
144             ax.annotate(arc.to_node, from_pos, xytext=to_pos,
145                               # arrowprops=dict(facecolor='black',
                                      shrink=0.1, width=2),
146                               arrowprops={'arrowstyle':'<|-', 'linewidth':
                                      2, 'color':arc_color},
147                               bbox=dict(boxstyle="round4,pad=1.0,rounding_size=0.5",
148                                               facecolor=node_color),
149                               ha='center',va='center',
150                               fontsize=self.fontsize)
151             # Add costs to middle of arcs:
152             if self.show_costs:
153                 ax.text((from_pos[0]+to_pos[0])/2, (from_pos[1]+to_pos[1])/2,
154                         arc.cost, bbox=dict(pad=1,fc='w',ec='w'),
155                         ha='center',va='center',fontsize=self.fontsize)
      3.1.2 Paths
      A searcher will return a path from the start node to a goal node. A Python list
      is not a suitable representation for a path, as many search algorithms consider
      multiple paths at once, and these paths should share initial parts of the path.
      If we wanted to do this with Python lists, we would need to keep copying the
      list, which can be expensive if the list is long. An alternative representation is
      used here in terms of a recursive data structure that can share subparts.
            A path is either:

          • a node (representing a path of length zero), or

          • a path, initial, and an arc, where the from_node of the arc is the node at the
            end of initial.
      These cases are distinguished in the following code by having arc = None if
      the path has length 0, in which case initial is the node of the path. Note that
      we only use the most basic form of Python’s yield for enumerations (Section
      1.5.4).
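       The Path class itself is elided from this excerpt. The following is a minimal
       sketch, consistent with how paths are used by the searchers later in this
       chapter (Path(initial, arc), end(), cost, initial, arc, and the initial_nodes
       used for cycle pruning in Exercise 3.5); the distribution's class may differ
       in detail:

class Path(object):
    """A path is either a node or a path, initial, followed by an arc"""
    def __init__(self, initial, arc=None):
        """initial is either a node (when arc is None) or
        a path (when arc is an Arc whose from_node is the end of initial)"""
        self.initial = initial
        self.arc = arc
        self.cost = 0 if arc is None else initial.cost + arc.cost

    def end(self):
        """returns the node at the end of the path"""
        return self.initial if self.arc is None else self.arc.to_node

    def nodes(self):
        """enumerates the nodes of the path, from the end back to the start"""
        current = self
        while current.arc is not None:
            yield current.arc.to_node
            current = current.initial
        yield current.initial

    def initial_nodes(self):
        """enumerates the nodes of the path before the end node"""
        if self.arc is not None:
            yield from self.initial.nodes()

    def __repr__(self):
        if self.arc is None:
            return str(self.initial)
        elif self.arc.action:
            return f"{self.initial} --{self.arc.action}--> {self.arc.to_node}"
        else:
            return f"{self.initial} --> {self.arc.to_node}"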
Figure 3.3: simp_delivery_graph with arc costs and h values of nodes
      The second search problem is defined by:

searchExample.py — (continued)
 21   problem2 = Search_problem_from_explicit_graph('Problem 2',
 22        {'A','B','C','D','E','G','H','J'},
23        [Arc('A','B',1), Arc('B','C',3), Arc('B','D',1), Arc('D','E',3),
24            Arc('D','G',1), Arc('A','H',3), Arc('H','J',1)],
25        start = 'A',
26        goals = {'G'},
27        positions={'A': (0, 1), 'B': (0, 3/4), 'C': (0,0), 'D': (1/4,3/4), 'E':
              (1/4,0),
28                      'G': (2/4,3/4), 'H': (3/4,1), 'J': (3/4,3/4)})
     The third search problem is a disconnected graph (contains no arcs), where the
     start node is a goal node. This is a boundary case to make sure that weird cases
     work.
                                  searchExample.py — (continued)
      Figure 3.4: cyclic_simp_delivery_graph
          cyclic_simp_delivery_graph is the graph shown in Figure 3.4. This is the
      graph of Figure 3.10 of [Poole and Mackworth, 2023]. The heuristic values
      are the same as in simp_delivery_graph.
searchExample.py — (continued)
73 cyclic_simp_delivery_graph = Search_problem_from_explicit_graph("Cyclic
            Delivery Graph",
 74         {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J'},
 75         [    Arc('A', 'B', 2),
 76              Arc('A', 'C', 3),
 77              Arc('A', 'D', 4),
 78              Arc('B', 'E', 2),
 79              Arc('B', 'F', 3),
 80              Arc('C', 'A', 3),
 81              Arc('C', 'J', 6),
 82              Arc('D', 'A', 4),
 83              Arc('D', 'H', 4),
 84              Arc('F', 'B', 3),
 85              Arc('F', 'D', 2),
 86              Arc('G', 'H', 3),
 87              Arc('G', 'J', 4),
 88              Arc('H', 'D', 4),
 89              Arc('H', 'G', 3),
 90              Arc('J', 'C', 6),
 91              Arc('J', 'G', 4)],
 92        start = 'A',
 93        goals = {'G'},
 94        hmap = {
 95             'A': 7,
 96             'B': 5,
 97             'C': 9,
 98             'D': 6,
 99             'E': 3,
100             'F': 5,
101             'G': 0,
102             'H': 3,
103             'J': 4,
104         },
105         positions = {
106             'A': (0.4,0.1),
107             'B': (0.4,0.4),
108             'C': (0.1,0.1),
109             'D': (0.7,0.1),
110             'E': (0.6,0.7),
111             'F': (0.7,0.4),
112             'G': (0.7,0.9),
113             'H': (0.9,0.6),
114             'J': (0.3,0.9)
115             })
          The next problem is the tree graph shown in Figure 3.6, and is Figure 3.15
      in Poole and Mackworth [2023].
searchExample.py — (continued)
      Figure 3.6: The tree graph
119          'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'BB',
                 'CC',
120          'DD', 'EE', 'FF', 'GG', 'HH', 'II', 'JJ', 'KK'},
121      [   Arc('A', 'B', 1),
122          Arc('A', 'C', 1),
123          Arc('B', 'D', 1),
124          Arc('B', 'E', 1),
125          Arc('C', 'F', 1),
126          Arc('C', 'G', 1),
127          Arc('D', 'H', 1),
128          Arc('D', 'I', 1),
129          Arc('E', 'J', 1),
130          Arc('E', 'K', 1),
131          Arc('F', 'L', 1),
132          Arc('G', 'M', 1),
133          Arc('G', 'N', 1),
134          Arc('H', 'O', 1),
135          Arc('H', 'P', 1),
      3.2.1 Searcher
      A Searcher for a problem can be asked repeatedly for the next path. To solve
      a problem, you can construct a Searcher object for the problem and then re-
      peatedly ask for the next path using search. If there are no more paths, None is
      returned.
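       The imports, class header, and constructor are elided above; the following is
       a sketch consistent with the methods shown below (the distribution's code may
       differ in detail):

from display import Displayable
from searchProblem import Path

class Searcher(Displayable):
    """returns a searcher for a problem.
    Paths can be found by repeatedly calling search().
    This does depth-first search unless add_to_frontier is overridden."""
    def __init__(self, problem):
        self.problem = problem
        self.initialize_frontier()
        self.num_expanded = 0
        self.add_to_frontier(Path(problem.start_node()))
        super().__init__()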
26
27        def initialize_frontier(self):
28            self.frontier = []
29
30        def empty_frontier(self):
31            return self.frontier == []
32
33        def add_to_frontier(self,path):
34            self.frontier.append(path)
35
36        def search(self):
37            """returns (next) path from the problem's start node
38            to a goal node.
39            Returns None if no path exists.
40            """
41            while not self.empty_frontier():
42                self.path = self.frontier.pop()
43                self.num_expanded += 1
44                if self.problem.is_goal(self.path.end()): # solution found
45                    self.solution = self.path # store the solution found
46                    self.display(1, f"Solution: {self.path} (cost:
                          {self.path.cost})\n",
47                        self.num_expanded, "paths have been expanded and",
48                               len(self.frontier), "paths remain in the
                                     frontier")
49                    return self.path
50                else:
51                    self.display(4,f"Expanding: {self.path} (cost:
                          {self.path.cost})")
52                    neighs = self.problem.neighbors(self.path.end())
53                    self.display(2,f"Expanding: {self.path} with neighbors
                          {neighs}")
54                    for arc in reversed(list(neighs)):
55                        self.add_to_frontier(Path(self.path,arc))
56                    self.display(3, f"New frontier: {[p.end() for p in
                          self.frontier]}")
57
58           self.display(0,"No (more) solutions. Total of",
59                       self.num_expanded,"paths expanded.")
      Note that this reverses the neighbors so that it implements depth-first search in
      an intuitive manner (expanding the first neighbor first). The call to list is for the
      case when the neighbors are generated (and not already in a list). Reversing the
      neighbors might not be required for other methods. The calls to reversed and
      list can be removed, and the algorithm still implements depth-first search.
          To use depth-first search to find multiple paths for problem1 and simp delivery graph,
     copy and paste the following into Python’s read-evaluate-print loop; keep find-
     ing next solutions until there are no more:
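      That listing is elided here; the calls are along these lines (variable names
      assumed):

import searchGeneric, searchExample
searcher1 = searchGeneric.Searcher(searchExample.problem1)
searcher1.search()     # find the first path
searcher1.search()     # find the next path; repeat until None is returned
searcher_sdg = searchGeneric.Searcher(searchExample.simp_delivery_graph)
searcher_sdg.search()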
      The SearcherGUI displays the graph as the search proceeds; part way through
      searching simp_delivery_graph the title is "Expanding: A --> B", and the node
      colors are red: selected, blue: neighbors, green: frontier, yellow: goal.
      Exercise 3.1 Implement breadth-first search. Only add_to_frontier and/or pop
      need to be modified to implement a first-in first-out queue.
           2. what is shown for a “step” on a GUI; here it is assumed to be the path,
              the neighbors of the end of the path, and the other nodes at the end of
              paths on the frontier
          3. (shown with “fine step” but not with “step”) the frontier and the path
             selected
4. (shown with “fine step” but not with “step”) the frontier.
39          self.problem.show_node(self.ax, self.problem.start,
                 self.colors['frontier'])
40          for node in self.problem.nodes:
41              if self.problem.is_goal(node):
42                   self.problem.show_node(self.ax, node,self.colors['goal'])
43          plt.show()
44          self.click = 7 # bigger than any display!
45          #while self.click == 0:
46          #     plt.pause(0.1)
47          self.searcher.display = self.display
48          try:
49              while self.searcher.frontier:
50                   path = self.searcher.search()
51          except ExitToPython:
52              print("Exited")
53          else:
54              print("No more solutions")
55
56      def display(self, level,*args,**nargs):
57          if level <= self.click: #step
58              print(*args, **nargs)
59              self.ax.set_title(f"Expanding:
                    {self.searcher.path}",fontsize=self.problem.fontsize)
60              if level == 1:
61                  self.show_frontier(self.colors['frontier'])
62                  self.show_path(self.colors['selected'])
63                  self.ax.set_title(f"Solution Found:
                        {self.searcher.path}",fontsize=self.problem.fontsize)
64              elif level == 2: # what should be shown if a node is in all
                    three?
65                  self.show_frontier(self.colors['frontier'])
66                  self.show_path(self.colors['selected'])
67                  self.show_neighbors(self.colors['neighbors'])
68              elif level == 3:
69                  self.show_frontier(self.colors['frontier'])
70                  self.show_path(self.colors['selected'])
71              elif level == 4:
72                  self.show_frontier(self.colors['frontier'])
73
74
75             # wait for a button click
76             self.click = 0
77             plt.draw()
78             while self.click == 0:
79                 plt.pause(0.1)
80             # undo coloring:
81             self.ax.set_title("")
82             self.show_frontier('white')
83             self.show_neighbors('white')
84             path_show = self.searcher.path
 85               while path_show.arc:
 86                   self.problem.show_arc(self.ax, path_show.arc, 'black')
 87                   self.problem.show_node(self.ax, path_show.end(), 'white')
 88                   path_show = path_show.initial
 89               self.problem.show_node(self.ax, path_show.end(), 'white')
 90               if self.problem.is_goal(self.searcher.path.end()):
 91                   self.problem.show_node(self.ax, self.searcher.path.end(),
                          self.colors['goal'])
 92               plt.draw()
 93
 94        def show_frontier(self, color):
 95            for path in self.searcher.frontier:
 96                self.problem.show_node(self.ax, path.end(), color)
 97
 98        def show_path(self, color):
 99            """color selected path"""
100            path_show = self.searcher.path
101            while path_show.arc:
102                   self.problem.show_arc(self.ax, path_show.arc, color)
103                   self.problem.show_node(self.ax, path_show.end(), color)
104                   path_show = path_show.initial
105            self.problem.show_node(self.ax, path_show.end(), color)
106
107        def show_neighbors(self, color):
108            for neigh in self.problem.neighbors(self.searcher.path.end()):
109                self.problem.show_node(self.ax, neigh.to_node, color)
110
111        def auto(self,event):
112            self.click = 1
113        def step(self,event):
114            self.click = 2
115        def finestep(self,event):
116            self.click = 3
117        def quit(self,event):
118            quit()
119
120   class ExitToPython(Exception):
121       pass
      The following methods are used for finding and printing information about
      the frontier.
                                   searchGeneric.py — (continued)
      3.2.4 A∗ Search
      For an A∗ Search the frontier is implemented using the FrontierPQ class.
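       The FrontierPQ class is elided from this excerpt; the fragment below is part
       of AStarSearcher, which uses it. The following sketch of a frontier as a
       priority queue is consistent with the methods used below (empty, add, pop)
       and with the minus sign discussed in Exercise 3.4; details are assumed:

import heapq

class FrontierPQ(object):
    """A frontier, implemented as a heap of (value, -index, path) triples.
    value is the quantity to minimize, e.g., path cost plus heuristic;
    index is a unique index per added path. The minus sign makes the
    queue break ties in favor of the most recently added path."""
    def __init__(self):
        self.frontier_index = 0   # the number of paths added so far
        self.frontierpq = []      # the heap of (value, -index, path) triples

    def empty(self):
        """is True if the priority queue is empty"""
        return self.frontierpq == []

    def add(self, path, value):
        """add a path to the priority queue; value is to be minimized"""
        self.frontier_index += 1
        heapq.heappush(self.frontierpq, (value, -self.frontier_index, path))

    def pop(self):
        """returns and removes the path with minimum value"""
        (_, _, path) = heapq.heappop(self.frontierpq)
        return path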
                                   searchGeneric.py — (continued)
125          super().__init__(problem)
126
127       def initialize_frontier(self):
128           self.frontier = FrontierPQ()
129
130       def empty_frontier(self):
131           return self.frontier.empty()
132
133       def add_to_frontier(self,path):
134           """add path to the frontier with the appropriate cost"""
135           value = path.cost+self.problem.heuristic(path.end())
136           self.frontier.add(path, value)
          Code should always be tested. The following provides a simple unit test,
       using problem1 as the default problem.
searchGeneric.py — (continued)
     Exercise 3.2 Change the code so that it implements (i) best-first search and (ii)
     lowest-cost-first search. For each of these methods compare it to A∗ in terms of the
     number of paths expanded, and the path found.
     Exercise 3.3 The searcher acts like a Python iterator, in that it returns one value
     (here a path) and then returns other values (paths) on demand, but does not imple-
     ment the iterator interface. Change the code so it implements the iterator interface.
     What does this enable us to do?
     Exercise 3.4 Chris was very puzzled as to why there was a minus (“−”) in the
     second element of the tuple added to the heap in the add method in FrontierPQ in
     searchGeneric.py.
         Sam suggested the following example would demonstrate the importance of
     the minus. Consider an infinite integer grid, where the states are pairs of integers,
     the start is (0,0), and the goal is (10,10). The neighbors of (i, j) are (i + 1, j) and (i, j +
     1). Consider the heuristic function h((i, j)) = |10 − i| + |10 − j|. Sam suggested you
     compare how many paths are expanded with the minus and without the minus.
     searchGrid is a representation of Sam’s graph. If something takes too long, you
     might consider changing the size.
     Explain to Chris what the minus does and why it is there. Give evidence for your
     claims. It might be useful to refer to other search strategies in your explanation.
     As part of your explanation, explain what is special about Sam’s example.
     Exercise 3.5 Implement a searcher that implements cycle pruning instead of
     multiple-path pruning. You need to decide whether to check for cycles when paths
     are added to the frontier or when they are removed. (Hint: either method can be
     implemented by only changing one or two lines in SearcherMPP. Hint: there is
     a cycle if path.end() in path.initial_nodes() ) Compare no pruning, multiple
     path pruning and cycle pruning for the cyclic delivery problem. Which works
     better in terms of number of paths expanded, computational time or space?
          Depth-first search methods do not need a priority queue, but can use
     a list as a stack. In this implementation of branch-and-bound search, we call
     search to find an optimal solution with cost less than bound. This uses depth-
     first search to find a path to a goal that extends path with cost less than the
     bound. Once a path to a goal has been found, that path is remembered as the
     best path, the bound is reduced, and the search continues.
                         searchBranchAndBound.py — Branch and Bound Search
11   from searchProblem import Path
12   from searchGeneric import Searcher
13   from display import Displayable, visualize
14
15   class DF_branch_and_bound(Searcher):
16       """returns a branch and bound searcher for a problem.
17       An optimal path with cost less than bound can be found by calling
             search()
18       """
19       def __init__(self, problem, bound=float("inf")):
20           """creates a searcher that can be used with search() to find an
                 optimal path.
21           bound gives the initial bound. By default this is infinite -
                 meaning there
22           is no initial pruning due to depth bound
23           """
24           super().__init__(problem)
25           self.best_path = None
26           self.bound = bound
27
28      def search(self):
29          """returns an optimal solution to a problem with cost less than
                bound.
30          returns None if there is no solution with cost less than bound."""
31          self.frontier = [Path(self.problem.start_node())]
32          self.num_expanded = 0
33          while self.frontier:
34              self.path = self.frontier.pop()
35              if self.path.cost+self.problem.heuristic(self.path.end()) <
                    self.bound:
36                  # if self.path.end() not in self.path.initial_nodes(): # for
                        cycle pruning
37                  self.display(2,"Expanding:",self.path,"cost:",self.path.cost)
38                  self.num_expanded += 1
39                  if self.problem.is_goal(self.path.end()):
40                      self.best_path = self.path
41                      self.bound = self.path.cost
42                      self.display(1,"New best path:",self.path,"
                            cost:",self.path.cost)
43                  else:
44                      neighs = self.problem.neighbors(self.path.end())
45                      self.display(4,"Neighbors are", neighs)
46                      for arc in reversed(list(neighs)):
47                          self.add_to_frontier(Path(self.path, arc))
48                      self.display(3, f"New frontier: {[p.end() for p in
                            self.frontier]}")
49          self.path = self.best_path
50           self.solution = self.best_path
51           self.display(1,f"Optimal solution is {self.best_path}." if
                 self.best_path
52                               else "No solution found.",
53                          f"Number of paths expanded: {self.num_expanded}.")
54           return self.best_path
Note that this code uses reversed so that the neighbors of a node are expanded
in the left-to-right order one might expect. This is needed because pop() removes
the rightmost element of the list. The call to list is there because reversed only
works on sequences such as lists and tuples, but the neighbors may be given by a generator.
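For example, reversed raises a TypeError on a generator unless the generator is first converted to a list:

    neighs = (n for n in [1, 2, 3])   # a generator, as neighbors may be
    # reversed(neighs)                # TypeError: not a sequence
    list(reversed(list(neighs)))      # [3, 2, 1]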
         Here is a unit test and some queries:
                                    searchBranchAndBound.py — (continued)
     Exercise 3.6 In searcherb2, in the code above, what happens if the bound is
smaller, say 10? What if it is larger, say 1000?
Exercise 3.7 Implement a branch-and-bound search that uses recursion. Hint: you
     don’t need an explicit frontier, but can do a recursive call for the children.
Exercise 3.8 After the branch-and-bound search found a solution, Sam ran search
again, and noticed a different count. Sam hypothesized that this count was related
to the number of nodes that an A∗ search would use (either expand or add to
the frontier). Or maybe, Sam thought, the count when the bound is slightly above
the cost of the optimal path is related to how A∗ would work.
Is there a relationship between these counts? Are there different things that could
be counted so that they are related? Try to find the most specific statement that is true, and
explain why it is true.
         To test the hypothesis, Sam wrote the following code, but isn’t sure it is helpful:
17
18   def run(problem,name):
19       print("\n\n*******",name)
20
21      print("\nA*:")
22      asearcher = AStarSearcher(problem)
23      print("Path found:",asearcher.search()," cost=",asearcher.solution.cost)
24      print("there are",asearcher.frontier.count(asearcher.solution.cost),
25           "elements remaining on the queue with
                 f-value=",asearcher.solution.cost)
26
27      print("\nA* with MPP:")
28      msearcher = SearcherMPP(problem)
29      print("Path found:",msearcher.search()," cost=",msearcher.solution.cost)
30      print("there are",msearcher.frontier.count(msearcher.solution.cost),
31           "elements remaining on the queue with
                 f-value=",msearcher.solution.cost)
32
33      bound = asearcher.solution.cost+0.01
34      print("\nBranch and bound (with too-good initial bound of", bound,")")
35      tbb = DF_branch_and_bound(problem,bound) # cheating!!!!
36      print("Path found:",tbb.search()," cost=",tbb.solution.cost)
37      print("Rerunning B&B")
38      print("Path found:",tbb.search())
39
40      bbound = asearcher.solution.cost*2+10
41      print("\nBranch and bound (with not-very-good initial bound of",
            bbound, ")")
42      tbb2 = DF_branch_and_bound(problem,bbound)
43      print("Path found:",tbb2.search()," cost=",tbb2.solution.cost)
44      print("Rerunning B&B")
45      print("Path found:",tbb2.search())
46
47      print("\nDepth-first search: (Use ^C if it goes on forever)")
48      tsearcher = Searcher(problem)
49      print("Path found:",tsearcher.search()," cost=",tsearcher.solution.cost)
50
51
52   import searchExample
53   from searchTest import run
54   if __name__ == "__main__":
55       run(searchExample.problem1,"Problem 1")
56   # run(searchExample.simp_delivery_graph,"Acyclic Delivery")
57   # run(searchExample.cyclic_simp_delivery_graph,"Cyclic Delivery")
58   # also test some graphs with cycles, and some with multiple least-cost
         paths
32            return self.name
33
34        def __repr__(self):
35            return self.name # f"Variable({self.name})"
4.1.2 Constraints
A constraint consists of:
• scope, a list or tuple of the variables involved in the constraint
• condition, a Boolean function that takes as many arguments as there are
  variables in the scope, and returns True when the constraint holds for
  those values
• An optional name
• An optional position, used when drawing the constraint network
4.1.3 CSPs
A constraint satisfaction problem (CSP) requires:
• a title (a string)
• a set of variables
• a list of constraints
From these, var_to_const, a dictionary that maps each variable to the set of
constraints it is involved in, is constructed.
cspProblem.py — (continued)
50   class CSP(object):
51       """A CSP consists of
52       * a title (a string)
53       * variables, a set of variables
54       * constraints, a list of constraints
55       * var_to_const, a variable to set of constraints dictionary
56       """
78        def consistent(self,assignment):
79            """assignment is a variable:value dictionary
80            returns True if all of the constraints that can be evaluated
81                           evaluate to True given assignment.
82            """
83            return all(con.holds(assignment)
84                       for con in self.constraints
85                       if con.can_evaluate(assignment))
         The show method uses matplotlib to show the graphical structure of a con-
     straint network. If the node positions are not specified, this gives different
     positions each time it is run; if you don’t like the graph, try again.
                                     cspProblem.py — (continued)
      4.1.4 Examples
In the following code ne_, when given a number, returns a function that is true
when its argument is not that number. For example, if f = ne_(3), then f (2)
is True and f (3) is False. That is, ne_(x)(y) is true when x ≠ y. Allowing
a function of multiple arguments to use its arguments one at a time is called
currying, after the logician Haskell Curry. Functions used as conditions in
constraints require names (so they can be printed).
                                  cspExamples.py — Example CSPs
 11   from cspProblem import Variable, CSP, Constraint
 12   from operator import lt,ne,eq,gt
 13
 14   def ne_(val):
 15       """not equal value"""
 16       # nev = lambda x: x != val # alternative definition
17       # nev = partial(ne,val) # another alternative definition
 18       def nev(x):
 19           return val != x
 20       nev.__name__ = f"{val} != " # name of the function
 21       return nev
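For example:

    f = ne_(3)
    f(2)          # True
    f(3)          # False
    f.__name__    # '3 != '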
Similarly, is_(x)(y) is true when x = y.
                                   cspExamples.py — (continued)
 23   def is_(val):
 24       """is a value"""
 25       # isv = lambda x: x == val # alternative definition
[Figure: the constraint network csp1, with variables A, B, C and constraints A < B, B != 2, and B < C.]
32   X = Variable('X', {1,2,3})
33   Y = Variable('Y', {1,2,3})
34   Z = Variable('Z', {1,2,3})
35   csp0 = CSP("csp0", {X,Y,Z},
36             [ Constraint([X,Y],lt),
37               Constraint([Y,Z],lt)])
The CSP csp1 has variables A, B, and C, each with domain {1, 2, 3, 4}. The con-
straints are A < B, B ≠ 2, and B < C. This is slightly more interesting than
csp0 as it has more solutions. This example is used in the unit tests, and so if it
is changed, the unit tests need to be changed. The CSP csp1s is the same, but
with only the constraints A < B and B < C.
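The csp1 listing is elided here; the following sketch is a hypothetical reconstruction consistent with the description above (the position arguments used for drawing are omitted):

    A = Variable('A', {1, 2, 3, 4})
    B = Variable('B', {1, 2, 3, 4})
    C = Variable('C', {1, 2, 3, 4})
    csp1 = CSP("csp1", {A, B, C},
               [Constraint([A, B], lt, "A < B"),
                Constraint([B], ne_(2), "B != 2"),
                Constraint([B, C], lt, "B < C")])
    csp1s = CSP("csp1s", {A, B, C},     # as csp1 but without B != 2
                [Constraint([A, B], lt, "A < B"),
                 Constraint([B, C], lt, "B < C")])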
                                     cspExamples.py — (continued)
[Figure 4.2: the domain-consistent constraint network csp2, with constraints A != B, B != 3, A = D, B != D, A != C, A > E, B > E, C < D, D > E, C > E, and C != 2.]
        The next CSP, csp2 is Example 4.9 of Poole and Mackworth [2023]; the do-
     main consistent network (after applying the unary constraints) is shown in Fig-
     ure 4.2. Note that we use the same variables as the previous example and add
     two more.
cspExamples.py — (continued)
[Figure: the constraint network csp3, with constraints including A != B, A < D, D != E, and C != E.]
The following example is another scheduling problem (but with multiple an-
swers). This is the same as scheduling problem 2 in the original AIspace.org
consistency app.
cspExamples.py — (continued)
cspExamples.py — (continued)
75   def adjacent(x,y):
76      """True when x and y are adjacent numbers"""
77      return abs(x-y) == 1
[Figure: the constraint network csp4, with adjacency and inequality constraints.]
78
79   csp4 = CSP("csp4", {A,B,C,D},
80             [Constraint([A,B], adjacent,            "adjacent(A,B)"),
81              Constraint([B,C], adjacent,            "adjacent(B,C)"),
82              Constraint([C,D], adjacent,            "adjacent(C,D)"),
83              Constraint([A,C], ne, "A !=            C"),
84              Constraint([B,D], ne, "B !=            D") ])
         The following examples represent the crossword shown in Figure 4.5.
         In the first representation, the variables represent words. The constraint
     imposed by the crossword is that where two words intersect, the letter at the
     intersection must be the same. The method meet_at is used to test whether two
     words intersect with the same letter. For example, the constraint meet_at(2,0)
     means that the third letter (at position 2) of the first argument is the same as
     the first letter of the second argument. This is shown in Figure 4.6.
                                    cspExamples.py — (continued)
86   def meet_at(p1,p2):
87       """returns a function of two words that is true
88                   when the words intersect at positions p1, p2.
89       The positions are relative to the words; starting at position 0.
90       meet_at(p1,p2)(w1,w2) is true if the same letter is at position p1 of
             word w1
91            and at position p2 of word w2.
92       """
93       def meets(w1,w2):
94           return w1[p1] == w2[p2]
[Figure 4.5: a crossword puzzle with word slots numbered 1 to 4. Words: ant, big, bus, car, has, book, buys, hold, lane, year, ginger, search, symbol, syntax.]
[Figure 4.6: the constraint network crossword1, with constraints meet_at(0,0)[one_across, one_down], meet_at(2,0)[one_across, two_down], meet_at(2,2)[three_across, two_down], meet_at(0,2)[three_across, one_down], and meet_at(0,4)[four_across, two_down].]
 95        meets.__name__ = f"meet_at({p1},{p2})"
 96        return meets
 97
 98   one_across = Variable('one_across', {'ant', 'big', 'bus', 'car', 'has'},
          position=(0.3,0.9))
 99   one_down = Variable('one_down', {'book', 'buys', 'hold', 'lane', 'year'},
          position=(0.1,0.7))
100   two_down = Variable('two_down', {'ginger', 'search', 'symbol', 'syntax'},
          position=(0.9,0.8))
101   three_across = Variable('three_across', {'book', 'buys', 'hold', 'land',
          'year'}, position=(0.1,0.3))
102   four_across = Variable('four_across',{'ant', 'big', 'bus', 'car', 'has'},
          position=(0.7,0.0))
103   crossword1 = CSP("crossword1",
104                   {one_across, one_down, two_down, three_across,
                          four_across},
105                   [Constraint([one_across,one_down], meet_at(0,0)),
106                    Constraint([one_across,two_down], meet_at(2,0)),
107                    Constraint([three_across,two_down], meet_at(2,2)),
108                    Constraint([three_across,one_down], meet_at(0,2)),
109                    Constraint([four_across,two_down], meet_at(0,4))])
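For example, using words from Figure 4.5, the curried condition can be tested directly:

    mf = meet_at(2, 0)
    mf('big', 'ginger')   # True: 'big'[2] == 'ginger'[0] == 'g'
    mf('bus', 'ginger')   # False: 's' != 'g'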
          In an alternative representation of a crossword (the “dual” representation),
      the variables represent letters, and the constraints are that adjacent sequences
      of letters form words. This is shown in Figure 4.7.
                                   cspExamples.py — (continued)
[Figure 4.7: part of the constraint network crossword1d, where the variables are letter positions (p00, p01, ..., p25) and constraints such as is_word[p00, p10, p20] require the letter sequences to form words.]
      Exercise 4.1 How many assignments of a value to each variable are there for
      each of the representations of the above crossword? Do you think an exhaustive
      enumeration will work for either one?
The queens problem is a puzzle on a chess board, where the idea is to place
a queen on each column so that the queens cannot take each other: no
two queens are on the same row, column, or diagonal. The n-queens problem is a
generalization where the size of the board is n × n, and n queens have to be
placed.
    Here is a representation of the n-queens problem, where the variables are
the columns and the values are the rows in which the queen is placed. The
original queens problem on a standard (8 × 8) chess board is n_queens(8).
                                   cspExamples.py — (continued)
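The n_queens listing is elided here; the following is a sketch consistent with this representation, where the hypothetical condition no_attack(ci,cj) constrains the rows of the queens in columns ci and cj:

    def no_attack(ci, cj):
        """returns a condition on the rows of the queens in columns ci, cj"""
        def ok(ri, rj):
            return ri != rj and abs(ri - rj) != abs(ci - cj)
        ok.__name__ = f"no_attack({ci},{cj})"
        return ok

    def n_queens(n):
        """returns a CSP for the n-queens problem (a sketch)"""
        variables = [Variable(f"Q{c}", set(range(n))) for c in range(n)]
        return CSP(f"{n}-queens", set(variables),
                   [Constraint([variables[i], variables[j]], no_attack(i, j))
                    for i in range(n) for j in range(n) if i < j])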
      Exercise 4.2 How many constraints does this representation of the n-queens
      problem produce? Can it be done with fewer constraints? Either explain why it
      can’t be done with fewer constraints, or give a solution using fewer constraints.
      Unit tests
      The following defines a unit test for csp solvers, by default using example csp1.
                                   cspExamples.py — (continued)
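The test listing is elided here; a sketch, consistent with the call test_csp(solver_from_searcher) used later, assuming csp1 and its two solutions (A = 1 or A = 2, with B = 3 and C = 4):

    def test_csp(CSP_solver, csp=csp1,
                 solutions=[{A: 1, B: 3, C: 4}, {A: 2, B: 3, C: 4}]):
        """CSP_solver is a function that takes a CSP and returns a solution;
        this checks the returned solution is one of the known solutions"""
        print("Testing csp with", CSP_solver.__doc__)
        sol = CSP_solver(csp)
        print("Solution found:", sol)
        assert sol in solutions, f"Solution not correct for {csp}"
        print("Passed unit test")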
      Exercise 4.3 Modify test so that instead of taking in a list of solutions, it checks
      whether the returned solution actually is a solution.
      Exercise 4.4 Propose a test that is appropriate for CSPs with no solutions. As-
      sume that the test designer knows there are no solutions. Consider what a CSP
      solver should return if there are no solutions to the CSP.
      Exercise 4.5 Write a unit test that checks whether all solutions (e.g., for the search
      algorithms that can return multiple solutions) are correct, and whether all solu-
      tions can be found.
     Exercise 4.6 Instead of testing all constraints at every node, change it so each
constraint is only tested when all of its variables are assigned. Given an elimina-
     tion ordering, it is possible to determine when each constraint needs to be tested.
     Implement this. Hint: create a parallel list of sets of constraints, where at each po-
     sition i in the list, the constraints at position i can be evaluated when the variable
     at position i has been assigned.
     Exercise 4.7 Estimate how long dfs_solve_all(crossword1d) will take on your
     computer. To do this, reduce the number of variables that need to be assigned,
so that the simplified problem can be solved in a reasonable time (between 0.1
     second and 10 seconds). This can be done by reducing the number of variables in
     var_order, as the program only splits on these. How much more time will it take
     if the number of variables is increased by 1? (Try it!) Then extrapolate to all of the
     variables. See Section 1.6.1 for how to time your code. Would making the code 100
     times faster or using a computer 100 times faster help?
        The next solver constructs a search space that can be solved using the search
     methods of the previous chapter. This takes in a CSP problem and an optional
     variable ordering, which is a list of the variables in the CSP. In this search space:
          • A node is a variable : value dictionary which does not violate any con-
straints (so that dictionaries that violate any constraints are not added).
         The neighbors(node) method uses the fact that the length of the node, which
     is the number of variables already assigned, is the index of the next variable to
split on. Note that we do not need to check whether there are more variables
to split on: the nodes are all consistent by construction, so when there are no
more variables left we have a solution, and the neighbors are not needed.
cspSearch.py — (continued)
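The listing is elided here; the following is a sketch of the neighbors method just described, assuming the class stores the variable ordering in self.variables and that Arc is imported from searchProblem:

    def neighbors(self, node):
        """returns the arcs to the consistent one-variable extensions of node"""
        var = self.variables[len(node)]    # the next variable to split on
        res = []
        for val in var.domain:
            new_env = node | {var: val}    # dictionary union: no side effects
            if self.csp.consistent(new_env):
                res.append(Arc(node, new_env))
        return res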
The unit test relies on a solver. The following procedure creates a solver
     using search that can be tested.
                                           cspSearch.py — (continued)
48   import cspExamples
49   from searchGeneric import Searcher
50
51   def solver_from_searcher(csp):
52       """depth-first search solver"""
53       path = Searcher(Search_from_CSP(csp)).search()
54       if path is not None:
55           return path.end()
56       else:
57           return None
58
59   if __name__ == "__main__":
60       test_csp(solver_from_searcher)
61
62   ## Test Solving CSPs with Search:
63   searcher1 = Searcher(Search_from_CSP(cspExamples.csp1))
64   #print(searcher1.search()) # get next solution
65   searcher2 = Searcher(Search_from_CSP(cspExamples.csp2))
66   #print(searcher2.search()) # get next solution
67   searcher3 = Searcher(Search_from_CSP(cspExamples.crossword1))
68   #print(searcher3.search()) # get next solution
69   searcher4 = Searcher(Search_from_CSP(cspExamples.crossword1d))
70   #print(searcher4.search()) # get next solution (warning: slow)
     Exercise 4.8 What would happen if we constructed the new assignment by as-
     signing node[var] = val (with side effects) instead of using dictionary union? Give
     an example of where this could give a wrong answer. How could the algorithm be
     changed to work with side effects? (Hint: think about what information needs to
     be in a node).
     Exercise 4.9 Change neighbors so that it returns an iterator of values rather than
     a list. (Hint: use yield.)
56
57        def new_to_do(self, var, const):
58            """returns new elements to be added to to_do after assigning
59            variable var in constraint const.
60            """
61            return {(nvar, nconst) for nconst in self.csp.var_to_const[var]
62                   if nconst != const
63                   for nvar in nconst.scope
64                   if nvar != var}
The following selects an arc. Any element of to_do can be selected. The se-
lected element needs to be removed from to_do. The default implementation
just selects whichever element the pop method for sets returns. The graphical user
     interface below allows the user to select an arc. Alternatively, a more sophisti-
     cated selection could be employed.
                                   cspConsistency.py — (continued)
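The elided default can be as simple as:

    def select_arc(self, to_do):
        """selects and removes an arbitrary element of to_do"""
        return to_do.pop()    # set.pop removes an arbitrary element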
         The value of new_domain is the subset of the domain of var that is consistent
     with the assignment to the other variables. To make it easier to understand, the
     following treats unary (with no other variables in the constraint) and binary
(with one other variable in the constraint) constraints as special cases. These
     cases are not strictly necessary; the last case covers the first two cases, but is
     more difficult to understand without seeing the first two cases. Note that this
     case analysis is not in the code distribution, but can replace the assignment to
     new_domain above.
                   if len(other_vars)==0:          # unary constraint
                       new_domain = {val for val in self.domains[var]
                                      if const.holds({var:val})}
                   elif len(other_vars)==1:        # binary constraint
                       other = other_vars[0]
                       new_domain = {val for val in self.domains[var]
                               if any(const.holds({var: val,other:other_val})
                                       for other_val in self.domains[other])}
                   else:                           # general case
                       new_domain = {val for val in self.domains[var]
                           if self.any_holds(self.domains, const, {var: val}, other_vars)}
any_holds is a recursive function that tries to find an assignment of values to the
other variables (other_vars) that satisfies constraint const given the assignment
in env. The integer variable ind specifies which index into other_vars needs to be
      checked next. As soon as one assignment returns True, the algorithm returns
      True.
cspConsistency.py — (continued)
cspConsistency.py — (continued)
cspConsistency.py — (continued)
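The any_holds listing is elided above; the following is a recursive sketch that matches the description:

    def any_holds(self, domains, const, env, other_vars, ind=0):
        """returns True if const holds for some assignment of values from
        domains to other_vars[ind:] that extends env (a sketch)"""
        if ind == len(other_vars):
            return const.holds(env)
        var = other_vars[ind]
        return any(self.any_holds(domains, const, env | {var: val},
                                  other_vars, ind + 1)
                   for val in domains[var])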
Exercise 4.10 Implement solve_all that returns the set of all solutions without
      using yield. Hint: it can be like generate_sols but returns a set of solutions; the
      recursive calls can be unioned; | is Python’s union.
Exercise 4.11 Implement solve_one that returns one solution if one exists, or False
otherwise, without using yield. Hint: Python's "or" has the behaviour that A or B
returns the value of A unless it is None or False, in which case the value of B is
      returned.
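For example:

    None or 3     # 3
    False or {}   # {}
    5 or 10       # 5 (the first non-false value is returned)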
           Unit test:
                                    cspConsistency.py — (continued)
[Figure: the ConsistencyGUI display of a constraint network (with constraints including A < D, D != E, C != E) during arc consistency; E has domain {1, 2, 3, 4}, and an Auto AC button runs arc consistency automatically.]
14   class ConsistencyGUI(Con_solver):
15       def __init__(self, csp, fontsize=10, speed=1, **kwargs):
16           """
17           csp is the csp to show
18           fontsize is the size of the text
19           speed is the number of animations per second (controls delay_time)
20                1 (slow) and 4 (fast) seem like good values
21           """
22           self.fontsize = fontsize
23           self.delay_time = 1/speed
24           Con_solver.__init__(self, csp, **kwargs)
25           csp.show(showAutoAC = True)
26
27        def go(self):
28            res = self.solve_all()
29            self.csp.draw_graph(domains=self.domains,
30                                  title="No more solutions. GUI finished. ",
31                                  fontsize=self.fontsize)
32            return res
33
34        def select_arc(self, to_do):
35            while True:
36                self.csp.draw_graph(domains=self.domains, to_do=to_do,
37                                      title="click on to_do (blue) arc",
                                            fontsize=self.fontsize)
38                while self.csp.picked == None and not self.csp.autoAC:
39                    plt.pause(0.01) # controls reaction time of GUI
40                if self.csp.autoAC:
41                    break
42                picked = self.csp.picked
43                self.csp.picked = None
44                if picked in to_do:
45                    to_do.remove(picked)
46                    print(f"{picked} picked")
47                    return picked
48                else:
49                    print(f"{picked} not in to_do")
50            if self.csp.autoAC:
51                self.csp.draw_graph(domains=self.domains, to_do=to_do,
52                                      title="Auto AC", fontsize=self.fontsize)
53                plt.pause(self.delay_time)
54                return to_do.pop()
55
56        def select_var(self, iter_vars):
57            vars = list(iter_vars)
58            while True:
59                self.csp.draw_graph(domains=self.domains,
60                                      title="Arc consistent. Click node to
                                            split",
61                                      fontsize=self.fontsize)
Exercise 4.12 When splitting a domain, this code splits the domain approximately
in half (without any effort to make a sensible choice). Does it work
      better to split one element from a domain?
           Unit test:
                                     cspConsistency.py — (continued)
195   ## Test Solving CSPs with Arc consistency and domain splitting:
196   #Con_solver.max_display_level = 4 # display details of AC (0 turns off)
197   #Con_solver(cspExamples.csp1).solve_all()
198   #searcher1d = Searcher(Search_with_AC_from_CSP(cspExamples.csp1))
199   #print(searcher1d.search())
200   #Searcher.max_display_level = 2 # display search trace (0 turns off)
201   #searcher2c = Searcher(Search_with_AC_from_CSP(cspExamples.csp2))
202   #print(searcher2c.search())
203   #searcher3c = Searcher(Search_with_AC_from_CSP(cspExamples.crossword1))
204   #print(searcher3c.search())
205   #searcher4c = Searcher(Search_with_AC_from_CSP(cspExamples.crossword1d))
206   #print(searcher4c.search())
The following code implements the two-stage choice (select one of the vari-
ables that is involved in the most violated constraints, then a value for it),
the any-conflict algorithm (select a variable that participates in a violated con-
straint) and a random choice of variable, as well as a probabilistic mix of the
three.
Given a CSP, the stochastic local searcher (SLSearcher) creates the data struc-
tures below (a sketch of their construction follows the list):
• variables_to_select is the set of all of the variables with domain-size greater
  than one. For a variable not in this set, we cannot pick another value for
  that variable.
• var_to_constraints maps from a variable into the set of constraints it is in-
  volved in. Note that the inverse mapping, from constraints into variables,
  is part of the definition of a constraint.
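A sketch of how these structures can be built inside SLSearcher's initializer (the actual listing is elided; names follow the description):

    self.variables_to_select = {var for var in self.csp.variables
                                if len(var.domain) > 1}
    # invert the scope of each constraint: variable -> set of constraints
    self.var_to_constraints = {var: set() for var in self.csp.variables}
    for con in self.csp.constraints:
        for var in con.scope:
            self.var_to_constraints[var].add(con)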
     restart creates a new total assignment, and constructs the set of conflicts (the
     constraints that are false in this assignment).
cspSLS.py — (continued)
29        def restart(self):
30            """creates a new total assignment and the conflict set
31            """
32            self.current_assignment = {var:random_choice(var.domain) for
33                                     var in self.csp.variables}
34            self.display(2,"Initial assignment",self.current_assignment)
35            self.conflicts = set()
36            for con in self.csp.constraints:
37                if not con.holds(self.current_assignment):
38                    self.conflicts.add(con)
39            self.display(2,"Number of conflicts",len(self.conflicts))
40            self.variable_pq = None
The search method is the top-level searching algorithm. It can either be used
to start the search or to continue searching. If there is no current assignment,
it must create one. Note that, when counting steps, a restart is counted as one
step, which is not appropriate for CSPs with many variables, as it is a relatively
expensive operation for these cases.
    This method selects one of two implementations. The argument prob_best
is the probability of selecting a best variable (one involving the most conflicts).
When the value of prob_best is positive, the algorithm needs to maintain a prior-
ity queue of variables and the number of conflicts (using search_with_var_pq). If
the probability of selecting a best variable is zero, it does not need to maintain
this priority queue (as implemented in search_with_any_conflict).
    The argument prob_anycon is the probability that the any-conflict strategy is
used (which selects a variable at random that is in a conflict), assuming that
it is not picking a best variable. Note that for the probability parameters, any
value less than zero acts like probability zero and any value greater than 1 acts
like probability 1. This means that when prob_anycon = 1.0, a best variable is
chosen with probability prob_best, otherwise a variable in any conflict is chosen.
A variable is chosen at random with probability 1 − prob_anycon − prob_best as
long as that is positive.
This returns the number of steps needed to find a solution, or None if no
solution is found. If there is a solution, it is in self.current_assignment.
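One way to read the probability parameters is the following sketch (not the actual implementation, which is organized around the two search methods above):

    import random

    def choose_strategy(prob_best, prob_anycon):
        """sketch: which variable-selection strategy is used on one step"""
        r = random.random()
        if r < prob_best:
            return "best"          # a variable involved in the most conflicts
        elif r < prob_best + prob_anycon:
            return "any-conflict"  # a random variable in some conflict
        else:
            return "random"        # any variable, chosen at random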
                                      cspSLS.py — (continued)
Exercise 4.13 This does an initial random assignment but does not do any ran-
dom restarts. Implement a searcher that takes in the maximum number of walk
steps (corresponding to the existing max_steps) and the maximum number of restarts,
and returns the total number of steps for the first solution found. (As in search, the
solution found can be extracted from the variable self.current_assignment.)
4.5.1 Any-conflict
If the probability of picking a best variable is zero, the implementation needs to
keep track of which variables are involved in conflicts.
                                      cspSLS.py — (continued)
     Exercise 4.14 This makes no attempt to find the best alternative value for a vari-
able. Modify the code so that after selecting a variable it selects a value that reduces
     the number of conflicts by the most. Have a parameter that specifies the probabil-
     ity that the best value is chosen.
updates. The change is recorded in the dictionary var_differential, which is used
      to update the priority queue (see Section 4.5.3).
                                     cspSLS.py — (continued)
cspSLS.py — (continued)
cspSLS.py — (continued)
      Exercise 4.15 This makes no attempt to find the best alternative value for a vari-
able. Modify the code so that after selecting a variable it selects a value that reduces
      the number of conflicts by the most. Have a parameter that specifies the probabil-
      ity that the best value is chosen.
      Exercise 4.16 These implementations always select a value for the variable se-
      lected that is different from its current value (if that is possible). Change the code
      so that it does not have this restriction (so it can leave the value the same). Would
      you expect this code to be faster? Does it work worse (or better)?
cspSLS.py — (continued)
204            """
205            for elt,incr in update_dict.items():
206                if incr != 0:
207                    newval = self.elt_map.get(elt,[0])[0] - incr
208                    assert newval <= 0, f"{elt}:{newval+incr}-{incr}"
209                    self.remove(elt)
210                    if newval != 0:
211                        self.add(elt,newval)
212
213         def pop(self):
214             """Removes and returns the (elt,value) pair with minimal value.
215             If the priority queue is empty, IndexError is raised.
216             """
217             self.max_size = max(self.max_size, len(self.pq)) # keep statistics
218             triple = heapq.heappop(self.pq)
219             while triple[2] == self.REMOVED:
220                 triple = heapq.heappop(self.pq)
221             del self.elt_map[triple[2]]
222             return triple[2], triple[0] # elt, value
223
224         def top(self):
225             """Returns the (elt,value) pair with minimal value, without
                    removing it.
226             If the priority queue is empty, IndexError is raised.
227             """
228             self.max_size = max(self.max_size, len(self.pq)) # keep statistics
229             triple = self.pq[0]
230             while triple[2] == self.REMOVED:
231                 heapq.heappop(self.pq)
232                 triple = self.pq[0]
233             return triple[2], triple[0] # elt, value
234
235         def empty(self):
236             """returns True iff the priority queue is empty"""
237             return all(triple[2] == self.REMOVED for triple in self.pq)
      4.5.5 Testing
                                     cspSLS.py — (continued)
[Figure: a runtime distribution for the stochastic local searchers; the x-axis is Number of Steps on a log scale (1 to 1000), and the y-axis ticks run from 0 to 600.]
     plot the average time for each run. Before you start, try to estimate the total run
     time, so you will be able to tell if there is a problem with the algorithm stopping.
cspSoft.py — (continued)
cspSoft.py — (continued)
86           """finds the optimal solution that extends path and is less than
                the bound"""
 87           self.display(2,"cbsearch:",asst,cost,constraints)
 88           can_eval = [c for c in constraints if c.can_evaluate(asst)]
 89           rem_cons = [c for c in constraints if c not in can_eval]
 90           newcost = cost + sum(c.value(asst) for c in can_eval)
91           self.display(2,"Evaluating:",can_eval,"cost:",newcost)
 92           if newcost < self.bound:
 93               self.num_expanded += 1
 94               if rem_cons==[]:
 95                   self.best_asst = asst
 96                   self.bound = newcost
 97                   self.display(1,"New best assignment:",asst," cost:",newcost)
 98               else:
 99                   var = next(var for var in self.csp.variables if var not in
                          asst)
100                   for val in var.domain:
101                       self.cbsearch({var:val}|asst, newcost, rem_cons)
102
103   # bnb = DF_branch_and_bound_opt(scsp1)
104   # bnb.max_display_level=3 # show more detail
105   # bnb.optimize()
      Exercise 4.18 Change the stochastic-local search algorithms to work for soft con-
      straints. Hint: The analog of a conflict is a soft constraint that is not at its lowest
      value. Instead of the number of constraints violated, consider how much a change
      in a variable affects the objective function. Instead of returning a solution, return
      the best assignment found.
     An askable atom can be asked of the user. The user can respond in English or
     French or just with a “y”.
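The helper that interprets the user's answer is elided here; a sketch consistent with this description:

    def yes(ans):
        """returns True when the string ans means yes (English, French, or 'y')"""
        return ans.lower() in ['y', 'yes', 'oui']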
                                   logicProblem.py — (continued)
27   class Askable(object):
28       """An askable atom"""
29
30      def __init__(self,atom):
Here is a trivial example (I think therefore I am) used in the unit tests:
                                    logicProblem.py — (continued)
 74   triv_KB = KB([
 75       Clause('i_am', ['i_think']),
 76       Clause('i_think'),
 77       Clause('i_smell', ['i_exist'])
 78       ])
      Here is a representation of the electrical domain of the textbook:
                                    logicProblem.py — (continued)
 80   elect = KB([
 81       Clause('light_l1'),
 82       Clause('light_l2'),
 83       Clause('ok_l1'),
 84       Clause('ok_l2'),
 85       Clause('ok_cb1'),
 86       Clause('ok_cb2'),
 87       Clause('live_outside'),
 88       Clause('live_l1', ['live_w0']),
 89       Clause('live_w0', ['up_s2','live_w1']),
 90       Clause('live_w0', ['down_s2','live_w2']),
 91       Clause('live_w1', ['up_s1', 'live_w3']),
 92       Clause('live_w2', ['down_s1','live_w3' ]),
 93       Clause('live_l2', ['live_w4']),
 94       Clause('live_w4', ['up_s3','live_w3' ]),
 95       Clause('live_p_1', ['live_w3']),
 96       Clause('live_w3', ['live_w5', 'ok_cb1']),
 97       Clause('live_p_2', ['live_w6']),
 98       Clause('live_w6', ['live_w5', 'ok_cb2']),
 99       Clause('live_w5', ['live_outside']),
100       Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
101       Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
102       Askable('up_s1'),
103       Askable('down_s1'),
104       Askable('up_s2'),
105       Askable('down_s2'),
106       Askable('up_s3'),
107       Askable('down_s3')
108       ])
109
110   # print(kb)
The following knowledge base is false in the intended interpretation. One of
      the clauses is wrong; can you see which one? We will show how to debug it.
                                    logicProblem.py — (continued)
115         Clause('ok_cb1'),
116         Clause('ok_cb2'),
117         Clause('live_outside'),
118         Clause('live_p_2', ['live_w6']),
119         Clause('live_w6', ['live_w5', 'ok_cb2']),
120         Clause('light_l1'),
121         Clause('live_w5', ['live_outside']),
122         Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
123         Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
124         Clause('live_l1', ['live_w0']),
125         Clause('live_w0', ['up_s2','live_w1']),
126         Clause('live_w0', ['down_s2','live_w2']),
127         Clause('live_w1', ['up_s3', 'live_w3']),
128         Clause('live_w2', ['down_s1','live_w3' ]),
129         Clause('live_l2', ['live_w4']),
130         Clause('live_w4', ['up_s3','live_w3' ]),
131         Clause('live_p_1', ['live_w3']),
132         Clause('live_w3', ['live_w5', 'ok_cb1']),
133         Askable('up_s1'),
134         Askable('down_s1'),
135         Askable('up_s2'),
136         Clause('light_l2'),
137         Clause('ok_l1'),
138         Clause('light_l2'),
139         Clause('ok_l1'),
140         Clause('ok_l2'),
141         Clause('ok_cb1'),
142         Clause('ok_cb2'),
143         Clause('live_outside'),
144         Clause('live_p_2', ['live_w6']),
145         Clause('live_w6', ['live_w5', 'ok_cb2']),
146         Clause('ok_l2'),
147         Clause('ok_cb1'),
148         Clause('ok_cb2'),
149         Clause('live_outside'),
150         Clause('live_p_2', ['live_w6']),
151         Clause('live_w6', ['live_w5', 'ok_cb2']),
152         Askable('down_s2'),
153         Askable('up_s3'),
154         Askable('down_s2')
155         ])
156
157   # print(kb)
     Exercise 5.1 It is not very user-friendly to ask all of the askables up-front. Imple-
     ment ask-the-user so that questions are only asked if useful, and are not re-asked.
     For example, if there is a clause h ← a ∧ b ∧ c ∧ d ∧ e, where c and e are askable,
     c and e only need to be asked if a, b, d are all in fp and they have not been asked
     before. Askable e only needs to be asked if the user says “yes” to c. Askable c
     doesn’t need to be asked if the user previously replied “no” to e.
         This form of ask-the-user can ask a different set of questions than the top-
     down interpreter that asks questions when encountered. Give an example where
     they ask different questions (neither set of questions asked is a subset of the other).
Exercise 5.2 This algorithm runs in time O(n²), where n is the number of clauses,
for a bounded number of elements in the body; each iteration goes through each
of the clauses, and in the worst case, it will do an iteration for each clause. It is
possible to implement this in O(n) time by creating an index that maps an
atom to the set of clauses with that atom in the body. Implement this. What is its
complexity?
     Exercise 5.4 This code can re-ask a question multiple times. Implement this code
     so that it only asks a question once and remembers the answer. Also implement a
     function to forget the answers.
     Exercise 5.5 What search method is this using? Implement the search interface
     so that it can use A∗ or other searching methods. Define an admissible heuristic
     that is not always 0.
The following provides a simple unit test that is hard-wired for triv_KB:
logicExplain.py — (continued)
logicExplain.py — (continued)
 72      going = True
 73      ups = [] # stack for going up
 74      proof="fail" # there is no proof to start
 75      while going:
 76          inp = input("logicExplain: ")
 77          inps = inp.split(" ")
 78          try:
 79              command = inps[0]
 80              if command == "quit":
 81                  going = False
 82              elif command == "ask":
 83                  proof = prove_atom(kb, inps[1])
 84                  if proof == "fail":
 85                      print("fail")
 86                  else:
 87                      print("yes")
 88              elif command == "how":
 89                  if proof=="fail":
 90                      print("there is no proof")
 91                  elif len(inps)==1:
 92                     print_rule(proof)
 93                  else:
 94                      try:
 95                          ups.append(proof)
 96                          proof = proof[1][int(inps[1])] #nth argument of rule
 97                          print_rule(proof)
 98                      except:
 99                          print('In "how n", n must be a number between 0
                                 and',len(proof[1])-1,"inclusive.")
100              elif command == "up":
101                  if ups:
102                      proof = ups.pop()
103                  else:
104                      print("No rule to go up to.")
105                  print_rule(proof)
106              elif command == "kb":
107                   print(kb)
108              elif command == "help":
109                  print(helptext)
110              else:
111                  print("unknown command:", inp)
112                  print("use help for help")
113          except:
114              print("unknown command:", inp)
115              print("use help for help")
116
117   def print_rule(proof):
118       (head,body) = proof
119       if body == "answered":
120           print(head,"was answered yes")
or preferably using yield). Add the command "retry" to the user interface to try
     another proof.
     5.5      Assumables
     Atom a can be made assumable by including Assumable(a) in the knowledge
     base. A knowledge base that can include assumables is declared with KBA.
The top-down Horn clause interpreter, prove_all_ass, returns a list of the sets of
assumables that imply ans_body. This list will contain all of the minimal sets
     of assumables, but can also find non-minimal sets, and repeated sets, if they
     can be generated with separate proofs. The set assumed is the set of assumables
     already assumed.
logicAssumables.py — (continued)
43                        return [] # no answers
44                elif selected in self.assumables:
45                    return self.prove_all_ass(ans_body[1:],assumed|{selected})
46                else:
47                    return [ass
48                            for cl in self.clauses_for_atom(selected)
49                            for ass in
                                  self.prove_all_ass(cl.body+ans_body[1:],assumed)
50                               ] # union of answers for each clause with
                                     head=selected
51            else:                # empty body
52                return [assumed] # one answer
53
54         def conflicts(self):
55             """returns a list of minimal conflicts"""
56             return minsets(self.prove_all_ass(['false']))
        Given a list of sets, minsets returns a list of the minimal sets in the list. For
     example, minsets([{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}]) returns [{2, 3}, {2, 4, 5}].
                                     logicAssumables.py — (continued)
58   def minsets(ls):
59       """ls is a list of sets
60       returns a list of minimal sets in ls
61       """
62       ans = []    # elements known to be minimal
63       for c in ls:
64           if not any(c1<c for c1 in ls) and not any(c1 <= c for c1 in ans):
65               ans.append(c)
66       return ans
67
68   # minsets([{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}])
     Warning: minsets works for a list of sets or for a set of (frozen) sets, but it does
not work for a generator of sets (because ls is traversed more than once in the loop). For
     example, try to predict and then test:
     minsets(e for e in [{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}])
        The diagnoses can be constructed from the (minimal) conflicts as follows.
     This also works if there are non-minimal conflicts, but is not as efficient.
                                     logicAssumables.py — (continued)
69   def diagnoses(cons):
70       """cons is a list of (minimal) conflicts.
71       returns a list of diagnoses."""
72       if cons == []:
73           return [set()]
74       else:
75           return minsets([({e}|d)            # | is set union
76                        for e in cons[0]
77                        for d in diagnoses(cons[1:])])
      Test cases:
                                    logicAssumables.py — (continued)
 80   electa = KBA([
 81       Clause('light_l1'),
 82       Clause('light_l2'),
 83       Assumable('ok_l1'),
 84       Assumable('ok_l2'),
 85       Assumable('ok_s1'),
 86       Assumable('ok_s2'),
 87       Assumable('ok_s3'),
 88       Assumable('ok_cb1'),
 89       Assumable('ok_cb2'),
 90       Assumable('live_outside'),
 91       Clause('live_l1', ['live_w0']),
 92       Clause('live_w0', ['up_s2','ok_s2','live_w1']),
 93       Clause('live_w0', ['down_s2','ok_s2','live_w2']),
 94       Clause('live_w1', ['up_s1', 'ok_s1', 'live_w3']),
 95       Clause('live_w2', ['down_s1', 'ok_s1','live_w3' ]),
 96       Clause('live_l2', ['live_w4']),
 97       Clause('live_w4', ['up_s3','ok_s3','live_w3' ]),
 98       Clause('live_p_1', ['live_w3']),
 99       Clause('live_w3', ['live_w5', 'ok_cb1']),
100       Clause('live_p_2', ['live_w6']),
101       Clause('live_w6', ['live_w5', 'ok_cb2']),
102       Clause('live_w5', ['live_outside']),
103       Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
104       Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
105       Askable('up_s1'),
106       Askable('down_s1'),
107       Askable('up_s2'),
108       Askable('down_s2'),
109       Askable('up_s3'),
110       Askable('down_s3'),
111       Askable('dark_l1'),
112       Askable('dark_l2'),
113       Clause('false', ['dark_l1', 'lit_l1']),
114       Clause('false', ['dark_l2', 'lit_l2'])
115       ])
116   # electa.prove_all_ass(['false'])
117   # cs=electa.conflicts()
118   # print(cs)
119   # diagnoses(cs)       # diagnoses from conflicts
     5.6        Negation-as-failure
The negation of an atom a is written as Not(a) in a body.
                             logicNegation.py — Propositional negation-as-failure
11   from logicProblem import KB, Clause, Askable, yes
12
13   class Not(object):
14       def __init__(self, atom):
15           self.theatom = atom
16
17         def atom(self):
18             return self.theatom
19
20         def __repr__(self):
21             return f"Not({self.theatom})"
     Prove with negation-as-failure (prove_naf) is like prove, but with the extra case
     to cover Not:
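The prove_naf listing is elided here; the following is a simplified sketch of the Not case (askables omitted):

    def prove_naf(kb, ans_body, indent=""):
        """returns True if kb entails ans_body, treating Not by failure (sketch)"""
        if ans_body:
            selected = ans_body[0]
            if isinstance(selected, Not):
                if prove_naf(kb, [selected.atom()], indent + "    "):
                    return False   # the atom succeeds, so Not(atom) fails
                return prove_naf(kb, ans_body[1:], indent)
            return any(prove_naf(kb, cl.body + ans_body[1:], indent + "    ")
                       for cl in kb.clauses_for_atom(selected))
        return True    # an empty body is trivially true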
                                       logicNegation.py — (continued)
48   triv_KB_naf = KB([
49       Clause('i_am', ['i_think']),
50       Clause('i_think'),
51       Clause('i_smell', ['i_am', Not('dead')]),
52       Clause('i_bad', ['i_am', Not('i_think')])
53       ])
54
55   triv_KB_naf.max_display_level = 4
56   def test():
57       a1 = prove_naf(triv_KB_naf,['i_smell'])
58       assert a1, f"triv_KB_naf proving i_smell gave {a1}"
59       a2 = prove_naf(triv_KB_naf,['i_bad'])
60       assert not a2, f"triv_KB_naf proving i_bad gave {a2}"
61       print("Passed unit tests")
62   if __name__ == "__main__":
63       test()
     Default reasoning about beaches at resorts (Example 5.28 of Poole and Mack-
     worth [2023]):
                                   logicNegation.py — (continued)
65   beach_KB = KB([
66      Clause('away_from_beach', [Not('on_beach')]),
67      Clause('beach_access', ['on_beach', Not('ab_beach_access')]),
68      Clause('swim_at_beach', ['beach_access', Not('ab_swim_at_beach')]),
69      Clause('ab_swim_at_beach', ['enclosed_bay', 'big_city',
            Not('ab_no_swimming_near_city')]),
70      Clause('ab_no_swimming_near_city', ['in_BC', Not('ab_BC_beaches')])
71       ])
72
73   #   prove_naf(beach_KB, ['away_from_beach'])
74   #   prove_naf(beach_KB, ['beach_access'])
75   #   beach_KB.add_clause(Clause('on_beach',[]))
76   #   prove_naf(beach_KB, ['away_from_beach'])
77   #   prove_naf(beach_KB, ['swim_at_beach'])
78   #   beach_KB.add_clause(Clause('enclosed_bay',[]))
79   #   prove_naf(beach_KB, ['swim_at_beach'])
80   #   beach_KB.add_clause(Clause('big_city',[]))
81   #   prove_naf(beach_KB, ['swim_at_beach'])
82   #   beach_KB.add_clause(Clause('in_BC',[]))
83   #   prove_naf(beach_KB, ['swim_at_beach'])
Deterministic Planning
        • effects: a dictionary of feature:value pairs that are made true by this action.
          In particular, a feature in the dictionary has the corresponding value (and
          not its previous value) after the action, and a feature not in the dictionary
          keeps its old value.
23            self.name = name
24            self.preconds = preconds
25            self.effects = effects
26            self.cost = cost
27
28         def __repr__(self):
29             return self.name
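For example, a "pick up coffee" action for the delivery domain might be written as follows (a sketch; the feature names match problem0 below):

    # possible when the robot is at the coffee shop (RLoc) without
    # coffee (RHC); afterwards it holds coffee, and RLoc keeps its value
    puc = Strips('puc', {'RLoc': 'cs', 'RHC': False}, {'RHC': True})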
A STRIPS domain consists of:
• A set of actions.
        • A dictionary that maps each feature into a set of possible values for the
          feature.
stripsProblem.py — (continued)
31   class STRIPS_domain(object):
32       def __init__(self, feature_domain_dict, actions):
33           """Problem domain
34           feature_domain_dict is a feature:domain dictionary,
35                   mapping each feature to its domain
36           actions is the set of actions
37           """
38           self.feature_domain_dict = feature_domain_dict
39           self.actions = actions
stripsProblem.py — (continued)
41   class Planning_problem(object):
42       def __init__(self, prob_domain, initial_state, goal):
43           """
44           a planning problem consists of
45           * a planning domain
46           * the initial state
47           * a goal
48           """
49           self.prob_domain = prob_domain
50           self.initial_state = initial_state
51           self.goal = goal
[Figure: the delivery robot domain, with locations the coffee shop (cs), Sam's office (off), the mail room (mr), and the lab (lab).]
stripsProblem.py — (continued)
stripsProblem.py — (continued)
[Figure: two blocks-world actions: move(b,c,a), which moves b from c onto a, and move(b,c,table), which moves b from c onto the table.]
71   problem0 = Planning_problem(delivery_domain,
72                            {'RLoc':'lab', 'MW':True,     'SWC':True, 'RHC':False,
73                             'RHM':False},
74                            {'RLoc':'off'})
75   problem1 = Planning_problem(delivery_domain,
76                            {'RLoc':'lab', 'MW':True,     'SWC':True, 'RHC':False,
77                             'RHM':False},
78                            {'SWC':False})
79   problem2 = Planning_problem(delivery_domain,
80                            {'RLoc':'lab', 'MW':True,     'SWC':True, 'RHC':False,
81                             'RHM':False},
82                            {'SWC':False, 'MW':False,     'RHM':False})
[Figure: a blocks-world planning problem, showing an initial configuration and a goal configuration of blocks a, b, and c.]
27   def zero(*args,**nargs):
28       """always returns 0"""
29       return 0
30
31   class Forward_STRIPS(Search_problem):
32       """A search problem from a planning problem where:
33       * a node is a state object.
34       * the dynamics are specified by the STRIPS representation of actions
35       """
36       def __init__(self, planning_problem, heur=zero):
37           """creates a forward search space from a planning problem.
38           heur(state,goal) is a heuristic function,
39              an underestimate of the cost from state to goal, where
40              both state and goals are feature:value dictionaries.
41           """
42           self.prob_domain = planning_problem.prob_domain
43           self.initial_state = State(planning_problem.initial_state)
44           self.goal = planning_problem.goal
45           self.heur = heur
46
47      def is_goal(self, state):
48          """is True if node is a goal.
49
50          Every goal feature has the same value in the state and the goal."""
51          return all(state.assignment[prop]==self.goal[prop]
                       for prop in self.goal)
stripsHeuristic.py — (continued)
21   def h1(state,goal):
22       """ the distance to the goal location, if there is one"""
23       if 'RLoc' in goal:
24           return dist(state['RLoc'], goal['RLoc'])
25       else:
26           return 0
27
28   def h2(state,goal):
29       """ the distance to the coffee shop plus getting coffee and delivering
             it
30       if the robot needs to get coffee
31       """
32       if ('SWC' in goal and goal['SWC']==False
33               and state['SWC']==True
34               and state['RHC']==False):
35           return dist(state['RLoc'],'cs')+3
36       else:
37           return 0
     The maximum of the values of a set of admissible heuristics is also an admis-
     sible heuristic. The function maxh takes a number of heuristic functions as ar-
     guments, and returns a new heuristic function that takes the maximum of the
     values of the heuristics. For example, h1 and h2 are heuristic functions and so
     maxh(h1,h2) is also. maxh can take an arbitrary number of arguments.
stripsHeuristic.py — (continued)
39   def maxh(*heuristics):
40       """Returns a new heuristic function that is the maximum of the
             functions in heuristics.
41       heuristics is the list of arguments which must be heuristic functions.
42       """
43       # return lambda state,goal: max(h(state,goal) for h in heuristics)
44       def newh(state,goal):
45           return max(h(state,goal) for h in heuristics)
46       return newh
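For example:

    h3 = maxh(h1, h2)
    # h3(state, goal) == max(h1(state, goal), h2(state, goal))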
The following runs the example with and without the heuristic.
stripsHeuristic.py — (continued)
     Exercise 6.4 For more than one start-state/goal combination, test the forward
     planner with a heuristic function of just h1, with just h2 and with both. Explain
     why each one prunes or doesn’t prune the search space.
     Exercise 6.5 Create a better heuristic than maxh(h1, h2). Try it for a number of
different problems. In particular, try to include the following costs:
        i) h3 is like h2 but also takes into account the case when Rloc is in goal.
       ii) h4 uses the distance to the mail room plus getting mail and delivering it if
           the robot needs to get need to deliver mail.
      iii) h5 is for getting mail when goal is for the robot to have mail, and then getting
           to the goal destination (if there is one).
44
45         def is_goal(self, subgoal):
46             """if subgoal is true in the initial state, a path has been found"""
47             goal_asst = subgoal.assignment
48             return all(self.initial_state[g]==goal_asst[g]
49                       for g in goal_asst)
50
51         def start_node(self):
52             """the start node is the top-level goal"""
53             return self.top_goal
54
55         def neighbors(self,subgoal):
56             """returns a list of the arcs for the neighbors of subgoal in this
                   problem"""
57             goal_asst = subgoal.assignment
58             return [ Arc(subgoal, self.weakest_precond(act,goal_asst),
                   act.cost, act)
59                     for act in self.prob_domain.actions
60                     if self.possible(act,goal_asst)]
61
62         def possible(self,act,goal_asst):
63             """True if act is possible to achieve goal_asst.
64
65            the action achieves an element of the effects and
66            the action doesn't delete something that needs to be achieved and
67            the preconditions are consistent with other subgoals that need to
                  be achieved
68            """
69            return ( any(goal_asst[prop] == act.effects[prop]
70                       for prop in act.effects if prop in goal_asst)
71                   and all(goal_asst[prop] == act.effects[prop]
72                           for prop in act.effects if prop in goal_asst)
73                   and all(goal_asst[prop]== act.preconds[prop]
74                           for prop in act.preconds if prop not in act.effects
                                 and prop in goal_asst)
75                   )
76
77         def weakest_precond(self,act,goal_asst):
78             """returns the subgoal that must be true so goal_asst holds after
                   act
79             should be: act.preconds | (goal_asst - act.effects)
80             """
81             new_asst = act.preconds.copy()
82             for g in goal_asst:
83                 if g not in act.effects:
84                     new_asst[g] = goal_asst[g]
85             return Subgoal(new_asst)
86
87         def heuristic(self,subgoal):
88           """in the regression planner a node is a subgoal.
89           the heuristic is an (under)estimate of the cost of going
90              from the initial state to the subgoal
91           """
92           return self.heur(self.initial_state, subgoal.assignment)
stripsRegressionPlanner.py — (continued)
     Exercise 6.7 Multiple path pruning could be used to prune more than the current
     code. In particular, if the current node contains more conditions than a previously
     visited node, it can be pruned. For example, if {a : True, b : False} has been visited,
     then any node that is a superset, e.g., {a : True, b : False, d : True}, need not be
     expanded. If the simpler subgoal does not lead to a solution, the more complicated
one won't either. Implement this more severe pruning. (Hint: This may require
     modifications to the searcher.)
Exercise 6.8 It is possible, given knowledge of the domain, that some assign-
ments of values to variables can never be achieved. For example, the robot
cannot be holding mail when there is mail waiting (assuming it isn't holding
mail initially). An assignment of values to (some of the) variables is incompat-
ible if no possible (reachable) state can include that assignment. For example,
{'MW': True, 'RHM': True} is an incompatible assignment. This information may
be useful for a planner; there is no point in trying to achieve these
together. Define a subclass of STRIPS_domain that can accept a list of incompatible
assignments. Modify the regression planner code to use such a list of incompatible
assignments. Give an example where the search space is smaller.
     Exercise 6.9 After completing the previous exercise, design incompatible assign-
     ments for the blocks world. (This should result in dramatic search improvements.)
     Exercise 6.10 Try the regression planner with a heuristic function of just h1 and
     with just h2 (defined in Section 6.2.1). Explain how each one prunes or doesn’t
     prune the search space.
Exercise 6.11 Create a better heuristic than heuristic_fun defined in Section 6.2.1.
is_(val) returns a function that returns True when applied to val, and when
applied to any other value returns False. So is_(3)(3) returns True and
is_(3)(7) returns False.
    Note that the underscore ('_') is part of the name; here we use the
convention that a function whose name ends in an underscore returns a function.
This uses two different styles to define is_ and if_; returning a function defined
by lambda is equivalent to returning the embedded function, except that the
embedded function has a name. The embedded function can also be given a docstring.
                                  stripsCSPPlanner.py — (continued)
68   def is_(val):
69       """returns a function that is true when it is applied to val.
70       """
71       #return lambda x: x == val
72       def is_fun(x):
73           return x == val
74       is_fun.__name__ = f"value_is_{val}"
75       return is_fun
76
77   def if_(v1,v2):
78       """if the second argument is v2, the first argument must be v1"""
79       #return lambda x1,x2: x1==v1 if x2==v2 else True
80       def if_fun(x1,x2):
81           return x1==v1 if x2==v2 else True
82       if_fun.__name__ = f"if x2 is {v2} then x1 is {v1}"
83       return if_fun
84
85   def eq_if_not_in_(actset):
86       """first and third arguments are equal if action is not in actset"""
87       # return lambda x1, a, x2: x1==x2 if a not in actset else True
88       def eq_if_not_fun(x1, a, x2):
89           return x1==x2 if a not in actset else True
90       eq_if_not_fun.__name__ = f"first and third arguments are equal if
             action is not in {actset}"
91       return eq_if_not_fun
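As a quick (hypothetical) check of these constructors, using location and action names from the delivery domain:

    is_(3)(3)                    # True
    is_(3)(7)                    # False
    if_('cs','mc')('cs','mc')    # True: second argument is 'mc', so first must be 'cs'
    if_('cs','mc')('off','mc')   # False
    if_('cs','mc')('off','lab')  # True: the condition on the second argument does not apply
    eq_if_not_in_({'mc','mcc'})('off','mc','cs')   # True: 'mc' is in the set
    eq_if_not_in_({'mc','mcc'})('off','puc','cs')  # False: 'puc' is not, and 'off' != 'cs'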
Putting it together, con_plan returns a list of actions that solves the problem prob
for a given horizon. If you want to do more than just return the list of actions,
you might want to get it to return the solution. Or even enumerate the solutions
(by using Search_with_AC_from_CSP).
                                  stripsCSPPlanner.py — (continued)
93   def con_plan(prob,horizon):
94       """finds a plan for problem prob given horizon.
95       """
96       csp = CSP_from_STRIPS(prob, horizon)
97       sol = Con_solver(csp).solve_one()
98       return csp.extract_plan(sol) if sol else sol
           The following are some example queries.
                                  stripsCSPPlanner.py — (continued)
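Those queries are elided here. A minimal example, assuming the delivery problem problem1 from stripsProblem:

    from stripsProblem import problem1
    con_plan(problem1, 5)   # a list of action instances if a plan with horizon 5
                            # exists, otherwise the falsy result of solve_one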
        • agenda: a list of (s, a) pairs, where s is a (var, val) pair and a is an action
          instance. This means that variable var must have value val before a can
          occur.
   • causal links: a set of (a0, g, a1) triples, where a0 and a1 are action instances
     and g is a (var, val) pair. This holds when action a0 makes g true for action
     a1 .
stripsPOP.py — (continued)
28   class POP_node(object):
29       """a (partial) partial-order plan. This is a node in the search
             space."""
30       def __init__(self, actions, constraints, agenda, causal_links):
31           """
32           * actions is a set of action instances
33           * constraints a set of (a0,a1) pairs, representing a0<a1,
34             closed under transitivity
35           * agenda list of (subgoal,action) pairs to be achieved, where
36             subgoal is a (variable,value) pair
37           * causal_links is a set of (a0,g,a1) triples,
38             where ai are action instances, and g is a (variable,value) pair
39           """
40           self.actions = actions # a set of action instances
41           self.constraints = constraints # a set of (a0,a1) pairs
42           self.agenda = agenda # list of (subgoal,action) pairs to be
                 achieved
43           self.causal_links = causal_links # set of (a0,g,a1) triples
44
45       def __str__(self):
46           return ("actions: "+str({str(a) for a in self.actions})+
47                  "\nconstraints: "+
48                  str({(str(a1),str(a2)) for (a1,a2) in self.constraints})+
49                  "\nagenda: "+
50                  str([(str(s),str(a)) for (s,a) in self.agenda])+
51                  "\ncausal_links:"+
52                  str({(str(a0),str(g),str(a2)) for (a0,g,a2) in
                        self.causal_links}) )
     extract_plan constructs a total order of action instances that is consistent with
the partial order.
stripsPOP.py — (continued)
54       def extract_plan(self):
55           """returns a total ordering of the action instances consistent
56           with the constraints.
57           raises IndexError if there is no choice.
58           """
59           sorted_acts = []
60           other_acts = set(self.actions)
61           while other_acts:
62               a = random.choice([a for a in other_acts if
63                       all(((a1,a) not in self.constraints) for a1 in
                             other_acts)])
64               sorted_acts.append(a)
65               other_acts.remove(a)
66           return sorted_acts
stripsPOP.py — (continued)
109                      consts1 =
                             self.add_constraint((self.start,new_a),node.constraints)
110                      consts2 = self.add_constraint((new_a,act1),consts1)
111                      new_agenda1 = new_agenda + [(pre,new_a) for pre in
                             a0.preconds.items()]
112                      new_clink = (new_a,subgoal,act1)
113                      new_cls = node.causal_links + [new_clink]
114                      for consts3 in
                             self.protect_all_cls(node.causal_links,new_a,consts2):
115                          for consts4 in
                                 self.protect_cl_for_actions(node.actions,consts3,new_clink):
116                              yield Arc(node,
117                                       POP_node(new_actions,consts4,new_agenda1,new_cls),
118                                       cost=1)
     Given a causal link (a0, subgoal, a1), the following method protects the causal
      link from each action in actions. Whenever an action deletes subgoal, the action
      needs to be before a0 or after a1. This method enumerates all constraints that
      result from protecting the causal link from all actions.
stripsPOP.py — (continued)
          Given an action act, the following method protects all the causal links in
      clinks from act. Whenever act deletes subgoal from some causal link (a0, subgoal, a1),
      the action act needs to be before a0 or after a1. This method enumerates all con-
      straints that result from protecting the causal links from act.
                                       stripsPOP.py — (continued)
The constraints are closed under transitivity, so adding a new constraint means
adding the implied pairs, but querying whether some order is consistent is quick.
                                    stripsPOP.py — (continued)
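That listing is elided. A minimal sketch of inserting a pair into a transitively closed set of (before, after) pairs (the idea behind add_constraint, not necessarily its exact code):

    def add_constraint_sketch(pair, constraints):
        """returns a new set extending constraints with pair,
        closed under transitivity"""
        if pair in constraints:
            return constraints
        new = set(constraints)
        todo = [pair]
        while todo:
            (x0, x1) = todo.pop()
            new.add((x0, x1))
            # every a<x0 implies a<x1, and every x1<b implies x0<b
            for (a, b) in list(new):
                if b == x0 and (a, x1) not in new:
                    todo.append((a, x1))
                if a == x1 and (x0, b) not in new:
                    todo.append((x0, b))
        return new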
This chapter is the first on machine learning. It covers the following topics:
   • Features: many of the features come directly from the data. Sometimes
     it is useful to construct features, e.g., height > 1.9m might be a Boolean
     feature constructed from the real-valued feature height. The next chapter
     is about neural networks and how to learn features; in this chapter features
     are constructed explicitly, in what is often known as feature engineering.
   • Learning with no input features: this is the base case of many methods.
     What should we predict if we have no input features? This provides the
     base cases for many algorithms (e.g., decision tree algorithm) and base-
     lines that more sophisticated algorithms need to beat. It also provides
     ways to test various predictors.
   • Decision tree learning: one of the classic and simplest learning algo-
     rithms, which is the basis of many other algorithms.
        • A feature is a function from examples into the range of the feature. Each
          feature f also has the following attributes:
           Thus for example, a Boolean feature is a function from the examples into
            {False, True}. So, if f is a Boolean feature, f.frange == [False, True], and if
            e is an example, f(e) is either True or False.
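For instance, with examples represented as tuples (a hypothetical toy feature):

    e = (False, 2, 'sunny')
    def f(e): return e[0]      # a feature that selects the first value
    f.frange = [False, True]   # making f a Boolean feature
    f.ftype = "boolean"
    f(e)                       # False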
learnProblem.py — (continued)
18   class Data_set(Displayable):
19       """ A dataset consists of a list of training data and a list of test
             data.
20       """
21
22      def __init__(self, train, test=None, prob_test=0.20, target_index=0,
23                      header=None, target_type= None, seed=None): #12345):
24          """A dataset for learning.
25          train is a list of tuples representing the training examples
26          test is the list of tuples representing the test examples
27          if test is None, a test set is created by selecting each
28              example with probability prob_test
29          target_index is the index of the target.
30              If negative, it counts from right.
31              If target_index is larger than the number of properties,
32              there is no target (for unsupervised learning)
33          header is a list of names for the features
34          target_type is either None for automatic detection of target type
35               or one of "numeric", "boolean", "categorical"
36          seed is for random number; None gives a different test set each time
37          """
38          if seed: # given seed makes partition consistent from run-to-run
39              random.seed(seed)
40          if test is None:
41              train,test = partition_data(train, prob_test)
42          self.train = train
43          self.test = test
44
45          self.display(1,"Training set has",len(train),"examples. Number of
                columns: ",{len(e) for e in train})
46          self.display(1,"Test set has",len(test),"examples. Number of
                columns: ",{len(e) for e in test})
47          self.prob_test = prob_test
48          self.num_properties = len(self.train[0])
49          if target_index < 0: #allows for -1, -2, etc.
50              self.target_index = self.num_properties + target_index
51          else:
52              self.target_index = target_index
53          self.header = header
54          self.domains = [set() for i in range(self.num_properties)]
55          for example in self.train:
56              for ind,val in enumerate(example):
57                    self.domains[ind].add(val)
58            self.conditions_cache = {} # cache for computed conditions
59            self.create_features()
60            if target_type:
61                self.target.ftype = target_type
62            self.display(1,"There are",len(self.input_features),"input
                  features")
63
64         def __str__(self):
65             if self.train and len(self.train)>0:
66                 return ("Data: "+str(len(self.train))+" training examples, "
67                        +str(len(self.test))+" test examples, "
68                        +str(len(self.train[0]))+" features.")
69             else:
70                 return ("Data: "+str(len(self.train))+" training examples, "
71                        +str(len(self.test))+" test examples.")
73         def create_features(self):
74             """create the set of features
75             """
76             self.target = None
77             self.input_features = []
78             for i in range(self.num_properties):
79                 def feat(e,index=i):
80                     return e[index]
81                 if self.header:
82                     feat.__doc__ = self.header[i]
83                 else:
84                     feat.__doc__ = "e["+str(i)+"]"
85                 feat.frange = list(self.domains[i])
86                 feat.ftype = self.infer_type(feat.frange)
87                 if i == self.target_index:
88                     self.target = feat
89                 else:
90                     self.input_features.append(feat)
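As a quick (hypothetical) illustration of Data_set and create_features:

    # four examples, each with one input property and a Boolean target at index -1
    data = Data_set([(0.5, True), (0.9, True), (1.2, False), (2.0, False)],
                    target_index=-1, seed=42)
    data.target((0.5, True))                        # True
    [f((0.5, True)) for f in data.input_features]   # [0.5]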
      We try to infer the type of each feature. Sometimes this can be wrong (e.g.,
when the numbers are really categorical), and the type may need to be set explicitly.
                                   learnProblem.py — (continued)
92         def infer_type(self,domain):
93             """Infers the type of a feature with domain
94             """
95             if all(v in {True,False} for v in domain):
96                 return "boolean"
97             if all(isinstance(v,(int,float)) for v in domain):
98                 return "numeric"
99             else:
100                return "categorical"
         • When the range only has two values, we designate one to be the “true”
           value.
         • When the values are all numeric, we assume they are ordered (as opposed
           to just being some classes that happen to be labelled with numbers) and
           construct Boolean features for splits of the data. That is, the feature is
            e[ind] < cut for some value cut. We choose a number of cut values, up to
            a maximum number of cuts, given by max_num_cuts.
         • When the values are not all numeric, we create an indicator function for
           each value. An indicator function for a value returns true when that value
           is given and false otherwise. Note that we can’t create an indicator func-
           tion for values that appear in the test set but not in the training set be-
           cause we haven’t seen the test set. For the examples in the test set with a
           value that doesn’t appear in the training set for that feature, the indicator
           functions all return false.
       Exercise 7.1 Change the code so that it splits using e[ind] ≤ cut instead of
       e[ind] < cut. Check boundary cases, such as 3 elements with 2 cuts. As a test
       case, make sure that when the range is the 30 integers from 100 to 129 and you
       want 2 cuts, the resulting Boolean features are e[ind] ≤ 109 and e[ind] ≤ 119,
       so that each of the resulting domains is of equal size.
      Exercise 7.2 This splits on whether the feature is less than one of the values in
      the training set. Sam suggested it might be better to split between the values in
      the training set, and suggested using
      Why might Sam have suggested this? Does this work better? (Try it on a few
      datasets).
      (Please keep the __doc__ strings a consistent length as they are used in tables.)
      The prediction is either a real value or a {value : probability} dictionary or a list.
      The actual is either a real number or a key of the prediction.
learnProblem.py — (continued)
       The following class is used for datasets where the training and test data are in
       different files.
learnProblem.py — (continued)
       The following are useful unary feature constructors and binary feature
       combiners.
learnProblem.py — (continued)
Example:
learnProblem.py — (continued)
       Exercise 7.3 For symmetric properties, such as product, we don't need both
       f1 ∗ f2 and f2 ∗ f1 as extra properties. Allow the user to declare
       feature constructors as symmetric (by associating a Boolean feature with them).
       Change construct_features so that it does not create both versions for symmetric
       combiners.
learnProblem.py — (continued)
         • a point prediction, where we are only allowed to predict one of the values
           of the feature. For example, if the values of the feature are {0, 1} we are
            only allowed to predict 0 or 1; or, if the values are ratings in {1, 2, 3, 4, 5},
           we can only predict one of these integers.
         • a point prediction, where we are allowed to predict any value. For exam-
           ple, if the values of the feature are {0, 1} we may be allowed to predict 0.3,
           1, or even 1.7. For all of the criteria we can imagine, there is no point in
            predicting a value greater than 1 or less than zero (but that doesn't mean
           we can’t), but it is often useful to predict a value between 0 and 1. If the
           values are ratings in {1, 2, 3, 4, 5}, we may want to predict 3.4.
         • a probability distribution over the values of the feature. For each value v,
           we predict a non-negative number pv , such that the sum over all predic-
           tions is 1.
      cmedian returns one of the middle values when there are an even number of exam-
     ples, whereas median gives the average of them (and so cmedian is applicable
     for ordinals that cannot be considered cardinal values). Similarly, cmode picks
     one of the values when more than one value has the maximum number of ele-
     ments.
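For example, with the (hypothetical) ordinal data [1, 2, 3, 4]:

    import statistics
    statistics.median([1, 2, 3, 4])      # 2.5, the average of the two middle values
    statistics.median_low([1, 2, 3, 4])  # 2, one of the middle values, as cmedian does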
50
51      def mean(data, domain=[0,1]):
52          "mean          "
53          # returns a real number
54          return statistics.mean(data)
55
56      def rmean(data, domain=[0,1], mean0=0, pseudo_count=1):
57          "regularized mean"
58          # returns a real number.
59          # mean0 is the mean to be used for 0 data points
60          # With mean0=0.5, pseudo_count=2, same as laplace for [0,1] data
61          # this works for enumerations as well as lists
62          sum = mean0 * pseudo_count
63          count = pseudo_count
64          for e in data:
65              sum += e
66              count += 1
67          return sum/count
68
69      def mode(data, domain=[0,1]):
70          "mode          "
71          return statistics.mode(data)
72
73      def median(data, domain=[0,1]):
74          "median        "
75          return statistics.median(data)
76
77      all = [empirical, mean, rmean, bounded_empirical, laplace, cmode, mode,
            median,cmedian]
78
79      # The following suggests appropriate predictions as a function of the
            target type
80      select = {"boolean": [empirical, bounded_empirical, laplace, cmode,
            cmedian],
81               "categorical": [empirical, bounded_empirical, laplace, cmode,
                      cmedian],
82               "numeric": [mean, rmean, mode, median]}
     7.3.1 Evaluation
     To evaluate a point prediction, we first generate some data from a simple (Bernoulli)
     distribution, where there are two possible values, 0 and 1 for the target feature.
      Given prob, a number in the range [0, 1], this generates some training and test
     data where prob is the probability of each example being 1. To generate a 1 with
     probability prob, we generate a random number in range [0,1] and return 1 if
     that number is less than prob. A prediction is computed by applying the pre-
     dictor to the training data, which is evaluated on the test set. This is repeated
     num_samples times.
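A minimal sketch of that data generation (the actual test harness in learnNoInputs.py may differ):

    import random

    def binary_data(prob, num_examples):
        """examples that are 1 with probability prob, else 0"""
        return [1 if random.random() < prob else 0 for _ in range(num_examples)]

    training_data = binary_data(0.3, 10)
    test_data = binary_data(0.3, 1000)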
      Exercise 7.4 Which predictor works best for low counts when the error is
        (a) Squared error
       (b) Absolute error
        (c) Log loss
      You may need to try this a few times to make sure your answer is supported by
       the evidence. Does the difference from the other methods get bigger or smaller as
       the number of examples grows?
      Exercise 7.5 Suggest some other predictions that only take the training data.
      Does your method do better than the given methods? A simple way to get other
       predictors is to vary the threshold of bounded average, or to change the pseudo-
       counts of the Laplace method (use other numbers instead of 1 and 2).
         The decision tree algorithm does binary splits, and assumes that all input
      features are binary functions of the examples. It stops splitting if there are
      no input features, if the number of examples is less than a specified minimum,
      or if all of the examples agree on the target feature.
                               learnDT.py — Learning a binary decision tree
11   from learnProblem import Learner, Evaluate
12   from learnNoInputs import Predict
13   import math
14
15   class DT_learner(Learner):
16       def __init__(self,
17                    dataset,
18                    split_to_optimize=Evaluate.log_loss, # to minimize for at
                          each split
19                    leaf_prediction=Predict.empirical, # what to use for value
                          at leaves
20                    train=None,                  # used for cross validation
21                    max_num_cuts=8, # maximum number of conditions to split a
                          numeric feature into
22                    gamma=1e-7 , # minimum improvement needed to expand a node
23                    min_child_weight=10):
24           self.dataset = dataset
25           self.target = dataset.target
26           self.split_to_optimize = split_to_optimize
27           self.leaf_prediction = leaf_prediction
28           self.max_num_cuts = max_num_cuts
29           self.gamma = gamma
30           self.min_child_weight = min_child_weight
31           if train is None:
32               self.train = self.dataset.train
33           else:
34               self.train = train
35
36       def learn(self, max_num_cuts=8):
37           """learn a decision tree"""
38           return self.learn_tree(self.dataset.conditions(self.max_num_cuts),
                 self.train)
          The main recursive algorithm takes in a set of input features and a set of
      training data. It first decides whether to split. If it doesn't split, it makes a point
      prediction, ignoring the input features.
          It only splits if the best split decreases the error by at least gamma. This im-
      plies it does not split when:
         If it splits, it selects the best split according to the evaluation criterion (as-
     suming that is the only split it gets to do), and returns the condition to split on
     (in the variable split) and the corresponding partition of the examples.
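A simplified sketch of that selection (the actual select_split in learnDT.py also handles leaf predictions and min_child_weight):

    def select_split_sketch(conditions, examples, error_of, gamma):
        """returns the best condition to split on, or None if no split
        decreases the error by at least gamma.
        error_of(examples) is the error of the best point prediction."""
        best_cond = None
        best_error = error_of(examples) - gamma   # a split must beat this
        for cond in conditions:
            true_examples = [e for e in examples if cond(e)]
            false_examples = [e for e in examples if not cond(e)]
            err = error_of(true_examples) + error_of(false_examples)
            if err < best_error:
                best_cond, best_error = cond, err
        return best_cond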
                                      learnDT.py — (continued)
learnDT.py — (continued)
Test cases:
learnDT.py — (continued)
          Note that different runs may provide different values as they split the train-
      ing and test sets differently. So if you have a hypothesis about what works
      better, make sure it is true for different runs.
      Exercise 7.6 The current algorithm does not have a very sophisticated stopping
      criterion. What is the current stopping criterion? (Hint: you need to look at both
      learn tree and select split.)
      Exercise 7.7 Extend the current algorithm to include in the stopping criterion
        (a) A minimum child size; don’t use a split if one of the children has fewer
             elements than this.
        (b) A depth-bound on the depth of the tree.
        (c) An improvement bound such that a split is only carried out if error with the
            split is better than the error without the split by at least the improvement
            bound.
      Which values for these parameters make the prediction errors on the test set the
      smallest? Try it on more than one dataset.
      Exercise 7.8 Without any input features, it is often better to include a pseudo-
      count that is added to the counts from the training data. Modify the code so that
      it includes a pseudo-count for the predictions. When evaluating a split, including
      pseudo counts can make the split worse than no split. Does pruning with an im-
      provement bound and pseudo-counts make the algorithm work better than with
      an improvement bound by itself?
      Exercise 7.9 Some people have suggested using information gain (which is equiv-
      alent to greedy optimization of log loss) as the measure of improvement when
       building the tree, even if they want to have non-probabilistic predictions in the
      final tree. Does this work better than myopically choosing the split that is best for
      the evaluation criteria we will use to judge the final prediction?
         The above decision tree overfits the data. One way to determine whether
     the prediction is overfitting is by cross validation. The code below implements
     k-fold cross validation, which can be used to choose the value of parameters
     to best fit the training data. If we want to use parameter tuning to improve
     predictions on a particular dataset, we can only use the training data (and not
     the test data) to tune the parameter.
         In k-fold cross validation, we partition the training set into k approximately
     equal-sized folds (each fold is an enumeration of examples). For each fold, we
     train on the other examples, and determine the error of the prediction on that
     fold. For example, if there are 10 folds, we train on 90% of the data, and then
     test on remaining 10% of the data. We do this 10 times, so that each example
     gets used as a test set once, and in the training set 9 times.
         The code below creates one copy of the data, and multiple views of the data.
      For each fold, fold enumerates the examples in the fold, and fold_complement
      enumerates the examples not in the fold.
                        learnCrossValidation.py — Cross Validation for Parameter Tuning
11   from learnProblem import Data_set, Data_from_file, Evaluate
12   from learnNoInputs import Predict
13   from learnDT import DT_learner
14   import matplotlib.pyplot as plt
15   import random
16
17   class K_fold_dataset(object):
18       def __init__(self, training_set, num_folds):
19           self.data = training_set.train.copy()
20           self.target = training_set.target
21           self.input_features = training_set.input_features
22           self.num_folds = num_folds
23           self.conditions = training_set.conditions
24
25            random.shuffle(self.data)
26            self.fold_boundaries = [(len(self.data)*i)//num_folds
27                                  for i in range(0,num_folds+1)]
28
29         def fold(self, fold_num):
30             for i in range(self.fold_boundaries[fold_num],
31                           self.fold_boundaries[fold_num+1]):
32                 yield self.data[i]
33
34         def fold_complement(self, fold_num):
35             for i in range(0,self.fold_boundaries[fold_num]):
36                 yield self.data[i]
37             for i in range(self.fold_boundaries[fold_num+1],len(self.data)):
38                 yield self.data[i]
      The validation error is the average error over the examples, where each fold is
      evaluated using a predictor learned from the other folds.
                                    learnCrossValidation.py — (continued)
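The method itself is elided; a sketch of the computation it performs, assuming learner classes that accept a train argument as DT_learner does, and error_of(prediction, actual) returning the loss for one example:

    def validation_error_sketch(kfold, learner_class, error_of, **nargs):
        """average error over all examples, where each fold is predicted
        by a learner trained on the complement of that fold"""
        total = 0
        for f in range(kfold.num_folds):
            predictor = learner_class(kfold,
                            train=list(kfold.fold_complement(f)),
                            **nargs).learn()
            total += sum(error_of(predictor(e), kfold.target(e))
                         for e in kfold.fold(f))
        return total / len(kfold.data)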
          The plot_error method plots the average error as a function of the mini-
      mum number of examples in decision-tree search, both for the validation set
      and for the test set. The error on the validation set can be used to tune the
      parameter: choose the value of the parameter that minimizes the error. The
      error on the test set cannot be used to tune the parameters; if it were used
      this way, it could not then be used to test.
learnCrossValidation.py — (continued)
(Figure 7.2: average squared loss as a function of min_child_weight, for the validation and test sets.)
79   # The following produces the graphs of Figure 7.18 of Poole and Mackworth
         [2023]
80   # data = Data_from_file('data/SPECT.csv',target_index=0, seed=123)
81   # plot_error(data, criterion=Evaluate.log_loss,
         leaf_prediction=Predict.laplace)
82
83   #also try:
84   # plot_error(data)
85   # data = Data_from_file('data/carbool.csv', target_index=-1, seed=123)
      Figure 7.2 shows the average squared loss in the validation and test sets as
      a function of the min_child_weight in the decision-tree learning algorithm
      (SPECT data with seed 12345 followed by plot_error(data)). Different seeds
      will produce different graphs. The assumption behind cross validation is that
      the parameter that minimizes the loss on the validation set will be a good pa-
      rameter for the test set.
          Note that different runs for the same data will have the same test error, but
      different validation error. If you rerun Data_from_file with a different
      seed, you will get new test and training sets, and so the graph will change.
      Exercise 7.10 Change the error plot so that it can evaluate the stopping criteria
      of the exercise of Section 7.6. Which criterion makes the most difference?
41
42      def predictor(self,e):
43          """returns the prediction of the learner on example e"""
44          linpred = sum(w*f(e) for f,w in self.weights.items())
45          if self.squashed:
46              return sigmoid(linpred)
47          else:
48              return linpred
49
50      def predictor_string(self, sig_dig=3):
      learn is the main algorithm of the learner. It does num_iter steps of stochastic
      gradient descent. Only the number of iterations is specified; the other parame-
      ters it gets from the class.
learnLinear.py — (continued)
60         def learn(self,num_iter=100):
61             batch_size = min(self.batch_size, len(self.train))
62             d = {feat:0 for feat in self.weights}
63             for it in range(num_iter):
64                 self.display(2,"prediction=",self.predictor_string())
65                 for e in random.sample(self.train, batch_size):
66                     error = self.predictor(e) - self.target(e)
67                     update = self.learning_rate*error
68                     for feat in self.weights:
69                         d[feat] += update*feat(e)
70                 for feat in self.weights:
71                     self.weights[feat] -= d[feat]
72                     d[feat]=0
73             return self.predictor
     one is a function that always returns 1. This is used for one of the input prop-
     erties.
learnLinear.py — (continued)
75   def one(e):
76       "1"
77       return 1
The sigmoid function is defined by sigmoid(x) = 1/(1 + exp(-x)); logit is its inverse.
learnLinear.py — (continued)
79   def sigmoid(x):
80       return 1/(1+math.exp(-x))
81
82   def logit(x):
83       return -math.log(1/x-1)
softmax computes, for each index i, vi = exp(xi)/∑j exp(xj), a probability distribution over the indices.
 85   def softmax(xs,domain=None):
 86       """xs is a list of values, and
 87       domain is the domain (a list) or None if the list should be returned
 88       returns a distribution over the domain (a dict)
 89       """
 90       m = max(xs) # use of m prevents overflow (and all values underflowing)
 91       exps = [math.exp(x-m) for x in xs]
 92       s = sum(exps)
 93       if domain:
 94           return {d:v/s for (d,v) in zip(domain,exps)}
 95       else:
 96           return [v/s for v in exps]
 97
 98   def indicator(v, domain):
 99       return [1 if v==dv else 0 for dv in domain]
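For example:

    softmax([1, 2, 3])                   # [0.0900..., 0.2447..., 0.6652...]
    softmax([1, 2, 3], ['a', 'b', 'c'])  # {'a': 0.0900..., 'b': 0.2447..., 'c': 0.6652...}
    indicator('b', ['a', 'b', 'c'])      # [0, 1, 0]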
          The following tests the learner on a dataset. Uncomment the other datasets
      for different examples.
                                           learnLinear.py — (continued)
          The following plots the errors on the training and test sets as a function of
      the number of steps of gradient descent.
                                           learnLinear.py — (continued)
119                  step=1,
120                  num_steps=1000,
121                  log_scale=True,
122                  legend_label=""):
123         """
124         plots the training and test error for a learner.
125         data is the dataset (if None, a default dataset is loaded)
126         learner is the learner (if None, a Linear_learner on data is used)
127         criterion gives the evaluation criterion plotted on the y-axis
128         step specifies how many steps are run for each point on the plot
129         num_steps is the number of points to plot
130
131         """
132         if legend_label != "": legend_label+=" "
133         plt.ion()
134         plt.xlabel("step")
135         plt.ylabel("Average "+criterion.__doc__)
136         if log_scale:
137             plt.xscale('log') #plt.semilogx() #Makes a log scale
138         else:
139             plt.xscale('linear')
140         if data is None:
141             data = Data_from_file('data/holiday.csv', has_header=True,
                    num_train=19, target_index=-1)
142             #data = Data_from_file('data/SPECT.csv', target_index=0)
143             # data = Data_from_file('data/mail_reading.csv', target_index=-1)
144             # data = Data_from_file('data/carbool.csv', target_index=-1)
145         #random.seed(None) # reset seed
146         if learner is None:
147             learner = Linear_learner(data)
148         train_errors = []
149         test_errors = []
150         for i in range(1,num_steps+1,step):
151             test_errors.append(data.evaluate_dataset(data.test,
                    learner.predictor, criterion))
152             train_errors.append(data.evaluate_dataset(data.train,
                    learner.predictor, criterion))
153             learner.display(2, "Train error:",train_errors[-1],
154                             "Test error:",test_errors[-1])
155             learner.learn(num_iter=step)
156         plt.plot(range(1,num_steps+1,step),train_errors,ls='-',label=legend_label+"training")
157         plt.plot(range(1,num_steps+1,step),test_errors,ls='--',label=legend_label+"test")
158         plt.legend()
159         plt.draw()
160         learner.display(1, "Train error:",train_errors[-1],
161                             "Test error:",test_errors[-1])
162
163   if __name__ == "__main__":
164       test()
165
(Figure: training and test average log loss (bits) as a function of step, on a log scale.)
174         like the built-in range(start,stop,step) but allows for integers and
                floats.
175         Note that rounding errors are expected with real numbers. (or use
                numpy.arange)
176         """
177         while start<stop:
178             yield start
179             start += step
180
181   def plot_prediction(data,
182                  learner = None,
183                  minx = 0,
184                  maxx = 5,
185                  step_size = 0.01, # for plotting
186                  label = "function"):
187       plt.ion()
188       plt.xlabel("x")
189       plt.ylabel("y")
190       if learner is None:
191           learner = Linear_learner(data, squashed=False)
192       learner.learning_rate=0.001
193       learner.learn(100)
194       learner.learning_rate=0.0001
195       learner.learn(1000)
196       learner.learning_rate=0.00001
197       learner.learn(10000)
198       learner.display(1,"function learned is", learner.predictor_string(),
199                 "error=",data.evaluate_dataset(data.train, learner.predictor,
                        Evaluate.squared_loss))
200       plt.plot([e[0] for e in data.train],[e[-1] for e in
              data.train],"bo",label="data")
201       plt.plot(list(arange(minx,maxx,step_size)),[learner.predictor([x])
202                                          for x in
                                                 arange(minx,maxx,step_size)],
203                                        label=label)
204       plt.legend()
205       plt.draw()
learnLinear.py — (continued)
218      plt.xlabel("x")
219      plt.ylabel("y")
220      plt.plot([e[0] for e in data.train],[e[-1] for e in
             data.train],"ko",label="data")
221      x_values = list(arange(minx,maxx,step_size))
222      line_styles = ['-','--','-.',':']
223      colors = ['0.5','k','k','k','k']
224      for degree in range(max_degree):
225          data_aug = Data_set_augmented(data,[power_feat(n) for n in
                 range(1,degree+1)],
226                                         include_orig=False)
227          learner = learner_class(data_aug,squashed=False)
228          learner.learning_rate = learning_rate
229          learner.learn(num_iter)
230          learner.display(1,"For degree",degree,
231                      "function learned is", learner.predictor_string(),
232                      "error=",data.evaluate_dataset(data.train,
                             learner.predictor, Evaluate.squared_loss))
233          ls = line_styles[degree % len(line_styles)]
234          col = colors[degree % len(colors)]
235          plt.plot(x_values,[learner.predictor([x]) for x in x_values],
                 linestyle=ls, color=col,
236                          label="degree="+str(degree))
237          plt.legend(loc='upper left')
238          plt.draw()
239
240   # Try:
241   # data0 = Data_from_file('data/simp_regr.csv', prob_test=0,
          boolean_features=False, target_index=-1)
242   # plot_prediction(data0)
243   # plot_polynomials(data0)
244   # What if the step size was bigger?
245   #datam = Data_from_file('data/mail_reading.csv', target_index=-1)
246   #plot_prediction(datam)
      7.7     Boosting
      The following code implements functional gradient boosting for regression.
           A Boosted_dataset is created from a base dataset by subtracting the pre-
       diction of the offset function from each example. This does not save the new
       dataset, but generates it as needed. The amount of space used is constant, in-
       dependent of the size of the dataset.
                             learnBoosting.py — Functional Gradient Boosting
 11   from learnProblem import Data_set, Learner, Evaluate
 12   from learnNoInputs import Predict
 13   from learnLinear import sigmoid
 14   import statistics
 15   import random
16
17   class Boosted_dataset(Data_set):
18       def __init__(self, base_dataset, offset_fun, subsample=1.0):
19           """new dataset which is like base_dataset,
20              but offset_fun(e) is subtracted from the target of each example e
21           """
22           self.base_dataset = base_dataset
23           self.offset_fun = offset_fun
24           self.train =
                 random.sample(base_dataset.train,int(subsample*len(base_dataset.train)))
25           self.test = base_dataset.test
26           #Data_set.__init__(self, base_dataset.train, base_dataset.test,
27           #                base_dataset.prob_test, base_dataset.target_index)
28
29            #def create_features(self):
30            """creates new features - called at end of Data_set.init()
31            defines a new target
32            """
33            self.input_features = self.base_dataset.input_features
34            def newout(e):
35                return self.base_dataset.target(e) - self.offset_fun(e)
36            newout.frange = self.base_dataset.target.frange
37            newout.ftype = self.infer_type(newout.frange)
38            self.target = newout
39
40         def conditions(self, *args, colsample_bytree=0.5, **nargs):
41             conds = self.base_dataset.conditions(*args, **nargs)
42             return random.sample(conds, int(colsample_bytree*len(conds)))
         A boosting learner takes in a dataset and a base learner, and returns a new
      predictor. The base learner takes a dataset and returns a Learner object.
learnBoosting.py — (continued)
44   class Boosting_learner(Learner):
45       def __init__(self, dataset, base_learner_class, subsample=0.8):
46           self.dataset = dataset
47           self.base_learner_class = base_learner_class
48           self.subsample = subsample
49           mean = sum(self.dataset.target(e)
50                     for e in self.dataset.train)/len(self.dataset.train)
51           self.predictor = lambda e:mean # function that returns mean for
                 each example
52           self.predictor.__doc__ = "lambda e:"+str(mean)
53           self.offsets = [self.predictor] # list of base learners
54           self.predictors = [self.predictor] # list of predictors
55           self.errors = [self.dataset.evaluate_dataset(self.dataset.test,
                self.predictor, Evaluate.squared_loss)]
56           self.display(1,"Predict mean test set mean squared loss=",
                 self.errors[0] )
57
58
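The learn method (lines 59 to 75) is elided above. A hedged sketch of the boosting loop it implements: each round fits a base learner to the residuals of the current predictor and adds that learner's prediction to the ensemble.

    def boosting_learn_sketch(self, num_rounds=10):
        for _ in range(num_rounds):
            # dataset whose target is the residual of the current predictor
            residual_data = Boosted_dataset(self.dataset, self.predictor,
                                            subsample=self.subsample)
            new_offset = self.base_learner_class(residual_data).learn()
            self.offsets.append(new_offset)
            def new_pred(e, old=self.predictor, off=new_offset):
                return old(e) + off(e)
            self.predictor = new_pred
            self.predictors.append(self.predictor)
        return self.predictor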
76   # Testing
77
78   from learnDT import DT_learner
79   from learnProblem import Data_set, Data_from_file
80
81   def sp_DT_learner(split_to_optimize=Evaluate.squared_loss,
82                              leaf_prediction=Predict.mean,**nargs):
83       """Creates a learner with different default arguments replaced by
             **nargs
84       """
85       def new_learner(dataset):
86           return DT_learner(dataset,split_to_optimize=split_to_optimize,
87                                 leaf_prediction=leaf_prediction, **nargs)
88       return new_learner
89
90   #data = Data_from_file('data/car.csv', target_index=-1) regression
91   data = Data_from_file('data/student/student-mat-nq.csv',
         separator=';',has_header=True,target_index=-1,seed=13,include_only=list(range(30))+[32])
         #2.0537973790924946
92   #data = Data_from_file('data/SPECT.csv', target_index=0, seed=62) #123)
93   #data = Data_from_file('data/mail_reading.csv', target_index=-1)
94   #data = Data_from_file('data/holiday.csv', has_header=True, num_train=19,
         target_index=-1)
95   #learner10 = Boosting_learner(data,
         sp_DT_learner(split_to_optimize=Evaluate.squared_loss,
          leaf_prediction=Predict.mean, min_child_weight=10))
 96   #learner7 = Boosting_learner(data, sp_DT_learner(0.7))
 97   #learner5 = Boosting_learner(data, sp_DT_learner(0.5))
 98   #predictor9 =learner9.learn(10)
 99   #for i in learner9.offsets: print(i.__doc__)
100   import matplotlib.pyplot as plt
101
102   def plot_boosting_trees(data, steps=10, mcws=[30,20,20,10], gammas=
          [100,200,300,500]):
103       # to reduce clutter uncomment one of following two lines
104       #mcws=[10]
105       #gammas=[200]
106       learners = [(mcw, gamma, Boosting_learner(data,
              sp_DT_learner(min_child_weight=mcw, gamma=gamma)))
107                      for gamma in gammas for mcw in mcws
108                      ]
109       plt.ion()
110       plt.xscale('linear') # change between log and linear scale
111       plt.xlabel("number of trees")
112       plt.ylabel("mean squared loss")
113       markers = (m+c for c in ['k','g','r','b','m','c','y'] for m in
              ['-','--','-.',':'])
114       for (mcw,gamma,learner) in learners:
115           data.display(1,f"min_child_weight={mcw}, gamma={gamma}")
116           learner.learn(steps)
117           plt.plot(range(steps+1), learner.errors, next(markers),
118                       label=f"min_child_weight={mcw}, gamma={gamma}")
119       plt.legend()
120       plt.draw()
121
122   # plot_boosting_trees(data)
Testing
learnBoosting.py — (continued)
165 # gtb_learner.learn()
     8.1      Layers
     A neural network is built from layers.
         This provides a modular implementation of layers. Layers can easily be
     stacked in many configurations. A layer needs to implement a function to com-
     pute the output values from the inputs, a way to back-propagate the error, and
     perhaps update its parameters.
                               learnNN.py — Neural Network Learning
11   from learnProblem import Learner, Data_set, Data_from_file,
         Data_from_files, Evaluate
12   from learnLinear import sigmoid, one, softmax, indicator
13   import random, math, time
14
15   class Layer(object):
16       def __init__(self, nn, num_outputs=None):
17           """Given a list of inputs, outputs will produce a list of length
                 num_outputs.
18           nn is the neural network this layer is part of
52   class Linear_complete_layer(Layer):
53       """a completely connected layer"""
54       def __init__(self, nn, num_outputs, limit=None):
55           """A completely connected linear layer.
learnNN.py — (continued)
185          report_each means give the errors after each multiple of that
                 many iterations
186          """
187          self.batch_size = min(batch_size, len(self.training_set)) # don't
                 have batches bigger than training size
188          if num_iter is None:
189               num_iter = (epochs * len(self.training_set)) // self.batch_size
190          #self.display(0,"Batch\t","\t".join(criterion.__doc__ for criterion
                 in Evaluate.all_criteria))
191          for i in range(num_iter):
192              batch = random.sample(self.training_set, self.batch_size)
193              for e in batch:
194                  # compute all outputs
195                  values = [f(e) for f in self.input_features]
196                  for layer in self.layers:
197                      values = layer.output_values(values, training=True)
198                  # backpropagate
199                  predicted = [sigmoid(v) for v in values] if self.output_type
                         == "boolean"\
200                              else softmax(values) if self.output_type ==
                                     "categorical"\
201                              else values
202                  actuals = indicator(self.dataset.target(e),
                         self.dataset.target.frange) \
203                             if self.output_type == "categorical"\
204                             else [self.dataset.target(e)]
205                  errors = [pred-obsd for (obsd,pred) in
                         zip(actuals,predicted)]
206                  for layer in reversed(self.layers):
207                      errors = layer.backprop(errors)
208              # Update all parameters in batch
209              for layer in self.layers:
210                  layer.update()
211              self.bn+=1
212              if (i+1)%report_each==0:
213                  self.display(0,self.bn,"\t",
214                             "\t\t".join("{:.4f}".format(
215                                 self.dataset.evaluate_dataset(self.validation_set,
                                        self.predictor, criterion))
216                                for criterion in Evaluate.all_criteria),
                                       sep="")
learnNN.py — (continued)
8.3.2 RMS-Prop
learnNN.py — (continued)
      8.4        Dropout
      Dropout is implemented as a layer.
learnNN.py — (continued)
      8.4.1 Examples
       The following constructs a neural network with one hidden layer. The output
       is assumed to be Boolean or real. If it is categorical, the final layer should
       have the same number of outputs as the number of categories (so it can use a
       softmax).
learnNN.py — (continued)
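The construction of nn3, referenced below, is elided; a sketch consistent with the variants that follow (assuming data is a Data_set such as SPECT):

    # nn3 has a single hidden layer of width 3
    nn3 = NN(data, validation_proportion = 0)
    nn3.add_layer(Linear_complete_layer(nn3,3))
    nn3.add_layer(ReLU_layer(nn3))
    nn3.add_layer(Linear_complete_layer(nn3,1))
    #nn3.learn(epochs = 100)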
339   # nn3do is like nn3 but with dropout on the hidden layer
340   nn3do = NN(data, validation_proportion = 0)
341   nn3do.add_layer(Linear_complete_layer(nn3do,3))
342   #nn3.add_layer(Sigmoid_layer(nn3)) # comment this or the next
343   nn3do.add_layer(ReLU_layer(nn3do))
344   nn3do.add_layer(Dropout_layer(nn3do, rate=0.5))
345   nn3do.add_layer(Linear_complete_layer(nn3do,1))
346   #nn3do.learn(epochs = 100)
347
348   # nn3_rmsp is like nn3 but uses RMS prop
349   nn3_rmsp = NN(data, validation_proportion = 0)
350   nn3_rmsp.add_layer(Linear_complete_layer_RMS_Prop(nn3_rmsp,3))
351   #nn3_rmsp.add_layer(Sigmoid_layer(nn3_rmsp)) # comment this or the next
352   nn3_rmsp.add_layer(ReLU_layer(nn3_rmsp))
353   nn3_rmsp.add_layer(Linear_complete_layer_RMS_Prop(nn3_rmsp,1))
354   #nn3_rmsp.learn(epochs = 100)
355
356   # nn3_m is like nn3 but uses momentum
357   mm1_m = NN(data, validation_proportion = 0)
358   mm1_m.add_layer(Linear_complete_layer_momentum(mm1_m,3))
359   #mm1_m.add_layer(Sigmoid_layer(mm1_m)) # comment this or the next
360   mm1_m.add_layer(ReLU_layer(mm1_m))
361   mm1_m.add_layer(Linear_complete_layer_momentum(mm1_m,1))
362   #mm1_m.learn(epochs = 100)
363
364   # nn2 has a single hidden layer of width 2
365   nn2 = NN(data, validation_proportion = 0)
366   nn2.add_layer(Linear_complete_layer_RMS_Prop(nn2,2))
367   nn2.add_layer(ReLU_layer(nn2))
368   nn2.add_layer(Linear_complete_layer_RMS_Prop(nn2,1))
369
370   # nn5 has a single hidden layer of width 5
371   nn5 = NN(data, validation_proportion = 0)
372   nn5.add_layer(Linear_complete_layer_RMS_Prop(nn5,5))
373   nn5.add_layer(ReLU_layer(nn5))
374   nn5.add_layer(Linear_complete_layer_RMS_Prop(nn5,1))
375
376   # nn0 has no hidden layers, and so is just logistic regression:
377   nn0 = NN(data, validation_proportion = 0) #learning_rate=0.05)
378   nn0.add_layer(Linear_complete_layer(nn0,1))
379   # Or try this for RMS-Prop:
380   #nn0.add_layer(Linear_complete_layer_RMS_Prop(nn0,1))
       Plotting. Figure 8.1 shows the training and test performance on the SPECT
       dataset for the architectures above. Note that the nn5 test has infinite log loss
       after about 45,000 steps. The noisiness of the predictions might indicate that
       the step size is too big. This was produced by the code below:
                                     learnNN.py — (continued)
(Figure 8.1: Training and test log loss for various algorithms on the SPECT dataset; y-axis: average log loss in bits.)
384
385   # To show plots first choose a criterion to use
386   # crit = Evaluate.log_loss
387   # crit = Evaluate.accuracy
388   # plot_steps(learner = nn0, data = data, criterion=crit, num_steps=10000,
        log_scale=False, legend_label="nn0")
389   # plot_steps(learner = nn2, data = data, criterion=crit, num_steps=10000,
        log_scale=False, legend_label="nn2")
390   # plot_steps(learner = nn3, data = data, criterion=crit, num_steps=10000,
        log_scale=False, legend_label="nn3")
391   # plot_steps(learner = nn5, data = data, criterion=crit, num_steps=10000,
        log_scale=False, legend_label="nn5")
392
393   # for (nn,nname) in [(nn0,"nn0"),(nn2,"nn2"),(nn3,"nn3"),(nn5,"nn5")]:
          plot_steps(learner = nn, data = data, criterion=crit,
          num_steps=100000, log_scale=False, legend_label=nname)
394
395   # Print some training examples
396   #for eg in random.sample(data.train,10): print(eg,nn3.predictor(eg))
397
398   # Print some test examples
399   #for eg in random.sample(data.test,10): print(eg,nn3.predictor(eg))
400
401   # To see the weights learned in linear layers
402   # nn3.layers[0].weights
403   # nn3.layers[2].weights
404
405   # Print test:
406   # for e in data.train: print(e,nn0.predictor(e))
407
408   def test(data, hidden_widths = [5], epochs=100,
409                optimizers = [Linear_complete_layer,
410                          Linear_complete_layer_momentum,
                                 Linear_complete_layer_RMS_Prop]):
411       data.display(0,"Batch\t","\t".join(criterion.__doc__ for criterion in
              Evaluate.all_criteria))
412       for optimizer in optimizers:
413           nn = NN(data)
414           for width in hidden_widths:
415               nn.add_layer(optimizer(nn,width))
416               nn.add_layer(ReLU_layer(nn))
417           if data.target.ftype == "boolean":
418               nn.add_layer(optimizer(nn,1))
419           else:
420               raise NotImplementedError(f"Not implemented: {data.target.ftype}")
421           nn.learn(epochs)
      The following tests on MNIST. The original files are from http://yann.lecun.
      com/exdb/mnist/. This code assumes you use the csv files from https://pjreddie.
      com/projects/mnist-in-csv/, and put them in the directory ../MNIST/. Note
       that this is very inefficient; you would be better off using Keras or PyTorch. There
      are 28 ∗ 28 = 784 input units and 512 hidden units, which makes 401,408 pa-
      rameters for the lowest linear layer. So don’t be surprised when it takes many
      hours in AIPython (even if it only takes a few seconds in Keras).
learnNN.py — (continued)
          - start_time,"seconds") #1 epoch
432   # determine test error:
433   # data_mnist.evaluate_dataset(data_mnist.test, nn_mnist.predictor,
          Evaluate.accuracy)
434   # Print some random predictions:
435   # for eg in random.sample(data_mnist.test,10):
          print(data_mnist.target(eg),nn_mnist.predictor(eg),nn_mnist.predictor(eg)[data_mnist.target(eg)])
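      Since the text above recommends Keras or PyTorch for this scale of problem,
      here is a rough, illustrative Keras sketch of the same architecture (784 inputs,
      512 ReLU hidden units). It is not part of AIPython; the file path follows the
      description above, and the optimizer and epoch count are our assumptions.

import numpy as np
import tensorflow as tf

# load the csv version of MNIST (path as described above)
train = np.loadtxt("../MNIST/mnist_train.csv", delimiter=",")
x_train, y_train = train[:, 1:]/255.0, train[:, 0]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1)   # one epoch, for comparison with the timing above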
      Exercise 8.1 In the definition of nn3 above, for each of the following, first hy-
      pothesize what will happen, then test your hypothesis, then explain whether your
      testing confirms your hypothesis or not. Test it for more than one data set, and
      use more than one run for each data set.
        (a) Which fits the data better, having a sigmoid layer or a ReLU layer after the
            first linear layer?
        (b) Which is faster, having a sigmoid layer or a ReLU layer after the first linear
            layer?
        (c) What happens if you have both the sigmoid layer and then a ReLU layer
            after the first linear layer and before the second linear layer?
       (d) What happens if you have a ReLU layer then a sigmoid layer after the first
           linear layer and before the second linear layer?
        (e) What happens if you have neither the sigmoid layer nor a ReLU layer after
            the first linear layer?
      Exercise 8.2 Do some
19            if name:
20                self.name = name
21            else:
22                self.name = f"f{Factor.nextid}"
23                Factor.nextid += 1
24
25         def can_evaluate(self,assignment):
26             """True when the factor can be evaluated in the assignment
27             assignment is a {variable:value} dict
28             """
29             return all(v in assignment for v in self.variables)
30
31         def get_value(self,assignment):
32             """Returns the value of the factor given the assignment of values
                   to variables.
33             Needs to be defined for each subclass.
34             """
35             assert self.can_evaluate(assignment)
36             raise NotImplementedError("get_value") # abstract method
      The method __str__ returns a brief definition (like “f7(X,Y,Z)”). The method
      to_table returns a string representation of a table showing all of the assign-
      ments of values to variables, and the corresponding value.
                                   probFactors.py — (continued)
38         def __str__(self):
39             """returns a string representing a summary of the factor"""
40             return f"{self.name}({','.join(str(var) for var in
                   self.variables)})"
41
42         def to_table(self, variables=None, given={}):
43             """returns a string representation of the factor.
44             Allows for an arbitrary variable ordering.
45             variables is a list of the variables in the factor
46             (can contain other variables)"""
47             if variables==None:
48                 variables = [v for v in self.variables if v not in given]
49             else: #enforce ordering and allow for extra variables in ordering
50                 variables = [v for v in variables if v in self.variables and v
                       not in given]
51             head = "\t".join(str(v) for v in variables)+"\t"+self.name
52             return head+"\n"+self.ass_to_str(variables, given, variables)
53
54         def ass_to_str(self, vars, asst, allvars):
55             #print(f"ass_to_str({vars}, {asst}, {allvars})")
56             if vars:
57                 return "\n".join(self.ass_to_str(vars[1:], {**asst,
                       vars[0]:val}, allvars)
58                               for val in vars[0].domain)
59             else:
60                 val = self.get_value(asst)
67   class CPD(Factor):
68       def __init__(self, child, parents):
69           """represents P(variable | parents)
70           """
71           self.parents = parents
72           self.child = child
 73           Factor.__init__(self, parents+[child], name="Probability")
74
75      def __str__(self):
 76          """A brief description of a factor used in tracing"""
77          if self.parents:
78              return f"P({self.child}|{','.join(str(p) for p in
                    self.parents)})"
79          else:
80              return f"P({self.child})"
81
82      __repr__ = __str__
     A constant CPD has no parents, and has probability 1 when the variable has
     the value specified, and 0 when the variable has a different value.
                                    probFactors.py — (continued)
84   class ConstantCPD(CPD):
85       def __init__(self, variable, value):
86           CPD.__init__(self, variable, [])
87           self.value = value
88       def get_value(self, assignment):
89           return 1 if self.value==assignment[self.child] else 0
       9.3.2 Noisy-or
       A noisy-or, for Boolean variable X with Boolean parents Y1 . . . Yk, is parametrized
       by k + 1 parameters p0, p1, . . . , pk, where each 0 ≤ pi ≤ 1. The semantics is de-
       fined as though there are k + 1 hidden variables Z0, Z1 . . . Zk, where P(Z0) = p0
       and P(Zi | Yi) = pi for i ≥ 1, and where X is true if and only if Z0 ∨ Z1 ∨ · · · ∨ Zk
       (where ∨ is “or”). Thus X is false if all of the Zi are false. Intuitively, p0 is the
       probability of X when all Yi are false, and each pi is a noisy (probabilistic) mea-
       sure that Yi makes X true; X only needs one Zi to be true to make it true.
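       The noisy-or listing itself is elided from this excerpt. The following is a
       minimal sketch of how such a CPD could be written using the Factor and CPD
       classes above; the class name NoisyOrSketch and the parameter list probs are
       ours, not AIPython's.

class NoisyOrSketch(CPD):
    def __init__(self, child, parents, probs):
        """probs is [p0, p1, ..., pk], one parameter per hidden variable Zi"""
        assert len(probs) == 1 + len(parents)
        CPD.__init__(self, child, parents)
        self.probs = probs

    def get_value(self, assignment):
        assert self.can_evaluate(assignment)
        # X is false only if every Zi is false; Z0 is false with probability
        # 1-p0, and Zi (i>=1) is false with probability 1-pi when Yi is true,
        # and is always false when Yi is false.
        p_false = 1 - self.probs[0]
        for i, parent in enumerate(self.parents):
            if assignment[parent]:
                p_false *= 1 - self.probs[i+1]
        return 1 - p_false if assignment[self.child] else p_false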
                                     probFactors.py — (continued)
       Note that not all parents need to be assigned to evaluate the decision tree; you
       only need the branch down the tree that gives the distribution.
                                    probFactors.py — (continued)
184
185      def can_evaluate(self, assignment):
186          if self.var not in assignment:
187              return False
188          elif assignment[self.var] == self.val:
189              return self.true_cond.can_evaluate(assignment)
190          else:
191              return self.false_cond.can_evaluate(assignment)
probFactors.py — (continued)
       The following shows a decision tree representation of Example 9.18 of Poole
       and Mackworth [2023]. When the Action is to go out, the probability is a
       function of rain; otherwise it is a function of full.
probFactors.py — (continued)
probGraphicalModels.py — (continued)
27   class BeliefNetwork(GraphicalModel):
28       """The class of belief networks."""
29
30         def __init__(self, title, variables, factors):
31             """vars is a set of variables
32             factors is a set of factors. All of the factors are instances of
                   CPD (e.g., Prob).
33             """
34             GraphicalModel.__init__(self, title, variables, factors)
35             assert all(isinstance(f,CPD) for f in factors), factors
36             self.var2cpt = {f.child:f for f in factors}
37             self.var2parents = {f.child:f.parents for f in factors}
38             self.children = {n:[] for n in self.variables}
39             for v in self.var2parents:
40                 for par in self.var2parents[v]:
41                     self.children[par].append(v)
42             self.topological_sort_saved = None
        The following creates a topological sort of the nodes, where the parents of
     a node come before the node in the resulting order. This is based on Kahn’s
     algorithm from 1962.
probGraphicalModels.py — (continued)
44      def topological_sort(self):
45          """creates a topological ordering of variables such that the
                parents of
46          a node are before the node.
47          """
48          if self.topological_sort_saved:
49              return self.topological_sort_saved
50          next_vars = {n for n in self.var2parents if not self.var2parents[n]
                }
51          self.display(3,'topological_sort: next_vars',next_vars)
52          top_order=[]
53          while next_vars:
54              var = next_vars.pop()
55              self.display(3,'select variable',var)
56              top_order.append(var)
57              next_vars |= {ch for ch in self.children[var]
58                              if all(p in top_order for p in
                                    self.var2parents[ch])}
59              self.display(3,'var_with_no_parents_left',next_vars)
60          self.display(3,"top_order",top_order)
61          assert
                set(top_order)==set(self.var2parents),(top_order,self.var2parents)
 62          self.topological_sort_saved = top_order
63          return top_order
probGraphicalModels.py — (continued)
[Figure: the 4-chain belief network A → B → C → D]

Figure 9.2: The report-of-leaving belief network (Tamper and Fire are parents
of Alarm; Fire is a parent of Smoke; Alarm is a parent of Leaving; Leaving is a
parent of Report)
     Report-of-Leaving Example
     The second belief network, bn_report, is Example 9.13 of Poole and Mack-
     worth [2023] (http://artint.info). The output of bn_report.show() is shown
     in Figure 9.2 of this document.
                                 probExamples.py — (continued)
[Figure: the simple diagnosis belief network]
66   p_cough = Prob(Coughing,[Bronchitis],[[0.93,0.07],[0.2,0.8]])
67   p_wheeze = Prob(Wheezing,[Bronchitis],[[0.999,0.001],[0.4,0.6]])
68
69   simple_diagnosis = BeliefNetwork("Simple Diagnosis",
70                     {Influenza, Smokes, SoreThroat, HasFever, Bronchitis,
                           Coughing, Wheezing},
71                     {p_infl, p_smokes, p_sth, p_fever, p_bronc, p_cough,
                           p_wheeze})
     Sprinkler Example
     The third belief network is the sprinkler example from Pearl. The output of
     bn_sprinkler.show() is shown in Figure 9.4 of this document.
                                 probExamples.py — (continued)
81   f_sprinkler = Prob(Sprinkler,[Season],{'dry_season':{'on':0.4,'off':0.6},
82                                      'wet_season':{'on':0.01,'off':0.99}})
83   f_rained = Prob(Rained,[Season],{'dry_season':[0.9,0.1], 'wet_season':
         [0.2,0.8]})
84   f_wet = Prob(Grass_wet,[Sprinkler,Rained], {'on': [[0.1,0.9],[0.01,0.99]],
85                                           'off':[[0.99,0.01],[0.3,0.7]]})
86   f_shiny = Prob(Grass_shiny, [Grass_wet], [[0.95,0.05], [0.3,0.7]])
87   f_shoes = Prob(Shoes_wet, [Grass_wet], [[0.98,0.02], [0.35,0.65]])
88
89   bn_sprinkler = BeliefNetwork("Pearl's Sprinkler Example",
90                         {Season, Sprinkler, Rained, Grass_wet, Grass_shiny,
                               Shoes_wet},
91                         {f_season, f_sprinkler, f_rained, f_wet, f_shiny,
                               f_shoes})
                                  probExamples.py — (continued)
121
122   p_cold_lr = Prob(Cold,[],[0.9,0.1])
123   p_flu_lr = Prob(Flu,[],[0.95,0.05])
124   p_covid_lr = Prob(Covid,[],[0.99,0.01])
125
126   p_cough_lr = LogisticRegression(Cough, [Cold,Flu,Covid], [-2.2, 1.67,
          1.26, 3.19])
probGraphicalModels.py — (continued)
          We use bn_4ch as the test case, in particular P(B | D = true). Testing needs an
       error threshold, particularly for the approximate methods, where the default
       threshold demands more accuracy than they can deliver.
[Figure: posterior distributions in the report-of-leaving network given
Report=True — Tamper: False 0.601, True 0.399; Fire: False 0.769, True 0.231;
Alarm: False 0.372, True 0.628; Smoke: False 0.785, True 0.215; Leaving:
False 0.347, True 0.653]
probGraphicalModels.py — (continued)
24          ## self.max_display_level = 3
25
26      def query(self, qvar, obs={}, split_order=None):
27          """computes P(qvar | obs) where
28          qvar is the query variable
29          obs is a variable:value dictionary
30          split_order is a list of the non-observed non-query variables in gm
31          """
32          if qvar in obs:
33              return {val:(1 if val == obs[qvar] else 0)
34                         for val in qvar.domain}
35          else:
36             if split_order == None:
37                  split_order = [v for v in self.gm.variables
38                                 if (v not in obs) and v != qvar]
39             unnorm = [self.prob_search({qvar:val}|obs, self.gm.factors,
                   split_order)
40                          for val in qvar.domain]
41             p_obs = sum(unnorm)
42             return {val:pr/p_obs for val,pr in zip(qvar.domain, unnorm)}
         The following is the naive search-based algorithm. It is exponential in the
     number of variables, so is not very useful. However, it is simple, and useful
     to understand before looking at the more complicated algorithm used in the
     subclass.
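      That naive listing is elided from this excerpt; its essence is the following
      sketch (the function name is ours): with no variables left to split on, all factors
      can be evaluated and multiplied; otherwise it branches on the first variable in
      split_order and sums over its domain.

import math

def naive_prob_search(context, factors, split_order):
    """returns the sum over split_order of the product of factors,
    given the assignment in context (a sketch of the elided method)"""
    if not split_order:
        return math.prod(fac.get_value(context) for fac in factors)
    var = split_order[0]
    return sum(naive_prob_search({var: val} | context, factors, split_order[1:])
               for val in var.domain)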
                                   probRC.py — (continued)
 68   class ProbRC(ProbSearch):
 69       method_name = "recursive conditioning"
 70
 71         def __init__(self,gm=None):
 72             self.cache = {(frozenset(), frozenset()):1}
 73             ProbSearch.__init__(self,gm)
 74
 75         def prob_search(self, context, factors, split_order):
 76             """ returns \sum_{split_order} \prod_{factors} given assignment in
                    context
 77             context is a variable:value dictionary
 78             factors is a set of factors
 79             split_order: list of variables in factors that are not in context
 80             """
 81             self.display(3,"calling rc,",(context,factors))
 82             ce = (frozenset(context.items()), frozenset(factors)) # key for the
                    cache entry
 83             if ce in self.cache:
 84                 self.display(3,"rc cache lookup",(context,factors))
 85                 return self.cache[ce]
 86   #          if not factors: #no factors; not needed with forgetting and caching
 87   #              return 1
 88             elif vars_not_in_factors := {var for var in context
 89                                           if not any(var in fac.variables
 90                                                         for fac in factors)}:
 91                 # forget variables not in any factor
 92                 self.display(3,"rc forgetting variables", vars_not_in_factors)
 93                 return self.prob_search({key:val for (key,val) in
                        context.items()
 94                                   if key not in vars_not_in_factors},
 95                               factors, split_order)
 96             elif to_eval := {fac for fac in factors
 97                                if fac.can_evaluate(context)}:
 98                 # evaluate factors when all variables are assigned
 99                 self.display(3,"rc evaluating factors",to_eval)
100                 val = math.prod(fac.get_value(context) for fac in to_eval)
101                 if val == 0:
102                     return 0
103                 else:
104                     return val * self.prob_search(context,
105                                           {fac for fac in factors
106                                             if fac not in to_eval},
107                                           split_order)
108          elif len(comp := connected_components(context, factors,
                 split_order)) > 1:
109              # there are disconnected components
110              self.display(3,"splitting into connected components",comp,"in
                     context",context)
111              return(math.prod(self.prob_search(context,f,eo) for (f,eo) in
                     comp))
112          else:
113              assert split_order, "split_order should not be empty to get
                     here"
114              total = 0
115              var = split_order[0]
116              self.display(3, "rc branching on", var)
117              for val in var.domain:
118                  total += self.prob_search({var:val}|context, factors,
                         split_order[1:])
119              self.cache[ce] = total
120              self.display(2, "rc branching on", var,"returning", total)
121              return total
       connected_components returns a list of connected components, where a con-
       nected component is a set of factors and a set of variables, such that the graph
       that connects variables and the factors that involve them is connected. The
       connected components are built one at a time, maintaining a current connected
       component. At all times factors is partitioned into 3 disjoint sets (a simplified
       sketch of the computation follows this list):
          • other_factors: the other factors that are not (yet) in the connected com-
            ponent
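       The remaining two sets of the partition are elided from this excerpt. For
       reference, here is a simpler (quadratic) sketch that computes the same result,
       returning (factors, split_order) pairs as prob_search above expects; the
       function name is ours.

def connected_components_sketch(context, factors, split_order):
    """partition factors into connected components; variables already
    assigned in context do not connect factors (a sketch, not the book's code)"""
    remaining = set(factors)
    components = []
    while remaining:
        component = {remaining.pop()}
        comp_vars = {v for f in component for v in f.variables
                     if v not in context}
        grew = True
        while grew:   # keep absorbing factors that share an unassigned variable
            grew = False
            for fac in list(remaining):
                if any(v in comp_vars for v in fac.variables):
                    remaining.remove(fac)
                    component.add(fac)
                    comp_vars |= {v for v in fac.variables if v not in context}
                    grew = True
        components.append((component,
                           [v for v in split_order if v in comp_vars]))
    return components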
probRC.py — (continued)
Testing:
probRC.py — (continued)
      The following example uses the decision tree representation of Section 9.3.4
      (page 205). Does recursive conditioning split on variable full for the query
      commented out below? What can be done to guarantee that it does?
probRC.py — (continued)
200   from probFactors import Prob, action, rain, full, wet, p_wet
201   from probGraphicalModels import BeliefNetwork
202   p_action = Prob(action,[],{'go_out':0.3, 'get_coffee':0.7})
203   p_rain = Prob(rain,[],[0.4,0.6])
204   p_full = Prob(full,[],[0.1,0.9])
205
206   wetBN = BeliefNetwork("Wet (decision tree CPD)", {action, rain, full, wet},
207                          {p_action, p_rain, p_full, p_wet})
208   wetRC = ProbRC(wetBN)
209   # wetRC.query(wet, {action:'go_out', rain:True})
210   # wetRC.show_post({action:'go_out', rain:True})
211   # wetRC.show_post({action:'go_out', wet:True})
            \[ \sum_{var} \prod_{f \in factors} f \]
       We store the values in a list in a lazy manner; if they are already computed,
       we use the stored values. If they are not already computed, we compute and
       store them.
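       The FactorSum class itself is elided from this excerpt. A minimal sketch of
       such a factor, caching computed values lazily (in a dictionary rather than the
       book's list), is:

import math

class FactorSumSketch(Factor):
    """represents the sum over var of the product of factors (a sketch)"""
    def __init__(self, var, factors):
        self.var_summed_out = var
        self.factors = factors
        vars_left = {v for fac in factors for v in fac.variables if v is not var}
        Factor.__init__(self, list(vars_left))
        self.values = {}   # cache of already-computed values

    def get_value(self, assignment):
        key = frozenset((v, assignment[v]) for v in self.variables)
        if key not in self.values:   # compute and store lazily
            self.values[key] = sum(
                math.prod(fac.get_value({**assignment,
                                         self.var_summed_out: val})
                          for fac in self.factors)
                for val in self.var_summed_out.domain)
        return self.values[key]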
                                    probVE.py — (continued)
 43         def project_observations(self,factor,obs):
 44             """Returns the resulting factor after observing obs
 45
 46            obs is a dictionary of {variable:value} pairs.
 47            """
 48            if any((var in obs) for var in factor.variables):
 49                # a variable in factor is observed
 50                return FactorObserved(factor,obs)
 51            else:
 52                return factor
 53
 54         def eliminate_var(self,factors,var):
 55             """Eliminate a variable var from a list of factors.
 56             Returns a new set of factors that has var summed out.
 57             """
 58             self.display(2,"eliminating ",str(var))
 59             contains_var = []
 60             not_contains_var = []
 61             for fac in factors:
 62                 if var in fac.variables:
 63                     contains_var.append(fac)
 64                 else:
 65                     not_contains_var.append(fac)
 66             if contains_var == []:
 67                 return factors
 68             else:
 69                 newFactor = FactorSum(var,contains_var)
 70                 self.display(2,"Multiplying:",[str(f) for f in contains_var])
 71                 self.display(2,"Creating factor:", newFactor)
 72                 self.display(3, newFactor.to_table()) # factor in detail
 73                 not_contains_var.append(newFactor)
 74                 return not_contains_var
 75
 76   from probExamples import bn_4ch, A,B,C,D
 77   bn_4chv = VE(bn_4ch)
 78   ## bn_4chv.query(A,{})
 79   ## bn_4chv.query(D,{})
 80   ## InferenceMethod.max_display_level = 3 # show more detail in displaying
 81   ## InferenceMethod.max_display_level = 1 # show less detail in displaying
 82   ## bn_4chv.query(A,{D:True})
 83   ## bn_4chv.query(B,{A:True,D:False})
 84
 85   from probExamples import bn_report,Alarm,Fire,Leaving,Report,Smoke,Tamper
 86   bn_reportv = VE(bn_report) # answers queries using variable elimination
 87   ## bn_reportv.query(Tamper,{})
 88   ## InferenceMethod.max_display_level = 0 # show no detail in displaying
 89   ## bn_reportv.query(Leaving,{})
 90   ## bn_reportv.query(Tamper,{},elim_order=[Smoke,Report,Leaving,Alarm,Fire])
 91   ## bn_reportv.query(Tamper,{Report:True})
 92   ## bn_reportv.query(Tamper,{Report:True,Smoke:False})
 93
 94   from probExamples import bn_sprinkler, Season, Sprinkler, Rained,
          Grass_wet, Grass_shiny, Shoes_wet
 95   bn_sprinklerv = VE(bn_sprinkler)
 96   ## bn_sprinklerv.query(Shoes_wet,{})
 97   ## bn_sprinklerv.query(Shoes_wet,{Rained:True})
 98   ## bn_sprinklerv.query(Shoes_wet,{Grass_shiny:True})
 99   ## bn_sprinklerv.query(Shoes_wet,{Grass_shiny:False,Rained:True})
100
101   from probExamples import bn_lr1, Cough, Fever, Sneeze, Cold, Flu, Covid
102   vediag = VE(bn_lr1)
103   ## vediag.query(Cough,{})
104   ## vediag.query(Cold,{Cough:1,Sneeze:0,Fever:1})
105   ## vediag.query(Flu,{Cough:0,Sneeze:1,Fever:1})
106   ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1})
107   ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1,Flu:0})
108   ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1,Flu:1})
109
110   if __name__ == "__main__":
111       InferenceMethod.testIM(VE)
18         for v in dist:
19             cum += dist[v]
20             if cum > rand:
21                 return v
probStochSim.py — (continued)
      Exercise 9.1
         What is the time and space complexity of the following four methods to gen-
      erate n samples, where m is the length of dist:
        (a) n calls to sample_one
        (b) sample_multiple
        (c) Create the cumulative distribution (choose how this is represented) and, for
            each random number, do a binary search to determine the sample associated
            with the random number (a sketch of this approach follows the exercise).
        (d) Choose a random number in the range [i/n, (i + 1)/n) for each i ∈ range(n),
            where n is the number of samples. Use these as the random numbers to
            select the particles. (Does this give random samples?)
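      As an illustration of approach (c), the following sketch (the function name is
      ours) uses the standard bisect module for the binary search:

import bisect, random

def sample_n(dist, n):
    """draw n samples from dist, a {value: probability} dictionary,
    by binary search over the cumulative distribution"""
    values = list(dist)
    cum, total = [], 0
    for v in values:               # build the cumulative distribution
        total += dist[v]
        cum.append(total)
    return [values[bisect.bisect_left(cum, random.random()*total)]
            for _ in range(n)]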
53   class SamplingInferenceMethod(InferenceMethod):
54       """The abstract class of sampling-based belief network inference
             methods"""
55
56      def __init__(self,gm=None):
57          InferenceMethod.__init__(self, gm)
58
59      def query(self,qvar,obs={},number_samples=1000,sample_order=None):
60          raise NotImplementedError("SamplingInferenceMethod query") #
                abstract
62   class RejectionSampling(SamplingInferenceMethod):
63       """The class that queries Graphical Models using Rejection Sampling.
64
65      gm is a belief network to query
66      """
67      method_name = "rejection sampling"
 68
 69         def __init__(self, gm=None):
 70             SamplingInferenceMethod.__init__(self, gm)
 71
 72         def query(self, qvar, obs={}, number_samples=1000, sample_order=None):
 73             """computes P(qvar | obs) where
 74             qvar is a variable.
 75             obs is a {variable:value} dictionary.
 76             sample_order is a list of variables where the parents
 77               come before the variable.
 78             """
 79             if sample_order is None:
 80                 sample_order = self.gm.topological_sort()
 81             self.display(2,*sample_order,sep="\t")
 82             counts = {val:0 for val in qvar.domain}
 83             for i in range(number_samples):
 84                 rejected = False
 85                 sample = {}
 86                 for nvar in sample_order:
 87                     fac = self.gm.var2cpt[nvar] #factor with nvar as child
 88                     val = sample_one({v:fac.get_value({**sample, nvar:v}) for v
                            in nvar.domain})
 89                     self.display(2,val,end="\t")
 90                     if nvar in obs and obs[nvar] != val:
 91                         rejected = True
 92                         self.display(2,"Rejected")
 93                         break
 94                     sample[nvar] = val
 95                 if not rejected:
 96                     counts[sample[qvar]] += 1
 97                     self.display(2,"Accepted")
 98             tot = sum(counts.values())
 99             # As well as the distribution we also include raw counts
100             dist = {c:v/tot if tot>0 else 1/len(qvar.domain) for (c,v) in
                    counts.items()}
101             dist["raw_counts"] = counts
102             return dist
       Exercise 9.2 Change this algorithm so that it does importance sampling using a
       proposal distribution. It needs to call sample_one with a different distribution
       and then update the weight of the current sample. For testing, use a proposal
       distribution that only specifies probabilities for some of the variables (and the
       algorithm uses the probabilities for the network in the other cases).
probStochSim.py — (continued)
       Resampling
       Resample is based on sample_multiple but works with an array of particles.
       (Aside: Python doesn't let us use sample_multiple directly, as it uses a dictionary.)
      9.9.6 Examples
                                   probStochSim.py — (continued)
226   ## bn_reportr.query(Tamper,{Report:True,Smoke:False},number_samples=100)
227
228   ## bn_reportL.query(Tamper,{Report:True,Smoke:False},number_samples=100)
229   ## bn_reportL.query(Tamper,{Report:True,Smoke:False},number_samples=100)
230
231   from probExamples import bn_sprinkler,Season, Sprinkler
232   from probExamples import Rained, Grass_wet, Grass_shiny, Shoes_wet
233   bn_sprinklerr = RejectionSampling(bn_sprinkler) # answers queries using
          rejection sampling
234   bn_sprinklerL = LikelihoodWeighting(bn_sprinkler) # answers queries using
          likelihood weighting
235   bn_sprinklerp = ParticleFiltering(bn_sprinkler) # answers queries using
          particle filtering
236   #bn_sprinklerr.query(Shoes_wet,{Grass_shiny:True,Rained:True})
237   #bn_sprinklerL.query(Shoes_wet,{Grass_shiny:True,Rained:True})
238   #bn_sprinklerp.query(Shoes_wet,{Grass_shiny:True,Rained:True})
239
240   if __name__ == "__main__":
241       InferenceMethod.testIM(RejectionSampling, threshold=0.1)
242       InferenceMethod.testIM(LikelihoodWeighting, threshold=0.1)
243       InferenceMethod.testIM(ParticleFiltering, threshold=0.1)
       Exercise 9.3 This code keeps regenerating the distribution of a variable given
       its parents. Implement one or both of the following, and compare them to the
       original. Make cond_dist return a slice that corresponds to the distribution, and
       then use the slice instead of the dictionary (a list slice does not generate new data
       structures). Make cond_dist remember values it has already computed, and return
       the stored values rather than recomputing them.
probStochSim.py — (continued)
260
261      def query(self, qvar, obs={}, number_samples=1000, burn_in=100,
             sample_order=None):
262          """computes P(qvar | obs) where
263          qvar is a variable.
264          obs is a {variable:value} dictionary.
265          sample_order is a list of non-observed variables in order, or
266          if sample_order None, an arbitrary ordering is used
267          """
268          counts = {val:0 for val in qvar.domain}
269          if sample_order is not None:
270              variables = sample_order
271          else:
272              variables = [v for v in self.gm.variables if v not in obs]
273              random.shuffle(variables)
274          var_to_factors = {v:set() for v in self.gm.variables}
275          for fac in self.gm.factors:
276              for var in fac.variables:
277                  var_to_factors[var].add(fac)
278          sample = {var:random.choice(var.domain) for var in variables}
279          self.display(3,"Sample:",sample)
280          sample.update(obs)
281          for i in range(burn_in + number_samples):
282              for var in variables:
283                  # get unnormalized probability distribution of var given its
                         neighbors
284                  vardist = {val:1 for val in var.domain}
285                  for val in var.domain:
286                      sample[var] = val
287                      for fac in var_to_factors[var]: # Markov blanket
288                          vardist[val] *= fac.get_value(sample)
289                  sample[var] = sample_one(vardist)
290              if i >= burn_in:
291                  counts[sample[qvar]] +=1
292                  self.display(3,"      ",sample)
293          tot = sum(counts.values())
294          # as well as the computed distribution, we also include raw counts
295          dist = {c:v/tot for (c,v) in counts.items()}
296          dist["raw_counts"] = counts
297          self.display(2, f"Gibbs sampling P({qvar}|{obs}) = {dist}")
298          return dist
299
300   #from probExamples import bn_4ch, A,B,C,D
301   bn_4chg = GibbsSampling(bn_4ch)
302   ## InferenceMethod.max_display_level = 2 # detailed tracing for all
          inference methods
303   bn_4chg.query(A,{})
304   ## bn_4chg.query(D,{})
305   ## bn_4chg.query(B,{D:True})
306   ## bn_4chg.query(B,{A:True,C:False})
307
308   from probExamples import bn_report,Alarm,Fire,Leaving,Report,Smoke,Tamper
309   bn_reportg = GibbsSampling(bn_report)
310   ## bn_reportg.query(Tamper,{Report:True},number_samples=1000)
311
312   if __name__ == "__main__":
313       InferenceMethod.testIM(GibbsSampling, threshold=0.1)
       Exercise 9.4 Change the code so that it can have multiple query variables. Make
       the list of query variables an input to the algorithm, with the default value being
       the list of all non-observed variables.
      Exercise 9.5 In this algorithm, explain where it computes the probability of a
      variable given its Markov blanket. Instead of returning the average of the samples
      for the query variable, it is possible to return the average estimate of the probabil-
      ity of the query variable given its Markov blanket. Does this converge to the same
      answer as the given code? Does it converge faster, slower, or the same?
          [ProbRC,RejectionSampling,LikelihoodWeighting,ParticleFiltering,GibbsSampling]
360   #plot_mult(methods,bn_report,Tamper,True,{Report:True,Smoke:False},number_samples=100,
          number_runs=1000)
361   #
          plot_mult(methods,bn_report,Tamper,True,{Report:False,Smoke:True},number_samples=100,
          number_runs=1000)
362
363   # Sprinkler Example:
364   #
          plot_stats(bn_sprinklerr,Shoes_wet,True,{Grass_shiny:True,Rained:True},number_samples=1000)
365   #
          plot_stats(bn_sprinklerL,Shoes_wet,True,{Grass_shiny:True,Rained:True},number_samples=1000)
     step. The animal is either close to one of the 3 points of the triangle or in the
     middle of the triangle.
                                   probHMM.py — (continued)
29   # state
30   #       0=middle, 1,2,3 are corners
31   states1 = {'middle', 'c1', 'c2', 'c3'} # states
32   obs1 = {'m1','m2','m3'} # microphones
         The observation model is as follows. If the animal is in a corner, it will
     be detected by the microphone at that corner with probability 0.6, and will be
     independently detected by each of the other microphones with a probability of
     0.1. If the animal is in the middle, it will be detected by each microphone with
     a probability of 0.4.
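          In code, this observation model amounts to one detection probability per
      microphone and state. The dictionary below is our sketch (the elided probHMM.py
      listing may represent it differently), using the closeMic, farMic, and midMic
      names that probDBN.py later imports from probHMM:

closeMic, farMic, midMic = 0.6, 0.1, 0.4
# P(microphone detects animal | state), states ordered (middle, c1, c2, c3)
obs_model = {'m1': (midMic, closeMic, farMic, farMic),
             'm2': (midMic, farMic, closeMic, farMic),
             'm3': (midMic, farMic, farMic, closeMic)}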
                                   probHMM.py — (continued)
 96   hmm1f1 = HMMVEfilter(hmm1)
 97   # hmm1f1.filter([{'m1':0, 'm2':1, 'm3':1}, {'m1':1, 'm2':0, 'm3':1}])
 98   ## HMMVEfilter.max_display_level = 2 # show more detail in displaying
 99   # hmm1f2 = HMMVEfilter(hmm1)
100   # hmm1f2.filter([{'m1':1, 'm2':0, 'm3':0}, {'m1':0, 'm2':1, 'm3':0},
          {'m1':1, 'm2':0, 'm3':0},
101   #              {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':0},
          {'m1':0, 'm2':0, 'm3':0},
102   #              {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':1},
          {'m1':0, 'm2':0, 'm3':1},
103   #              {'m1':0, 'm2':0, 'm3':1}])
104   # hmm1f3 = HMMVEfilter(hmm1)
105   # hmm1f3.filter([{'m1':1, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':0},
          {'m1':1, 'm2':0, 'm3':0}, {'m1':1, 'm2':0, 'm3':1}])
106
107   # How do the following differ in the resulting state distribution?
108   # Note they start the same, but have different initial observations.
109   ## HMMVEfilter.max_display_level = 1 # show less detail in displaying
110   # for i in range(100): hmm1f1.advance()
111   # hmm1f1.state_dist
112   # for i in range(100): hmm1f3.advance()
113   # hmm1f3.state_dist
       Exercise 9.6 The representation assumes that there is a list of Boolean obser-
       vations. Extend the representation so that each observation variable can have
       multiple discrete values. You need to choose a representation for the model, and
       change the algorithm.
      9.10.2 Localization
      The localization example in the book is a controlled HMM, where there is a
      given action at each time and the transition depends on the action.
                        probLocalization.py — Controlled HMM and Localization example
 11   from probHMM import HMMVEfilter, HMM
 12   from display import Displayable
 13   import matplotlib.pyplot as plt
 14   from matplotlib.widgets import Button, CheckButtons
 15
 16   class HMM_Controlled(HMM):
 17       """A controlled HMM, where the transition probability depends on the
              action.
43   class HMM_Local(HMMVEfilter):
44       """VE filter for controlled HMMs
45       """
46       def __init__(self, hmm):
47           HMMVEfilter.__init__(self, hmm)
48
49         def go(self, action):
50             self.hmm.trans = self.hmm.act2trans[action]
51             self.advance()
52
53   loc_filt = HMM_Local(hmm_16pos)
54   # loc_filt.observe({'door':True}); loc_filt.go("right");
         loc_filt.observe({'door':False}); loc_filt.go("right");
         loc_filt.observe({'door':True})
55   # loc_filt.state_dist
         The following lets us interactively move the agent and provide observa-
     tions. It shows the distribution over locations.
                                  probLocalization.py — (continued)
57 class Show_Localization(Displayable):
106             self.draw_dist()
107         def nodoor(self,event):
108             self.loc_filt.observe({'door':False})
109             self.draw_dist()
110         def reset(self,event):
111             self.loc_filt.state_dist = {i:1/16 for i in range(16)}
112             self.draw_dist()
113
114   # sl = Show_Localization(hmm_16pos)
115   # sl = Show_Localization(hmm_16pos, fontsize=15) # for demos - enlarge
          window
probHMM.py — (continued)
      Is it better or worse than particle filtering? Hint: you need to think about how
      they can be compared. Is the comparison different if there are more states than
      particles?
       Exercise 9.8 Extend the particle filtering code to continuous variables and ob-
       servations. In particular, suppose the state transition is a linear function of the
       previous state with Gaussian noise, and the observations are linear functions of
       the state with Gaussian noise. You may need to research how to sample from a
       Gaussian distribution (a starting point is sketched below).
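       As a starting point, Python's standard library can sample from a Gaussian
       with random.gauss; a linear-Gaussian transition and observation model might
       be sketched as follows (the coefficients and standard deviations are placeholders):

import random

def next_state(state, a=1.0, b=0.0, sd=0.1):
    """linear transition with Gaussian noise: s' = a*s + b + N(0, sd)"""
    return a*state + b + random.gauss(0, sd)

def observation(state, c=1.0, d=0.0, sd=0.2):
    """linear observation with Gaussian noise: o = c*s + d + N(0, sd)"""
    return c*state + d + random.gauss(0, sd)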
probHMM.py — (continued)
         • Rolling out the DBN for some time period, and using standard belief net-
           work inference. The latest time that needs to be in the rolled out network
           is the time of the latest observation or the time of a query (whichever is
           later). This allows us to observe any variables at any time and query any
           variables at any time. This is covered in Section 9.11.2.
          • An unrolled belief network may be very large, and we might only be in-
            terested in asking about “now”. In this case we can just represent the
            variables “now”. In this approach we can observe and query the current
            variables. We can then move to the next time. This does not allow for
            arbitrary historical queries (about the past or the future), but can be much
            simpler. This is covered in Section 9.11.3.
          • An initial distribution over the features “now” (time 1). This is a belief
            network with all variables being time-1 variables.
19
20         A variable can have both a name and an index. The index defaults to 1.
21         """
22         def __init__(self,name,domain=[False,True],index=1):
23             Variable.__init__(self,f"{name}_{index}",domain)
24             self.basename = name
25             self.domain = domain
26             self.index = index
27             self.previous = None
28
29         def __lt__(self,other):
30             if self.name != other.name:
31                 return self.name<other.name
32             else:
33                 return self.index<other.index
34
35         def __gt__(self,other):
36             return other<self
37
38   def variable_pair(name,domain=[False,True]):
39       """returns a variable and its predecessor. This is used to define
             2-stage DBNs
40
41         If the name is X, it returns the pair of variables X_prev,X_now"""
42         var_now = DBNvariable(name,domain,index='now')
43         var_prev = DBNvariable(name,domain,index='prev')
44         var_now.previous = var_prev
45         return var_prev, var_now
          A FactorRename is a factor that is the result of renaming the variables in a
      factor. It takes a factor, fac, and a {new : old} dictionary, where new is the name
      of a variable in the resulting factor and old is the corresponding name in fac.
      This assumes that all variables are renamed.
                                     probDBN.py — (continued)
47   class FactorRename(Factor):
48       def __init__(self,fac,renaming):
49           """A renamed factor.
50           fac is a factor
51           renaming is a dictionary of the form {new:old} where old and new
                 are variables,
52              where the variables in fac appear exactly once in the renaming
53           """
54           Factor.__init__(self,[n for (n,o) in renaming.items() if o in
                 fac.variables])
55           self.orig_fac = fac
56           self.renaming = renaming
57
58         def get_value(self,assignment):
59             return self.orig_fac.get_value({self.renaming[var]:val
60                                          for (var,val) in assignment.items()
61                                         if var in self.variables})
      The following class renames the variables of a conditional probability distri-
      bution. It is used for template models (e.g., dynamic decision networks or
      relational models).
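      That listing is elided from this excerpt; a minimal sketch of such a class,
      built on FactorRename and CPD (the class name is ours, and the book's version
      may differ), is:

class CPDrenameSketch(FactorRename, CPD):
    def __init__(self, cpd, renaming):
        """renaming is a {new:old} dictionary, as in FactorRename"""
        old2new = {old: new for (new, old) in renaming.items()}
        FactorRename.__init__(self, cpd, renaming)
        # recover the renamed child and parents so this acts as a CPD
        self.child = old2new[cpd.child]
        self.parents = [old2new[p] for p in cpd.parents]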
                                   probDBN.py — (continued)
70   class DBN(Displayable):
71       """The class of stationary Dynamic Belief networks.
72       * name is the DBN name
 73       * vars_now is a list of current variables (each must have a
 74       previous variable).
75       * transition_factors is a list of factors for P(X|parents) where X
76       is a current variable and parents is a list of current or previous
             variables.
77       * init_factors is a list of factors for P(X|parents) where X is a
78       current variable and parents can only include current variables
79       The graph of transition factors + init factors must be acyclic.
80
81      """
82      def __init__(self, title, vars_now, transition_factors=None,
            init_factors=None):
83          self.title = title
84          self.vars_now = vars_now
85          self.vars_prev = [v.previous for v in vars_now]
86          self.transition_factors = transition_factors
87          self.init_factors = init_factors
88          self.var_index = {}     # var_index[v] is the index of variable v
89          for i,v in enumerate(vars_now):
90              self.var_index[v]=i
     Here is a 3 variable DBN:
                                  probDBN.py — (continued)
100
101   # initial distribution
102   pa0 = Prob(A1,[],[0.9,0.1])
103   pb0 = Prob(B1,[A1],[[0.3,0.7],[0.8,0.2]])
104   pc0 = Prob(C1,[],[0.2,0.8])
105
106   dbn1 = DBN("Simple DBN",[A1,B1,C1],[pa,pb,pc],[pa0,pb0,pc0])
      Here is the animal example
                                     probDBN.py — (continued)
108   from probHMM import closeMic, farMic, midMic, sm, mmc, sc, mcm, mcc
109
110   Pos_0,Pos_1 =   variable_pair("Position",domain=[0,1,2,3])
111   Mic1_0,Mic1_1   = variable_pair("Mic1")
112   Mic2_0,Mic2_1   = variable_pair("Mic2")
113   Mic3_0,Mic3_1   = variable_pair("Mic3")
114
115   # conditional probabilities - see hmm for the values of sm,mmc, etc
116   ppos = Prob(Pos_1, [Pos_0],
117              [[sm, mmc, mmc, mmc], #was in middle
118               [mcm, sc, mcc, mcc], #was in corner 1
119               [mcm, mcc, sc, mcc], #was in corner 2
120               [mcm, mcc, mcc, sc]]) #was in corner 3
121   pm1 = Prob(Mic1_1, [Pos_1], [[1-midMic, midMic], [1-closeMic, closeMic],
122                            [1-farMic, farMic], [1-farMic, farMic]])
123   pm2 = Prob(Mic2_1, [Pos_1], [[1-midMic, midMic], [1-farMic, farMic],
124                            [1-closeMic, closeMic], [1-farMic, farMic]])
125   pm3 = Prob(Mic3_1, [Pos_1], [[1-midMic, midMic], [1-farMic, farMic],
126                            [1-farMic, farMic], [1-closeMic, closeMic]])
127   ipos = Prob(Pos_1,[], [0.25, 0.25, 0.25, 0.25])
128   dbn_an =DBN("Animal DBN",[Pos_1,Mic1_1,Mic2_1,Mic3_1],
129              [ppos, pm1, pm2, pm3],
130              [ipos, pm1, pm2, pm3])
probDBN.py — (continued)
157   # Try
158   #from probRC import ProbRC
159   #bn = BNfromDBN(dbn1,2) # construct belief network
160   #drc = ProbRC(bn)            # initialize recursive conditioning
161   #B2 = bn.name2var['B'][2]
162   #drc.query(B2) #P(B2)
163   #drc.query(bn.name2var['B'][1],{bn.name2var['B'][0]:True,bn.name2var['C'][1]:False})
          #P(B1|B0,C1)
probDBN.py — (continued)
[Figure: the posterior of P_heads given Coin Tosses observed: {Toss#0: 'heads',
Toss#1: 'heads', Toss#2: 'tails'} — the distribution over P_heads peaks near
0.65, and each unobserved toss (Toss#3 . . . Toss#10) has heads: 0.599,
tails: 0.401]
[Figure 10.2: the Beta distribution (Probability vs. P(Heads)) after 12 heads;
4 tails, after 3 heads; 1 tails, and after 6 heads; 2 tails, with heads, tails,
save, and reset buttons]
      Figure 10.2 shows a plot of the Beta distribution (the P_heads variable in the
      previous belief network) given some sets of observations.
         This plot is produced by the following interactive tool.
                                                  learnBayesian.py — (continued)
 96      def tails(self,event):
 97          self.num_tails += 1
 98          self.dist = [self.dist[i]*(1-self.vals[i]) for i in range(self.num)]
 99          self.draw_dist()
100      def save(self,event):
101          self.saves.append((self.num_heads,self.num_tails,self.dist))
102          self.draw_dist()
103      def reset(self,event):
104          self.num_tails = 0
105          self.num_heads = 0
106          self.dist = [1/self.num for i in range(self.num)]
107          self.draw_dist()
108
109   # s1 = Show_Beta(100)
110   # sl = Show_Beta(100, fontsize=15) # for demos - enlarge window
       10.2 K-means
       The k-means learner maintains two lists that form sufficient statistics to
       classify examples, and to learn the classification:
          • class_counts is a list such that class_counts[c] is the number of examples
            in the training set with class = c.
          • feature_sum is a list such that feature_sum[i][c] is the sum of the values of
            the ith feature for members of class c. The average value of the ith feature
            in class c is
                  feature_sum[i][c] / class_counts[c]
      The distance from (the mean of) a class to an example is the sum, over all
      features, of the squared differences between the class mean and the example
      value.
learnKMeans.py — (continued)
35         def distance(self,cl,eg):
36             """distance of the eg from the mean of the class"""
37             return sum( (self.class_prediction(ind,cl)-feat(eg))**2
38                            for (ind,feat) in
                                  enumerate(self.dataset.input_features))
39
40         def class_prediction(self,feat_ind,cl):
41             """prediction of the class cl on the feature with index feat_ind"""
42             if self.class_counts[cl] == 0:
43                 return 0 # there are no examples so we can choose any value
44             else:
45                 return self.feature_sum[feat_ind][cl]/self.class_counts[cl]
46
47         def class_of_eg(self,eg):
48             """class to which eg is assigned"""
49             return (min((self.distance(cl,eg),cl)
50                           for cl in range(self.num_classes)))[1]
51                   # second element of tuple, which is a class with minimum
                         distance
      One step of k-means updates the class_counts and feature_sum. It uses the old
      values to determine the classes, and so computes new values for class_counts
      and feature_sum. At the end it determines whether these values have changed,
      and then replaces the old ones with the new ones. It returns an indicator of
      whether the values are stable (have not changed).
learnKMeans.py — (continued)
53         def k_means_step(self):
54             """Updates the model with one step of k-means.
55             Returns whether the assignment is stable.
56             """
57             new_class_counts = [0]*self.num_classes
       Exercise 10.1 Change the boolean_features = True flag to allow for numerical
       features. K-means assumes the features are numerical, so we want to make non-
       numerical features into numerical features (using characteristic functions) but we
       probably don't want to change numerical features into Boolean ones.
       Exercise 10.2 If there are many classes, some of the classes can become empty
       (e.g., try 100 classes with carbool.csv). Implement a way to put some examples
       into a class, if possible. Two ideas are:
          (a) Initialize the classes with actual examples, so that the classes will not start
              empty. (Do the classes become empty?)
        (b) In class_prediction, we test whether the class is empty, and make a prediction
            of 0 for an empty class. It is possible to make a different prediction to “steal”
            an example (but you should make sure that a class has a consistent value for
            each feature in a loop).
      Make your own suggestions, and compare them with the original and with
      whichever of these you think may work better.
      10.3 EM
      In the following definition, a class, c, is an integer in range [0, num_classes). i is
      the index of a feature, so feat[i] is the ith feature, and a feature is a function from
      tuples to values. val is a value of a feature.
          A model consists of 2 lists, which form the sufficient statistics:
         • class_counts is a list such that class_counts[c] is the number of tuples with
           class = c, where each tuple is weighted by its probability, i.e.,
                  class_counts[c] = \sum_t P(c \mid t)
         • feature_counts is a list such that feature_counts[i][val][c] is the weighted count
           of the number of tuples t with feat[i](t) = val and class(t) = c, where each
           tuple is weighted by its probability, i.e.,
                  feature_counts[i][val][c] = \sum_{t : feat[i](t)=val} P(c \mid t)
                                       learnEM.py — EM Learning
11   from learnProblem import Data_set, Learner, Data_from_file
12   import random
13   import math
14   import matplotlib.pyplot as plt
15
16   class EM_learner(Learner):
17       def __init__(self,dataset, num_classes):
18           self.dataset = dataset
19           self.num_classes = num_classes
20           self.class_counts = None
21           self.feature_counts = None
      The function em_step goes through the training examples and updates these
      counts. The first time it is run, when there is no model, it uses random distri-
      butions.
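      The em_step listing is elided from this excerpt. The following sketch of the
      method shows the intended computation, assuming each input feature has a
      frange of possible values (as the later carbool.csv example suggests) and using
      the prob and random_dist functions defined below:

    def em_step(self, orig_class_counts, orig_feature_counts):
        """a sketch: returns updated (class_counts, feature_counts)"""
        class_counts = [0]*self.num_classes
        feature_counts = [{val: [0]*self.num_classes for val in feat.frange}
                          for feat in self.dataset.input_features]
        for tple in self.dataset.train:
            if orig_class_counts:    # E-step: P(c | tple) under current model
                cdist = self.prob(tple, orig_class_counts, orig_feature_counts)
            else:                    # first iteration: random distribution
                cdist = random_dist(self.num_classes)
            for c in range(self.num_classes):   # M-step: weighted counts
                class_counts[c] += cdist[c]
                for (i, feat) in enumerate(self.dataset.input_features):
                    feature_counts[i][feat(tple)][c] += cdist[c]
        return class_counts, feature_counts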
                                        learnEM.py — (continued)
      The last step is because len(self.dataset) is a constant (independent of c).
      class_counts[c] can be taken out of the product, but needs to be raised to the
      power of the number of features, and one of them cancels.
                                         learnEM.py — (continued)
 51         def learn(self,n):
 52             """do n steps of em"""
 53             for i in range(n):
 54                 self.class_counts, self.feature_counts = self.em_step(
 55                     self.class_counts, self.feature_counts)
     The following is for visualizing the classes. It prints the dataset ordered by the
     probability of class c.
                                          learnEM.py — (continued)
 57      def show_class(self,c):
 58          """sorts the data by the class and prints in order.
 59          For visualizing small data sets
 60          """
 61          sorted_data = sorted((self.prob(tpl,self.class_counts,self.feature_counts)[c],
 62                              ind, # preserve ordering for equal probabilities
 63                              tpl)
 64                             for (ind,tpl) in enumerate(self.dataset.train))
 65          for cc,r,tpl in sorted_data:
 66              print(cc,*tpl,sep='\t')
      The following are for evaluating the classes.
         The probability of a tuple can be evaluated by marginalizing over the classes:

      \[
      P(tple) = \sum_c P(c) \prod_i P(feat_i(tple) \mid c)
              = \sum_c \frac{cc[c]}{len(self.dataset)} \prod_i \frac{fc[i][feat_i(tple)][c]}{cc[c]}
              = \frac{1}{len(self.dataset)} \sum_c \frac{1}{cc[c]^{\#feats-1}} \prod_i fc[i][feat_i(tple)][c]
      \]

      where cc is the class count and fc is the feature count. len(self.dataset) can be
      distributed out of the sum, and cc[c] can be taken out of the product.
      Given the probability of each tuple, we can evaluate the log loss as the negative
      of the log probability:
                                          learnEM.py — (continued)
68      def logloss(self,tple):
69          """returns the logloss of the prediction on tple, which is
                -log(P(tple))
70          based on the current class counts and feature counts
71          """
72          feats = self.dataset.input_features
73          res = 0
74          cc = self.class_counts
75          fc = self.feature_counts
76          for c in range(self.num_classes):
77              res += prod(fc[i][feat(tple)][c]
 78                           for (i,feat) in
                                  enumerate(feats))/(cc[c]**(len(feats)-1))
 79            if res>0:
 80                return -math.log2(res/len(self.dataset.train))
 81            else:
 82                return float("inf") #infinity
 83
 84         def plot_error(self, maxstep=20):
 85             """Plots the logloss error as a function of the number of steps"""
 86             plt.ion()
 87             plt.xlabel("step")
 88             plt.ylabel("Ave Logloss (bits)")
 89             train_errors = []
 90             if self.dataset.test:
 91                 test_errors = []
 92             for i in range(maxstep):
 93                 self.learn(1)
 94                 train_errors.append( sum(self.logloss(tple) for tple in
                        self.dataset.train)
 95                                     /len(self.dataset.train))
 96                 if self.dataset.test:
 97                     test_errors.append( sum(self.logloss(tple) for tple in
                            self.dataset.test)
 98                                         /len(self.dataset.test))
 99             plt.plot(range(1,maxstep+1),train_errors,
100                      label=str(self.num_classes)+" classes. Training set")
101             if self.dataset.test:
102                 plt.plot(range(1,maxstep+1),test_errors,
103                          label=str(self.num_classes)+" classes. Test set")
104             plt.legend()
105             plt.draw()
106
107   def prod(L):
108       """returns the product of the elements of L"""
109       res = 1
110       for e in L:
111           res *= e
112       return res
113
114   def random_dist(k):
115       """generate k random numbers that sum to 1"""
116       res = [random.random() for i in range(k)]
117       s = sum(res)
118       return [v/s for v in res]
119
120   data = Data_from_file('data/emdata2.csv', num_train=10, target_index=2000)
121   eml = EM_learner(data,2)
122   num_iter=2
123   print("Class assignment after",num_iter,"iterations:")
124   eml.learn(num_iter); eml.show_class(0)
125
126   #   Plot the error
127   #   em2=EM_learner(data,2); em2.plot_error(40) # 2 classes
128   #   em3=EM_learner(data,3); em3.plot_error(40) # 3 classes
129   #   em13=EM_learner(data,13); em13.plot_error(40) # 13 classes
130
131   # data = Data_from_file('data/carbool.csv',
          target_index=2000,boolean_features=False)
132   # [f.frange for f in data.input_features]
133   # eml = EM_learner(data,3)
134   # eml.learn(20); eml.show_class(0)
135   # em3=EM_learner(data,3); em3.plot_error(60) # 3 classes
136   # em3=EM_learner(data,30); em3.plot_error(60) # 30 classes
Exercise 10.3 For the EM data, where there are naturally 2 classes, 3 classes
eventually does better on the training set than 2 classes, but worse on the test
set. Explain why. Hint: look at what the 3 classes are. Use em3.show_class(i) for
each of the classes i ∈ [0, 3).
Exercise 10.4 Write code to plot the logloss as a function of the number of classes
(from 1 to, say, 15) for a fixed number of iterations. (From the experience with the
existing code, think about how many iterations are appropriate.)
Causality
     11.1       Do Questions
     A causal model can answer “do” questions.
   The intervene function takes a belief network and a variable:value dictio-
nary specifying what to “do”, and returns the belief network that results from
intervening to set each variable in the dictionary to its specified value. The
intervention replaces the CPD of each intervened variable with a constant CPD.
                          probDo.py — Probabilistic inference with the do operator
11   from probGraphicalModels import InferenceMethod, BeliefNetwork
12   from probFactors import CPD, ConstantCPD
13
14   def intervene(bn, do={}):
15       assert isinstance(bn, BeliefNetwork), f"Do only applies to belief
             networks ({bn.title})"
16       if do=={}:
17           return bn
18       else:
19           newfacs = ({f for (ch,f) in bn.var2cpt.items() if ch not in do} |
20                        {ConstantCPD(v,c) for (v,c) in do.items()})
21           return BeliefNetwork(f"{bn.title}(do={do})", bn.variables, newfacs)
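As a usage sketch (here bn is any belief network and X and Y are two of its variables; these are placeholders, not an example from the book):

    from probRC import ProbRC
    do_bn = intervene(bn, do={Y: True})  # Y's CPD is replaced by a constant CPD
    ProbRC(bn).query(X, obs={Y: True})   # P(X | Y=True): observing Y
    ProbRC(do_bn).query(X)               # P(X | do(Y=True)): intervening on Y

The two queries generally differ: observing Y provides evidence about Y's causes, whereas intervening on Y cuts Y off from its causes.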
     The following adds the queryDo method to the InferenceMethod class, so it
     can be used with any inference method. It replaces the graphical model with
the modified one, runs the inference algorithm, and restores the initial belief
     network.
                                         probDo.py — (continued)
11.2       Counterfactual Example
[Figure 11.1: the counterfactual example network, modeling A, B, C and their counterfactual counterparts A′, B′, C′ as a deterministic system sharing the independent noise variables “B if a”, “B if not a”, “C if b”, “C if not b”.]
31
32   # as a deterministic system with independent noise
33   A = Variable("A", boolean, position=(0.2,0.8))
34   B = Variable("B", boolean, position=(0.2,0.4))
35   C = Variable("C", boolean, position=(0.2,0.0))
36   Aprime = Variable("A'", boolean, position=(0.8,0.8))
37   Bprime = Variable("B'", boolean, position=(0.8,0.4))
38   Cprime = Variable("C'", boolean, position=(0.8,0.0))
39   BifA = Variable("B if a", boolean, position=(0.4,0.8))
40   BifnA = Variable("B if not a", boolean, position=(0.6,0.8))
41   CifB = Variable("C if b", boolean, position=(0.4,0.4))
42   CifnB = Variable("C if not b", boolean, position=(0.6,0.4))
43
44   p_A = Prob(A, [], [0.5,0.5])
45   p_B = Prob(B, [A, BifA, BifnA], [[[[1,0],[0,1]],[[1,0],[0,1]]], # A=0
46                                    [[[1,0],[1,0]],[[0,1],[0,1]]]]) # A=1
47   p_C = Prob(C, [B, CifB, CifnB], [[[[1,0],[0,1]],[[1,0],[0,1]]], # B=0
48                                    [[[1,0],[1,0]],[[0,1],[0,1]]]]) # B=1
49   p_Aprime = Prob(Aprime,[], [0.6,0.4])
50   p_Bprime = Prob(Bprime, [Aprime, BifA, BifnA],
         [[[[1,0],[0,1]],[[1,0],[0,1]]], # A=0
51                                    [[[1,0],[1,0]],[[0,1],[0,1]]]]) # A=1
52   p_Cprime = Prob(Cprime, [Bprime, CifB, CifnB],
         [[[[1,0],[0,1]],[[1,0],[0,1]]], # B=0
53                                    [[[1,0],[1,0]],[[0,1],[0,1]]]]) # B=1
54   p_bifa = Prob(BifA, [], [0.6,0.4]) # Does not actually depend on A!!!
55   p_bifna = Prob(BifnA, [], [0.6,0.4])
56   p_cifb = Prob(CifB, [], [0.9,0.1])
57   p_cifnb = Prob(CifnB, [], [0.2,0.8])
58
59   abcCounter = BeliefNetwork("ABC Counterfactual Example",
60                           [A,B,C,Aprime,Bprime,Cprime,BifA, BifnA, CifB,
                                 CifnB],
61                           [p_A,p_B,p_C,p_Aprime,p_Bprime, p_Cprime, p_bifa,
                                 p_bifna, p_cifb, p_cifnb])
62
63   abcq = ProbRC(abcCounter)
64   # abcq.queryDo(Cprime, obs = {Aprime:False, A:True})
65   # abcq.queryDo(Cprime, obs = {C:True, Aprime:False})
66   # abcq.queryDo(Cprime, obs = {A:True, C:True, Aprime:False})
68   # abcq.queryDo(Cprime, obs = {A:False, C:True, Aprime:False})
69   # abcq.queryDo(CifB, obs = {C:True,Aprime:False})
70   # abcq.queryDo(CifnB, obs = {C:True,Aprime:False})
71
72   #   abcq.show_post(obs   =   {})
73   #   abcq.show_post(obs   =   {Aprime:False, A:True})
74   #   abcq.show_post(obs   =   {A:True, C:True, Aprime:False})
75   #   abcq.show_post(obs   =   {A:True, C:True, Aprime:True})
The following is the firing squad example of Pearl. See Figure 11.2.
probCounterfactual.py — (continued)
[Figure 11.2: the firing squad example network; the displayed posterior for Dead is False: 0.882, True: 0.118.]
Planning with Uncertainty
decnNetworks.py — (continued)
29   class DecisionVariable(Variable):
30       def __init__(self, name, domain, parents, position=None):
31           Variable.__init__(self, name, domain, position)
32           self.parents = parents
33           self.all_vars = set(parents) | {self}
     A decision network is a graphical model where the variables can be random
     variables or decision variables. Among the factors we assume there is one util-
     ity factor.
                                   decnNetworks.py — (continued)
35   class DecisionNetwork(BeliefNetwork):
36       def __init__(self, title, vars, factors):
37           """vars is a list of variables
38           factors is a list of factors (instances of CPD and Utility)
39           """
40           GraphicalModel.__init__(self, title, vars, factors) # don't call
                 init for BeliefNetwork
41           self.var2parents = ({v : v.parents for v in vars if
                 isinstance(v,DecisionVariable)}
42                        | {f.child:f.parents for f in factors if
                              isinstance(f,CPD)})
43           self.children = {n:[] for n in self.variables}
44           for v in self.var2parents:
45               for par in self.var2parents[v]:
46                   self.children[par].append(v)
47           self.utility_factor = [f for f in factors if
                 isinstance(f,Utility)][0]
48           self.topological_sort_saved = None
49
50         def __str__(self):
51             return self.title
     The split order ensures that the parents of a decision node are split before
the decision node itself, and that, where possible, no other variables are split
before it.
                                   decnNetworks.py — (continued)
53         def split_order(self):
54             so = []
55             tops = self.topological_sort()
56             for v in tops:
57                 if isinstance(v,DecisionVariable):
58                    so += [p for p in v.parents if p not in so]
59                    so.append(v)
60             so += [v for v in tops if v not in so]
61             return so
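For example, for the umbrella network below (assuming the topological order Weather, Forecast, Umbrella), this yields the split order [Forecast, Umbrella, Weather]: the decision Umbrella's parent Forecast is split first, then Umbrella, then the remaining variable Weather.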
decnNetworks.py — (continued)
63         def show(self):
64             plt.ion() # interactive
[Figure 12.1: the umbrella decision network, with chance nodes Weather and Forecast, decision node Umbrella, and a Utility node.]
65          ax = plt.figure().gca()
66          ax.set_axis_off()
67          plt.title(self.title)
68          for par in self.utility_factor.variables:
69              ax.annotate("Utility", par.position,
                    xytext=self.utility_factor.position,
70                                    arrowprops={'arrowstyle':'<-'},
71                                    bbox=dict(boxstyle="sawtooth,pad=1.0",color="red"),
72                                    ha='center')
73          for var in reversed(self.topological_sort()):
74              if isinstance(var,DecisionVariable):
75                  bbox = dict(boxstyle="square,pad=1.0",color="green")
76              else:
77                 bbox = dict(boxstyle="round4,pad=1.0,rounding_size=0.5")
78              if self.var2parents[var]:
79                  for par in self.var2parents[var]:
80                      ax.annotate(var.name, par.position, xytext=var.position,
81                                    arrowprops={'arrowstyle':'<-'},bbox=bbox,
82                                    ha='center')
83              else:
84                  x,y = var.position
85                  plt.text(x,y,var.name,bbox=bbox,ha='center')
decnNetworks.py — (continued)
      The following is a variant with the umbrella decision having 2 parents; nothing
      else has changed. This is interesting because one of the parents is not needed;
      if the agent knows the weather, it can ignore the forecast.
                                  decnNetworks.py — (continued)
[Figures: the fire alarm decision network (with nodes including Report and Call) and the cheating decision network, with decision node Cheat Decision, chance nodes Watched, Caught1, Caught2, Punish, Grade_1, Grade_2, Fin_Grd, and a Utility node.]
decnNetworks.py — (continued)
      Chain of 3 decisions
      The following example is a finite-stage fully-observable Markov decision pro-
      cess with a single reward (utility) at the end. It is interesting because the par-
      ents do not include all predecessors. The methods we use will work without
      change on this, even though the agent does not condition on all of its previous
      observations and actions. The output of ch3.show() is shown in Figure 12.4.
                                   decnNetworks.py — (continued)
[Figure 12.4: the 3-chain decision network, with states S0–S3, decisions D0–D2, and a single Utility on the final state.]
decnNetworks.py — (continued)
     We can combine the optimization for decision networks above with the
improvements of recursive conditioning used for graphical models (Section
9.7, page 218).
decnNetworks.py — (continued)
397                else:
398                    proj_factors = self.eliminate_var(proj_factors, v)
399            assert len(proj_factors)==1, "there should be only one element of
                    proj_factors"
400            return proj_factors[0].get_value({})
401
402         def show_policy(self):
403             print('\n'.join(df.to_table() for df in self.opt_policy.values()))
decnNetworks.py — (continued)
441   # vf.show_policy()
442
443   #   VE_DN.max_display_level = 3 # if you want to show lots of detail
444   #   vc = VE_DN(cheating_dn)
445   #   vc.optimize()
446   #   vc.show_policy()
447
448
449   def test(dn):
450       rc0dn = RC_DN(dn)
451       rc0v = rc0dn.optimize(algorithm=rc0dn.rc0)
452       rcdn = RC_DN(dn)
453       rcv = rcdn.optimize()
454       assert abs(rc0v-rcv)<1e-10, f"rc0 produces {rc0v}; rc produces {rcv}"
455       vedn = VE_DN(dn)
456       vev = vedn.optimize()
457       assert abs(vev-rcv)<1e-10, f"VE_DN produces {vev}; RC produces {rcv}"
458       print(f"passed unit test. rc0, rc and VE gave same result for {dn}")
459
460   if __name__ == "__main__":
461       test(fire_dn)
30
31         def P(self,s,a):
32             """Transition probability function
33             returns a dictionary of {s1:p1} such that P(s1 | s,a)=p1. Other
                   probabilities are zero.
34             """
35             raise NotImplementedError("P") # abstract method
36
37         def R(self,s,a):
38             """Reward function R(s,a)
39             returns the expected reward for doing a in state s.
40             """
41             raise NotImplementedError("R") # abstract method
     Two state partying example (Example 9.27 in Poole and Mackworth [2017]):
                                 mdpExamples.py — MDP Examples
11   from mdpProblem import MDP, GridMDP
12   import matplotlib.pyplot as plt
13
14   class party(MDP):
15       """Simple 2-state, 2-Action Partying MDP Example"""
16       def __init__(self, discount=0.9):
17           states = {'healthy','sick'}
18           actions = {'relax', 'party'}
19           MDP.__init__(self, states, actions, discount)
20
21         def R(self,s,a):
22             "R(s,a)"
23             return { 'healthy': {'relax': 7, 'party': 10},
24                      'sick': {'relax': 0, 'party': 2 }}[s][a]
25
26         def P(self,s,a):
27             "returns a dictionary of {s1:p1} such that P(s1 | s,a)=p1. Other
                   probabilities are zero."
28             phealthy = { # P('healthy' | s, a)
29                         'healthy': {'relax': 0.95, 'party': 0.7},
30                         'sick': {'relax': 0.5, 'party': 0.1 }}[s][a]
31             return {'healthy':phealthy, 'sick':1-phealthy}
     The next example is the tiny game from Example 12.1 and Figure 12.1 of
Poole and Mackworth [2017]. The state is represented as (x, y), where x counts
from zero from the left and y counts from zero upwards, so the state (0, 0) is
the bottom-left state. The actions are upC for up-careful and upR for up-risky.
(Note that GridMDP is just a type of MDP for which we have methods to show
it; you can treat it as just an MDP here.)
                                   mdpExamples.py — (continued)
34   class MDPtiny(GridMDP):
35       def __init__(self, discount=0.9):
36           actions = ['right', 'upC', 'left', 'upR']
37          self.x_dim = 2 # x-dimension
38          self.y_dim = 3
39          states = [(x,y) for x in range(self.x_dim) for y in
                range(self.y_dim)]
40          # for GridMDP
41          self.xoff = {'right':0.25, 'upC':0, 'left':-0.25, 'upR':0}
42          self.yoff = {'right':0, 'upC':-0.25, 'left':0, 'upR':0.25}
43          GridMDP.__init__(self, states, actions, discount)
44
45      def P(self,s,a):
46          """return a dictionary of {s1:p1} if P(s1 | s,a)=p1. Other
                probabilities are zero.
47          """
48          (x,y) = s
49          if a == 'right':
50              return {(1,y):1}
51          elif a == 'upC':
52              return {(x,min(y+1,2)):1}
53          elif a == 'left':
54              if (x,y) == (0,2): return {(0,0):1}
55              else: return {(0,y): 1}
56          elif a == 'upR':
57              if x==0:
58                  if y<2: return {(x,y):0.1, (x+1,y):0.1, (x,y+1):0.8}
59                  else: # at (0,2)
60                      return {(0,0):0.1, (1,2): 0.1, (0,2): 0.8}
61              elif y < 2: # x==1
62                  return {(0,y):0.1, (1,y):0.1, (1,y+1):0.8}
63              else: # at (1,2)
64                 return {(0,2):0.1, (1,2): 0.9}
65
66      def R(self,s,a):
67          (x,y) = s
68          if a == 'right':
69              return [0,-1][x]
70          elif a == 'upC':
71              return [-1,-1,-2][y]
72          elif a == 'left':
73              if x==0:
74                  return [-1, -100, 10][y]
75              else: return 0
76          elif a == 'upR':
77              return [[-0.1, -10, 0.2],[-0.1, -0.1, -0.9]][x][y]
78                  # at (0,2) reward is 0.1*10+0.8*-1=0.2
     Here is the domain of Example 9.28 of Poole and Mackworth [2017]. The
state is represented as (x, y), where x counts from zero from the left and y
counts from zero upwards, so the state (0, 0) is the bottom-left state.
                                  mdpExamples.py — (continued)
80 class grid(GridMDP):
128               (x,y) = s
129               rew = 0
130               # rewards from crashing:
131               if y==0: ## on bottom.
132                   rew += -0.7 if a == 'down' else -0.1
133               if y==self.y_dim-1: ## on top.
134                   rew += -0.7 if a == 'up' else -0.1
135               if x==0: ## on left
136                   rew += -0.7 if a == 'left' else -0.1
137               if x==self.x_dim-1: ## on right.
138                   rew += -0.7 if a == 'right' else -0.1
139               return rew
144   #   pt.vi(1)
145   #   pt.vi(100)
146   #   party(discount=0.99).vi(100)
147   #   party(discount=0.4).vi(100)
148
149   #   gr = grid(discount=0.9)
150   #   gr.show()
151   #   q,v,pi = gr.vi(100)
152   #   q[(7,2)]
mdpProblem.py — (continued)
 61   class GridMDP(MDP):
 62       def __init__(self, states, actions, discount):
 63           MDP.__init__(self, states, actions, discount)
 64
 65         def show(self):
 66             #plt.ion() # interactive
 67             fig,(self.ax) = plt.subplots()
 68             plt.subplots_adjust(bottom=0.2)
 69             stepB = Button(plt.axes([0.8,0.05,0.1,0.075]), "step")
 70             stepB.on_clicked(self.on_step)
 71             resetB = Button(plt.axes([0.65,0.05,0.1,0.075]), "reset")
 72             resetB.on_clicked(self.on_reset)
 73             self.qcheck = CheckButtons(plt.axes([0.2,0.05,0.35,0.075]),
 74                                         ["show q-values","show policy"])
 75             self.qcheck.on_clicked(self.show_vals)
 76             self.font_box = TextBox(plt.axes([0.1,0.05,0.05,0.075]),"Font:",
                    textalignment="center")
 77             self.font_box.on_submit(self.set_font_size)
 78             self.font_box.set_val(str(plt.rcParams['font.size']))
 79             self.show_vals(None)
 80             plt.show()
 81
 82         def set_font_size(self, s):
 83             plt.rcParams.update({'font.size': eval(s)})
 84             plt.draw()
 85
 86         def show_vals(self,event):
 87             self.ax.cla()
 88             array = [[self.v[(x,y)] for x in range(self.x_dim)]
 89                                               for y in range(self.y_dim)]
 90             self.ax.pcolormesh([x-0.5 for x in range(self.x_dim+1)],
 91                                  [x-0.5 for x in range(self.y_dim+1)],
 92                                  array, edgecolors='black',cmap='summer')
     Figure 12.5 shows the user interface for the tiny domain, which can be
obtained by running
MDPtiny(discount=0.9).show()
then resizing the window, checking “show q-values” and “show policy”, and
clicking “step” a few times.
Figure 12.5: Interface for the tiny example, after a number of steps. Each rectangle
represents a state. In each rectangle are the 4 Q-values for the state. The leftmost
number is for the left action; the rightmost number is for the right action; the
uppermost is for the upR (up-risky) action and the lowest number is for the
upC action. The arrow points to the action(s) with the maximum Q-value. Use
MDPtiny().show() after loading mdpExamples.py.
    Figure 12.6 shows the user interface for the grid domain, which can be
obtained by running
grid(discount=0.9).show()
then resizing the window, checking “show q-values” and “show policy”, and
clicking “step” a few times.
Figure 12.6: Interface for the grid example, after a number of steps. Each rectan-
gle represents a state. In each rectangle are the 4 Q-values for the state. The
leftmost number is for the left action; the rightmost number is for the right
action; the uppermost is for the up action and the lowest number is for the down
action. The arrow points to the action(s) with the maximum Q-value. From
grid(discount=0.9).show()
Exercise 12.1 Computing q before v may seem like a waste of space because we
don’t need to store q in order to compute the value function or the policy. Change the
      algorithm so that it loops through the states and actions once per iteration, and
      only stores the value function and the policy. Note that to get the same results as
      before, you would need to make sure that you use the previous value of v in the
computation, not the current value of v. Does using the current value of v hurt the
      algorithm or make it better (in approaching the actual value function)?
      Exercise 12.2 Implement value iteration that stores the V-values rather than the
      Q-values. Does it work better than storing Q? (What might better mean?)
      Exercise 12.3 In asynchronous value iteration, try a number of different ways
      to choose the states and actions to update (e.g., sweeping through the state-action
      pairs, choosing them at random). Note that the best way may be to determine
      which states have had their Q-values change the most, and then update the pre-
      vious ones, but that is not so straightforward to implement, because you need to
      find those previous states.
Reinforcement Learning
        • An environment implements the method do that takes the action and re-
          turns a pair of the reward and the resulting environment state.
rlProblem.py — (continued)
26
27   class RL_agent(Agent):
28       """An RL_Agent
29       has percepts (s, r) for some state s and real reward r
30       """
31       def __init__(self, actions):
32          self.actions = actions
33
34         def initial_action(self, env_state):
35             """return the initial action, and remember the state and action
36             Act randomly initially
37             Could be overridden to initialize data structures (as the agent now
                   knows about one state)
38             """
39             self.state = env_state
40             self.action = random.choice(self.actions)
41             return self.action
42
43         def select_action(self, reward, state):
44             """
45             Select the action given the reward and next state
46             Remember the action in self.action
47             This implements "Act randomly" and should be overridden!
48             """
49             self.action = random.choice(self.actions)
50             return self.action
           This is similar to Simulate of Section 2.1, except it is initialized by agent.initial_action(state).
                                      rlProblem.py — (continued)
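The core of the loop is sketched below (a minimal sketch, assuming env.state holds the current environment state, as in RL_env; the real Simulate class also records the history needed for plotting):

    def simulate(agent, env, steps):
        """Minimal agent-environment loop (a sketch of what Simulate does)."""
        action = agent.initial_action(env.state)
        total_reward = 0
        for _ in range(steps):
            reward, state = env.do(action)   # environment makes a transition
            total_reward += reward
            action = agent.select_action(reward, state)
        return total_reward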
The following plots the sum of rewards as a function of the step in a simulation.
rlProblem.py — (continued)
          Here is the definition of the simple 2-state, 2-action decision about whether
      to party or relax (Example 12.29 in Poole and Mackworth [2023]).
                        rlExamples.py — Some example reinforcement learning environments
 11   from rlProblem import RL_env
 12   class Healthy_env(RL_env):
 13       def __init__(self):
 14           RL_env.__init__(self, "Party Decision", ["party", "relax"],
                  "healthy")
 15
 16         def do(self, action):
 17             """updates the state based on the agent doing action.
 18             returns reward,state
 19             """
 20             if self.state=="healthy":
 21                 if action=="party":
 22                     self.state = "healthy" if flip(0.7) else "sick"
 23                     reward = 10
 24                 else: # action=="relax"
 25                     self.state = "healthy" if flip(0.95) else "sick"
 26                     reward = 7
 27             else: # self.state=="sick"
 28                 if action=="party":
 29                     self.state = "healthy" if flip(0.1) else "sick"
 30                     reward = 2
 31                 else:
 32                     self.state = "healthy" if flip(0.5) else "sick"
 33                     reward = 0
 34             return reward, self.state
[Figure 13.1: the monster game grid. P1–P4 mark the four prize locations, M the five monster locations, and R the repair station.]
 36   import random
 37   from utilities import flip
 38   from rlProblem import RL_env
 39
 40   class Monster_game_env(RL_env):
 41       xdim = 5
 42       ydim = 5
 43
 44      vwalls = [(0,3), (0,4), (1,4)] # vertical walls right of these locations
 45      hwalls = [] # not implemented
 46      crashed_reward = -1
 47
 48      prize_locs = [(0,0), (0,4), (4,0), (4,4)]
 49      prize_apears_prob = 0.3
 50      prize_reward = 10
 51
 52      monster_locs = [(0,1), (1,1), (2,3), (3,1), (4,2)]
 53         monster_appears_prob = 0.4
 54         monster_reward_when_damaged = -10
 55         repair_stations = [(1,4)]
 56
 57         actions = ["up","down","left","right"]
 58
 59         def __init__(self):
 60             # State:
 61             self.x = 2
 62             self.y = 2
 63             self.damaged = False
 64             self.prize = None
 65             # Statistics
 66             self.number_steps = 0
 67             self.accumulated_rewards = 0 # sum of rewards received
 68             self.min_accumulated_rewards = 0
 69             self.min_step = 0
 70             self.zero_crossing = 0
 71             RL_env.__init__(self, "Monster Game", self.actions, (self.x,
                    self.y, self.damaged, self.prize))
 72             self.display(2,"","Step","Tot Rew","Ave Rew",sep="\t")
 73
 74         def do(self,action):
 75             """updates the state based on the agent doing action.
 76             returns reward,state
 77             """
 78             assert action in self.actions, f"Monster game, unknown action:
                    {action}"
 79             reward = 0.0
 80             # A prize can appear:
 81             if self.prize is None and flip(self.prize_apears_prob):
 82                     self.prize = random.choice(self.prize_locs)
 83             # Actions can be noisy
 84             if flip(0.4):
 85                 actual_direction = random.choice(self.actions)
 86             else:
 87                 actual_direction = action
 88             # Modeling the actions given the actual direction
 89             if actual_direction == "right":
 90                 if self.x==self.xdim-1 or (self.x,self.y) in self.vwalls:
 91                     reward += self.crashed_reward
 92                 else:
 93                     self.x += 1
 94             elif actual_direction == "left":
 95                 if self.x==0 or (self.x-1,self.y) in self.vwalls:
 96                     reward += self.crashed_reward
 97                 else:
 98                     self.x += -1
 99             elif actual_direction == "up":
100                 if self.y==self.ydim-1:
      13.2      Q Learning
         To run the Q-learning demo, in folder “aipython”, load
         “rlQLearner.py”, and copy and paste the example queries at the
         bottom of that file.
                                  rlQLearner.py — Q Learning
 11   import random
 12   import math
rlQLearner.py — (continued)
 56          """
 57          self.Q[state] = {act:self.Qinit for act in self.actions}
 58          self.action = random.choice(self.actions)
 59          self.visits[state] = {act:0 for act in self.actions}
 60          self.state = state
 61          self.display(2, f"Initial State: {state} Action {self.action}")
 62          self.display(2,"s\ta\tr\ts'\tQ")
 63          return self.action
 64
 65      def select_action(self, reward, next_state):
 66          """given reward and next state, select the next action to be
                 carried out"""
 67          if next_state not in self.visits: # next state not seen before
 68              self.Q[next_state] = {act:self.Qinit for act in self.actions}
 69              self.visits[next_state] = {act:0 for act in self.actions}
 70          self.visits[self.state][self.action] +=1
 71          alpha = self.alpha_fun(self.visits[self.state][self.action])
 72          self.Q[self.state][self.action] += alpha*(
 73                            reward
 74                            + self.discount * max(self.Q[next_state].values())
 75                            - self.Q[self.state][self.action])
 76          self.display(2,self.state, self.action, reward, next_state,
 77                      self.Q[self.state][self.action], sep='\t')
 78          self.state = next_state
 79          self.action = self.exploration_strategy(self.Q[next_state],
 80                                    self.visits[next_state],**self.es_kwargs)
 81          self.display(3,f"Agent {self.role} doing {self.action} in state
                 {self.state}")
 82          return self.action
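In equation form, the update in select_action is the standard Q-learning rule (a restatement of the code above, with α the step size and γ the discount factor):

    Q[s][a] ← Q[s][a] + α (r + γ max_{a′} Q[s′][a′] − Q[s][a])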
Exercise 13.1 Implement SARSA. Hint: it does not take a max in the update.
Instead, it needs to choose the next action before it does the update.
   • Vs is a {action : visits} dictionary for the current state, where visits is the
     number of times that the action has been carried out in the current state.
rlProblem.py — (continued)
128          if flip(epsilon):
129              return random.choice(list(Qs.keys())) # act randomly
130          else:
131              return argmaxd(Qs)
132
133   def ucb(Qs, Vs, c=1.4):
134          """select action given upper-confidence bound
135          Qs is the {action:Q-value} dictionary for current state
136          Vs is the {action:visits} dictionary for current state
137
138          0.01 is to prevent divide-by-zero (the value could just be infinity)
139          """
140          Ns = sum(Vs.values())
141          ucb1 = {a:Qs[a]+c*math.sqrt(Ns/(0.01+Vs[a]))
142                     for a in Qs.keys()}
143          action = argmaxd(ucb1)
144          return action
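As an illustration with made-up numbers: with Qs = {'a1':0.5, 'a2':0.3} and Vs = {'a1':10, 'a2':2}, so Ns = 12, ucb(Qs, Vs, c=1.4) selects 'a2', because its exploration bonus 1.4·sqrt(12/2.01) ≈ 3.42 outweighs the bonus 1.4·sqrt(12/10.01) ≈ 1.53 for the much-visited 'a1'.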
102   #   Simulate(ag_ucb,env).go(100).plot()
103   #   Simulate(ag_opt,env).go(100).plot()
104   #   Simulate(ag_exp_m,env).go(100).plot()
105   #   Simulate(ag_greedy,env).go(100).plot()
106
107
108   from mdpExamples import MDPtiny
109   from rlProblem import Env_from_MDP
110   envt = Env_from_MDP(MDPtiny())
111   agt = Q_learner(envt.name, envt.actions, 0.8)
112   #Simulate(agt, envt).go(1000).plot()
113
114   mon_env = Monster_game_env()
115   mag1 = Q_learner(mon_env.name, mon_env.actions,0.9)
116   #Simulate(mag1,mon_env).go(100000).plot()
117   mag_ucb = Q_learner(mon_env.name, mon_env.actions,0.9,exploration_strategy
          = ucb,es_kwargs={'c':0.1},method="UCB(0.1)")
118   #Simulate(mag_ucb,mon_env).go(100000).plot()
119
120   mag2 = Q_learner(mon_env.name, mon_env.actions,
          0.9,es_kwargs={'epsilon':0.2},alpha_fun=lambda
          k:1/k,method="alpha=1/k")
121   #Simulate(mag2,mon_env).go(100000).plot()
122   mag3 = Q_learner(mon_env.name, mon_env.actions, 0.9,alpha_fun=lambda
          k:10/(9+k),method="alpha=10/(9+k)")
123   #Simulate(mag3,mon_env).go(100000).plot()
27                   self.buffer[position] = experience
28            self.number_added += 1
29
30         def get(self):
31             return self.buffer[random.randrange(min(self.number_added,
                   self.buffer_size))]
rlQExperienceReplay.py — (continued)
33   class Q_ER_learner(Q_learner):
34       def __init__(self, role, actions, discount,
35                   max_buffer_size=10000,
36                   num_updates_per_action=5, burn_in=1000,
37                  method="Q_ER_learner", **q_kwargs):
38           """Q-learner with experience replay
39           role is the role of the agent (e.g., in a game)
40           actions is the set of actions the agent can do
41           discount is the discount factor
42           max_buffer_size is the maximum number of past experiences that is
                 remembered
43           burn_in is the number of steps before using old experiences
44           num_updates_per_action is the number of q-updates for past
                 experiences per action
45           q_kwargs are any extra parameters for Q_learner
46           """
47           Q_learner.__init__(self, role, actions, discount, method=method,
                 **q_kwargs)
48           self.experience_buffer = BoundedBuffer(max_buffer_size)
49           self.num_updates_per_action = num_updates_per_action
50           self.burn_in = burn_in
51
52         def select_action(self, reward, next_state):
53             """given reward and next state, select the next action to be
                   carried out"""
54             self.experience_buffer.add((self.state,self.action,reward,next_state))
                   #remember experience
55             if next_state not in self.Q: # Q and visits are defined on the same
                   states
56                 self.Q[next_state] = {act:self.Qinit for act in self.actions}
57                 self.visits[next_state] = {act:0 for act in self.actions}
58             self.visits[self.state][self.action] +=1
59             alpha = self.alpha_fun(self.visits[self.state][self.action])
60             self.Q[self.state][self.action] += alpha*(
61                               reward
62                               + self.discount * max(self.Q[next_state].values())
63                               - self.Q[self.state][self.action])
64             self.display(2,self.state, self.action, reward, next_state,
65                         self.Q[self.state][self.action], sep='\t')
66             self.state = next_state
rlQExperienceReplay.py — (continued)
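The replay step itself is elided here. As a sketch (not the book's exact code), after the burn-in it redoes the standard Q-update on randomly drawn past experiences:

    # if self.experience_buffer.number_added > self.burn_in:
    #     for _ in range(self.num_updates_per_action):
    #         (s, a, r, ns) = self.experience_buffer.get()
    #         alpha = self.alpha_fun(self.visits[s][a])
    #         self.Q[s][a] += alpha*(r + self.discount*max(self.Q[ns].values())
    #                                - self.Q[s][a])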
   • Q[s][a] is a dictionary that, given state s and action a, returns the Q-value,
     the estimate of the future (discounted) value of being in state s and doing
     action a.
   • R[s][a] is a dictionary that, given state s and action a, returns the average
     reward received from doing a in state s.
   • T[s][a][s′] is a dictionary that, given states s and s′ and action a, returns
     the number of times a was done in state s with resulting state s′. Note
     that s′ is only a key if it has been the result of doing a in s; there are no 0
     counts recorded.
   • visits[s][a] is a dictionary that, given state s and action a, returns the
     number of times action a was carried out in state s. This is the C of
     Figure 13.6 of Poole and Mackworth [2023].
     Note that visits[s][a] = ∑_{s′} T[s][a][s′], but it is stored separately to keep
     the code more readable.
The main difference from Figure 13.6 of Poole and Mackworth [2023] is that the
code below does a fixed number of asynchronous value iteration updates per step.
                           rlModelLearner.py — Model-based Reinforcement Learner
11   import random
12   from rlProblem import RL_agent, Simulate, epsilon_greedy, ucb
13   from display import Displayable
14   from utilities import argmaxe, flip
15
16   class Model_based_reinforcement_learner(RL_agent):
17       """A Model-based reinforcement learner
18       """
19
20         def __init__(self, role, actions, discount,
21                         exploration_strategy=epsilon_greedy, es_kwargs={},
22                         Qinit=0,
23                       updates_per_step=10, method="MBR_learner"):
24             """role is the role of the agent (e.g., in a game)
25             actions is the list of actions the agent can do
26             discount is the discount factor
27             exploration_strategy is the exploration function; es_kwargs gives its keyword arguments
28             Qinit is the initial value of the Q's
29             updates_per_step is the number of AVI updates per action
30             method is the method name, used as the label for plotting
31             """
32             RL_agent.__init__(self, actions)
33             self.role = role
34             self.actions = actions
35             self.discount = discount
36             self.exploration_strategy = exploration_strategy
37             self.es_kwargs = es_kwargs
38             self.Qinit = Qinit
39             self.updates_per_step = updates_per_step
40             self.method = method
rlModelLearner.py — (continued)
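The model-update and planning code is elided above. As a sketch, one asynchronous value iteration update for a state-action pair, using the data structures described in the bullets (an illustration, not the book's exact code):

    def avi_update(Q, R, T, visits, discount, s, a):
        """One asynchronous VI update for (s,a): average reward plus the
        discounted expected value of the model-predicted next states."""
        Q[s][a] = R[s][a] + discount * sum(
            T[s][a][s1]/visits[s][a] * max(Q[s1].values())
            for s1 in T[s][a])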
     Exercise 13.3 If there was only one update per step, the algorithm can be made
     simpler and use less space. Explain how. Does it make it more efficient? Is it
     worthwhile having more than one update per step for the games implemented
     here?
     Exercise 13.4 It is possible to implement the model-based reinforcement learner
     by replacing Q, R, T, visits, res states with a single dictionary that, given a state and
     action returns a tuple corresponding to these data structures. Does this make the
     algorithm easier to understand? Does this make the algorithm more efficient?
Exercise 13.5 If the states and the actions were mapped into integers, the dic-
tionaries could perhaps be implemented more efficiently as arrays. How does the
code need to change? Implement this for the monster game. Is it more efficient?
     Exercise 13.6 In random_choice in the updates of select_action, all state-action
     pairs have the same chance of being chosen. Does selecting state-action pairs pro-
     portionally to the number of times visited work better than what is implemented?
     Provide evidence for your answer.
 66             return 1
 67         elif action == "up" and (x,y+1) in Monster_game_env.monster_locs:
 68             return 1
 69         elif action == "down" and (x,y-1) in Monster_game_env.monster_locs:
 70             return 1
 71         else:
 72             return 0
 73
 74   def wall_ahead(x,y,action):
 75       """returns 1 if there is a wall in the direction of action from (x,y).
 76       This is complicated by the internal walls.
 77       """
 78       if action == "right" and (x==Monster_game_env.xdim-1 or (x,y) in
              Monster_game_env.vwalls):
 79           return 1
 80       elif action == "left" and (x==0 or (x-1,y) in Monster_game_env.vwalls):
 81           return 1
 82       elif action == "up" and y==Monster_game_env.ydim-1:
 83           return 1
 84       elif action == "down" and y==0:
 85           return 1
 86       else:
 87           return 0
 88
 89   def towards_prize(x,y,action,p):
 90       """action goes in the direction of the prize from (x,y)"""
 91       if p is None:
 92           return 0
 93       elif p==(0,4): # take into account the wall near the top-left prize
 94           if action == "left" and (x>1 or x==1 and y<3):
 95               return 1
 96           elif action == "down" and (x>0 and y>2):
 97               return 1
 98           elif action == "up" and (x==0 or y<2):
 99               return 1
100           else:
101               return 0
102       else:
103           px,py = p
104           if p==(4,4) and x==0:
105               if (action=="right" and y<3) or (action=="down" and y>2) or
                      (action=="up" and y<2):
106                   return 1
107               else:
108                   return 0
109           if (action == "up" and y<py) or (action == "down" and py<y):
110               return 1
111           elif (action == "left" and px<x) or (action == "right" and x<px):
112               return 1
113           else:
114              return 0
115
116   def towards_repair(x,y,action):
117       """returns 1 if action is towards the repair station.
118       """
119       if action == "up" and (x>0 and y<4 or x==0 and y<2):
120           return 1
121       elif action == "left" and x>1:
122           return 1
123       elif action == "right" and x==0 and y<3:
124           return 1
125       elif action == "down" and x==0 and y>2:
126           return 1
127       else:
128           return 0
        The following uses a simpler set of features. In particular, it only considers
      whether the action will most likely result in a monster position or a wall, and
      whether the action moves towards the current prize.
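A sketch of such a feature function, built from the helpers above (the name simpler_features and the exact feature list are illustrative assumptions, not the book's code):

    def simpler_features(state, action):
        """[bias, monster ahead?, wall ahead?, towards prize?] (illustrative)"""
        (x, y, damaged, prize) = state
        return [1, monster_ahead(x, y, action), wall_ahead(x, y, action),
                towards_prize(x, y, action, prize)]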
                                 rlMonsterGameFeatures.py — (continued)
20                        exploration_strategy=epsilon_greedy, es_kwargs={},
21                        step_size=0.01, winit=0, method="SARSA_LFA"):
22            """role is the role of the agent (e.g., in a game)
23            actions is the set of actions the agent can do
24            discount is the discount factor
25            get_features is a function get_features(state,action) -> list of
                  feature values
26            exploration_strategy is the exploration function, default
                  "epsilon_greedy"
27            es_kwargs is extra keyword arguments of the exploration_strategy
28            step_size is gradient descent step size
29            winit is the initial value of the weights
30            method gives the method used to implement the role (for plotting)
31            """
32            RL_agent.__init__(self, actions)
33            self.role = role
34            self.discount = discount
35            self.exploration_strategy = exploration_strategy
36            self.es_kwargs = es_kwargs
37            self.get_features = get_features
38            self.step_size = step_size
39            self.winit = winit
40            self.method = method
     The initial action is a random action. It remembers the state, and initializes the
     data structures.
                                    rlFeatures.py — (continued)
54
55         def Q(self, state,action):
56             """returns Q-value of the state and action for current weights
57             """
58             return dot_product(self.weights, self.get_features(state,action))
59
60         def select_action(self, reward, next_state):
61             """given reward and next_state, select and return the next action"""
62             feature_values = self.get_features(self.state,self.action)
63          oldQ = self.Q(self.state,self.action)
64          next_action = self.exploration_strategy({a:self.Q(next_state,a)
65                                                   for a in self.actions}, {})
66          nextQ = self.Q(next_state,next_action)
67          delta = reward + self.discount * nextQ - oldQ
68          for i in range(len(self.weights)):
69              self.weights[i] += self.step_size * delta * feature_values[i]
70          self.display(2,self.state, self.action, reward, next_state,
71                      self.Q(self.state,self.action), delta, sep='\t')
72          self.state = next_state
73          self.action = next_action
74          return self.action
75
76      def show_actions(self,state=None):
77          """prints the value for each action in a state.
78          This may be useful for debugging.
79          """
80          if state is None:
81              state = self.state
82          for next_act in self.actions:
83              print(next_act,dot_product(self.weights,
                    self.get_features(state,next_act)))
84
85   def dot_product(l1,l2):
86       return sum(e1*e2 for (e1,e2) in zip(l1,l2))
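In equation form, the update in select_action is SARSA with linear function approximation (a restatement of the code above, with η the step size and F_i the feature values):

    Q_w(s,a) = ∑_i w_i · F_i(s,a)
    δ = r + γ Q_w(s′,a′) − Q_w(s,a)
    w_i ← w_i + η · δ · F_i(s,a)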
Test code:
rlFeatures.py — (continued)
Exercise 13.7 How does the step size affect performance? Try different step sizes
(e.g., 0.1, 0.001, and other sizes in between). Explain the behavior you observe.
Which step size works best for this example? Explain what evidence you are
basing your conclusion on.
     Exercise 13.8 Does having extra features always help? Does it sometime help?
     Does whether it helps depend on the step size? Give evidence for your claims.
     Exercise 13.9 For each of the following first predict, then plot, then explain the
     behavior you observed:
  (a) SARSA LFA, model-based learning (with 1 update per step) and Q-learning
      for 10,000 steps 20% exploring followed by 10,000 steps 100% exploiting
  (b) SARSA LFA, model-based learning and Q-learning for
        i) 100,000 steps 20% exploring followed by 100,000 steps 100% exploiting
       ii) 10,000 steps 20% exploring followed by 190,000 steps 100% exploiting
  (c) Suppose your goal was to have the best accumulated reward after 200,000
      steps. You are allowed to change the exploration rate at a fixed number of
      steps. For each of the methods, which is the best position to start exploiting
      more? Which method is better? What if you wanted to have the best reward
      after 10,000 or 1,000 steps?
Based on this evidence, explain when it is preferable to use SARSA LFA, the
model-based learner, or Q-learning.
   Important: you need to run each algorithm more than once. Your explanation
should include the variability as well as the typical behavior.
Exercise 13.10 In the call to self.exploration_strategy, what should the counts
be? (The code above will fail for ucb, for example.) Think about the case where
there are too many states. Suppose we are just learning for a neighborhood of the
current state (e.g., a fixed number of steps away from the current state); how
could the algorithm be modified to make sure it has at least explored the close
neighborhood of the current state?
Multiagent Systems
     14.1      Minimax
Here we consider two-player zero-sum games, in which one player wins
exactly what the other player loses. This can be modeled with a single utility
that one agent (the maximizing agent) is trying to maximize and the other
agent (the minimizing agent) is trying to minimize.
29
30         def children(self):
31             """returns the list of all children."""
32             return self.allchildren
33
34         def evaluate(self):
35             """returns the evaluation for this node if it is a leaf"""
36             return self.value
The following gives the tree from Figure 11.5 of the book. Note how 888 is used
as a value here, but never appears in the trace: α-β pruning never evaluates those leaves.
masProblem.py — (continued)
38   fig10_5 = Node("a",True,None, [
39              Node("b",False,None, [
40                  Node("d",True,None, [
41                      Node("h",False,None, [
42                          Node("h1",True,7,None),
43                          Node("h2",True,9,None)]),
44                      Node("i",False,None, [
45                          Node("i1",True,6,None),
46                          Node("i2",True,888,None)])]),
47                  Node("e",True,None, [
48                      Node("j",False,None, [
49                          Node("j1",True,11,None),
50                          Node("j2",True,12,None)]),
51                      Node("k",False,None, [
52                          Node("k1",True,888,None),
53                          Node("k2",True,888,None)])])]),
54              Node("c",False,None, [
55                  Node("f",True,None, [
56                      Node("l",False,None, [
57                          Node("l1",True,5,None),
58                          Node("l2",True,888,None)]),
59                      Node("m",False,None, [
60                          Node("m1",True,4,None),
61                          Node("m2",True,888,None)])]),
62                  Node("g",True,None, [
63                      Node("n",False,None, [
64                          Node("n1",True,888,None),
65                          Node("n2",True,888,None)]),
66                      Node("o",False,None, [
67                          Node("o1",True,888,None),
68                          Node("o2",True,888,None)])])])])
The magic-sum game is played on the following magic square, in which every row, column, and diagonal sums to 15:
                                           6    1    8
                                           7    5    3
                                           2    9    4
 70
 71   class Magic_sum(Node):
 72       def __init__(self, xmove=True, last_move=None,
 73                   available=[1,2,3,4,5,6,7,8,9], x=[], o=[]):
 74           """This is a node in the search for the magic-sum game.
 75           xmove is True if the next move belongs to X.
 76           last_move is the number selected in the last move
 77           available is the list of numbers that are available to be chosen
 78           x is the list of numbers already chosen by x
 79           o is the list of numbers already chosen by o
 80           """
 81           self.isMax = self.xmove = xmove
 82           self.last_move = last_move
 83           self.available = available
 84           self.x = x
 85           self.o = o
 86           self.allchildren = None #computed on demand
 87           lm = str(last_move)
 88           self.name = "start" if not last_move else "o="+lm if xmove else
                  "x="+lm
 89
 90      def children(self):
 91          if self.allchildren is None:
 92              if self.xmove:
 93                  self.allchildren = [
 94                      Magic_sum(xmove = not self.xmove,
 95                               last_move = sel,
 96                               available = [e for e in self.available
                                         if e != sel],
 97                               x = self.x+[sel],
 98                               o = self.o)
 99                             for sel in self.available]
100              else:
101                  self.allchildren = [
102                      Magic_sum(xmove = not self.xmove,
103                               last_move = sel,
104                               available = [e for e in self.available
                                         if e != sel],
105                                x = self.x,
106                                o = self.o+[sel])
107                              for sel in self.available]
108            return self.allchildren
109
110         def isLeaf(self):
111             """A leaf has no numbers available or is a win for one of the
                    players.
112             We only need to check for a win for o if it is currently x's turn,
113             and only check for a win for x if it is o's turn (otherwise it would
114             have been a win earlier).
115             """
116             return (self.available == [] or
117                    (sum_to_15(self.last_move,self.o)
118                     if self.xmove
119                     else sum_to_15(self.last_move,self.x)))
120
121         def evaluate(self):
122             if self.xmove and sum_to_15(self.last_move,self.o):
123                 return -1
124             elif not self.xmove and sum_to_15(self.last_move,self.x):
125                 return 1
126             else:
127                 return 0
128
129   def sum_to_15(last,selected):
130       """is true if last, together with two other elements of selected,
             sums to 15.
131       """
132       return any(last+a+b == 15
133                 for a in selected if a != last
134                 for b in selected if b != last and b != a)
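For example, sum_to_15(8, [8,6,1]) is True (8+6+1 = 15), whereas sum_to_15(8, [8,6,2]) is False.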
24          return max_score,max_path
25      else:
26          min_score = float("inf")
27          min_path = None
28          for C in node.children():
29              score,path = minimax(C,depth+1)
30              if score < min_score:
31                  min_score = score
32                  min_path = C.name,path
33          return min_score,min_path
        The following is a depth-first minimax with α-β pruning. It returns the
     value for a node as well as a best path for the agents.
                                masMiniMax.py — (continued)
35   def minimax_alpha_beta(node,alpha,beta,depth=0):
36       """node is a Node, alpha and beta are cutoffs, depth is the depth
37       returns value, path
38       where path is a sequence of nodes that results in the value
39       """
40       node.display(2," "*depth,"minimax_alpha_beta(",node.name,", ",alpha, ",
             ", beta,")")
41       best=None      # only used if it will be pruned
42       if node.isLeaf():
43           node.display(2," "*depth,"returning leaf value",node.evaluate())
44           return node.evaluate(),None
45       elif node.isMax:
46           for C in node.children():
47               score,path = minimax_alpha_beta(C,alpha,beta,depth+1)
48               if score >= beta: # beta pruning
49                   node.display(2," "*depth,"pruned due to
                         beta=",beta,"C=",C.name)
50                   return score, None
51               if score > alpha:
52                   alpha = score
53                   best = C.name, path
54           node.display(2," "*depth,"returning max alpha",alpha,"best",best)
55           return alpha,best
56       else:
57           for C in node.children():
58               score,path = minimax_alpha_beta(C,alpha,beta,depth+1)
59               if score <= alpha: # alpha pruning
60                   node.display(2," "*depth,"pruned due to
                         alpha=",alpha,"C=",C.name)
61                   return score, None
62               if score < beta:
63                   beta=score
64                   best = C.name,path
65           node.display(2," "*depth,"returning min beta",beta,"best=",best)
66           return beta,best
        Testing:
masMiniMax.py — (continued)
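For example (a check one can run on the tree above, not the book's test output):

    minimax_alpha_beta(fig10_5, float('-inf'), float('inf'))

returns the value 7 with the path ('b', ('d', ('h', ('h1', None)))); none of the 888 leaves is evaluated, which is why 888 never appears in the trace.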
         The agent can be tested on the reinforcement learning benchmarks from the
      previous chapter:
masLearn.py — (continued)
         The simulation for a game passes the joint action from all agents to the
      environment, which returns a tuple of rewards – one for each agent – and the
      next state.
masLearn.py — (continued)
114          self.num_steps = 0
115
116      def go(self, steps):
117          for i in range(steps):
118              self.num_steps += 1
119              (self.rewards, state) = self.game.play(self.actions)
120              self.display(3, f"In go rewards={self.rewards}, state={state}")
121              self.reward_history.append(self.rewards)
122              self.state_history.append(state)
123             self.actions = tuple(agent.select_action(reward, state)
124                                  for (agent,reward) in zip(self.agents,self.rewards))
125              self.action_history.append(self.actions)
126             for j in range(self.game.num_agents):
127                 self.action_dists[j][self.actions[j]] += 1
128             self.dist_history.append([{a:n for (a,n) in elt.items()}
                                          for elt in self.action_dists]) # copy each counts dict
129          #print("Scores:", ' '.join(f"{ag.role} average reward={ag.total_score/self.num_steps}" for ag in self.agents))
130          print("Distributions:", ' '.join(str({a:self.dist_history[-1][i][a]/sum(self.dist_history[-1][i].values()) for a in self.game.actions[i]})
131                                   for i in range(self.game.num_agents)))
132          #return self.reward_history, self.action_history
133
134      def action_dist(self,which_actions=[1,1]):
135          """which_actions is [a0,a1];
136          returns the empirical distribution of the actions for the agents,
137             where ai specifies the index of the action for agent i
138          """
139          return [sum(1 for a in self.action_history
140                      if a[i]==self.game.actions[i][which_actions[i]])/len(self.action_history)
141                  for i in range(2)]
         The plotting shows how the empirical distributions of the first two agents
      change as the learning continues.
masLearn.py — (continued)
151            plt.plot([self.dist_history[i][0][x_act]/sum(self.dist_history[i][0].values())
                            for i in range(len(self.dist_history))],
152                     [self.dist_history[i][1][y_act]/sum(self.dist_history[i][1].values())
                            for i in range(len(self.dist_history))])
153            #plt.legend()
154            plt.savefig('soccerplot.pdf')
155            plt.show()
masLearn.py — (continued)
157
158   class ShoppingGame(Displayable):
159       def __init__(self):
160           self.num_agents = 2
161           self.states = ['s']
162           self.initial_state = 's'
163           self.actions = [['shopping', 'football']]*2
164           self.players = ['football-preferrer goes to', 'shopping-preferrer goes to']
165
166         def play(self, actions):
167             """Given (action1,action2), returns ((reward1,reward2), resulting_state).
168             """
169             return ({('football', 'football'): (2, 1),
170                     ('football', 'shopping'): (0, 0),
171                     ('shopping', 'football'): (0, 0),
172                     ('shopping', 'shopping'): (1, 2)
173                         }[actions], 's')
174
175
176   class SoccerGame(Displayable):
177       def __init__(self):
178           self.num_agents = 2
179           self.states = ['s']
180           self.initial_state = 's'
182           self.actions = [['right', 'left']]*2
183           self.players = ['goalkeeper', 'kicker']
184
185         def play(self, actions):
186             """Given (action1,action2), returns ((reward1,reward2), resulting_state).
187             The resulting state is always 's'.
188             """
189             return ({('left', 'left'): (0.6, 0.4),
190                     ('left', 'right'): (0.3, 0.7),
191                     ('right', 'left'): (0.2, 0.8),
192                     ('right', 'right'): (0.9,0.1)
193                   }[actions], 's')
194
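         For reference, the soccer game has a unique equilibrium, in which both
      players randomize; the learners' empirical distributions should approach it.
      The indifference calculation below is an added sketch, not part of masLearn.py:

from fractions import Fraction

# The goalkeeper dives 'left' with the probability p that makes the kicker
# indifferent between kicking left and right (kicker payoffs from the table):
#     p*0.4 + (1-p)*0.8  ==  p*0.7 + (1-p)*0.1    =>  p = 0.7
# The kicker kicks 'left' with the probability q that makes the goalkeeper
# indifferent between diving left and right:
#     q*0.6 + (1-q)*0.3  ==  q*0.2 + (1-q)*0.9    =>  q = 0.6
p, q = Fraction(7, 10), Fraction(6, 10)
gk_left  = q*Fraction(6,10) + (1-q)*Fraction(3,10)  # goalkeeper's value for 'left'
gk_right = q*Fraction(2,10) + (1-q)*Fraction(9,10)  # goalkeeper's value for 'right'
assert gk_left == gk_right == Fraction(12, 25)      # both 0.48: indifferent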
masLearn.py — (continued)
244
245
246   # sim.plot_dynamics()
247
248   # empirical proportion that agents did their action at index 1:
249   # sim.action_dist([1,1])
250
251   # (unnormalized) empirical distribution for agent 0
252   # sim.agents[0].dist
      Exercise 14.1 Try the game show game (prisoner's dilemma) with two StochasticPIAgent
      agents and alpha_fun=lambda k:0.1. Try also alpha_fun=lambda k:0.01. Why do
      the two behave qualitatively differently? Is one better than the other?
Relational Learning
27                     ):
28             self.rating_set = rating_set
29             self.ratings = rating_subset or rating_set.training_ratings # whichever is not empty
30             if test_subset is None:
31                 self.test_ratings = self.rating_set.test_ratings
32             else:
33                 self.test_ratings = test_subset
34             self.step_size = step_size
35             self.reglz = reglz
36             self.num_properties = num_properties
37             self.num_ratings = len(self.ratings)
38             self.ave_rating = (sum(r for (u,i,r,t) in self.ratings)
39                              /self.num_ratings)
40             self.users = {u for (u,i,r,t) in self.ratings}
41             self.items = {i for (u,i,r,t) in self.ratings}
42             self.user_bias = {u:0 for u in self.users}
43             self.item_bias = {i:0 for i in self.items}
44             self.user_prop = {u:[random.uniform(-property_range,property_range)
45                                 for p in range(num_properties)]
46                                for u in self.users}
47             self.item_prop = {i:[random.uniform(-property_range,property_range)
48                                 for p in range(num_properties)]
49                                for i in self.items}
50             self.zeros = [0 for p in range(num_properties)]
51             self.iter=0
52
53         def stats(self):
54             self.display(1,"ave sumsq error of mean for training=",
55                      sum((self.ave_rating-rating)**2 for (user,item,rating,timestamp)
56                          in self.ratings)/len(self.ratings))
57             self.display(1,"ave sumsq error of mean for test=",
58                      sum((self.ave_rating-rating)**2 for (user,item,rating,timestamp)
59                          in self.test_ratings)/len(self.test_ratings))
60             self.display(1,"error on training set",
61                         self.evaluate(self.ratings))
62             self.display(1,"error on test set",
63                         self.evaluate(self.test_ratings))
relnCollFilt.py — (continued)
65         def prediction(self,user,item):
66             """Returns prediction for this user on this item.
67             The use of .get() is to handle users or items not in the training set.
68             """
69             return (self.ave_rating
70                    + self.user_bias.get(user,0)   # 0 for users not in the training set
71                    + self.item_bias.get(item,0)   # 0 for items not in the training set
72                    + sum([self.user_prop.get(user,self.zeros)[p]*self.item_prop.get(item,self.zeros)[p]
73                           for p in range(self.num_properties)]))
 74
 75      def learn(self, num_iter = 50):
 76          """ do num_iter iterations of gradient descent."""
 77          for i in range(num_iter):
 78              self.iter += 1
 79              abs_error=0
 80              sumsq_error=0
 81              for (user,item,rating,timestamp) in random.sample(self.ratings,len(self.ratings)):
 82                  error = self.prediction(user,item) - rating
 83                  abs_error += abs(error)
 84                  sumsq_error += error * error
 85                  self.user_bias[user] -= self.step_size*error
 86                  self.item_bias[item] -= self.step_size*error
 87                  for p in range(self.num_properties):
 88                      self.user_prop[user][p] -= self.step_size*error*self.item_prop[item][p]
 89                      self.item_prop[item][p] -= self.step_size*error*self.user_prop[user][p]
 90              for user in self.users:
 91                  self.user_bias[user] -= self.step_size*self.reglz*self.user_bias[user]
 92                  for p in range(self.num_properties):
 93                      self.user_prop[user][p] -= self.step_size*self.reglz*self.user_prop[user][p]
 94              for item in self.items:
 95                  self.item_bias[item] -= self.step_size*self.reglz*self.item_bias[item]
 96                  for p in range(self.num_properties):
 97                      self.item_prop[item][p] -= self.step_size*self.reglz*self.item_prop[item][p]
 98              self.display(1,"Iteration",self.iter,
 99                    "(Ave Abs,AveSumSq) training =",self.evaluate(self.ratings),
100                    "test =",self.evaluate(self.test_ratings))
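          In symbols, the model implemented by prediction and learn is

             prediction(u,i) = ave_rating + user_bias[u] + item_bias[i]
                               + sum over p of user_prop[u][p] * item_prop[i][p]

      and each update in learn is a gradient descent step on half the squared
      error, with error = prediction(u,i) - rating: the derivative with respect to
      user_bias[u] is error (giving the -= step_size*error updates), and the
      derivative with respect to user_prop[u][p] is error*item_prop[i][p] (giving
      the property updates). The loops over all users and items implement the
      gradient of the regularization term (reglz/2 times the sum of squares of
      the parameters).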
relnCollFilt.py — (continued)
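          The evaluate method used by stats and learn is not reproduced here; from
      those calls, it returns the pair (average absolute error, average
      sum-of-squares error) over a set of ratings. A minimal sketch consistent
      with that usage:

def evaluate(self, ratings):
    """returns (average absolute error, average sum-of-squares error) on ratings
    (a sketch reconstructed from the calls in stats and learn)"""
    abs_error = 0
    sumsq_error = 0
    for (user, item, rating, timestamp) in ratings:
        error = self.prediction(user, item) - rating
        abs_error += abs(error)
        sumsq_error += error*error
    n = len(ratings)
    return (abs_error/n, sumsq_error/n)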
      Exercise 15.1 The above code updates the parameters after each example, but
      only regularizes after the whole batch. Change the program so that it implements
      stochastic gradient descent with a given batch size, and only updates the parame-
      ters after a batch.
      Exercise 15.2 In the previous question, can the regularization avoid iterating
      through the parameters for all users and items after a batch? Consider items that
      are in many batches versus those in a few or even no batches. (Warning: This is
      challenging to get right.)
      15.1.1 Plotting
                                    relnCollFilt.py — (continued)
180            if local_file:
181                lines = open(file_name,'r')
182            else:
183                lines = (line.decode('utf-8') for line in urllib.request.urlopen(url))
184            all_ratings = (tuple(int(e) for e in line.strip().split('\t'))
185                            for line in lines)
186            self.training_ratings = []
187            self.training_stats = {1:0, 2:0, 3:0, 4:0 ,5:0}
188            self.test_ratings = []
189            self.test_stats = {1:0, 2:0, 3:0, 4:0 ,5:0}
190            for rate in all_ratings:
191                if rate[3] < date_split: # rate[3] is timestamp
192                    self.training_ratings.append(rate)
193                    self.training_stats[rate[2]] += 1
194                else:
195                    self.test_ratings.append(rate)
196                    self.test_stats[rate[2]] += 1
197            self.display(1,"...read:", len(self.training_ratings),"training ratings and",
198                    len(self.test_ratings),"test ratings")
199            tr_users = {user for (user,item,rating,timestamp) in self.training_ratings}
200            test_users = {user for (user,item,rating,timestamp) in self.test_ratings}
201            self.display(1,"users:",len(tr_users),"training,",len(test_users),"test,",
202                         len(tr_users & test_users),"in common")
203            tr_items = {item for (user,item,rating,timestamp) in self.training_ratings}
204            test_items = {item for (user,item,rating,timestamp) in self.test_ratings}
205            self.display(1,"items:",len(tr_items),"training,",len(test_items),"test,",
206                         len(tr_items & test_items),"in common")
207            self.display(1,"Rating statistics for training set: ",self.training_stats)
208            self.display(1,"Rating statistics for test set: ",self.test_stats)
          Sometimes it is useful to plot a property for all (user, item, rating)
      triples, but there are too many such triples in the data set to plot them
      all. The method create_top_subset creates a much smaller dataset for which
      this makes sense: it picks the most-rated items, and then the users who have
      the most ratings on these items. It is designed for depicting the meaning of
      properties, and may not be useful for other purposes.
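          That method is not reproduced here; the following sketch is consistent
      with the description above (the signature, the defaults, and its placement
      as a method of the rating-set class are assumptions):

from collections import Counter

def create_top_subset(self, num_items=30, num_users=30):
    """returns the training ratings restricted to the num_items most-rated items
    and, among those, the num_users users with the most ratings on those items
    (a sketch: the signature and defaults are assumptions)"""
    item_counts = Counter(item for (user,item,rating,timestamp)
                          in self.training_ratings)
    top_items = {item for (item,count) in item_counts.most_common(num_items)}
    user_counts = Counter(user for (user,item,rating,timestamp)
                          in self.training_ratings if item in top_items)
    top_users = {user for (user,count) in user_counts.most_common(num_users)}
    return [(user,item,rating,timestamp)
            for (user,item,rating,timestamp) in self.training_ratings
            if item in top_items and user in top_users]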
                                   relnProbModels.py — (continued)
20   class ParVar(object):
21       """Parametrized random variable"""
22       def __init__(self, name, log_vars, domain, position=None):
23           self.name = name # string
24           self.log_vars = log_vars
25           self.domain = domain # list of values
26           self.position = position if position else (random.random(), random.random())
27           self.size = len(domain)
      The class RBN represents relational belief networks.
                                  relnProbModels.py — (continued)
29   class RBN(Displayable):
30       def __init__(self, title, parvars, parfactors):
31           self.title = title
32           self.parvars = parvars
33           self.parfactors = parfactors
34           self.log_vars = {V for PV in parvars for V in PV.log_vars}
35
36         def ground(self, populations):
37             """Ground the belief network with the populations of the logical variables.
38             populations is a dictionary that maps each logical variable to the list of individuals.
39             Returns a belief network representation of the grounding.
40             """
41             assert all(lv in populations for lv in self.log_vars)
42             self.cps = []    # conditional probabilities in the grounding
43             self.var_dict = {} # ground variables created
44             for pp in self.parfactors:
45                 self.ground_parfactor(pp, list(self.log_vars), populations, {})
46             return BeliefNetwork(self.title+"_grounded", self.var_dict.values(), self.cps)
47
48         def ground_parfactor(self, parfactor, lvs, populations, context):
49             """
50             parfactor is the parfactor to get instances of
51             lvs is a list of the logical variables in parfactor not assigned in context
relnProbModels.py — (continued)
83   class GrVar(Variable):
84       """Grounded Variable"""
85       def __init__(self,parvar,args):
86           (x,y) = parvar.position
87           pos = (x + random.uniform(-0.2,0.2), y + random.uniform(-0.2,0.2))
88           Variable.__init__(self, parvar.name+"("+",".join(args)+")", parvar.domain, pos)
89           self.parvar = parvar
90           self.args = tuple(args)
91           self.hash_value = None
92
 93         def __hash__(self):
 94             if self.hash_value is None:
 95                 self.hash_value = hash((self.parvar, self.args))
 96             return self.hash_value
 97
 98         def __eq__(self, other):
 99             return isinstance(other,GrVar) and self.parvar == other.parvar and self.args == other.args
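          Because __hash__ and __eq__ are structural (with the hash cached on
      first use), two independently constructed GrVar objects that denote the same
      ground variable behave as the same dictionary key, which the grounding's
      var_dict relies on. For example (the parametrized random variable here is
      made up):

grade = ParVar("Grade", ["St", "Co"], ["a", "b", "c"])
g1 = GrVar(grade, ["sam", "cs101"])
g2 = GrVar(grade, ["sam", "cs101"])
assert g1 == g2 and hash(g1) == hash(g2)   # both denote Grade(sam,cs101)
assert g1 is not g2                        # yet they are distinct objects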
      Exercise 15.3 The grounding above creates a random variable for each possible
      combination of individuals in the populations. Change it so that it only
      creates as many random variables as needed to answer a query. For example,
      for the observations and queries above, only the variables in Figure 17.5
      need to be created.
Exercise 15.4 Displaying the ground network, e.g., using grades_gr.show(),
creates a messy diagram. Make it so that the user can provide offsets for each
individual, and use the position of the prv plus the offsets of all the
individuals involved. Use this to create a 2D grid of grades in the example
above.
Version History
 • 2023-10-6 Version 0.9.8 GUIs for search, Bayesian learning, causality, and
   many smaller changes.
 • 2022-08-13 Version 0.9.5 major revisions including extra code for causality
   and deep learning
 • 2021-07-08 Version 0.9.1 updated the CSP code to have the same repre-
   sentation of variables as used by the probability code
 • 2020-10-20 Version 0.8.4 planning simplified, and gives error if goal not
   part of state (by design). Fixed arc costs.
Index
sampling, 225
    importance sampling, 229
    belief networks, 227
    likelihood weighting, 228
    particle filtering, 229
    rejection, 227
scope, 68
search, 39
    A∗, 51
    branch-and-bound, 62
    multiple path pruning, 60
search with any conflict, 95
search with var pq, 97
show, 70, 207
sigmoid, 174
softmax, 175
stochastic local search, 93
    any-conflict, 95
    two-stage choice, 96
stochastic simulation, 225
uncertainty, 199
unit test, 21, 59, 80, 111, 112, 114
unrolling
     DBN, 248
updatable priority queue, 99
utility, 271
utility table, 271