Python Code For Artificial Intelligence: Foundations of Computational Agents
Python Code For Artificial Intelligence: Foundations of Computational Agents
http://aipython.org http://artint.info
    ©David L Poole and Alan K Mackworth 2017.
    All code is licensed under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International License. See: http://creativecommons.org/licenses/
by-nc-sa/4.0/deed.en US
    This document and all the code can be downloaded from
http://artint.info/AIPython/ or from http://aipython.org
    The authors and publisher of this book have used their best efforts in prepar-
ing this book. These efforts include the development, research and testing of
the theories and programs to determine their effectiveness. The authors and
publisher make no warranty of any kind, expressed or implied, with regard to
these programs or the documentation contained in this book. The author and
publisher shall not be liable in any event for incidental or consequential dam-
ages in connection with, or arising out of, the furnishing, performance, or use
of these programs.
Contents 3
                                                3
4                                                                                                               Contents
Index 217
in a terminal shell (not in Python). That should “just work”. If not, try using
pip instead of pip3.
    The command python or python3 should then start the interactive python
shell. You can quit Python with a control-D or with quit().
                                       7
8                                          1. Python for Artificial Intelligence
In [6]:
You can then interact at the last prompt.
    There are many textbooks for Python. The best source of information about
python is https://www.python.org/. We will be using Python 3; please down-
load the latest release. The documentation is at https://docs.python.org/3/.
    The rest of this chapter is about what is special about the code for AI tools.
We will only use the Standard Python Library and matplotlib. All of the exer-
cises can be done (and should be done) without using other libraries; the aim
is for you to spend your time thinking about how to solve the problem rather
than searching for pre-existing solutions.
1.4       Pitfalls
It is important to know when side effects occur. Often AI programs consider
what would happen or what may have happened. In many such cases, we
don’t want side effects. When an agent acts in the world, side effects are ap-
propriate.
     In Python, you need to be careful to understand side effects. For example,
the inexpensive function to add an element to a list, namely append, changes
the list. In a functional language like Lisp, adding a new element to a list,
without changing the original list, is a cheap operation. For example if x is a
list containing n elements, adding an extra element to the list in Python (using
append) is fast, but it has the side effect of changing the list x. To construct a
new list that contains the elements of x plus a new element, without changing
the value of x, entails copying the list, or using a different representation for
lists. In the searching code, we will use a different representation for lists for
this reason.
enumerates the values fe for each e in iter for which cond is true. The “if cond”
part is optional, but the “for” and “in” are not optional. Here e has to be a
variable, iter is an iterator, which can generate a stream of data, such as a list,
a set, a range object (to enumerate integers between ranges) or a file. cond
     called, not the value of the variable when the function was defined (this is called
     “late binding”). This means if you want to use the value a variable has when
     the function is created, you need to save the current value of that variable.
     Whereas Python uses “late binding” by default, the alternative that newcomers
     often expect is “early binding”, where a function uses the value a variable had
     when the function was defined, can be easily implemented.
         Consider the following programs designed to create a list of 5 functions,
     where the ith function in the list is meant to add i to its argument:1
                                    pythonDemo.py — Some tricky examples
11   fun_list1 = []
12   for i in range(5):
13       def fun1(e):
14           return e+i
15       fun_list1.append(fun1)
16
17   fun_list2 = []
18   for i in range(5):
19       def fun2(e,iv=i):
20           return e+iv
21       fun_list2.append(fun2)
22
23   fun_list3 = [lambda e: e+i for i in range(5)]
24
25   fun_list4 = [lambda e,iv=i: e+iv for i in range(5)]
26
27   i=56
     Try to predict, and then test to see the output, of the output of the following
     calls, remembering that the function uses the latest value of any variable that
     is not bound in the function call:
                                        pythonDemo.py — (continued)
29   # in Shell do
30   ## ipython -i pythonDemo.py
31   # Try these (copy text after the comment symbol and paste in the Python prompt):
32   # print([f(10) for f in fun_list1])
33   # print([f(10) for f in fun_list2])
34   # print([f(10) for f in fun_list3])
35   # print([f(10) for f in fun_list4])
     In the first for-loop, the function fun uses i, whose value is the last value it was
     assigned. In the second loop, the function fun2 uses iv. There is a separate iv
     variable for each function, and its value is the value of i when the function was
     defined. Thus fun1 uses late binding, and fun2 uses early binding. fun list3
     and fun list4 are equivalent to the first two (except fun list4 uses a different i
     variable).
          1 Numbered lines are Python code available in the code-directory, aipython. The name of
     the file is given in the gray text above the listing. The numbers correspond to the line numbers
     in that file.
         One of the advantages of using the embedded definitions (as in fun1 and
     fun2 above) over the lambda is that is it possible to add a __doc__ string, which
     is the standard for documenting functions in Python, to the embedded defini-
     tions.
pythonDemo.py — (continued)
     Note that the built-in range is unconventional in how it handles a single ar-
     gument, as the single argument acts as the second argument of the function.
     Note also that the built-in range also allows for indexing (e.g., range(2, 30, 3)[2]
     returns 8), which the above implementation does not. However myrange also
     works for floats, which the built-in range does not.
     Exercise 1.1 Implement a version of myrange that acts like the built-in version
     when there is a single argument. (Hint: make the second argument have a default
     value that can be recognized in the function.)
         Yield can be used to generate the same sequence of values as in the example
     of Section 1.5.1:
pythonDemo.py — (continued)
49   def ga(n):
50       """generates square of even nonnegative integers less than n"""
51       for e in range(n):
52           if e%2==0:
53               yield e*e
54   a = ga(20)
     The sequence of next(a), and list(a) gives exactly the same results as the com-
     prehension in Section 1.5.1.
56   def myenumerate(enum):
57       for i in range(len(enum)):
58           yield i,enum[i]
     Exercise 1.2 Write a version of enumerate where the only iteration is “for val in
     enum”. Hint: keep track of the index.
pythonDemo.py — (continued)
     At the end of the code are some commented-out commands you should try in
     interactive mode. Cut from the file and paste into Python (and remember to
     remove the comments symbol and leading space).
     1.7        Utilities
     1.7.1 Display
     In this distribution, to keep things simple and to only use standard Python, we
     use a text-oriented tracing of the code. A graphical depiction of the code could
     override the definition of display (but we leave it as a project).
         The method self .display is used to trace the program. Any call
     where the level is less than or equal to the value for max display level will be
     printed. The to print . . . can be anything that is accepted by the built-in print
     (including any keyword arguments).
         The definition of display is:
     Note that args gets a tuple of the positional arguments, and nargs gets a dictio-
     nary of the keyword arguments). This will not work in Python 2, and will give
     an error.
        Any class that wants to use display can be made a subclass of Displayable.
        To change the maximum display level to say 3, for a class do:
     which will make calls to display in that class print when the value of level is less
     than-or-equal to 3. The default display level is 1. It can also be changed for
     individual objects (the object value overrides the class value).
        The value of max display level by convention is:
0 display nothing
2 also display the values as they change (little detail through a loop)
utilities.py — (continued)
23   def visualize(func):
24       return func
     1.7.2 Argmax
     Python has a built-in max function that takes a generator (or a list or set) and re-
     turns the maximum value. The argmax method returns the index of an element
     that has the maximum value. If there are multiple elements with the maximum
     value, one if the indexes to that value is returned at random. This assumes a
     generator of (element, value) pairs, as for example is generated by the built-in
     enumerate.
                                     utilities.py — (continued)
25   import random
26
27   def argmax(gen):
28       """gen is a generator of (element,value) pairs, where value is a real.
29       argmax returns an element with maximal value.
30       If there are multiple elements with the max value, one is returned at random.
31       """
32       maxv = float('-Infinity')     # negative infinity
33       maxvals = []     # list of maximal elements
34       for (e,v) in gen:
35           if v>maxv:
36               maxvals,maxv = [e], v
37           elif v==maxv:
38               maxvals.append(e)
39       return random.choice(maxvals)
40
41   # Try:
42   # argmax(enumerate([1,6,3,77,3,55,23]))
     Exercise 1.3 Change argmax to have an optinal argument that specifies whether
     you want the “first”, “last” or a “random” index of the maximum value returned.
     If you want the first or the last, you don’t need to keep a list of the maximum
     elements.
     1.7.3 Probability
     For many of the simulations, we want to make a variable True with some prob-
     ability. flip(p) returns True with probability p, and otherwise returns False.
                                     utilities.py — (continued)
44   def flip(prob):
45       """return true with probability prob"""
46       return random.random() < prob
utilities.py — (continued)
48   def dict_union(d1,d2):
49       """returns a dictionary that contains the keys of d1 and d2.
50       The value for each key that is in d2 is the value from d2,
51       otherwise it is the value from d1.
52       This does not have side effects.
53       """
54       d = dict(d1) # copy d1
55       d.update(d2)
56       return d
58   def test():
59       """Test part of utilities"""
60       assert argmax(enumerate([1,6,55,3,55,23])) in [2,4]
61       assert dict_union({1:4, 2:5, 3:4},{5:7, 2:9}) == {1:4, 2:9, 3:4, 5:7}
62       print("Passed unit test in utilities")
63
64   if __name__ == "__main__":
65       test()
                                                19
     20                                                         2. Agents and Control
33 class TP_env(Environment):
34      prices = [234, 234, 234, 234, 255, 255, 275, 275, 211, 211,    211,
35      234, 234, 234, 234, 199, 199, 275, 275, 234, 234, 234, 234,    255,
36      255, 260, 260, 265, 265, 265, 265, 270, 270, 255, 255, 260,    260,
37      265, 265, 150, 150, 265, 265, 270, 270, 255, 255, 260, 260,    265,
38      265, 265, 265, 270, 270, 211, 211, 255, 255, 260, 260, 265,    265,
39      260, 265, 270, 270, 205, 255, 255, 260, 260, 265, 265, 265,    265,
40      270, 270]
41      max_price_addon = 20 # maximum of random value added to get    price
42
43      def __init__(self):
44          """paper buying agent"""
45          self.time=0
46          self.stock=20
47          self.stock_history = [] # memory of the stock history
48          self.price_history = [] # memory of the price history
49
50      def initial_percepts(self):
51          """return initial percepts"""
52          self.stock_history.append(self.stock)
53          price = self.prices[0]+random.randrange(self.max_price_addon)
54          self.price_history.append(price)
55          return {'price': price,
56                 'instock': self.stock}
57
58      def do(self, action):
59          """does action (buy) and returns percepts (price and instock)"""
60          used = pick_from_dist({6:0.1, 5:0.1, 4:0.2, 3:0.3, 2:0.2, 1:0.1})
61          bought = action['buy']
62          self.stock = self.stock+bought-used
63          self.stock_history.append(self.stock)
64          self.time += 1
65          price = (self.prices[self.time%len(self.prices)] # repeating pattern
66                  +random.randrange(self.max_price_addon) # plus randomness
67                  +self.time//2)                      # plus inflation
68          self.price_history.append(price)
69          return {'price': price,
70                 'instock': self.stock}
     The pick from dist method takes in a item : probability dictionary, and returns
     one of the items in proportion to its probability.
agents.py — (continued)
72   def pick_from_dist(item_prob_dist):
73       """ returns a value from a distribution.
74       item_prob_dist is an item:probability dictionary, where the
75           probabilities sum to 1.
76       returns an item chosen in proportion to its probability
77       """
78       ranreal = random.random()
79       for (it,prob) in item_prob_dist.items():
80           if ranreal < prob:
 81                return it
 82            else:
 83                ranreal -= prob
 84        raise RuntimeError(str(item_prob_dist)+" is not a probability distribution")
 86   class TP_agent(Agent):
 87       def __init__(self, env):
 88           self.env = env
 89           self.spent = 0
 90           percepts = env.initial_percepts()
 91           self.ave = self.last_price = percepts['price']
 92           self.instock = percepts['instock']
 93
 94        def go(self, n):
 95            """go for n time steps
 96            """
 97            for i in range(n):
 98                if self.last_price < 0.9*self.ave and self.instock < 60:
 99                    tobuy = 48
100                elif self.instock < 12:
101                    tobuy = 12
102                else:
103                    tobuy = 0
104                self.spent += tobuy*self.last_price
105                percepts = env.do({'buy': tobuy})
106                self.last_price = percepts['price']
107                self.ave = self.ave+(self.last_price-self.ave)*0.05
108                self.instock = percepts['instock']
      Set up an environment and an agent. Uncomment the last lines to run the agent
      for 90 steps, and determine the average amount spent.
                                      agents.py — (continued)
      2.2.3 Plotting
      The following plots the price and number in stock history:
agents.py — (continued)
      In this implementation, each layer, including the top layer, implements the en-
      vironment class, because each layer is seen as an environment from the layer
      above.
          We arbitrarily divide the environment and the body, so that the environ-
      ment just defines the walls, and the body includes everything to do with the
      agent. Note that the named locations are part of the (top-level of the) agent,
      not part of the environment, although they could have been.
      2.3.1 Environment
      The environment defines the walls.
                                   agentEnv.py — Agent environment
 11   import math
 12   from agents import Environment
 13
 14   class Rob_env(Environment):
     2.3.2 Body
     The body defines everything about the agent body.
                                   agentEnv.py — (continued)
21   import math
22   from agents import Environment
23   import matplotlib.pyplot as plt
24   import time
25
26   class Rob_body(Environment):
27       def __init__(self, env, init_pos=(0,0,90)):
28           """ env is the current environment
29           init_pos is a triple of (x-position, y-position, direction)
30               direction is in degrees; 0 is to right, 90 is straight-up, etc
31           """
32           self.env = env
33           self.rob_x, self.rob_y, self.rob_dir = init_pos
34           self.turning_angle = 18 # degrees that a left makes
35           self.whisker_length = 6 # length of the whisker
36           self.whisker_angle = 30 # angle of whisker relative to robot
37           self.crashed = False
38           # The following control how it is plotted
39           self.plotting = True    # whether the trace is being plotted
40           self.sleep_time = 0.05 # time between actions (for real-time plotting)
41           # The following are data structures maintained:
42           self.history = [(self.rob_x, self.rob_y)] # history of (x,y) positions
43           self.wall_history = [] # history of hitting the wall
44
45        def percepts(self):
46            return {'rob_x_pos':self.rob_x, 'rob_y_pos':self.rob_y,
47                   'rob_dir':self.rob_dir, 'whisker':self.whisker() , 'crashed':self.crashed}
48        initial_percepts = percepts # use percept function for initial percepts too
49
50        def do(self,action):
51            """ action is {'steer':direction}
52            direction is 'left', 'right' or 'straight'
53            """
54            if self.crashed:
55                return self.percepts()
56            direction = action['steer']
57            compass_deriv = {'left':1,'straight':0,'right':-1}[direction]*self.turning_angle
58            self.rob_dir = (self.rob_dir + compass_deriv +360)%360 # make in range [0,360)
59            rob_x_new = self.rob_x + math.cos(self.rob_dir*math.pi/180)
      This detects if the whisker and the wall intersect. It’s value is returned as a
      percept.
agentEnv.py — (continued)
 75      def whisker(self):
 76          """returns true whenever the whisker sensor intersects with a wall
 77          """
 78          whisk_ang_world = (self.rob_dir-self.whisker_angle)*math.pi/180
 79              # angle in radians in world coordinates
 80          wx = self.rob_x + self.whisker_length * math.cos(whisk_ang_world)
 81          wy = self.rob_y + self.whisker_length * math.sin(whisk_ang_world)
 82          whisker_line = ((self.rob_x,self.rob_y),(wx,wy))
 83          hit = any(line_segments_intersect(whisker_line,wall)
 84                      for wall in self.env.walls)
 85          if hit:
 86              self.wall_history.append((self.rob_x, self.rob_y))
 87              if self.plotting:
 88                  plt.plot([self.rob_x],[self.rob_y],"ro")
 89                  plt.draw()
 90          return hit
 91
 92   def line_segments_intersect(linea,lineb):
 93       """returns true if the line segments, linea and lineb intersect.
 94       A line segment is represented as a pair of points.
 95       A point is represented as a (x,y) pair.
 96       """
 97       ((x0a,y0a),(x1a,y1a)) = linea
 98       ((x0b,y0b),(x1b,y1b)) = lineb
 99       da, db = x1a-x0a, x1b-x0b
100       ea, eb = y1a-y0a, y1b-y0b
101       denom = db*ea-eb*da
102       if denom==0: # line segments are parallel
103           return False
104       cb = (da*(y0b-y0a)-ea*(x0b-x0a))/denom # position along line b
105       if cb<0 or cb>1:
106           return False
     This determines how to steer depending on whether the goal is to the right or
     the left of where the robot is facing.
                                    agentMiddle.py — (continued)
44      def steer(self,target_pos):
45          if self.percepts['whisker']:
46              self.display(3,'whisker on', self.percepts)
47              return "left"
48          else:
49              gx,gy = target_pos
50              rx,ry = self.percepts['rob_x_pos'],self.percepts['rob_y_pos']
51              goal_dir = math.acos((gx-rx)/math.sqrt((gx-rx)*(gx-rx)
52                                                 +(gy-ry)*(gy-ry)))*180/math.pi
53              if ry>gy:
54                  goal_dir = -goal_dir
55              goal_from_rob = (goal_dir - self.percepts['rob_dir']+540)%360-180
56              assert -180 < goal_from_rob <= 180
57              if goal_from_rob > self.straight_angle:
58                  return "left"
59              elif goal_from_rob < -self.straight_angle:
60                  return "right"
61              else:
62                  return "straight"
63
64      def close_enough(self,target_pos):
65          gx,gy = target_pos
66          rx,ry = self.percepts['rob_x_pos'],self.percepts['rob_y_pos']
67          return (gx-rx)**2 + (gy-ry)**2 <= self.close_threshold_squared
26        def do(self,plan):
27            """carry out actions.
28            actions is of the form {'visit':list_of_locations}
29            It visits the locations in turn.
30            """
31            to_do = plan['visit']
32            for loc in to_do:
33                position = self.locations[loc]
34                arrived = self.middle.do({'go_to':position, 'timeout':self.timeout})
35                self.display(1,"Arrived at",loc,arrived)
     2.3.5 Plotting
     The following is used to plot the locations, the walls and (eventually) the move-
     ment of the robot. It can either plot the movement if the robot as it is go-
     ing (with the default env.plotting = True), or not plot it as it is going (setting
     env.plotting = False; in this case the trace can be plotted using pl.plot run()).
                                     agentTop.py — (continued)
agentTop.py — (continued)
     Exercise 2.1 The following code implements a robot trap. Write a controller that
     can escape the “trap” and get to the goal. See textbook for hints.
                                    agentTop.py — (continued)
• a start node
                                                   31
     32                                                               3. Searching for Solutions
36   class Arc(object):
37       """An arc has a from_node and a to_node node and a (non-negative) cost"""
38       def __init__(self, from_node, to_node, cost=1, action=None):
39           assert cost >= 0, ("Cost cannot be negative for"+
40                            str(from_node)+"->"+str(to_node)+", cost: "+str(cost))
41           self.from_node = from_node
42           self.to_node = to_node
43           self.action = action
44           self.cost=cost
45
46        def __repr__(self):
47            """string representation of an arc"""
48            if self.action:
49                return str(self.from_node)+" --"+str(self.action)+"--> "+str(self.to_node)
50            else:
51                return str(self.from_node)+" --> "+str(self.to_node)
• a start node
     To define a search problem, we need to define the start node, the goal predicate,
     the neighbors function and the heuristic function.
                                  searchProblem.py — (continued)
53   class Search_problem_from_explicit_graph(Search_problem):
54       """A search problem consists of:
55       * a list or set of nodes
56       * a list or set of arcs
57       * a start node
58       * a list or set of goal nodes
59       * a dictionary that maps each node into its heuristic value.
60       """
61
62      def __init__(self, nodes, arcs, start=None, goals=set(), hmap={}):
63          self.neighs = {}
64          self.nodes = nodes
65          for node in nodes:
66              self.neighs[node]=[]
67          self.arcs = arcs
68          for arc in arcs:
69              self.neighs[arc.from_node].append(arc)
70          self.start = start
71          self.goals = goals
72          self.hmap = hmap
73
74      def start_node(self):
75          """returns start node"""
76          return self.start
77
78      def is_goal(self,node):
79          """is True if node is a goal"""
80          return node in self.goals
81
82      def neighbors(self,node):
83          """returns the neighbors of node"""
84          return self.neighs[node]
85
86      def heuristic(self,node):
87          """Gives the heuristic value of node n.
88          Returns 0 if not overridden in the hmap."""
89          if node in self.hmap:
90              return self.hmap[node]
91          else:
92              return 0
 93
 94        def __repr__(self):
 95            """returns a string representation of the search problem"""
 96            res=""
 97            for arc in self.arcs:
 98                res += str(arc)+". "
 99            return res
      The following is used for the depth-first search implementation below.
                                     searchProblem.py — (continued)
      3.1.2 Paths
      A searcher will return a path from the start node to a goal node. A Python list
      is not a suitable representation for a path, as many search algorithms consider
      multiple paths at once, and these paths should share initial parts of the path.
      If we wanted to do this with Python lists, we would need to keep copying the
      list, which can be expensive if the list is long. An alternative representation is
      used here in terms of a recursive data structure that can share subparts.
           A path is either:
           • a path, initial and an arc, where the from node of the arc is the node at the
             end of initial.
      These cases are distinguished in the following code by having arc = None if the
      path has length 0, in which case initial is the node of the path.
                                     searchProblem.py — (continued)
                                a                          3
                                    1
                                               b
                                                                                   3
                                                   1           c   1
                                            3                              d
                                                                               1
                                                                                   g
                                           a
                                                           3
                                    1
                                                                       h
                                b          1                               1       j
                                                       d       1
                               3                                   g
                                                   3
                                    c
                                                           g
216        {'mail','ts','o103','o109','o111','b1','b2','b3','b4','c1','c2','c3',
217         'o125','o123','o119','r123','storage'},
218         [ Arc('ts','mail',6), Arc('mail','ts',6),
219            Arc('o103','ts',8), Arc('ts','o103',8),
220            Arc('o103','b3',4),
221            Arc('o103','o109',12), Arc('o109','o103',12),
222            Arc('o109','o119',16), Arc('o119','o109',16),
223            Arc('o109','o111',4), Arc('o111','o109',4),
224            Arc('b1','c2',3),
225            Arc('b1','b2',6), Arc('b2','b1',6),
226            Arc('b2','b4',3), Arc('b4','b2',3),
227            Arc('b3','b1',4), Arc('b1','b3',4),
228            Arc('b3','b4',7), Arc('b4','b3',7),
229            Arc('b4','o109',7),
230            Arc('c1','c3',8), Arc('c3','c1',8),
231            Arc('c2','c3',6), Arc('c3','c2',6),
232            Arc('c2','c1',4), Arc('c1','c2',4),
233            Arc('o123','o125',4), Arc('o125','o123',4),
234            Arc('o123','r123',4), Arc('r123','o123',4),
235            Arc('o119','o123',9), Arc('o123','o119',9),
236            Arc('o119','storage',7), Arc('storage','o119',7)],
237        start = 'o103',
238        goals = {'r123'},
239        hmap = {
240            'mail' : 26,
241            'ts' : 23,
242            'o103' : 21,
243            'o109' : 24,
244            'o111' : 27,
245            'o119' : 11,
246            'o123' : 4,
247            'o125' : 6,
248            'r123' : 0,
249            'b1' : 13,
250            'b2' : 15,
251            'b3' : 17,
252            'b4' : 18,
253            'c1' : 6,
254            'c2' : 10,
255            'c3' : 12,
256            'storage' : 12
257            }
258        )
     3.2.1 Searcher
     A Searcher for a problem can be asked repeatedly for the next path. To solve a
     problem, we can construct a Searcher object for the problem and then repeatedly
     ask for the next path using search. If there are no more paths, None is returned.
                       searchGeneric.py — Generic Searcher, including depth-first and A*
11   from utilities import Displayable, visualize
12
13   class Searcher(Displayable):
14       """returns a searcher for a problem.
15       Paths can be found by repeatedly calling search().
16       This does depth-first search unless overridden
17       """
18       def __init__(self, problem):
19           """creates a searcher from a problem
20           """
21           self.problem = problem
22           self.initialize_frontier()
23           self.num_expanded = 0
24           self.add_to_frontier(Path(problem.start_node()))
25           super().__init__()
26
27      def initialize_frontier(self):
28          self.frontier = []
29
30      def empty_frontier(self):
31          return self.frontier == []
32
33      def add_to_frontier(self,path):
34          self.frontier.append(path)
35
36      @visualize
37      def search(self):
38          """returns (next) path from the problem's start node
39          to a goal node.
40          Returns None if no path exists.
41          """
42          while not self.empty_frontier():
43              path = self.frontier.pop()
44              self.display(2, "Expanding:",path,"(cost:",path.cost,")")
45              self.num_expanded += 1
      The following methods are used for finding and printing information about
      the frontier.
searchGeneric.py — (continued)
 94      def count(self,val):
 95          """returns the number of elements of the frontier with value=val"""
 96          return sum(1 for e in self.frontierpq if e[0]==val)
 97
 98      def __repr__(self):
 99          """string representation of the frontier"""
100          return str([(n,c,str(p)) for (n,c,p) in self.frontierpq])
101
102      def __len__(self):
103          """length of the frontier"""
104          return len(self.frontierpq)
105
106      def __iter__(self):
107          """iterate through the paths in the frontier"""
108          for (_,_,path) in self.frontierpq:
109              yield path
      3.2.3 A∗ Search
      For an A∗ Search the frontier is implemented using the FrontierPQ class.
                                   searchGeneric.py — (continued)
     Exercise 3.2 Change the code so that it implements (i) best-first search and (ii)
     lowest-cost-first search. For each of these methods compare it to A∗ in terms of the
     number of paths expanded, and the path found.
     Exercise 3.3 In the add method in FrontierPQ what does the ”-” in front of frontier index
     do? When there are multiple paths with the same f -value, which search method
     does this act like? What happens if the ”-” is removed? When there are multiple
     paths with the same value, which search method does this act like? Does it work
     better with or without the ”-”? What evidence did you base your conclusion on?
     Exercise 3.4 The searcher acts like a Python iterator, in that it returns one value
     (here a path) and then returns other values (paths) on demand, but does not imple-
     ment the iterator interface. Change the code so it implements the iterator interface.
     What does this enable us to do?
         Depth-first search methods do not need an a priority queue, but can use
     a list as a stack. In this implementation of branch-and-bound search, we call
     search to find an optimal solution with cost less than bound. This uses depth-
     first search to find a path to a goal that extends path with cost less than the
     bound. Once a path to a goal has been found, that path is remembered as the
     best path, the bound is reduced, and the search continues.
                          searchBranchAndBound.py — Branch and Bound Search
11   from searchProblem import Path
12   from searchGeneric import Searcher
13   from utilities import Displayable, visualize
14
15   class DF_branch_and_bound(Searcher):
16       """returns a branch and bound searcher for a problem.
17      An optimal path with cost less than bound can be found by calling search()
18      """
19      def __init__(self, problem, bound=float("inf")):
20          """creates a searcher than can be used with search() to find an optimal path.
21          bound gives the initial bound. By default this is infinite - meaning there
22          is no initial pruning due to depth bound
23          """
24          super().__init__(problem)
25          self.best_path = None
26          self.bound = bound
27
28      @visualize
29      def search(self):
30          """returns an optimal solution to a problem with cost less than bound.
31          returns None if there is no solution with cost less than bound."""
32          self.frontier = [Path(self.problem.start_node())]
33          self.num_expanded = 0
34          while self.frontier:
35              path = self.frontier.pop()
36              if path.cost+self.problem.heuristic(path.end()) < self.bound:
37                  self.display(3,"Expanding:",path,"cost:",path.cost)
38                  self.num_expanded += 1
39                  if self.problem.is_goal(path.end()):
40                      self.best_path = path
41                      self.bound = path.cost
42                      self.display(2,"New best path:",path," cost:",path.cost)
43                  else:
44                      neighs = self.problem.neighbors(path.end())
45                      self.display(3,"Neighbors are", neighs)
46                      for arc in reversed(list(neighs)):
47                          self.add_to_frontier(Path(path, arc))
48          self.display(1,"Number of paths expanded:",self.num_expanded)
49          self.solution = self.best_path
50          return self.best_path
     Note that this code used reversed in order to expand the neighbors of a node
     in the left-to-right order one might expect. It does this because pop() removes
     the rightmost element of the list. Note that reversed only works on lists and
     tuples, but the neighbours can be generated.
         Here is a unit test and some queries:
                              searchBranchAndBound.py — (continued)
45      print("Path found:",tbb2.search())
46
47      print("\nDepth-first search: (Use ˆC if it goes on forever)")
48      tsearcher = Searcher(problem)
49      print("Path found:",tsearcher.search()," cost=",tsearcher.solution.cost)
50
51
52   import aispace.searchProblem as searchProblem
53   from searchTest import run
54   if __name__ == "__main__":
55       run(searchProblem.problem1,"Problem 1")
56   # run(searchProblem.acyclic_delivery_problem,"Acyclic Delivery")
57   # run(searchProblem.cyclic_delivery_problem,"Cyclic Delivery")
58   # also test some graphs with cycles, and some with multiple least-cost paths
        • The condition is a Boolean function that takes the same number of ar-
          guments as there are variables in the scope. The condition must have a
          __name__ property that gives a printable name of the function; built-in
          functions and functions that are defined using def have such a property;
          for other functions you may need to define this property.
                                                     49
     50                                                        4. Reasoning with Constraints
26        def holds(self,assignment):
27            """returns the value of Constraint con evaluated in assignment.
28
29            precondition: all variables are assigned in assignment
30            """
31            return self.condition(*tuple(assignment[v] for v in self.scope))
     4.1.2 CSPs
     A constraint satisfaction problem (CSP) requires:
cspProblem.py — (continued)
33   class CSP(Displayable):
34       """A CSP consists of
35       * domains, a dictionary that maps each variable to its domain
36       * constraints, a list of constraints
37       * variables, a set of variables
38       * var_to_const, a variable to set of constraints dictionary
39       """
40       def __init__(self,domains,constraints):
60       def consistent(self,assignment):
61           """assignment is a variable:value dictionary
62           returns True if all of the constraints that can be evaluated
63                          evaluate to True given assignment.
64           """
65           return all(con.holds(assignment)
66                      for con in self.constraints
67                      if all(v in assignment for v in con.scope))
     4.1.3 Examples
     In the following code ne , when given a number, returns a function that is true
     when its argument is not that number. For example, if f = ne (3), then f (2)
     is True and f (3) is False. That is, ne (x)(y) is true when x 6= y. Allowing
     a function of multiple arguments to use its arguments one at a time is called
     currying, after the logician Haskell Curry. The use of a condition in constraints
     requires that the function with a single argument has a name.
                                   cspExamples.py — Example CSPs
11   from cspProblem import CSP, Constraint
12   from operator import lt,ne,eq,gt
13
14   def ne_(val):
15       """not equal value"""
23   def is_(val):
24       """is a value"""
25       # isv = lambda x: x == val # alternative definition
26       # isv = partial(eq,val)   # another alternative definition
27       def isv(x):
28           return val == x
29       isv.__name__ = str(val)+"=="
30       return isv
     The CSP, csp0 has variables X, Y and Z, each with domain {1, 2, 3}. The con-
     straints are X < Y and Y < Z.
                                  cspExamples.py — (continued)
36   C0 =   Constraint(('A','B'),lt)
37   C1 =   Constraint(('B',),ne_(2))
38   C2 =   Constraint(('B','C'),lt)
39   csp1   = CSP({'A':{1,2,3,4},'B':{1,2,3,4}, 'C':{1,2,3,4}},
40               [C0, C1, C2])
        The next CSP, csp2 is Example 4.9 of the textbook; the domain consistent
     network is shown in Figure 4.1.
                                  cspExamples.py — (continued)
                                              A≠B
                       A                                                      B
                    {1,2,3,4}                                              {1,2,4}
                 A=D                                                              B≠C
                                                      B≠D
                       D                                                     C          E<B
          E<A       {1,2,3,4}                                              {1,3,4}
                                                 C<D
E<D E<C
                                                  E
                                               {1,2,3,4}
51              Constraint(('B','E'),gt),
52              Constraint(('C','E'),gt),
53              Constraint(('D','E'),gt),
54              Constraint(('B','D'),ne)])
     The following example is another scheduling problem (but with multiple an-
     swers). This is the same a scheduling 2 in the original AIspace.org consistency
     app.
cspExamples.py — (continued)
cspExamples.py — (continued)
1 2
                                                      Words:
                  3
                                                      ant, big, bus, car, has,
                                                      book, buys, hold, lane,
                                                      year, ginger, search,
                                                      symbol, syntax.
                             4
66   def adjacent(x,y):
67      """True when x and y are adjacent numbers"""
68      return abs(x-y) == 1
69
70   csp4 = CSP({'A':{1,2,3,4,5},'B':{1,2,3,4,5}, 'C':{1,2,3,4,5},
71              'D':{1,2,3,4,5}, 'E':{1,2,3,4,5}},
72             [Constraint(('A','B'),adjacent),
73              Constraint(('B','C'),adjacent),
74              Constraint(('C','D'),adjacent),
75              Constraint(('D','E'),adjacent),
76              Constraint(('A','C'),ne),
77              Constraint(('B','D'),ne),
78              Constraint(('C','E'),ne)])
          The following examples represent the crossword shown in Figure 4.2.
                                  cspExamples.py — (continued)
80   def meet_at(p1,p2):
81       """returns a function that is true when the words meet at the postions p1, p2
82       """
83       def meets(w1,w2):
84           return w1[p1] == w2[p2]
85       meets.__name__ = "meet_at("+str(p1)+','+str(p2)+')'
86       return meets
87
88   crossword1 = CSP({'one_across':{'ant', 'big', 'bus', 'car', 'has'},
89                   'one_down':{'book', 'buys', 'hold', 'lane', 'year'},
90                   'two_down':{'ginger', 'search', 'symbol', 'syntax'},
91                   'three_across':{'book', 'buys', 'hold', 'land', 'year'},
92                   'four_across':{'ant', 'big', 'bus', 'car', 'has'}},
93                   [Constraint(('one_across','one_down'),meet_at(0,0)),
94                    Constraint(('one_across','two_down'),meet_at(2,0)),
95                    Constraint(('three_across','two_down'),meet_at(2,2)),
 96                     Constraint(('three_across','one_down'),meet_at(0,2)),
 97                     Constraint(('four_across','two_down'),meet_at(0,4))])
      Unit tests
      The following defines a unit test for solvers on example csp1.
                                   cspExamples.py — (continued)
     Exercise 4.1 Modify test so that instead of taking in a list of solutions, it checks
     whether the returned solution actually is a solution.
     Exercise 4.2 Propose a test that is appropriate for CSPs with no solutions. As-
     sume that the test designer knows there are no solutions. Consider what a CSP
     solver should return if there are no solutions to the CSP.
         The first solver searches through the space of partial assignments. This
     takes in a CSP problem and an optional variable ordering, which is a list of the
     variables in the CSP. It then constructs a search space that can be solved using
     the search methods of the previous chapter. In this search space:
         The neighbors(node) method uses the fact that the length of the node, which
     is the number of variables already assigned, is the index of the next variable to
     split on.
cspSearch.py — (continued)
cspSearch.py — (continued)
     Exercise 4.3 What would happen if we constructed the new assignment by as-
     signing node[var] = val (with side effects) instead of using dictionary union? Give
     an example of where this could give a wrong answer. How could the algorithm be
     changed to work with side effects? (Hint: think about what information needs to
     be in a node).
     Exercise 4.4 Change neighbors so that it returns an iterator of values rather than
     a list. (Hint: use yield.)
      env; it changes the values of the variables in other vars. It should only be called
      when the side effects have no ill effects.
cspConsistency.py — (continued)
cspConsistency.py — (continued)
118                   self.display(3, " adding", to_do if to_do else "nothing", "to to_do.")
119                   return self.solve_one(new_doms1, to_do) or self.solve_one(new_doms2, to_do)
120
121       def select_var(self, iter_vars):
122           """return the next variable to split"""
123           return select(iter_vars)
124
125   def partition_domain(dom):
126       """partitions domain dom into two.
127       """
128       split = len(dom) // 2
129       dom1 = set(list(dom)[:split])
130       dom2 = dom - dom1
131       return dom1, dom2
cspConsistency.py — (continued)
cspConsistency.py — (continued)
      Exercise 4.5 Implement of solve all that is like solve one but returns the set of all
      solutions.
      Exercise 4.6 Implement solve enum that enumerates the solutions. It should use
      Python’s yield (and perhaps yield from).
Unit test:
cspConsistency.py — (continued)
cspConsistency.py — (continued)
      Exercise 4.7 When splitting a domain, this code splits the domain into half, ap-
      proximately in half (without any effort to make a sensible choice). Does it work
      better to split one element from a domain?
         Unit test:
                                   cspConsistency.py — (continued)
          This implements both the two-stage choice, the any-conflict algorithm and
      a random choice of variable (and a probabilistic mix of the three).
          Given a CSP, the stochastic local searcher (SLSearcher) creates the data struc-
      tures:
          • variables to select is the set of all of the variables with domain-size greater
            than one. For a variable not in this set, we cannot pick another value from
            that variable.
          • var to constraints maps from a variable into the set of constraints it is in-
            volved in. Note that the inverse mapping from constraints into variables
            is part of the definition of a constraint.
29        def restart(self):
30            """creates a new total assignment and the conflict set
31            """
32            self.current_assignment = {var:random_sample(dom) for
33                                     (var,dom) in self.csp.domains.items()}
34            self.display(2,"Initial assignment",self.current_assignment)
35            self.conflicts = set()
36            for con in self.csp.constraints:
37                if not con.holds(self.current_assignment):
38                    self.conflicts.add(con)
39            self.display(2,"Number of conflicts",len(self.conflicts))
40            self.variable_pq = None
         The search method is the top-level searching algorithm. It can either be used
     to start the search or to continue searching. If there is no current assignment,
     it must create one. Note that, when counting steps, a restart is counted as one
     step.
         This method selects one of two implementations. The argument pob best
     is the probability of selecting a best variable (one involving the most conflicts).
     When the value of prob best is positive, the algorithm needs to maintain a prior-
     ity queue of variables and the number of conflicts (using search with var pq). If
     the probability of selecting a best variable is zero, it does not need to maintain
     this priority queue (as implemented in search with any conflict).
          The argument prob anycon is the probability that the any-conflict strategy is
     used (which selects a variable at random that is in a conflict), assuming that
     it is not picking a best variable. Note that for the probability parameters, any
     value less that zero acts like probability zero and any value greater than 1 acts
     like probability 1. This means that when prob anycon = 1.0, a best variable is
     chosen with probability prob best, otherwise a variable in any conflict is chosen.
     A variable is chosen at random with probability 1 − prob anycon − prob best as
     long as that is positive.
          This returns the number of steps needed to find a solution, or None if no
     solution is found. If there is a solution, it is in self .current assignment.
                                     cspSLS.py — (continued)
     Exercise 4.8 This does an initial random assignment but does not do any random
     restarts. Implement a searcher that takes in the maximum number of walk steps
     (corresponding to existing max steps) and the maximum number of restarts, and
     returns the total number of steps for the first solution found. (As in search, the
     solution found can be extracted from the variable self .current assignment).
     4.4.1 Any-conflict
     If the probability of picking a best variable is zero, the implementation need to
     keeps track of which variables are in conflicts.
                                     cspSLS.py — (continued)
     Exercise 4.9 This makes no attempt to find the best alternative value for a vari-
     able. Modify the code so that after selecting a variable it selects a value the reduces
     the number of conflicts by the most. Have a parameter that specifies the probabil-
     ity that the best value is chosen.
      should change. This is used with the updatable queue (page 68) to find a vari-
      able with the most conflicts.
                                    cspSLS.py — (continued)
cspSLS.py — (continued)
      Exercise 4.10 This makes no attempt to find the best alternative value for a vari-
      able. Modify the code so that after selecting a variable it selects a value the reduces
      the number of conflicts by the most. Have a parameter that specifies the probabil-
      ity that the best value is chosen.
      Exercise 4.11 These implementations always select a value for the variable se-
      lected that is different from its current value (if that is possible). Change the code
      so that it does not have this restriction (so it can leave the value the same). Would
      you expect this code to be faster? Does it work worse (or better)?
      than popping (as might happen if the probability of choosing the best is close
      to zero).
          In this implementation, the equal values are sorted randomly. This is achieved
      by having the elements of the heap being [val, rand, elt] triples, where the sec-
      ond element is a random number. Note that Python requires this to be a list,
      not a tuple, as the tuple cannot be modified.
cspSLS.py — (continued)
      4.4.5 Testing
                                      cspSLS.py — (continued)
Exercise 4.12 Modify this to plot the runtime, instead of the number of steps.
To measure runtime use timeit (https://docs.python.org/3.5/library/timeit.
html). Small runtimes are inaccurate, so timeit can run the same code multi-
ple times. Stochastic local algorithms give different runtimes each time called.
To make the timing meaningful, you need to make sure the random seed is the
same for each repeated call (see random.getstate and random.setstate in https:
//docs.python.org/3.5/library/random.html). Because the runtime for different
seeds can vary a great deal, for each seed, you should start with 1 iteration and
multiplying it by, say 10, until the time is greater than 0.2 seconds. Make sure you
plot the average time for each run. Before you start, try to estimate the total run-
time, so you will be able to tell if there is a problem with the algorithm stopping.
     An askable atom can be asked of the user. The user can respond in English or
     French or just with a “y”.
                                   logicProblem.py — (continued)
27   class Askable(object):
28       """An askable atom"""
29
30      def __init__(self,atom):
31          """clause with atom head and lost of atoms body"""
                                                73
     74                                                        5. Propositions and Inference
32           self.atom=atom
33
34        def __str__(self):
35            """returns the string representation of a clause."""
36            return "askable " + self.atom + "."
37
38   def yes(ans):
39       """returns true if the answer is yes in some form"""
40       return ans.lower() in ['yes', 'yes.', 'oui', 'oui.', 'y', 'y.'] # bilingual
     A knowledge base is a list of clauses and askables. In order to make top-down
     inference faster, this creates a dictionary that maps each atoms into the set of
     clauses with that atom in the head.
                                   logicProblem.py — (continued)
71   triv_KB = KB([
72       Clause('i_am', ['i_think']),
73       Clause('i_think'),
74       Clause('i_smell', ['i_exist'])
75       ])
 77   elect = KB([
 78       Clause('light_l1'),
 79       Clause('light_l2'),
 80       Clause('ok_l1'),
 81       Clause('ok_l2'),
 82       Clause('ok_cb1'),
 83       Clause('ok_cb2'),
 84       Clause('live_outside'),
 85       Clause('live_l1', ['live_w0']),
 86       Clause('live_w0', ['up_s2','live_w1']),
 87       Clause('live_w0', ['down_s2','live_w2']),
 88       Clause('live_w1', ['up_s1', 'live_w3']),
 89       Clause('live_w2', ['down_s1','live_w3' ]),
 90       Clause('live_l2', ['live_w4']),
 91       Clause('live_w4', ['up_s3','live_w3' ]),
 92       Clause('live_p_1', ['live_w3']),
 93       Clause('live_w3', ['live_w5', 'ok_cb1']),
 94       Clause('live_p_2', ['live_w6']),
 95       Clause('live_w6', ['live_w5', 'ok_cb2']),
 96       Clause('live_w5', ['live_outside']),
 97       Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
 98       Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
 99       Askable('up_s1'),
100       Askable('down_s1'),
101       Askable('up_s2'),
102       Askable('down_s2'),
103       Askable('up_s3'),
104       Askable('down_s2')
105       ])
106
107   # print(kb)
20            for c in kb.clauses:
21                if c.head not in fp and all(b in fp for b in c.body):
22                    fp.add(c.head)
23                    added = True
24                    kb.display(2,c.head,"added to fp due to clause",c)
25        return fp
26
27   def ask_askables(kb):
28       return {at for at in kb.askables if yes(input("Is "+at+" true? "))}
Testing:
logicBottomUp.py — (continued)
     Exercise 5.1 It is not very user-friendly to ask all of the askables up-front. Imple-
     ment ask-the-user so that questions are only asked if useful, and are not re-asked.
     For example, if there is a clause h ← a ∧ b ∧ c ∧ d ∧ e, where c and e are askable,
     c and e only need to be asked if a, b, d are all in fp and they have not been asked
     before. Askable e only needs to be asked if the user says “yes” to c. Askable c
     doesn’t need to be asked if the user previously replied “no” to e.
         This form of ask-the-user can ask a different set of questions than the top-
     down interpreter that asks questions when encountered. Give an example where
     they ask different questions (neither set of questions asked is a subset of the other).
     Exercise 5.2 This algorithm runs in time O(n2 ), where n is the number of clauses,
     for a bounded number of elements in the body; each iteration goes through each
     of the clauses, and in the worst case, it will do an iteration for each clause. It is
     possible to implement this in time O(n) time by creating an index that maps an
     atom to the set of clauses with that atom in the body. Implement this. What is its
     complexity as a function of n and b, the maximum number of atoms in the body of
     a clause?
     Exercise 5.3 It is possible to be asymptitocally more efficient (in terms of b) than
     the method in the previous question by noticing that each element of the body of
     clause only needs to be checked once. For example, the clause a ← b ∧ c ∧ d, needs
     only be considered when b is added to fp. Once b is added to fp, if c is already in
     pf , we know that a can be added as soon as d is added. Implement this. What is its
     complexity as a function of n and b, the maximum number of atoms in the body of
     a clause?
Testing:
logicTopDown.py — (continued)
     Exercise 5.4 This code can re-ask a question multiple times. Implement this code
     so that it only asks a question once and remembers the answer. Also implement a
     function to forget the answers.
     Exercise 5.5 What search method is this using? Implement the search interface
     so that it can use A∗ or other searching methods. Define an admissible heuristic
     that is not always 0.
     5.4       Assumables
     Atom a can be made assumable by including Assumable(a) in the knowledge
     base. A knowledge base that can include assumables is declared with KBA.
                          logicAssumables.py — Definite clauses with assumables
11   from logicProblem import Clause, Askable, KB, yes
12
13   class Assumable(object):
14       """An askable atom"""
15
16        def __init__(self,atom):
17            """clause with atom head and lost of atoms body"""
18            self.atom = atom
19
20        def __str__(self):
21            """returns the string representation of a clause.
22            """
23            return "assumable " + self.atom + "."
24
25   class KBA(KB):
26       """A knowledge base that can include assumables"""
27       def __init__(self,statements):
28           self.assumables = [c.atom for c in statements if isinstance(c, Assumable)]
29           KB.__init__(self,statements)
     The top-down Horn clause interpreter, prove all ass returns a list of the sets of
     assumables that imply ans body. This list will contain all of the minimal sets
     of assumables, but can also find non-minimal sets, and repeated sets, if they
     can be generated with separate proofs. The set assumed is the set of assumables
     already assumed.
                                   logicAssumables.py — (continued)
        Given a list of sets, minsets returns a list of the minimal sets in the list. For
     example, minsets([{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}]) returns [{2, 3}, {2, 4, 5}].
                                     logicAssumables.py — (continued)
58   def minsets(ls):
59       """ls is a list of sets
60       returns a list of minimal sets in ls
61       """
62       ans = []     # elements known to be minimal
63       for c in ls:
64           if not any(c1<c for c1 in ls) and not any(c1 <= c for c1 in ans):
65               ans.append(c)
66       return ans
67
68   # minsets([{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}])
     Warning: minsets works for a list of sets or for a set of (frozen) sets, but it does
     not work for a generator of sets. For example, try to predict and then test:
     minsets(e for e in [{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}])
        The diagnoses can be constructed from the (minimal) conflicts as follows.
     This also works if there are non-minimal conflicts, but is not as efficient.
                                     logicAssumables.py — (continued)
69   def diagnoses(cons):
70       """cons is a list of (minimal) conflicts.
71       returns a list of diagnoses."""
72       if cons == []:
73           return [set()]
74       else:
75           return minsets([({e}|d)            # | is set union
76                        for e in cons[0]
77                        for d in diagnoses(cons[1:])])
     Test cases:
                                     logicAssumables.py — (continued)
80   electa = KBA([
81       Clause('light_l1'),
82       Clause('light_l2'),
83       Assumable('ok_l1'),
84       Assumable('ok_l2'),
 85         Assumable('ok_s1'),
 86         Assumable('ok_s2'),
 87         Assumable('ok_s3'),
 88         Assumable('ok_cb1'),
 89         Assumable('ok_cb2'),
 90         Assumable('live_outside'),
 91         Clause('live_l1', ['live_w0']),
 92         Clause('live_w0', ['up_s2','ok_s2','live_w1']),
 93         Clause('live_w0', ['down_s2','ok_s2','live_w2']),
 94         Clause('live_w1', ['up_s1', 'ok_s1', 'live_w3']),
 95         Clause('live_w2', ['down_s1', 'ok_s1','live_w3' ]),
 96         Clause('live_l2', ['live_w4']),
 97         Clause('live_w4', ['up_s3','ok_s3','live_w3' ]),
 98         Clause('live_p_1', ['live_w3']),
 99         Clause('live_w3', ['live_w5', 'ok_cb1']),
100         Clause('live_p_2', ['live_w6']),
101         Clause('live_w6', ['live_w5', 'ok_cb2']),
102         Clause('live_w5', ['live_outside']),
103         Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
104         Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
105         Askable('up_s1'),
106         Askable('down_s1'),
107         Askable('up_s2'),
108         Askable('down_s2'),
109         Askable('up_s3'),
110         Askable('down_s2'),
111         Askable('dark_l1'),
112         Askable('dark_l2'),
113         Clause('false', ['dark_l1', 'lit_l1']),
114         Clause('false', ['dark_l2', 'lit_l2'])
115         ])
116   #   electa.prove_all_ass(['false'])
117   #   cs=electa.conflicts()
118   #   print(cs)
119   #   diagnoses(cs)       # diagnoses from conflicts
        • effects: a dictionary of feature:value pairs that are made true by this action.
          In particular, a feature in the dictionary has the corresponding value (and
          not its previous value) after the action, and a feature not in the dictionary
          keeps its old value.
                                                   81
     82                                                             6. Planning with Certainty
• A set of actions.
          • A dictionary that maps each feature into a set of possible values for the
            feature.
stripsProblem.py — (continued)
26   class STRIPS_domain(object):
27       def __init__(self, feats_vals, strips_map):
28           """Problem domain
29           feats_vals is a feature:domain dictionary,
30                  mapping each feature to its domain
31           strips_map is an action:strips dictionary,
32                  mapping each action to its Strips representation
33           """
34           self.actions = set(strips_map) # set of all actions
35           self.feats_vals = feats_vals
36           self.strips_map = strips_map
56 class Planning_problem(object):
                        b          move(b,c,a)                b
              a         c                                      a          c
move(b,c,table)
a c b
      represents the blocks world. Note that the actions and the conditions are all
      strings.
                                   stripsProblem.py — (continued)
stripsProblem.py — (continued)
        In order to define a search problem (page 31), we need to define the goal
     condition, the start nodes, the neighbours, and (optionally) a heuristic function.
     Here zero is the default heuristic function.
stripsForwardPlanner.py — (continued)
27   def zero(*args,**nargs):
28       """always returns 0"""
29       return 0
30
31   class Forward_STRIPS(Search_problem):
32       """A search problem from a planning problem where:
33       * a node is a state object.
34       * the dynamics are specified by the STRIPS representation of actions
35       """
36       def __init__(self, planning_problem, heur=zero):
37           """creates a forward seach space from a planning problem.
38           heur(state,goal) is a heuristic function,
39              an underestimate of the cost from state to goal, where
40              both state and goals are feature:value dictionaries.
41           """
42           self.prob_domain = planning_problem.prob_domain
43           self.initial_state = State(planning_problem.initial_state)
44           self.goal = planning_problem.goal
45           self.heur = heur
46
47        def is_goal(self, state):
48            """is True if node is a goal.
49
50           Every goal feature has the same value in the state and the goal."""
51           state_asst = state.assignment
52           return all(prop in state_asst and state_asst[prop]==self.goal[prop]
53                     for prop in self.goal)
54
55        def start_node(self):
56            """returns start node"""
57            return self.initial_state
58
59        def neighbors(self,state):
60            """returns neighbors of state in this problem"""
61            cost=1
62            state_asst = state.assignment
63            return [ Arc(state,self.effect(act,state_asst),cost,act)
64                    for act in self.prob_domain.actions
65                    if self.possible(act,state_asst)]
66
67        def possible(self,act,state_asst):
68            """True if act is possible in state.
69            act is possible if all of its preconditions have the same value in the state"""
70            preconds = self.prob_domain.strips_map[act].preconditions
71            return all(pre in state_asst and state_asst[pre]==preconds[pre]
stripsForwardPlanner.py — (continued)
17            return 2
18        else:
19            return 1
         Note that the current state is a complete description; there is a value for
     every feature. However the goal need not be complete; it does not need to
     define a value for every feature. Before checking the value for a feature in the
     goal, a heuristic needs to define whether the feature is defined in the goal.
                                  stripsHeuristic.py — (continued)
21   def h1(state,goal):
22       """ the distance to the goal location, if there is one"""
23       if 'RLoc' in goal:
24           return dist(state['RLoc'], goal['RLoc'])
25       else:
26           return 0
27
28   def h2(state,goal):
29       """ the distance to the coffee shop plus getting coffee and delivering it
30       if the robot needs to get coffee
31       """
32       if ('SWC' in goal and goal['SWC']==False
33               and state['SWC']==True
34               and state['RHC']==False):
35           return dist(state['RLoc'],'cs')+3
36       else:
37           return 0
     The maximum of the values of a set of admissible heuristics is also an admis-
     sible heuristic. The function maxh takes a number of heuristic functions as ar-
     guments, and returns a new heuristic function that takes the maximum of the
     values of the heuristics. For example, h1 and h2 are heuristic functions and so
     maxh(h1,h2) is also. maxh can take an arbitrary number of arguments.
                                  stripsHeuristic.py — (continued)
39   def maxh(*heuristics):
40       """Returns a new heuristic function that is the maximum of the functions in heuristics.
41       heuristics is the list of arguments which must be heuristic functions.
42       """
43       return lambda state,goal: max(h(state,goal) for h in heuristics)
     The following runs the example with and without the heuristic. (Also try using
     AStarSearcher instead of SearcherMPP.)
                                  stripsHeuristic.py — (continued)
51   def test_forward_heuristic(thisproblem=problem1):
52       print("\n***** FORWARD NO HEURISTIC")
53       print(SearcherMPP(Forward_STRIPS(thisproblem)).search())
54
55      print("\n***** FORWARD WITH HEURISTIC h1")
56      print(SearcherMPP(Forward_STRIPS(thisproblem,h1)).search())
57
58      print("\n***** FORWARD WITH HEURISTICs h1 and h2")
59      print(SearcherMPP(Forward_STRIPS(thisproblem,maxh(h1,h2))).search())
60
61   if __name__ == "__main__":
62       test_forward_heuristic()
     Exercise 6.4 Try the forward planner with a heuristic function of just h1, with
     just h2 and with both. Explain how each one prunes or doesn’t prune the search
     space.
     Exercise 6.5 Create a better heuristic than maxh(h1, h2). Try it for a number of
     different problems.
     Exercise 6.6 Create an admissible heuristic for the blocks world.
     A regression search has subgoals as nodes. The initial node is the top-level goal
     of the planner. The goal for the search (when the search can stop) is a subgoal
     that holds in the initial state.
stripsRegressionPlanner.py — (continued)
 71            preconds = self.prob_domain.strips_map[act].preconditions
 72            return ( any(goal_asst[prop]==effects[prop]
 73                       for prop in effects if prop in goal_asst)
 74                   and all(goal_asst[prop]==effects[prop]
 75                           for prop in effects if prop in goal_asst)
 76                   and all(goal_asst[prop]==preconds[prop]
 77                           for prop in preconds if prop not in effects and prop in goal_asst)
 78                   )
 79
 80       def weakest_precond(self,act,goal_asst):
 81           """returns the subgoal that must be true so goal_asst holds after act"""
 82           new_asst = self.prob_domain.strips_map[act].preconditions.copy()
 83           for g in goal_asst:
 84               if g not in self.prob_domain.strips_map[act].effects:
 85                   new_asst[g] = goal_asst[g]
 86           return Subgoal(new_asst)
 87
 88       def heuristic(self,subgoal):
 89           """in the regression planner a node is a subgoal.
 90           the heuristic is an (under)estimate of the cost of going from the initial state to subgoal.
 91           """
 92           return self.heur(self.initial_state, subgoal.assignment)
stripsRegressionPlanner.py — (continued)
      Exercise 6.7 Multiple path pruning could be used to prune more than the current
      code. In particular, if the current node contains more conditions than a previously
      visited node, it can be pruned. For example, if {a : True, b : False} has been visited,
      then any node that is a superset, e.g., {a : True, b : False, d : True}, need not be
      expanded. If the simpler subgoal does not lead to a solution, the more complicated
      one wont either. Implement this more severe pruning. (Hint: This may require
      modifications to the searcher.)
      Exercise 6.8 It is possible that, as knowledge of the domain, that some as-
      signment of values to variables can never be achieved. For example, the robot
      cannot be holding mail when there is mail waiting (assuming it isn’t holding
      mail initially). An assignment of values to (some of the) variables is incompat-
      ible if no possible (reachable) state can include that assignment. For example,
      {0 MW 0 : True,0 RHM0 : True} is an incompatible assignment. This information may
      be useful information for a planner; there is no point in trying to achieve these
      together. Define a subclass of STRIPS domain that can accept a list of incompatible
     assignments. Modify the regression planner code to use such a list of incompatible
     assignments. Give an example where the search space is smaller.
     Exercise 6.9 After completing the previous exercise, design incompatible assign-
     ments for the blocks world. (This should result in dramatic search improvements.)
     Exercise 6.10 Try the regression planner with a heuristic function of just h1 and
     with just h2 (defined in Section 6.2.1). Explain how each one prunes or doesn’t
     prune the search space.
     Exercise 6.11 Create a better heuristic than heuristic fun defined in Section 6.2.1.
60        return str(var)+"_"+str(stage)
     The following methods return methods which can be applied to the particular
     environment.
         For example, is (3) returns a function that when applied to 3, returns True
     and when aplied to any other value returns False. So is (3)(3) trurns True and
     is (3)(7) returns False.
         Note that the underscore (’ ’) is part of the name; here we use it as the
     convention that it is a function that returns a function. This uses two different
     styles to define is and if ; returning a function defined by lambda is equivaent
     to returning the embedded function, except that the embedded function has a
     name. The embedded function can also be given a docstring.
                                  stripsCSPPlanner.py — (continued)
62   def is_(val):
63       """returns a function that is true when it is it applied to val.
64       """
65       return lambda x: x == val
66
67   def if_(v1,v2):
68       """if the second argument is v2, the first argument must be v1"""
69       #return lambda x1,x2: x1==v1 if x2==v2 else True
70       def if_fun(x1,x2):
71           return x1==v1 if x2==v2 else True
72       if_fun.__doc__ = "if x2 is "+str(v2)+" then x1 is "+str(v1)
73       return if_fun
74
75   def eq_if_not_in_(actset):
76       """first and third arguments are equal if action is not in actset"""
77       return lambda x1, a, x2: x1==x2 if a not in actset else True
         Putting it together, this returns a list of actions that solves the problem prob
     for a given horizon. If you want to do more than just return the list of actions,
     you might want to get it to return the solution. Or even enumerate the solutions
     (by using Search with AC from CSP).
                                  stripsCSPPlanner.py — (continued)
79   def con_plan(prob,horizon):
80       """finds a plan for problem prob given horizon.
81       """
82       csp = CSP_from_STRIPS(prob, horizon)
83       sol = Con_solver(csp).solve_one()
84       return csp.extract_plan(sol) if sol else sol
          The following are some example queries.
                                  stripsCSPPlanner.py — (continued)
 90
 91   # Problem 0
 92   # con_plan(problem0,1) # should it succeed?
 93   # con_plan(problem0,2) # should it succeed?
 94   # con_plan(problem0,3) # should it succeed?
 95   # To use search to enumerate solutions
 96   #searcher0a = Searcher(Search_with_AC_from_CSP(CSP_from_STRIPS(problem0, 1)))
 97   #print(searcher0a.search())
 98
 99   ## Problem 1
100   # con_plan(problem1,5) # should it succeed?
101   # con_plan(problem1,4) # should it succeed?
102   ## To use search to enumerate solutions:
103   #searcher15a = Searcher(Search_with_AC_from_CSP(CSP_from_STRIPS(problem1, 5)))
104   #print(searcher15a.search())
105
106   ## Problem 2
107   #con_plan(problem2, 6) # should fail??
108   #con_plan(problem2, 7) # should succeed???
109
110   ## Example 6.13
111   problem3 = Planning_problem(delivery_domain,
112                            {'SWC':True, 'RHC':False}, {'SWC':False})
113   #con_plan(problem3,2) # Horizon of 2
114   #con_plan(problem3,3) # Horizon of 3
115
116   problem4 = Planning_problem(delivery_domain,{'SWC':True},
117                               {'SWC':False, 'MW':False, 'RHM':False})
118
119   # For the stochastic local search:
120   #from cspSLS import SLSearcher, Runtime_distribution
121   # cspplanning15 = CSP_from_STRIPS(problem1, 5) # should succeed
122   #se0 = SLSearcher(cspplanning15); print(se0.search(100000,0.5))
123   #p = Runtime_distribution(cspplanning15)
124   #p.plot_run(1000,1000,0.7) # warning will take a few minutes
12   import random
13
14   class Action_instance(object):
15       next_index = 0
16       def __init__(self,action,index=None):
17           if index is None:
18               index = Action_instance.next_index
19               Action_instance.next_index += 1
20           self.action = action
21           self.index = index
22
23        def __str__(self):
24            return str(self.action)+"#"+str(self.index)
25
26        __repr__ = __str__ # __repr__ function is the same as the __str__ function
        A node (as in the abstraction of search space) in a partial-order planner
     consists of:
          • agenda: a list of (s, a) pairs, where s is a (var, val) pair and a is an action
            instance. This means that variable var must have value val before a can
            occur.
          • causal links: a set of (a0, g, a1) triples, where a1 and a2 are action instances
            and g is a (var, val) pair. This holds when action a0 makes g true for action
            a1 .
stripsPOP.py — (continued)
28   class POP_node(object):
29       """a (partial) partial-order plan. This is a node in the search space."""
30       def __init__(self, actions, constraints, agenda, causal_links):
31           """
32           * actions is a set of action instances
33           * constraints a set of (a0,a1) pairs, representing a0<a1,
34             closed under transitivity
35           * agenda list of (subgoal,action) pairs to be achieved, where
36             subgoal is a (variable,value) pair
37           * causal_links is a set of (a0,g,a1) triples,
38             where ai are action instances, and g is a (variable,value) pair
39           """
40           self.actions = actions # a set of action instances
54       def extract_plan(self):
55           """returns a total ordering of the action instances consistent
56           with the constraints.
57           raises IndexError if there is no choice.
58           """
59           sorted_acts = []
60           other_acts = set(self.actions)
61           while other_acts:
62               a = random.choice([a for a in other_acts if
63                       all(((a1,a) not in self.constraints) for a1 in other_acts)])
64               sorted_acts.append(a)
65               other_acts.remove(a)
66           return sorted_acts
        POP search from STRIPS is an instance of a search problem. As such, we
     need to define the start nodes, the goal, and the neighbors of a node.
                                     stripsPOP.py — (continued)
126          if actions:
127              a = actions[0]
128              rem_actions = actions[1:]
129              a0, subgoal, a1 = clink
130              if a != a0 and a != a1 and self.deletes(a,subgoal):
131                  if self.possible((a,a0),constrs):
132                      new_const = self.add_constraint((a,a0),constrs)
133                      for e in self.protect_cl_for_actions(rem_actions,new_const,clink): yield e
      # could be "yield from"
134                  if self.possible((a1,a),constrs):
135                      new_const = self.add_constraint((a1,a),constrs)
136                      for e in self.protect_cl_for_actions(rem_actions,new_const,clink): yield e
137              else:
138                  for e in self.protect_cl_for_actions(rem_actions,constrs,clink): yield e
139          else:
140              yield constrs
          Given an action act, the following method protects all the causal links in
      clinks from act. Whenever act deletes subgoal from some causal link (a0, subgoal, a1),
      the action act needs to be before a0 or after a1. This method enumerates all con-
      straints that result from protecting the causal links from act.
                                     stripsPOP.py — (continued)
          The following methods check whether an action (or action instance) achives
      or deletes some subgoal.
                                     stripsPOP.py — (continued)
166
167         def effects(self,action):
168             """returns the variable:value dictionary of the effects of action.
169             works for both actions and action instances"""
170             if isinstance(action, Action_instance):
171                 action = action.action
172             if action == "start":
173                 return self.planning_problem.initial_state
174             elif action == "finish":
175                 return {}
176             else:
177                 return self.planning_problem.prob_domain.strips_map[action].effects
stripsPOP.py — (continued)
stripsPOP.py — (continued)
     A good source of datasets is the UCI machine Learning Repository [?]; the
     SPECT and car datasets are from this repository.
17   class Data_set(Displayable):
18       """ A data set consists of a list of training data and a list of test data.
19       """
20       seed = None #123456 # make it None for a different test set each time
21
                                               103
     104                                                  7. Supervised Machine Learning
51         def create_features(self):
52             """create the input features and target feature.
53             This assumes that the features all have domain {0,1}.
54             This should be overridden if the features have a different domain.
55             """
56             self.input_features = []
57             for i in range(self.num_properties):
58                 def feat(e,index=i):
59                     return e[index]
60                 if self.header:
61                     feat.__doc__ = self.header[i]
62                 else:
63                     feat.__doc__ = "e["+str(i)+"]"
64                 feat.frange = [0,1]
65                 if i == self.target_index:
66                     self.target = feat
67                 else:
68 self.input_features.append(feat)
70      evaluation_criteria = ["sum-of-squares","sum_absolute","logloss"]
71
72      def evaluate_dataset(self, data, predictor, evaluation_criterion):
73          """Evaluates predictor on data according to the evaluation_criterion.
74          predictor is a function that takes an example and returns a
75                  prediction for the target feature.
76          evaluation_criterion is one of the evaluation_criteria.
77          """
78          assert evaluation_criterion in self.evaluation_criteria,"given: "+str(evaluation_criterion)
79          if data:
80              try:
81                  error = sum(error_example(predictor(example), self.target(example),
82                                          evaluation_criterion)
83                             for example in data)/len(data)
84              except ValueError:
85                  return float("inf") # infinity
86              return error
     error example is used to evaluate a single example, based on the predicted value,
     the actual value and the evaluation criterion. Note that for logloss, the actual
     value must be 0 or 1.
                                  learnProblem.py — (continued)
      standard csv package, that allows quoted arguments, can be used by uncom-
      menting the line for data aa and commenting out the following line. data tuples
      contains only those lines that contain the delimiter (others lines are assumed to
      be empty or comments), and tries to convert the elements to numbers when-
      ever possible.
          This allows for some of the columns to be included. Note that if include only
      is specified, the target index is in the resulting
learnProblem.py — (continued)
         • When the attribute only has two values, we designate one to be the “true”
           value.
         • When the values are all numeric, we assume they are ordered (as opposed
           to just being some classes that happen to be labelled with numbers, but
           where the numbers have no meaning) and construct Boolean features for
           splits of the data. That is, the feature is e[ind] < cut for some value cut.
           We choose a number of cut values, up to a maximum number of cuts,
           given by max num cuts.
         • When the values are not all numeric, we assume they are unordered, and
           create an indicator function for each value. An indicator function for a
           value returns true when that value is given and false otherwise. Note
           that we can’t create an indicator function for values that appear in the
           test set but not in the training set because we haven’t seen the test set.
           For the examples in the test set with that value, the indicator functions
           return false.
learnProblem.py — (continued)
185              else:
186                  target.__doc__ = "e["+str(ind)+"]"
187              target.frange = ranges[self.target_index]
188              self.target = target
189          if self.boolean_features:
190              self.input_features = []
191              for ind,frange in enumerate(ranges):
192                  if ind != self.target_index and len(frange)>1:
193                      if len(frange) == 2:
194                          # two values, the feature is equality to one of them.
195                          true_val = list(frange)[1] # choose one as true
196                          def feat(e, i=ind, tv=true_val):
197                              return e[i]==tv
198                          if self.header:
199                              feat.__doc__ = self.header[ind]+"=="+str(true_val)
200                          else:
201                              feat.__doc__ = "e["+str(ind)+"]=="+str(true_val)
202                          feat.frange = boolean
203                          self.input_features.append(feat)
204                      elif all(isinstance(val,(int,float)) for val in frange):
205                          # all numeric, create cuts of the data
206                          sorted_frange = sorted(frange)
207                          num_cuts = min(max_num_cuts,len(frange))
208                          cut_positions = [len(frange)*i//num_cuts for i in range(1,num_cuts)]
209                          for cut in cut_positions:
210                              cutat = sorted_frange[cut]
211                              def feat(e, ind_=ind, cutat=cutat):
212                                  return e[ind_] < cutat
213
214                            if self.header:
215                                feat.__doc__ = self.header[ind]+"<"+str(cutat)
216                            else:
217                                feat.__doc__ = "e["+str(ind)+"]<"+str(cutat)
218                            feat.frange = boolean
219                            self.input_features.append(feat)
220                      else:
221                          # create an indicator function for every value
222                          for val in frange:
223                              def feat(e, ind_=ind, val_=val):
224                                  return e[ind_] == val_
225                              if self.header:
226                                  feat.__doc__ = self.header[ind]+"=="+str(val)
227                              else:
228                                  feat.__doc__= "e["+str(ind)+"]=="+str(val)
229                              feat.frange = boolean
230                              self.input_features.append(feat)
231          else: # boolean_features is off
232              self.input_features = []
233              for i in range(self.num_properties):
234                  def feat(e,index=i):
      Exercise 7.1 Change the code so that it splits using e[ind] ≤ cut instead of e[ind] <
      cut. Check boundary cases, such as 3 elements with 2 cuts, and if there are 30
      elements (integers from 100 to 129), and you want 2 cuts, the resulting Boolean
      features should be e[ind] ≤ 109 and e[ind] ≤ 119 to make sure that each of the
      resulting ranges is equal size.
      Exercise 7.2 This splits on whether the feature is less than one of the values in
      the training set. Sam suggested it might be better to split between the values in
      the training set, and suggested using
      Why might Sam have suggested this? Does this work better? (Try it on a few data
      sets).
          When reading from a file all of the values are strings. This next method
      tries to convert each values into a number (an int or a float), if it is possible.
                                     learnProblem.py — (continued)
Example:
learnProblem.py — (continued)
      Exercise 7.3 For symmetric properties, such as product, we don’t need both
      f 1 ∗ f 2 as well as f 2 ∗ f 1 as extra properties. Allow the user to be able to declare
      feature constructors as symmetric (by associating a Boolean feature with them).
      Change construct features so that it does not create both versions for symmetric
      combiners.
      7.1.6 Learner
      A learner takes a dataset (and possible other arguments specific to the method).
      To get it to learn, we call the learn() method. This implements Displayable so
that we can display traces at multiple levels of detail (and perhaps with a GUI).
learnProblem.py — (continued)
         • a point prediction, where we are only allowed to predict one of the values
           of the feature. For example, if the values of the feature are {0, 1} we are
           only allowed to predict 0 or 1 or of the values are ratings in {1, 2, 3, 4, 5},
           we can only predict one of these integers.
         • a point prediction, where we are allowed to predict any value. For exam-
           ple, if the values of the feature are {0, 1} we may be allowed to predict 0.3,
           1, or even 1.7. For all of the criteria we can imagine, there is no point in
           predicting a value greater than 1 or less that zero (but that doesn’t mean
           we can’t), but it is often useful to predict a value between 0 and 1. If the
           values are ratings in {1, 2, 3, 4, 5}, we may want to predict 3.4.
         • a probability distribution over the values of the feature. For each value v,
           we predict a non-negative number pv , such that the sum over all predic-
           tions is 1.
      The following code assumes the second of these, where we can make a point
      prediction of any value (although median will only predict one of the actual
      values for the feature).
          The point prediction function takes in a target feature (which is assumed to
      be numeric), some training data, and a section of what to return, and returns
      a function that takes in an example, and makes a prediction of a value for the
      target variable, but makes same prediction for all examples.
          This method uses selection, whose value should be “median”, “proportion”,
      or “Laplace” determine what prediction should be made.
     7.2.1 Testing
     To test the point prediction, we will first generate some data from a simple
     (Bernoulli) distribution, where there are two possible values, 0 and 1 for the
     target feature. Given prob, a number in the range [0, 1], this generate some
     training and test data where prob is the probability of each example being 1.
                                  learnNoInputs.py — (continued)
65   class Data_set_random(Data_set):
66       """A data set of a {0,1} feature generated randomly given a probability"""
67       def __init__(self, prob, train_size, test_size=100):
68           """a data set of with train_size training examples,
69           test_size test examples
70           where each examples in generated where prob i the probability of 1
71           """
72           self.max_display_level = 0
73           train = [[1] if random.random()<prob else [0] for i in range(train_size)]
74           test = [[1] if random.random()<prob else [0] for i in range(test_size)]
75           Data_set.__init__(self, train, test, target_index=0)
         Let’s try to evaluate the predictions of the possible selections according to
     the different evaluation criteria, for various training sizes.
                                  learnNoInputs.py — (continued)
77   def test_no_inputs():
78       num_samples = 1000 #number of runs to average over
79       test_size = 100     # number of test examples for each prediction
80       for train_size in [1,2,3,4,5,10,20,100,1000]:
81           total_error = {(select,crit):0
82                          for select in selections
83                          for crit in Data_set.evaluation_criteria}
84           for sample in range(num_samples): # average over num_samples
85               p = random.random()
86               data = Data_set_random(p, train_size, test_size)
87               for select in selections:
88                   prediction = point_prediction(data.target, data.train, selection=select)
89                   for ecrit in Data_set.evaluation_criteria:
90                       test_error = data.evaluate_dataset(data.test,prediction,ecrit)
91                       total_error[(select,ecrit)] += test_error
92           print("For training size",train_size,":")
93           for ecrit in Data_set.evaluation_criteria:
94               print(" Evaluated according to",ecrit,":")
95               for select in selections:
96                   print("       Average error of",select,"is",
97                         total_error[(select,ecrit)]/num_samples)
 98
 99   if __name__ == "__main__":
100       test_no_inputs()
          The decision tree algorithm does binary splits, and assumes that all input
      features are binary functions of the examples. It stops splitting if there are
      no input features, the number of examples is less than a specified number of
      examples or all of the examples agree on the target feature.
                                learnDT.py — Learning a binary decision tree
 11   from learnProblem import Learner, error_example
 12   from learnNoInputs import point_prediction, target_counts, selections
 13   import math
 14
 15   class DT_learner(Learner):
 16       def __init__(self,
 17                   dataset,
 18                   to_optimize="sum-of-squares",
 19                   leaf_selection="mean", # what to use for point prediction at leaves
 20                   train=None,            # used for cross validation
 21                   min_number_examples=10):
 22           self.dataset = dataset
 23           self.target = dataset.target
 24           self.to_optimize = to_optimize
 25           self.leaf_selection = leaf_selection
 26           self.min_number_examples = min_number_examples
 27           if train is None:
 28               self.train = self.dataset.train
 29           else:
 30               self.train = train
 31
 32         def learn(self):
 33             return self.learn_tree(self.dataset.input_features, self.train)
          The main recursive algorithm, takes in a set of input features and a set of
      training data. It first decides whether to split. If it doesn’t split, it makes a point
      prediction, ignoring the input features.
          It doesn’t split if there are no more input features, if there are fewer exam-
      ples than min number examples, if all the examples agree on the value of the
      target or if the best split makes all examples in the same partition
         If it decides to split, it selects the best split and returns the condition to split
     on (in the variable split) and the corresponding partition of the examples.
learnDT.py — (continued)
learnDT.py — (continued)
 80          best_partition = None
 81          for feat in input_features:
 82              false_examples, true_examples = partition(data_subset,feat)
 83              if false_examples and true_examples: #both partitons are non-empty
 84                  err = (training_error(self.dataset,false_examples,self.to_optimize)
 85                         + training_error(self.dataset,true_examples,self.to_optimize))
 86                  self.display(3," split on",feat.__doc__,"has err=",err,
 87                            "splits into",len(true_examples),":",len(false_examples))
 88                  if err < best_error:
 89                      best_feat = feat
 90                      best_error=err
 91                      best_partition = false_examples, true_examples
 92          self.display(3,"best split is on",best_feat.__doc__,
 93                                 "with err=",best_error)
 94          return best_feat, best_partition
 95
 96   def partition(data_subset,feature):
 97       """partitions the data_subset by the feature"""
 98       true_examples = []
 99       false_examples = []
100       for example in data_subset:
101           if feature(example):
102               true_examples.append(example)
103           else:
104               false_examples.append(example)
105       return false_examples, true_examples
106
107
108   def training_error(dataset, data_subset, to_optimize):
109       """returns training error for dataset on to_optimize.
110       This assumes that we choose the best value for the optimization
111       criteria for dataset according to point_prediction
112       """
113       select_dict = {"sum-of-squares":"mean", "sum_absolute":"median",
114                        "logloss":"Laplace"} # arbitrary mapping. Perhaps wrong.
115       selection = select_dict[to_optimize]
116       predictor = point_prediction(dataset.target, data_subset, selection=selection)
117       error = sum(error_example(predictor(example),
118                                dataset.target(example),
119                                to_optimize)
120                   for example in data_subset)
121       return error
Test cases:
learnDT.py — (continued)
      Exercise 7.4 The current algorithm does not have a very sophisticated stopping
      criterion. What is the current stopping criterion? (Hint: you need to look at both
      learn tree and select split.)
Exercise 7.5 Extend the current algorithm to include in the stopping criterion
        (a) A minimum child size; don’t use a split if one of the children has fewer
            elements that this.
        (c) An improvement bound such that a split is only carried out if error with the
            split is better than the error without the split by at least the improvement
            bound.
      Which values for these parameters make the prediction errors on the test set the
      smallest? Try it on more than one dataset.
      Exercise 7.6 Without any input features, it is often better to include a pseudo-
      count that is added to the counts from the training data. Modify the code so that
      it includes a pseudo-count for the predictions. When evaluating a split, including
      pseudo counts can make the split worse than no split. Does pruning with an im-
      provement bound and pseudo-counts make the algorithm work better than with
      an improvement bound by itself?
      Exercise 7.7 Some people have suggested using information gain (which is equiv-
      alent to greedy optimization of logloss) as the measure of improvement when
      building the tree, even in they want to have non-probabilistic predictions in the
      final tree. Does this work better than myopically choosing the split that is best for
      the evaluation criteria we will use to judge the final prediction?
         The above decision tree overfits the data. One way to determine whether
     the prediction is overfitting is by cross validation. The code below implements
     k-fold cross validation, which can be used to choose the value of parameters
     to best fit the training data. If we want to use parameter tuning to improve
     predictions on a particular data set, we can only use the training data (and not
     the test data) to tune the parameter.
         In k-fold cross validation, we partition the training set into k approximately
     equal-sized folds (each fold is an enumeration of examples). For each fold, we
     train on the other examples, and determine the error of the prediction on that
     fold. For example, if there are 10 folds, we train on 90% of the data, and then
     test on remaining 10% of the data. We do this 10 times, so that each example
     gets used as a test set once, and in the training set 9 times.
         The code below creates one copy of the data, and multiple views of the data.
     For each fold, fold enumerates the examples in the fold, and fold complement
     enumerates the examples not in the fold.
70         plt.legend()
71         plt.draw()
72
73   #   Try
74   #   data = Data_from_file('data/mail_reading.csv', target_index=-1)
75   #   data = Data_from_file('data/SPECT.csv',target_index=0)
76   #   data = Data_from_file('data/carbool.csv', target_index=-1)
77   #   plot_error(data) # warning, may take a long time depending on the dataset
78
79   def plot_fig_7_15(): # different runs produce different plots
80       data = Data_from_file('data/SPECT.csv',target_index=0)
81       # data = Data_from_file('data/carbool.csv', target_index=-1)
82       plot_error(data)
83   # plot_fig_7_15() # warning takes a long time!
     predictor predicts the value of an example from the current parameter settings.
     predictor string gives a string representation of the predictor.
                                   learnLinear.py — (continued)
40
41      def predictor(self,e):
42          """returns the prediction of the learner on example e"""
43          linpred = sum(w*f(e) for f,w in self.weights.items())
44          if self.squashed:
45              return sigmoid(linpred)
46          else:
47              return linpred
48
49      def predictor_string(self, sig_dig=3):
50          """returns the doc string for the current prediction function
51          sig_dig is the number of significant digits in the numbers"""
52          doc = "+".join(str(round(val,sig_dig))+"*"+feat.__doc__
53                         for feat,val in self.weights.items())
54          if self.squashed:
55              return "sigmoid("+ doc+")"
56          else:
57              return doc
     learn is the main algorithm of the learner. It does num iter steps of gradient
     descent. The other parameters it gets from the class.
                                   learnLinear.py — (continued)
59      def learn(self,num_iter=100):
60          for it in range(num_iter):
61              self.display(2,"prediction=",self.predictor_string())
62              for e in self.train:
63                  predicted = self.predictor(e)
64                  error = self.target(e) - predicted
65                  update = self.learning_rate*error
66                  for feat in self.weights:
67                      self.weights[feat] += update*feat(e)
68          #self.predictor.__doc__ = self.predictor_string()
69          #return self.predictor
     one is a function that always returns 1. This is used for one of the input prop-
     erties.
                                   learnLinear.py — (continued)
71   def one(e):
72       "1"
73       return 1
     sigmoid(x) is the function
              1
           1 + e−x
learnLinear.py — (continued)
 75   def sigmoid(x):
 76       return 1/(1+math.exp(-x))
          The following tests the learner on a data sets. Uncomment the other data
      sets for different examples.
                                    learnLinear.py — (continued)
          The following plots the errors on the training and test sets as a function of
      the number of steps of gradient descent.
                                    learnLinear.py — (continued)
 91   def plot_steps(learner=None,
 92                 data = None,
 93                 criterion="sum-of-squares",
 94                 step=1,
 95                 num_steps=1000,
 96                 log_scale=True,
 97                 label=""):
 98       """
 99       plots the training and test error for a learner.
100       data is the
101       learner_class is the class of the learning algorithm
102       criterion gives the evaluation criterion plotted on the y-axis
103       step specifies how many steps are run for each point on the plot
104       num_steps is the number of points to plot
105
106         """
107         plt.ion()
108         plt.xlabel("step")
109         plt.ylabel("Average "+criterion+" error")
110         if log_scale:
111             plt.xscale('log') #plt.semilogx() #Makes a log scale
112         else:
113             plt.xscale('linear')
114         if data is None:
115             data = Data_from_file('data/holiday.csv', num_train=19, target_index=-1)
116             #data = Data_from_file('data/SPECT.csv', target_index=0)
      Exercise 7.8 The squashed learner only makes predictions in the range (0, 1). If
      the output values are {1, 2, 3, 4} there is no use prediction less than 1 or greater
      than 4. Change the squashed learner so that it can learn values in the range (1, 4).
      Test it on the file 'data/car.csv'.
          The following plots the prediction as a function of the function of the num-
      ber of steps of gradient descent. We first define a version of range that allows
      for real numbers (integers and floats).
                                     learnLinear.py — (continued)
157                   minx = 0,
158                   maxx = 5,
159                   step_size = 0.01, # for plotting
160                   label="function"):
161         plt.ion()
162         plt.xlabel("x")
163         plt.ylabel("y")
164         if data is None:
165             data = Data_from_file('data/simp_regr.csv', prob_test=0,
166                                 boolean_features=False, target_index=-1)
167         if learner is None:
168             learner = Linear_learner(data,squashed=False)
169         learner.learning_rate=0.001
170         learner.learn(100)
171         learner.learning_rate=0.0001
172         learner.learn(1000)
173         learner.learning_rate=0.00001
174         learner.learn(10000)
175         learner.display(1,"function learned is", learner.predictor_string(),
176                   "error=",data.evaluate_dataset(data.train, learner.predictor, "sum-of-squares"))
177         plt.plot([e[0] for e in data.train],[e[-1] for e in data.train],"bo",label="data")
178         plt.plot(list(arange(minx,maxx,step_size)),[learner.predictor([x])
179                                            for x in arange(minx,maxx,step_size)],
180                                          label=label)
181         plt.legend()
182         plt.draw()
learnLinear.py — (continued)
206                                         include_orig=False)
207          learner = learner_class(data_aug,squashed=False)
208          learner.learning_rate=learning_rate
209          learner.learn(num_iter)
210          learner.display(1,"For degree",degree,
211                      "function learned is", learner.predictor_string(),
212                      "error=",data.evaluate_dataset(data.train, learner.predictor, "sum-of-squares")
213          ls = line_styles[degree % len(line_styles)]
214          col = colors[degree % len(colors)]
215          plt.plot(x_values,[learner.predictor([x]) for x in x_values], linestyle=ls, color=col,
216                          label="degree="+str(degree))
217          plt.legend(loc='upper left')
218          plt.draw()
219
220   # Try:
221   # plot_prediction()
222   # plot_polynomials()
223   #data = Data_from_file('data/mail_reading.csv', target_index=-1)
224   #plot_prediction(data=data)
36          """Backpropagate the errors on the outputs, return the errors on the inputs.
37          errors is a list of errors for the outputs (of length self.num_outputs).
38          Return the errors for the inputs to this layer (of length self.num_inputs).
39          You can assume that this is only called after corresponding output_values,
40             and it can remember information information required for the backpropagation.
41          """
42          raise NotImplementedError("backprop") # abstract method
     A linear layer maintains an array of weights. self .weights[o][i] is the weight
     between input i and output o. A 1 is added to the inputs.
                                   learnNN.py — (continued)
44   class Linear_complete_layer(Layer):
45       """a completely connected layer"""
46       def __init__(self, nn, num_outputs, max_init=0.2):
47           """A completely connected linear layer.
48           nn is a neural network that the inputs come from
49           num_outputs is the number of outputs
50           max_init is the maximum value for random initialization of parameters
51           """
52           Layer.__init__(self, nn, num_outputs)
53           # self.weights[o][i] is the weight between input i and output o
54           self.weights = [[random.uniform(-max_init, max_init)
55                           for inf in range(self.num_inputs+1)]
56                         for outf in range(self.num_outputs)]
57
58      def output_values(self,input_values):
59          """Returns the outputs for the input values.
60          It remembers the values for the backprop.
61
62          Note in self.weights there is a weight list for every output,
63          so wts in self.weights effectively loops over the outputs.
64          """
65          self.inputs = input_values + [1]
66          return [sum(w*val for (w,val) in zip(wts,self.inputs))
67                     for wts in self.weights]
68
69      def backprop(self,errors):
70          """Backpropagate the errors, updating the weights and returning the error in its inputs.
71          """
72          input_errors = [0]*(self.num_inputs+1)
73          for out in range(self.num_outputs):
74              for inp in range(self.num_inputs+1):
75                  input_errors[inp] += self.weights[out][inp] * errors[out]
76                  self.weights[out][inp] += self.nn.learning_rate * self.inputs[inp] * errors[out]
77          return input_errors[:-1] # remove the error for the "1"
learnNN.py — (continued)
79   class Sigmoid_layer(Layer):
80       """sigmoids of the inputs.
81       The number of outputs is equal to the number of inputs.
learnNN.py — (continued)
 98   class ReLU_layer(Layer):
 99       """Rectified linear unit (ReLU) f(z) = max(0, z).
100       The number of outputs is equal to the number of inputs.
101       """
102       def __init__(self, nn):
103           Layer.__init__(self, nn)
104
105         def output_values(self,input_values):
106             """Returns the outputs for the input values.
107             It remembers the input values for the backprop.
108             """
109             self.input_values = input_values
110             self.outputs= [max(0,inp) for inp in input_values]
111             return self.outputs
112
113         def backprop(self,errors):
114             """Returns the derivative of the errors"""
115             return [e if inp>0 else 0 for e,inp in zip(errors, self.input_values)]
learnNN.py — (continued)
The test method learns a network and evaluates it according to various criteria.
learnNN.py — (continued)
143
144      def learn(self,num_iter):
145          """Learns parameters for a neural network using stochastic gradient decent.
146          num_iter is the number of iterations
147          """
148          for i in range(num_iter):
149              for e in random.sample(self.dataset.train,len(self.dataset.train)):
150                  # compute all outputs
151                  values = [f(e) for f in self.input_features]
152                  for layer in self.layers:
153                      values = layer.output_values(values)
154                  # backpropagate
155                  errors = self.sum_squares_error([self.dataset.target(e)],values)
156                  for layer in reversed(self.layers):
157                      errors = layer.backprop(errors)
158
159      def sum_squares_error(self,observed,predicted):
160          """Returns the errors for each of the target features.
161          """
162          return [obsd-pred for obsd,pred in zip(observed,predicted)]
learnNN.py — (continued)
174   nn1.add_layer(Sigmoid_layer(nn1))
175   nn1.learning_rate=0.1
176   #nn1.learn(100)
177
178   from learnLinear import plot_steps
179   import time
180   start_time = time.perf_counter()
181   plot_steps(learner = nn1, data = data, num_steps=10000)
182   for eg in data.train:
183       print(eg,nn1.predictor(eg))
184   end_time = time.perf_counter()
185   print("Time:", end_time - start_time)
      Exercise 7.9 In the definition of nn1 above, for each of the following, first hy-
      pothesize what will happen, then test your hypothesis, then explain whether you
      testing confirms your hypothesis or not. Test it for more than one data set, and use
      more than one run for each data set.
        (a) Which fits the data better, having a sigmoid layer or a ReLU layer after the
            first linear layer?
        (b) Which is faster, having a sigmoid layer or a ReLU layer after the first linear
            layer?
        (c) What happens if you have both the sigmoid layer and then a ReLU layer
            after the first linear layer and before the second linear layer?
       (d) What happens if you have neither the sigmoid layer nor a ReLU layer after
           the first linear layer?
        (e) What happens if you have a ReLU layer then a sigmoid layer after the first
            linear layer and before the second linear layer?
      Exercise 7.10 Do some
      It is even possible to define a perceptron layer. Warning: you may need to
      change the learning rate to make this work. Should I add it into the code? It
      doesn’t follow the official line.
      class PerceptronLayer(Layer):
          def __init__(self, nn):
              Layer.__init__(self, nn)
            def output_values(self,input_values):
                """Returns the outputs for the input values.
                """
                self.outputs= [1 if inp>0 else -1 for inp in input_values]
                return self.outputs
            def backprop(self,errors):
                """Pass the errors through"""
                return errors
     7.7     Boosting
     The following code implements functional gradient boosting for regression.
         A Boosted dataset is created from a base dataset by subtracting the pre-
     diction of the offset function from each example. This does not save the new
     dataset, but generates it as needed. The amount of space used is constant, in-
     dependent on the size of the data set.
                           learnBoosting.py — Functional Gradient Boosting
11   from learnProblem import Data_set, Learner
12
13   class Boosted_dataset(Data_set):
14       def __init__(self, base_dataset, offset_fun):
15           """new dataset which is like base_dataset,
16              but offset_fun(e) is subtracted from the target of each example e
17           """
18           self.base_dataset = base_dataset
19           self.offset_fun = offset_fun
20           Data_set.__init__(self, base_dataset.train, base_dataset.test,
21                           base_dataset.prob_test, base_dataset.target_index)
22
23      def create_features(self):
24          self.input_features = self.base_dataset.input_features
25          def newout(e):
26              return self.base_dataset.target(e) - self.offset_fun(e)
27          newout.frange = self.base_dataset.target.frange
28          self.target = newout
        A boosting learner takes in a dataset and a base learner, and returns a new
     predictor. The base learner, takes a dataset, and returns a Learner object.
                                   learnBoosting.py — (continued)
30   class Boosting_learner(Learner):
31       def __init__(self, dataset, base_learner_class):
32           self.dataset = dataset
33           self.base_learner_class = base_learner_class
34           mean = sum(self.dataset.target(e)
35                     for e in self.dataset.train)/len(self.dataset.train)
36           self.predictor = lambda e:mean # function that returns mean for each example
37           self.predictor.__doc__ = "lambda e:"+str(mean)
38           self.offsets = [self.predictor]
39           self.errors = [data.evaluate_dataset(data.test, self.predictor, "sum-of-squares")]
40           self.display(1,"Predict mean test set error=", self.errors[0] )
41
42
43      def learn(self, num_ensemble=10):
44          """adds num_ensemble learners to the ensemble.
45          returns a new predictor.
46          """
47          for i in range(num_ensemble):
48              train_subset = Boosted_dataset(self.dataset, self.predictor)
49              learner = self.base_learner_class(train_subset)
50              new_offset = learner.learn()
51              self.offsets.append(new_offset)
52              def new_pred(e, old_pred=self.predictor, off=new_offset):
53                  return old_pred(e)+off(e)
54              self.predictor = new_pred
55              self.errors.append(data.evaluate_dataset(data.test, self.predictor,"sum-of-squares"))
56              self.display(1,"After Iteration",len(self.offsets)-1,"test set error=", self.errors[-1])
57          return self.predictor
     For testing, sp DT learner returns a function that constructs a decision tree learner
     where the minimum number of examples is a proportion of the number of
     training examples. The value of 0.9 tends to have one split, and a value of 0.5
     tends to have two splits (but test it). Thus this can be used to construct small
     decision trees that can be used as weak learners.
                                   learnBoosting.py — (continued)
59   # Testing
60
61   from learnDT import DT_learner
62   from learnProblem import Data_set, Data_from_file
63
64   def sp_DT_learner(min_prop=0.9):
65       def make_learner(dataset):
66           mne = len(dataset.train)*min_prop
67           return DT_learner(dataset,min_number_examples=mne)
68       return make_learner
69
70   data = Data_from_file('data/carbool.csv', target_index=-1)
71   #data = Data_from_file('data/SPECT.csv', target_index=0)
72   #data = Data_from_file('data/mail_reading.csv', target_index=-1)
73   #data = Data_from_file('data/holiday.csv', num_train=19, target_index=-1)
74   learner9 = Boosting_learner(data, sp_DT_learner(0.9))
75   #learner7 = Boosting_learner(data, sp_DT_learner(0.7))
76   #learner5 = Boosting_learner(data, sp_DT_learner(0.5))
77   predictor9 =learner9.learn(10)
78   for i in learner9.offsets: print(i.__doc__)
79   import matplotlib.pyplot as plt
80
81   def plot_boosting(data,steps=10, thresholds=[0.5,0.1,0.01,0.001], markers=['-','--','-.',':'] ):
82       learners = [Boosting_learner(data, sp_DT_learner(th)) for th in thresholds]
83       predictors = [learner.learn(steps) for learner in learners]
84       plt.ion()
85       plt.xscale('linear') # change between log and linear scale
86       plt.xlabel("number of trees")
87       plt.ylabel(" error")
88       for (learner,(threshold,marker)) in zip(learners,zip(thresholds,markers)):
89           plt.plot(range(len(learner.errors)), learner.errors, ls=marker,c='k',
90                       label=str(round(threshold*100))+"% min example threshold")
91       plt.legend()
92       plt.draw()
93
94   # plot_boosting(data)
                                                 137
     138                                                           8. Reasoning Under Uncertainty
                                            A     B     C     Value
                                            0     a     s     v0
                                            0     a     t     v1
                                            0     b     s     v2
                                            0     b     t     v3
                                            0     c     s     v4
                                            0     c     t     v5
                                            1     a     s     v6
                                            1     a     t     v7
                                            1     b     s     v8
                                            1     b     t     v9
                                            1     c     s     v10
                                            1     c     t     v11
28
29         def __repr__(self):
30             return "Variable('"+self.name+"')"
     8.2        Factors
     Factors are functions from variables into values. The main problem with vari-
     able elimination is the amount of space used, because it saves the intermedi-
     ate factors. (If instead it recomputed factors rather than saving the factors, it
     would be effectively enumerating the worlds, and so would be exponential in
     the number of variables). We only want to store the list of numbers, with as
     little bookkeeping as possible.
          A total ordering of the variables, and a total ordering of the values in the
     domains of the variables induces a total ordering of the values of the factor
     according to the lexicographic ordering. E.g., suppose the domain of A is [0, 1],
     domain of B is [0 a0 ,0 b0 ,0 c0 ], and the domain of C is [0 s0 ,0 t0 ], the ordering [A, B, C]
     of variables induces an ordering on the values of the factor, as in Figure 8.1.
      We just need to store the list of variables and the vi s. For any assignment to A,
     B and C, we can compute the index of the value for that assignment. A = a, B =
     b, C = c is stored at location a0 ∗ 6 + b0 ∗ 2 + c0 , where a0 is A.val to index[a], and
     similarly for b0 and c0 .
                             probFactors.py — Factor manipulation for graphical models
11   from functools import reduce
12   #from probVariables import Variable
13
14   class Factor(object):
15       nextid=0 # each factor has a unique identifier; for printing
16
17      def __init__(self,variables):
18          """variables is the ordered list of variables
19          """
20          self.variables = variables # ordered list of variables
21          # Compute the size and the offsets for the variables
22          self.var_offsets = {}
23          self.size = 1
24          for i in range(len(variables)-1,-1,-1):
25              self.var_offsets[variables[i]]=self.size
26              self.size *= variables[i].size
27          self.id = Factor.nextid
28          self.name = "f"+str(self.id)
29          Factor.nextid += 1
     For each factor, get value returns the value of the factor for an assignment. An
     assignment is a variable:value dictionary. The assignment must include all of
     the variables involved in the factor, and can include variables not in the factor.
     This needs to be defined for every subclass.
                                   probFactors.py — (continued)
31      def get_value(self,assignment):
32          raise NotImplementedError("get_value") # abstract method
     The methods str and brief return string representations of the factor, as a table
     or just as a name with the variables it is a factor on.
                                   probFactors.py — (continued)
57            res = self.name+"("
58            for i in range(0,len(self.variables)-1):
59                res += str(self.variables[i])+","
60            if len(self.variables)>0:
61                res += str(self.variables[len(self.variables)-1])
62            res += ")"
63            return res
     The methods assignment to index and index to assignment map between the as-
     signments of values to variables and the index of where that assignment would
     be stored.
probFactors.py — (continued)
65         def assignment_to_index(self,assignment):
66             """returns the index where the variable:value assignment is stored"""
67             index = 0
68             for var in self.variables:
69                 index += var.val_to_index[assignment[var]]*self.var_offsets[var]
70             return index
71
72         def index_to_assignment(self,index):
73             """gives a dict representation of the variable assignment for index
74             """
75             asst = {}
76             for i in range(len(self.variables)-1,-1,-1):
77                 asst[self.variables[i]] = self.variables[i].domain[index % self.variables[i].size]
78                 index = index // self.variables[i].size
79             return asst
probFactors.py — (continued)
81   class Factor_stored(Factor):
82       def __init__(self,variables,values):
83           Factor.__init__(self, variables)
84           self.values = values
85
86         def get_value(self,assignment):
87             return self.values[self.assignment_to_index(assignment)]
probFactors.py — (continued)
89   class Factor_observed(Factor):
90       def __init__(self,factor,obs):
91           Factor.__init__(self, [v for v in factor.variables if v not in obs])
92           self.observed = obs
93           self.orig_factor = factor
 94
 95      def get_value(self,assignment):
 96          ass = assignment.copy()
 97          for ob in self.observed:
 98              ass[ob]=self.observed[ob]
 99          return self.orig_factor.get_value(ass)
      A Factor sum is a factor that is the result of summing out a variable from the
      product of other factors. Ie., it constructs a representation of:
            ∑ ∏              f.
            var f ∈factors
      We store the values in a list in a lazy manner; if they are already computed, we
      used the stored values. If they are not already computed we can compute and
      store them.
                                    probFactors.py — (continued)
      The method factor times multiples a set of factors that are all factors on the same
      variable (or on no variables). This is the last step in variable elimination before
      normalizing. It returns an array giving the product for each value of variable.
probFactors.py — (continued)
22   class Belief_network(Graphical_model):
23       """The class of belief networks."""
24
25         def __init__(self,vars=None,factors=None):
26             """vars is a list of variables
27             factors is a list of factors. Here we assume that all of the factors are instances of Prob.
28             """
29             Graphical_model.__init__(self,vars,factors)
30             assert all(isinstance(f,Prob) for f in factors) if factors else True
        Each of the inference methods implements the query method that com-
     putes the posterior probability of a variable given a dictionary of variable:value
     observations. These are all Displayable because they implement the display
     method which is currently text-based.
probGraphicalModels.py — (continued)
probGraphicalModels.py — (continued)
     The second Bayesian network is the report-of-leaving example from Poole and
     Mackworth, Artificial Intelligence, 2010 http://artint.info. This is Example
     6.10 (page 236) shown in Figure 6.1.
probGraphicalModels.py — (continued)
59   Le   =   Variable("Leaving", boolean)
60   Re   =   Variable("Report", boolean)
61   Sm   =   Variable("Smoke", boolean)
62   Ta   =   Variable("Tamper", boolean)
63
64   f_ta     =   Prob(Ta,[],[0.98,0.02])
65   f_fi     =   Prob(Fi,[],[0.99,0.01])
66   f_sm     =   Prob(Sm,[Fi],[0.99,0.01,0.1,0.9])
67   f_al     =   Prob(Al,[Fi,Ta],[0.9999, 0.0001, 0.15, 0.85, 0.01, 0.99, 0.5, 0.5])
68   f_lv     =   Prob(Le,[Al],[0.999, 0.001, 0.12, 0.88])
69   f_re     =   Prob(Re,[Le],[0.99, 0.01, 0.25, 0.75])
70
71   bn2 = Belief_network([Al,Fi,Le,Re,Sm,Ta],[f_ta,f_fi,f_sm,f_al,f_lv,f_re])
73
74   Season = Variable("Season",["summer","winter"])
75   Sprinkler = Variable("Sprinkler",["on","off"])
76   Rained = Variable("Rained",boolean)
77   Grass_wet = Variable("Grass wet",boolean)
78   Grass_shiny = Variable("Grass shiny",boolean)
79   Shoes_wet = Variable("Shoes wet",boolean)
80
81   f_season = Prob(Season,[],[0.5,0.5])
82   f_sprinkler = Prob(Sprinkler,[Season],[0.9,0.1,0.05,0.95])
83   f_rained = Prob(Rained,[Season],[0.7,0.3,0.2,0.8])
84   f_wet = Prob(Grass_wet,[Sprinkler,Rained], [1,0,0.1,0.9,0.2,0.8,0.02,0.98])
85   f_shiny = Prob(Grass_shiny, [Grass_wet], [0.95,0.05,0.3,0.7])
86   f_shoes = Prob(Shoes_wet, [Grass_wet], [0.92,0.08,0.35,0.65])
87
88   bn3 = Belief_network([Season, Sprinkler, Rained, Grass_wet, Grass_shiny, Shoes_wet],
89                 [f_season, f_sprinkler, f_rained, f_wet, f_shiny, f_shoes])
18         """
19         def __init__(self,gm=None):
20             self.gm = gm
21
22         def query(self,var,obs={},elim_order=None):
23             """computes P(var|obs) where
24             var is a variable
25             obs is a variable:value dictionary"""
26             if var in obs:
27                 return [1 if val == obs[var] else 0 for val in var.domain]
28             else:
29                 if elim_order == None:
30                     elim_order = self.gm.variables
31                 projFactors = [self.project_observations(fact,obs)
32                               for fact in self.gm.factors]
33                 for v in elim_order:
34                     if v != var and v not in obs:
35                         projFactors = self.eliminate_var(projFactors,v)
36                 unnorm = factor_times(var,projFactors)
37                 p_obs=sum(unnorm)
38                 self.display(1,"Unnormalized probs:",unnorm,"Prob obs:",p_obs)
39                 return {val:pr/p_obs for val,pr in zip(var.domain, unnorm)}
     To project observations onto a factor, for each variable that is observed in the
     factor, we construct a new factor that is the factor projected onto that variable.
     Factor observed creates a new factor that is the result is assigning a value to a
     single variable.
                                     probVE.py — (continued)
41         def project_observations(self,factor,obs):
42             """Returns the resulting factor after observing obs
43
44            obs is a dictionary of variable:value pairs.
45            """
46            if any((var in obs) for var in factor.variables):
47                # a variable in factor is observed
48                return Factor_observed(factor,obs)
49            else:
50                return factor
51
52         def eliminate_var(self,factors,var):
53             """Eliminate a variable var from a list of factors.
54             Returns a new set of factors that has var summed out.
55             """
56             self.display(2,"eliminating ",str(var))
57             contains_var = []
58             not_contains_var = []
59             for fac in factors:
60                 if var in fac.variables:
61                     contains_var.append(fac)
62                 else:
63                  not_contains_var.append(fac)
64          if contains_var == []:
65              return factors
66          else:
67              newFactor = Factor_sum(var,contains_var)
68              self.display(2,"Multiplying:",[f.brief() for f in contains_var])
69              self.display(2,"Creating factor:", newFactor.brief())
70              self.display(3, newFactor) # factor in detail
71              not_contains_var.append(newFactor)
72              return not_contains_var
73
74   from probGraphicalModels import bn1, A,B,C
75   bn1v = VE(bn1)
76   ## bn1v.query(A,{})
77   ## bn1v.query(C,{})
78   ## Inference_method.max_display_level = 3 # show more detail in displaying
79   ## Inference_method.max_display_level = 1 # show less detail in displaying
80   ## bn1v.query(A,{C:True})
81   ## bn1v.query(B,{A:True,C:False})
82
83   from probGraphicalModels import bn2,Al,Fi,Le,Re,Sm,Ta
84   bn2v = VE(bn2) # answers queries using variable elimination
85   ## bn2v.query(Ta,{})
86   ## Inference_method.max_display_level = 0 # show no detail in displaying
87   ## bn2v.query(Le,{})
88   ## bn2v.query(Ta,{},elim_order=[Sm,Re,Le,Al,Fi])
89   ## bn2v.query(Ta,{Re:True})
90   ## bn2v.query(Ta,{Re:True,Sm:False})
91
92   from probGraphicalModels import bn3, Season, Sprinkler, Rained, Grass_wet, Grass_shiny, Shoes_wet
93   bn3v = VE(bn3)
94   ## bn3v.query(Shoes_wet,{})
95   ## bn3v.query(Shoes_wet,{Rained:True})
96   ## bn3v.query(Shoes_wet,{Grass_shiny:True})
97   ## bn3v.query(Shoes_wet,{Grass_shiny:False,Rained:True})
     Exercise 8.1
        What is the time and space complexity the following 4 methods to generate n
     samples, where m is the length of dist:
      (a) n calls to sample one
      (b) sample multiple
       (c) Create the cumulative distribution (choose how this is represented) and, for
           each random number, do a binary search to determine the sample associated
           with the random number.
      (d) Choose a random number in the range [i/n, (i + 1)/n) for each i ∈ range(n),
           where n is the number of samples. Use these as the random numbers to
           select the particles. (Does this give random samples?)
53   class Sampling_inference_method(Inference_method):
54       """The abstract class of sampling-based belief network inference methods"""
55       def query(self,qvar,obs={},number_samples=1000,sample_order=None):
56           raise NotImplementedError("Sampling_inference_method query") # abstract
         Some of the sampling methods require a sample order of factors represent-
     ing conditional probabilities, where the parents of a node must come before
     the node in the sample order. The following method computes such a sample
     ordering, and is used when the sample order argument is None.
                                  probStochSim.py — (continued)
58   def select_sample_ordering(bn):
59       """creates a sample ordering of factors such that the parents of a node
60       are before the node.
61       raises StopIteration if there is no such ordering. This would occur in next(.).
62       """
63       sample_order=[]
64       defined = set() # set of variables whose probability is defined
65       factors_to_sample = bn.factors.copy()
66       while factors_to_sample:
67           fac = next(f for f in factors_to_sample
68                     if all(par in defined for par in f.parents))
 69             factors_to_sample.remove(fac)
 70             sample_order.append(fac)
 71             defined.add(fac.child)
 72         return sample_order
 74   class Rejection_sampling(Sampling_inference_method):
 75       """The class that queries Graphical Models using Rejection Sampling.
 76
 77         bn is a belief network to query
 78         """
 79         def __init__(self,bn=None):
 80             self.bn = bn
 81             self.label = "Rejection Sampling"
 82
 83         def query(self,qvar,obs={},number_samples=1000,sample_order=None):
 84             """computes P(qvar|obs) where
 85             qvar is a variable.
 86             obs is a variable:value dictionary.
 87             sample_order is a list of factors where factors defining the parents
 88               come before the factors for the child.
 89             """
 90             if sample_order is None:
 91                 sample_order = select_sample_ordering(self.bn)
 92             self.display(2,*[f.child for f in sample_order],sep="\t")
 93             counts = {val:0 for val in qvar.domain}
 94             for i in range(number_samples):
 95                 rejected = False
 96                 sample = {}
 97                 for fac in sample_order:
 98                     nvar = fac.child #next variable
 99                     val = sample_one(fac.cond_dist(sample))
100                     self.display(2,val,end="\t")
101                     if nvar in obs and obs[nvar] != val:
102                         rejected = True
103                         self.display(2,"Rejected")
104                         break
105                     sample[nvar] = val
106                 if not rejected:
107                     counts[sample[qvar]] += 1
108                     self.display(2,"Accepted")
109             tot = sum(counts.values())
110             return counts, {c:divide(v,tot) for (c,v) in counts.items()}
          It is possible that all samples get rejected. In that case, Python would give
      as a arithmetic error. Instead, we implement the convention that 0/0 = 1. You
      need to be careful is using these numbers as probabilities.
probStochSim.py — (continued)
154                self.display(2,weight)
155            tot = sum(counts)
156            return counts, {c:v/tot for (c,v) in counts.items()}
      Exercise 8.2 Change this algorithm so that it does importance sampling using a
      proposal distribution. It needs sample one using a different distribution and then
      update the weight of the current sample. For testing, use a proposal distribution
      that only specifies probabilities for some of the variables (and the algorithm uses
      the probabilities for the network in other cases).
191          self.display(2,weight)
192          return counts
      Resampling
      Resample is based on sample multiple but works with an array of particles.
      (Aside: Python doesn’t let us use sample multiple directly as it uses a dictionary,
      and particles, represented as dictionaries can’t be the key of dictionaries).
                                    probStochSim.py — (continued)
      8.5.6 Examples
                                    probStochSim.py — (continued)
229   ## bn2r.query(Ta,{Re:True},number_samples=100000)
230   ## bn2r.query(Ta,{Re:True,Sm:False})
231   ## bn2r.query(Ta,{Re:True,Sm:False},number_samples=100)
232
233   ## bn2L.query(Ta,{Re:True,Sm:False},number_samples=100)
234   ## bn2L.query(Ta,{Re:True,Sm:False},number_samples=100)
235
236
237   from probGraphicalModels import bn3,Season, Sprinkler
238   from probGraphicalModels import Rained, Grass_wet, Grass_shiny, Shoes_wet
239   bn3r = Rejection_sampling(bn3) # answers queries using rejection sampling
240   bn3L = Likelihood_weighting(bn3) # answers queries using rejection sampling
241   bn3p = Particle_filtering(bn3) # answers queries using particle filtering
242   #bn3r.query(Shoes_wet,{Grass_shiny:True,Rained:True})
243   #bn3L.query(Shoes_wet,{Grass_shiny:True,Rained:True})
244   #bn3p.query(Shoes_wet,{Grass_shiny:True,Rained:True})
      Exercise 8.3 This code keeps regenerating the distribution of a variable given
      its parents. Implement one or both of the following, and compare them to the
      original. Make cond dist return a slice that corresponds to the distribution, and
      then use the slice instead of the dictionary (a list slice does not generate new data
      structures). Make cond dist remember values it has already computed, and only
      return these.
247
248   def plot_stats(method, what, qvar, obs, number_samples=100, number_runs=1000):
249       """Plots a cumulative distribution of the prediction of the model.
250       method is a Sampling_inference_method (that implements appropriate query(.))
251       what is either "prob_ev" or the value of qvar to plot
252       qvar is the query variable
253       obs is the variable:value dictionary representing the observations
254       number_samples is the number of samples for each run
255       number_iterations is the number of runs that are plotted
256       """
257       plt.ion()
258       plt.xlabel("value")
259       plt.ylabel("Cumulative Number")
260       Inference_method.max_display_level, prev_max_display_level = 0, Inference_method.max_display_le
261       answers = [method.query(qvar,obs,number_samples=number_samples)
262                 for i in range(number_runs)]
263       if what == "prob_ev":
264           values = [sum(ans)/number_samples for ans in answers]
265           label = method.label+"(prob of evidence)"
266       else:
267           values = [divide(ans[qvar.val_to_index[what]],sum(ans)) for ans in answers]
268           label = method.label+" ("+str(qvar)+"="+str(what)+")"
269       values.sort()
270       plt.plot(values,range(number_runs),label=label)
271       plt.legend(loc="upper left")
272       plt.draw()
273       Inference_method.max_display_level = prev_max_display_level # restore display level
274
275
276   #   plot_stats(bn2r,False,Ta,{Re:True,Sm:False},number_samples=1000, number_runs=1000)
277   #   plot_stats(bn2L,False,Ta,{Re:True,Sm:False},number_samples=1000, number_runs=1000)
278   #   plot_stats(bn2r,False,Ta,{Re:True,Sm:False},number_samples=100, number_runs=1000)
279   #   plot_stats(bn2L,False,Ta,{Re:True,Sm:False},number_samples=100, number_runs=1000)
280   #   plot_stats(bn3r,True,Shoes_wet,{Grass_shiny:True,Rained:True},number_samples=1000)
281   #   plot_stats(bn3L,True,Shoes_wet,{Grass_shiny:True,Rained:True},number_samples=1000)
282   #   plot_stats(bn2r,"prob_ev",Ta,{Re:True,Sm:False},number_samples=1000, number_runs=1000)
283   #   plot_stats(bn2L,"prob_ev",Ta,{Re:True,Sm:False},number_samples=1000, number_runs=1000)
 11   import random
 12   from probGraphicalModels import Inference_method
 13
 14   from probStochSim import sample_one, Sampling_inference_method
 15
16   class Gibbs_sampling(Sampling_inference_method):
17       """The class that queries Graphical Models using Gibbs Sampling.
18
19         bn is a graphical model (e.g., a belief network) to query
20         """
21         def __init__(self,bn=None):
22             self.bn = bn
23             self.label = "Gibbs Sampling"
24
25         def query(self, qvar, obs={}, number_samples=1000, burn_in=100, sample_order=None):
26             """computes P(qvar|obs) where
27             qvar is a variable.
28             obs is a variable:value dictionary.
29             sample_order is a list of non-observed variables in order.
30             """
31             counts = {val:0 for val in qvar.domain}
32             if sample_order is not None:
33                 variables = sample_order
34             else:
35                 variables = [v for v in self.bn.variables if v not in obs]
36             var_to_factors = {v:set() for v in self.bn.variables}
37             for fac in self.bn.factors:
38                 for var in fac.variables:
39                     var_to_factors[var].add(fac)
40             sample = {var:random.choice(var.domain) for var in variables}
41             self.display(2,"Sample:",sample)
42             sample.update(obs)
43             for i in range(burn_in + number_samples):
44                 if sample_order == None:
45                     random.shuffle(variables)
46                 for var in variables:
47                     # get probability distribution of var given its neighbours
48                     vardist = {val:1 for val in var.domain}
49                     for val in var.domain:
50                         sample[var] = val
51                         for fac in var_to_factors[var]: # Markov blanket
52                             vardist[val] *= fac.get_value(sample)
53                     sample[var] = sample_one(vardist)
54                 if i >= burn_in:
55                     counts[sample[qvar]] +=1
56             tot = sum(counts.values())
57             return counts, {c:v/tot for (c,v) in counts.items()}
58
59   from probGraphicalModels import bn1, A,B,C
60   bn1g = Gibbs_sampling(bn1)
61   ## Inference_method.max_display_level = 2 # detailed tracing for all inference methods
62   bn1g.query(A,{})
63   ## bn1g.query(C,{})
64   ## bn1g.query(A,{C:True})
65   ## bn1g.query(B,{A:True,C:False})
66
67   from probGraphicalModels import bn2,Al,Fi,Le,Re,Sm,Ta
68   bn2g = Gibbs_sampling(bn2)
69   ## bn2g.query(Ta,{Re:True},number_samples=100000)
     Exercise 8.4 Change the code so that it can have multiple query variables. Make
     the list of query variable be an input to the algorithm, so that the default value is
     the list of all non-observed variables.
     Exercise 8.5 In this algorithm, explain where it computes the probability of a
     variable given its Markov blanket. Instead of returning the average of the samples
     for the query variable, it is possible to return the average estimate of the probabil-
     ity of the query variable given its Markov blanket. Does this converge to the same
     answer as the given code? Does it converge faster, slower, or the same?
probHMM.py — (continued)
29   # state
30   #       0=middle, 1,2,3 are corners
31   states1 = {'middle', 'c1', 'c2', 'c3'} # states
32   obs1 = {'m1','m2','m3'} # microphones
probHMM.py — (continued)
probHMM.py — (continued)
Initially the animal is in one of the four states, with equal probability.
probHMM.py — (continued)
probHMM.py — (continued)
 95   hmm1f1 = HMM_VE_filter(hmm1)
 96   # hmm1f1.filter([{'m1':0, 'm2':1, 'm3':1}, {'m1':1, 'm2':0, 'm3':1}])
 97   ## HMM_VE_filter.max_display_level = 2 # show more detail in displaying
 98   # hmm1f2 = HMM_VE_filter(hmm1)
 99   # hmm1f2.filter([{'m1':1, 'm2':0, 'm3':0}, {'m1':0, 'm2':1, 'm3':0}, {'m1':1, 'm2':0, 'm3':0},
100   #              {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':0},
101   #              {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':1}, {'m1':0, 'm2':0, 'm3':1},
102   #              {'m1':0, 'm2':0, 'm3':1}])
      Exercise 8.6 The localization example in the book is a controlled HMM, where
      there is a given action at each time and the transition depends on the action.
      Change the code to allow for controlled HMMs. Hint: the action only influences
      the state transition.
      Exercise 8.7 The representation assumes that there are a list of Boolean obser-
      vations. Extend the representation so that the each observation variable can have
      multiple discrete values. You need to choose a representation for the model, and
      change the algorithm.
134              self.resample_particles()
135              self.display(2,"After observing", str(obs),
136                            "state distribution:", self.histogram(self.particles))
137          self.display(1,"Final state distribution:", self.histogram(self.particles))
138          return self.histogram(self.particles)
139
140      def advance(self):
141          """advance to the next time.
142          This assumes that all of the weights are 1."""
143          self.particles = [sample_one(self.hmm.trans[st])
144                          for st in self.particles]
145
146      def observe(self, obs):
147          """reweight the particles to incorporate observations obs"""
148          for i in range(len(self.particles)):
149              for obv in obs:
150                  if obs[obv]:
151                      self.weights[i] *= self.hmm.pobs[obv][self.particles[i]]
152                  else:
153                      self.weights[i] *= 1-self.hmm.pobs[obv][self.particles[i]]
154
155      def histogram(self, particles):
156          """returns list of the probability of each state as represented by
157          the particles"""
158          tot=0
159          hist = {st: 0.0 for st in self.hmm.states}
160          for (st,wt) in zip(self.particles,self.weights):
161              hist[st]+=wt
162              tot += wt
163          return {st:hist[st]/tot for st in hist}
164
165      def resample_particles(self):
166          """resamples to give a new set of particles."""
167          self.particles = resample(self.particles, self.weights, len(self.particles))
168          self.weights = [1] * len(self.particles)
      Is it better or worse than particle filtering? Hint: you need to think about how
      they can be compared. Is the comparison different if there are more states than
      particles?
      Exercise 8.9 Extend the particle filtering code to continuous variables and ob-
      servations. In particular, suppose the state transition is a linear function with
      Gaussian noise of the previous state, and the observations are linear functions
      with Gaussian noise of the state. You may need to research how to sample from a
      Gaussian distribution.
probHMM.py — (continued)
         There are a number of ways that reasoning can be carried out in a DBN,
     including:
        • Rolling out the DBN for some time period, and using standard belief net-
          work inference. The latest time that needs to be in the rolled out network
          is the time of the latest observation or the time of a query (whichever is
          later). This allows us to observe any variables at any time and query any
          variables at any time. However, the unrolled Bayesian network may be
          very large. We also need to construct multiple copies of each feature.
30            else:
31                return self.index<other.index
32
33         def __gt__(self,other):
34             return other<self
35
36         def __str__(self):
37   #         if self.index==1:
38   #             return self.name
39   #         else:
40                return self.name+"_"+str(self.index)
41
42         __repr__ = __str__
43
44   def variable_pair(name,domain=[False,True]):
45       """returns a variable and its predecessor. This is used to define 2-stage DBNs
46
47         If the name is X, it returns the pair of variables X0,X"""
48         var = DBN_variable(name,domain)
49         var0 = DBN_variable(name,domain,index=0)
50         var.previous = var0
51         return var0, var
probDBN.py — (continued)
53   class DBN(Displayable):
54       """The class of stationary Dynamic Bayesian networks.
55
56         * vars1 is a list of current variables (each must have
57         previous variable).
58         * transition_factors is a list of factors for P(X|parents) where X
59         is a current variable and parents is a list of current or previous variables.
60         * init_factors is a list of factors for P(X|parents) where X is a
61         current variable and parents can only include current variables
62         The graph of transition factors + init factors must be acyclic.
63
64         """
65         def __init__(self,vars1, transition_factors=None, init_factors=None):
66             self.vars1 = vars1
67             self.vars0 = [v.previous for v in vars1]
68             self.transition_factors = transition_factors
69             self.init_factors = init_factors
70             self.var_index = {}     # var_index[v] is the index of variable v
71             for i,v in enumerate(vars1):
72                 self.var_index[v]=i
     Here is a 3 variable DBN:
                                     probDBN.py — (continued)
74   A0,A1 = variable_pair("A")
75   B0,B1 = variable_pair("B")
76   C0,C1 = variable_pair("C")
 77
 78   # dynamics
 79   pc = Prob(C1,[B1,C0],[0.03,0.97,0.38,0.62,0.23,0.77,0.78,0.22])
 80   pb = Prob(B1,[A0,A1],[0.5,0.5,0.77,0.23,0.4,0.6,0.83,0.17])
 81   pa = Prob(A1,[A0,B0],[0.1,0.9,0.65,0.35,0.3,0.7,0.8,0.2])
 82
 83   # initial distribution
 84   pa0 = Prob(A1,[],[0.9,0.1])
 85   pb0 = Prob(B1,[A1],[0.3,0.7,0.8,0.2])
 86   pc0 = Prob(C1,[],[0.2,0.8])
 87
 88   dbn1 = DBN([A1,B1,C1],[pa,pb,pc],[pa0,pb0,pc0])
      Here is the animal example
                                    probDBN.py — (continued)
 90   from probHMM import closeMic, farMic, midMic, sm, mmc, sc, mcm, mcc
 91
 92   Pos_0,Pos_1 =   variable_pair("Position",domain=[0,1,2,3])
 93   Mic1_0,Mic1_1   = variable_pair("Mic1")
 94   Mic2_0,Mic2_1   = variable_pair("Mic2")
 95   Mic3_0,Mic3_1   = variable_pair("Mic3")
 96
 97   # conditional probabilities - see hmm for the values of sm,mmc, etc
 98   ppos = Prob(Pos_1, [Pos_0],
 99              [sm, mmc, mmc, mmc, #was in middle
100               mcm, sc, mcc, mcc, #was in corner 1
101               mcm, mcc, sc, mcc, #was in corner 2
102               mcm, mcc, mcc, sc]) #was in corner 3
103   pm1 = Prob(Mic1_1, [Pos_1], [1-midMic, midMic, 1-closeMic, closeMic,
104                             1-farMic, farMic, 1-farMic, farMic])
105   pm2 = Prob(Mic2_1, [Pos_1], [1-midMic, midMic, 1-farMic, farMic,
106                             1-closeMic, closeMic, 1-farMic, farMic])
107   pm3 = Prob(Mic3_1, [Pos_1], [1-midMic, midMic, 1-farMic, farMic,
108                             1-farMic, farMic, 1-closeMic, closeMic])
109   ipos = Prob(Pos_1,[], [0.25, 0.25, 0.25, 0.25])
110   dbn_an =DBN([Pos_1,Mic1_1,Mic2_1,Mic3_1],
111              [ppos, pm1, pm2, pm3],
112              [ipos, pm1, pm2, pm3])
probDBN.py — (continued)
124            """
125            assert all(self.current_obs[var]==obs[var] for var in obs
126                      if var in self.current_obs),"inconsistent current observations"
127            self.current_obs.update(obs)
128
129         def query(self,var):
130             """returns the posterior probability of current variable var"""
131             return VE(Graphical_model(self.dbn.vars1,self.current_factors)).query(var,self.current_obs)
132
133         def advance(self):
134             """advance to the next time"""
135             prev_factors = [self.make_previous(fac) for fac in self.current_factors]
136             prev_obs = {var.previous:val for var,val in self.current_obs.items()}
137             two_stage_factors = prev_factors + self.dbn.transition_factors
138             self.current_factors = self.elim_vars(two_stage_factors,self.dbn.vars0,prev_obs)
139             self.current_obs = {}
140
141         def make_previous(self,fac):
142             """Creates new factor from fac where the current variables in fac
143             are renamed to previous variables.
144             """
145             return Factor_rename(fac, {var.previous:var for var in fac.variables})
146
147         def elim_vars(self,factors, vars, obs):
148             for var in vars:
149                 if var in obs:
150                     factors = [self.project_observations(fac,obs) for fac in factors]
151                 else:
152                     factors = self.eliminate_var(factors, var)
153             return factors
      Example queries:
                                     probDBN.py — (continued)
155   df = DBN_VE_filter(dbn1)
156   #df.observe({B1:True}); df.advance(); df.observe({C1:False})
157   #df.query(B1)
158   #df.advance()
159   #df.query(B1)
160   dfa = DBN_VE_filter(dbn_an)
161   # dfa.observe({Mic1_1:0, Mic2_1:1, Mic3_1:1})
162   # dfa.advance()
163   # dfa.observe({Mic1_1:1, Mic2_1:0, Mic3_1:1})
164   # dfa.query(Pos_1)
25   class DecisionVariable(Variable):
26       def __init__(self,name,domain,parents):
                                                   167
     168                                                       9. Planning with Uncertainty
27            Variable.__init__(self,name,domain)
28            self.parents = parents
29            self.all_vars = set(parents) | {self}
     A decision network is a graphical model where the variables can be random
     variables or decision variables. In the factors we assume there is one utility
     factor.
                                   decnNetworks.py — (continued)
31   class DecisionNetwork(Graphical_model):
32       def __init__(self,vars=None,factors=None):
33           """vars is a list of variables
34           factors is a list of factors (instances of Prob and Utility)
35           """
36           Graphical_model.__init__(self,vars,factors)
     VE DN is variable elimination for decision networks. The method optimize is
     used to optimize all the decisions. Note that optimize requires a legal emimina-
     tion ordering of the random and decision variables, otherwise it will give an
     exception. (A decision node can only be maximized if the variables that are not
     its parents have already been eliminated.)
                                   decnNetworks.py — (continued)
decnNetworks.py — (continued)
 70   class Factor_max(Factor_stored):
 71       """A factor obtained by maximizing a variable in a factor.
 72       Also builds a decision_function. This is based on Factor_sum.
 73       """
 74
 75      def __init__(self, dvar, factor):
 76          """dvar is a decision variable.
 77          factor is a factor that contains dvar and only parents of dvar
 78          """
 79          self.dvar = dvar
 80          self.factor = factor
 81          vars = [v for v in factor.variables if v is not dvar]
 82          Factor_stored.__init__(self,vars,None)
 83          self.values = [None]*self.size
 84          self.decision_fun = Factor_DF(dvar,vars,[None]*self.size)
 85
 86      def get_value(self,assignment):
 87          """lazy implementation: if saved, return saved value, else compute it"""
 88          index = self.assignment_to_index(assignment)
 89          if self.values[index]:
 90              return self.values[index]
 91          else:
 92              max_val = float("-inf") # -infinity
 93              new_asst = assignment.copy()
 94              for elt in self.dvar.domain:
 95                  new_asst[self.dvar] = elt
 96                  fac_val = self.factor.get_value(new_asst)
 97                  if fac_val>max_val:
 98                      max_val = fac_val
 99                      best_elt = elt
100              self.values[index] = max_val
101              self.decision_fun.values[index] = best_elt
102              return max_val
decnNetworks.py — (continued)
                                                 Smoke
                          Alarm
Leaving See_smoke
Check_smoke
Report Call
decnNetworks.py — (continued)
      The following is the representation of the cheating decision of Figure 9.2. Note
      that we keep the names of the variables short (less than 8 characters) so that
      tables Python prints look good.
Watched Punishment
Caught1 Caught2
Grade1 Grade2
Final Grade
decnNetworks.py — (continued)
160
161   cheat_dn = DecisionNetwork([Pun,CC2,Wa,GrF,Gr2,Gr1,Ch2,CC1,Ch1],
162                       [p_wa, p_cc1, p_cc2, p_pun, p_gr1, p_gr2,p_fg,utc])
163
164   # VE_DN.max_display_level = 3 # if you want to show lots of detail
165   # v,p = VE_DN(cheat_dn).optimize(); print(v)
166   # for df in p: print(df,"\n") # print decision functions
22
23   # reward[s][a] gives the expected reward of doing a in state s.
24   reward2 = ((7,10),(0,2))
25
26   healthy2 = MDP(['Healthy','Sick'], ['Relax','Party'], trans2, reward2, discount=0.8)
        Tiny game from Example 11.7 and Figure 11.8 of Poole and Mackworth,
     2010:
                                   mdpExamples.py — (continued)
28   ## Tiny Game from Example 11.7 and Figure 11.8 of Poole and Mackworth, 2010 #
29
30   # actions      up               right        upC           left
31   transt = (((0.1,0.1,0.8,0,0,0), (0,1,0,0,0,0), (0,0,1,0,0,0), (1,0,0,0,0,0)), #s0
32           ((0.1,0.1,0,0.8,0,0), (0,1,0,0,0,0), (0,0,0,1,0,0), (1,0,0,0,0,0)),
     #s1
33           ((0,0,0.1,0.1,0.8,0), (0,0,0,1,0,0), (0,0,0,0,1,0), (0,0,1,0,0,0)),
     #s2
34           ((0,0,0.1,0.1,0,0.8), (0,0,0,1,0,0), (0,0,0,0,0,1), (0,0,1,0,0,0)),
     #s3
35           ((0.1,0,0,0,0.8,0.1), (0,0,0,0,0,1), (0,0,0,0,1,0), (1,0,0,0,0,0)),
     #s4
36           ((0,0,0,0,0.1,0.9), (0,0,0,0,0,1), (0,0,0,0,0,1), (0,0,0,0,1,0)) ) #s5
37
38   # actions    up rt upC left
39   rewardt = ((-0.1, 0, -1, -1),        #s0
40             (-0.1, -1, -2,  0),        #s1
41             (-10, 0, -1, -100),        #s2
42             (-0.1, -1, -1,  0),        #s3
43             (-1,   0, -2, 10),         #s4
44             (-1, -1, -2,    0))        #s5
45
46   mdpt = MDP(['s0','s1','s2','s3','s4','s5'], # states
47             ['up', 'right', 'upC', 'left'], # actions
48             transt, rewardt, discount=0.9)
28       def vi1(self,v):
29            """carry out one iteration of value iteration and
30            returns a value function (a list of a value for each state).
31            v is the previous value function.
32             """
33             return [max([self.reward[s][a]+self.discount*product(self.trans[s][a],v)
34                         for a in range(len(self.actions))])
35                    for s in range(len(self.states))]
36
37         def vi(self,v0,n):
38             """carries out n iterations of value iteration starting with value v0.
39
40            Returns a value function
41            """
42            val = self.v0
43            for i in range(n):
44               val= self.vi1(val)
45            return val
46
47         def policy(self,v):
48             """returns an optimal policy assuming the next value function is v
49                v is a list of values for each state
50                returns a list of the indexes of optimal actions for each state
51             """
52             return [argmax(enumerate([self.reward[s][a]+self.discount*product(self.trans[s][a],v)
53                                     for a in range(len(self.actions))]))
54                     for s in range(len(self.states))]
55
56         def q(self,v):
57             """returns the one-step-lookahead q-value assuming the next value function is v
58             v is a list of values for each state
59             returns a list of q values for each state. so that q[s][a] represents Q(s,a)
60             """
61             return [[self.reward[s][a]+self.discount*product(self.trans[s][a],v)
62                      for a in range(len(self.actions))]
63                     for s in range(len(self.states))]
mdpProblem.py — (continued)
65   def product(l1,l2):
66       """returns the dot product of l1 and l2"""
67       return sum([i1*i2 for (i1,i2) in zip(l1,l2)])
           The following gives a trace for the examples:
                                    mdpExamples.py — (continued)
50   def trace(mdp,numiter):
51       print("Q values are shown as",[[st+"_"+ac for ac in mdp.actions] for st in mdp.states])
52       print("One step lookahead Q-values:")
53       print(mdp.q(mdp.v0))
54       print("Values are for the states:", mdp.states)
55       print("One step lookahead values:")
56       print(mdp.vi(mdp.v0,1))
57       print("Two step lookahead Q-values:")
58       print(mdp.q(mdp.vi(mdp.v0,1)))
59       print("Two step lookahead values:")
60      print(mdp.vi(mdp.v0,2))
61      vfin = mdp.vi(mdp.v0,numiter)
62      print("After",numiter,"iterations, values:")
63      print(vfin)
64      print("After",numiter,"iterations, Q-values:")
65      print(mdp.q(vfin))
66      print("After",numiter,"iterations, Policy:",
67            [st+"->"+mdp.actions[act] for (st,act) in zip(mdp.states ,mdp.policy(vfin))])
68
69   # Try the following:
70   # trace(healthy2,10)
     Exercise 9.1 Implement value iteration that stores the Q-values rather than the
     V-values. Does it work better than storing V? (What might better mean?)
     Exercise 9.2 Implement asynchronous value iteration. Try a number of differ-
     ent ways to choose the states and actions to update (e.g., sweeping through the
     state-action pairs, choosing them at random). Note that the best way may be to
     determine which states have had their Q-values change the most, and then update
     the previous ones, but that is not so straightforward to implement, because you
     need to find those previous states.
     10.1       K-means
     The k-means learner maintains two lists that suffice as sufficient statistics to
     classify examples, and to learn the classification:
        • class counts is a list such that class counts[c] is the number of examples in
          the training set with class = c.
        • feature sum is a list such that feature sum[i][c] is sum of the values for the
          i’th feature i for members of class c. The average value of the ith feature
          in class i is
                feature sum[i][c]
                 class counts[c]
                                                  177
     178                                                    10. Learning with Uncertainty
35         def distance(self,cl,eg):
36             """distance of the eg from the mean of the class"""
37             return sum( (self.class_prediction(ind,cl)-feat(eg))**2
38                            for (ind,feat) in enumerate(self.dataset.input_features))
39
40         def class_prediction(self,feat_ind,cl):
41             """prediction of the class cl on the feature with index feat_ind"""
42             if self.class_counts[cl] == 0:
43                 return 0 # there are no examples so we can choose any value
44             else:
45                 return self.feature_sum[feat_ind][cl]/self.class_counts[cl]
46
47         def class_of_eg(self,eg):
48             """class to which eg is assigned"""
49             return (min((self.distance(cl,eg),cl)
50                           for cl in range(self.num_classes)))[1]
51                   # second element of tuple, which is a class with minimum distance
     One step of k-means updates the class counts and feature sum. It uses the old
     values to determine the classes, and so the new values for class counts and
     feature sum. At the end it determines whether the values of these have changes,
     and then replaces the old ones with the new ones. It returns an indicator of
     whether the values are stable (have not changed).
                                   learnKMeans.py — (continued)
53         def k_means_step(self):
54             """Updates the model with one step of k-means.
55             Returns whether the assignment is stable.
56             """
57             new_class_counts = [0]*self.num_classes
58             # feature_sum[i][c] is the sum of the values of feature i for class c
59             new_feature_sum = [[0]*self.num_classes
60                               for feat in self.dataset.input_features]
 61         for eg in self.dataset.train:
 62             cl = self.class_of_eg(eg)
 63             new_class_counts[cl] += 1
 64             for (ind,feat) in enumerate(self.dataset.input_features):
 65                 new_feature_sum[ind][cl] += feat(eg)
 66         stable = (new_class_counts == self.class_counts) and (self.feature_sum == new_feature_sum)
 67         self.class_counts = new_class_counts
 68         self.feature_sum = new_feature_sum
 69         self.num_iterations += 1
 70         return stable
 71
 72
 73      def learn(self,n=100):
 74          """do n steps of k-means, or until convergence"""
 75          i=0
 76          stable = False
 77          while i<n and not stable:
 78              stable = self.k_means_step()
 79              i += 1
 80              self.display(1,"Iteration",self.num_iterations,
 81                             "class counts: ",self.class_counts," Stable=",stable)
 82          return stable
 83
 84      def show_classes(self):
 85          """sorts the data by the class and prints in order.
 86          For visualizing small data sets
 87          """
 88          class_examples = [[] for i in range(self.num_classes)]
 89          for eg in self.dataset.train:
 90              class_examples[self.class_of_eg(eg)].append(eg)
 91          print("Class","Example",sep='\t')
 92          for cl in range(self.num_classes):
 93              for eg in class_examples[cl]:
 94                  print(cl,*eg,sep='\t')
 95
 96      def plot_error(self, maxstep=20):
 97          """Plots the sum-of-suares error as a function of the number of steps"""
 98          plt.ion()
 99          plt.xlabel("step")
100          plt.ylabel("Ave sum-of-squares error")
101          train_errors = []
102          if self.dataset.test:
103              test_errors = []
104          for i in range(maxstep):
105              self.learn(1)
106              train_errors.append( sum(self.distance(self.class_of_eg(eg),eg)
107                                        for eg in self.dataset.train)
108                                 /len(self.dataset.train))
109              if self.dataset.test:
110                  test_errors.append( sum(self.distance(self.class_of_eg(eg),eg)
      Exercise 10.1 Change boolean features = True flag to allow for numerical features.
      K-means assumes the features are numerical, so we want to make non-numerical
      features into numerical features (using characteristic functions) but we probably
      don’t want to change numerical features into Boolean.
      Exercise 10.2 If there are many classes, some of the classes can become empty
      (e.g., try 100 classes with carbool.csv). Implement a way to put some examples
      into a class, if possible. Two ideas are:
          (a) Initialize the classes with actual examples, so that the classes will not start
              empty. (Do the classes become empty?)
          (b) In class prediction, we test whether the code is empty, and make a prediction
              of 0 for an empty class. It is possible to make a different prediction to “steal”
              an example (but you should make sure that a class has a consistent value for
              each feature in a loop).
      Make your own suggestions, and compare it with the original, and whichever of
      these you think may work better.
     10.2        EM
     In the following definition, a class, c, is a integer in range [0, num classes). i is
     an index of a feature, so feat[i] is the ith feature, and a feature is a function from
     tuples to values. val is a value of a feature.
         A model consists of 2 lists, which form the sufficient statistics:
        • class counts is a list such that class counts[c] is the number of tuples with
          class = c, where each tuple is weighted by its probability, i.e.,
        • feature counts is a list such that feature counts[i][val][c] is the weighted count
          of the number of tuples t with feat[i](t) = val and class(t) = c, each tuple
          is weighted by its probability, i.e.,
                                       learnEM.py — EM Learning
11   from learnProblem import Data_set, Learner, Data_from_file
12   import random
13   import math
14   import matplotlib.pyplot as plt
15
16   class EM_learner(Learner):
17       def __init__(self,dataset, num_classes):
18           self.dataset = dataset
19           self.num_classes = num_classes
20           self.class_counts = None
21           self.feature_counts = None
     The function em step goes though the training examples, and updates these
     counts. The first time it is run, when there is no model, it uses random distri-
     butions.
                                        learnEM.py — (continued)
33                    tpl_class_dist = random_dist(self.num_classes)
34                for cl in range(self.num_classes):
35                    class_counts[cl] += tpl_class_dist[cl]
36                    for (ind,feat) in enumerate(self.dataset.input_features):
37                        feature_counts[ind][feat(tple)][cl] += tpl_class_dist[cl]
38            return class_counts, feature_counts
     prob computes the probability of a class for a tuple, given the current statistics.
     len(self .dataset) is a constant (independent of c). class counts[c] can be taken out
     of the product, but needs to be raised to the power of the number of features,
     and one of them cancels.
                                        learnEM.py — (continued)
40         def prob(self,tple,class_counts,feature_counts):
41             """returns a distribution over the classes for the original tuple in the current model
42             """
43             feats = self.dataset.input_features
44             unnorm = [prod(feature_counts[i][feat(tple)][c]
45                            for (i,feat) in enumerate(feats))/(class_counts[c]**(len(feats)-1))
46                        for c in range(self.num_classes)]
47             thesum = sum(unnorm)
48             return [un/thesum for un in unnorm]
     learn does n steps of EM:
                                        learnEM.py — (continued)
50         def learn(self,n):
51             """do n steps of em"""
52             for i in range(n):
53                 self.class_counts,self.feature_counts = self.em_step(self.class_counts,
54                                                                 self.feature_counts)
     The following is for visualizing the classes. It prints the dataset ordered by the
     probability of class c.
                                        learnEM.py — (continued)
56         def show_class(self,c):
57             """sorts the data by the class and prints in order.
58             For visualizing small data sets
59             """
60             sorted_data = sorted((self.prob(tpl,self.class_counts,self.feature_counts)[c],
61                                 ind, # preserve ordering for equal probabilities
62                                 tpl)
63                                for (ind,tpl) in enumerate(self.dataset.train))
64             for cc,r,tpl in sorted_data:
65                 print(cc,*tpl,sep='\t')
     where cc is the class count and fc is feature count. len(self .dataset) can be dis-
     tributed out of the sum, and cc[c] can be taken out of the product:
                        1                    1
           =
               len(self .dataset)   ∑ cc[c]#feats−1 ∗ ∏ fc[i][feati (tple)][c]
                                     c                    i
     Given the probability of each tuple, we can evaluate the logloss, as the negative
     of the log probability:
                                          learnEM.py — (continued)
67      def logloss(self,tple):
68          """returns the logloss of the prediction on tple, which is -log(P(tple))
69          based on the current class counts and feature counts
70          """
71          feats = self.dataset.input_features
72          res = 0
73          cc = self.class_counts
74          fc = self.feature_counts
75          for c in range(self.num_classes):
76              res += prod(fc[i][feat(tple)][c]
77                         for (i,feat) in enumerate(feats))/(cc[c]**(len(feats)-1))
78          if res>0:
79              return -math.log2(res/len(self.dataset.train))
80          else:
81              return float("inf") #infinity
82
83      def plot_error(self, maxstep=20):
84          """Plots the logloss error as a function of the number of steps"""
85          plt.ion()
86          plt.xlabel("step")
87          plt.ylabel("Ave Logloss (bits)")
88          train_errors = []
89          if self.dataset.test:
90              test_errors = []
91          for i in range(maxstep):
92              self.learn(1)
93              train_errors.append( sum(self.logloss(tple) for tple in self.dataset.train)
94                                 /len(self.dataset.train))
95              if self.dataset.test:
96                  test_errors.append( sum(self.logloss(tple) for tple in self.dataset.test)
97                                     /len(self.dataset.test))
98          plt.plot(range(1,maxstep+1),train_errors,
      Exercise 10.3 For the EM data, where there are naturally 2 classes, 3 classes does
      better on the training set after a while than 2 classes, but worse on the test set.
      Explain why. Hint: look what the 3 classes are. Use ”em3.show class(i)” for each
      of the classes i ∈ [0, 3).
      Exercise 10.4 Write code to plot the logloss as a function of the number of classes
      (from 1 to say 15) for a fixed number of iterations. (From the experience with the
      existing code, think about how many iterations is appropriate.)
Multiagent Systems
     11.1      Minimax
     Here we consider two-player zero-sum games. Here a player only wins when
     another player loses. This can be modeled as where there is a single utility
     which one agent (the maximizing agent) is trying minimize and the other agent
     (the minimizing agent) is trying to minimize.
                                              185
     186                                                         11. Multiagent Systems
30         def children(self):
31             """returns the list of all children."""
32             return self.allchildren
33
34         def evaluate(self):
35             """returns the evaluation for this node if it is a leaf"""
36             return self.value
     The following gives the tree from Figure 10.5 of the book. Note how 888 is used
     as a value here, but never appears in the trace.
                                   masProblem.py — (continued)
38   fig10_5 = Node("a",True,None, [
39              Node("b",False,None, [
40                  Node("d",True,None, [
41                      Node("h",False,None, [
42                          Node("h1",True,7,None),
43                          Node("h2",True,9,None)]),
44                      Node("i",False,None, [
45                          Node("i1",True,6,None),
46                          Node("i2",True,888,None)])]),
47                  Node("e",True,None, [
48                      Node("j",False,None, [
49                          Node("j1",True,11,None),
50                          Node("j2",True,12,None)]),
51                      Node("k",False,None, [
52                          Node("k1",True,888,None),
53                          Node("k2",True,888,None)])])]),
54              Node("c",False,None, [
55                  Node("f",True,None, [
56                      Node("l",False,None, [
57                          Node("l1",True,5,None),
58                          Node("l2",True,888,None)]),
59                      Node("m",False,None, [
60                          Node("m1",True,4,None),
61                          Node("m2",True,888,None)])]),
62                  Node("g",True,None, [
63                      Node("n",False,None, [
64                          Node("n1",True,888,None),
65                          Node("n2",True,888,None)]),
66                      Node("o",False,None, [
67                          Node("o1",True,888,None),
68                          Node("o2",True,888,None)])])])])
     The following is a representation of a magic-sum game, where players take
     turns picking a number in the range [1, 9], and the first player to have 3 num-
     bers that sum to 15 wins. Note that this is a syntactic variant of tic-tac-toe or
     naughts and crosses. To see this, consider the numbers on a magic square (Fig-
     ure 11.1); 3 numbers that add to 15 correspond exactly to the winning positions
     of tic-tac-toe played on the magic square.
         Note that we do not remove symmetries. (What are the symmetries? How
                                           6    1    8
                                           7    5    3
                                           2    9    4
masProblem.py — (continued)
 70
 71   class Magic_sum(Node):
 72       def __init__(self, xmove=True, last_move=None,
 73                   available=[1,2,3,4,5,6,7,8,9], x=[], o=[]):
 74           """This is a node in the search for the magic-sum game.
 75           xmove is True if the next move belongs to X.
 76           last_move is the number selected in the last move
 77           available is the list of numbers that are available to be chosen
 78           x is the list of numbers already chosen by x
 79           o is the list of numbers already chosen by o
 80           """
 81           self.isMax = self.xmove = xmove
 82           self.last_move = last_move
 83           self.available = available
 84           self.x = x
 85           self.o = o
 86           self.allchildren = None #computed on demand
 87           lm = str(last_move)
 88           self.name = "start" if not last_move else "o="+lm if xmove else "x="+lm
 89
 90      def children(self):
 91          if self.allchildren is None:
 92              if self.xmove:
 93                  self.allchildren = [
 94                      Magic_sum(xmove = not self.xmove,
 95                               last_move = sel,
 96                               available = [e for e in self.available if e is not sel],
 97                               x = self.x+[sel],
 98                               o = self.o)
 99                             for sel in self.available]
100              else:
101                  self.allchildren = [
102                      Magic_sum(xmove = not self.xmove,
103                               last_move = sel,
104                               available = [e for e in self.available if e is not sel],
105                               x = self.x,
106                               o = self.o+[sel])
107                             for sel in self.available]
108          return self.allchildren
109
31                 min_score = score
32                 min_path = C.name,path
33          return min_score,min_path
        The following is a depth-first minimax with α-β pruning. It returns the
     value for a node as well as a best path for the agents.
                                masMiniMax.py — (continued)
35   def minimax_alpha_beta(node,alpha,beta,depth=0):
36       """node is a Node, alpha and beta are cutoffs, depth is the depth
37       returns value, path
38       where path is a sequence of nodes that results in the value"""
39       node.display(2," "*depth,"minimax_alpha_beta(",node.name,", ",alpha, ", ", beta,")")
40       best=None      # only used if it will be pruned
41       if node.isLeaf():
42           node.display(2," "*depth,"returning leaf value",node.evaluate())
43           return node.evaluate(),None
44       elif node.isMax:
45           for C in node.children():
46               score,path = minimax_alpha_beta(C,alpha,beta,depth+1)
47               if score >= beta: # beta pruning
48                   node.display(2," "*depth,"pruned due to beta=",beta,"C=",C.name)
49                   return score, None
50               if score > alpha:
51                   alpha = score
52                   best = C.name, path
53           node.display(2," "*depth,"returning max alpha",alpha,"best",best)
54           return alpha,best
55       else:
56           for C in node.children():
57               score,path = minimax_alpha_beta(C,alpha,beta,depth+1)
58               if score <= alpha: # alpha pruning
59                   node.display(2," "*depth,"pruned due to alpha=",alpha,"C=",C.name)
60                   return score, None
61               if score < beta:
62                   beta=score
63                   best = C.name,path
64           node.display(2," "*depth,"returning min beta",beta,"best=",best)
65           return beta,best
        Testing:
                                masMiniMax.py — (continued)
76   ##             ).timeit(number=1)
77   ## trace=False
78   ## timeit.Timer("minimax_alpha_beta(Magic_sum(), -9999, 9999,0)",
79   ##             setup="from __main__ import minimax_alpha_beta, Magic_sum"
80   ##             ).timeit(number=1)
Reinforcement Learning
25   class Healthy_env(RL_env):
26       def __init__(self):
27           RL_env.__init__(self,["party","relax"], "healthy")
28
29      def do(self, action):
                                                   191
     192                                                         12. Reinforcement Learning
49   class Env_from_MDP(RL_env):
50       def __init__(self, mdp):
51           initial_state = mdp.states[0]
52           RL_env.__init__(self,mdp.actions, initial_state)
53           self.mdp = mdp
54           self.action_index = {action:index for (index,action) in enumerate(mdp.actions)}
55           self.state_index = {state:index for (index,state) in enumerate(mdp.states)}
56
57         def do(self, action):
58             """updates the state based on the agent doing action.
59             returns state,reward
60             """
61             action_ind = self.action_index[action]
62             state_ind = self.state_index[self.state]
63             self.state = pick_from_dist(self.mdp.trans[state_ind][action_ind], self.mdp.states)
64             reward = self.mdp.reward[state_ind][action_ind]
65             return self.state, reward
66
67   def pick_from_dist(dist,values):
68       """
4 P1 R P2
3 M
2 M
1 M M M
0 P3 P4
0 1 2 3 4
 77              else:
 78                  self.y += 1
 79          elif actual_direction == "down":
 80              if self.y==0:
 81                  reward += self.crashed_reward
 82              else:
 83                  self.y += -1
 84          else:
 85              raise RuntimeError("unknown_direction "+str(direction))
 86
 87          # Monsters
 88          if (self.x,self.y) in self.monster_locs and flip(self.monster_appears_prob):
 89              if self.damaged:
 90                  reward += self.monster_reward_when_damaged
 91              else:
 92                  self.damaged = True
 93          if (self.x,self.y) in self.repair_stations:
 94              self.damaged = False
 95
 96          # Prizes
 97          if (self.x,self.y) == self.prize:
 98              reward += self.prize_reward
 99              self.prize = None
100
101          # Statistics
102          self.number_steps += 1
103          self.total_reward += reward
104          if self.total_reward < self.min_reward:
105              self.min_reward = self.total_reward
106              self.min_step = self.number_steps
107          if self.total_reward>0 and reward>self.total_reward:
108              self.zero_crossing = self.number_steps
109          self.display(2,"",self.number_steps,self.total_reward,
110                       self.total_reward/self.number_steps,sep="\t")
111
112          return (self.x, self.y, self.damaged, self.prize), reward
     12.2      Q Learning
        To run the Q-learning demo, in folder “aipython”, load “rlQTest.py”,
        and copy and paste the example queries at the bottom of that file. This
        assumes Python 3.
                                 rlQLearner.py — Q Learning
11   import random
12   from utilities import Displayable, argmax, flip
13
14   class RL_agent(Displayable):
15       """An RL_Agent
16       has percepts (s, r) for some state s and real reward r
17       """
rlQLearner.py — (continued)
19   class Q_learner(RL_agent):
20       """A Q-learning agent has
21       belief-state consisting of
22           state is the previous state
23           q is a {(state,action):value} dict
24           visits is a {(state,action):n} dict. n is how many times action was done in state
25           acc_rewards is the accumulated reward
26
27      it observes (s, r) for some world-state s and real reward r
28      """
rlQLearner.py — (continued)
50            self.qinit = qinit
51            self.label = label
52            self.restart()
     restart is used to make the learner relearn everything. This is used by the plot-
     ter to create new plots.
                                    rlQLearner.py — (continued)
54         def restart(self):
55             """make the agent relearn, and reset the accumulated rewards
56             """
57             self.acc_rewards = 0
58             self.state = self.env.state
59             self.q = {}
60             self.visits = {}
     do takes in the number of steps.
                                    rlQLearner.py — (continued)
62         def do(self,num_steps=100):
63             """do num_steps of interaction with the environment"""
64             self.display(2,"s\ta\tr\ts'\tQ")
65             alpha = self.alpha
66             for i in range(num_steps):
67                 action = self.select_action(self.state)
68                 next_state,reward = self.env.do(action)
69                 if not self.fixed_alpha:
70                     k = self.visits[(self.state, action)] = self.visits.get((self.state, action),0)+1
71                     alpha = self.alpha_fun(k)
72                 self.q[(self.state, action)] = (
73                     (1-alpha) * self.q.get((self.state, action),self.qinit)
74                     + alpha * (reward + self.discount
75                                       * max(self.q.get((next_state, next_act),self.qinit)
76                                            for next_act in self.actions)))
77                 self.display(2,self.state, action, reward, next_state,
78                             self.q[(self.state, action)], sep='\t')
79                 self.state = next_state
80                 self.acc_rewards += reward
     select action us used to select the next action to perform. This can be reimple-
     mented to give a different exploration strategy.
                                    rlQLearner.py — (continued)
        • q[s, a] is dictionary that, given a (s, a) pair returns the Q-value, the esti-
          mate of the future (discounted) value of being in state s and doing action
          a.
        • r[s, a] is dictionary that, given a (s, a) pair returns the average reward
          from doing a in state s.
        • visits[s, a] is dictionary that, given a (s, a) pair returns the number of times
          action a was carried out in state s.
        • res states[s, a] is dictionary that, given a (s, a) pair returns the list of re-
          sulting states that have occurred when action a was carried out in state s.
          This is used in the asynchronous value iteration to determine the s0 states
          to sum over.
        • visits list is a list of (s, a) pair that have been carried out. This is used
          to ensure there is no divide-by zero in the asynchronous value iteration.
          Note that this could be constructed from r, visits or res states by enumer-
          ating the keys, but needs to be a list for random.choice, and we don’t want
          to keep recreating it.
rlModelLearner.py — (continued)
38      def restart(self):
39          """make the agent relearn, and reset the accumulated rewards
40          """
41          self.acc_rewards = 0
42          self.state = self.env.state
43          self.q = {}           # {(st,action):q_value} map
44          self.r = {}           # {(st,action):reward} map
45          self.t = {}           # {(st,action,st_next):count} map
46          self.visits = {}      # {(st,action):count} map
47          self.res_states = {} # {(st,action):set_of_states} map
48          self.visits_list = [] # list of (st,action)
49          self.previous_action = None
rlModelLearner.py — (continued)
51      def do(self,num_steps=100):
52          """do num_steps of interaction with the environment
53          for each action, do updates_per_step iterations of asynchronous value iteration
54          """
55          for step in range(num_steps):
56              pst = self.state # previous state
57              action = self.select_action(pst)
58              self.state,reward = self.env.do(action)
59              self.acc_rewards += reward
60              self.t[(pst,action,self.state)] = self.t.get((pst, action,self.state),0)+1
61              if (pst,action) in self.visits:
62                  self.visits[(pst,action)] += 1
63                  self.r[(pst,action)] += (reward-self.r[(pst,action)])/self.visits[(pst,action)]
64                  self.res_states[(pst,action)].add(self.state)
65              else:
66                  self.visits[(pst,action)] = 1
67                  self.r[(pst,action)] = reward
68                  self.res_states[(pst,action)] = {self.state}
69                  self.visits_list.append((pst,action))
70              st,act = pst,action    #initial state-action pair for AVI
71              for update in range(self.updates_per_step):
72                   self.q[(st,act)] = self.r[(st,act)]+self.discount*(
73                       sum(self.t[st,act,rst]/self.visits[st,act]*
74                           max(self.q.get((rst,nact),self.qinit) for nact in self.actions)
75                           for rst in self.res_states[(st,act)]))
76                   st,act = random.choice(self.visits_list)
rlModelLearner.py — (continued)
rlModelLearner.py — (continued)
     Exercise 12.3 If there was only one update per step, the algorithm can be made
     simpler and use less space. Explain how. Does it make it more efficient? Is it
     worthwhile having more than one update per step for the games implemented
     here?
     Exercise 12.4 It is possible to implement the model-based reinforcement learner
     by replacing q, r, visits, res states with a single dictionary that returns a tuple
     (q, r, v, tm) where q, r and v are numbers, and tm is a map from resulting states
     into counts. Does this make the algorithm easier to understand? Does this make
     the algorithm more efficient?
     Exercise 12.5 If the states and the actions were mapped into integers, the dictio-
     naries could be implemented more efficiently as arrays. This entails an extra step
     in specifying problems. Implement this for the simple game. Is it more efficient?
     list of all feature values for that state and action. This feature set is redesigned
     for each problem.
          get features(state, action) returns the feature values appropriate for the sim-
     ple game.
 55         return features
 56
 57   def monster_ahead(x,y,action):
 58       """returns 1 if the location expected to get to by doing
 59       action from (x,y) can contain a monster.
 60       """
 61       if action == "right" and (x+1,y) in Simple_game_env.monster_locs:
 62           return 1
 63       elif action == "left" and (x-1,y) in Simple_game_env.monster_locs:
 64           return 1
 65       elif action == "up" and (x,y+1) in Simple_game_env.monster_locs:
 66           return 1
 67       elif action == "down" and (x,y-1) in Simple_game_env.monster_locs:
 68           return 1
 69       else:
 70           return 0
 71
 72   def wall_ahead(x,y,action):
 73       """returns 1 if there is a wall in the direction of action from (x,y).
 74       This is complicated by the internal walls.
 75       """
 76       if action == "right" and (x==Simple_game_env.xdim-1 or (x,y) in Simple_game_env.vwalls):
 77           return 1
 78       elif action == "left" and (x==0 or (x-1,y) in Simple_game_env.vwalls):
 79           return 1
 80       elif action == "up" and y==Simple_game_env.ydim-1:
 81           return 1
 82       elif action == "down" and y==0:
 83           return 1
 84       else:
 85           return 0
 86
 87   def towards_prize(x,y,action,p):
 88       """action goes in the direction of the prize from (x,y)"""
 89       if p is None:
 90           return 0
 91       elif p==(0,4): # take into account the wall near the top-left prize
 92           if action == "left" and (x>1 or x==1 and y<3):
 93               return 1
 94           elif action == "down" and (x>0 and y>2):
 95               return 1
 96           elif action == "up" and (x==0 or y<2):
 97               return 1
 98           else:
 99               return 0
100       else:
101           px,py = p
102           if p==(4,4) and x==0:
103               if (action=="right" and y<3) or (action=="down" and y>2) or (action=="up" and y<2):
104                   return 1
105              else:
106                  return 0
107          if (action == "up" and y<py) or (action == "down" and py<y):
108              return 1
109          elif (action == "left" and px<x) or (action == "right" and x<px):
110              return 1
111          else:
112              return 0
113
114   def towards_repair(x,y,action):
115       """returns 1 if action is towards the repair station.
116       """
117       if action == "up" and (x>0 and y<4 or x==0 and y<2):
118           return 1
119       elif action == "left" and x>1:
120           return 1
121       elif action == "right" and x==0 and y<3:
122           return 1
123       elif action == "down" and x==0 and y>2:
124           return 1
125       else:
126           return 0
127
128   def simp_features(state,action):
129       """returns a list of feature values for the state-action pair
130       """
131       assert action in Simple_game_env.actions
132       (x,y,d,p) = state
133       # f1: would go to a monster
134       f1 = monster_ahead(x,y,action)
135       # f2: would crash into wall
136       f2 = wall_ahead(x,y,action)
137       # f3: action is towards a prize
138       f3 = towards_prize(x,y,action,p)
139       return [1,f1,f2,f3]
46         def restart(self):
47             """make the agent relearn, and reset the accumulated rewards
48             """
49             self.acc_rewards = 0
50             self.state = self.env.state
51             self.features = self.get_features(self.state, list(self.env.actions)[0])
52             self.weights = [self.winit for f in self.features]
53             self.action = self.select_action(self.state)
     do takes in the number of steps.
                                    rlFeatures.py — (continued)
55         def do(self,num_steps=100):
56             """do num_steps of interaction with the environment"""
57             self.display(2,"s\ta\tr\ts'\tQ\tdelta")
58             for i in range(num_steps):
59                 next_state,reward = self.env.do(self.action)
60                 self.acc_rewards += reward
61                 next_action = self.select_action(next_state)
62                 feature_values = self.get_features(self.state,self.action)
      Test code:
                                     rlFeatures.py — (continued)
      Exercise 12.6 How does the step-size affect performance? Try different step sizes
      (e.g., 0.1, 0.001, other sizes in between). Explain the behaviour you observe. Which
     step size works best for this example. Explain what evidence you are basing your
     prediction on.
     Exercise 12.7 Does having extra features always help? Does it sometime help?
     Does whether it helps depend on the step size? Give evidence for your claims.
     Exercise 12.8 For each of the following first predict, then plot, then explain the
     behavour you observed:
       (a) SARSA LFA, Model-based learning (with 1 update per step) and Q-learning
           for 10,000 steps 20% exploring followed by 10,000 steps 100% exploiting
      (b) SARSA LFA, model-based learning and Q-learning for
             i) 100,000 steps 20% exploring followed by 100,000 steps 100% exploit
             ii) 10,000 steps 20% exploring followed by 190,000 steps 100% exploit
       (c) Suppose your goal was to have the best accumulated reward after 200,000
           steps. You are allowed to change the exploration rate at a fixed number of
           steps. For each of the methods, which is the best position to start exploiting
           more? Which method is better? What if you wanted to have the best reward
           after 10,000 or 1,000 steps?
     Based on this evidence, explain when it is preferable to use SARSA LFA, Model-
     based learner, or Q-learning.
        Important: you need to run each algorithm more than once. Your explanation
     should include the variability as well as the typical behavior.
Relational Learning
                                                     209
     210                                                              13. Relational Learning
31                 self.test_ratings = self.rating_set.test_ratings
32             else:
33                 self.test_ratings = test_subset
34             self.step_size = step_size
35             self.reglz = reglz
36             self.num_properties = num_properties
37             self.num_ratings = len(self.ratings)
38             self.ave_rating = (sum(r for (u,i,r,t) in self.ratings)
39                              /self.num_ratings)
40             self.users = {u for (u,i,r,t) in self.ratings}
41             self.items = {i for (u,i,r,t) in self.ratings}
42             self.user_bias = {u:0 for u in self.users}
43             self.item_bias = {i:0 for i in self.items}
44             self.user_prop = {u:[random.uniform(-property_range,property_range)
45                                 for p in range(num_properties)]
46                                for u in self.users}
47             self.item_prop = {i:[random.uniform(-property_range,property_range)
48                                 for p in range(num_properties)]
49                                for i in self.items}
50             self.zeros = [0 for p in range(num_properties)]
51             self.iter=0
52
53         def stats(self):
54             self.display(1,"ave sumsq error of mean for training=",
55                      sum((self.ave_rating-rating)**2 for (user,item,rating,timestamp)
56                          in self.ratings)/len(self.ratings))
57             self.display(1,"ave sumsq error of mean for test=",
58                      sum((self.ave_rating-rating)**2 for (user,item,rating,timestamp)
59                          in self.test_ratings)/len(self.test_ratings))
60             self.display(1,"error on training set",
61                         self.evaluate(self.ratings))
62             self.display(1,"error on test set",
63                         self.evaluate(self.test_ratings))
relnCollFilt.py — (continued)
65         def prediction(self,user,item):
66             """Returns prediction for this user on this item.
67             The use of .get() is to handle users or items not in the training set.
68             """
69             return (self.ave_rating
70                    + self.user_bias.get(user,0) #self.user_bias[user]
71                    + self.item_bias.get(item,0) #self.item_bias[item]
72                    + sum([self.user_prop.get(user,self.zeros)[p]*self.item_prop.get(item,self.zeros)[p]
73                           for p in range(self.num_properties)]))
74
75         def learn(self, num_iter = 50):
76             """ do num_iter iterations of gradient descent."""
77             for i in range(num_iter):
78                 self.iter += 1
 79              abs_error=0
 80              sumsq_error=0
 81              for (user,item,rating,timestamp) in random.sample(self.ratings,len(self.ratings)):
 82                  error = self.prediction(user,item) - rating
 83                  abs_error += abs(error)
 84                  sumsq_error += error * error
 85                  self.user_bias[user] -= self.step_size*error
 86                  self.item_bias[item] -= self.step_size*error
 87                  for p in range(self.num_properties):
 88                      self.user_prop[user][p] -= self.step_size*error*self.item_prop[item][p]
 89                      self.item_prop[item][p] -= self.step_size*error*self.user_prop[user][p]
 90              for user in self.users:
 91                   self.user_bias[user] -= self.step_size*self.reglz* self.user_bias[user]
 92                   for p in range(self.num_properties):
 93                       self.user_prop[user][p] -= self.step_size*self.reglz*self.user_prop[user][p]
 94              for item in self.items:
 95                  self.item_bias[item] -= self.step_size*self.reglz*self.item_bias[item]
 96                  for p in range(self.num_properties):
 97                      self.item_prop[item][p] -= self.step_size*self.reglz*self.item_prop[item][p]
 98              self.display(1,"Iteration",self.iter,
 99                    "(Ave Abs,AveSumSq) training =",self.evaluate(self.ratings),
100                    "test =",self.evaluate(self.test_ratings))
         evaluate evaluates current predictions on the rating set:
                                      relnCollFilt.py — (continued)
      13.1.2 Plotting
                                      relnCollFilt.py — (continued)
163                            self.item_prop[i][p],
164                            str(r))
165           else:
166               for i in range(num_points):
167                   (u,i,r,t) = random.choice(self.ratings)
168                   plt.text(self.user_prop[u][p],
169                           self.item_prop[i][p],
170                           str(r))
171           plt.show()
249 #learner1.plot_property(0,plot_all=True)
                                       217
218                                                                         Index
sampling, 147
    importance sampling, 152
    belief networks, 149
    likelihood weighting, 151
    particle filtering, 152
    rejection, 150
scope, 49
search, 31
    A∗ , 39
    branch-and-bound, 44
    multiple path pruning, 43
search with any conflict, 65
search with var pq, 67
sigmoid, 123
stochastic local search, 63
    any-conflict, 65
    two-stage choice, 66
stochastic simulation, 147
test
     SLS, 71
tic-tac-toe, 186
top-down proof, 77
uncertainty, 137
unit tests, 17
updatable priority queue, 68
yield, 12