Python AI Programming Guide
https://aipython.org https://artint.info
    ©David L Poole and Alan K Mackworth 2017-2023.
    All code is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
    This document and all the code can be downloaded from
https://artint.info/AIPython/ or from https://aipython.org
    The authors and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
AIPython contains runnable code for the book Artificial Intelligence: Foundations of Computational Agents, 3rd Edition [Poole and Mackworth, 2023].
1.4       Pitfalls
It is important to know when side effects occur. Often AI programs consider
what would/might happen given certain conditions. In many such cases, we
don’t want side effects. When an agent acts in the world, side effects are ap-
propriate.
     In Python, you need to be careful to understand side effects. For example,
the inexpensive function to add an element to a list, namely append, changes the
list. In a functional language like Haskell or Lisp, adding a new element to a
list, without changing the original list, is a cheap operation. For example, if x is
a list containing n elements, adding an extra element to the list in Python (using
append) is fast, but it has the side effect of changing the list x. To construct a new
list that contains the elements of x plus a new element, without changing the
value of x, entails copying the list, or using a different representation for lists.
In the searching code, we will use a different representation for lists for this
reason.
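For example, the following minimal illustration (not from the AIPython distribution) shows the difference between mutating a list and constructing a new one:

x = [1, 2, 3]
y = x              # y is another name for the same list
x.append(4)        # fast, but changes the list that both x and y refer to
print(y)           # [1, 2, 3, 4]
z = x + [5]        # builds a new list by copying x; x is unchanged
print(x, z)        # [1, 2, 3, 4] [1, 2, 3, 4, 5]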
The comprehension [fe for e in iter if cond] enumerates the values fe for each e in iter for which cond is true. The “if cond” part is optional, but the “for” and “in” are not. Here e is a variable (or a pattern that can be on the left side of =), iter is an iterator, which can generate a stream of data, such as a list, a set, a range object (to enumerate integers in a range), or a file; cond is an expression that evaluates to either True or False for each e; and fe is an expression that is evaluated for each value of e for which cond returns True. (See https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries.)
    The result can go in a list, be used in another iteration, or be called
directly using next. The procedure next takes an iterator and returns the next
element (advancing the iterator); it raises a StopIteration exception if there is
no next element. The following shows a simple example, where user input is
prepended with >>>
>>>   [e*e for e in range(20) if e%2==0]
[0,   4, 16, 36, 64, 100, 144, 196, 256, 324]
>>>   a = (e*e for e in range(20) if e%2==0)
>>>   next(a)
0
>>> next(a)
4
>>> next(a)
16
>>> list(a)
[36, 64, 100, 144, 196, 256, 324]
>>> next(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Notice how list(a) continued the enumeration and consumed it to the end.
    Comprehensions can also be used for dictionaries. The following code cre-
ates an index for list a:
>>> a = ["a","f","bar","b","a","aaaaa"]
>>> ind = {a[i]:i for i in range(len(a))}
>>> ind
{'a': 4, 'f': 1, 'bar': 2, 'b': 3, 'aaaaa': 5}
>>> ind['b']
3
which means that 'b' is at index 3 of the list.
  The assignment of ind could also have been written as:
>>> ind = {val:i for (i,val) in enumerate(a)}
where enumerate is a built-in function that, given a list (or other iterable),
returns an iterator of (index, value) pairs.
The name of the file is given in the gray text above the listing. The numbers correspond to the line numbers in that file.
     Try to predict, and then test, the output of the following calls, remembering
     that the function uses the latest value of any variable that is not bound in
     the function call:
pythonDemo.py — (continued)
29   # in Shell do
30   ## ipython -i pythonDemo.py
31   # Try these (copy text after the comment symbol and paste in the Python
         prompt):
32   # print([f(10) for f in fun_list1])
33   # print([f(10) for f in fun_list2])
34   # print([f(10) for f in fun_list3])
35   # print([f(10) for f in fun_list4])
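     The definitions of fun_list1 to fun_list4 are elided above; the following is a
     minimal sketch consistent with the description below, not necessarily the
     distribution's exact code:

fun_list1 = []
for i in range(5):
    def fun1(e):
        return e + i            # i is looked up when fun1 is called (late binding)
    fun_list1.append(fun1)

fun_list2 = []
for i in range(5):
    def fun2(e, iv=i):          # the default value iv is computed when fun2
        return e + iv           # is defined (early binding)
    fun_list2.append(fun2)

fun_list3 = [lambda e: e + i for i in range(5)]          # late binding
fun_list4 = [lambda e, iv=i: e + iv for i in range(5)]   # early binding

print([f(10) for f in fun_list1])   # [14, 14, 14, 14, 14]; i is 4 after the loop
print([f(10) for f in fun_list2])   # [10, 11, 12, 13, 14]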
     In the first for-loop, the function fun1 uses i, whose value is the last value it
     was assigned. In the second loop, the function fun2 uses iv; there is a separate
     iv variable for each function, and its value is the value of i when the function
     was defined. Thus fun1 uses late binding and fun2 uses early binding. fun_list3
     and fun_list4 are equivalent to the first two (except that fun_list4 uses a
     different i variable).
         One of the advantages of using the embedded definitions (as in fun1 and
     fun2 above) over lambda is that it is possible to add a __doc__ string, which
     is the standard way to document functions in Python, to the embedded
     definitions.
     1.5.4 Generators
     Python has generators which can be used for a form of lazy evaluation – only
     computing values when needed.
         The yield command returns a value that is obtained with next. It is typi-
     cally used to enumerate the values for a for loop or in generators. (The yield
     command can also be used for coroutines, but AIPython only uses it for gener-
     ators.)
         A version of the built-in range, with 2 or 3 arguments (and positive steps)
     can be implemented as:
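     That listing is elided here; a sketch of such a generator (assuming only
     positive steps, as stated) is:

def myrange(start, stop, step=1):
    """enumerates the values from start, in steps of size step,
    that are less than stop"""
    assert step > 0, f"only positive steps implemented: {step}"
    i = start
    while i < stop:
        yield i
        i += step

print(list(myrange(2, 30, 3)))   # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]

     The generator ga below, from pythonDemo.py, yields the squares of the even
     nonnegative integers less than n: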
                                   pythonDemo.py — (continued)
49   def ga(n):
50       """generates square of even nonnegative integers less than n"""
51       for e in range(n):
52           if e%2==0:
53               yield e*e
54   a = ga(20)
     The sequence of next(a) and list(a) calls gives exactly the same results as the
     comprehension in Section 1.5.2.
        It is straightforward to write a version of the built-in enumerate called myenumerate:
                                   pythonDemo.py — (continued)
56   def myenumerate(enum):
57       for i in range(len(enum)):
58           yield i,enum[i]
Exercise 1.2 Write a version of enumerate where the only iteration is “for val in
enum”. Hint: keep track of the index.
     At the end of pythonDemo.py are some commented-out commands you should try in
     interactive mode. Cut from the file and paste into Python (and remember to
     remove the comment symbols and leading space).
     1.7        Utilities
     1.7.1 Display
     In this distribution, to keep things simple and to only use standard Python, we
     use a text-oriented tracing of the code. A graphical depiction of the code can
     override the definition of display (see SearcherGUI in Section 3.2.2 and ConsistencyGUI
     in Section 4.4.2).
         The method self.display is used to trace the program. Any call

         self.display(level, to print, ...)

     where the level is less than or equal to the value of max_display_level will be
     printed. The to print ... can be anything that is accepted by the built-in print
     (including any keyword arguments).
         The definition of display is:
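     The listing is elided here; a sketch of the Displayable class, consistent with
     the description below, is:

class Displayable(object):
    """Class that provides display. The amount of detail printed is
    controlled by max_display_level."""
    max_display_level = 1   # can be overridden in subclasses or instances

    def display(self, level, *args, **nargs):
        """print the arguments if level is less than or equal to the
        current max_display_level (on the instance or the class)"""
        if level <= self.max_display_level:
            print(*args, **nargs)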
     Note that args gets a tuple of the positional arguments, and nargs gets a
     dictionary of the keyword arguments. This will not work in Python 2, where it
     gives an error.
        Any class that wants to use display can be made a subclass of Displayable.
        To change the maximum display level to say 3, for a class do:
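     For a class named, say, ClassName (a stand-in name), that is:

ClassName.max_display_level = 3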
     which will make calls to display in that class print when the value of level is
     less than or equal to 3. The default display level is 1. It can also be changed
     for individual objects (the object value overrides the class value).
        The value of max_display_level by convention is:

0 display nothing

2 also display the values as they change (little detail through a loop)
display.py — (continued)
26   def visualize(func):
27       """A decorator for algorithms that do interactive visualization.
28       Ignored here.
29       """
30       return func
     1.7.2 Argmax
      Python has a built-in max function that takes a generator (or a list or set) and
      returns the maximum value. The argmax method returns the index of an element
      that has the maximum value. If there are multiple elements with the maximum
      value, one of the indexes to that value is returned at random. argmaxe
      assumes an enumeration: a generator of (element, value) pairs, such as is
      generated by the built-in enumerate(list) for lists or dict.items() for
      dictionaries.
                                 utilities.py — AIPython useful utilities
11   import random
12   import math
13
14   def argmaxall(gen):
15       """gen is a generator of (element,value) pairs, where value is a real.
16       argmaxall returns a list of all of the elements with maximal value.
17       """
18       maxv = -math.inf      # negative infinity
19       maxvals = []     # list of maximal elements
20       for (e,v) in gen:
21           if v>maxv:
22               maxvals,maxv = [e], v
23           elif v==maxv:
24               maxvals.append(e)
25       return maxvals
26
27   def argmaxe(gen):
28       """gen is a generator of (element,value) pairs, where value is a real.
29       argmaxe returns an element with maximal value.
30       If there are multiple elements with the max value, one is returned at
             random.
31       """
32       return random.choice(argmaxall(gen))
33
34   def argmax(lst):
35       """returns maximum index in a list"""
36       return argmaxe(enumerate(lst))
37   # Try:
38   # argmax([1,6,3,77,3,55,23])
39
40   def argmaxd(dct):
41      """returns the arg max of a dictionary dct"""
42      return argmaxe(dct.items())
43   # Try:
 44   # argmaxd({2:5,5:9,7:7})
     Exercise 1.3 Change argmax to have an optional argument that specifies whether
     you want the “first”, “last” or a “random” index of the maximum value returned.
     If you want the first or the last, you don’t need to keep a list of the maximum
     elements.
     1.7.3 Probability
     For many of the simulations, we want to make a variable True with some prob-
     ability. flip(p) returns True with probability p, and otherwise returns False.
                                     utilities.py — (continued)
45   def flip(prob):
46       """return true with probability prob"""
47       return random.random() < prob
         The pick_from_dist method takes an item:probability dictionary, and returns
      one of the items in proportion to its probability.
                                     utilities.py — (continued)
49   def pick_from_dist(item_prob_dist):
50       """ returns a value from a distribution.
51       item_prob_dist is an item:probability dictionary, where the
52           probabilities sum to 1.
53       returns an item chosen in proportion to its probability
54       """
55       ranreal = random.random()
56       for (it,prob) in item_prob_dist.items():
57           if ranreal < prob:
58               return it
59           else:
60               ranreal -= prob
61       raise RuntimeError(f"{item_prob_dist} is not a probability
             distribution")
      The dict_union(d1, d2) function returns a new dictionary containing the keys
      of both d1 and d2, without side effects on either:

utilities.py — (continued)
63   def dict_union(d1,d2):
64       """returns a dictionary that contains the keys of d1 and d2.
65      The value for each key that is in d2 is the value from d2,
66      otherwise it is the value from d1.
67      This does not have side effects.
68      """
69      d = dict(d1) # copy d1
70      d.update(d2)
71      return d
utilities.py — (continued)
73   def test():
74       """Test part of utilities"""
75       assert argmax([1,6,55,3,55,23]) in [2,4]
76       assert dict_union({1:4, 2:5, 3:4},{5:7, 2:9}) == {1:4, 2:9, 3:4, 5:7}
77       print("Passed unit test in utilities")
78
79   if __name__ == "__main__":
80       test()
      The following does a simple check of all of AIPython that has automatic checks.
      If you develop new algorithms or tests, add them here!
utilities.py — (continued)
82   def test_aipython():
83       # Agents: currently no tests
84       # Search:
85       print("***** testing Search *****")
86       import searchGeneric, searchBranchAndBound, searchExample, searchTest
87       searchGeneric.test(searchGeneric.AStarSearcher)
88       searchBranchAndBound.test(searchBranchAndBound.DF_branch_and_bound)
89       searchTest.run(searchExample.problem1,"Problem 1")
90       # CSP
91       print("\n***** testing CSP *****")
92       import cspExamples, cspDFS, cspSearch, cspConsistency, cspSLS
93       cspExamples.test_csp(cspDFS.dfs_solve1)
94       cspExamples.test_csp(cspSearch.solver_from_searcher)
95       cspExamples.test_csp(cspConsistency.ac_solver)
96       cspExamples.test_csp(cspConsistency.ac_search_solver)
 97        cspExamples.test_csp(cspSLS.sls_solver)
 98        cspExamples.test_csp(cspSLS.any_conflict_solver)
 99        # Propositions
100        print("\n***** testing Propositional Logic *****")
101        import logicBottomUp, logicTopDown, logicExplain, logicNegation
102        logicBottomUp.test()
103        logicTopDown.test()
104        logicExplain.test()
105        logicNegation.test()
106        # Planning
107        print("\n***** testing Planning *****")
108        import stripsHeuristic
109        stripsHeuristic.test_forward_heuristic()
110        stripsHeuristic.test_regression_heuristic()
111        # Learning
112        print("\n***** testing Learning *****")
113        import learnProblem, learnNoInputs, learnDT, learnLinear
114        learnNoInputs.test_no_inputs(training_sizes=[4])
115        data = learnProblem.Data_from_file('data/carbool.csv', target_index=-1,
               seed=123)
116        learnDT.testDT(data, print_tree=False)
117        learnLinear.test()
118        # Deep Learning: currently no tests
119        # Uncertainty
120        print("\n***** testing Uncertainty *****")
121        import probGraphicalModels, probRC, probVE, probStochSim
122        probGraphicalModels.InferenceMethod.testIM(probRC.ProbSearch)
123        probGraphicalModels.InferenceMethod.testIM(probRC.ProbRC)
124        probGraphicalModels.InferenceMethod.testIM(probVE.VE)
125        probGraphicalModels.InferenceMethod.testIM(probStochSim.RejectionSampling,
               threshold=0.1)
126        probGraphicalModels.InferenceMethod.testIM(probStochSim.LikelihoodWeighting,
               threshold=0.1)
127        probGraphicalModels.InferenceMethod.testIM(probStochSim.ParticleFiltering,
               threshold=0.1)
128        probGraphicalModels.InferenceMethod.testIM(probStochSim.GibbsSampling,
               threshold=0.1)
129        # Learning under uncertainty: currently no tests
130        # Causality: currently no tests
131        import decnNetworks
132        decnNetworks.test(decnNetworks.fire_dn)
 133        # Reinforcement Learning: currently no tests
134        # Multiagent systems: currently no tests
135        # Relational Learning: currently no tests
                                        23
     24                             2. Agent Architectures and Hierarchical Control
          In this implementation, the state of the agent and the state of the environ-
      ment are represented using standard Python variables, which are updated as
      the state changes. The percept and the actions are represented as variable:value
      dictionaries. When an agent has only a limited number of actions, the action
      can be a single value.
         In the following code raise NotImplementedError() is a way to specify
     an abstract method that needs to be overridden in any implemented agent or
     environment.
                                 agents.py — Agent and Controllers
11   from display import Displayable
12
13   class Agent(Displayable):
14
15        def initial_action(self, percept):
16            """return the initial action"""
 17            raise NotImplementedError("initial_action") # abstract method
18
19        def select_action(self, percept):
20            """return the next action (and update internal state) given percept
21            percept is variable:value dictionary
22            """
 23            raise NotImplementedError("select_action") # abstract method
25   class Environment(Displayable):
26       def initial_percept(self):
27           """returns the initial percept for the agent"""
28           raise NotImplementedError("initial_percept") # abstract method
29
 30        def do(self, action):
 31            """does the action in the environment
 32            returns the next percept"""
 33            raise NotImplementedError("do") # abstract method
         The simulator lets the agent and the environment take turns in updating
     their states and returning the action and the percept.
         The first implementation is a simple procedure to carry out n steps of the
     simulation and return the agent state and the environment state at the end.
agents.py — (continued)
35   class Simulate(Displayable):
36       """simulate the interaction between the agent and the environment
37       for n time steps.
38       Returns a pair of the agent state and the environment state.
39       """
40       def __init__(self,agent, environment):
41           self.agent = agent
42           self.env = environment
43           self.percept = self.env.initial_percept()
44           self.percept_history = [self.percept]
45           self.action_history = []
46
47      def go(self, n):
48          for i in range(n):
49              action = self.agent.select_action(self.percept)
50              self.display(2,f"i={i} action={action}")
51              self.percept = self.env.do(action)
52              self.display(2,f"    percept={self.percept}")
      The agent cannot access the price model; it just observes the prices and
      the amount in stock.

agentBuying.py — (continued)
55   class TP_agent(Agent):
56       def __init__(self):
57           self.spent = 0
58           percept = env.initial_percept()
59           self.ave = self.last_price = percept['price']
60           self.instock = percept['instock']
61           self.buy_history = []
62
63      def select_action(self, percept):
 64          """return next action to carry out
65          """
66          self.last_price = percept['price']
67          self.ave = self.ave+(self.last_price-self.ave)*0.05
68          self.instock = percept['instock']
69          if self.last_price < 0.9*self.ave and self.instock < 60:
70              tobuy = 48
71          elif self.instock < 12:
72              tobuy = 12
73          else:
74              tobuy = 0
75          self.spent += tobuy*self.last_price
76          self.buy_history.append(tobuy)
77          return {'buy': tobuy}
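      The update self.ave = self.ave + (self.last_price - self.ave)*0.05 maintains
      an exponential moving average of the price: each new price gets weight 0.05,
      and the weight of older prices decays by a factor of 0.95 per step. A quick
      illustration with made-up prices:

ave = 10.0
for price in [10, 10, 20, 20, 20]:
    ave = ave + (price - ave) * 0.05
print(round(ave, 3))   # 11.426: the average moves slowly towards 20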
     Set up an environment and an agent. Uncomment the last lines to run the agent
     for 90 steps, and determine the average amount spent.
                                   agentBuying.py — (continued)
79   env = TP_env()
80   ag = TP_agent()
81   sim = Simulate(ag,env)
82   #sim.go(90)
83   #ag.spent/env.time ## average spent per time period
     2.2.3 Plotting
     The following plots the price and number in stock history:
                                   agentBuying.py — (continued)
           • Give a controller that can work for many different price histories. An agent
             can use other local state variables, but does not have access to the environ-
             ment model.
           • Is it worthwhile trying to infer the amount of paper that the home uses?
             (Try your controller with the different paper consumption commented out
             in TP_env.do.)
           Figure 2.1: Percept and command traces for the paper-buying agent. The plot
           shows Value against Time for the Price, In stock, and Bought traces.
     In this implementation, each layer, including the top layer, implements the en-
     vironment class, because each layer is seen as an environment from the layer
     above.
         We arbitrarily divide the environment and the body, so that the environ-
     ment just defines the walls, and the body includes everything to do with the
     agent. Note that the named locations are part of the (top-level of the) agent,
     not part of the environment, although they could have been.
     2.3.1 Environment
     The environment defines the walls.
     2.3.2 Body
     The body defines everything about the agent body.
agentEnv.py — (continued)
22   import math
23   from agents import Environment
24   import matplotlib.pyplot as plt
25   import time
26
27   class Rob_body(Environment):
28       def __init__(self, env, init_pos=(0,0,90)):
29           """ env is the current environment
30           init_pos is a triple of (x-position, y-position, direction)
31               direction is in degrees; 0 is to right, 90 is straight-up, etc
32           """
33           self.env = env
34           self.rob_x, self.rob_y, self.rob_dir = init_pos
35           self.turning_angle = 18 # degrees that a left makes
36           self.whisker_length = 6 # length of the whisker
37           self.whisker_angle = 30 # angle of whisker relative to robot
38           self.crashed = False
39           # The following control how it is plotted
40           self.plotting = True    # whether the trace is being plotted
41           self.sleep_time = 0.05 # time between actions (for real-time
                 plotting)
42           # The following are data structures maintained:
43           self.history = [(self.rob_x, self.rob_y)] # history of (x,y)
                 positions
44           self.wall_history = [] # history of hitting the wall
45
46        def percept(self):
47            return {'rob_x_pos':self.rob_x, 'rob_y_pos':self.rob_y,
48                   'rob_dir':self.rob_dir, 'whisker':self.whisker(),
                         'crashed':self.crashed}
49        initial_percept = percept # use percept function for initial percept too
50
51        def do(self,action):
52            """ action is {'steer':direction}
53            direction is 'left', 'right' or 'straight'
54            """
55            if self.crashed:
56                return self.percept()
57            direction = action['steer']
58            compass_deriv =
                  {'left':1,'straight':0,'right':-1}[direction]*self.turning_angle
59            self.rob_dir = (self.rob_dir + compass_deriv +360)%360 # make in
                  range [0,360)
60            rob_x_new = self.rob_x + math.cos(self.rob_dir*math.pi/180)
61            rob_y_new = self.rob_y + math.sin(self.rob_dir*math.pi/180)
62            path = ((self.rob_x,self.rob_y),(rob_x_new,rob_y_new))
       The Boolean whisker method returns True when the whisker intersects a wall.
agentEnv.py — (continued)
 76      def whisker(self):
 77          """returns true whenever the whisker sensor intersects with a wall
 78          """
 79          whisk_ang_world = (self.rob_dir-self.whisker_angle)*math.pi/180
 80              # angle in radians in world coordinates
 81          wx = self.rob_x + self.whisker_length * math.cos(whisk_ang_world)
 82          wy = self.rob_y + self.whisker_length * math.sin(whisk_ang_world)
 83          whisker_line = ((self.rob_x,self.rob_y),(wx,wy))
 84          hit = any(line_segments_intersect(whisker_line,wall)
 85                      for wall in self.env.walls)
 86          if hit:
 87              self.wall_history.append((self.rob_x, self.rob_y))
 88              if self.plotting:
 89                  plt.plot([self.rob_x],[self.rob_y],"ro")
 90                  plt.draw()
 91          return hit
 92
 93   def line_segments_intersect(linea,lineb):
 94       """returns true if the line segments, linea and lineb intersect.
 95       A line segment is represented as a pair of points.
 96       A point is represented as a (x,y) pair.
 97       """
 98       ((x0a,y0a),(x1a,y1a)) = linea
 99       ((x0b,y0b),(x1b,y1b)) = lineb
100       da, db = x1a-x0a, x1b-x0b
101       ea, eb = y1a-y0a, y1b-y0b
102       denom = db*ea-eb*da
103       if denom==0: # line segments are parallel
104           return False
105       cb = (da*(y0b-y0a)-ea*(x0b-x0a))/denom # position along line b
106       if cb<0 or cb>1:
107           return False
 108       ca = (db*(y0b-y0a)-eb*(x0b-x0a))/denom # position along line a
 109       return 0 <= ca <= 1
     The following method determines how to steer depending on whether the goal
     is to the right or the left of where the robot is facing.
                                    agentMiddle.py — (continued)
44      def steer(self,target_pos):
45          if self.percept['whisker']:
46              self.display(3,'whisker on', self.percept)
47              return "left"
48          else:
49              return self.head_towards(target_pos)
50
51      def head_towards(self,target_pos):
52             """ given a target position, return the action that heads
                   towards that position
53             """
54             gx,gy = target_pos
55             rx,ry = self.percept['rob_x_pos'],self.percept['rob_y_pos']
56             goal_dir = math.acos((gx-rx)/math.sqrt((gx-rx)*(gx-rx)
57                                                +(gy-ry)*(gy-ry)))*180/math.pi
58             if ry>gy:
59                 goal_dir = -goal_dir
60             goal_from_rob = (goal_dir - self.percept['rob_dir']+540)%360-180
61             assert -180 < goal_from_rob <= 180
62             if goal_from_rob > self.straight_angle:
63                 return "left"
64             elif goal_from_rob < -self.straight_angle:
65                 return "right"
66             else:
67                 return "straight"
68
69      def close_enough(self,target_pos):
70          gx,gy = target_pos
71          rx,ry = self.percept['rob_x_pos'],self.percept['rob_y_pos']
72          return (gx-rx)**2 + (gy-ry)**2 <= self.close_threshold_squared
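      The expression (goal_dir - self.percept['rob_dir'] + 540) % 360 - 180 in
      head_towards normalizes the angle difference so it lies between -180 and 180
      degrees; its sign then says whether the goal is to the left (positive) or the
      right (negative). A quick check of the arithmetic:

for a in [-350, -170, 0, 170, 350]:
    print(a, (a + 540) % 360 - 180)
# -350 -> 10, -170 -> -170, 0 -> 0, 170 -> 170, 350 -> -10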
19           timeout is the number of steps the middle layer goes before giving
                 up
20           locations is a loc:pos dictionary
21               where loc is a named location, and pos is an (x,y) position.
22           """
23           self.middle = middle
24           self.timeout = timeout # number of steps before the middle layer
                 should give up
25           self.locations = locations
26
27        def do(self,plan):
 28            """carry out the plan.
 29            plan is of the form {'visit':list_of_locations}
30            It visits the locations in turn.
31            """
32            to_do = plan['visit']
33            for loc in to_do:
34                position = self.locations[loc]
35                arrived = self.middle.do({'go_to':position,
                      'timeout':self.timeout})
36                self.display(1,"Arrived at",loc,arrived)
     2.3.5 Plotting
     The following is used to plot the locations, the walls and (eventually) the move-
     ment of the robot. It can either plot the movement if the robot as it is go-
     ing (with the default env.plotting = True), or not plot it as it is going (setting
     env.plotting = False; in this case the trace can be plotted using pl.plot run()).
      Figure 2.2: A trace of the trajectory of the agent, showing the named locations
      mail, o103, o109, and storage. Red dots correspond to the whisker sensor being
      on; the green dot to the whisker sensor being off. The agent starts at position
      (0, 0) facing up.

agentTop.py — (continued)
57                plt.plot([x],[y],"k<")
58                plt.text(x+1.0,y+0.5,loc) # print the label above and to the
                      right
59            plt.plot([body.rob_x],[body.rob_y],"go")
60            plt.gca().figure.canvas.draw()
61            if self.body.history or self.body.wall_history:
62                self.plot_run()
63
64      def plot_run(self):
65          """plots the history after the agent has finished.
66          This is typically only used if body.plotting==False
67          """
68          if self.body.history:
69              xs,ys = zip(*self.body.history)
70              plt.plot(xs,ys,"go")
71          if self.body.wall_history:
72              wxs,wys = zip(*self.body.wall_history)
73              plt.plot(wxs,wys,"ro")
          The following code plots the agent as it acts in the world. Figure 2.2 shows
      the result of the top.do call below.
                                    agentTop.py — (continued)
      Figure 2.3: A robot trap.
83   #   pl=Plot_env(body,top)
84   #   top.do({'visit':['o109','storage','o109','o103']})
85   #   You can directly control the middle layer:
86   #   middle.do({'go_to':(30,-10), 'timeout':200})
87   #   Can you make it crash?
     Exercise 2.2 The following code implements a robot trap (Figure 2.3). Write a
     controller that can escape the “trap” and get to the goal. See Exercise 2.4 in the
     textbook for hints.
agentTop.py — (continued)
13
14   class Plot_follow(Plot_env):
15       def __init__(self, body, top, epsilon=2.5):
16           """plot the agent in the environment.
 17           epsilon is the threshold for how close someone needs to click
                  to select a location.
18           """
19           Plot_env.__init__(self, body, top)
20           self.epsilon = epsilon
21           self.canvas = plt.gca().figure.canvas
22           self.canvas.mpl_connect('button_press_event', self.on_press)
23           self.canvas.mpl_connect('button_release_event', self.on_release)
24           self.canvas.mpl_connect('motion_notify_event', self.on_move)
25           self.pressloc = None
26           self.pressevent = None
27           for loc in self.top.locations:
28               self.display(2,f" loc {loc} at {self.top.locations[loc]}")
29
30      def on_press(self, event):
31          self.display(2,'v',end="")
 32          self.display(2,f"Press at ({event.xdata},{event.ydata})")
33          for loc in self.top.locations:
34              lx,ly = self.top.locations[loc]
35              if abs(event.xdata- lx) <= self.epsilon and abs(event.ydata-
                    ly) <= self.epsilon :
36                  self.pressloc = loc
37                  self.pressevent = event
38                  self.display(2,"moving",loc)
39
40      def on_release(self, event):
 41          self.display(2,'^',end="")
42          if self.pressloc is not None: #and event.inaxes ==
                self.pressevent.inaxes:
43              self.top.locations[self.pressloc] = (event.xdata, event.ydata)
44              self.display(1,f"Placing {self.pressloc} at {(event.xdata,
                    event.ydata)}")
45          self.pressloc = None
46          self.pressevent = None
47
48      def on_move(self, event):
49          if self.pressloc is not None: # and event.inaxes ==
                self.pressevent.inaxes:
50              self.display(2,'-',end="")
51              self.top.locations[self.pressloc] = (event.xdata, event.ydata)
52              self.redraw()
53          else:
54              self.display(2,'.',end="")
55
56   # try:
57   # pl=Plot_follow(body,top)
 58   # top.do({'visit':['o109','storage','o109','o103']})
The abstract class Search_problem, in searchProblem.py, defines a search problem, which consists of:
 17        * a start node
 18        * a neighbors function that gives the neighbors of a node
 19        * a specification of a goal
 20        * an (optional) heuristic function.
 21        The methods must be overridden to define a search problem."""
22
23        def start_node(self):
24            """returns start node"""
25            raise NotImplementedError("start_node") # abstract method
26
27        def is_goal(self,node):
28            """is True if node is a goal"""
29            raise NotImplementedError("is_goal") # abstract method
30
31        def neighbors(self,node):
32            """returns a list (or enumeration) of the arcs for the neighbors of
                  node"""
33            raise NotImplementedError("neighbors") # abstract method
34
35        def heuristic(self,n):
36            """Gives the heuristic value of node n.
37            Returns 0 if not overridden."""
38            return 0
         The neighbors of a node form a list of arcs. A (directed) arc consists of a
      from_node and a to_node. The arc is the pair ⟨from_node, to_node⟩, but it can
      also contain a non-negative cost (which defaults to 1) and can be labeled with
      an action.
                                   searchProblem.py — (continued)
40   class Arc(object):
41       """An arc has a from_node and a to_node node and a (non-negative)
             cost"""
42       def __init__(self, from_node, to_node, cost=1, action=None):
43           self.from_node = from_node
44           self.to_node = to_node
45           self.action = action
46           self.cost = cost
47           assert cost >= 0, (f"Cost cannot be negative: {self}, cost={cost}")
48
49        def __repr__(self):
50            """string representation of an arc"""
51            if self.action:
52                return f"{self.from_node} --{self.action}--> {self.to_node}"
53            else:
54                return f"{self.from_node} --> {self.to_node}"
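      For example, the string representation of arcs without and with an action:

>>> Arc('a', 'b', cost=3)
a --> b
>>> Arc('a', 'b', 3, 'go')
a --go--> b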
     To define a search problem, we need to define the start node, the goal predicate,
     the neighbors function and the heuristic function.
                                   searchProblem.py — (continued)
56   class Search_problem_from_explicit_graph(Search_problem):
57       """A search problem from an explicit graph.
58       """
59
60      def __init__(self, title, nodes, arcs, start=None, goals=set(), hmap={},
61                      positions=None, show_costs = True):
62          """ A search problem consists of:
63          * list or set of nodes
64          * list or set of arcs
65          * start node
66          * list or set of goal nodes
67          * hmap: dictionary that maps each node into its heuristic value.
68          * positions: dictionary that maps each node into its (x,y) position
69          * show_costs is used for show()
70          """
71          self.title = title
72          self.neighs = {}
73          self.nodes = nodes
74          for node in nodes:
75              self.neighs[node]=[]
76          self.arcs = arcs
77          for arc in arcs:
78              self.neighs[arc.from_node].append(arc)
79          self.start = start
80          self.goals = goals
81          self.hmap = hmap
82          if positions is None:
83              self.positions = {node:(random.random(),random.random()) for
                    node in nodes}
84          else:
85              self.positions = positions
86          self.show_costs = show_costs
87
88
 89        def start_node(self):
 90            """returns start node"""
 91            return self.start
 92
 93        def is_goal(self,node):
 94            """is True if node is a goal"""
 95            return node in self.goals
 96
 97        def neighbors(self,node):
 98            """returns the neighbors of node (a list of arcs)"""
 99            return self.neighs[node]
100
101        def heuristic(self,node):
102            """Gives the heuristic value of node n.
103            Returns 0 if not overridden in the hmap."""
104            if node in self.hmap:
105                return self.hmap[node]
106            else:
107                return 0
108
109        def __repr__(self):
110            """returns a string representation of the search problem"""
111            res=""
112            for arc in self.arcs:
113                res += f"{arc}. "
114            return res
searchProblem.py — (continued)
134
135      def show_node(self, ax, node, node_color):
136             x,y = self.positions[node]
137             ax.text(x,y,node,bbox=dict(boxstyle="round4,pad=1.0,rounding_size=0.5",
138                                               facecolor=node_color),
                                                      ha='center',va='center',
139                         fontsize=self.fontsize)
140
141      def show_arc(self, ax, arc, arc_color='black', node_color='white'):
142             from_pos = self.positions[arc.from_node]
143             to_pos = self.positions[arc.to_node]
144             ax.annotate(arc.to_node, from_pos, xytext=to_pos,
145                               # arrowprops=dict(facecolor='black',
                                      shrink=0.1, width=2),
146                               arrowprops={'arrowstyle':'<|-', 'linewidth':
                                      2, 'color':arc_color},
147                               bbox=dict(boxstyle="round4,pad=1.0,rounding_size=0.5",
148                                               facecolor=node_color),
149                               ha='center',va='center',
150                               fontsize=self.fontsize)
151             # Add costs to middle of arcs:
152             if self.show_costs:
153                 ax.text((from_pos[0]+to_pos[0])/2, (from_pos[1]+to_pos[1])/2,
154                         arc.cost, bbox=dict(pad=1,fc='w',ec='w'),
155                         ha='center',va='center',fontsize=self.fontsize)
      3.1.2 Paths
      A searcher will return a path from the start node to a goal node. A Python list
      is not a suitable representation for a path, as many search algorithms consider
      multiple paths at once, and these paths should share initial parts of the path.
      If we wanted to do this with Python lists, we would need to keep copying the
      list, which can be expensive if the list is long. An alternative representation is
      used here in terms of a recursive data structure that can share subparts.
            A path is either:

          • a node (representing a path of length zero), or

          • a path, initial, and an arc, where the from_node of the arc is the node at the
            end of initial.
      These cases are distinguished in the following code by having arc = None if
      the path has length 0, in which case initial is the node of the path. Note that
      we only use the most basic form of Python’s yield for enumerations (Section
      1.5.4).
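       The Path class itself is elided from this excerpt. The following is a minimal
       sketch, consistent with how paths are used by the searchers later in this
       chapter (Path(initial, arc), end(), cost, initial, arc, and the initial_nodes
       used for cycle pruning in Exercise 3.5); the distribution's class may differ
       in detail:

class Path(object):
    """A path is either a node or a path, initial, followed by an arc"""
    def __init__(self, initial, arc=None):
        """initial is either a node (when arc is None) or
        a path (when arc is an Arc whose from_node is the end of initial)"""
        self.initial = initial
        self.arc = arc
        self.cost = 0 if arc is None else initial.cost + arc.cost

    def end(self):
        """returns the node at the end of the path"""
        return self.initial if self.arc is None else self.arc.to_node

    def nodes(self):
        """enumerates the nodes of the path, from the end back to the start"""
        current = self
        while current.arc is not None:
            yield current.arc.to_node
            current = current.initial
        yield current.initial

    def initial_nodes(self):
        """enumerates the nodes of the path before the end node"""
        if self.arc is not None:
            yield from self.initial.nodes()

    def __repr__(self):
        if self.arc is None:
            return str(self.initial)
        elif self.arc.action:
            return f"{self.initial} --{self.arc.action}--> {self.arc.to_node}"
        else:
            return f"{self.initial} --> {self.arc.to_node}"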
Figure 3.3: simp_delivery_graph with arc costs and h values of nodes
      The second search problem is defined by:

searchExample.py — (continued)
 21   problem2 = Search_problem_from_explicit_graph('Problem 2',
 22        {'A','B','C','D','E','G','H','J'},
23        [Arc('A','B',1), Arc('B','C',3), Arc('B','D',1), Arc('D','E',3),
24            Arc('D','G',1), Arc('A','H',3), Arc('H','J',1)],
25        start = 'A',
26        goals = {'G'},
27        positions={'A': (0, 1), 'B': (0, 3/4), 'C': (0,0), 'D': (1/4,3/4), 'E':
              (1/4,0),
28                      'G': (2/4,3/4), 'H': (3/4,1), 'J': (3/4,3/4)})
     The third search problem is a disconnected graph (contains no arcs), where the
     start node is a goal node. This is a boundary case to make sure that weird cases
     work.
                                  searchExample.py — (continued)
      Figure 3.4: cyclic_simp_delivery_graph
          cyclic_simp_delivery_graph is the graph shown in Figure 3.4. This is the
      graph of Figure 3.10 of [Poole and Mackworth, 2023]. The heuristic values
      are the same as in simp_delivery_graph.
searchExample.py — (continued)
73 cyclic_simp_delivery_graph = Search_problem_from_explicit_graph("Cyclic
            Delivery Graph",
 74         {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J'},
 75         [    Arc('A', 'B', 2),
 76              Arc('A', 'C', 3),
 77              Arc('A', 'D', 4),
 78              Arc('B', 'E', 2),
 79              Arc('B', 'F', 3),
 80              Arc('C', 'A', 3),
 81              Arc('C', 'J', 6),
 82              Arc('D', 'A', 4),
 83              Arc('D', 'H', 4),
 84              Arc('F', 'B', 3),
 85              Arc('F', 'D', 2),
 86              Arc('G', 'H', 3),
 87              Arc('G', 'J', 4),
 88              Arc('H', 'D', 4),
 89              Arc('H', 'G', 3),
 90              Arc('J', 'C', 6),
 91              Arc('J', 'G', 4)],
 92        start = 'A',
 93        goals = {'G'},
 94        hmap = {
 95             'A': 7,
 96             'B': 5,
 97             'C': 9,
 98             'D': 6,
 99             'E': 3,
100             'F': 5,
101             'G': 0,
102             'H': 3,
103             'J': 4,
104         },
105         positions = {
106             'A': (0.4,0.1),
107             'B': (0.4,0.4),
108             'C': (0.1,0.1),
109             'D': (0.7,0.1),
110             'E': (0.6,0.7),
111             'F': (0.7,0.4),
112             'G': (0.7,0.9),
113             'H': (0.9,0.6),
114             'J': (0.3,0.9)
115             })
          The next problem is the tree graph shown in Figure 3.6, and is Figure 3.15
      in Poole and Mackworth [2023].
searchExample.py — (continued)
      Figure 3.6: The tree graph
119          'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'AA', 'BB',
                 'CC',
120          'DD', 'EE', 'FF', 'GG', 'HH', 'II', 'JJ', 'KK'},
121      [   Arc('A', 'B', 1),
122          Arc('A', 'C', 1),
123          Arc('B', 'D', 1),
124          Arc('B', 'E', 1),
125          Arc('C', 'F', 1),
126          Arc('C', 'G', 1),
127          Arc('D', 'H', 1),
128          Arc('D', 'I', 1),
129          Arc('E', 'J', 1),
130          Arc('E', 'K', 1),
131          Arc('F', 'L', 1),
132          Arc('G', 'M', 1),
133          Arc('G', 'N', 1),
134          Arc('H', 'O', 1),
135          Arc('H', 'P', 1),
      3.2.1 Searcher
      A Searcher for a problem can be asked repeatedly for the next path. To solve
      a problem, you can construct a Searcher object for the problem and then re-
      peatedly ask for the next path using search. If there are no more paths, None is
      returned.
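       The imports, class header, and constructor are elided above; the following is
       a sketch consistent with the methods shown below (the distribution's code may
       differ in detail):

from display import Displayable
from searchProblem import Path

class Searcher(Displayable):
    """returns a searcher for a problem.
    Paths can be found by repeatedly calling search().
    This does depth-first search unless add_to_frontier is overridden."""
    def __init__(self, problem):
        self.problem = problem
        self.initialize_frontier()
        self.num_expanded = 0
        self.add_to_frontier(Path(problem.start_node()))
        super().__init__()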
26
27        def initialize_frontier(self):
28            self.frontier = []
29
30        def empty_frontier(self):
31            return self.frontier == []
32
33        def add_to_frontier(self,path):
34            self.frontier.append(path)
35
36        def search(self):
37            """returns (next) path from the problem's start node
38            to a goal node.
39            Returns None if no path exists.
40            """
41            while not self.empty_frontier():
42                self.path = self.frontier.pop()
43                self.num_expanded += 1
44                if self.problem.is_goal(self.path.end()): # solution found
45                    self.solution = self.path # store the solution found
46                    self.display(1, f"Solution: {self.path} (cost:
                          {self.path.cost})\n",
47                        self.num_expanded, "paths have been expanded and",
48                               len(self.frontier), "paths remain in the
                                     frontier")
49                    return self.path
50                else:
51                    self.display(4,f"Expanding: {self.path} (cost:
                          {self.path.cost})")
52                    neighs = self.problem.neighbors(self.path.end())
53                    self.display(2,f"Expanding: {self.path} with neighbors
                          {neighs}")
54                    for arc in reversed(list(neighs)):
55                        self.add_to_frontier(Path(self.path,arc))
56                    self.display(3, f"New frontier: {[p.end() for p in
                          self.frontier]}")
57
58           self.display(0,"No (more) solutions. Total of",
59                       self.num_expanded,"paths expanded.")
      Note that this reverses the neighbors so that it implements depth-first search in
      an intuitive manner (expanding the first neighbor first). The call to list is for the
      case when the neighbors are generated (and not already in a list). Reversing the
      neighbors might not be required for other methods. The calls to reversed and
      list can be removed, and the algorithm still implements depth-first search.
          To use depth-first search to find multiple paths for problem1 and simp delivery graph,
     copy and paste the following into Python’s read-evaluate-print loop; keep find-
     ing next solutions until there are no more:
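      That listing is elided here; the calls are along these lines (variable names
      assumed):

import searchGeneric, searchExample
searcher1 = searchGeneric.Searcher(searchExample.problem1)
searcher1.search()     # find the first path
searcher1.search()     # find the next path; repeat until None is returned
searcher_sdg = searchGeneric.Searcher(searchExample.simp_delivery_graph)
searcher_sdg.search()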
      The SearcherGUI displays the graph as the search proceeds; part way through
      searching simp_delivery_graph the title is "Expanding: A --> B", and the node
      colors are red: selected, blue: neighbors, green: frontier, yellow: goal.
      Exercise 3.1 Implement breadth-first search. Only add_to_frontier and/or pop
      need to be modified to implement a first-in first-out queue.
           2. what is shown for a “step” on a GUI; here it is assumed to be the path,
              the neighbors of the end of the path, and the other nodes at the end of
              paths on the frontier
          3. (shown with “fine step” but not with “step”) the frontier and the path
             selected
4. (shown with “fine step” but not with “step”) the frontier.
39          self.problem.show_node(self.ax, self.problem.start,
                 self.colors['frontier'])
40          for node in self.problem.nodes:
41              if self.problem.is_goal(node):
42                   self.problem.show_node(self.ax, node,self.colors['goal'])
43          plt.show()
44          self.click = 7 # bigger than any display!
45          #while self.click == 0:
46          #     plt.pause(0.1)
47          self.searcher.display = self.display
48          try:
49              while self.searcher.frontier:
50                   path = self.searcher.search()
51          except ExitToPython:
52              print("Exited")
53          else:
54              print("No more solutions")
55
56      def display(self, level,*args,**nargs):
57          if level <= self.click: #step
58              print(*args, **nargs)
59              self.ax.set_title(f"Expanding:
                    {self.searcher.path}",fontsize=self.problem.fontsize)
60              if level == 1:
61                  self.show_frontier(self.colors['frontier'])
62                  self.show_path(self.colors['selected'])
63                  self.ax.set_title(f"Solution Found:
                        {self.searcher.path}",fontsize=self.problem.fontsize)
64              elif level == 2: # what should be shown if a node is in all
                    three?
65                  self.show_frontier(self.colors['frontier'])
66                  self.show_path(self.colors['selected'])
67                  self.show_neighbors(self.colors['neighbors'])
68              elif level == 3:
69                  self.show_frontier(self.colors['frontier'])
70                  self.show_path(self.colors['selected'])
71              elif level == 4:
72                  self.show_frontier(self.colors['frontier'])
73
74
75             # wait for a button click
76             self.click = 0
77             plt.draw()
78             while self.click == 0:
79                 plt.pause(0.1)
80             # undo coloring:
81             self.ax.set_title("")
82             self.show_frontier('white')
83             self.show_neighbors('white')
84             path_show = self.searcher.path
 85               while path_show.arc:
 86                   self.problem.show_arc(self.ax, path_show.arc, 'black')
 87                   self.problem.show_node(self.ax, path_show.end(), 'white')
 88                   path_show = path_show.initial
 89               self.problem.show_node(self.ax, path_show.end(), 'white')
 90               if self.problem.is_goal(self.searcher.path.end()):
 91                   self.problem.show_node(self.ax, self.searcher.path.end(),
                          self.colors['goal'])
 92               plt.draw()
 93
 94        def show_frontier(self, color):
 95            for path in self.searcher.frontier:
 96                self.problem.show_node(self.ax, path.end(), color)
 97
 98        def show_path(self, color):
 99            """color selected path"""
100            path_show = self.searcher.path
101            while path_show.arc:
102                   self.problem.show_arc(self.ax, path_show.arc, color)
103                   self.problem.show_node(self.ax, path_show.end(), color)
104                   path_show = path_show.initial
105            self.problem.show_node(self.ax, path_show.end(), color)
106
107        def show_neighbors(self, color):
108            for neigh in self.problem.neighbors(self.searcher.path.end()):
109                self.problem.show_node(self.ax, neigh.to_node, color)
110
111        def auto(self,event):
112            self.click = 1
113        def step(self,event):
114            self.click = 2
115        def finestep(self,event):
116            self.click = 3
117        def quit(self,event):
118            quit()
119
120   class ExitToPython(Exception):
121       pass
      The following methods are used for finding and printing information about
      the frontier.
                                   searchGeneric.py — (continued)
      3.2.4 A∗ Search
      For an A∗ Search the frontier is implemented using the FrontierPQ class.
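       The FrontierPQ class is elided from this excerpt; the fragment below is part
       of AStarSearcher, which uses it. The following sketch of a frontier as a
       priority queue is consistent with the methods used below (empty, add, pop)
       and with the minus sign discussed in Exercise 3.4; details are assumed:

import heapq

class FrontierPQ(object):
    """A frontier, implemented as a heap of (value, -index, path) triples.
    value is the quantity to minimize, e.g., path cost plus heuristic;
    index is a unique index per added path. The minus sign makes the
    queue break ties in favor of the most recently added path."""
    def __init__(self):
        self.frontier_index = 0   # the number of paths added so far
        self.frontierpq = []      # the heap of (value, -index, path) triples

    def empty(self):
        """is True if the priority queue is empty"""
        return self.frontierpq == []

    def add(self, path, value):
        """add a path to the priority queue; value is to be minimized"""
        self.frontier_index += 1
        heapq.heappush(self.frontierpq, (value, -self.frontier_index, path))

    def pop(self):
        """returns and removes the path with minimum value"""
        (_, _, path) = heapq.heappop(self.frontierpq)
        return path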
                                   searchGeneric.py — (continued)
125          super().__init__(problem)
126
127       def initialize_frontier(self):
128           self.frontier = FrontierPQ()
129
130       def empty_frontier(self):
131           return self.frontier.empty()
132
133       def add_to_frontier(self,path):
134           """add path to the frontier with the appropriate cost"""
135           value = path.cost+self.problem.heuristic(path.end())
136           self.frontier.add(path, value)
          Code should always be tested. The following provides a simple unit test,
       using problem1 as the default problem.
searchGeneric.py — (continued)
     Exercise 3.2 Change the code so that it implements (i) best-first search and (ii)
     lowest-cost-first search. For each of these methods compare it to A∗ in terms of the
     number of paths expanded, and the path found.
     Exercise 3.3 The searcher acts like a Python iterator, in that it returns one value
     (here a path) and then returns other values (paths) on demand, but does not imple-
     ment the iterator interface. Change the code so it implements the iterator interface.
     What does this enable us to do?
     Exercise 3.4 Chris was very puzzled as to why there was a minus (“−”) in the
     second element of the tuple added to the heap in the add method in FrontierPQ in
     searchGeneric.py.
         Sam suggested the following example would demonstrate the importance of
     the minus. Consider an infinite integer grid, where the states are pairs of integers,
     the start is (0,0), and the goal is (10,10). The neighbors of (i, j) are (i + 1, j) and (i, j +
     1). Consider the heuristic function h((i, j)) = |10 − i| + |10 − j|. Sam suggested you
     compare how many paths are expanded with the minus and without the minus.
     searchGrid is a representation of Sam’s graph. If something takes too long, you
     might consider changing the size.
     Explain to Chris what the minus does and why it is there. Give evidence for your
     claims. It might be useful to refer to other search strategies in your explanation.
     As part of your explanation, explain what is special about Sam’s example.
     Exercise 3.5 Implement a searcher that implements cycle pruning instead of
     multiple-path pruning. You need to decide whether to check for cycles when paths
     are added to the frontier or when they are removed. (Hint: either method can be
     implemented by only changing one or two lines in SearcherMPP. Hint: there is
     a cycle if path.end() in path.initial_nodes() ) Compare no pruning, multiple
     path pruning and cycle pruning for the cyclic delivery problem. Which works
     better in terms of number of paths expanded, computational time or space?
          Depth-first search methods do not need a priority queue, but can use
     a list as a stack. In this implementation of branch-and-bound search, we call
     search to find an optimal solution with cost less than bound. This uses depth-
     first search to find a path to a goal that extends path with cost less than the
     bound. Once a path to a goal has been found, that path is remembered as the
     best path, the bound is reduced, and the search continues.
                         searchBranchAndBound.py — Branch and Bound Search
11   from searchProblem import Path
12   from searchGeneric import Searcher
13   from display import Displayable, visualize
14
15   class DF_branch_and_bound(Searcher):
16       """returns a branch and bound searcher for a problem.
17       An optimal path with cost less than bound can be found by calling
             search()
18       """
19       def __init__(self, problem, bound=float("inf")):
20           """creates a searcher that can be used with search() to find an
                 optimal path.
21           bound gives the initial bound. By default this is infinite -
                 meaning there
22           is no initial pruning due to depth bound
23           """
24           super().__init__(problem)
25           self.best_path = None
26           self.bound = bound
27
28      def search(self):
29          """returns an optimal solution to a problem with cost less than
                bound.
30          returns None if there is no solution with cost less than bound."""
31          self.frontier = [Path(self.problem.start_node())]
32          self.num_expanded = 0
33          while self.frontier:
34              self.path = self.frontier.pop()
35              if self.path.cost+self.problem.heuristic(self.path.end()) <
                    self.bound:
36                  # if self.path.end() not in self.path.initial_nodes(): # for
                        cycle pruning
37                  self.display(2,"Expanding:",self.path,"cost:",self.path.cost)
38                  self.num_expanded += 1
39                  if self.problem.is_goal(self.path.end()):
40                      self.best_path = self.path
41                      self.bound = self.path.cost
42                      self.display(1,"New best path:",self.path,"
                            cost:",self.path.cost)
43                  else:
44                      neighs = self.problem.neighbors(self.path.end())
45                      self.display(4,"Neighbors are", neighs)
46                      for arc in reversed(list(neighs)):
47                          self.add_to_frontier(Path(self.path, arc))
48                      self.display(3, f"New frontier: {[p.end() for p in
                            self.frontier]}")
49          self.path = self.best_path
50           self.solution = self.best_path
51           self.display(1,f"Optimal solution is {self.best_path}." if
                 self.best_path
52                               else "No solution found.",
53                          f"Number of paths expanded: {self.num_expanded}.")
54           return self.best_path
Note that this code uses reversed so that the neighbors of a node are expanded
in the left-to-right order one might expect. This is needed because pop() removes
the rightmost element of the list. The call to list is there because reversed only
works on sequences such as lists and tuples, but the neighbors may be given by a generator.
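For example, reversed raises a TypeError on a generator unless the generator is first converted to a list:

    neighs = (n for n in [1, 2, 3])   # a generator, as neighbors may be
    # reversed(neighs)                # TypeError: not a sequence
    list(reversed(list(neighs)))      # [3, 2, 1]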
         Here is a unit test and some queries:
                                    searchBranchAndBound.py — (continued)
     Exercise 3.6 In searcherb2, in the code above, what happens if the bound is
smaller, say 10? What if it is larger, say 1000?
Exercise 3.7 Implement a branch-and-bound search that uses recursion. Hint: you
     don’t need an explicit frontier, but can do a recursive call for the children.
Exercise 3.8 After the branch-and-bound search found a solution, Sam ran search
again, and noticed a different count. Sam hypothesized that this count was related
to the number of nodes that an A∗ search would use (either expand or add to
the frontier). Or maybe, Sam thought, the count when the bound is slightly above
the cost of the optimal path is related to how A∗ would work.
Is there a relationship between these counts? Are there different things that could
be counted so that they are related? Try to find the most specific statement that is true, and
explain why it is true.
         To test the hypothesis, Sam wrote the following code, but isn’t sure it is helpful:
17
18   def run(problem,name):
19       print("\n\n*******",name)
20
21      print("\nA*:")
22      asearcher = AStarSearcher(problem)
23      print("Path found:",asearcher.search()," cost=",asearcher.solution.cost)
24      print("there are",asearcher.frontier.count(asearcher.solution.cost),
25           "elements remaining on the queue with
                 f-value=",asearcher.solution.cost)
26
27      print("\nA* with MPP:")
28      msearcher = SearcherMPP(problem)
29      print("Path found:",msearcher.search()," cost=",msearcher.solution.cost)
30      print("there are",msearcher.frontier.count(msearcher.solution.cost),
31           "elements remaining on the queue with
                 f-value=",msearcher.solution.cost)
32
33      bound = asearcher.solution.cost+0.01
34      print("\nBranch and bound (with too-good initial bound of", bound,")")
35      tbb = DF_branch_and_bound(problem,bound) # cheating!!!!
36      print("Path found:",tbb.search()," cost=",tbb.solution.cost)
37      print("Rerunning B&B")
38      print("Path found:",tbb.search())
39
40      bbound = asearcher.solution.cost*2+10
41      print("\nBranch and bound (with not-very-good initial bound of",
            bbound, ")")
42      tbb2 = DF_branch_and_bound(problem,bbound)
43      print("Path found:",tbb2.search()," cost=",tbb2.solution.cost)
44      print("Rerunning B&B")
45      print("Path found:",tbb2.search())
46
47      print("\nDepth-first search: (Use ^C if it goes on forever)")
48      tsearcher = Searcher(problem)
49      print("Path found:",tsearcher.search()," cost=",tsearcher.solution.cost)
50
51
52   import searchExample
53   from searchTest import run
54   if __name__ == "__main__":
55       run(searchExample.problem1,"Problem 1")
56   # run(searchExample.simp_delivery_graph,"Acyclic Delivery")
57   # run(searchExample.cyclic_simp_delivery_graph,"Cyclic Delivery")
58   # also test some graphs with cycles, and some with multiple least-cost
         paths
32            return self.name
33
34        def __repr__(self):
35            return self.name # f"Variable({self.name})"
4.1.2 Constraints
A constraint consists of:
• scope, a list or tuple of the variables involved in the constraint
• condition, a Boolean function that takes as many arguments as there are
  variables in the scope, and returns True when the constraint holds for
  those values
• An optional name
• An optional position, used when drawing the constraint network
4.1.3 CSPs
A constraint satisfaction problem (CSP) requires:
• a title (a string)
• a set of variables
• a list of constraints
From these, var_to_const, a dictionary that maps each variable to the set of
constraints it is involved in, is constructed.
cspProblem.py — (continued)
50   class CSP(object):
51       """A CSP consists of
52       * a title (a string)
53       * variables, a set of variables
54       * constraints, a list of constraints
55       * var_to_const, a variable to set of constraints dictionary
56       """
78        def consistent(self,assignment):
79            """assignment is a variable:value dictionary
80            returns True if all of the constraints that can be evaluated
81                           evaluate to True given assignment.
82            """
83            return all(con.holds(assignment)
84                       for con in self.constraints
85                       if con.can_evaluate(assignment))
         The show method uses matplotlib to show the graphical structure of a con-
     straint network. If the node positions are not specified, this gives different
     positions each time it is run; if you don’t like the graph, try again.
                                     cspProblem.py — (continued)
      4.1.4 Examples
In the following code ne_, when given a number, returns a function that is true
when its argument is not that number. For example, if f = ne_(3), then f (2)
is True and f (3) is False. That is, ne_(x)(y) is true when x ≠ y. Allowing
a function of multiple arguments to use its arguments one at a time is called
currying, after the logician Haskell Curry. Functions used as conditions in
constraints require names (so they can be printed).
                                  cspExamples.py — Example CSPs
 11   from cspProblem import Variable, CSP, Constraint
 12   from operator import lt,ne,eq,gt
 13
 14   def ne_(val):
 15       """not equal value"""
 16       # nev = lambda x: x != val # alternative definition
17       # nev = partial(ne,val) # another alternative definition
 18       def nev(x):
 19           return val != x
 20       nev.__name__ = f"{val} != " # name of the function
 21       return nev
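For example:

    f = ne_(3)
    f(2)          # True
    f(3)          # False
    f.__name__    # '3 != '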
Similarly, is_(x)(y) is true when x = y.
                                   cspExamples.py — (continued)
 23   def is_(val):
 24       """is a value"""
 25       # isv = lambda x: x == val # alternative definition
[Figure: the constraint network csp1, with variables A, B, C and constraints A < B, B != 2, and B < C.]
32   X = Variable('X', {1,2,3})
33   Y = Variable('Y', {1,2,3})
34   Z = Variable('Z', {1,2,3})
35   csp0 = CSP("csp0", {X,Y,Z},
36             [ Constraint([X,Y],lt),
37               Constraint([Y,Z],lt)])
The CSP csp1 has variables A, B, and C, each with domain {1, 2, 3, 4}. The con-
straints are A < B, B ≠ 2, and B < C. This is slightly more interesting than
csp0 as it has more solutions. This example is used in the unit tests, and so if it
is changed, the unit tests need to be changed. The CSP csp1s is the same, but
with only the constraints A < B and B < C.
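The csp1 listing is elided here; the following sketch is a hypothetical reconstruction consistent with the description above (the position arguments used for drawing are omitted):

    A = Variable('A', {1, 2, 3, 4})
    B = Variable('B', {1, 2, 3, 4})
    C = Variable('C', {1, 2, 3, 4})
    csp1 = CSP("csp1", {A, B, C},
               [Constraint([A, B], lt, "A < B"),
                Constraint([B], ne_(2), "B != 2"),
                Constraint([B, C], lt, "B < C")])
    csp1s = CSP("csp1s", {A, B, C},     # as csp1 but without B != 2
                [Constraint([A, B], lt, "A < B"),
                 Constraint([B, C], lt, "B < C")])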
                                     cspExamples.py — (continued)
[Figure 4.2: the domain-consistent constraint network csp2, with constraints A != B, B != 3, A = D, B != D, A != C, A > E, B > E, C < D, D > E, C > E, and C != 2.]
        The next CSP, csp2 is Example 4.9 of Poole and Mackworth [2023]; the do-
     main consistent network (after applying the unary constraints) is shown in Fig-
     ure 4.2. Note that we use the same variables as the previous example and add
     two more.
cspExamples.py — (continued)
[Figure: the constraint network csp3, with constraints including A != B, A < D, D != E, and C != E.]
The following example is another scheduling problem (but with multiple an-
swers). This is the same as scheduling problem 2 in the original AIspace.org
consistency app.
cspExamples.py — (continued)
cspExamples.py — (continued)
75   def adjacent(x,y):
76      """True when x and y are adjacent numbers"""
77      return abs(x-y) == 1
[Figure: the constraint network csp4, with adjacency and inequality constraints.]
78
79   csp4 = CSP("csp4", {A,B,C,D},
80             [Constraint([A,B], adjacent,            "adjacent(A,B)"),
81              Constraint([B,C], adjacent,            "adjacent(B,C)"),
82              Constraint([C,D], adjacent,            "adjacent(C,D)"),
83              Constraint([A,C], ne, "A !=            C"),
84              Constraint([B,D], ne, "B !=            D") ])
         The following examples represent the crossword shown in Figure 4.5.
         In the first representation, the variables represent words. The constraint
     imposed by the crossword is that where two words intersect, the letter at the
     intersection must be the same. The method meet_at is used to test whether two
     words intersect with the same letter. For example, the constraint meet_at(2,0)
     means that the third letter (at position 2) of the first argument is the same as
     the first letter of the second argument. This is shown in Figure 4.6.
                                    cspExamples.py — (continued)
86   def meet_at(p1,p2):
87       """returns a function of two words that is true
88                   when the words intersect at positions p1, p2.
89       The positions are relative to the words; starting at position 0.
90       meet_at(p1,p2)(w1,w2) is true if the same letter is at position p1 of
             word w1
91            and at position p2 of word w2.
92       """
93       def meets(w1,w2):
94           return w1[p1] == w2[p2]
[Figure 4.5: a crossword puzzle with word slots numbered 1 to 4. Words: ant, big, bus, car, has, book, buys, hold, lane, year, ginger, search, symbol, syntax.]
[Figure 4.6: the constraint network crossword1, with constraints meet_at(0,0)[one_across, one_down], meet_at(2,0)[one_across, two_down], meet_at(2,2)[three_across, two_down], meet_at(0,2)[three_across, one_down], and meet_at(0,4)[four_across, two_down].]
 95        meets.__name__ = f"meet_at({p1},{p2})"
 96        return meets
 97
 98   one_across = Variable('one_across', {'ant', 'big', 'bus', 'car', 'has'},
          position=(0.3,0.9))
 99   one_down = Variable('one_down', {'book', 'buys', 'hold', 'lane', 'year'},
          position=(0.1,0.7))
100   two_down = Variable('two_down', {'ginger', 'search', 'symbol', 'syntax'},
          position=(0.9,0.8))
101   three_across = Variable('three_across', {'book', 'buys', 'hold', 'land',
          'year'}, position=(0.1,0.3))
102   four_across = Variable('four_across',{'ant', 'big', 'bus', 'car', 'has'},
          position=(0.7,0.0))
103   crossword1 = CSP("crossword1",
104                   {one_across, one_down, two_down, three_across,
                          four_across},
105                   [Constraint([one_across,one_down], meet_at(0,0)),
106                    Constraint([one_across,two_down], meet_at(2,0)),
107                    Constraint([three_across,two_down], meet_at(2,2)),
108                    Constraint([three_across,one_down], meet_at(0,2)),
109                    Constraint([four_across,two_down], meet_at(0,4))])
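For example, using words from Figure 4.5, the curried condition can be tested directly:

    mf = meet_at(2, 0)
    mf('big', 'ginger')   # True: 'big'[2] == 'ginger'[0] == 'g'
    mf('bus', 'ginger')   # False: 's' != 'g'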
          In an alternative representation of a crossword (the “dual” representation),
      the variables represent letters, and the constraints are that adjacent sequences
      of letters form words. This is shown in Figure 4.7.
                                   cspExamples.py — (continued)
[Figure 4.7: part of the constraint network crossword1d, where the variables are letter positions (p00, p01, ..., p25) and constraints such as is_word[p00, p10, p20] require the letter sequences to form words.]
      Exercise 4.1 How many assignments of a value to each variable are there for
      each of the representations of the above crossword? Do you think an exhaustive
      enumeration will work for either one?
The queens problem is a puzzle on a chess board, where the idea is to place
a queen on each column so that the queens cannot take each other: no
two queens are on the same row, column, or diagonal. The n-queens problem is a
generalization where the size of the board is n × n, and n queens have to be
placed.
    Here is a representation of the n-queens problem, where the variables are
the columns and the values are the rows in which the queen is placed. The
original queens problem on a standard (8 × 8) chess board is n_queens(8).
                                   cspExamples.py — (continued)
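The n_queens listing is elided here; the following is a sketch consistent with this representation, where the hypothetical condition no_attack(ci,cj) constrains the rows of the queens in columns ci and cj:

    def no_attack(ci, cj):
        """returns a condition on the rows of the queens in columns ci, cj"""
        def ok(ri, rj):
            return ri != rj and abs(ri - rj) != abs(ci - cj)
        ok.__name__ = f"no_attack({ci},{cj})"
        return ok

    def n_queens(n):
        """returns a CSP for the n-queens problem (a sketch)"""
        variables = [Variable(f"Q{c}", set(range(n))) for c in range(n)]
        return CSP(f"{n}-queens", set(variables),
                   [Constraint([variables[i], variables[j]], no_attack(i, j))
                    for i in range(n) for j in range(n) if i < j])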
      Exercise 4.2 How many constraints does this representation of the n-queens
      problem produce? Can it be done with fewer constraints? Either explain why it
      can’t be done with fewer constraints, or give a solution using fewer constraints.
      Unit tests
      The following defines a unit test for csp solvers, by default using example csp1.
                                   cspExamples.py — (continued)
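The test listing is elided here; a sketch, consistent with the call test_csp(solver_from_searcher) used later, assuming csp1 and its two solutions (A = 1 or A = 2, with B = 3 and C = 4):

    def test_csp(CSP_solver, csp=csp1,
                 solutions=[{A: 1, B: 3, C: 4}, {A: 2, B: 3, C: 4}]):
        """CSP_solver is a function that takes a CSP and returns a solution;
        this checks the returned solution is one of the known solutions"""
        print("Testing csp with", CSP_solver.__doc__)
        sol = CSP_solver(csp)
        print("Solution found:", sol)
        assert sol in solutions, f"Solution not correct for {csp}"
        print("Passed unit test")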
      Exercise 4.3 Modify test so that instead of taking in a list of solutions, it checks
      whether the returned solution actually is a solution.
      Exercise 4.4 Propose a test that is appropriate for CSPs with no solutions. As-
      sume that the test designer knows there are no solutions. Consider what a CSP
      solver should return if there are no solutions to the CSP.
      Exercise 4.5 Write a unit test that checks whether all solutions (e.g., for the search
      algorithms that can return multiple solutions) are correct, and whether all solu-
      tions can be found.
     Exercise 4.6 Instead of testing all constraints at every node, change it so each
constraint is only tested when all of its variables are assigned. Given an elimina-
     tion ordering, it is possible to determine when each constraint needs to be tested.
     Implement this. Hint: create a parallel list of sets of constraints, where at each po-
     sition i in the list, the constraints at position i can be evaluated when the variable
     at position i has been assigned.
     Exercise 4.7 Estimate how long dfs_solve_all(crossword1d) will take on your
     computer. To do this, reduce the number of variables that need to be assigned,
so that the simplified problem can be solved in a reasonable time (between 0.1
     second and 10 seconds). This can be done by reducing the number of variables in
     var_order, as the program only splits on these. How much more time will it take
     if the number of variables is increased by 1? (Try it!) Then extrapolate to all of the
     variables. See Section 1.6.1 for how to time your code. Would making the code 100
     times faster or using a computer 100 times faster help?
        The next solver constructs a search space that can be solved using the search
     methods of the previous chapter. This takes in a CSP problem and an optional
     variable ordering, which is a list of the variables in the CSP. In this search space:
          • A node is a variable : value dictionary which does not violate any con-
straints (so that dictionaries that violate any constraints are not added).
         The neighbors(node) method uses the fact that the length of the node, which
     is the number of variables already assigned, is the index of the next variable to
split on. Note that we do not need to check whether there are more variables
to split on: the nodes are all consistent by construction, so when there are no
more variables left we have a solution, and the neighbors are not needed.
cspSearch.py — (continued)
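The listing is elided here; the following is a sketch of the neighbors method just described, assuming the class stores the variable ordering in self.variables and that Arc is imported from searchProblem:

    def neighbors(self, node):
        """returns the arcs to the consistent one-variable extensions of node"""
        var = self.variables[len(node)]    # the next variable to split on
        res = []
        for val in var.domain:
            new_env = node | {var: val}    # dictionary union: no side effects
            if self.csp.consistent(new_env):
                res.append(Arc(node, new_env))
        return res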
The unit test relies on a solver. The following procedure creates a solver
     using search that can be tested.
                                           cspSearch.py — (continued)
48   import cspExamples
49   from searchGeneric import Searcher
50
51   def solver_from_searcher(csp):
52       """depth-first search solver"""
53       path = Searcher(Search_from_CSP(csp)).search()
54       if path is not None:
55           return path.end()
56       else:
57           return None
58
59   if __name__ == "__main__":
60       test_csp(solver_from_searcher)
61
62   ## Test Solving CSPs with Search:
63   searcher1 = Searcher(Search_from_CSP(cspExamples.csp1))
64   #print(searcher1.search()) # get next solution
65   searcher2 = Searcher(Search_from_CSP(cspExamples.csp2))
66   #print(searcher2.search()) # get next solution
67   searcher3 = Searcher(Search_from_CSP(cspExamples.crossword1))
68   #print(searcher3.search()) # get next solution
69   searcher4 = Searcher(Search_from_CSP(cspExamples.crossword1d))
70   #print(searcher4.search()) # get next solution (warning: slow)
     Exercise 4.8 What would happen if we constructed the new assignment by as-
     signing node[var] = val (with side effects) instead of using dictionary union? Give
     an example of where this could give a wrong answer. How could the algorithm be
     changed to work with side effects? (Hint: think about what information needs to
     be in a node).
     Exercise 4.9 Change neighbors so that it returns an iterator of values rather than
     a list. (Hint: use yield.)
56
57        def new_to_do(self, var, const):
58            """returns new elements to be added to to_do after assigning
59            variable var in constraint const.
60            """
61            return {(nvar, nconst) for nconst in self.csp.var_to_const[var]
62                   if nconst != const
63                   for nvar in nconst.scope
64                   if nvar != var}
The following selects an arc. Any element of to_do can be selected. The se-
lected element needs to be removed from to_do. The default implementation
just selects whichever element the pop method for sets returns. The graphical user
     interface below allows the user to select an arc. Alternatively, a more sophisti-
     cated selection could be employed.
                                   cspConsistency.py — (continued)
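The elided default can be as simple as:

    def select_arc(self, to_do):
        """selects and removes an arbitrary element of to_do"""
        return to_do.pop()    # set.pop removes an arbitrary element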
         The value of new_domain is the subset of the domain of var that is consistent
     with the assignment to the other variables. To make it easier to understand, the
     following treats unary (with no other variables in the constraint) and binary
(with one other variable in the constraint) constraints as special cases. These
     cases are not strictly necessary; the last case covers the first two cases, but is
     more difficult to understand without seeing the first two cases. Note that this
     case analysis is not in the code distribution, but can replace the assignment to
     new_domain above.
                   if len(other_vars)==0:          # unary constraint
                       new_domain = {val for val in self.domains[var]
                                      if const.holds({var:val})}
                   elif len(other_vars)==1:        # binary constraint
                       other = other_vars[0]
                       new_domain = {val for val in self.domains[var]
                               if any(const.holds({var: val,other:other_val})
                                       for other_val in self.domains[other])}
                   else:                           # general case
                       new_domain = {val for val in self.domains[var]
                           if self.any_holds(self.domains, const, {var: val}, other_vars)}
any_holds is a recursive function that tries to find an assignment of values to the
other variables (other_vars) that satisfies constraint const given the assignment
in env. The integer variable ind specifies which index into other_vars needs to be
      checked next. As soon as one assignment returns True, the algorithm returns
      True.
cspConsistency.py — (continued)
cspConsistency.py — (continued)
cspConsistency.py — (continued)
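The any_holds listing is elided above; the following is a recursive sketch that matches the description:

    def any_holds(self, domains, const, env, other_vars, ind=0):
        """returns True if const holds for some assignment of values from
        domains to other_vars[ind:] that extends env (a sketch)"""
        if ind == len(other_vars):
            return const.holds(env)
        var = other_vars[ind]
        return any(self.any_holds(domains, const, env | {var: val},
                                  other_vars, ind + 1)
                   for val in domains[var])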
Exercise 4.10 Implement solve_all that returns the set of all solutions without
      using yield. Hint: it can be like generate_sols but returns a set of solutions; the
      recursive calls can be unioned; | is Python’s union.
Exercise 4.11 Implement solve_one that returns one solution if one exists, or False
otherwise, without using yield. Hint: Python's "or" has the behaviour that A or B
returns the value of A unless it is None or False, in which case the value of B is
      returned.
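For example:

    None or 3     # 3
    False or {}   # {}
    5 or 10       # 5 (the first non-false value is returned)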
           Unit test:
                                    cspConsistency.py — (continued)
[Figure: the ConsistencyGUI display of a constraint network (with constraints including A < D, D != E, C != E) during arc consistency; E has domain {1, 2, 3, 4}, and an Auto AC button runs arc consistency automatically.]
14   class ConsistencyGUI(Con_solver):
15       def __init__(self, csp, fontsize=10, speed=1, **kwargs):
16           """
17           csp is the csp to show
18           fontsize is the size of the text
19           speed is the number of animations per second (controls delay_time)
20                1 (slow) and 4 (fast) seem like good values
21           """
22           self.fontsize = fontsize
23           self.delay_time = 1/speed
24           Con_solver.__init__(self, csp, **kwargs)
25           csp.show(showAutoAC = True)
26
27        def go(self):
28            res = self.solve_all()
29            self.csp.draw_graph(domains=self.domains,
30                                  title="No more solutions. GUI finished. ",
31                                  fontsize=self.fontsize)
32            return res
33
34        def select_arc(self, to_do):
35            while True:
36                self.csp.draw_graph(domains=self.domains, to_do=to_do,
37                                      title="click on to_do (blue) arc",
                                            fontsize=self.fontsize)
38                while self.csp.picked == None and not self.csp.autoAC:
39                    plt.pause(0.01) # controls reaction time of GUI
40                if self.csp.autoAC:
41                    break
42                picked = self.csp.picked
43                self.csp.picked = None
44                if picked in to_do:
45                    to_do.remove(picked)
46                    print(f"{picked} picked")
47                    return picked
48                else:
49                    print(f"{picked} not in to_do")
50            if self.csp.autoAC:
51                self.csp.draw_graph(domains=self.domains, to_do=to_do,
52                                      title="Auto AC", fontsize=self.fontsize)
53                plt.pause(self.delay_time)
54                return to_do.pop()
55
56        def select_var(self, iter_vars):
57            vars = list(iter_vars)
58            while True:
59                self.csp.draw_graph(domains=self.domains,
60                                      title="Arc consistent. Click node to
                                            split",
61                                      fontsize=self.fontsize)
Exercise 4.12 When splitting a domain, this code splits the domain approximately
in half (without any effort to make a sensible choice). Does it work
      better to split one element from a domain?
           Unit test:
                                     cspConsistency.py — (continued)
195   ## Test Solving CSPs with Arc consistency and domain splitting:
196   #Con_solver.max_display_level = 4 # display details of AC (0 turns off)
197   #Con_solver(cspExamples.csp1).solve_all()
198   #searcher1d = Searcher(Search_with_AC_from_CSP(cspExamples.csp1))
199   #print(searcher1d.search())
200   #Searcher.max_display_level = 2 # display search trace (0 turns off)
201   #searcher2c = Searcher(Search_with_AC_from_CSP(cspExamples.csp2))
202   #print(searcher2c.search())
203   #searcher3c = Searcher(Search_with_AC_from_CSP(cspExamples.crossword1))
204   #print(searcher3c.search())
205   #searcher4c = Searcher(Search_with_AC_from_CSP(cspExamples.crossword1d))
206   #print(searcher4c.search())
The following code implements the two-stage choice (select one of the vari-
ables that is involved in the most violated constraints, then a value for it),
the any-conflict algorithm (select a variable that participates in a violated con-
straint) and a random choice of variable, as well as a probabilistic mix of the
three.
Given a CSP, the stochastic local searcher (SLSearcher) creates the data struc-
tures below (a sketch of their construction follows the list):
• variables_to_select is the set of all of the variables with domain-size greater
  than one. For a variable not in this set, we cannot pick another value for
  that variable.
• var_to_constraints maps from a variable into the set of constraints it is in-
  volved in. Note that the inverse mapping, from constraints into variables,
  is part of the definition of a constraint.
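A sketch of how these structures can be built inside SLSearcher's initializer (the actual listing is elided; names follow the description):

    self.variables_to_select = {var for var in self.csp.variables
                                if len(var.domain) > 1}
    # invert the scope of each constraint: variable -> set of constraints
    self.var_to_constraints = {var: set() for var in self.csp.variables}
    for con in self.csp.constraints:
        for var in con.scope:
            self.var_to_constraints[var].add(con)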
     restart creates a new total assignment, and constructs the set of conflicts (the
     constraints that are false in this assignment).
cspSLS.py — (continued)
29        def restart(self):
30            """creates a new total assignment and the conflict set
31            """
32            self.current_assignment = {var:random_choice(var.domain) for
33                                     var in self.csp.variables}
34            self.display(2,"Initial assignment",self.current_assignment)
35            self.conflicts = set()
36            for con in self.csp.constraints:
37                if not con.holds(self.current_assignment):
38                    self.conflicts.add(con)
39            self.display(2,"Number of conflicts",len(self.conflicts))
40            self.variable_pq = None
The search method is the top-level searching algorithm. It can either be used
to start the search or to continue searching. If there is no current assignment,
it must create one. Note that, when counting steps, a restart is counted as one
step, which is not appropriate for CSPs with many variables, as it is a relatively
expensive operation for these cases.
    This method selects one of two implementations. The argument prob_best
is the probability of selecting a best variable (one involving the most conflicts).
When the value of prob_best is positive, the algorithm needs to maintain a prior-
ity queue of variables and the number of conflicts (using search_with_var_pq). If
the probability of selecting a best variable is zero, it does not need to maintain
this priority queue (as implemented in search_with_any_conflict).
    The argument prob_anycon is the probability that the any-conflict strategy is
used (which selects a variable at random that is in a conflict), assuming that
it is not picking a best variable. Note that for the probability parameters, any
value less than zero acts like probability zero and any value greater than 1 acts
like probability 1. This means that when prob_anycon = 1.0, a best variable is
chosen with probability prob_best, otherwise a variable in any conflict is chosen.
A variable is chosen at random with probability 1 − prob_anycon − prob_best as
long as that is positive.
This returns the number of steps needed to find a solution, or None if no
solution is found. If there is a solution, it is in self.current_assignment.
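One way to read the probability parameters is the following sketch (not the actual implementation, which is organized around the two search methods above):

    import random

    def choose_strategy(prob_best, prob_anycon):
        """sketch: which variable-selection strategy is used on one step"""
        r = random.random()
        if r < prob_best:
            return "best"          # a variable involved in the most conflicts
        elif r < prob_best + prob_anycon:
            return "any-conflict"  # a random variable in some conflict
        else:
            return "random"        # any variable, chosen at random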
                                      cspSLS.py — (continued)
Exercise 4.13 This does an initial random assignment but does not do any ran-
dom restarts. Implement a searcher that takes in the maximum number of walk
steps (corresponding to the existing max_steps) and the maximum number of restarts,
and returns the total number of steps for the first solution found. (As in search, the
solution found can be extracted from the variable self.current_assignment.)
4.5.1 Any-conflict
If the probability of picking a best variable is zero, the implementation needs to
keep track of which variables are involved in conflicts.
                                      cspSLS.py — (continued)
     Exercise 4.14 This makes no attempt to find the best alternative value for a vari-
able. Modify the code so that after selecting a variable it selects a value that reduces
     the number of conflicts by the most. Have a parameter that specifies the probabil-
     ity that the best value is chosen.
updates. The change is recorded in the dictionary var_differential, which is used
      to update the priority queue (see Section 4.5.3).
                                     cspSLS.py — (continued)
cspSLS.py — (continued)
cspSLS.py — (continued)
      Exercise 4.15 This makes no attempt to find the best alternative value for a vari-
able. Modify the code so that after selecting a variable it selects a value that reduces
      the number of conflicts by the most. Have a parameter that specifies the probabil-
      ity that the best value is chosen.
      Exercise 4.16 These implementations always select a value for the variable se-
      lected that is different from its current value (if that is possible). Change the code
      so that it does not have this restriction (so it can leave the value the same). Would
      you expect this code to be faster? Does it work worse (or better)?
cspSLS.py — (continued)
204            """
205            for elt,incr in update_dict.items():
206                if incr != 0:
207                    newval = self.elt_map.get(elt,[0])[0] - incr
208                    assert newval <= 0, f"{elt}:{newval+incr}-{incr}"
209                    self.remove(elt)
210                    if newval != 0:
211                        self.add(elt,newval)
212
213         def pop(self):
214             """Removes and returns the (elt,value) pair with minimal value.
215             If the priority queue is empty, IndexError is raised.
216             """
217             self.max_size = max(self.max_size, len(self.pq)) # keep statistics
218             triple = heapq.heappop(self.pq)
219             while triple[2] == self.REMOVED:
220                 triple = heapq.heappop(self.pq)
221             del self.elt_map[triple[2]]
222             return triple[2], triple[0] # elt, value
223
224         def top(self):
225             """Returns the (elt,value) pair with minimal value, without
                    removing it.
226             If the priority queue is empty, IndexError is raised.
227             """
228             self.max_size = max(self.max_size, len(self.pq)) # keep statistics
229             triple = self.pq[0]
230             while triple[2] == self.REMOVED:
231                 heapq.heappop(self.pq)
232                 triple = self.pq[0]
233             return triple[2], triple[0] # elt, value
234
235         def empty(self):
236             """returns True iff the priority queue is empty"""
237             return all(triple[2] == self.REMOVED for triple in self.pq)
      4.5.5 Testing
                                     cspSLS.py — (continued)
[Figure: a runtime distribution for the stochastic local searchers; the x-axis is Number of Steps on a log scale (1 to 1000), and the y-axis ticks run from 0 to 600.]
     plot the average time for each run. Before you start, try to estimate the total run
     time, so you will be able to tell if there is a problem with the algorithm stopping.
cspSoft.py — (continued)
cspSoft.py — (continued)
86           """finds the optimal solution that extends path and is less than
                the bound"""
 87           self.display(2,"cbsearch:",asst,cost,constraints)
 88           can_eval = [c for c in constraints if c.can_evaluate(asst)]
 89           rem_cons = [c for c in constraints if c not in can_eval]
 90           newcost = cost + sum(c.value(asst) for c in can_eval)
91           self.display(2,"Evaluating:",can_eval,"cost:",newcost)
 92           if newcost < self.bound:
 93               self.num_expanded += 1
 94               if rem_cons==[]:
 95                   self.best_asst = asst
 96                   self.bound = newcost
 97                   self.display(1,"New best assignment:",asst," cost:",newcost)
 98               else:
 99                   var = next(var for var in self.csp.variables if var not in
                          asst)
100                   for val in var.domain:
101                       self.cbsearch({var:val}|asst, newcost, rem_cons)
102
103   # bnb = DF_branch_and_bound_opt(scsp1)
104   # bnb.max_display_level=3 # show more detail
105   # bnb.optimize()
      Exercise 4.18 Change the stochastic-local search algorithms to work for soft con-
      straints. Hint: The analog of a conflict is a soft constraint that is not at its lowest
      value. Instead of the number of constraints violated, consider how much a change
      in a variable affects the objective function. Instead of returning a solution, return
      the best assignment found.
     An askable atom can be asked of the user. The user can respond in English or
     French or just with a “y”.
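The helper that interprets the user's answer is elided here; a sketch consistent with this description:

    def yes(ans):
        """returns True when the string ans means yes (English, French, or 'y')"""
        return ans.lower() in ['y', 'yes', 'oui']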
                                   logicProblem.py — (continued)
27   class Askable(object):
28       """An askable atom"""
29
30      def __init__(self,atom):
Here is a trivial example (I think therefore I am) used in the unit tests:
                                    logicProblem.py — (continued)
 74   triv_KB = KB([
 75       Clause('i_am', ['i_think']),
 76       Clause('i_think'),
 77       Clause('i_smell', ['i_exist'])
 78       ])
      Here is a representation of the electrical domain of the textbook:
                                    logicProblem.py — (continued)
 80   elect = KB([
 81       Clause('light_l1'),
 82       Clause('light_l2'),
 83       Clause('ok_l1'),
 84       Clause('ok_l2'),
 85       Clause('ok_cb1'),
 86       Clause('ok_cb2'),
 87       Clause('live_outside'),
 88       Clause('live_l1', ['live_w0']),
 89       Clause('live_w0', ['up_s2','live_w1']),
 90       Clause('live_w0', ['down_s2','live_w2']),
 91       Clause('live_w1', ['up_s1', 'live_w3']),
 92       Clause('live_w2', ['down_s1','live_w3' ]),
 93       Clause('live_l2', ['live_w4']),
 94       Clause('live_w4', ['up_s3','live_w3' ]),
 95       Clause('live_p_1', ['live_w3']),
 96       Clause('live_w3', ['live_w5', 'ok_cb1']),
 97       Clause('live_p_2', ['live_w6']),
 98       Clause('live_w6', ['live_w5', 'ok_cb2']),
 99       Clause('live_w5', ['live_outside']),
100       Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
101       Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
102       Askable('up_s1'),
103       Askable('down_s1'),
104       Askable('up_s2'),
105       Askable('down_s2'),
106       Askable('up_s3'),
107       Askable('down_s3')
108       ])
109
110   # print(kb)
The following knowledge base is false in the intended interpretation. One of
      the clauses is wrong; can you see which one? We will show how to debug it.
                                    logicProblem.py — (continued)
115         Clause('ok_cb1'),
116         Clause('ok_cb2'),
117         Clause('live_outside'),
118         Clause('live_p_2', ['live_w6']),
119         Clause('live_w6', ['live_w5', 'ok_cb2']),
120         Clause('light_l1'),
121         Clause('live_w5', ['live_outside']),
122         Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
123         Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
124         Clause('live_l1', ['live_w0']),
125         Clause('live_w0', ['up_s2','live_w1']),
126         Clause('live_w0', ['down_s2','live_w2']),
127         Clause('live_w1', ['up_s3', 'live_w3']),
128         Clause('live_w2', ['down_s1','live_w3' ]),
129         Clause('live_l2', ['live_w4']),
130         Clause('live_w4', ['up_s3','live_w3' ]),
131         Clause('live_p_1', ['live_w3']),
132         Clause('live_w3', ['live_w5', 'ok_cb1']),
133         Askable('up_s1'),
134         Askable('down_s1'),
135         Askable('up_s2'),
136         Clause('light_l2'),
137         Clause('ok_l1'),
138         Clause('light_l2'),
139         Clause('ok_l1'),
140         Clause('ok_l2'),
141         Clause('ok_cb1'),
142         Clause('ok_cb2'),
143         Clause('live_outside'),
144         Clause('live_p_2', ['live_w6']),
145         Clause('live_w6', ['live_w5', 'ok_cb2']),
146         Clause('ok_l2'),
147         Clause('ok_cb1'),
148         Clause('ok_cb2'),
149         Clause('live_outside'),
150         Clause('live_p_2', ['live_w6']),
151         Clause('live_w6', ['live_w5', 'ok_cb2']),
152         Askable('down_s2'),
153         Askable('up_s3'),
154         Askable('down_s2')
155         ])
156
157   # print(kb)
     Exercise 5.1 It is not very user-friendly to ask all of the askables up-front. Imple-
     ment ask-the-user so that questions are only asked if useful, and are not re-asked.
     For example, if there is a clause h ← a ∧ b ∧ c ∧ d ∧ e, where c and e are askable,
     c and e only need to be asked if a, b, d are all in fp and they have not been asked
     before. Askable e only needs to be asked if the user says “yes” to c. Askable c
     doesn’t need to be asked if the user previously replied “no” to e.
         This form of ask-the-user can ask a different set of questions than the top-
     down interpreter that asks questions when encountered. Give an example where
     they ask different questions (neither set of questions asked is a subset of the other).
Exercise 5.2 This algorithm runs in time O(n²), where n is the number of clauses,
for a bounded number of elements in the body; each iteration goes through each
of the clauses, and in the worst case, it will do an iteration for each clause. It is
possible to implement this in O(n) time by creating an index that maps an
atom to the set of clauses with that atom in the body. Implement this. What is its
complexity?
     Exercise 5.4 This code can re-ask a question multiple times. Implement this code
     so that it only asks a question once and remembers the answer. Also implement a
     function to forget the answers.
     Exercise 5.5 What search method is this using? Implement the search interface
     so that it can use A∗ or other searching methods. Define an admissible heuristic
     that is not always 0.
The following provides a simple unit test that is hard-wired for triv_KB:
logicExplain.py — (continued)
logicExplain.py — (continued)
 72      going = True
 73      ups = [] # stack for going up
 74      proof="fail" # there is no proof to start
 75      while going:
 76          inp = input("logicExplain: ")
 77          inps = inp.split(" ")
 78          try:
 79              command = inps[0]
 80              if command == "quit":
 81                  going = False
 82              elif command == "ask":
 83                  proof = prove_atom(kb, inps[1])
 84                  if proof == "fail":
 85                      print("fail")
 86                  else:
 87                      print("yes")
 88              elif command == "how":
 89                  if proof=="fail":
 90                      print("there is no proof")
 91                  elif len(inps)==1:
 92                     print_rule(proof)
 93                  else:
 94                      try:
 95                          ups.append(proof)
 96                          proof = proof[1][int(inps[1])] #nth argument of rule
 97                          print_rule(proof)
 98                      except:
 99                          print('In "how n", n must be a number between 0
                                 and',len(proof[1])-1,"inclusive.")
100              elif command == "up":
101                  if ups:
102                      proof = ups.pop()
103                  else:
104                      print("No rule to go up to.")
105                  print_rule(proof)
106              elif command == "kb":
107                   print(kb)
108              elif command == "help":
109                  print(helptext)
110              else:
111                  print("unknown command:", inp)
112                  print("use help for help")
113          except:
114              print("unknown command:", inp)
115              print("use help for help")
116
117   def print_rule(proof):
118       (head,body) = proof
119       if body == "answered":
120           print(head,"was answered yes")
or preferably using yield). Add the command "retry" to the user interface to try
     another proof.
     5.5      Assumables
     Atom a can be made assumable by including Assumable(a) in the knowledge
     base. A knowledge base that can include assumables is declared with KBA.
The top-down Horn clause interpreter, prove_all_ass, returns a list of the sets of
assumables that imply ans_body. This list will contain all of the minimal sets
     of assumables, but can also find non-minimal sets, and repeated sets, if they
     can be generated with separate proofs. The set assumed is the set of assumables
     already assumed.
logicAssumables.py — (continued)
43                        return [] # no answers
44                elif selected in self.assumables:
45                    return self.prove_all_ass(ans_body[1:],assumed|{selected})
46                else:
47                    return [ass
48                            for cl in self.clauses_for_atom(selected)
49                            for ass in
                                  self.prove_all_ass(cl.body+ans_body[1:],assumed)
50                               ] # union of answers for each clause with
                                     head=selected
51            else:                # empty body
52                return [assumed] # one answer
53
54         def conflicts(self):
55             """returns a list of minimal conflicts"""
56             return minsets(self.prove_all_ass(['false']))
        Given a list of sets, minsets returns a list of the minimal sets in the list. For
     example, minsets([{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}]) returns [{2, 3}, {2, 4, 5}].
                                     logicAssumables.py — (continued)
58   def minsets(ls):
59       """ls is a list of sets
60       returns a list of minimal sets in ls
61       """
62       ans = []    # elements known to be minimal
63       for c in ls:
64           if not any(c1<c for c1 in ls) and not any(c1 <= c for c1 in ans):
65               ans.append(c)
66       return ans
67
68   # minsets([{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}])
     Warning: minsets works for a list of sets or for a set of (frozen) sets, but it does
not work for a generator of sets (because ls is traversed more than once in the loop). For
     example, try to predict and then test:
     minsets(e for e in [{2, 3, 4}, {2, 3}, {6, 2, 3}, {2, 3}, {2, 4, 5}])
        The diagnoses can be constructed from the (minimal) conflicts as follows.
     This also works if there are non-minimal conflicts, but is not as efficient.
                                     logicAssumables.py — (continued)
69   def diagnoses(cons):
70       """cons is a list of (minimal) conflicts.
71       returns a list of diagnoses."""
72       if cons == []:
73           return [set()]
74       else:
75           return minsets([({e}|d)            # | is set union
76                        for e in cons[0]
77                        for d in diagnoses(cons[1:])])
      Test cases:
                                    logicAssumables.py — (continued)
 80   electa = KBA([
 81       Clause('light_l1'),
 82       Clause('light_l2'),
 83       Assumable('ok_l1'),
 84       Assumable('ok_l2'),
 85       Assumable('ok_s1'),
 86       Assumable('ok_s2'),
 87       Assumable('ok_s3'),
 88       Assumable('ok_cb1'),
 89       Assumable('ok_cb2'),
 90       Assumable('live_outside'),
 91       Clause('live_l1', ['live_w0']),
 92       Clause('live_w0', ['up_s2','ok_s2','live_w1']),
 93       Clause('live_w0', ['down_s2','ok_s2','live_w2']),
 94       Clause('live_w1', ['up_s1', 'ok_s1', 'live_w3']),
 95       Clause('live_w2', ['down_s1', 'ok_s1','live_w3' ]),
 96       Clause('live_l2', ['live_w4']),
 97       Clause('live_w4', ['up_s3','ok_s3','live_w3' ]),
 98       Clause('live_p_1', ['live_w3']),
 99       Clause('live_w3', ['live_w5', 'ok_cb1']),
100       Clause('live_p_2', ['live_w6']),
101       Clause('live_w6', ['live_w5', 'ok_cb2']),
102       Clause('live_w5', ['live_outside']),
103       Clause('lit_l1', ['light_l1', 'live_l1', 'ok_l1']),
104       Clause('lit_l2', ['light_l2', 'live_l2', 'ok_l2']),
105       Askable('up_s1'),
106       Askable('down_s1'),
107       Askable('up_s2'),
108       Askable('down_s2'),
109       Askable('up_s3'),
110       Askable('down_s3'),
111       Askable('dark_l1'),
112       Askable('dark_l2'),
113       Clause('false', ['dark_l1', 'lit_l1']),
114       Clause('false', ['dark_l2', 'lit_l2'])
115       ])
116   # electa.prove_all_ass(['false'])
117   # cs=electa.conflicts()
118   # print(cs)
119   # diagnoses(cs)       # diagnoses from conflicts
     5.6        Negation-as-failure
The negation of an atom a is written as Not(a) in a body.
                             logicNegation.py — Propositional negation-as-failure
11   from logicProblem import KB, Clause, Askable, yes
12
13   class Not(object):
14       def __init__(self, atom):
15           self.theatom = atom
16
17         def atom(self):
18             return self.theatom
19
20         def __repr__(self):
21             return f"Not({self.theatom})"
     Prove with negation-as-failure (prove_naf) is like prove, but with the extra case
     to cover Not:
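The prove_naf listing is elided here; the following is a simplified sketch of the Not case (askables omitted):

    def prove_naf(kb, ans_body, indent=""):
        """returns True if kb entails ans_body, treating Not by failure (sketch)"""
        if ans_body:
            selected = ans_body[0]
            if isinstance(selected, Not):
                if prove_naf(kb, [selected.atom()], indent + "    "):
                    return False   # the atom succeeds, so Not(atom) fails
                return prove_naf(kb, ans_body[1:], indent)
            return any(prove_naf(kb, cl.body + ans_body[1:], indent + "    ")
                       for cl in kb.clauses_for_atom(selected))
        return True    # an empty body is trivially true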
                                       logicNegation.py — (continued)
48   triv_KB_naf = KB([
49       Clause('i_am', ['i_think']),
50       Clause('i_think'),
51       Clause('i_smell', ['i_am', Not('dead')]),
52       Clause('i_bad', ['i_am', Not('i_think')])
53       ])
54
55   triv_KB_naf.max_display_level = 4
56   def test():
57       a1 = prove_naf(triv_KB_naf,['i_smell'])
58       assert a1, f"triv_KB_naf proving i_smell gave {a1}"
59       a2 = prove_naf(triv_KB_naf,['i_bad'])
60       assert not a2, f"triv_KB_naf proving i_bad gave {a2}"
61       print("Passed unit tests")
62   if __name__ == "__main__":
63       test()
     Default reasoning about beaches at resorts (Example 5.28 of Poole and Mack-
     worth [2023]):
                                   logicNegation.py — (continued)
65   beach_KB = KB([
66      Clause('away_from_beach', [Not('on_beach')]),
67      Clause('beach_access', ['on_beach', Not('ab_beach_access')]),
68      Clause('swim_at_beach', ['beach_access', Not('ab_swim_at_beach')]),
69      Clause('ab_swim_at_beach', ['enclosed_bay', 'big_city',
            Not('ab_no_swimming_near_city')]),
70      Clause('ab_no_swimming_near_city', ['in_BC', Not('ab_BC_beaches')])
71       ])
72
73   #   prove_naf(beach_KB, ['away_from_beach'])
74   #   prove_naf(beach_KB, ['beach_access'])
75   #   beach_KB.add_clause(Clause('on_beach',[]))
76   #   prove_naf(beach_KB, ['away_from_beach'])
77   #   prove_naf(beach_KB, ['swim_at_beach'])
78   #   beach_KB.add_clause(Clause('enclosed_bay',[]))
79   #   prove_naf(beach_KB, ['swim_at_beach'])
80   #   beach_KB.add_clause(Clause('big_city',[]))
81   #   prove_naf(beach_KB, ['swim_at_beach'])
82   #   beach_KB.add_clause(Clause('in_BC',[]))
83   #   prove_naf(beach_KB, ['swim_at_beach'])
Deterministic Planning
        • effects: a dictionary of feature:value pairs that are made true by this action.
          In particular, a feature in the dictionary has the corresponding value (and
          not its previous value) after the action, and a feature not in the dictionary
          keeps its old value.
23            self.name = name
24            self.preconds = preconds
25            self.effects = effects
26            self.cost = cost
27
28         def __repr__(self):
29             return self.name
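For example, a "pick up coffee" action for the delivery domain might be written as follows (a sketch; the feature names match problem0 below):

    # possible when the robot is at the coffee shop (RLoc) without
    # coffee (RHC); afterwards it holds coffee, and RLoc keeps its value
    puc = Strips('puc', {'RLoc': 'cs', 'RHC': False}, {'RHC': True})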
A STRIPS domain consists of:
• A set of actions.
        • A dictionary that maps each feature into a set of possible values for the
          feature.
stripsProblem.py — (continued)
31   class STRIPS_domain(object):
32       def __init__(self, feature_domain_dict, actions):
33           """Problem domain
34           feature_domain_dict is a feature:domain dictionary,
35                   mapping each feature to its domain
36           actions is the set of actions
37           """
38           self.feature_domain_dict = feature_domain_dict
39           self.actions = actions
stripsProblem.py — (continued)
41   class Planning_problem(object):
42       def __init__(self, prob_domain, initial_state, goal):
43           """
44           a planning problem consists of
45           * a planning domain
46           * the initial state
47           * a goal
48           """
49           self.prob_domain = prob_domain
50           self.initial_state = initial_state
51           self.goal = goal
[Figure: the delivery robot domain, with locations the coffee shop (cs), Sam's office (off), the mail room (mr), and the lab (lab).]
stripsProblem.py — (continued)
stripsProblem.py — (continued)
[Figure: two blocks-world actions: move(b,c,a), which moves b from c onto a, and move(b,c,table), which moves b from c onto the table.]
71   problem0 = Planning_problem(delivery_domain,
72                            {'RLoc':'lab', 'MW':True,     'SWC':True, 'RHC':False,
73                             'RHM':False},
74                            {'RLoc':'off'})
75   problem1 = Planning_problem(delivery_domain,
76                            {'RLoc':'lab', 'MW':True,     'SWC':True, 'RHC':False,
77                             'RHM':False},
78                            {'SWC':False})
79   problem2 = Planning_problem(delivery_domain,
80                            {'RLoc':'lab', 'MW':True,     'SWC':True, 'RHC':False,
81                             'RHM':False},
82                            {'SWC':False, 'MW':False,     'RHM':False})
[Figure: a blocks-world planning problem, showing an initial configuration and a goal configuration of blocks a, b, and c.]
27   def zero(*args,**nargs):
28       """always returns 0"""
29       return 0
30
31   class Forward_STRIPS(Search_problem):
32       """A search problem from a planning problem where:
33       * a node is a state object.
34       * the dynamics are specified by the STRIPS representation of actions
35       """
36       def __init__(self, planning_problem, heur=zero):
37           """creates a forward search space from a planning problem.
38           heur(state,goal) is a heuristic function,
39              an underestimate of the cost from state to goal, where
40              both state and goals are feature:value dictionaries.
41           """
42           self.prob_domain = planning_problem.prob_domain
43           self.initial_state = State(planning_problem.initial_state)
44           self.goal = planning_problem.goal
45           self.heur = heur
46
47      def is_goal(self, state):
48          """is True if node is a goal.
49
50          Every goal feature has the same value in the state and the goal."""
51          return all(state.assignment[prop]==self.goal[prop]
                       for prop in self.goal)
stripsHeuristic.py — (continued)
21   def h1(state,goal):
22       """ the distance to the goal location, if there is one"""
23       if 'RLoc' in goal:
24           return dist(state['RLoc'], goal['RLoc'])
25       else:
26           return 0
27
28   def h2(state,goal):
29       """ the distance to the coffee shop plus getting coffee and delivering
             it
30       if the robot needs to get coffee
31       """
32       if ('SWC' in goal and goal['SWC']==False
33               and state['SWC']==True
34               and state['RHC']==False):
35           return dist(state['RLoc'],'cs')+3
36       else:
37           return 0
     The maximum of the values of a set of admissible heuristics is also an admis-
     sible heuristic. The function maxh takes a number of heuristic functions as ar-
     guments, and returns a new heuristic function that takes the maximum of the
     values of the heuristics. For example, h1 and h2 are heuristic functions and so
     maxh(h1,h2) is also. maxh can take an arbitrary number of arguments.
stripsHeuristic.py — (continued)
39   def maxh(*heuristics):
40       """Returns a new heuristic function that is the maximum of the
             functions in heuristics.
41       heuristics is the list of arguments which must be heuristic functions.
42       """
43       # return lambda state,goal: max(h(state,goal) for h in heuristics)
44       def newh(state,goal):
45           return max(h(state,goal) for h in heuristics)
46       return newh
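For example:

    h3 = maxh(h1, h2)
    # h3(state, goal) == max(h1(state, goal), h2(state, goal))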
The following runs the example with and without the heuristic.
stripsHeuristic.py — (continued)
     Exercise 6.4 For more than one start-state/goal combination, test the forward
     planner with a heuristic function of just h1, with just h2 and with both. Explain
     why each one prunes or doesn’t prune the search space.
     Exercise 6.5 Create a better heuristic than maxh(h1, h2). Try it for a number of
different problems. In particular, try to include the following costs:
        i) h3 is like h2 but also takes into account the case when Rloc is in goal.
       ii) h4 uses the distance to the mail room plus getting mail and delivering it if
           the robot needs to get need to deliver mail.
      iii) h5 is for getting mail when goal is for the robot to have mail, and then getting
           to the goal destination (if there is one).
44
45         def is_goal(self, subgoal):
46             """if subgoal is true in the initial state, a path has been found"""
47             goal_asst = subgoal.assignment
48             return all(self.initial_state[g]==goal_asst[g]
49                       for g in goal_asst)
50
51         def start_node(self):
52             """the start node is the top-level goal"""
53             return self.top_goal
54
55         def neighbors(self,subgoal):
56             """returns a list of the arcs for the neighbors of subgoal in this
                   problem"""
57             goal_asst = subgoal.assignment
58             return [ Arc(subgoal, self.weakest_precond(act,goal_asst),
                   act.cost, act)
59                     for act in self.prob_domain.actions
60                     if self.possible(act,goal_asst)]
61
62         def possible(self,act,goal_asst):
63             """True if act is possible to achieve goal_asst.
64
65            the action achieves an element of the effects and
66            the action doesn't delete something that needs to be achieved and
67            the preconditions are consistent with other subgoals that need to
                  be achieved
68            """
69            return ( any(goal_asst[prop] == act.effects[prop]
70                       for prop in act.effects if prop in goal_asst)
71                   and all(goal_asst[prop] == act.effects[prop]
72                           for prop in act.effects if prop in goal_asst)
73                   and all(goal_asst[prop]== act.preconds[prop]
74                           for prop in act.preconds if prop not in act.effects
                                 and prop in goal_asst)
75                   )
76
77         def weakest_precond(self,act,goal_asst):
78             """returns the subgoal that must be true so goal_asst holds after
                   act
79             should be: act.preconds | (goal_asst - act.effects)
80             """
81             new_asst = act.preconds.copy()
82             for g in goal_asst:
83                 if g not in act.effects:
84                     new_asst[g] = goal_asst[g]
85             return Subgoal(new_asst)
86
87         def heuristic(self,subgoal):
88           """in the regression planner a node is a subgoal.
89           the heuristic is an (under)estimate of the cost of going
90              from the initial state to the subgoal
91           """
92           return self.heur(self.initial_state, subgoal.assignment)
stripsRegressionPlanner.py — (continued)
     Exercise 6.7 Multiple path pruning could be used to prune more than the current
     code. In particular, if the current node contains more conditions than a previously
     visited node, it can be pruned. For example, if {a : True, b : False} has been visited,
     then any node that is a superset, e.g., {a : True, b : False, d : True}, need not be
     expanded. If the simpler subgoal does not lead to a solution, the more complicated
one won't either. Implement this more severe pruning. (Hint: This may require
     modifications to the searcher.)
Exercise 6.8 It is possible, given knowledge of the domain, that some assign-
ments of values to variables can never be achieved. For example, the robot
cannot be holding mail when there is mail waiting (assuming it isn't holding
mail initially). An assignment of values to (some of the) variables is incompat-
ible if no possible (reachable) state can include that assignment. For example,
{'MW': True, 'RHM': True} is an incompatible assignment. This information may
be useful for a planner; there is no point in trying to achieve these
together. Define a subclass of STRIPS_domain that can accept a list of incompatible
assignments. Modify the regression planner code to use such a list of incompatible
assignments. Give an example where the search space is smaller.
     Exercise 6.9 After completing the previous exercise, design incompatible assign-
     ments for the blocks world. (This should result in dramatic search improvements.)
     Exercise 6.10 Try the regression planner with a heuristic function of just h1 and
     with just h2 (defined in Section 6.2.1). Explain how each one prunes or doesn’t
     prune the search space.
Exercise 6.11 Create a better heuristic than heuristic_fun defined in Section 6.2.1.
is_(val) returns a function that returns True when applied to val, and when
applied to any other value returns False. So is_(3)(3) returns True and
is_(3)(7) returns False.
    Note that the underscore ('_') is part of the name; here we use the
convention that a function whose name ends in an underscore returns a function.
This uses two different styles to define is_ and if_; returning a function defined
by lambda is equivalent to returning the embedded function, except that the
embedded function has a name. The embedded function can also be given a docstring.
                                  stripsCSPPlanner.py — (continued)
68   def is_(val):
69       """returns a function that is true when it is applied to val.
70       """
71       #return lambda x: x == val
72       def is_fun(x):
73           return x == val
74       is_fun.__name__ = f"value_is_{val}"
75       return is_fun
76
77   def if_(v1,v2):
78       """if the second argument is v2, the first argument must be v1"""
79       #return lambda x1,x2: x1==v1 if x2==v2 else True
80       def if_fun(x1,x2):
81           return x1==v1 if x2==v2 else True
82       if_fun.__name__ = f"if x2 is {v2} then x1 is {v1}"
83       return if_fun
84
85   def eq_if_not_in_(actset):
86       """first and third arguments are equal if action is not in actset"""
87       # return lambda x1, a, x2: x1==x2 if a not in actset else True
88       def eq_if_not_fun(x1, a, x2):
89           return x1==x2 if a not in actset else True
90       eq_if_not_fun.__name__ = f"first and third arguments are equal if
             action is not in {actset}"
91       return eq_if_not_fun
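As a quick (hypothetical) check of these constructors, using location and action names from the delivery domain:

    is_(3)(3)                    # True
    is_(3)(7)                    # False
    if_('cs','mc')('cs','mc')    # True: second argument is 'mc', so first must be 'cs'
    if_('cs','mc')('off','mc')   # False
    if_('cs','mc')('off','lab')  # True: the condition on the second argument does not apply
    eq_if_not_in_({'mc','mcc'})('off','mc','cs')   # True: 'mc' is in the set
    eq_if_not_in_({'mc','mcc'})('off','puc','cs')  # False: 'puc' is not, and 'off' != 'cs'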
Putting it together, con_plan returns a list of actions that solves the problem prob
for a given horizon. If you want to do more than just return the list of actions,
you might want to get it to return the solution. Or even enumerate the solutions
(by using Search_with_AC_from_CSP).
                                  stripsCSPPlanner.py — (continued)
93   def con_plan(prob,horizon):
94       """finds a plan for problem prob given horizon.
95       """
96       csp = CSP_from_STRIPS(prob, horizon)
97       sol = Con_solver(csp).solve_one()
98       return csp.extract_plan(sol) if sol else sol
           The following are some example queries.
                                  stripsCSPPlanner.py — (continued)
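Those queries are elided here. A minimal example, assuming the delivery problem problem1 from stripsProblem:

    from stripsProblem import problem1
    con_plan(problem1, 5)   # a list of action instances if a plan with horizon 5
                            # exists, otherwise the falsy result of solve_one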
        • agenda: a list of (s, a) pairs, where s is a (var, val) pair and a is an action
          instance. This means that variable var must have value val before a can
          occur.
   • causal links: a set of (a0, g, a1) triples, where a0 and a1 are action instances
     and g is a (var, val) pair. This holds when action a0 makes g true for action
     a1 .
stripsPOP.py — (continued)
28   class POP_node(object):
29       """a (partial) partial-order plan. This is a node in the search
             space."""
30       def __init__(self, actions, constraints, agenda, causal_links):
31           """
32           * actions is a set of action instances
33           * constraints a set of (a0,a1) pairs, representing a0<a1,
34             closed under transitivity
35           * agenda list of (subgoal,action) pairs to be achieved, where
36             subgoal is a (variable,value) pair
37           * causal_links is a set of (a0,g,a1) triples,
38             where ai are action instances, and g is a (variable,value) pair
39           """
40           self.actions = actions # a set of action instances
41           self.constraints = constraints # a set of (a0,a1) pairs
42           self.agenda = agenda # list of (subgoal,action) pairs to be
                 achieved
43           self.causal_links = causal_links # set of (a0,g,a1) triples
44
45       def __str__(self):
46           return ("actions: "+str({str(a) for a in self.actions})+
47                  "\nconstraints: "+
48                  str({(str(a1),str(a2)) for (a1,a2) in self.constraints})+
49                  "\nagenda: "+
50                  str([(str(s),str(a)) for (s,a) in self.agenda])+
51                  "\ncausal_links:"+
52                  str({(str(a0),str(g),str(a2)) for (a0,g,a2) in
                        self.causal_links}) )
     extract_plan constructs a total order of action instances that is consistent with
the partial order.
stripsPOP.py — (continued)
54       def extract_plan(self):
55           """returns a total ordering of the action instances consistent
56           with the constraints.
57           raises IndexError if there is no choice.
58           """
59           sorted_acts = []
60           other_acts = set(self.actions)
61           while other_acts:
62               a = random.choice([a for a in other_acts if
63                       all(((a1,a) not in self.constraints) for a1 in
                             other_acts)])
64               sorted_acts.append(a)
65               other_acts.remove(a)
66           return sorted_acts
stripsPOP.py — (continued)
109                      consts1 =
                             self.add_constraint((self.start,new_a),node.constraints)
110                      consts2 = self.add_constraint((new_a,act1),consts1)
111                      new_agenda1 = new_agenda + [(pre,new_a) for pre in
                             a0.preconds.items()]
112                      new_clink = (new_a,subgoal,act1)
113                      new_cls = node.causal_links + [new_clink]
114                      for consts3 in
                             self.protect_all_cls(node.causal_links,new_a,consts2):
115                          for consts4 in
                                 self.protect_cl_for_actions(node.actions,consts3,new_clink):
116                              yield Arc(node,
117                                       POP_node(new_actions,consts4,new_agenda1,new_cls),
118                                       cost=1)
     Given a causal link (a0, subgoal, a1), the following method protects the causal
      link from each action in actions. Whenever an action deletes subgoal, the action
      needs to be before a0 or after a1. This method enumerates all constraints that
      result from protecting the causal link from all actions.
stripsPOP.py — (continued)
          Given an action act, the following method protects all the causal links in
      clinks from act. Whenever act deletes subgoal from some causal link (a0, subgoal, a1),
      the action act needs to be before a0 or after a1. This method enumerates all con-
      straints that result from protecting the causal links from act.
                                       stripsPOP.py — (continued)
The constraints are closed under transitivity, so adding a new constraint means
adding the implied pairs, but querying whether some order is consistent is quick.
                                    stripsPOP.py — (continued)
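That listing is elided. A minimal sketch of inserting a pair into a transitively closed set of (before, after) pairs (the idea behind add_constraint, not necessarily its exact code):

    def add_constraint_sketch(pair, constraints):
        """returns a new set extending constraints with pair,
        closed under transitivity"""
        if pair in constraints:
            return constraints
        new = set(constraints)
        todo = [pair]
        while todo:
            (x0, x1) = todo.pop()
            new.add((x0, x1))
            # every a<x0 implies a<x1, and every x1<b implies x0<b
            for (a, b) in list(new):
                if b == x0 and (a, x1) not in new:
                    todo.append((a, x1))
                if a == x1 and (x0, b) not in new:
                    todo.append((x0, b))
        return new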
This chapter is the first on machine learning. It covers the following topics:
   • Features: many of the features come directly from the data. Sometimes
     it is useful to construct features, e.g., height > 1.9m might be a Boolean
     feature constructed from the real-valued feature height. The next chapter
     is about neural networks and how to learn features; in this chapter features
     are constructed explicitly, in what is often known as feature engineering.
   • Learning with no input features: this is the base case of many methods.
     What should we predict if we have no input features? This provides the
     base cases for many algorithms (e.g., decision tree algorithm) and base-
     lines that more sophisticated algorithms need to beat. It also provides
     ways to test various predictors.
   • Decision tree learning: one of the classic and simplest learning algo-
     rithms, which is the basis of many other algorithms.
        • A feature is a function from examples into the range of the feature. Each
          feature f also has the following attributes:
           Thus for example, a Boolean feature is a function from the examples into
            {False, True}. So, if f is a Boolean feature, f.frange == [False, True], and if
            e is an example, f(e) is either True or False.
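For instance, with examples represented as tuples (a hypothetical toy feature):

    e = (False, 2, 'sunny')
    def f(e): return e[0]      # a feature that selects the first value
    f.frange = [False, True]   # making f a Boolean feature
    f.ftype = "boolean"
    f(e)                       # False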
learnProblem.py — (continued)
18   class Data_set(Displayable):
19       """ A dataset consists of a list of training data and a list of test
             data.
20       """
21
22      def __init__(self, train, test=None, prob_test=0.20, target_index=0,
23                      header=None, target_type= None, seed=None): #12345):
24          """A dataset for learning.
25          train is a list of tuples representing the training examples
26          test is the list of tuples representing the test examples
27          if test is None, a test set is created by selecting each
28              example with probability prob_test
29          target_index is the index of the target.
30              If negative, it counts from right.
31              If target_index is larger than the number of properties,
32              there is no target (for unsupervised learning)
33          header is a list of names for the features
34          target_type is either None for automatic detection of target type
35               or one of "numeric", "boolean", "categorical"
36          seed is for random number; None gives a different test set each time
37          """
38          if seed: # given seed makes partition consistent from run-to-run
39              random.seed(seed)
40          if test is None:
41              train,test = partition_data(train, prob_test)
42          self.train = train
43          self.test = test
44
45          self.display(1,"Training set has",len(train),"examples. Number of
                columns: ",{len(e) for e in train})
46          self.display(1,"Test set has",len(test),"examples. Number of
                columns: ",{len(e) for e in test})
47          self.prob_test = prob_test
48          self.num_properties = len(self.train[0])
49          if target_index < 0: #allows for -1, -2, etc.
50              self.target_index = self.num_properties + target_index
51          else:
52              self.target_index = target_index
53          self.header = header
54          self.domains = [set() for i in range(self.num_properties)]
55          for example in self.train:
56              for ind,val in enumerate(example):
57                    self.domains[ind].add(val)
58            self.conditions_cache = {} # cache for computed conditions
59            self.create_features()
60            if target_type:
61                self.target.ftype = target_type
62            self.display(1,"There are",len(self.input_features),"input
                  features")
63
64         def __str__(self):
65             if self.train and len(self.train)>0:
66                 return ("Data: "+str(len(self.train))+" training examples, "
67                        +str(len(self.test))+" test examples, "
68                        +str(len(self.train[0]))+" features.")
69             else:
70                 return ("Data: "+str(len(self.train))+" training examples, "
71                        +str(len(self.test))+" test examples.")
73         def create_features(self):
74             """create the set of features
75             """
76             self.target = None
77             self.input_features = []
78             for i in range(self.num_properties):
79                 def feat(e,index=i):
80                     return e[index]
81                 if self.header:
82                     feat.__doc__ = self.header[i]
83                 else:
84                     feat.__doc__ = "e["+str(i)+"]"
85                 feat.frange = list(self.domains[i])
86                 feat.ftype = self.infer_type(feat.frange)
87                 if i == self.target_index:
88                     self.target = feat
89                 else:
90                     self.input_features.append(feat)
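As a quick (hypothetical) illustration of Data_set and create_features:

    # four examples, each with one input property and a Boolean target at index -1
    data = Data_set([(0.5, True), (0.9, True), (1.2, False), (2.0, False)],
                    target_index=-1, seed=42)
    data.target((0.5, True))                        # True
    [f((0.5, True)) for f in data.input_features]   # [0.5]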
      We try to infer the type of each feature. Sometimes this can be wrong (e.g.,
when the numbers are really categorical), and the type may need to be set explicitly.
                                   learnProblem.py — (continued)
92         def infer_type(self,domain):
93             """Infers the type of a feature with domain
94             """
95             if all(v in {True,False} for v in domain):
96                 return "boolean"
97             if all(isinstance(v,(int,float)) for v in domain):
98                 return "numeric"
99             else:
100                return "categorical"
         • When the range only has two values, we designate one to be the “true”
           value.
         • When the values are all numeric, we assume they are ordered (as opposed
           to just being some classes that happen to be labelled with numbers) and
           construct Boolean features for splits of the data. That is, the feature is
            e[ind] < cut for some value cut. We choose a number of cut values, up to
            a maximum number of cuts, given by max_num_cuts.
         • When the values are not all numeric, we create an indicator function for
           each value. An indicator function for a value returns true when that value
           is given and false otherwise. Note that we can’t create an indicator func-
           tion for values that appear in the test set but not in the training set be-
           cause we haven’t seen the test set. For the examples in the test set with a
           value that doesn’t appear in the training set for that feature, the indicator
           functions all return false.
       Exercise 7.1 Change the code so that it splits using e[ind] ≤ cut instead of
       e[ind] < cut. Check boundary cases, such as 3 elements with 2 cuts. As a test
       case, make sure that when the range is the 30 integers from 100 to 129 and you
       want 2 cuts, the resulting Boolean features are e[ind] ≤ 109 and e[ind] ≤ 119,
       so that each of the resulting domains is of equal size.
      Exercise 7.2 This splits on whether the feature is less than one of the values in
      the training set. Sam suggested it might be better to split between the values in
      the training set, and suggested using
      Why might Sam have suggested this? Does this work better? (Try it on a few
      datasets).
      (Please keep the __doc__ strings a consistent length as they are used in tables.)
      The prediction is either a real value or a {value : probability} dictionary or a list.
      The actual is either a real number or a key of the prediction.
learnProblem.py — (continued)
       The following class is used for datasets where the training and test data are in
       different files.
learnProblem.py — (continued)
       The following are useful unary feature constructors and binary feature
       combiners.
learnProblem.py — (continued)
Example:
learnProblem.py — (continued)
       Exercise 7.3 For symmetric properties, such as product, we don't need both
       f1 ∗ f2 and f2 ∗ f1 as extra properties. Allow the user to declare
       feature constructors as symmetric (by associating a Boolean feature with them).
       Change construct_features so that it does not create both versions for symmetric
       combiners.
learnProblem.py — (continued)
         • a point prediction, where we are only allowed to predict one of the values
           of the feature. For example, if the values of the feature are {0, 1} we are
            only allowed to predict 0 or 1; or, if the values are ratings in {1, 2, 3, 4, 5},
           we can only predict one of these integers.
         • a point prediction, where we are allowed to predict any value. For exam-
           ple, if the values of the feature are {0, 1} we may be allowed to predict 0.3,
           1, or even 1.7. For all of the criteria we can imagine, there is no point in
            predicting a value greater than 1 or less than zero (but that doesn't mean
           we can’t), but it is often useful to predict a value between 0 and 1. If the
           values are ratings in {1, 2, 3, 4, 5}, we may want to predict 3.4.
         • a probability distribution over the values of the feature. For each value v,
           we predict a non-negative number pv , such that the sum over all predic-
           tions is 1.
      cmedian returns one of the middle values when there are an even number of exam-
     ples, whereas median gives the average of them (and so cmedian is applicable
     for ordinals that cannot be considered cardinal values). Similarly, cmode picks
     one of the values when more than one value has the maximum number of ele-
     ments.
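For example, with the (hypothetical) ordinal data [1, 2, 3, 4]:

    import statistics
    statistics.median([1, 2, 3, 4])      # 2.5, the average of the two middle values
    statistics.median_low([1, 2, 3, 4])  # 2, one of the middle values, as cmedian does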
50
51      def mean(data, domain=[0,1]):
52          "mean          "
53          # returns a real number
54          return statistics.mean(data)
55
56      def rmean(data, domain=[0,1], mean0=0, pseudo_count=1):
57          "regularized mean"
58          # returns a real number.
59          # mean0 is the mean to be used for 0 data points
60          # With mean0=0.5, pseudo_count=2, same as laplace for [0,1] data
61          # this works for enumerations as well as lists
62          sum = mean0 * pseudo_count
63          count = pseudo_count
64          for e in data:
65              sum += e
66              count += 1
67          return sum/count
68
69      def mode(data, domain=[0,1]):
70          "mode          "
71          return statistics.mode(data)
72
73      def median(data, domain=[0,1]):
74          "median        "
75          return statistics.median(data)
76
77      all = [empirical, mean, rmean, bounded_empirical, laplace, cmode, mode,
            median,cmedian]
78
79      # The following suggests appropriate predictions as a function of the
            target type
80      select = {"boolean": [empirical, bounded_empirical, laplace, cmode,
            cmedian],
81               "categorical": [empirical, bounded_empirical, laplace, cmode,
                      cmedian],
82               "numeric": [mean, rmean, mode, median]}
     7.3.1 Evaluation
     To evaluate a point prediction, we first generate some data from a simple (Bernoulli)
     distribution, where there are two possible values, 0 and 1 for the target feature.
      Given prob, a number in the range [0, 1], this generates some training and test
     data where prob is the probability of each example being 1. To generate a 1 with
     probability prob, we generate a random number in range [0,1] and return 1 if
     that number is less than prob. A prediction is computed by applying the pre-
     dictor to the training data, which is evaluated on the test set. This is repeated
     num_samples times.
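A minimal sketch of that data generation (the actual test harness in learnNoInputs.py may differ):

    import random

    def binary_data(prob, num_examples):
        """examples that are 1 with probability prob, else 0"""
        return [1 if random.random() < prob else 0 for _ in range(num_examples)]

    training_data = binary_data(0.3, 10)
    test_data = binary_data(0.3, 1000)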
      Exercise 7.4 Which predictor works best for low counts when the error is
        (a) Squared error
       (b) Absolute error
        (c) Log loss
      You may need to try this a few times to make sure your answer is supported by
       the evidence. Does the difference from the other methods get bigger or smaller as
       the number of examples grows?
      Exercise 7.5 Suggest some other predictions that only take the training data.
      Does your method do better than the given methods? A simple way to get other
       predictors is to vary the threshold of bounded average, or to change the pseudo-
       counts of the Laplace method (use other numbers instead of 1 and 2).
         The decision tree algorithm does binary splits, and assumes that all input
      features are binary functions of the examples. It stops splitting if there are
      no input features, if the number of examples is less than a specified minimum,
      or if all of the examples agree on the target feature.
                               learnDT.py — Learning a binary decision tree
11   from learnProblem import Learner, Evaluate
12   from learnNoInputs import Predict
13   import math
14
15   class DT_learner(Learner):
16       def __init__(self,
17                    dataset,
18                    split_to_optimize=Evaluate.log_loss, # to minimize for at
                          each split
19                    leaf_prediction=Predict.empirical, # what to use for value
                          at leaves
20                    train=None,                  # used for cross validation
21                    max_num_cuts=8, # maximum number of conditions to split a
                          numeric feature into
22                    gamma=1e-7 , # minimum improvement needed to expand a node
23                    min_child_weight=10):
24           self.dataset = dataset
25           self.target = dataset.target
26           self.split_to_optimize = split_to_optimize
27           self.leaf_prediction = leaf_prediction
28           self.max_num_cuts = max_num_cuts
29           self.gamma = gamma
30           self.min_child_weight = min_child_weight
31           if train is None:
32               self.train = self.dataset.train
33           else:
34               self.train = train
35
36       def learn(self, max_num_cuts=8):
37           """learn a decision tree"""
38           return self.learn_tree(self.dataset.conditions(self.max_num_cuts),
                 self.train)
          The main recursive algorithm takes in a set of input features and a set of
      training data. It first decides whether to split. If it doesn't split, it makes a point
      prediction, ignoring the input features.
          It only splits if the best split decreases the error by at least gamma. This im-
      plies it does not split when:
         If it splits, it selects the best split according to the evaluation criterion (as-
     suming that is the only split it gets to do), and returns the condition to split on
     (in the variable split) and the corresponding partition of the examples.
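A simplified sketch of that selection (the actual select_split in learnDT.py also handles leaf predictions and min_child_weight):

    def select_split_sketch(conditions, examples, error_of, gamma):
        """returns the best condition to split on, or None if no split
        decreases the error by at least gamma.
        error_of(examples) is the error of the best point prediction."""
        best_cond = None
        best_error = error_of(examples) - gamma   # a split must beat this
        for cond in conditions:
            true_examples = [e for e in examples if cond(e)]
            false_examples = [e for e in examples if not cond(e)]
            err = error_of(true_examples) + error_of(false_examples)
            if err < best_error:
                best_cond, best_error = cond, err
        return best_cond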
                                      learnDT.py — (continued)
learnDT.py — (continued)
Test cases:
learnDT.py — (continued)
          Note that different runs may provide different values as they split the train-
      ing and test sets differently. So if you have a hypothesis about what works
      better, make sure it is true for different runs.
      Exercise 7.6 The current algorithm does not have a very sophisticated stopping
      criterion. What is the current stopping criterion? (Hint: you need to look at both
      learn tree and select split.)
      Exercise 7.7 Extend the current algorithm to include in the stopping criterion
        (a) A minimum child size; don’t use a split if one of the children has fewer
             elements than this.
        (b) A depth-bound on the depth of the tree.
        (c) An improvement bound such that a split is only carried out if error with the
            split is better than the error without the split by at least the improvement
            bound.
      Which values for these parameters make the prediction errors on the test set the
      smallest? Try it on more than one dataset.
      Exercise 7.8 Without any input features, it is often better to include a pseudo-
      count that is added to the counts from the training data. Modify the code so that
      it includes a pseudo-count for the predictions. When evaluating a split, including
      pseudo counts can make the split worse than no split. Does pruning with an im-
      provement bound and pseudo-counts make the algorithm work better than with
      an improvement bound by itself?
      Exercise 7.9 Some people have suggested using information gain (which is equiv-
      alent to greedy optimization of log loss) as the measure of improvement when
       building the tree, even if they want to have non-probabilistic predictions in the
      final tree. Does this work better than myopically choosing the split that is best for
      the evaluation criteria we will use to judge the final prediction?
         The above decision tree overfits the data. One way to determine whether
     the prediction is overfitting is by cross validation. The code below implements
     k-fold cross validation, which can be used to choose the value of parameters
     to best fit the training data. If we want to use parameter tuning to improve
     predictions on a particular dataset, we can only use the training data (and not
     the test data) to tune the parameter.
         In k-fold cross validation, we partition the training set into k approximately
     equal-sized folds (each fold is an enumeration of examples). For each fold, we
     train on the other examples, and determine the error of the prediction on that
     fold. For example, if there are 10 folds, we train on 90% of the data, and then
     test on remaining 10% of the data. We do this 10 times, so that each example
     gets used as a test set once, and in the training set 9 times.
         The code below creates one copy of the data, and multiple views of the data.
      For each fold, fold enumerates the examples in the fold, and fold_complement
      enumerates the examples not in the fold.
                        learnCrossValidation.py — Cross Validation for Parameter Tuning
11   from learnProblem import Data_set, Data_from_file, Evaluate
12   from learnNoInputs import Predict
13   from learnDT import DT_learner
14   import matplotlib.pyplot as plt
15   import random
16
17   class K_fold_dataset(object):
18       def __init__(self, training_set, num_folds):
19           self.data = training_set.train.copy()
20           self.target = training_set.target
21           self.input_features = training_set.input_features
22           self.num_folds = num_folds
23           self.conditions = training_set.conditions
24
25            random.shuffle(self.data)
26            self.fold_boundaries = [(len(self.data)*i)//num_folds
27                                  for i in range(0,num_folds+1)]
28
29         def fold(self, fold_num):
30             for i in range(self.fold_boundaries[fold_num],
31                           self.fold_boundaries[fold_num+1]):
32                 yield self.data[i]
33
34         def fold_complement(self, fold_num):
35             for i in range(0,self.fold_boundaries[fold_num]):
36                 yield self.data[i]
37             for i in range(self.fold_boundaries[fold_num+1],len(self.data)):
38                 yield self.data[i]
      The validation error is the average error over the examples, where each fold is
      evaluated using a predictor learned from the other folds.
                                    learnCrossValidation.py — (continued)
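The method itself is elided; a sketch of the computation it performs, assuming learner classes that accept a train argument as DT_learner does, and error_of(prediction, actual) returning the loss for one example:

    def validation_error_sketch(kfold, learner_class, error_of, **nargs):
        """average error over all examples, where each fold is predicted
        by a learner trained on the complement of that fold"""
        total = 0
        for f in range(kfold.num_folds):
            predictor = learner_class(kfold,
                            train=list(kfold.fold_complement(f)),
                            **nargs).learn()
            total += sum(error_of(predictor(e), kfold.target(e))
                         for e in kfold.fold(f))
        return total / len(kfold.data)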
          The plot_error method plots the average error as a function of the mini-
      mum number of examples in decision-tree search, both for the validation set
      and for the test set. The error on the validation set can be used to tune the
      parameter: choose the value of the parameter that minimizes the error. The
      error on the test set cannot be used to tune the parameters; if it were used
      this way, it could not then be used to test.
learnCrossValidation.py — (continued)
(Figure 7.2: average squared loss as a function of min_child_weight, for the validation and test sets.)
79   # The following produces the graphs of Figure 7.18 of Poole and Mackworth
         [2023]
80   # data = Data_from_file('data/SPECT.csv',target_index=0, seed=123)
81   # plot_error(data, criterion=Evaluate.log_loss,
         leaf_prediction=Predict.laplace)
82
83   #also try:
84   # plot_error(data)
85   # data = Data_from_file('data/carbool.csv', target_index=-1, seed=123)
      Figure 7.2 shows the average squared loss in the validation and test sets as
      a function of the min_child_weight in the decision-tree learning algorithm
      (SPECT data with seed 12345 followed by plot_error(data)). Different seeds
      will produce different graphs. The assumption behind cross validation is that
      the parameter that minimizes the loss on the validation set will be a good pa-
      rameter for the test set.
          Note that different runs for the same data will have the same test error, but
      different validation error. If you rerun Data_from_file with a different
      seed, you will get new test and training sets, and so the graph will change.
      Exercise 7.10 Change the error plot so that it can evaluate the stopping criteria
      of the exercise of Section 7.6. Which criterion makes the most difference?
41
42      def predictor(self,e):
43          """returns the prediction of the learner on example e"""
44          linpred = sum(w*f(e) for f,w in self.weights.items())
45          if self.squashed:
46              return sigmoid(linpred)
47          else:
48              return linpred
49
50      def predictor_string(self, sig_dig=3):
      learn is the main algorithm of the learner. It does num_iter steps of stochastic
      gradient descent. Only the number of iterations is specified; the other parame-
      ters it gets from the class.
learnLinear.py — (continued)
60         def learn(self,num_iter=100):
61             batch_size = min(self.batch_size, len(self.train))
62             d = {feat:0 for feat in self.weights}
63             for it in range(num_iter):
64                 self.display(2,"prediction=",self.predictor_string())
65                 for e in random.sample(self.train, batch_size):
66                     error = self.predictor(e) - self.target(e)
67                     update = self.learning_rate*error
68                     for feat in self.weights:
69                         d[feat] += update*feat(e)
70                 for feat in self.weights:
71                     self.weights[feat] -= d[feat]
72                     d[feat]=0
73             return self.predictor
     one is a function that always returns 1. This is used for one of the input prop-
     erties.
learnLinear.py — (continued)
75   def one(e):
76       "1"
77       return 1
The sigmoid function is defined by sigmoid(x) = 1/(1 + exp(-x)); logit is its inverse.
learnLinear.py — (continued)
79   def sigmoid(x):
80       return 1/(1+math.exp(-x))
81
82   def logit(x):
83       return -math.log(1/x-1)
softmax computes, for each index i, vi = exp(xi)/∑j exp(xj), a probability distribution over the indices.
 85   def softmax(xs,domain=None):
 86       """xs is a list of values, and
 87       domain is the domain (a list) or None if the list should be returned
 88       returns a distribution over the domain (a dict)
 89       """
 90       m = max(xs) # use of m prevents overflow (and all values underflowing)
 91       exps = [math.exp(x-m) for x in xs]
 92       s = sum(exps)
 93       if domain:
 94           return {d:v/s for (d,v) in zip(domain,exps)}
 95       else:
 96           return [v/s for v in exps]
 97
 98   def indicator(v, domain):
 99       return [1 if v==dv else 0 for dv in domain]
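For example:

    softmax([1, 2, 3])                   # [0.0900..., 0.2447..., 0.6652...]
    softmax([1, 2, 3], ['a', 'b', 'c'])  # {'a': 0.0900..., 'b': 0.2447..., 'c': 0.6652...}
    indicator('b', ['a', 'b', 'c'])      # [0, 1, 0]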
          The following tests the learner on a dataset. Uncomment the other datasets
      for different examples.
                                           learnLinear.py — (continued)
          The following plots the errors on the training and test sets as a function of
      the number of steps of gradient descent.
                                           learnLinear.py — (continued)
119                  step=1,
120                  num_steps=1000,
121                  log_scale=True,
122                  legend_label=""):
123         """
124         plots the training and test error for a learner.
125         data is the dataset (if None, a default dataset is loaded)
126         learner is the learner (if None, a Linear_learner on data is used)
127         criterion gives the evaluation criterion plotted on the y-axis
128         step specifies how many steps are run for each point on the plot
129         num_steps is the number of points to plot
130
131         """
132         if legend_label != "": legend_label+=" "
133         plt.ion()
134         plt.xlabel("step")
135         plt.ylabel("Average "+criterion.__doc__)
136         if log_scale:
137             plt.xscale('log') #plt.semilogx() #Makes a log scale
138         else:
139             plt.xscale('linear')
140         if data is None:
141             data = Data_from_file('data/holiday.csv', has_header=True,
                    num_train=19, target_index=-1)
142             #data = Data_from_file('data/SPECT.csv', target_index=0)
143             # data = Data_from_file('data/mail_reading.csv', target_index=-1)
144             # data = Data_from_file('data/carbool.csv', target_index=-1)
145         #random.seed(None) # reset seed
146         if learner is None:
147             learner = Linear_learner(data)
148         train_errors = []
149         test_errors = []
150         for i in range(1,num_steps+1,step):
151             test_errors.append(data.evaluate_dataset(data.test,
                    learner.predictor, criterion))
152             train_errors.append(data.evaluate_dataset(data.train,
                    learner.predictor, criterion))
153             learner.display(2, "Train error:",train_errors[-1],
154                             "Test error:",test_errors[-1])
155             learner.learn(num_iter=step)
156         plt.plot(range(1,num_steps+1,step),train_errors,ls='-',label=legend_label+"training")
157         plt.plot(range(1,num_steps+1,step),test_errors,ls='--',label=legend_label+"test")
158         plt.legend()
159         plt.draw()
160         learner.display(1, "Train error:",train_errors[-1],
161                             "Test error:",test_errors[-1])
162
163   if __name__ == "__main__":
164       test()
165
(Figure: training and test average log loss (bits) as a function of step, on a log scale.)
174         like the built-in range(start,stop,step) but allows for integers and
                floats.
175         Note that rounding errors are expected with real numbers. (or use
                numpy.arange)
176         """
177         while start<stop:
178             yield start
179             start += step
180
181   def plot_prediction(data,
182                  learner = None,
183                  minx = 0,
184                  maxx = 5,
185                  step_size = 0.01, # for plotting
186                  label = "function"):
187       plt.ion()
188       plt.xlabel("x")
189       plt.ylabel("y")
190       if learner is None:
191           learner = Linear_learner(data, squashed=False)
192       learner.learning_rate=0.001
193       learner.learn(100)
194       learner.learning_rate=0.0001
195       learner.learn(1000)
196       learner.learning_rate=0.00001
197       learner.learn(10000)
198       learner.display(1,"function learned is", learner.predictor_string(),
199                 "error=",data.evaluate_dataset(data.train, learner.predictor,
                        Evaluate.squared_loss))
200       plt.plot([e[0] for e in data.train],[e[-1] for e in
              data.train],"bo",label="data")
201       plt.plot(list(arange(minx,maxx,step_size)),[learner.predictor([x])
202                                          for x in
                                                 arange(minx,maxx,step_size)],
203                                        label=label)
204       plt.legend()
205       plt.draw()
learnLinear.py — (continued)
218      plt.xlabel("x")
219      plt.ylabel("y")
220      plt.plot([e[0] for e in data.train],[e[-1] for e in
             data.train],"ko",label="data")
221      x_values = list(arange(minx,maxx,step_size))
222      line_styles = ['-','--','-.',':']
223      colors = ['0.5','k','k','k','k']
224      for degree in range(max_degree):
225          data_aug = Data_set_augmented(data,[power_feat(n) for n in
                 range(1,degree+1)],
226                                         include_orig=False)
227          learner = learner_class(data_aug,squashed=False)
228          learner.learning_rate = learning_rate
229          learner.learn(num_iter)
230          learner.display(1,"For degree",degree,
231                      "function learned is", learner.predictor_string(),
232                      "error=",data.evaluate_dataset(data.train,
                             learner.predictor, Evaluate.squared_loss))
233          ls = line_styles[degree % len(line_styles)]
234          col = colors[degree % len(colors)]
235          plt.plot(x_values,[learner.predictor([x]) for x in x_values],
                 linestyle=ls, color=col,
236                          label="degree="+str(degree))
237          plt.legend(loc='upper left')
238          plt.draw()
239
240   # Try:
241   # data0 = Data_from_file('data/simp_regr.csv', prob_test=0,
          boolean_features=False, target_index=-1)
242   # plot_prediction(data0)
243   # plot_polynomials(data0)
244   # What if the step size was bigger?
245   #datam = Data_from_file('data/mail_reading.csv', target_index=-1)
246   #plot_prediction(datam)
      7.7     Boosting
      The following code implements functional gradient boosting for regression.
           A Boosted_dataset is created from a base dataset by subtracting the pre-
       diction of the offset function from each example. This does not save the new
       dataset, but generates it as needed. The amount of space used is constant, in-
       dependent of the size of the dataset.
                             learnBoosting.py — Functional Gradient Boosting
 11   from learnProblem import Data_set, Learner, Evaluate
 12   from learnNoInputs import Predict
 13   from learnLinear import sigmoid
 14   import statistics
 15   import random
16
17   class Boosted_dataset(Data_set):
18       def __init__(self, base_dataset, offset_fun, subsample=1.0):
19           """new dataset which is like base_dataset,
20              but offset_fun(e) is subtracted from the target of each example e
21           """
22           self.base_dataset = base_dataset
23           self.offset_fun = offset_fun
24           self.train =
                 random.sample(base_dataset.train,int(subsample*len(base_dataset.train)))
25           self.test = base_dataset.test
26           #Data_set.__init__(self, base_dataset.train, base_dataset.test,
27           #                base_dataset.prob_test, base_dataset.target_index)
28
29            #def create_features(self):
30            """creates new features - called at end of Data_set.init()
31            defines a new target
32            """
33            self.input_features = self.base_dataset.input_features
34            def newout(e):
35                return self.base_dataset.target(e) - self.offset_fun(e)
36            newout.frange = self.base_dataset.target.frange
37            newout.ftype = self.infer_type(newout.frange)
38            self.target = newout
39
40         def conditions(self, *args, colsample_bytree=0.5, **nargs):
41             conds = self.base_dataset.conditions(*args, **nargs)
42             return random.sample(conds, int(colsample_bytree*len(conds)))
         A boosting learner takes in a dataset and a base learner, and returns a new
      predictor. The base learner takes a dataset and returns a Learner object.
learnBoosting.py — (continued)
44   class Boosting_learner(Learner):
45       def __init__(self, dataset, base_learner_class, subsample=0.8):
46           self.dataset = dataset
47           self.base_learner_class = base_learner_class
48           self.subsample = subsample
49           mean = sum(self.dataset.target(e)
50                     for e in self.dataset.train)/len(self.dataset.train)
51           self.predictor = lambda e:mean # function that returns mean for
                 each example
52           self.predictor.__doc__ = "lambda e:"+str(mean)
53           self.offsets = [self.predictor] # list of base learners
54           self.predictors = [self.predictor] # list of predictors
55           self.errors = [self.dataset.evaluate_dataset(self.dataset.test,
                self.predictor, Evaluate.squared_loss)]
56           self.display(1,"Predict mean test set mean squared loss=",
                 self.errors[0] )
57
58
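The learn method (lines 59 to 75) is elided above. A hedged sketch of the boosting loop it implements: each round fits a base learner to the residuals of the current predictor and adds that learner's prediction to the ensemble.

    def boosting_learn_sketch(self, num_rounds=10):
        for _ in range(num_rounds):
            # dataset whose target is the residual of the current predictor
            residual_data = Boosted_dataset(self.dataset, self.predictor,
                                            subsample=self.subsample)
            new_offset = self.base_learner_class(residual_data).learn()
            self.offsets.append(new_offset)
            def new_pred(e, old=self.predictor, off=new_offset):
                return old(e) + off(e)
            self.predictor = new_pred
            self.predictors.append(self.predictor)
        return self.predictor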
76   # Testing
77
78   from learnDT import DT_learner
79   from learnProblem import Data_set, Data_from_file
80
81   def sp_DT_learner(split_to_optimize=Evaluate.squared_loss,
82                              leaf_prediction=Predict.mean,**nargs):
83       """Creates a learner with different default arguments replaced by
             **nargs
84       """
85       def new_learner(dataset):
86           return DT_learner(dataset,split_to_optimize=split_to_optimize,
87                                 leaf_prediction=leaf_prediction, **nargs)
88       return new_learner
89
90   #data = Data_from_file('data/car.csv', target_index=-1) regression
91   data = Data_from_file('data/student/student-mat-nq.csv',
         separator=';',has_header=True,target_index=-1,seed=13,include_only=list(range(30))+[32])
         #2.0537973790924946
92   #data = Data_from_file('data/SPECT.csv', target_index=0, seed=62) #123)
93   #data = Data_from_file('data/mail_reading.csv', target_index=-1)
94   #data = Data_from_file('data/holiday.csv', has_header=True, num_train=19,
         target_index=-1)
95   #learner10 = Boosting_learner(data,
         sp_DT_learner(split_to_optimize=Evaluate.squared_loss,
          leaf_prediction=Predict.mean, min_child_weight=10))
 96   #learner7 = Boosting_learner(data, sp_DT_learner(0.7))
 97   #learner5 = Boosting_learner(data, sp_DT_learner(0.5))
 98   #predictor9 =learner9.learn(10)
 99   #for i in learner9.offsets: print(i.__doc__)
100   import matplotlib.pyplot as plt
101
102   def plot_boosting_trees(data, steps=10, mcws=[30,20,20,10], gammas=
          [100,200,300,500]):
103       # to reduce clutter uncomment one of following two lines
104       #mcws=[10]
105       #gammas=[200]
106       learners = [(mcw, gamma, Boosting_learner(data,
              sp_DT_learner(min_child_weight=mcw, gamma=gamma)))
107                      for gamma in gammas for mcw in mcws
108                      ]
109       plt.ion()
110       plt.xscale('linear') # change between log and linear scale
111       plt.xlabel("number of trees")
112       plt.ylabel("mean squared loss")
113       markers = (m+c for c in ['k','g','r','b','m','c','y'] for m in
              ['-','--','-.',':'])
114       for (mcw,gamma,learner) in learners:
115           data.display(1,f"min_child_weight={mcw}, gamma={gamma}")
116           learner.learn(steps)
117           plt.plot(range(steps+1), learner.errors, next(markers),
118                       label=f"min_child_weight={mcw}, gamma={gamma}")
119       plt.legend()
120       plt.draw()
121
122   # plot_boosting_trees(data)
Testing
learnBoosting.py — (continued)
165 # gtb_learner.learn()
     8.1      Layers
     A neural network is built from layers.
         This provides a modular implementation of layers. Layers can easily be
     stacked in many configurations. A layer needs to implement a function to com-
     pute the output values from the inputs, a way to back-propagate the error, and
     perhaps update its parameters.
                               learnNN.py — Neural Network Learning
11   from learnProblem import Learner, Data_set, Data_from_file,
         Data_from_files, Evaluate
12   from learnLinear import sigmoid, one, softmax, indicator
13   import random, math, time
14
15   class Layer(object):
16       def __init__(self, nn, num_outputs=None):
17           """Given a list of inputs, outputs will produce a list of length
                 num_outputs.
18           nn is the neural network this layer is part of
52   class Linear_complete_layer(Layer):
53       """a completely connected layer"""
54       def __init__(self, nn, num_outputs, limit=None):
55           """A completely connected linear layer.
learnNN.py — (continued)
185          report_each means give the errors after each multiple of that
                 many iterations
186          """
187          self.batch_size = min(batch_size, len(self.training_set)) # don't
                 have batches bigger than training size
188          if num_iter is None:
189               num_iter = (epochs * len(self.training_set)) // self.batch_size
190          #self.display(0,"Batch\t","\t".join(criterion.__doc__ for criterion
                 in Evaluate.all_criteria))
191          for i in range(num_iter):
192              batch = random.sample(self.training_set, self.batch_size)
193              for e in batch:
194                  # compute all outputs
195                  values = [f(e) for f in self.input_features]
196                  for layer in self.layers:
197                      values = layer.output_values(values, training=True)
198                  # backpropagate
199                  predicted = [sigmoid(v) for v in values] if self.output_type
                         == "boolean"\
200                              else softmax(values) if self.output_type ==
                                     "categorical"\
201                              else values
202                  actuals = indicator(self.dataset.target(e),
                         self.dataset.target.frange) \
203                             if self.output_type == "categorical"\
204                             else [self.dataset.target(e)]
205                  errors = [pred-obsd for (obsd,pred) in
                         zip(actuals,predicted)]
206                  for layer in reversed(self.layers):
207                      errors = layer.backprop(errors)
208              # Update all parameters in batch
209              for layer in self.layers:
210                  layer.update()
211              self.bn+=1
212              if (i+1)%report_each==0:
213                  self.display(0,self.bn,"\t",
214                             "\t\t".join("{:.4f}".format(
215                                 self.dataset.evaluate_dataset(self.validation_set,
                                        self.predictor, criterion))
216                                for criterion in Evaluate.all_criteria),
                                       sep="")
learnNN.py — (continued)
8.3.2 RMS-Prop
learnNN.py — (continued)
      8.4        Dropout
      Dropout is implemented as a layer.
learnNN.py — (continued)
      8.4.1 Examples
       The following constructs a neural network with one hidden layer. The output
       is assumed to be Boolean or real. If it is categorical, the final layer should
       have the same number of outputs as the number of categories (so it can use a
       softmax).
learnNN.py — (continued)
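The construction of nn3, referenced below, is elided; a sketch consistent with the variants that follow (assuming data is a Data_set such as SPECT):

    # nn3 has a single hidden layer of width 3
    nn3 = NN(data, validation_proportion = 0)
    nn3.add_layer(Linear_complete_layer(nn3,3))
    nn3.add_layer(ReLU_layer(nn3))
    nn3.add_layer(Linear_complete_layer(nn3,1))
    #nn3.learn(epochs = 100)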
339   # nn3do is like nn3 but with dropout on the hidden layer
340   nn3do = NN(data, validation_proportion = 0)
341   nn3do.add_layer(Linear_complete_layer(nn3do,3))
342   #nn3.add_layer(Sigmoid_layer(nn3)) # comment this or the next
343   nn3do.add_layer(ReLU_layer(nn3do))
344   nn3do.add_layer(Dropout_layer(nn3do, rate=0.5))
345   nn3do.add_layer(Linear_complete_layer(nn3do,1))
346   #nn3do.learn(epochs = 100)
347
348   # nn3_rmsp is like nn3 but uses RMS prop
349   nn3_rmsp = NN(data, validation_proportion = 0)
350   nn3_rmsp.add_layer(Linear_complete_layer_RMS_Prop(nn3_rmsp,3))
351   #nn3_rmsp.add_layer(Sigmoid_layer(nn3_rmsp)) # comment this or the next
352   nn3_rmsp.add_layer(ReLU_layer(nn3_rmsp))
353   nn3_rmsp.add_layer(Linear_complete_layer_RMS_Prop(nn3_rmsp,1))
354   #nn3_rmsp.learn(epochs = 100)
355
356   # nn3_m is like nn3 but uses momentum
357   mm1_m = NN(data, validation_proportion = 0)
358   mm1_m.add_layer(Linear_complete_layer_momentum(mm1_m,3))
359   #mm1_m.add_layer(Sigmoid_layer(mm1_m)) # comment this or the next
360   mm1_m.add_layer(ReLU_layer(mm1_m))
361   mm1_m.add_layer(Linear_complete_layer_momentum(mm1_m,1))
362   #mm1_m.learn(epochs = 100)
363
364   # nn2 has a single hidden layer of width 2
365   nn2 = NN(data, validation_proportion = 0)
366   nn2.add_layer(Linear_complete_layer_RMS_Prop(nn2,2))
367   nn2.add_layer(ReLU_layer(nn2))
368   nn2.add_layer(Linear_complete_layer_RMS_Prop(nn2,1))
369
370   # nn5 has a single hidden layer of width 5
371   nn5 = NN(data, validation_proportion = 0)
372   nn5.add_layer(Linear_complete_layer_RMS_Prop(nn5,5))
373   nn5.add_layer(ReLU_layer(nn5))
374   nn5.add_layer(Linear_complete_layer_RMS_Prop(nn5,1))
375
376   # nn0 has no hidden layers, and so is just logistic regression:
377   nn0 = NN(data, validation_proportion = 0) #learning_rate=0.05)
378   nn0.add_layer(Linear_complete_layer(nn0,1))
379   # Or try this for RMS-Prop:
380   #nn0.add_layer(Linear_complete_layer_RMS_Prop(nn0,1))
       Plotting. Figure 8.1 shows the training and test performance on the SPECT
       dataset for the architectures above. Note that the nn5 test has infinite log loss
       after about 45,000 steps. The noisiness of the predictions might indicate that
       the step size is too big. This was produced by the code below:
                                     learnNN.py — (continued)
(Figure 8.1: Training and test log loss for various algorithms on the SPECT dataset; y-axis: average log loss in bits.)
384
385   # To show plots first choose a criterion to use
386   # crit = Evaluate.log_loss
387   # crit = Evaluate.accuracy
388   # plot_steps(learner = nn0, data = data, criterion=crit, num_steps=10000,
        log_scale=False, legend_label="nn0")
389   # plot_steps(learner = nn2, data = data, criterion=crit, num_steps=10000,
        log_scale=False, legend_label="nn2")
390   # plot_steps(learner = nn3, data = data, criterion=crit, num_steps=10000,
        log_scale=False, legend_label="nn3")
391   # plot_steps(learner = nn5, data = data, criterion=crit, num_steps=10000,
        log_scale=False, legend_label="nn5")
392
393   # for (nn,nname) in [(nn0,"nn0"),(nn2,"nn2"),(nn3,"nn3"),(nn5,"nn5")]:
          plot_steps(learner = nn, data = data, criterion=crit,
          num_steps=100000, log_scale=False, legend_label=nname)
394
395   # Print some training examples
396   #for eg in random.sample(data.train,10): print(eg,nn3.predictor(eg))
397
398   # Print some test examples
399   #for eg in random.sample(data.test,10): print(eg,nn3.predictor(eg))
400
401   # To see the weights learned in linear layers
402   # nn3.layers[0].weights
403   # nn3.layers[2].weights
404
405   # Print test:
406   # for e in data.train: print(e,nn0.predictor(e))
407
408   def test(data, hidden_widths = [5], epochs=100,
409                optimizers = [Linear_complete_layer,
410                          Linear_complete_layer_momentum,
                                 Linear_complete_layer_RMS_Prop]):
411       data.display(0,"Batch\t","\t".join(criterion.__doc__ for criterion in
              Evaluate.all_criteria))
412       for optimizer in optimizers:
413           nn = NN(data)
414           for width in hidden_widths:
415               nn.add_layer(optimizer(nn,width))
416               nn.add_layer(ReLU_layer(nn))
417           if data.target.ftype == "boolean":
418               nn.add_layer(optimizer(nn,1))
419           else:
420               raise NotImplementedError(f"Not implemented: {data.target.ftype}")
421           nn.learn(epochs)
      The following tests on MNIST. The original files are from http://yann.lecun.
      com/exdb/mnist/. This code assumes you use the csv files from https://pjreddie.
      com/projects/mnist-in-csv/, and put them in the directory ../MNIST/. Note
       that this is very inefficient; you would be better off using Keras or PyTorch. There
      are 28 ∗ 28 = 784 input units and 512 hidden units, which makes 401,408 pa-
      rameters for the lowest linear layer. So don’t be surprised when it takes many
      hours in AIPython (even if it only takes a few seconds in Keras).
learnNN.py — (continued)
          - start_time,"seconds") #1 epoch
432   # determine test error:
433   # data_mnist.evaluate_dataset(data_mnist.test, nn_mnist.predictor,
          Evaluate.accuracy)
434   # Print some random predictions:
435   # for eg in random.sample(data_mnist.test,10):
          print(data_mnist.target(eg),nn_mnist.predictor(eg),nn_mnist.predictor(eg)[data_mnist.target(eg)])
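      Since the text above recommends Keras or PyTorch for this scale of problem,
      here is a rough, illustrative Keras sketch of the same architecture (784 inputs,
      512 ReLU hidden units). It is not part of AIPython; the file path follows the
      description above, and the optimizer and epoch count are our assumptions.

import numpy as np
import tensorflow as tf

# load the csv version of MNIST (path as described above)
train = np.loadtxt("../MNIST/mnist_train.csv", delimiter=",")
x_train, y_train = train[:, 1:]/255.0, train[:, 0]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1)   # one epoch, for comparison with the timing above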
      Exercise 8.1 In the definition of nn3 above, for each of the following, first hy-
      pothesize what will happen, then test your hypothesis, then explain whether your
      testing confirms your hypothesis or not. Test it for more than one data set, and
      use more than one run for each data set.
        (a) Which fits the data better, having a sigmoid layer or a ReLU layer after the
            first linear layer?
        (b) Which is faster, having a sigmoid layer or a ReLU layer after the first linear
            layer?
        (c) What happens if you have both the sigmoid layer and then a ReLU layer
            after the first linear layer and before the second linear layer?
       (d) What happens if you have a ReLU layer then a sigmoid layer after the first
           linear layer and before the second linear layer?
        (e) What happens if you have neither the sigmoid layer nor a ReLU layer after
            the first linear layer?
      Exercise 8.2 Do some
19            if name:
20                self.name = name
21            else:
22                self.name = f"f{Factor.nextid}"
23                Factor.nextid += 1
24
25         def can_evaluate(self,assignment):
26             """True when the factor can be evaluated in the assignment
27             assignment is a {variable:value} dict
28             """
29             return all(v in assignment for v in self.variables)
30
31         def get_value(self,assignment):
32             """Returns the value of the factor given the assignment of values
                   to variables.
33             Needs to be defined for each subclass.
34             """
35             assert self.can_evaluate(assignment)
36             raise NotImplementedError("get_value") # abstract method
      The method __str__ returns a brief definition (like “f7(X,Y,Z)”). The method
      to_table returns a string representation of a table showing all of the assign-
      ments of values to variables, and the corresponding value.
                                   probFactors.py — (continued)
38         def __str__(self):
39             """returns a string representing a summary of the factor"""
40             return f"{self.name}({','.join(str(var) for var in
                   self.variables)})"
41
42         def to_table(self, variables=None, given={}):
43             """returns a string representation of the factor.
44             Allows for an arbitrary variable ordering.
45             variables is a list of the variables in the factor
46             (can contain other variables)"""
47             if variables==None:
48                 variables = [v for v in self.variables if v not in given]
49             else: #enforce ordering and allow for extra variables in ordering
50                 variables = [v for v in variables if v in self.variables and v
                       not in given]
51             head = "\t".join(str(v) for v in variables)+"\t"+self.name
52             return head+"\n"+self.ass_to_str(variables, given, variables)
53
54         def ass_to_str(self, vars, asst, allvars):
55             #print(f"ass_to_str({vars}, {asst}, {allvars})")
56             if vars:
57                 return "\n".join(self.ass_to_str(vars[1:], {**asst,
                       vars[0]:val}, allvars)
58                               for val in vars[0].domain)
59             else:
60                 val = self.get_value(asst)
67   class CPD(Factor):
68       def __init__(self, child, parents):
69           """represents P(variable | parents)
70           """
71           self.parents = parents
72           self.child = child
 73           Factor.__init__(self, parents+[child], name="Probability")
74
75      def __str__(self):
 76          """A brief description of a factor used in tracing"""
77          if self.parents:
78              return f"P({self.child}|{','.join(str(p) for p in
                    self.parents)})"
79          else:
80              return f"P({self.child})"
81
82      __repr__ = __str__
     A constant CPD has no parents, and has probability 1 when the variable has
     the value specified, and 0 when the variable has a different value.
                                    probFactors.py — (continued)
84   class ConstantCPD(CPD):
85       def __init__(self, variable, value):
86           CPD.__init__(self, variable, [])
87           self.value = value
88       def get_value(self, assignment):
89           return 1 if self.value==assignment[self.child] else 0
       9.3.2 Noisy-or
       A noisy-or, for Boolean variable X with Boolean parents Y1 . . . Yk, is parametrized
       by k + 1 parameters p0, p1, . . . , pk, where each 0 ≤ pi ≤ 1. The semantics is de-
       fined as though there are k + 1 hidden variables Z0, Z1 . . . Zk, where P(Z0) = p0
       and P(Zi | Yi) = pi for i ≥ 1, and where X is true if and only if Z0 ∨ Z1 ∨ · · · ∨ Zk
       (where ∨ is “or”). Thus X is false if all of the Zi are false. Intuitively, p0 is the
       probability of X when all Yi are false, and each pi is a noisy (probabilistic) mea-
       sure that Yi makes X true; X only needs one Zi to be true to make it true.
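       The noisy-or listing itself is elided from this excerpt. The following is a
       minimal sketch of how such a CPD could be written using the Factor and CPD
       classes above; the class name NoisyOrSketch and the parameter list probs are
       ours, not AIPython's.

class NoisyOrSketch(CPD):
    def __init__(self, child, parents, probs):
        """probs is [p0, p1, ..., pk], one parameter per hidden variable Zi"""
        assert len(probs) == 1 + len(parents)
        CPD.__init__(self, child, parents)
        self.probs = probs

    def get_value(self, assignment):
        assert self.can_evaluate(assignment)
        # X is false only if every Zi is false; Z0 is false with probability
        # 1-p0, and Zi (i>=1) is false with probability 1-pi when Yi is true,
        # and is always false when Yi is false.
        p_false = 1 - self.probs[0]
        for i, parent in enumerate(self.parents):
            if assignment[parent]:
                p_false *= 1 - self.probs[i+1]
        return 1 - p_false if assignment[self.child] else p_false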
                                     probFactors.py — (continued)
       Note that not all parents need to be assigned to evaluate the decision tree; you
       only need the branch down the tree that gives the distribution.
                                    probFactors.py — (continued)
184
185      def can_evaluate(self, assignment):
186          if self.var not in assignment:
187              return False
188          elif assignment[self.var] == self.val:
189              return self.true_cond.can_evaluate(assignment)
190          else:
191              return self.false_cond.can_evaluate(assignment)
probFactors.py — (continued)
       The following shows a decision tree representation of Example 9.18 of Poole
       and Mackworth [2023]. When the Action is to go out, the probability is a
       function of rain; otherwise it is a function of full.
probFactors.py — (continued)
probGraphicalModels.py — (continued)
27   class BeliefNetwork(GraphicalModel):
28       """The class of belief networks."""
29
30         def __init__(self, title, variables, factors):
31             """vars is a set of variables
32             factors is a set of factors. All of the factors are instances of
                   CPD (e.g., Prob).
33             """
34             GraphicalModel.__init__(self, title, variables, factors)
35             assert all(isinstance(f,CPD) for f in factors), factors
36             self.var2cpt = {f.child:f for f in factors}
37             self.var2parents = {f.child:f.parents for f in factors}
38             self.children = {n:[] for n in self.variables}
39             for v in self.var2parents:
40                 for par in self.var2parents[v]:
41                     self.children[par].append(v)
42             self.topological_sort_saved = None
        The following creates a topological sort of the nodes, where the parents of
     a node come before the node in the resulting order. This is based on Kahn’s
     algorithm from 1962.
probGraphicalModels.py — (continued)
44      def topological_sort(self):
45          """creates a topological ordering of variables such that the
                parents of
46          a node are before the node.
47          """
48          if self.topological_sort_saved:
49              return self.topological_sort_saved
50          next_vars = {n for n in self.var2parents if not self.var2parents[n]
                }
51          self.display(3,'topological_sort: next_vars',next_vars)
52          top_order=[]
53          while next_vars:
54              var = next_vars.pop()
55              self.display(3,'select variable',var)
56              top_order.append(var)
57              next_vars |= {ch for ch in self.children[var]
58                              if all(p in top_order for p in
                                    self.var2parents[ch])}
59              self.display(3,'var_with_no_parents_left',next_vars)
60          self.display(3,"top_order",top_order)
61          assert
                set(top_order)==set(self.var2parents),(top_order,self.var2parents)
 62          self.topological_sort_saved = top_order
63          return top_order
probGraphicalModels.py — (continued)
[Figure: the 4-chain belief network A → B → C → D]

Figure 9.2: The report-of-leaving belief network (Tamper and Fire are parents
of Alarm; Fire is a parent of Smoke; Alarm is a parent of Leaving; Leaving is a
parent of Report)
     Report-of-Leaving Example
     The second belief network, bn_report, is Example 9.13 of Poole and Mack-
     worth [2023] (http://artint.info). The output of bn_report.show() is shown
     in Figure 9.2 of this document.
                                 probExamples.py — (continued)
[Figure: the simple diagnosis belief network]
66   p_cough = Prob(Coughing,[Bronchitis],[[0.93,0.07],[0.2,0.8]])
67   p_wheeze = Prob(Wheezing,[Bronchitis],[[0.999,0.001],[0.4,0.6]])
68
69   simple_diagnosis = BeliefNetwork("Simple Diagnosis",
70                     {Influenza, Smokes, SoreThroat, HasFever, Bronchitis,
                           Coughing, Wheezing},
71                     {p_infl, p_smokes, p_sth, p_fever, p_bronc, p_cough,
                           p_wheeze})
     Sprinkler Example
     The third belief network is the sprinkler example from Pearl. The output of
     bn_sprinkler.show() is shown in Figure 9.4 of this document.
                                 probExamples.py — (continued)
81   f_sprinkler = Prob(Sprinkler,[Season],{'dry_season':{'on':0.4,'off':0.6},
82                                      'wet_season':{'on':0.01,'off':0.99}})
83   f_rained = Prob(Rained,[Season],{'dry_season':[0.9,0.1], 'wet_season':
         [0.2,0.8]})
84   f_wet = Prob(Grass_wet,[Sprinkler,Rained], {'on': [[0.1,0.9],[0.01,0.99]],
85                                           'off':[[0.99,0.01],[0.3,0.7]]})
86   f_shiny = Prob(Grass_shiny, [Grass_wet], [[0.95,0.05], [0.3,0.7]])
87   f_shoes = Prob(Shoes_wet, [Grass_wet], [[0.98,0.02], [0.35,0.65]])
88
89   bn_sprinkler = BeliefNetwork("Pearl's Sprinkler Example",
90                         {Season, Sprinkler, Rained, Grass_wet, Grass_shiny,
                               Shoes_wet},
91                         {f_season, f_sprinkler, f_rained, f_wet, f_shiny,
                               f_shoes})
                                  probExamples.py — (continued)
121
122   p_cold_lr = Prob(Cold,[],[0.9,0.1])
123   p_flu_lr = Prob(Flu,[],[0.95,0.05])
124   p_covid_lr = Prob(Covid,[],[0.99,0.01])
125
126   p_cough_lr = LogisticRegression(Cough, [Cold,Flu,Covid], [-2.2, 1.67,
          1.26, 3.19])
probGraphicalModels.py — (continued)
          We use bn_4ch as the test case, in particular P(B | D = true). Testing needs an
       error threshold, particularly for the approximate methods, where the default
       threshold demands more accuracy than they can deliver.
[Figure: posterior distributions in the report-of-leaving network given
Report=True — Tamper: False 0.601, True 0.399; Fire: False 0.769, True 0.231;
Alarm: False 0.372, True 0.628; Smoke: False 0.785, True 0.215; Leaving:
False 0.347, True 0.653]
probGraphicalModels.py — (continued)
24          ## self.max_display_level = 3
25
26      def query(self, qvar, obs={}, split_order=None):
27          """computes P(qvar | obs) where
28          qvar is the query variable
29          obs is a variable:value dictionary
30          split_order is a list of the non-observed non-query variables in gm
31          """
32          if qvar in obs:
33              return {val:(1 if val == obs[qvar] else 0)
34                         for val in qvar.domain}
35          else:
36             if split_order == None:
37                  split_order = [v for v in self.gm.variables
38                                 if (v not in obs) and v != qvar]
39             unnorm = [self.prob_search({qvar:val}|obs, self.gm.factors,
                   split_order)
40                          for val in qvar.domain]
41             p_obs = sum(unnorm)
42             return {val:pr/p_obs for val,pr in zip(qvar.domain, unnorm)}
         The following is the naive search-based algorithm. It is exponential in the
     number of variables, so is not very useful. However, it is simple, and useful
     to understand before looking at the more complicated algorithm used in the
     subclass.
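      That naive listing is elided from this excerpt; its essence is the following
      sketch (the function name is ours): with no variables left to split on, all factors
      can be evaluated and multiplied; otherwise it branches on the first variable in
      split_order and sums over its domain.

import math

def naive_prob_search(context, factors, split_order):
    """returns the sum over split_order of the product of factors,
    given the assignment in context (a sketch of the elided method)"""
    if not split_order:
        return math.prod(fac.get_value(context) for fac in factors)
    var = split_order[0]
    return sum(naive_prob_search({var: val} | context, factors, split_order[1:])
               for val in var.domain)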
                                   probRC.py — (continued)
 68   class ProbRC(ProbSearch):
 69       method_name = "recursive conditioning"
 70
 71         def __init__(self,gm=None):
 72             self.cache = {(frozenset(), frozenset()):1}
 73             ProbSearch.__init__(self,gm)
 74
 75         def prob_search(self, context, factors, split_order):
 76             """ returns \sum_{split_order} \prod_{factors} given assignment in
                    context
 77             context is a variable:value dictionary
 78             factors is a set of factors
 79             split_order: list of variables in factors that are not in context
 80             """
 81             self.display(3,"calling rc,",(context,factors))
 82             ce = (frozenset(context.items()), frozenset(factors)) # key for the
                    cache entry
 83             if ce in self.cache:
 84                 self.display(3,"rc cache lookup",(context,factors))
 85                 return self.cache[ce]
 86   #          if not factors: #no factors; not needed with forgetting and caching
 87   #              return 1
 88             elif vars_not_in_factors := {var for var in context
 89                                           if not any(var in fac.variables
 90                                                         for fac in factors)}:
 91                 # forget variables not in any factor
 92                 self.display(3,"rc forgetting variables", vars_not_in_factors)
 93                 return self.prob_search({key:val for (key,val) in
                        context.items()
 94                                   if key not in vars_not_in_factors},
 95                               factors, split_order)
 96             elif to_eval := {fac for fac in factors
 97                                if fac.can_evaluate(context)}:
 98                 # evaluate factors when all variables are assigned
 99                 self.display(3,"rc evaluating factors",to_eval)
100                 val = math.prod(fac.get_value(context) for fac in to_eval)
101                 if val == 0:
102                     return 0
103                 else:
104                     return val * self.prob_search(context,
105                                           {fac for fac in factors
106                                             if fac not in to_eval},
107                                           split_order)
108          elif len(comp := connected_components(context, factors,
                 split_order)) > 1:
109              # there are disconnected components
110              self.display(3,"splitting into connected components",comp,"in
                     context",context)
111              return(math.prod(self.prob_search(context,f,eo) for (f,eo) in
                     comp))
112          else:
113              assert split_order, "split_order should not be empty to get
                     here"
114              total = 0
115              var = split_order[0]
116              self.display(3, "rc branching on", var)
117              for val in var.domain:
118                  total += self.prob_search({var:val}|context, factors,
                         split_order[1:])
119              self.cache[ce] = total
120              self.display(2, "rc branching on", var,"returning", total)
121              return total
       connected_components returns a list of connected components, where a con-
       nected component is a set of factors and a set of variables, such that the graph
       that connects variables and the factors that involve them is connected. The
       connected components are built one at a time, maintaining a current connected
       component. At all times factors is partitioned into 3 disjoint sets (a simplified
       sketch of the computation follows this list):
          • other_factors: the other factors that are not (yet) in the connected com-
            ponent
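       The remaining two sets of the partition are elided from this excerpt. For
       reference, here is a simpler (quadratic) sketch that computes the same result,
       returning (factors, split_order) pairs as prob_search above expects; the
       function name is ours.

def connected_components_sketch(context, factors, split_order):
    """partition factors into connected components; variables already
    assigned in context do not connect factors (a sketch, not the book's code)"""
    remaining = set(factors)
    components = []
    while remaining:
        component = {remaining.pop()}
        comp_vars = {v for f in component for v in f.variables
                     if v not in context}
        grew = True
        while grew:   # keep absorbing factors that share an unassigned variable
            grew = False
            for fac in list(remaining):
                if any(v in comp_vars for v in fac.variables):
                    remaining.remove(fac)
                    component.add(fac)
                    comp_vars |= {v for v in fac.variables if v not in context}
                    grew = True
        components.append((component,
                           [v for v in split_order if v in comp_vars]))
    return components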
probRC.py — (continued)
Testing:
probRC.py — (continued)
      The following example uses the decision tree representation of Section 9.3.4
      (page 205). Does recursive conditioning split on variable full for the query
      commented out below? What can be done to guarantee that it does?
probRC.py — (continued)
200   from probFactors import Prob, action, rain, full, wet, p_wet
201   from probGraphicalModels import BeliefNetwork
202   p_action = Prob(action,[],{'go_out':0.3, 'get_coffee':0.7})
203   p_rain = Prob(rain,[],[0.4,0.6])
204   p_full = Prob(full,[],[0.1,0.9])
205
206   wetBN = BeliefNetwork("Wet (decision tree CPD)", {action, rain, full, wet},
207                          {p_action, p_rain, p_full, p_wet})
208   wetRC = ProbRC(wetBN)
209   # wetRC.query(wet, {action:'go_out', rain:True})
210   # wetRC.show_post({action:'go_out', rain:True})
211   # wetRC.show_post({action:'go_out', wet:True})
            \[ \sum_{var} \prod_{f \in factors} f \]
       We store the values in a list in a lazy manner; if they are already computed,
       we use the stored values. If they are not already computed, we compute and
       store them.
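       The FactorSum class itself is elided from this excerpt. A minimal sketch of
       such a factor, caching computed values lazily (in a dictionary rather than the
       book's list), is:

import math

class FactorSumSketch(Factor):
    """represents the sum over var of the product of factors (a sketch)"""
    def __init__(self, var, factors):
        self.var_summed_out = var
        self.factors = factors
        vars_left = {v for fac in factors for v in fac.variables if v is not var}
        Factor.__init__(self, list(vars_left))
        self.values = {}   # cache of already-computed values

    def get_value(self, assignment):
        key = frozenset((v, assignment[v]) for v in self.variables)
        if key not in self.values:   # compute and store lazily
            self.values[key] = sum(
                math.prod(fac.get_value({**assignment,
                                         self.var_summed_out: val})
                          for fac in self.factors)
                for val in self.var_summed_out.domain)
        return self.values[key]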
                                    probVE.py — (continued)
 43         def project_observations(self,factor,obs):
 44             """Returns the resulting factor after observing obs
 45
 46            obs is a dictionary of {variable:value} pairs.
 47            """
 48            if any((var in obs) for var in factor.variables):
 49                # a variable in factor is observed
 50                return FactorObserved(factor,obs)
 51            else:
 52                return factor
 53
 54         def eliminate_var(self,factors,var):
 55             """Eliminate a variable var from a list of factors.
 56             Returns a new set of factors that has var summed out.
 57             """
 58             self.display(2,"eliminating ",str(var))
 59             contains_var = []
 60             not_contains_var = []
 61             for fac in factors:
 62                 if var in fac.variables:
 63                     contains_var.append(fac)
 64                 else:
 65                     not_contains_var.append(fac)
 66             if contains_var == []:
 67                 return factors
 68             else:
 69                 newFactor = FactorSum(var,contains_var)
 70                 self.display(2,"Multiplying:",[str(f) for f in contains_var])
 71                 self.display(2,"Creating factor:", newFactor)
 72                 self.display(3, newFactor.to_table()) # factor in detail
 73                 not_contains_var.append(newFactor)
 74                 return not_contains_var
 75
 76   from probExamples import bn_4ch, A,B,C,D
 77   bn_4chv = VE(bn_4ch)
 78   ## bn_4chv.query(A,{})
 79   ## bn_4chv.query(D,{})
 80   ## InferenceMethod.max_display_level = 3 # show more detail in displaying
 81   ## InferenceMethod.max_display_level = 1 # show less detail in displaying
 82   ## bn_4chv.query(A,{D:True})
 83   ## bn_4chv.query(B,{A:True,D:False})
 84
 85   from probExamples import bn_report,Alarm,Fire,Leaving,Report,Smoke,Tamper
 86   bn_reportv = VE(bn_report) # answers queries using variable elimination
 87   ## bn_reportv.query(Tamper,{})
 88   ## InferenceMethod.max_display_level = 0 # show no detail in displaying
 89   ## bn_reportv.query(Leaving,{})
 90   ## bn_reportv.query(Tamper,{},elim_order=[Smoke,Report,Leaving,Alarm,Fire])
 91   ## bn_reportv.query(Tamper,{Report:True})
 92   ## bn_reportv.query(Tamper,{Report:True,Smoke:False})
 93
 94   from probExamples import bn_sprinkler, Season, Sprinkler, Rained,
          Grass_wet, Grass_shiny, Shoes_wet
 95   bn_sprinklerv = VE(bn_sprinkler)
 96   ## bn_sprinklerv.query(Shoes_wet,{})
 97   ## bn_sprinklerv.query(Shoes_wet,{Rained:True})
 98   ## bn_sprinklerv.query(Shoes_wet,{Grass_shiny:True})
 99   ## bn_sprinklerv.query(Shoes_wet,{Grass_shiny:False,Rained:True})
100
101   from probExamples import bn_lr1, Cough, Fever, Sneeze, Cold, Flu, Covid
102   vediag = VE(bn_lr1)
103   ## vediag.query(Cough,{})
104   ## vediag.query(Cold,{Cough:1,Sneeze:0,Fever:1})
105   ## vediag.query(Flu,{Cough:0,Sneeze:1,Fever:1})
106   ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1})
107   ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1,Flu:0})
108   ## vediag.query(Covid,{Cough:1,Sneeze:0,Fever:1,Flu:1})
109
110   if __name__ == "__main__":
111       InferenceMethod.testIM(VE)
18         for v in dist:
19             cum += dist[v]
20             if cum > rand:
21                 return v
probStochSim.py — (continued)
      Exercise 9.1
         What is the time and space complexity of the following four methods to gen-
      erate n samples, where m is the length of dist:
        (a) n calls to sample_one
        (b) sample_multiple
        (c) Create the cumulative distribution (choose how this is represented) and, for
            each random number, do a binary search to determine the sample associated
            with the random number (a sketch of this approach follows the exercise).
        (d) Choose a random number in the range [i/n, (i + 1)/n) for each i ∈ range(n),
            where n is the number of samples. Use these as the random numbers to
            select the particles. (Does this give random samples?)
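      As an illustration of approach (c), the following sketch (the function name is
      ours) uses the standard bisect module for the binary search:

import bisect, random

def sample_n(dist, n):
    """draw n samples from dist, a {value: probability} dictionary,
    by binary search over the cumulative distribution"""
    values = list(dist)
    cum, total = [], 0
    for v in values:               # build the cumulative distribution
        total += dist[v]
        cum.append(total)
    return [values[bisect.bisect_left(cum, random.random()*total)]
            for _ in range(n)]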
53   class SamplingInferenceMethod(InferenceMethod):
54       """The abstract class of sampling-based belief network inference
             methods"""
55
56      def __init__(self,gm=None):
57          InferenceMethod.__init__(self, gm)
58
59      def query(self,qvar,obs={},number_samples=1000,sample_order=None):
60          raise NotImplementedError("SamplingInferenceMethod query") #
                abstract
62   class RejectionSampling(SamplingInferenceMethod):
63       """The class that queries Graphical Models using Rejection Sampling.
64
65      gm is a belief network to query
66      """
67      method_name = "rejection sampling"
 68
 69         def __init__(self, gm=None):
 70             SamplingInferenceMethod.__init__(self, gm)
 71
 72         def query(self, qvar, obs={}, number_samples=1000, sample_order=None):
 73             """computes P(qvar | obs) where
 74             qvar is a variable.
 75             obs is a {variable:value} dictionary.
 76             sample_order is a list of variables where the parents
 77               come before the variable.
 78             """
 79             if sample_order is None:
 80                 sample_order = self.gm.topological_sort()
 81             self.display(2,*sample_order,sep="\t")
 82             counts = {val:0 for val in qvar.domain}
 83             for i in range(number_samples):
 84                 rejected = False
 85                 sample = {}
 86                 for nvar in sample_order:
 87                     fac = self.gm.var2cpt[nvar] #factor with nvar as child
 88                     val = sample_one({v:fac.get_value({**sample, nvar:v}) for v
                            in nvar.domain})
 89                     self.display(2,val,end="\t")
 90                     if nvar in obs and obs[nvar] != val:
 91                         rejected = True
 92                         self.display(2,"Rejected")
 93                         break
 94                     sample[nvar] = val
 95                 if not rejected:
 96                     counts[sample[qvar]] += 1
 97                     self.display(2,"Accepted")
 98             tot = sum(counts.values())
 99             # As well as the distribution we also include raw counts
100             dist = {c:v/tot if tot>0 else 1/len(qvar.domain) for (c,v) in
                    counts.items()}
101             dist["raw_counts"] = counts
102             return dist
       Exercise 9.2 Change this algorithm so that it does importance sampling using a
       proposal distribution. It needs to call sample_one with a different distribution
       and then update the weight of the current sample. For testing, use a proposal
       distribution that only specifies probabilities for some of the variables (and the
       algorithm uses the probabilities for the network in the other cases).
probStochSim.py — (continued)
       Resampling
       Resample is based on sample_multiple but works with an array of particles.
       (Aside: Python doesn't let us use sample_multiple directly, as it uses a dictionary.)
      9.9.6 Examples
                                   probStochSim.py — (continued)
226   ## bn_reportr.query(Tamper,{Report:True,Smoke:False},number_samples=100)
227
228   ## bn_reportL.query(Tamper,{Report:True,Smoke:False},number_samples=100)
229   ## bn_reportL.query(Tamper,{Report:True,Smoke:False},number_samples=100)
230
231   from probExamples import bn_sprinkler,Season, Sprinkler
232   from probExamples import Rained, Grass_wet, Grass_shiny, Shoes_wet
233   bn_sprinklerr = RejectionSampling(bn_sprinkler) # answers queries using
          rejection sampling
234   bn_sprinklerL = LikelihoodWeighting(bn_sprinkler) # answers queries using
          likelihood weighting
235   bn_sprinklerp = ParticleFiltering(bn_sprinkler) # answers queries using
          particle filtering
236   #bn_sprinklerr.query(Shoes_wet,{Grass_shiny:True,Rained:True})
237   #bn_sprinklerL.query(Shoes_wet,{Grass_shiny:True,Rained:True})
238   #bn_sprinklerp.query(Shoes_wet,{Grass_shiny:True,Rained:True})
239
240   if __name__ == "__main__":
241       InferenceMethod.testIM(RejectionSampling, threshold=0.1)
242       InferenceMethod.testIM(LikelihoodWeighting, threshold=0.1)
243       InferenceMethod.testIM(ParticleFiltering, threshold=0.1)
       Exercise 9.3 This code keeps regenerating the distribution of a variable given
       its parents. Implement one or both of the following, and compare them to the
       original. Make cond_dist return a slice that corresponds to the distribution, and
       then use the slice instead of the dictionary (a list slice does not generate new data
       structures). Make cond_dist remember values it has already computed, and return
       the stored values rather than recomputing them.
probStochSim.py — (continued)
260
261      def query(self, qvar, obs={}, number_samples=1000, burn_in=100,
             sample_order=None):
262          """computes P(qvar | obs) where
263          qvar is a variable.
264          obs is a {variable:value} dictionary.
265          sample_order is a list of non-observed variables in order, or
266          if sample_order None, an arbitrary ordering is used
267          """
268          counts = {val:0 for val in qvar.domain}
269          if sample_order is not None:
270              variables = sample_order
271          else:
272              variables = [v for v in self.gm.variables if v not in obs]
273              random.shuffle(variables)
274          var_to_factors = {v:set() for v in self.gm.variables}
275          for fac in self.gm.factors:
276              for var in fac.variables:
277                  var_to_factors[var].add(fac)
278          sample = {var:random.choice(var.domain) for var in variables}
279          self.display(3,"Sample:",sample)
280          sample.update(obs)
281          for i in range(burn_in + number_samples):
282              for var in variables:
283                  # get unnormalized probability distribution of var given its
                         neighbors
284                  vardist = {val:1 for val in var.domain}
285                  for val in var.domain:
286                      sample[var] = val
287                      for fac in var_to_factors[var]: # Markov blanket
288                          vardist[val] *= fac.get_value(sample)
289                  sample[var] = sample_one(vardist)
290              if i >= burn_in:
291                  counts[sample[qvar]] +=1
292                  self.display(3,"      ",sample)
293          tot = sum(counts.values())
294          # as well as the computed distribution, we also include raw counts
295          dist = {c:v/tot for (c,v) in counts.items()}
296          dist["raw_counts"] = counts
297          self.display(2, f"Gibbs sampling P({qvar}|{obs}) = {dist}")
298          return dist
299
300   #from probExamples import bn_4ch, A,B,C,D
301   bn_4chg = GibbsSampling(bn_4ch)
302   ## InferenceMethod.max_display_level = 2 # detailed tracing for all
          inference methods
303   bn_4chg.query(A,{})
304   ## bn_4chg.query(D,{})
305   ## bn_4chg.query(B,{D:True})
306   ## bn_4chg.query(B,{A:True,C:False})
307
308   from probExamples import bn_report,Alarm,Fire,Leaving,Report,Smoke,Tamper
309   bn_reportg = GibbsSampling(bn_report)
310   ## bn_reportg.query(Tamper,{Report:True},number_samples=1000)
311
312   if __name__ == "__main__":
313       InferenceMethod.testIM(GibbsSampling, threshold=0.1)
       Exercise 9.4 Change the code so that it can have multiple query variables. Make
       the list of query variables an input to the algorithm, with the default value being
       the list of all non-observed variables.
      Exercise 9.5 In this algorithm, explain where it computes the probability of a
      variable given its Markov blanket. Instead of returning the average of the samples
      for the query variable, it is possible to return the average estimate of the probabil-
      ity of the query variable given its Markov blanket. Does this converge to the same
      answer as the given code? Does it converge faster, slower, or the same?
          [ProbRC,RejectionSampling,LikelihoodWeighting,ParticleFiltering,GibbsSampling]
360   #plot_mult(methods,bn_report,Tamper,True,{Report:True,Smoke:False},number_samples=100,
          number_runs=1000)
361   #
          plot_mult(methods,bn_report,Tamper,True,{Report:False,Smoke:True},number_samples=100,
          number_runs=1000)
362
363   # Sprinkler Example:
364   #
          plot_stats(bn_sprinklerr,Shoes_wet,True,{Grass_shiny:True,Rained:True},number_samples=1000)
365   #
          plot_stats(bn_sprinklerL,Shoes_wet,True,{Grass_shiny:True,Rained:True},number_samples=1000)
     step. The animal is either close to one of the 3 points of the triangle or in the
     middle of the triangle.
                                   probHMM.py — (continued)
29   # state
30   #       0=middle, 1,2,3 are corners
31   states1 = {'middle', 'c1', 'c2', 'c3'} # states
32   obs1 = {'m1','m2','m3'} # microphones
         The observation model is as follows. If the animal is in a corner, it will
     be detected by the microphone at that corner with probability 0.6, and will be
     independently detected by each of the other microphones with a probability of
     0.1. If the animal is in the middle, it will be detected by each microphone with
     a probability of 0.4.
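          In code, this observation model amounts to one detection probability per
      microphone and state. The dictionary below is our sketch (the elided probHMM.py
      listing may represent it differently), using the closeMic, farMic, and midMic
      names that probDBN.py later imports from probHMM:

closeMic, farMic, midMic = 0.6, 0.1, 0.4
# P(microphone detects animal | state), states ordered (middle, c1, c2, c3)
obs_model = {'m1': (midMic, closeMic, farMic, farMic),
             'm2': (midMic, farMic, closeMic, farMic),
             'm3': (midMic, farMic, farMic, closeMic)}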
                                   probHMM.py — (continued)
 96   hmm1f1 = HMMVEfilter(hmm1)
 97   # hmm1f1.filter([{'m1':0, 'm2':1, 'm3':1}, {'m1':1, 'm2':0, 'm3':1}])
 98   ## HMMVEfilter.max_display_level = 2 # show more detail in displaying
 99   # hmm1f2 = HMMVEfilter(hmm1)
100   # hmm1f2.filter([{'m1':1, 'm2':0, 'm3':0}, {'m1':0, 'm2':1, 'm3':0},
          {'m1':1, 'm2':0, 'm3':0},
101   #              {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':0},
          {'m1':0, 'm2':0, 'm3':0},
102   #              {'m1':0, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':1},
          {'m1':0, 'm2':0, 'm3':1},
103   #              {'m1':0, 'm2':0, 'm3':1}])
104   # hmm1f3 = HMMVEfilter(hmm1)
105   # hmm1f3.filter([{'m1':1, 'm2':0, 'm3':0}, {'m1':0, 'm2':0, 'm3':0},
          {'m1':1, 'm2':0, 'm3':0}, {'m1':1, 'm2':0, 'm3':1}])
106
107   # How do the following differ in the resulting state distribution?
108   # Note they start the same, but have different initial observations.
109   ## HMMVEfilter.max_display_level = 1 # show less detail in displaying
110   # for i in range(100): hmm1f1.advance()
111   # hmm1f1.state_dist
112   # for i in range(100): hmm1f3.advance()
113   # hmm1f3.state_dist
       Exercise 9.6 The representation assumes that there is a list of Boolean obser-
       vations. Extend the representation so that each observation variable can have
       multiple discrete values. You need to choose a representation for the model, and
       change the algorithm.
      9.10.2 Localization
      The localization example in the book is a controlled HMM, where there is a
      given action at each time and the transition depends on the action.
                        probLocalization.py — Controlled HMM and Localization example
 11   from probHMM import HMMVEfilter, HMM
 12   from display import Displayable
 13   import matplotlib.pyplot as plt
 14   from matplotlib.widgets import Button, CheckButtons
 15
 16   class HMM_Controlled(HMM):
 17       """A controlled HMM, where the transition probability depends on the
              action.
43   class HMM_Local(HMMVEfilter):
44       """VE filter for controlled HMMs
45       """
46       def __init__(self, hmm):
47           HMMVEfilter.__init__(self, hmm)
48
49         def go(self, action):
50             self.hmm.trans = self.hmm.act2trans[action]
51             self.advance()
52
53   loc_filt = HMM_Local(hmm_16pos)
54   # loc_filt.observe({'door':True}); loc_filt.go("right");
         loc_filt.observe({'door':False}); loc_filt.go("right");
         loc_filt.observe({'door':True})
55   # loc_filt.state_dist
         The following lets us interactively move the agent and provide observa-
     tions. It shows the distribution over locations.
                                  probLocalization.py — (continued)
57 class Show_Localization(Displayable):
106             self.draw_dist()
107         def nodoor(self,event):
108             self.loc_filt.observe({'door':False})
109             self.draw_dist()
110         def reset(self,event):
111             self.loc_filt.state_dist = {i:1/16 for i in range(16)}
112             self.draw_dist()
113
114   # sl = Show_Localization(hmm_16pos)
115   # sl = Show_Localization(hmm_16pos, fontsize=15) # for demos - enlarge
          window
probHMM.py — (continued)
      Is it better or worse than particle filtering? Hint: you need to think about how
      they can be compared. Is the comparison different if there are more states than
      particles?
       Exercise 9.8 Extend the particle filtering code to continuous variables and ob-
       servations. In particular, suppose the state transition is a linear function of the
       previous state with Gaussian noise, and the observations are linear functions of
       the state with Gaussian noise. You may need to research how to sample from a
       Gaussian distribution (a starting point is sketched below).
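       As a starting point, Python's standard library can sample from a Gaussian
       with random.gauss; a linear-Gaussian transition and observation model might
       be sketched as follows (the coefficients and standard deviations are placeholders):

import random

def next_state(state, a=1.0, b=0.0, sd=0.1):
    """linear transition with Gaussian noise: s' = a*s + b + N(0, sd)"""
    return a*state + b + random.gauss(0, sd)

def observation(state, c=1.0, d=0.0, sd=0.2):
    """linear observation with Gaussian noise: o = c*s + d + N(0, sd)"""
    return c*state + d + random.gauss(0, sd)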
probHMM.py — (continued)
         • Rolling out the DBN for some time period, and using standard belief net-
           work inference. The latest time that needs to be in the rolled out network
           is the time of the latest observation or the time of a query (whichever is
           later). This allows us to observe any variables at any time and query any
           variables at any time. This is covered in Section 9.11.2.
          • An unrolled belief network may be very large, and we might only be in-
            terested in asking about “now”. In this case we can just represent the
            variables “now”. In this approach we can observe and query the current
            variables. We can then move to the next time. This does not allow for
            arbitrary historical queries (about the past or the future), but can be much
            simpler. This is covered in Section 9.11.3.
          • An initial distribution over the features “now” (time 1). This is a belief
            network with all variables being time-1 variables.
19
20         A variable can have both a name and an index. The index defaults to 1.
21         """
22         def __init__(self,name,domain=[False,True],index=1):
23             Variable.__init__(self,f"{name}_{index}",domain)
24             self.basename = name
25             self.domain = domain
26             self.index = index
27             self.previous = None
28
29         def __lt__(self,other):
30             if self.name != other.name:
31                 return self.name<other.name
32             else:
33                 return self.index<other.index
34
35         def __gt__(self,other):
36             return other<self
37
38   def variable_pair(name,domain=[False,True]):
39       """returns a variable and its predecessor. This is used to define
             2-stage DBNs
40
41         If the name is X, it returns the pair of variables X_prev,X_now"""
42         var_now = DBNvariable(name,domain,index='now')
43         var_prev = DBNvariable(name,domain,index='prev')
44         var_now.previous = var_prev
45         return var_prev, var_now
          A FactorRename is a factor that is the result of renaming the variables in a
      factor. It takes a factor, fac, and a {new : old} dictionary, where new is the name
      of a variable in the resulting factor and old is the corresponding name in fac.
      This assumes that all variables are renamed.
                                     probDBN.py — (continued)
47   class FactorRename(Factor):
48       def __init__(self,fac,renaming):
49           """A renamed factor.
50           fac is a factor
51           renaming is a dictionary of the form {new:old} where old and new
                 are variables,
52              where the variables in fac appear exactly once in the renaming
53           """
54           Factor.__init__(self,[n for (n,o) in renaming.items() if o in
                 fac.variables])
55           self.orig_fac = fac
56           self.renaming = renaming
57
58         def get_value(self,assignment):
59             return self.orig_fac.get_value({self.renaming[var]:val
60                                          for (var,val) in assignment.items()
61                                         if var in self.variables})
      The following class renames the variables of a conditional probability distri-
      bution. It is used for template models (e.g., dynamic decision networks or
      relational models).
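      That listing is elided from this excerpt; a minimal sketch of such a class,
      built on FactorRename and CPD (the class name is ours, and the book's version
      may differ), is:

class CPDrenameSketch(FactorRename, CPD):
    def __init__(self, cpd, renaming):
        """renaming is a {new:old} dictionary, as in FactorRename"""
        old2new = {old: new for (new, old) in renaming.items()}
        FactorRename.__init__(self, cpd, renaming)
        # recover the renamed child and parents so this acts as a CPD
        self.child = old2new[cpd.child]
        self.parents = [old2new[p] for p in cpd.parents]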
                                   probDBN.py — (continued)
70   class DBN(Displayable):
71       """The class of stationary Dynamic Belief networks.
72       * name is the DBN name
 73       * vars_now is a list of current variables (each must have a
 74       previous variable).
75       * transition_factors is a list of factors for P(X|parents) where X
76       is a current variable and parents is a list of current or previous
             variables.
77       * init_factors is a list of factors for P(X|parents) where X is a
78       current variable and parents can only include current variables
79       The graph of transition factors + init factors must be acyclic.
80
81      """
82      def __init__(self, title, vars_now, transition_factors=None,
            init_factors=None):
83          self.title = title
84          self.vars_now = vars_now
85          self.vars_prev = [v.previous for v in vars_now]
86          self.transition_factors = transition_factors
87          self.init_factors = init_factors
88          self.var_index = {}     # var_index[v] is the index of variable v
89          for i,v in enumerate(vars_now):
90              self.var_index[v]=i
     Here is a 3 variable DBN:
                                  probDBN.py — (continued)
100
101   # initial distribution
102   pa0 = Prob(A1,[],[0.9,0.1])
103   pb0 = Prob(B1,[A1],[[0.3,0.7],[0.8,0.2]])
104   pc0 = Prob(C1,[],[0.2,0.8])
105
106   dbn1 = DBN("Simple DBN",[A1,B1,C1],[pa,pb,pc],[pa0,pb0,pc0])
      Here is the animal example
                                     probDBN.py — (continued)
108   from probHMM import closeMic, farMic, midMic, sm, mmc, sc, mcm, mcc
109
110   Pos_0,Pos_1 =   variable_pair("Position",domain=[0,1,2,3])
111   Mic1_0,Mic1_1   = variable_pair("Mic1")
112   Mic2_0,Mic2_1   = variable_pair("Mic2")
113   Mic3_0,Mic3_1   = variable_pair("Mic3")
114
115   # conditional probabilities - see hmm for the values of sm,mmc, etc
116   ppos = Prob(Pos_1, [Pos_0],
117              [[sm, mmc, mmc, mmc], #was in middle
118               [mcm, sc, mcc, mcc], #was in corner 1
119               [mcm, mcc, sc, mcc], #was in corner 2
120               [mcm, mcc, mcc, sc]]) #was in corner 3
121   pm1 = Prob(Mic1_1, [Pos_1], [[1-midMic, midMic], [1-closeMic, closeMic],
122                            [1-farMic, farMic], [1-farMic, farMic]])
123   pm2 = Prob(Mic2_1, [Pos_1], [[1-midMic, midMic], [1-farMic, farMic],
124                            [1-closeMic, closeMic], [1-farMic, farMic]])
125   pm3 = Prob(Mic3_1, [Pos_1], [[1-midMic, midMic], [1-farMic, farMic],
126                            [1-farMic, farMic], [1-closeMic, closeMic]])
127   ipos = Prob(Pos_1,[], [0.25, 0.25, 0.25, 0.25])
128   dbn_an =DBN("Animal DBN",[Pos_1,Mic1_1,Mic2_1,Mic3_1],
129              [ppos, pm1, pm2, pm3],
130              [ipos, pm1, pm2, pm3])
probDBN.py — (continued)
157   # Try
158   #from probRC import ProbRC
159   #bn = BNfromDBN(dbn1,2) # construct belief network
160   #drc = ProbRC(bn)            # initialize recursive conditioning
161   #B2 = bn.name2var['B'][2]
162   #drc.query(B2) #P(B2)
163   #drc.query(bn.name2var['B'][1],{bn.name2var['B'][0]:True,bn.name2var['C'][1]:False})
          #P(B1|B0,C1)
probDBN.py — (continued)
[Figure: the posterior of P_heads given Coin Tosses observed: {Toss#0: 'heads',
Toss#1: 'heads', Toss#2: 'tails'} — the distribution over P_heads peaks near
0.65, and each unobserved toss (Toss#3 . . . Toss#10) has heads: 0.599,
tails: 0.401]
[Figure 10.2: the Beta distribution (Probability vs. P(Heads)) after 12 heads;
4 tails, after 3 heads; 1 tails, and after 6 heads; 2 tails, with heads, tails,
save, and reset buttons]
      Figure 10.2 shows a plot of the Beta distribution (the P_heads variable in the
      previous belief network) given some sets of observations.
         This plot is produced by the following interactive tool.
                                                  learnBayesian.py — (continued)
 96      def tails(self,event):
 97          self.num_tails += 1
 98          self.dist = [self.dist[i]*(1-self.vals[i]) for i in range(self.num)]
 99          self.draw_dist()
100      def save(self,event):
101          self.saves.append((self.num_heads,self.num_tails,self.dist))
102          self.draw_dist()
103      def reset(self,event):
104          self.num_tails = 0
105          self.num_heads = 0
106          self.dist = [1/self.num for i in range(self.num)]
107          self.draw_dist()
108
109   # s1 = Show_Beta(100)
110   # sl = Show_Beta(100, fontsize=15) # for demos - enlarge window
       10.2 K-means
       The k-means learner maintains two lists that form sufficient statistics to
       classify examples, and to learn the classification:
          • class_counts is a list such that class_counts[c] is the number of examples
            in the training set with class = c.
          • feature_sum is a list such that feature_sum[i][c] is the sum of the values of
            the ith feature for members of class c. The average value of the ith feature
            in class c is
                  feature_sum[i][c] / class_counts[c]
      The distance from (the mean of) a class to an example is the sum, over all
      features, of the squared differences between the class mean and the example
      value.
learnKMeans.py — (continued)
35         def distance(self,cl,eg):
36             """distance of the eg from the mean of the class"""
37             return sum( (self.class_prediction(ind,cl)-feat(eg))**2
38                            for (ind,feat) in
                                  enumerate(self.dataset.input_features))
39
40         def class_prediction(self,feat_ind,cl):
41             """prediction of the class cl on the feature with index feat_ind"""
42             if self.class_counts[cl] == 0:
43                 return 0 # there are no examples so we can choose any value
44             else:
45                 return self.feature_sum[feat_ind][cl]/self.class_counts[cl]
46
47         def class_of_eg(self,eg):
48             """class to which eg is assigned"""
49             return (min((self.distance(cl,eg),cl)
50                           for cl in range(self.num_classes)))[1]
51                   # second element of tuple, which is a class with minimum
                         distance
      One step of k-means updates the class_counts and feature_sum. It uses the old
      values to determine the classes, and so computes new values for class_counts
      and feature_sum. At the end it determines whether these values have changed,
      and then replaces the old ones with the new ones. It returns an indicator of
      whether the values are stable (have not changed).
learnKMeans.py — (continued)
53         def k_means_step(self):
54             """Updates the model with one step of k-means.
55             Returns whether the assignment is stable.
56             """
57             new_class_counts = [0]*self.num_classes
       Exercise 10.1 Change the boolean_features = True flag to allow for numerical
       features. K-means assumes the features are numerical, so we want to make non-
       numerical features into numerical features (using characteristic functions) but we
       probably don't want to change numerical features into Boolean ones.
       Exercise 10.2 If there are many classes, some of the classes can become empty
       (e.g., try 100 classes with carbool.csv). Implement a way to put some examples
       into a class, if possible. Two ideas are:
          (a) Initialize the classes with actual examples, so that the classes will not start
              empty. (Do the classes become empty?)
        (b) In class_prediction, we test whether the class is empty, and make a prediction
            of 0 for an empty class. It is possible to make a different prediction to “steal”
            an example (but you should make sure that a class has a consistent value for
            each feature in a loop).
      Make your own suggestions, and compare them with the original and with
      whichever of these you think may work better.
      10.3 EM
      In the following definition, a class, c, is an integer in range [0, num_classes). i is
      the index of a feature, so feat[i] is the ith feature, and a feature is a function from
      tuples to values. val is a value of a feature.
          A model consists of 2 lists, which form the sufficient statistics:
         • class_counts is a list such that class_counts[c] is the number of tuples with
           class = c, where each tuple is weighted by its probability, i.e.,
                  class_counts[c] = \sum_t P(c \mid t)
         • feature_counts is a list such that feature_counts[i][val][c] is the weighted count
           of the number of tuples t with feat[i](t) = val and class(t) = c, where each
           tuple is weighted by its probability, i.e.,
                  feature_counts[i][val][c] = \sum_{t : feat[i](t)=val} P(c \mid t)
                                       learnEM.py — EM Learning
11   from learnProblem import Data_set, Learner, Data_from_file
12   import random
13   import math
14   import matplotlib.pyplot as plt
15
16   class EM_learner(Learner):
17       def __init__(self,dataset, num_classes):
18           self.dataset = dataset
19           self.num_classes = num_classes
20           self.class_counts = None
21           self.feature_counts = None
      The function em_step goes through the training examples and updates these
      counts. The first time it is run, when there is no model, it uses random distri-
      butions.
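      The em_step listing is elided from this excerpt. The following sketch of the
      method shows the intended computation, assuming each input feature has a
      frange of possible values (as the later carbool.csv example suggests) and using
      the prob and random_dist functions defined below:

    def em_step(self, orig_class_counts, orig_feature_counts):
        """a sketch: returns updated (class_counts, feature_counts)"""
        class_counts = [0]*self.num_classes
        feature_counts = [{val: [0]*self.num_classes for val in feat.frange}
                          for feat in self.dataset.input_features]
        for tple in self.dataset.train:
            if orig_class_counts:    # E-step: P(c | tple) under current model
                cdist = self.prob(tple, orig_class_counts, orig_feature_counts)
            else:                    # first iteration: random distribution
                cdist = random_dist(self.num_classes)
            for c in range(self.num_classes):   # M-step: weighted counts
                class_counts[c] += cdist[c]
                for (i, feat) in enumerate(self.dataset.input_features):
                    feature_counts[i][feat(tple)][c] += cdist[c]
        return class_counts, feature_counts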
                                        learnEM.py — (continued)
      The last step is because len(self.dataset) is a constant (independent of c).
      class_counts[c] can be taken out of the product, but needs to be raised to the
      power of the number of features, and one of them cancels.
                                         learnEM.py — (continued)
 51         def learn(self,n):
 52             """do n steps of em"""
 53             for i in range(n):
 54                 self.class_counts, self.feature_counts = self.em_step(
 55                     self.class_counts, self.feature_counts)
     The following is for visualizing the classes. It prints the dataset ordered by the
     probability of class c.
                                          learnEM.py — (continued)
 57      def show_class(self,c):
 58          """sorts the data by the class and prints in order.
 59          For visualizing small data sets
 60          """
 61          sorted_data = sorted((self.prob(tpl,self.class_counts,self.feature_counts)[c],
 62                              ind, # preserve ordering for equal probabilities
 63                              tpl)
 64                             for (ind,tpl) in enumerate(self.dataset.train))
 65          for cc,r,tpl in sorted_data:
 66              print(cc,*tpl,sep='\t')
      The following are for evaluating the classes.
         The probability of a tuple can be evaluated by marginalizing over the classes:

      \[
      P(tple) = \sum_c P(c) \prod_i P(feat_i(tple) \mid c)
              = \sum_c \frac{cc[c]}{len(self.dataset)} \prod_i \frac{fc[i][feat_i(tple)][c]}{cc[c]}
              = \frac{1}{len(self.dataset)} \sum_c \frac{1}{cc[c]^{\#feats-1}} \prod_i fc[i][feat_i(tple)][c]
      \]

      where cc is the class count and fc is the feature count. len(self.dataset) can be
      distributed out of the sum, and cc[c] can be taken out of the product.
      Given the probability of each tuple, we can evaluate the log loss as the negative
      of the log probability:
                                          learnEM.py — (continued)
68      def logloss(self,tple):
69          """returns the logloss of the prediction on tple, which is
                -log(P(tple))
70          based on the current class counts and feature counts
71          """
72          feats = self.dataset.input_features
73          res = 0
74          cc = self.class_counts
75          fc = self.feature_counts
76          for c in range(self.num_classes):
77              res += prod(fc[i][feat(tple)][c]
 78                           for (i,feat) in
                                  enumerate(feats))/(cc[c]**(len(feats)-1))
 79            if res>0:
 80                return -math.log2(res/len(self.dataset.train))
 81            else:
 82                return float("inf") #infinity
 83
 84         def plot_error(self, maxstep=20):
 85             """Plots the logloss error as a function of the number of steps"""
 86             plt.ion()
 87             plt.xlabel("step")
 88             plt.ylabel("Ave Logloss (bits)")
 89             train_errors = []
 90             if self.dataset.test:
 91                 test_errors = []
 92             for i in range(maxstep):
 93                 self.learn(1)
 94                 train_errors.append( sum(self.logloss(tple) for tple in
                        self.dataset.train)
 95                                     /len(self.dataset.train))
 96                 if self.dataset.test:
 97                     test_errors.append( sum(self.logloss(tple) for tple in
                            self.dataset.test)
 98                                         /len(self.dataset.test))
 99             plt.plot(range(1,maxstep+1),train_errors,
100                      label=str(self.num_classes)+" classes. Training set")
101             if self.dataset.test:
102                 plt.plot(range(1,maxstep+1),test_errors,
103                          label=str(self.num_classes)+" classes. Test set")
104             plt.legend()
105             plt.draw()
106
107   def prod(L):
108       """returns the product of the elements of L"""
109       res = 1
110       for e in L:
111           res *= e
112       return res
113
114   def random_dist(k):
115       """generate k random numbers that sum to 1"""
116       res = [random.random() for i in range(k)]
117       s = sum(res)
118       return [v/s for v in res]
119
120   data = Data_from_file('data/emdata2.csv', num_train=10, target_index=2000)
121   eml = EM_learner(data,2)
122   num_iter=2
123   print("Class assignment after",num_iter,"iterations:")
124   eml.learn(num_iter); eml.show_class(0)
125
126   #   Plot the error
127   #   em2=EM_learner(data,2); em2.plot_error(40) # 2 classes
128   #   em3=EM_learner(data,3); em3.plot_error(40) # 3 classes
129   #   em13=EM_learner(data,13); em13.plot_error(40) # 13 classes
130
131   # data = Data_from_file('data/carbool.csv',
          target_index=2000,boolean_features=False)
132   # [f.frange for f in data.input_features]
133   # eml = EM_learner(data,3)
134   # eml.learn(20); eml.show_class(0)
135   # em3=EM_learner(data,3); em3.plot_error(60) # 3 classes
136   # em3=EM_learner(data,30); em3.plot_error(60) # 30 classes
Exercise 10.3 For the EM data, where there are naturally 2 classes, 3 classes
eventually does better on the training set than 2 classes, but worse on the test
set. Explain why. Hint: look at what the 3 classes are. Use em3.show_class(i) for
each of the classes i ∈ [0, 3).
Exercise 10.4 Write code to plot the logloss as a function of the number of classes
(from 1 to, say, 15) for a fixed number of iterations. (From the experience with the
existing code, think about how many iterations are appropriate.)
Causality
     11.1       Do Questions
     A causal model can answer “do” questions.
   The intervene function takes a belief network and a variable:value dictio-
nary specifying what to “do”, and returns the belief network that results from
intervening to set each variable in the dictionary to its specified value. The
intervention replaces the CPD of each intervened variable with a constant CPD.
                          probDo.py — Probabilistic inference with the do operator
11   from probGraphicalModels import InferenceMethod, BeliefNetwork
12   from probFactors import CPD, ConstantCPD
13
14   def intervene(bn, do={}):
15       assert isinstance(bn, BeliefNetwork), f"Do only applies to belief
             networks ({bn.title})"
16       if do=={}:
17           return bn
18       else:
19           newfacs = ({f for (ch,f) in bn.var2cpt.items() if ch not in do} |
20                        {ConstantCPD(v,c) for (v,c) in do.items()})
21           return BeliefNetwork(f"{bn.title}(do={do})", bn.variables, newfacs)
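As a usage sketch (here bn is any belief network and X and Y are two of its variables; these are placeholders, not an example from the book):

    from probRC import ProbRC
    do_bn = intervene(bn, do={Y: True})  # Y's CPD is replaced by a constant CPD
    ProbRC(bn).query(X, obs={Y: True})   # P(X | Y=True): observing Y
    ProbRC(do_bn).query(X)               # P(X | do(Y=True)): intervening on Y

The two queries generally differ: observing Y provides evidence about Y's causes, whereas intervening on Y cuts Y off from its causes.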
     The following adds the queryDo method to the InferenceMethod class, so it
     can be used with any inference method. It replaces the graphical model with
the modified one, runs the inference algorithm, and restores the initial belief
     network.
                                         probDo.py — (continued)
11.2       Counterfactual Example
[Figure 11.1: the counterfactual example network, modeling A, B, C and their counterfactual counterparts A′, B′, C′ as a deterministic system sharing the independent noise variables “B if a”, “B if not a”, “C if b”, “C if not b”.]
31
32   # as a deterministic system with independent noise
33   A = Variable("A", boolean, position=(0.2,0.8))
34   B = Variable("B", boolean, position=(0.2,0.4))
35   C = Variable("C", boolean, position=(0.2,0.0))
36   Aprime = Variable("A'", boolean, position=(0.8,0.8))
37   Bprime = Variable("B'", boolean, position=(0.8,0.4))
38   Cprime = Variable("C'", boolean, position=(0.8,0.0))
39   BifA = Variable("B if a", boolean, position=(0.4,0.8))
40   BifnA = Variable("B if not a", boolean, position=(0.6,0.8))
41   CifB = Variable("C if b", boolean, position=(0.4,0.4))
42   CifnB = Variable("C if not b", boolean, position=(0.6,0.4))
43
44   p_A = Prob(A, [], [0.5,0.5])
45   p_B = Prob(B, [A, BifA, BifnA], [[[[1,0],[0,1]],[[1,0],[0,1]]], # A=0
46                                    [[[1,0],[1,0]],[[0,1],[0,1]]]]) # A=1
47   p_C = Prob(C, [B, CifB, CifnB], [[[[1,0],[0,1]],[[1,0],[0,1]]], # B=0
48                                    [[[1,0],[1,0]],[[0,1],[0,1]]]]) # B=1
49   p_Aprime = Prob(Aprime,[], [0.6,0.4])
50   p_Bprime = Prob(Bprime, [Aprime, BifA, BifnA],
         [[[[1,0],[0,1]],[[1,0],[0,1]]], # A=0
51                                    [[[1,0],[1,0]],[[0,1],[0,1]]]]) # A=1
52   p_Cprime = Prob(Cprime, [Bprime, CifB, CifnB],
         [[[[1,0],[0,1]],[[1,0],[0,1]]], # B=0
53                                    [[[1,0],[1,0]],[[0,1],[0,1]]]]) # B=1
54   p_bifa = Prob(BifA, [], [0.6,0.4]) # Does not actually depend on A!!!
55   p_bifna = Prob(BifnA, [], [0.6,0.4])
56   p_cifb = Prob(CifB, [], [0.9,0.1])
57   p_cifnb = Prob(CifnB, [], [0.2,0.8])
58
59   abcCounter = BeliefNetwork("ABC Counterfactual Example",
60                           [A,B,C,Aprime,Bprime,Cprime,BifA, BifnA, CifB,
                                 CifnB],
61                           [p_A,p_B,p_C,p_Aprime,p_Bprime, p_Cprime, p_bifa,
                                 p_bifna, p_cifb, p_cifnb])
62
63   abcq = ProbRC(abcCounter)
64   # abcq.queryDo(Cprime, obs = {Aprime:False, A:True})
65   # abcq.queryDo(Cprime, obs = {C:True, Aprime:False})
66   # abcq.queryDo(Cprime, obs = {A:True, C:True, Aprime:False})
68   # abcq.queryDo(Cprime, obs = {A:False, C:True, Aprime:False})
69   # abcq.queryDo(CifB, obs = {C:True,Aprime:False})
70   # abcq.queryDo(CifnB, obs = {C:True,Aprime:False})
71
72   #   abcq.show_post(obs   =   {})
73   #   abcq.show_post(obs   =   {Aprime:False, A:True})
74   #   abcq.show_post(obs   =   {A:True, C:True, Aprime:False})
75   #   abcq.show_post(obs   =   {A:True, C:True, Aprime:True})
The following is the firing squad example of Pearl. See Figure 11.2.
probCounterfactual.py — (continued)
[Figure 11.2: the firing squad example network; the displayed posterior for Dead is False: 0.882, True: 0.118.]
Planning with Uncertainty
decnNetworks.py — (continued)
29   class DecisionVariable(Variable):
30       def __init__(self, name, domain, parents, position=None):
31           Variable.__init__(self, name, domain, position)
32           self.parents = parents
33           self.all_vars = set(parents) | {self}
     A decision network is a graphical model where the variables can be random
     variables or decision variables. Among the factors we assume there is one util-
     ity factor.
                                   decnNetworks.py — (continued)
35   class DecisionNetwork(BeliefNetwork):
36       def __init__(self, title, vars, factors):
37           """vars is a list of variables
38           factors is a list of factors (instances of CPD and Utility)
39           """
40           GraphicalModel.__init__(self, title, vars, factors) # don't call
                 init for BeliefNetwork
41           self.var2parents = ({v : v.parents for v in vars if
                 isinstance(v,DecisionVariable)}
42                        | {f.child:f.parents for f in factors if
                              isinstance(f,CPD)})
43           self.children = {n:[] for n in self.variables}
44           for v in self.var2parents:
45               for par in self.var2parents[v]:
46                   self.children[par].append(v)
47           self.utility_factor = [f for f in factors if
                 isinstance(f,Utility)][0]
48           self.topological_sort_saved = None
49
50         def __str__(self):
51             return self.title
     The split order ensures that the parents of a decision node are split before
the decision node itself, and that, where possible, no other variables are split
before it.
                                   decnNetworks.py — (continued)
53         def split_order(self):
54             so = []
55             tops = self.topological_sort()
56             for v in tops:
57                 if isinstance(v,DecisionVariable):
58                    so += [p for p in v.parents if p not in so]
59                    so.append(v)
60             so += [v for v in tops if v not in so]
61             return so
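For example, for the umbrella network below (assuming the topological order Weather, Forecast, Umbrella), this yields the split order [Forecast, Umbrella, Weather]: the decision Umbrella's parent Forecast is split first, then Umbrella, then the remaining variable Weather.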
decnNetworks.py — (continued)
63         def show(self):
64             plt.ion() # interactive
[Figure 12.1: the umbrella decision network, with chance nodes Weather and Forecast, decision node Umbrella, and a Utility node.]
65          ax = plt.figure().gca()
66          ax.set_axis_off()
67          plt.title(self.title)
68          for par in self.utility_factor.variables:
69              ax.annotate("Utility", par.position,
                    xytext=self.utility_factor.position,
70                                    arrowprops={'arrowstyle':'<-'},
71                                    bbox=dict(boxstyle="sawtooth,pad=1.0",color="red"),
72                                    ha='center')
73          for var in reversed(self.topological_sort()):
74              if isinstance(var,DecisionVariable):
75                  bbox = dict(boxstyle="square,pad=1.0",color="green")
76              else:
77                 bbox = dict(boxstyle="round4,pad=1.0,rounding_size=0.5")
78              if self.var2parents[var]:
79                  for par in self.var2parents[var]:
80                      ax.annotate(var.name, par.position, xytext=var.position,
81                                    arrowprops={'arrowstyle':'<-'},bbox=bbox,
82                                    ha='center')
83              else:
84                  x,y = var.position
85                  plt.text(x,y,var.name,bbox=bbox,ha='center')
decnNetworks.py — (continued)
      The following is a variant with the umbrella decision having 2 parents; nothing
      else has changed. This is interesting because one of the parents is not needed;
      if the agent knows the weather, it can ignore the forecast.
                                  decnNetworks.py — (continued)
[Figures: the fire alarm decision network (with nodes including Report and Call) and the cheating decision network, with decision node Cheat Decision, chance nodes Watched, Caught1, Caught2, Punish, Grade_1, Grade_2, Fin_Grd, and a Utility node.]
decnNetworks.py — (continued)
      Chain of 3 decisions
      The following example is a finite-stage fully-observable Markov decision pro-
      cess with a single reward (utility) at the end. It is interesting because the par-
      ents do not include all predecessors. The methods we use will work without
      change on this, even though the agent does not condition on all of its previous
      observations and actions. The output of ch3.show() is shown in Figure 12.4.
                                   decnNetworks.py — (continued)
[Figure 12.4: the 3-chain decision network, with states S0–S3, decisions D0–D2, and a single Utility on the final state.]
decnNetworks.py — (continued)
     We can combine the optimization for decision networks above with the
improvements of recursive conditioning used for graphical models (Section
9.7, page 218).
decnNetworks.py — (continued)
397                else:
398                    proj_factors = self.eliminate_var(proj_factors, v)
399            assert len(proj_factors)==1, "there should be only one element of
                    proj_factors"
400            return proj_factors[0].get_value({})
401
402         def show_policy(self):
403             print('\n'.join(df.to_table() for df in self.opt_policy.values()))
decnNetworks.py — (continued)
441   # vf.show_policy()
442
443   #   VE_DN.max_display_level = 3 # if you want to show lots of detail
444   #   vc = VE_DN(cheating_dn)
445   #   vc.optimize()
446   #   vc.show_policy()
447
448
449   def test(dn):
450       rc0dn = RC_DN(dn)
451       rc0v = rc0dn.optimize(algorithm=rc0dn.rc0)
452       rcdn = RC_DN(dn)
453       rcv = rcdn.optimize()
454       assert abs(rc0v-rcv)<1e-10, f"rc0 produces {rc0v}; rc produces {rcv}"
455       vedn = VE_DN(dn)
456       vev = vedn.optimize()
457       assert abs(vev-rcv)<1e-10, f"VE_DN produces {vev}; RC produces {rcv}"
458       print(f"passed unit test. rc0, rc and VE gave same result for {dn}")
459
460   if __name__ == "__main__":
461       test(fire_dn)
30
31         def P(self,s,a):
32             """Transition probability function
33             returns a dictionary of {s1:p1} such that P(s1 | s,a)=p1. Other
                   probabilities are zero.
34             """
35             raise NotImplementedError("P") # abstract method
36
37         def R(self,s,a):
38             """Reward function R(s,a)
39             returns the expected reward for doing a in state s.
40             """
41             raise NotImplementedError("R") # abstract method
     Two state partying example (Example 9.27 in Poole and Mackworth [2017]):
                                 mdpExamples.py — MDP Examples
11   from mdpProblem import MDP, GridMDP
12   import matplotlib.pyplot as plt
13
14   class party(MDP):
15       """Simple 2-state, 2-Action Partying MDP Example"""
16       def __init__(self, discount=0.9):
17           states = {'healthy','sick'}
18           actions = {'relax', 'party'}
19           MDP.__init__(self, states, actions, discount)
20
21         def R(self,s,a):
22             "R(s,a)"
23             return { 'healthy': {'relax': 7, 'party': 10},
24                      'sick': {'relax': 0, 'party': 2 }}[s][a]
25
26         def P(self,s,a):
27             "returns a dictionary of {s1:p1} such that P(s1 | s,a)=p1. Other
                   probabilities are zero."
28             phealthy = { # P('healthy' | s, a)
29                         'healthy': {'relax': 0.95, 'party': 0.7},
30                         'sick': {'relax': 0.5, 'party': 0.1 }}[s][a]
31             return {'healthy':phealthy, 'sick':1-phealthy}
     The next example is the tiny game from Example 12.1 and Figure 12.1 of
Poole and Mackworth [2017]. The state is represented as (x, y), where x counts
from zero from the left and y counts from zero upwards, so the state (0, 0) is
the bottom-left state. The actions are upC for up-careful and upR for up-risky.
(Note that GridMDP is just a type of MDP for which we have methods to show
it; you can treat it as just an MDP here.)
                                   mdpExamples.py — (continued)
34   class MDPtiny(GridMDP):
35       def __init__(self, discount=0.9):
36           actions = ['right', 'upC', 'left', 'upR']
37          self.x_dim = 2 # x-dimension
38          self.y_dim = 3
39          states = [(x,y) for x in range(self.x_dim) for y in
                range(self.y_dim)]
40          # for GridMDP
41          self.xoff = {'right':0.25, 'upC':0, 'left':-0.25, 'upR':0}
42          self.yoff = {'right':0, 'upC':-0.25, 'left':0, 'upR':0.25}
43          GridMDP.__init__(self, states, actions, discount)
44
45      def P(self,s,a):
46          """return a dictionary of {s1:p1} if P(s1 | s,a)=p1. Other
                probabilities are zero.
47          """
48          (x,y) = s
49          if a == 'right':
50              return {(1,y):1}
51          elif a == 'upC':
52              return {(x,min(y+1,2)):1}
53          elif a == 'left':
54              if (x,y) == (0,2): return {(0,0):1}
55              else: return {(0,y): 1}
56          elif a == 'upR':
57              if x==0:
58                  if y<2: return {(x,y):0.1, (x+1,y):0.1, (x,y+1):0.8}
59                  else: # at (0,2)
60                      return {(0,0):0.1, (1,2): 0.1, (0,2): 0.8}
61              elif y < 2: # x==1
62                  return {(0,y):0.1, (1,y):0.1, (1,y+1):0.8}
63              else: # at (1,2)
64                 return {(0,2):0.1, (1,2): 0.9}
65
66      def R(self,s,a):
67          (x,y) = s
68          if a == 'right':
69              return [0,-1][x]
70          elif a == 'upC':
71              return [-1,-1,-2][y]
72          elif a == 'left':
73              if x==0:
74                  return [-1, -100, 10][y]
75              else: return 0
76          elif a == 'upR':
77              return [[-0.1, -10, 0.2],[-0.1, -0.1, -0.9]][x][y]
78                  # at (0,2) reward is 0.1*10+0.8*-1=0.2
     Here is the domain of Example 9.28 of Poole and Mackworth [2017]. The
state is represented as (x, y), where x counts from zero from the left and y
counts from zero upwards, so the state (0, 0) is the bottom-left state.
                                  mdpExamples.py — (continued)
80 class grid(GridMDP):
128               (x,y) = s
129               rew = 0
130               # rewards from crashing:
131               if y==0: ## on bottom.
132                   rew += -0.7 if a == 'down' else -0.1
133               if y==self.y_dim-1: ## on top.
134                   rew += -0.7 if a == 'up' else -0.1
135               if x==0: ## on left
136                   rew += -0.7 if a == 'left' else -0.1
137               if x==self.x_dim-1: ## on right.
138                   rew += -0.7 if a == 'right' else -0.1
139               return rew
144   #   pt.vi(1)
145   #   pt.vi(100)
146   #   party(discount=0.99).vi(100)
147   #   party(discount=0.4).vi(100)
148
149   #   gr = grid(discount=0.9)
150   #   gr.show()
151   #   q,v,pi = gr.vi(100)
152   #   q[(7,2)]
mdpProblem.py — (continued)
 61   class GridMDP(MDP):
 62       def __init__(self, states, actions, discount):
 63           MDP.__init__(self, states, actions, discount)
 64
 65         def show(self):
 66             #plt.ion() # interactive
 67             fig,(self.ax) = plt.subplots()
 68             plt.subplots_adjust(bottom=0.2)
 69             stepB = Button(plt.axes([0.8,0.05,0.1,0.075]), "step")
 70             stepB.on_clicked(self.on_step)
 71             resetB = Button(plt.axes([0.65,0.05,0.1,0.075]), "reset")
 72             resetB.on_clicked(self.on_reset)
 73             self.qcheck = CheckButtons(plt.axes([0.2,0.05,0.35,0.075]),
 74                                         ["show q-values","show policy"])
 75             self.qcheck.on_clicked(self.show_vals)
 76             self.font_box = TextBox(plt.axes([0.1,0.05,0.05,0.075]),"Font:",
                    textalignment="center")
 77             self.font_box.on_submit(self.set_font_size)
 78             self.font_box.set_val(str(plt.rcParams['font.size']))
 79             self.show_vals(None)
 80             plt.show()
 81
 82         def set_font_size(self, s):
 83             plt.rcParams.update({'font.size': eval(s)})
 84             plt.draw()
 85
 86         def show_vals(self,event):
 87             self.ax.cla()
 88             array = [[self.v[(x,y)] for x in range(self.x_dim)]
 89                                               for y in range(self.y_dim)]
 90             self.ax.pcolormesh([x-0.5 for x in range(self.x_dim+1)],
 91                                  [x-0.5 for x in range(self.y_dim+1)],
 92                                  array, edgecolors='black',cmap='summer')
     Figure 12.5 shows the user interface for the tiny domain, which can be
obtained by running
MDPtiny(discount=0.9).show()
then resizing the window, checking “show q-values” and “show policy”, and
clicking “step” a few times.
Figure 12.5: Interface for the tiny example, after a number of steps. Each rectangle
represents a state. In each rectangle are the 4 Q-values for the state. The leftmost
number is for the left action; the rightmost number is for the right action; the
uppermost is for the upR (up-risky) action and the lowest number is for the
upC action. The arrow points to the action(s) with the maximum Q-value. Use
MDPtiny().show() after loading mdpExamples.py.
    Figure 12.6 shows the user interface for the grid domain, which can be
obtained by running
grid(discount=0.9).show()
then resizing the window, checking “show q-values” and “show policy”, and
clicking “step” a few times.
Figure 12.6: Interface for the grid example, after a number of steps. Each rectan-
gle represents a state. In each rectangle are the 4 Q-values for the state. The
leftmost number is for the left action; the rightmost number is for the right
action; the uppermost is for the up action and the lowest number is for the down
action. The arrow points to the action(s) with the maximum Q-value. From
grid(discount=0.9).show()
Exercise 12.1 Computing q before v may seem like a waste of space because we
don’t need to store q in order to compute the value function or the policy. Change the
      algorithm so that it loops through the states and actions once per iteration, and
      only stores the value function and the policy. Note that to get the same results as
      before, you would need to make sure that you use the previous value of v in the
computation, not the current value of v. Does using the current value of v hurt the
      algorithm or make it better (in approaching the actual value function)?
      Exercise 12.2 Implement value iteration that stores the V-values rather than the
      Q-values. Does it work better than storing Q? (What might better mean?)
      Exercise 12.3 In asynchronous value iteration, try a number of different ways
      to choose the states and actions to update (e.g., sweeping through the state-action
      pairs, choosing them at random). Note that the best way may be to determine
      which states have had their Q-values change the most, and then update the pre-
      vious ones, but that is not so straightforward to implement, because you need to
      find those previous states.
Reinforcement Learning
        • An environment implements the method do that takes the action and re-
          turns a pair of the reward and the resulting environment state.
rlProblem.py — (continued)
26
27   class RL_agent(Agent):
28       """An RL_Agent
29       has percepts (s, r) for some state s and real reward r
30       """
31       def __init__(self, actions):
32          self.actions = actions
33
34         def initial_action(self, env_state):
35             """return the initial action, and remember the state and action
36             Act randomly initially
37             Could be overridden to initialize data structures (as the agent now
                   knows about one state)
38             """
39             self.state = env_state
40             self.action = random.choice(self.actions)
41             return self.action
42
43         def select_action(self, reward, state):
44             """
45             Select the action given the reward and next state
46             Remember the action in self.action
47             This implements "Act randomly" and should be overridden!
48             """
49             self.action = random.choice(self.actions)
50             return self.action
           This is similar to Simulate of Section 2.1, except it is initialized by agent.initial_action(state).
                                      rlProblem.py — (continued)
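The core of the loop is sketched below (a minimal sketch, assuming env.state holds the current environment state, as in RL_env; the real Simulate class also records the history needed for plotting):

    def simulate(agent, env, steps):
        """Minimal agent-environment loop (a sketch of what Simulate does)."""
        action = agent.initial_action(env.state)
        total_reward = 0
        for _ in range(steps):
            reward, state = env.do(action)   # environment makes a transition
            total_reward += reward
            action = agent.select_action(reward, state)
        return total_reward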
The following plots the sum of rewards as a function of the step in a simulation.
rlProblem.py — (continued)
          Here is the definition of the simple 2-state, 2-action decision about whether
      to party or relax (Example 12.29 in Poole and Mackworth [2023]).
                        rlExamples.py — Some example reinforcement learning environments
 11   from rlProblem import RL_env
 12   class Healthy_env(RL_env):
 13       def __init__(self):
 14           RL_env.__init__(self, "Party Decision", ["party", "relax"],
                  "healthy")
 15
 16         def do(self, action):
 17             """updates the state based on the agent doing action.
 18             returns reward,state
 19             """
 20             if self.state=="healthy":
 21                 if action=="party":
 22                     self.state = "healthy" if flip(0.7) else "sick"
 23                     reward = 10
 24                 else: # action=="relax"
 25                     self.state = "healthy" if flip(0.95) else "sick"
 26                     reward = 7
 27             else: # self.state=="sick"
 28                 if action=="party":
 29                     self.state = "healthy" if flip(0.1) else "sick"
 30                     reward = 2
 31                 else:
 32                     self.state = "healthy" if flip(0.5) else "sick"
 33                     reward = 0
 34             return reward, self.state
[Figure 13.1: the monster game grid. P1–P4 mark the four prize locations, M the five monster locations, and R the repair station.]
 36   import random
 37   from utilities import flip
 38   from rlProblem import RL_env
 39
 40   class Monster_game_env(RL_env):
 41       xdim = 5
 42       ydim = 5
 43
 44      vwalls = [(0,3), (0,4), (1,4)] # vertical walls right of these locations
 45      hwalls = [] # not implemented
 46      crashed_reward = -1
 47
 48      prize_locs = [(0,0), (0,4), (4,0), (4,4)]
 49      prize_apears_prob = 0.3
 50      prize_reward = 10
 51
 52      monster_locs = [(0,1), (1,1), (2,3), (3,1), (4,2)]
 53         monster_appears_prob = 0.4
 54         monster_reward_when_damaged = -10
 55         repair_stations = [(1,4)]
 56
 57         actions = ["up","down","left","right"]
 58
 59         def __init__(self):
 60             # State:
 61             self.x = 2
 62             self.y = 2
 63             self.damaged = False
 64             self.prize = None
 65             # Statistics
 66             self.number_steps = 0
 67             self.accumulated_rewards = 0 # sum of rewards received
 68             self.min_accumulated_rewards = 0
 69             self.min_step = 0
 70             self.zero_crossing = 0
 71             RL_env.__init__(self, "Monster Game", self.actions, (self.x,
                    self.y, self.damaged, self.prize))
 72             self.display(2,"","Step","Tot Rew","Ave Rew",sep="\t")
 73
 74         def do(self,action):
 75             """updates the state based on the agent doing action.
 76             returns reward,state
 77             """
 78             assert action in self.actions, f"Monster game, unknown action:
                    {action}"
 79             reward = 0.0
 80             # A prize can appear:
 81             if self.prize is None and flip(self.prize_apears_prob):
 82                     self.prize = random.choice(self.prize_locs)
 83             # Actions can be noisy
 84             if flip(0.4):
 85                 actual_direction = random.choice(self.actions)
 86             else:
 87                 actual_direction = action
 88             # Modeling the actions given the actual direction
 89             if actual_direction == "right":
 90                 if self.x==self.xdim-1 or (self.x,self.y) in self.vwalls:
 91                     reward += self.crashed_reward
 92                 else:
 93                     self.x += 1
 94             elif actual_direction == "left":
 95                 if self.x==0 or (self.x-1,self.y) in self.vwalls:
 96                     reward += self.crashed_reward
 97                 else:
 98                     self.x += -1
 99             elif actual_direction == "up":
100                 if self.y==self.ydim-1:
      13.2      Q Learning
         To run the Q-learning demo, in folder “aipython”, load
         “rlQLearner.py”, and copy and paste the example queries at the
         bottom of that file.
                                  rlQLearner.py — Q Learning
 11   import random
 12   import math
rlQLearner.py — (continued)
 56          """
 57          self.Q[state] = {act:self.Qinit for act in self.actions}
 58          self.action = random.choice(self.actions)
 59          self.visits[state] = {act:0 for act in self.actions}
 60          self.state = state
 61          self.display(2, f"Initial State: {state} Action {self.action}")
 62          self.display(2,"s\ta\tr\ts'\tQ")
 63          return self.action
 64
 65      def select_action(self, reward, next_state):
 66          """given reward and next state, select the next action to be
                 carried out"""
 67          if next_state not in self.visits: # next state not seen before
 68              self.Q[next_state] = {act:self.Qinit for act in self.actions}
 69              self.visits[next_state] = {act:0 for act in self.actions}
 70          self.visits[self.state][self.action] +=1
 71          alpha = self.alpha_fun(self.visits[self.state][self.action])
 72          self.Q[self.state][self.action] += alpha*(
 73                            reward
 74                            + self.discount * max(self.Q[next_state].values())
 75                            - self.Q[self.state][self.action])
 76          self.display(2,self.state, self.action, reward, next_state,
 77                      self.Q[self.state][self.action], sep='\t')
 78          self.state = next_state
 79          self.action = self.exploration_strategy(self.Q[next_state],
 80                                    self.visits[next_state],**self.es_kwargs)
 81          self.display(3,f"Agent {self.role} doing {self.action} in state
                 {self.state}")
 82          return self.action
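In equation form, the update in select_action is the standard Q-learning rule (a restatement of the code above, with α the step size and γ the discount factor):

    Q[s][a] ← Q[s][a] + α (r + γ max_{a′} Q[s′][a′] − Q[s][a])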
Exercise 13.1 Implement SARSA. Hint: it does not take a max in the update.
Instead, it needs to choose the next action before it does the update.
   • Vs is a {action : visits} dictionary for the current state, where visits is the
     number of times that the action has been carried out in the current state.
rlProblem.py — (continued)
128          if flip(epsilon):
129              return random.choice(list(Qs.keys())) # act randomly
130          else:
131              return argmaxd(Qs)
132
133   def ucb(Qs, Vs, c=1.4):
134          """select action given upper-confidence bound
135          Qs is the {action:Q-value} dictionary for current state
136          Vs is the {action:visits} dictionary for current state
137
138          0.01 is to prevent divide-by-zero (the value could just be infinity)
139          """
140          Ns = sum(Vs.values())
141          ucb1 = {a:Qs[a]+c*math.sqrt(Ns/(0.01+Vs[a]))
142                     for a in Qs.keys()}
143          action = argmaxd(ucb1)
144          return action
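As an illustration with made-up numbers: with Qs = {'a1':0.5, 'a2':0.3} and Vs = {'a1':10, 'a2':2}, so Ns = 12, ucb(Qs, Vs, c=1.4) selects 'a2', because its exploration bonus 1.4·sqrt(12/2.01) ≈ 3.42 outweighs the bonus 1.4·sqrt(12/10.01) ≈ 1.53 for the much-visited 'a1'.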
102   #   Simulate(ag_ucb,env).go(100).plot()
103   #   Simulate(ag_opt,env).go(100).plot()
104   #   Simulate(ag_exp_m,env).go(100).plot()
105   #   Simulate(ag_greedy,env).go(100).plot()
106
107
108   from mdpExamples import MDPtiny
109   from rlProblem import Env_from_MDP
110   envt = Env_from_MDP(MDPtiny())
111   agt = Q_learner(envt.name, envt.actions, 0.8)
112   #Simulate(agt, envt).go(1000).plot()
113
114   mon_env = Monster_game_env()
115   mag1 = Q_learner(mon_env.name, mon_env.actions,0.9)
116   #Simulate(mag1,mon_env).go(100000).plot()
117   mag_ucb = Q_learner(mon_env.name, mon_env.actions,0.9,exploration_strategy
          = ucb,es_kwargs={'c':0.1},method="UCB(0.1)")
118   #Simulate(mag_ucb,mon_env).go(100000).plot()
119
120   mag2 = Q_learner(mon_env.name, mon_env.actions,
          0.9,es_kwargs={'epsilon':0.2},alpha_fun=lambda
          k:1/k,method="alpha=1/k")
121   #Simulate(mag2,mon_env).go(100000).plot()
122   mag3 = Q_learner(mon_env.name, mon_env.actions, 0.9,alpha_fun=lambda
          k:10/(9+k),method="alpha=10/(9+k)")
123   #Simulate(mag3,mon_env).go(100000).plot()
27                   self.buffer[position] = experience
28            self.number_added += 1
29
30         def get(self):
31             return self.buffer[random.randrange(min(self.number_added,
                   self.buffer_size))]
rlQExperienceReplay.py — (continued)
33   class Q_ER_learner(Q_learner):
34       def __init__(self, role, actions, discount,
35                   max_buffer_size=10000,
36                   num_updates_per_action=5, burn_in=1000,
37                  method="Q_ER_learner", **q_kwargs):
38           """Q-learner with experience replay
39           role is the role of the agent (e.g., in a game)
40           actions is the set of actions the agent can do
41           discount is the discount factor
42           max_buffer_size is the maximum number of past experiences that is
                 remembered
43           burn_in is the number of steps before using old experiences
44           num_updates_per_action is the number of q-updates for past
                 experiences per action
45           q_kwargs are any extra parameters for Q_learner
46           """
47           Q_learner.__init__(self, role, actions, discount, method=method,
                 **q_kwargs)
48           self.experience_buffer = BoundedBuffer(max_buffer_size)
49           self.num_updates_per_action = num_updates_per_action
50           self.burn_in = burn_in
51
52         def select_action(self, reward, next_state):
53             """given reward and next state, select the next action to be
                   carried out"""
54             self.experience_buffer.add((self.state,self.action,reward,next_state))
                   #remember experience
55             if next_state not in self.Q: # Q and visits are defined on the same
                   states
56                 self.Q[next_state] = {act:self.Qinit for act in self.actions}
57                 self.visits[next_state] = {act:0 for act in self.actions}
58             self.visits[self.state][self.action] +=1
59             alpha = self.alpha_fun(self.visits[self.state][self.action])
60             self.Q[self.state][self.action] += alpha*(
61                               reward
62                               + self.discount * max(self.Q[next_state].values())
63                               - self.Q[self.state][self.action])
64             self.display(2,self.state, self.action, reward, next_state,
65                         self.Q[self.state][self.action], sep='\t')
66             self.state = next_state
rlQExperienceReplay.py — (continued)
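The replay step itself is elided here. As a sketch (not the book's exact code), after the burn-in it redoes the standard Q-update on randomly drawn past experiences:

    # if self.experience_buffer.number_added > self.burn_in:
    #     for _ in range(self.num_updates_per_action):
    #         (s, a, r, ns) = self.experience_buffer.get()
    #         alpha = self.alpha_fun(self.visits[s][a])
    #         self.Q[s][a] += alpha*(r + self.discount*max(self.Q[ns].values())
    #                                - self.Q[s][a])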
   • Q[s][a] is a dictionary that, given state s and action a, returns the Q-value,
     the estimate of the future (discounted) value of being in state s and doing
     action a.
   • R[s][a] is a dictionary that, given state s and action a, returns the average
     reward received from doing a in state s.
   • T[s][a][s′] is a dictionary that, given states s and s′ and action a, returns
     the number of times a was done in state s with resulting state s′. Note
     that s′ is only a key if it has been the result of doing a in s; there are no 0
     counts recorded.
   • visits[s][a] is a dictionary that, given state s and action a, returns the
     number of times action a was carried out in state s. This is the C of
     Figure 13.6 of Poole and Mackworth [2023].
     Note that visits[s][a] = ∑_{s′} T[s][a][s′], but it is stored separately to keep
     the code more readable.
The main difference from Figure 13.6 of Poole and Mackworth [2023] is that the
code below does a fixed number of asynchronous value iteration updates per step.
                           rlModelLearner.py — Model-based Reinforcement Learner
11   import random
12   from rlProblem import RL_agent, Simulate, epsilon_greedy, ucb
13   from display import Displayable
14   from utilities import argmaxe, flip
15
16   class Model_based_reinforcement_learner(RL_agent):
17       """A Model-based reinforcement learner
18       """
19
20         def __init__(self, role, actions, discount,
21                         exploration_strategy=epsilon_greedy, es_kwargs={},
22                         Qinit=0,
23                       updates_per_step=10, method="MBR_learner"):
24             """role is the role of the agent (e.g., in a game)
25             actions is the list of actions the agent can do
26             discount is the discount factor
27             exploration_strategy is the exploration function; es_kwargs gives its keyword arguments
28             Qinit is the initial value of the Q's
29             updates_per_step is the number of AVI updates per action
30             method is the method name, used as the label for plotting
31             """
32             RL_agent.__init__(self, actions)
33             self.role = role
34             self.actions = actions
35             self.discount = discount
36             self.exploration_strategy = exploration_strategy
37             self.es_kwargs = es_kwargs
38             self.Qinit = Qinit
39             self.updates_per_step = updates_per_step
40             self.method = method
rlModelLearner.py — (continued)
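The model-update and planning code is elided above. As a sketch, one asynchronous value iteration update for a state-action pair, using the data structures described in the bullets (an illustration, not the book's exact code):

    def avi_update(Q, R, T, visits, discount, s, a):
        """One asynchronous VI update for (s,a): average reward plus the
        discounted expected value of the model-predicted next states."""
        Q[s][a] = R[s][a] + discount * sum(
            T[s][a][s1]/visits[s][a] * max(Q[s1].values())
            for s1 in T[s][a])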
     Exercise 13.3 If there was only one update per step, the algorithm can be made
     simpler and use less space. Explain how. Does it make it more efficient? Is it
     worthwhile having more than one update per step for the games implemented
     here?
     Exercise 13.4 It is possible to implement the model-based reinforcement learner
     by replacing Q, R, T, visits, res states with a single dictionary that, given a state and
     action returns a tuple corresponding to these data structures. Does this make the
     algorithm easier to understand? Does this make the algorithm more efficient?
Exercise 13.5 If the states and the actions were mapped into integers, the dic-
tionaries could perhaps be implemented more efficiently as arrays. How does the
code need to change? Implement this for the monster game. Is it more efficient?
     Exercise 13.6 In random_choice in the updates of select_action, all state-action
     pairs have the same chance of being chosen. Does selecting state-action pairs pro-
     portionally to the number of times visited work better than what is implemented?
     Provide evidence for your answer.
 66             return 1
 67         elif action == "up" and (x,y+1) in Monster_game_env.monster_locs:
 68             return 1
 69         elif action == "down" and (x,y-1) in Monster_game_env.monster_locs:
 70             return 1
 71         else:
 72             return 0
 73
 74   def wall_ahead(x,y,action):
 75       """returns 1 if there is a wall in the direction of action from (x,y).
 76       This is complicated by the internal walls.
 77       """
 78       if action == "right" and (x==Monster_game_env.xdim-1 or (x,y) in
              Monster_game_env.vwalls):
 79           return 1
 80       elif action == "left" and (x==0 or (x-1,y) in Monster_game_env.vwalls):
 81           return 1
 82       elif action == "up" and y==Monster_game_env.ydim-1:
 83           return 1
 84       elif action == "down" and y==0:
 85           return 1
 86       else:
 87           return 0
 88
 89   def towards_prize(x,y,action,p):
 90       """action goes in the direction of the prize from (x,y)"""
 91       if p is None:
 92           return 0
 93       elif p==(0,4): # take into account the wall near the top-left prize
 94           if action == "left" and (x>1 or x==1 and y<3):
 95               return 1
 96           elif action == "down" and (x>0 and y>2):
 97               return 1
 98           elif action == "up" and (x==0 or y<2):
 99               return 1
100           else:
101               return 0
102       else:
103           px,py = p
104           if p==(4,4) and x==0:
105               if (action=="right" and y<3) or (action=="down" and y>2) or
                      (action=="up" and y<2):
106                   return 1
107               else:
108                   return 0
109           if (action == "up" and y<py) or (action == "down" and py<y):
110               return 1
111           elif (action == "left" and px<x) or (action == "right" and x<px):
112               return 1
113           else:
114              return 0
115
116   def towards_repair(x,y,action):
117       """returns 1 if action is towards the repair station.
118       """
119       if action == "up" and (x>0 and y<4 or x==0 and y<2):
120           return 1
121       elif action == "left" and x>1:
122           return 1
123       elif action == "right" and x==0 and y<3:
124           return 1
125       elif action == "down" and x==0 and y>2:
126           return 1
127       else:
128           return 0
        The following uses a simpler set of features. In particular, it only considers
      whether the action will most likely result in a monster position or a wall, and
      whether the action moves towards the current prize.
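A sketch of such a feature function, built from the helpers above (the name simpler_features and the exact feature list are illustrative assumptions, not the book's code):

    def simpler_features(state, action):
        """[bias, monster ahead?, wall ahead?, towards prize?] (illustrative)"""
        (x, y, damaged, prize) = state
        return [1, monster_ahead(x, y, action), wall_ahead(x, y, action),
                towards_prize(x, y, action, prize)]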
                                 rlMonsterGameFeatures.py — (continued)
20                        exploration_strategy=epsilon_greedy, es_kwargs={},
21                        step_size=0.01, winit=0, method="SARSA_LFA"):
22            """role is the role of the agent (e.g., in a game)
23            actions is the set of actions the agent can do
24            discount is the discount factor
25            get_features is a function get_features(state,action) -> list of
                  feature values
26            exploration_strategy is the exploration function, default
                  "epsilon_greedy"
27            es_kwargs is extra keyword arguments of the exploration_strategy
28            step_size is gradient descent step size
29            winit is the initial value of the weights
30            method gives the method used to implement the role (for plotting)
31            """
32            RL_agent.__init__(self, actions)
33            self.role = role
34            self.discount = discount
35            self.exploration_strategy = exploration_strategy
36            self.es_kwargs = es_kwargs
37            self.get_features = get_features
38            self.step_size = step_size
39            self.winit = winit
40            self.method = method
     The initial action is a random action. It remembers the state, and initializes the
     data structures.
                                    rlFeatures.py — (continued)
54
55         def Q(self, state,action):
56             """returns Q-value of the state and action for current weights
57             """
58             return dot_product(self.weights, self.get_features(state,action))
59
60         def select_action(self, reward, next_state):
61             """given reward and next_state, select and return the next action"""
62             feature_values = self.get_features(self.state,self.action)
63          oldQ = self.Q(self.state,self.action)
64          next_action = self.exploration_strategy({a:self.Q(next_state,a)
65                                                   for a in self.actions}, {})
66          nextQ = self.Q(next_state,next_action)
67          delta = reward + self.discount * nextQ - oldQ
68          for i in range(len(self.weights)):
69              self.weights[i] += self.step_size * delta * feature_values[i]
70          self.display(2,self.state, self.action, reward, next_state,
71                      self.Q(self.state,self.action), delta, sep='\t')
72          self.state = next_state
73          self.action = next_action
74          return self.action
75
76      def show_actions(self,state=None):
77          """prints the value for each action in a state.
78          This may be useful for debugging.
79          """
80          if state is None:
81              state = self.state
82          for next_act in self.actions:
83              print(next_act,dot_product(self.weights,
                    self.get_features(state,next_act)))
84
85   def dot_product(l1,l2):
86       return sum(e1*e2 for (e1,e2) in zip(l1,l2))
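In equation form, the update in select_action is SARSA with linear function approximation (a restatement of the code above, with η the step size and F_i the feature values):

    Q_w(s,a) = ∑_i w_i · F_i(s,a)
    δ = r + γ Q_w(s′,a′) − Q_w(s,a)
    w_i ← w_i + η · δ · F_i(s,a)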
Test code:
rlFeatures.py — (continued)
Exercise 13.7 How does the step size affect performance? Try different step sizes
(e.g., 0.1, 0.001, and other sizes in between). Explain the behavior you observe.
Which step size works best for this example? Explain what evidence you are
basing your conclusion on.
     Exercise 13.8 Does having extra features always help? Does it sometime help?
     Does whether it helps depend on the step size? Give evidence for your claims.
     Exercise 13.9 For each of the following first predict, then plot, then explain the
     behavior you observed:
  (a) SARSA LFA, model-based learning (with 1 update per step) and Q-learning
      for 10,000 steps 20% exploring followed by 10,000 steps 100% exploiting
  (b) SARSA LFA, model-based learning and Q-learning for
        i) 100,000 steps 20% exploring followed by 100,000 steps 100% exploiting
       ii) 10,000 steps 20% exploring followed by 190,000 steps 100% exploiting
  (c) Suppose your goal was to have the best accumulated reward after 200,000
      steps. You are allowed to change the exploration rate at a fixed number of
      steps. For each of the methods, which is the best position to start exploiting
      more? Which method is better? What if you wanted to have the best reward
      after 10,000 or 1,000 steps?
Based on this evidence, explain when it is preferable to use SARSA LFA, the
model-based learner, or Q-learning.
   Important: you need to run each algorithm more than once. Your explanation
should include the variability as well as the typical behavior.
Exercise 13.10 In the call to self.exploration_strategy, what should the counts
be? (The code above will fail for ucb, for example.) Think about the case where
there are too many states. Suppose we are just learning for a neighborhood of the
current state (e.g., a fixed number of steps away from the current state); how
could the algorithm be modified to make sure it has at least explored the close
neighborhood of the current state?
Multiagent Systems
     14.1      Minimax
Here we consider two-player zero-sum games, in which one player wins
exactly what the other player loses. This can be modeled with a single utility
that one agent (the maximizing agent) is trying to maximize and the other
agent (the minimizing agent) is trying to minimize.
29
30         def children(self):
31             """returns the list of all children."""
32             return self.allchildren
33
34         def evaluate(self):
35             """returns the evaluation for this node if it is a leaf"""
36             return self.value
The following gives the tree from Figure 11.5 of the book. Note how 888 is used
as a value here, but never appears in the trace: α-β pruning never evaluates those leaves.
masProblem.py — (continued)
38   fig10_5 = Node("a",True,None, [
39              Node("b",False,None, [
40                  Node("d",True,None, [
41                      Node("h",False,None, [
42                          Node("h1",True,7,None),
43                          Node("h2",True,9,None)]),
44                      Node("i",False,None, [
45                          Node("i1",True,6,None),
46                          Node("i2",True,888,None)])]),
47                  Node("e",True,None, [
48                      Node("j",False,None, [
49                          Node("j1",True,11,None),
50                          Node("j2",True,12,None)]),
51                      Node("k",False,None, [
52                          Node("k1",True,888,None),
53                          Node("k2",True,888,None)])])]),
54              Node("c",False,None, [
55                  Node("f",True,None, [
56                      Node("l",False,None, [
57                          Node("l1",True,5,None),
58                          Node("l2",True,888,None)]),
59                      Node("m",False,None, [
60                          Node("m1",True,4,None),
61                          Node("m2",True,888,None)])]),
62                  Node("g",True,None, [
63                      Node("n",False,None, [
64                          Node("n1",True,888,None),
65                          Node("n2",True,888,None)]),
66                      Node("o",False,None, [
67                          Node("o1",True,888,None),
68                          Node("o2",True,888,None)])])])])
The magic-sum game is played on the following magic square, in which every row, column, and diagonal sums to 15:
                                           6    1    8
                                           7    5    3
                                           2    9    4
 70
 71   class Magic_sum(Node):
 72       def __init__(self, xmove=True, last_move=None,
 73                   available=[1,2,3,4,5,6,7,8,9], x=[], o=[]):
 74           """This is a node in the search for the magic-sum game.
 75           xmove is True if the next move belongs to X.
 76           last_move is the number selected in the last move
 77           available is the list of numbers that are available to be chosen
 78           x is the list of numbers already chosen by x
 79           o is the list of numbers already chosen by o
 80           """
 81           self.isMax = self.xmove = xmove
 82           self.last_move = last_move
 83           self.available = available
 84           self.x = x
 85           self.o = o
 86           self.allchildren = None #computed on demand
 87           lm = str(last_move)
 88           self.name = "start" if not last_move else "o="+lm if xmove else
                  "x="+lm
 89
 90      def children(self):
 91          if self.allchildren is None:
 92              if self.xmove:
 93                  self.allchildren = [
 94                      Magic_sum(xmove = not self.xmove,
 95                               last_move = sel,
 96                               available = [e for e in self.available
                                         if e != sel],
 97                               x = self.x+[sel],
 98                               o = self.o)
 99                             for sel in self.available]
100              else:
101                  self.allchildren = [
102                      Magic_sum(xmove = not self.xmove,
103                               last_move = sel,
104                               available = [e for e in self.available
                                         if e != sel],
105                                x = self.x,
106                                o = self.o+[sel])
107                              for sel in self.available]
108            return self.allchildren
109
110         def isLeaf(self):
111             """A leaf has no numbers available or is a win for one of the
                    players.
112             We only need to check for a win for o if it is currently x's turn,
113             and only check for a win for x if it is o's turn (otherwise it would
114             have been a win earlier).
115             """
116             return (self.available == [] or
117                    (sum_to_15(self.last_move,self.o)
118                     if self.xmove
119                     else sum_to_15(self.last_move,self.x)))
120
121         def evaluate(self):
122             if self.xmove and sum_to_15(self.last_move,self.o):
123                 return -1
124             elif not self.xmove and sum_to_15(self.last_move,self.x):
125                 return 1
126             else:
127                 return 0
128
129   def sum_to_15(last,selected):
130       """is true if last, together with two other elements of selected,
             sums to 15.
131       """
132       return any(last+a+b == 15
133                 for a in selected if a != last
134                 for b in selected if b != last and b != a)
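For example, sum_to_15(8, [8,6,1]) is True (8+6+1 = 15), whereas sum_to_15(8, [8,6,2]) is False.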
24          return max_score,max_path
25      else:
26          min_score = float("inf")
27          min_path = None
28          for C in node.children():
29              score,path = minimax(C,depth+1)
30              if score < min_score:
31                  min_score = score
32                  min_path = C.name,path
33          return min_score,min_path
        The following is a depth-first minimax with α-β pruning. It returns the
     value for a node as well as a best path for the agents.
                                masMiniMax.py — (continued)
35   def minimax_alpha_beta(node,alpha,beta,depth=0):
36       """node is a Node, alpha and beta are cutoffs, depth is the depth
37       returns value, path
38       where path is a sequence of nodes that results in the value
39       """
40       node.display(2," "*depth,"minimax_alpha_beta(",node.name,", ",alpha, ",
             ", beta,")")
41       best=None      # only used if it will be pruned
42       if node.isLeaf():
43           node.display(2," "*depth,"returning leaf value",node.evaluate())
44           return node.evaluate(),None
45       elif node.isMax:
46           for C in node.children():
47               score,path = minimax_alpha_beta(C,alpha,beta,depth+1)
48               if score >= beta: # beta pruning
49                   node.display(2," "*depth,"pruned due to
                         beta=",beta,"C=",C.name)
50                   return score, None
51               if score > alpha:
52                   alpha = score
53                   best = C.name, path
54           node.display(2," "*depth,"returning max alpha",alpha,"best",best)
55           return alpha,best
56       else:
57           for C in node.children():
58               score,path = minimax_alpha_beta(C,alpha,beta,depth+1)
59               if score <= alpha: # alpha pruning
60                   node.display(2," "*depth,"pruned due to
                         alpha=",alpha,"C=",C.name)
61                   return score, None
62               if score < beta:
63                   beta=score
64                   best = C.name,path
65           node.display(2," "*depth,"returning min beta",beta,"best=",best)
66           return beta,best
        Testing:
masMiniMax.py — (continued)
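For example (a check one can run on the tree above, not the book's test output):

    minimax_alpha_beta(fig10_5, float('-inf'), float('inf'))

returns the value 7 with the path ('b', ('d', ('h', ('h1', None)))); none of the 888 leaves is evaluated, which is why 888 never appears in the trace.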
         The agent can be tested on the reinforcement learning benchmarks from the
      previous chapter:
masLearn.py — (continued)
         The simulation for a game passes the joint action from all agents to the
      environment, which returns a tuple of rewards – one for each agent – and the
      next state.
masLearn.py — (continued)
114          self.num_steps = 0
115
116      def go(self, steps):
117          for i in range(steps):
118              self.num_steps += 1
119              (self.rewards, state) = self.game.play(self.actions)
120              self.display(3, f"In go rewards={self.rewards}, state={state}")
121              self.reward_history.append(self.rewards)
122              self.state_history.append(state)
123             self.actions = tuple(agent.select_action(reward, state)
124                                  for (agent,reward) in zip(self.agents,self.rewards))
125              self.action_history.append(self.actions)
126             for j in range(self.game.num_agents):
127                 self.action_dists[j][self.actions[j]] += 1
128             self.dist_history.append([{a:n for (a,n) in elt.items()}
                                          for elt in self.action_dists]) # copy each counts dict
129          #print("Scores:", ' '.join(f"{ag.role} average reward={ag.total_score/self.num_steps}" for ag in self.agents))
130          print("Distributions:", ' '.join(str({a:self.dist_history[-1][i][a]/sum(self.dist_history[-1][i].values()) for a in self.game.actions[i]})
131                                   for i in range(self.game.num_agents)))
132          #return self.reward_history, self.action_history
133
134      def action_dist(self,which_actions=[1,1]):
135          """which_actions is [a0,a1];
136          returns the empirical distribution of the actions for the agents,
137             where ai specifies the index of the action for agent i
138          """
139          return [sum(1 for a in self.action_history
140                      if a[i]==self.game.actions[i][which_actions[i]])/len(self.action_history)
141                  for i in range(2)]
         The plotting shows how the empirical distributions of the first two agents
      change as the learning continues.
masLearn.py — (continued)
151            plt.plot([self.dist_history[i][0][x_act]/sum(self.dist_history[i][0].values())
                            for i in range(len(self.dist_history))],
152                     [self.dist_history[i][1][y_act]/sum(self.dist_history[i][1].values())
                            for i in range(len(self.dist_history))])
153            #plt.legend()
154            plt.savefig('soccerplot.pdf')
155            plt.show()
masLearn.py — (continued)
157
158   class ShoppingGame(Displayable):
159       def __init__(self):
160           self.num_agents = 2
161           self.states = ['s']
162           self.initial_state = 's'
163           self.actions = [['shopping', 'football']]*2
164           self.players = ['football-preferrer goes to', 'shopping-preferrer goes to']
165
166         def play(self, actions):
167             """Given (action1,action2), returns ((reward1,reward2), resulting_state).
168             """
169             return ({('football', 'football'): (2, 1),
170                     ('football', 'shopping'): (0, 0),
171                     ('shopping', 'football'): (0, 0),
172                     ('shopping', 'shopping'): (1, 2)
173                         }[actions], 's')
174
175
176   class SoccerGame(Displayable):
177       def __init__(self):
178           self.num_agents = 2
179           self.states = ['s']
180           self.initial_state = 's'
182           self.actions = [['right', 'left']]*2
183           self.players = ['goalkeeper', 'kicker']
184
185         def play(self, actions):
186             """Given (action1,action2), returns ((reward1,reward2), resulting_state).
187             The resulting state is always 's'.
188             """
189             return ({('left', 'left'): (0.6, 0.4),
190                     ('left', 'right'): (0.3, 0.7),
191                     ('right', 'left'): (0.2, 0.8),
192                     ('right', 'right'): (0.9,0.1)
193                   }[actions], 's')
194
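         For reference, the soccer game has a unique equilibrium, in which both
      players randomize; the learners' empirical distributions should approach it.
      The indifference calculation below is an added sketch, not part of masLearn.py:

from fractions import Fraction

# The goalkeeper dives 'left' with the probability p that makes the kicker
# indifferent between kicking left and right (kicker payoffs from the table):
#     p*0.4 + (1-p)*0.8  ==  p*0.7 + (1-p)*0.1    =>  p = 0.7
# The kicker kicks 'left' with the probability q that makes the goalkeeper
# indifferent between diving left and right:
#     q*0.6 + (1-q)*0.3  ==  q*0.2 + (1-q)*0.9    =>  q = 0.6
p, q = Fraction(7, 10), Fraction(6, 10)
gk_left  = q*Fraction(6,10) + (1-q)*Fraction(3,10)  # goalkeeper's value for 'left'
gk_right = q*Fraction(2,10) + (1-q)*Fraction(9,10)  # goalkeeper's value for 'right'
assert gk_left == gk_right == Fraction(12, 25)      # both 0.48: indifferent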
masLearn.py — (continued)
244
245
246   # sim.plot_dynamics()
247
248   # empirical proportion that agents did their action at index 1:
249   # sim.action_dist([1,1])
250
251   # (unnormalized) empirical distribution for agent 0
252   # sim.agents[0].dist
      Exercise 14.1 Try the game show game (prisoner's dilemma) with two StochasticPIAgent
      agents and alpha_fun=lambda k:0.1. Try also alpha_fun=lambda k:0.01. Why do
      the two behave qualitatively differently? Is one better than the other?
Relational Learning
27                     ):
28             self.rating_set = rating_set
29             self.ratings = rating_subset or rating_set.training_ratings # whichever is not empty
30             if test_subset is None:
31                 self.test_ratings = self.rating_set.test_ratings
32             else:
33                 self.test_ratings = test_subset
34             self.step_size = step_size
35             self.reglz = reglz
36             self.num_properties = num_properties
37             self.num_ratings = len(self.ratings)
38             self.ave_rating = (sum(r for (u,i,r,t) in self.ratings)
39                              /self.num_ratings)
40             self.users = {u for (u,i,r,t) in self.ratings}
41             self.items = {i for (u,i,r,t) in self.ratings}
42             self.user_bias = {u:0 for u in self.users}
43             self.item_bias = {i:0 for i in self.items}
44             self.user_prop = {u:[random.uniform(-property_range,property_range)
45                                 for p in range(num_properties)]
46                                for u in self.users}
47             self.item_prop = {i:[random.uniform(-property_range,property_range)
48                                 for p in range(num_properties)]
49                                for i in self.items}
50             self.zeros = [0 for p in range(num_properties)]
51             self.iter=0
52
53         def stats(self):
54             self.display(1,"ave sumsq error of mean for training=",
55                      sum((self.ave_rating-rating)**2 for (user,item,rating,timestamp)
56                          in self.ratings)/len(self.ratings))
57             self.display(1,"ave sumsq error of mean for test=",
58                      sum((self.ave_rating-rating)**2 for (user,item,rating,timestamp)
59                          in self.test_ratings)/len(self.test_ratings))
60             self.display(1,"error on training set",
61                         self.evaluate(self.ratings))
62             self.display(1,"error on test set",
63                         self.evaluate(self.test_ratings))
relnCollFilt.py — (continued)
65         def prediction(self,user,item):
66             """Returns prediction for this user on this item.
67             The use of .get() is to handle users or items not in the training set.
68             """
69             return (self.ave_rating
70                    + self.user_bias.get(user,0)   # 0 for users not in the training set
71                    + self.item_bias.get(item,0)   # 0 for items not in the training set
72                    + sum([self.user_prop.get(user,self.zeros)[p]*self.item_prop.get(item,self.zeros)[p]
73                           for p in range(self.num_properties)]))
 74
 75      def learn(self, num_iter = 50):
 76          """ do num_iter iterations of gradient descent."""
 77          for i in range(num_iter):
 78              self.iter += 1
 79              abs_error=0
 80              sumsq_error=0
 81              for (user,item,rating,timestamp) in random.sample(self.ratings,len(self.ratings)):
 82                  error = self.prediction(user,item) - rating
 83                  abs_error += abs(error)
 84                  sumsq_error += error * error
 85                  self.user_bias[user] -= self.step_size*error
 86                  self.item_bias[item] -= self.step_size*error
 87                  for p in range(self.num_properties):
 88                      self.user_prop[user][p] -= self.step_size*error*self.item_prop[item][p]
 89                      self.item_prop[item][p] -= self.step_size*error*self.user_prop[user][p]
 90              for user in self.users:
 91                  self.user_bias[user] -= self.step_size*self.reglz*self.user_bias[user]
 92                  for p in range(self.num_properties):
 93                      self.user_prop[user][p] -= self.step_size*self.reglz*self.user_prop[user][p]
 94              for item in self.items:
 95                  self.item_bias[item] -= self.step_size*self.reglz*self.item_bias[item]
 96                  for p in range(self.num_properties):
 97                      self.item_prop[item][p] -= self.step_size*self.reglz*self.item_prop[item][p]
 98              self.display(1,"Iteration",self.iter,
 99                    "(Ave Abs,AveSumSq) training =",self.evaluate(self.ratings),
100                    "test =",self.evaluate(self.test_ratings))
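          In symbols, the model implemented by prediction and learn is

             prediction(u,i) = ave_rating + user_bias[u] + item_bias[i]
                               + sum over p of user_prop[u][p] * item_prop[i][p]

      and each update in learn is a gradient descent step on half the squared
      error, with error = prediction(u,i) - rating: the derivative with respect to
      user_bias[u] is error (giving the -= step_size*error updates), and the
      derivative with respect to user_prop[u][p] is error*item_prop[i][p] (giving
      the property updates). The loops over all users and items implement the
      gradient of the regularization term (reglz/2 times the sum of squares of
      the parameters).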
relnCollFilt.py — (continued)
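          The evaluate method used by stats and learn is not reproduced here; from
      those calls, it returns the pair (average absolute error, average
      sum-of-squares error) over a set of ratings. A minimal sketch consistent
      with that usage:

def evaluate(self, ratings):
    """returns (average absolute error, average sum-of-squares error) on ratings
    (a sketch reconstructed from the calls in stats and learn)"""
    abs_error = 0
    sumsq_error = 0
    for (user, item, rating, timestamp) in ratings:
        error = self.prediction(user, item) - rating
        abs_error += abs(error)
        sumsq_error += error*error
    n = len(ratings)
    return (abs_error/n, sumsq_error/n)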
      Exercise 15.1 The above code updates the parameters after each example, but
      only regularizes after the whole batch. Change the program so that it implements
      stochastic gradient descent with a given batch size, and only updates the parame-
      ters after a batch.
      Exercise 15.2 In the previous question, can the regularization avoid iterating
      through the parameters for all users and items after a batch? Consider items that
      are in many batches versus those in a few or even no batches. (Warning: This is
      challenging to get right.)
      15.1.1 Plotting
                                    relnCollFilt.py — (continued)
180            if local_file:
181                lines = open(file_name,'r')
182            else:
183                lines = (line.decode('utf-8') for line in urllib.request.urlopen(url))
184            all_ratings = (tuple(int(e) for e in line.strip().split('\t'))
185                            for line in lines)
186            self.training_ratings = []
187            self.training_stats = {1:0, 2:0, 3:0, 4:0 ,5:0}
188            self.test_ratings = []
189            self.test_stats = {1:0, 2:0, 3:0, 4:0 ,5:0}
190            for rate in all_ratings:
191                if rate[3] < date_split: # rate[3] is timestamp
192                    self.training_ratings.append(rate)
193                    self.training_stats[rate[2]] += 1
194                else:
195                    self.test_ratings.append(rate)
196                    self.test_stats[rate[2]] += 1
197            self.display(1,"...read:", len(self.training_ratings),"training ratings and",
198                    len(self.test_ratings),"test ratings")
199            tr_users = {user for (user,item,rating,timestamp) in self.training_ratings}
200            test_users = {user for (user,item,rating,timestamp) in self.test_ratings}
201            self.display(1,"users:",len(tr_users),"training,",len(test_users),"test,",
202                         len(tr_users & test_users),"in common")
203            tr_items = {item for (user,item,rating,timestamp) in self.training_ratings}
204            test_items = {item for (user,item,rating,timestamp) in self.test_ratings}
205            self.display(1,"items:",len(tr_items),"training,",len(test_items),"test,",
206                         len(tr_items & test_items),"in common")
207            self.display(1,"Rating statistics for training set: ",self.training_stats)
208            self.display(1,"Rating statistics for test set: ",self.test_stats)
          Sometimes it is useful to plot a property for all (user, item, rating)
      triples, but there are too many such triples in the data set to plot them
      all. The method create_top_subset creates a much smaller dataset for which
      this makes sense: it picks the most-rated items, and then the users who have
      the most ratings on these items. It is designed for depicting the meaning of
      properties, and may not be useful for other purposes.
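          That method is not reproduced here; the following sketch is consistent
      with the description above (the signature, the defaults, and its placement
      as a method of the rating-set class are assumptions):

from collections import Counter

def create_top_subset(self, num_items=30, num_users=30):
    """returns the training ratings restricted to the num_items most-rated items
    and, among those, the num_users users with the most ratings on those items
    (a sketch: the signature and defaults are assumptions)"""
    item_counts = Counter(item for (user,item,rating,timestamp)
                          in self.training_ratings)
    top_items = {item for (item,count) in item_counts.most_common(num_items)}
    user_counts = Counter(user for (user,item,rating,timestamp)
                          in self.training_ratings if item in top_items)
    top_users = {user for (user,count) in user_counts.most_common(num_users)}
    return [(user,item,rating,timestamp)
            for (user,item,rating,timestamp) in self.training_ratings
            if item in top_items and user in top_users]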
                                   relnProbModels.py — (continued)
20   class ParVar(object):
21       """Parametrized random variable"""
22       def __init__(self, name, log_vars, domain, position=None):
23           self.name = name # string
24           self.log_vars = log_vars
25           self.domain = domain # list of values
26           self.position = position if position else (random.random(), random.random())
27           self.size = len(domain)
      The class RBN represents relational belief networks.
                                  relnProbModels.py — (continued)
29   class RBN(Displayable):
30       def __init__(self, title, parvars, parfactors):
31           self.title = title
32           self.parvars = parvars
33           self.parfactors = parfactors
34           self.log_vars = {V for PV in parvars for V in PV.log_vars}
35
36         def ground(self, populations):
37             """Ground the belief network with the populations of the logical variables.
38             populations is a dictionary that maps each logical variable to the list of individuals.
39             Returns a belief network representation of the grounding.
40             """
41             assert all(lv in populations for lv in self.log_vars)
42             self.cps = []    # conditional probabilities in the grounding
43             self.var_dict = {} # ground variables created
44             for pp in self.parfactors:
45                 self.ground_parfactor(pp, list(self.log_vars), populations, {})
46             return BeliefNetwork(self.title+"_grounded", self.var_dict.values(), self.cps)
47
48         def ground_parfactor(self, parfactor, lvs, populations, context):
49             """
50             parfactor is the parfactor to get instances of
51             lvs is a list of the logical variables in parfactor not assigned in context
relnProbModels.py — (continued)
83   class GrVar(Variable):
84       """Grounded Variable"""
85       def __init__(self,parvar,args):
86           (x,y) = parvar.position
87           pos = (x + random.uniform(-0.2,0.2), y + random.uniform(-0.2,0.2))
88           Variable.__init__(self, parvar.name+"("+",".join(args)+")", parvar.domain, pos)
89           self.parvar = parvar
90           self.args = tuple(args)
91           self.hash_value = None
92
 93         def __hash__(self):
 94             if self.hash_value is None:
 95                 self.hash_value = hash((self.parvar, self.args))
 96             return self.hash_value
 97
 98         def __eq__(self, other):
 99             return isinstance(other,GrVar) and self.parvar == other.parvar and self.args == other.args
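          Because __hash__ and __eq__ are structural (with the hash cached on
      first use), two independently constructed GrVar objects that denote the same
      ground variable behave as the same dictionary key, which the grounding's
      var_dict relies on. For example (the parametrized random variable here is
      made up):

grade = ParVar("Grade", ["St", "Co"], ["a", "b", "c"])
g1 = GrVar(grade, ["sam", "cs101"])
g2 = GrVar(grade, ["sam", "cs101"])
assert g1 == g2 and hash(g1) == hash(g2)   # both denote Grade(sam,cs101)
assert g1 is not g2                        # yet they are distinct objects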
      Exercise 15.3 The grounding above creates a random variable for each possible
      combination of individuals in the populations. Change it so that it only
      creates as many random variables as needed to answer a query. For example,
      for the observations and queries above, only the variables in Figure 17.5
      need to be created.
Exercise 15.4 Displaying the ground network, e.g., using grades_gr.show(),
creates a messy diagram. Make it so that the user can provide offsets for each
individual, and use the position of the prv plus the offsets of all the
individuals involved. Use this to create a 2D grid of grades in the example
above.
Version History
 • 2023-10-6 Version 0.9.8 GUIs for search, Bayesian learning, causality, and
   many smaller changes.
 • 2022-08-13 Version 0.9.5 major revisions including extra code for causality
   and deep learning
 • 2021-07-08 Version 0.9.1 updated the CSP code to have the same repre-
   sentation of variables as used by the probability code
 • 2020-10-20 Version 0.8.4 planning simplified, and gives error if goal not
   part of state (by design). Fixed arc costs.
Index
sampling, 225
    importance sampling, 229
    belief networks, 227
    likelihood weighting, 228
    particle filtering, 229
    rejection, 227
scope, 68
search, 39
    A∗, 51
    branch-and-bound, 62
    multiple path pruning, 60
search with any conflict, 95
search with var pq, 97
show, 70, 207
sigmoid, 174
softmax, 175
stochastic local search, 93
    any-conflict, 95
    two-stage choice, 96
stochastic simulation, 225
uncertainty, 199
unit test, 21, 59, 80, 111, 112, 114
unrolling
     DBN, 248
updatable priority queue, 99
utility, 271
utility table, 271