Current State of the Course!!!
We’re done with Part I: Search and Planning!
Part II: Probabilistic Reasoning
Diagnosis
Speech recognition
Tracking objects
Robot mapping
Genetics
Error correcting codes
… lots more!
Part III: Machine Learning
Part IV: Advanced Applications of AI
CSE 401: Artificial Intelligence
Probability
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Today
Probability
Random Variables
Joint and Marginal Distributions
Conditional Distribution
Product Rule, Chain Rule, Bayes’ Rule
Inference
Independence
You’ll need all this stuff A LOT for the
next few weeks, so make sure you go
over it now!
Random Variables
A random variable is some aspect of the world about
which we (may) have uncertainty
R = Is it raining?
T = Is it hot or cold?
D = How long will it take to drive to work?
L = Where is the ghost?
We denote random variables with capital letters
Like variables in a CSP, random variables have domains
R in {true, false} (often written as {+r, -r})
T in {hot, cold}
D in [0, ∞)
L in possible locations, maybe {(0,0), (0,1), …}
Probability Distributions
Associate a probability with each value
Temperature:            Weather:
  T     P                 W       P
  hot   0.5               sun     0.6
  cold  0.5               rain    0.1
                          fog     0.3
                          meteor  0.0
Probability Distributions
Unobserved random variables have distributions

  T     P                 W       P
  hot   0.5               sun     0.6
  cold  0.5               rain    0.1
                          fog     0.3
                          meteor  0.0

A distribution is a TABLE of probabilities of values
A probability (lower case value) is a single number
Shorthand notation: P(hot) = P(T = hot), P(rain) = P(W = rain), etc. (OK if all domain entries are unique)
Must have: P(x) ≥ 0 for every x, and Σx P(x) = 1
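As a minimal sketch (mine, not from the slides; the names P_W and is_valid_distribution are illustrative), a distribution can be stored as a plain table and checked against both requirements:

```python
# A distribution as a plain table: each value of W gets a probability.
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3, "meteor": 0.0}

def is_valid_distribution(dist, tol=1e-9):
    """Check both requirements: P(x) >= 0 for all x, and sum over x of P(x) = 1."""
    return all(p >= 0 for p in dist.values()) and abs(sum(dist.values()) - 1.0) < tol

assert is_valid_distribution(P_W)
```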
Joint Distributions
A joint distribution over a set of random variables X1, X2, … Xn
specifies a probability for each assignment (or outcome):
  P(X1 = x1, X2 = x2, … Xn = xn), abbreviated P(x1, x2, … xn)

Must obey:
  P(x1, x2, … xn) ≥ 0
  Σ over all assignments (x1, x2, … xn) of P(x1, x2, … xn) = 1

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Size of distribution if n variables with domain sizes d? d^n
For all but the smallest distributions, impractical to write out!
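To make the size blow-up concrete, here is an illustrative sketch (not from the slides; P_TW is a made-up name) storing the joint as a table keyed by outcome tuples:

```python
# Joint distribution P(T, W) from the table above, keyed by (t, w) outcomes.
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

# With n variables of domain size d, the table needs d**n entries:
n, d = 2, 2
assert len(P_TW) == d ** n  # 4 entries here; 2**30 entries for 30 binary variables
```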
Probabilistic Models
A probabilistic model is a joint distribution over a set of random variables

Probabilistic models:
  (Random) variables with domains
  Assignments are called outcomes
  Joint distributions: say whether assignments (outcomes) are likely
  Normalized: sum to 1.0
  Ideally: only certain variables directly interact

Distribution over T, W:
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Constraint satisfaction problems:
  Variables with domains
  Constraints: state whether assignments are possible
  Ideally: only certain variables directly interact

Constraint over T, W:
  T     W     P
  hot   sun   T
  hot   rain  F
  cold  sun   F
  cold  rain  T
Events
An event is a set E of outcomes:
  P(E) = Σ over (x1 … xn) in E of P(x1 … xn)

From a joint distribution, we can calculate the probability of any event
  Probability that it’s hot AND sunny?
  Probability that it’s hot?
  Probability that it’s hot OR sunny?

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Typically, the events we care about are partial assignments, like P(T = hot)
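A small sketch of event probability as summation (my code, not the course's; prob_event is an illustrative helper), run on the table above:

```python
# P(E) = sum of P(outcome) over outcomes in the event E.
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def prob_event(joint, predicate):
    """Sum the joint over all outcomes for which the predicate holds."""
    return sum(p for outcome, p in joint.items() if predicate(outcome))

print(prob_event(P_TW, lambda o: o == ("hot", "sun")))             # hot AND sunny: 0.4
print(prob_event(P_TW, lambda o: o[0] == "hot"))                   # hot: ≈0.5
print(prob_event(P_TW, lambda o: o[0] == "hot" or o[1] == "sun"))  # hot OR sunny: ≈0.7 (float rounding)
```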
Quiz: Events
P(+x, +y)?
P(+x)?
P(-y OR +x)?

  X   Y   P
  +x  +y  0.2
  +x  -y  0.3
  -x  +y  0.4
  -x  -y  0.1
Marginal Distributions
Marginal distributions are sub-tables which eliminate variables
Marginalization (summing out): combine collapsed rows by adding
  P(t) = Σw P(t, w)        P(w) = Σt P(t, w)

Joint P(T, W):           Marginals:
  T     W     P            T     P          W     P
  hot   sun   0.4          hot   0.5        sun   0.6
  hot   rain  0.1          cold  0.5        rain  0.4
  cold  sun   0.2
  cold  rain  0.3
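Marginalization is only a few lines of code. A sketch over the same table (the helper marginal is mine, not from the course):

```python
from collections import defaultdict

P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def marginal(joint, keep_index):
    """Sum out every variable except the one at keep_index: P(t) = sum_w P(t, w)."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[keep_index]] += p
    return dict(out)

print(marginal(P_TW, 0))  # ≈ {'hot': 0.5, 'cold': 0.5}  (up to float rounding)
print(marginal(P_TW, 1))  # ≈ {'sun': 0.6, 'rain': 0.4}
```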
Quiz: Marginal Distributions
P(X)?  P(Y)?

  X   Y   P
  +x  +y  0.2          X   P          Y   P
  +x  -y  0.3          +x  ?          +y  ?
  -x  +y  0.4          -x  ?          -y  ?
  -x  -y  0.1
Conditional Probabilities
A simple relation between joint and conditional probabilities
In fact, this is taken as the definition of a conditional probability:

  P(a | b) = P(a, b) / P(b)

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Example: P(W = sun | T = cold) = P(W = sun, T = cold) / P(T = cold) = 0.2 / 0.5 = 0.4
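The definition translates directly into code. An illustrative sketch (mine) reproducing the worked example above:

```python
# P(a | b) = P(a, b) / P(b), computed straight from the joint table.
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

p_cold = sum(p for (t, w), p in P_TW.items() if t == "cold")  # P(T=cold) = 0.5
p_sun_and_cold = P_TW[("cold", "sun")]                        # P(W=sun, T=cold) = 0.2
print(p_sun_and_cold / p_cold)                                # P(W=sun | T=cold) = 0.4
```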
Quiz: Conditional Probabilities
P(+x | +y)?
P(-x | +y)?
P(-y | +x)?

  X   Y   P
  +x  +y  0.2
  +x  -y  0.3
  -x  +y  0.4
  -x  -y  0.1
Conditional Distributions
Conditional distributions are probability distributions over some variables
given fixed values of others

Joint distribution P(T, W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Conditional distributions:
P(W | T = hot):          P(W | T = cold):
  W     P                  W     P
  sun   0.8                sun   0.4
  rain  0.2                rain  0.6
Normalization Trick
To get P(W | T = cold): SELECT the joint probabilities matching the evidence,
then NORMALIZE the selection (make it sum to one)

  T     W     P               T     W     P               W     P
  hot   sun   0.4   SELECT    cold  sun   0.2  NORMALIZE  sun   0.4
  hot   rain  0.1     →       cold  rain  0.3     →       rain  0.6
  cold  sun   0.2
  cold  rain  0.3

Why does this work? Sum of selection is P(evidence)! (P(T = cold), here)
Quiz: Normalization Trick
P(X | Y = -y)?

SELECT the joint probabilities matching the evidence,
then NORMALIZE the selection (make it sum to one)

  X   Y   P
  +x  +y  0.2
  +x  -y  0.3
  -x  +y  0.4
  -x  -y  0.1
To Normalize
(Dictionary) To bring or restore to a normal condition

Here: all entries sum to ONE

Procedure:
  Step 1: Compute Z = sum over all entries
  Step 2: Divide every entry by Z

Example 1 (Z = 0.5):
  W     P       Normalize      W     P
  sun   0.2        →           sun   0.4
  rain  0.3                    rain  0.6

Example 2 (Z = 50):
  T     W     P       Normalize     T     W     P
  hot   sun   20         →          hot   sun   0.4
  hot   rain  5                     hot   rain  0.1
  cold  sun   10                    cold  sun   0.2
  cold  rain  15                    cold  rain  0.3
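The two-step procedure is easy to sketch in code (normalize is my illustrative helper, run on Example 2's counts):

```python
def normalize(table):
    """Step 1: compute Z = sum over all entries. Step 2: divide every entry by Z."""
    Z = sum(table.values())
    return {k: v / Z for k, v in table.items()}

# Example 2 from the slide: raw counts with Z = 50.
counts = {("hot", "sun"): 20, ("hot", "rain"): 5,
          ("cold", "sun"): 10, ("cold", "rain"): 15}
print(normalize(counts))
# {('hot','sun'): 0.4, ('hot','rain'): 0.1, ('cold','sun'): 0.2, ('cold','rain'): 0.3}
```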
Probabilistic Inference
Probabilistic inference: compute a desired
probability from other known probabilities (e.g.
conditional from joint)
We generally compute conditional probabilities
P(on time | no reported accidents) = 0.90
These represent the agent’s beliefs given the evidence
Probabilities change with new evidence:
P(on time | no accidents, 5 a.m.) = 0.95
P(on time | no accidents, 5 a.m., raining) = 0.80
Observing new evidence causes beliefs to be updated
Inference by Enumeration
General case:
  Evidence variables:  E1 … Ek = e1 … ek
  Query* variable:     Q            (* works fine with multiple query variables, too)
  Hidden variables:    H1 … Hr
  (All variables:      X1, X2, … Xn)

We want: P(Q | e1 … ek)

Step 1: Select the entries consistent with the evidence
Step 2: Sum out H to get joint of Query and evidence:
  P(Q, e1 … ek) = Σ over h1 … hr of P(Q, h1 … hr, e1 … ek)
Step 3: Normalize:
  Z = Σq P(q, e1 … ek),  then  P(Q | e1 … ek) = (1/Z) × P(Q, e1 … ek)
Inference by Enumeration
P(W)?
P(W | winter)?
P(W | winter, hot)?

  S       T     W     P
  summer  hot   sun   0.30
  summer  hot   rain  0.05
  summer  cold  sun   0.10
  summer  cold  rain  0.05
  winter  hot   sun   0.10
  winter  hot   rain  0.05
  winter  cold  sun   0.15
  winter  cold  rain  0.20
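A minimal sketch (mine, not the course's implementation; enumerate_inference and P_STW are illustrative names) of the three-step procedure run on this exact table:

```python
from collections import defaultdict

# Joint P(S, T, W) from the table above, keyed by (season, temp, weather).
P_STW = {
    ("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20,
}
VARS = ("S", "T", "W")  # variable order used in the outcome tuples

def enumerate_inference(joint, query, evidence):
    """Select entries consistent with evidence, sum out hidden vars, normalize."""
    q = VARS.index(query)
    summed = defaultdict(float)
    for outcome, p in joint.items():
        if all(outcome[VARS.index(var)] == val for var, val in evidence.items()):
            summed[outcome[q]] += p          # Steps 1-2: select, then sum out H
    Z = sum(summed.values())                 # Step 3: normalize by P(evidence)
    return {val: p / Z for val, p in summed.items()}

print(enumerate_inference(P_STW, "W", {}))                           # P(W) ≈ {sun 0.65, rain 0.35}
print(enumerate_inference(P_STW, "W", {"S": "winter"}))              # ≈ {sun 0.5, rain 0.5}
print(enumerate_inference(P_STW, "W", {"S": "winter", "T": "hot"}))  # ≈ {sun 2/3, rain 1/3}
```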
Inference by Enumeration
Obvious problems:
Worst-case time complexity O(d^n)
Space complexity O(d^n) to store the joint distribution
The Product Rule
Sometimes have conditional distributions but want the joint:
  P(x, y) = P(x | y) P(y)
The Product Rule
Example: P(D, W) = P(D | W) P(W)

P(W):            P(D | W):              P(D, W):
  W     P          D    W     P           D    W     P
  sun   0.8        wet  sun   0.1         wet  sun   0.08
  rain  0.2        dry  sun   0.9         dry  sun   0.72
                   wet  rain  0.7         wet  rain  0.14
                   dry  rain  0.3         dry  rain  0.06
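The same construction in code, as an illustrative sketch over the tables above (values match up to float rounding):

```python
# Build the joint P(D, W) = P(D | W) * P(W) from the two tables above.
P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
               ("wet", "rain"): 0.7, ("dry", "rain"): 0.3}

P_DW = {(d, w): P_D_given_W[(d, w)] * P_W[w] for (d, w) in P_D_given_W}
print(P_DW)  # ≈ {('wet','sun'): 0.08, ('dry','sun'): 0.72, ('wet','rain'): 0.14, ('dry','rain'): 0.06}
```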
The Chain Rule
More generally, can always write any joint distribution as an
incremental product of conditional distributions:

  P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x1, x2)
  P(x1, x2, … xn) = Π over i of P(xi | x1, … x(i-1))

Why is this always true?
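One way to see why (a sketch of mine, reusing the season/temperature/weather joint from the enumeration slide): each conditional is a ratio of marginals, so the product telescopes back to the joint.

```python
# Check P(s, t, w) = P(s) * P(t | s) * P(w | s, t) numerically on the joint above.
P_STW = {("summer", "hot", "sun"): 0.30, ("summer", "hot", "rain"): 0.05,
         ("summer", "cold", "sun"): 0.10, ("summer", "cold", "rain"): 0.05,
         ("winter", "hot", "sun"): 0.10, ("winter", "hot", "rain"): 0.05,
         ("winter", "cold", "sun"): 0.15, ("winter", "cold", "rain"): 0.20}

def P(**fixed):
    """Marginal probability that the named variables take the given values."""
    idx = {"s": 0, "t": 1, "w": 2}
    return sum(p for o, p in P_STW.items()
               if all(o[idx[v]] == val for v, val in fixed.items()))

s, t, w = "winter", "cold", "rain"
chain = P(s=s) * (P(s=s, t=t) / P(s=s)) * (P(s=s, t=t, w=w) / P(s=s, t=t))
assert abs(chain - P_STW[(s, t, w)]) < 1e-12  # conditionals telescope back to the joint
```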
Bayes’ Rule
Two ways to factor a joint distribution over two variables:

  P(x, y) = P(x | y) P(y) = P(y | x) P(x)

That’s my rule!

Dividing, we get:

  P(x | y) = P(y | x) P(x) / P(y)

Why is this at all helpful?
  Lets us build one conditional from its reverse
  Often one conditional is tricky but the other one is simple
  Foundation of many systems we’ll see later (e.g. ASR, MT)
  In the running for most important AI equation!
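A small illustrative sketch (mine) of "build one conditional from its reverse", reusing the rain/sun tables from the Product Rule slide to get P(W | wet):

```python
# Bayes' rule: P(w | wet) = P(wet | w) * P(w) / P(wet).
P_W = {"sun": 0.8, "rain": 0.2}
P_wet_given_W = {"sun": 0.1, "rain": 0.7}

P_wet = sum(P_wet_given_W[w] * P_W[w] for w in P_W)              # P(wet) ≈ 0.22
posterior = {w: P_wet_given_W[w] * P_W[w] / P_wet for w in P_W}
print(posterior)  # ≈ {'sun': 0.364, 'rain': 0.636}
```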
Inference with Bayes’ Rule
Example: Diagnostic probability from causal probability:

  P(cause | effect) = P(effect | cause) P(cause) / P(effect)

Example:
M: meningitis, S: stiff neck
Givens: P(+m), P(+s | +m), P(+s | -m)

  P(+m | +s) = P(+s | +m) P(+m) / P(+s)
             = P(+s | +m) P(+m) / [P(+s | +m) P(+m) + P(+s | -m) P(-m)]

Note: posterior probability of meningitis still very small
Note: you should still get stiff necks checked out! Why?
Quiz: Bayes’ Rule
Given:

P(W):            P(D | W):
  W     P          D    W     P
  sun   0.8        wet  sun   0.1
  rain  0.2        dry  sun   0.9
                   wet  rain  0.7
                   dry  rain  0.3

What is P(W | dry)?