An Introduction to Artificial Intelligence
Chapter 13 & 14.1-14.2: Uncertainty & Bayesian Networks
Ms. Nitu L. Pariyal
CSE Department
M.G. College of Engg.
Outline
Uncertainty
Probability
Syntax and Semantics
Inference
Independence and Bayes' Rule
Bayesian Network
Uncertainty
Agents may need to handle uncertainty, whether due to
partial observability, nondeterminism, or a combination of
the two. An agent may never know for certain what state it’s in
or where it will end up after a sequence of actions.
Let action At = leave for airport t minutes before flight
Will At get me there on time?
Problems:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic
Hence a purely logical approach either
1. risks falsehood: “A25 will get me there on time”,
or
2. leads to conclusions that are too weak for
decision making:
“A25 will get me there on time if there's no accident on
the bridge and it doesn't rain and my tires remain
intact etc etc.”
(A1440 might reasonably be said to get me there on time
but I'd have to stay overnight in the airport …)
Making decisions under uncertainty
Suppose I believe the following:
P(A25 gets me there on time | …) = 0.04
P(A90 gets me there on time | …) = 0.70
P(A120 gets me there on time | …) = 0.95
P(A1440 gets me there on time | …) = 0.9999
Which action to choose?
Depends on my preferences for missing flight vs. time spent
waiting, etc.
◦ Utility theory is used to represent and infer preferences
◦ Decision theory = probability theory + utility theory
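To make this concrete, here is a minimal Python sketch of the decision-theoretic choice. The success probabilities are the ones above; the utility of catching the flight, the utility of missing it, and the per-minute cost of waiting are invented purely for illustration.

```python
# Minimal sketch: pick the action with maximum expected utility.
# Success probabilities are from the slide; the utility numbers are hypothetical.
p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}
wait_minutes = {"A25": 25, "A90": 90, "A120": 120, "A1440": 1440}

U_CATCH, U_MISS, COST_PER_MINUTE = 1000.0, 0.0, 0.5   # assumed utilities

def expected_utility(action):
    p = p_on_time[action]
    return p * U_CATCH + (1 - p) * U_MISS - COST_PER_MINUTE * wait_minutes[action]

best = max(p_on_time, key=expected_utility)
print(best, expected_utility(best))   # A120 wins under these particular utilities
```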
Syntax
Basic element: random variable
Similar to propositional logic: possible worlds defined by
assignment of values to random variables.
Boolean random variables
e.g., Cavity (do I have a cavity?)
Discrete random variables
e.g., Weather is one of <sunny,rainy,cloudy,snow>
Domain values must be exhaustive and mutually exclusive
Elementary proposition constructed by assignment of a value to
a random variable:
e.g., Weather = sunny, Cavity = false
(abbreviated as ¬cavity)
Complex propositions formed from elementary propositions
and standard logical connectives
e.g., Weather = sunny ∧ Cavity = false
Syntax
Atomic event: A complete specification of the state of the
world about which the agent is uncertain
E.g., if the world consists of only two Boolean variables Cavity
and Toothache, then there are 4 distinct atomic events:
Cavity = false ∧ Toothache = false
Cavity = false ∧ Toothache = true
Cavity = true ∧ Toothache = false
Cavity = true ∧ Toothache = true
Atomic events are mutually exclusive (at most one can actually be
the case) and exhaustive (at least one must be the case)
Axioms of probability
For any propositions A, B
◦ 0 ≤ P(A) ≤ 1
◦ P(true) = 1 and P(false) = 0
◦ P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Prior probability
Prior or unconditional probabilities of propositions
e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72
correspond to belief prior to arrival of any (new) evidence
Probability distribution gives values for all possible assignments:
P(Weather) = <0.72,0.1,0.08,0.1> (normalized, i.e., sums to 1)
Joint probability distribution for a set of random variables gives
the probability of every atomic event on those random
variables
P(Weather,Cavity) = a 4 × 2 matrix of values:
Weather = sunny rainy cloudy snow
Cavity = true 0.144 0.02 0.016 0.02
Cavity = false 0.576 0.08 0.064 0.08
Every question about a domain can be answered by the joint distribution.
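As a small illustration, the P(Weather, Cavity) table above can be stored directly and queried by summation; the marginals P(Weather) and P(cavity) fall out of the joint.

```python
# The joint distribution P(Weather, Cavity), keyed by (weather, cavity).
joint = {
    ("sunny", True): 0.144,  ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Marginals are obtained by summing out the other variable.
p_weather = {w: sum(p for (w2, c), p in joint.items() if w2 == w)
             for w in ("sunny", "rainy", "cloudy", "snow")}
p_cavity = sum(p for (w, c), p in joint.items() if c)

print(p_weather)   # {'sunny': 0.72, 'rainy': 0.1, 'cloudy': 0.08, 'snow': 0.1} (up to rounding)
print(p_cavity)    # 0.2
```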
Conditional probability
Conditional or posterior probabilities
e.g., P(cavity | toothache) = 0.8
i.e., given that toothache is all I know
(Notation for conditional distributions:
P(Cavity | Toothache) = 2-element vector of 2-element vectors)
If we know more, e.g., cavity is also given, then we have
P(cavity | toothache, cavity) = 1
New evidence may be irrelevant, allowing simplification, e.g.,
P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8
This kind of inference, sanctioned by domain knowledge, is
critical
Conditional probability
Definition of conditional probability:
P(a | b) = P(a ∧ b) / P(b), if P(b) > 0
Product rule gives an alternative formulation:
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
A general version holds for whole distributions, e.g.,
P(Weather,Cavity) = P(Weather | Cavity) P(Cavity)
(View as a set of 4 × 2 equations, not matrix mult.)
Chain rule is derived by successive application of product
rule:
P(X1, …, Xn) = P(X1, …, Xn-1) P(Xn | X1, …, Xn-1)
= P(X1, …, Xn-2) P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
= …
= ∏_{i=1}^{n} P(Xi | X1, …, Xi-1)
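A quick numeric sanity check of the chain rule, using an arbitrary (randomly generated) joint over three Boolean variables; only the identity itself matters here.

```python
import itertools, random

# Build an arbitrary normalized joint P(X1, X2, X3) over Boolean variables.
random.seed(0)
raw = {w: random.random() for w in itertools.product([False, True], repeat=3)}
total = sum(raw.values())
joint = {w: v / total for w, v in raw.items()}

def marginal(prefix):
    """P(X1..Xk = prefix), obtained by summing the joint over the remaining variables."""
    return sum(p for w, p in joint.items() if w[:len(prefix)] == prefix)

# Chain rule: P(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x1, x2)
for world, p in joint.items():
    product = 1.0
    for i in range(3):
        product *= marginal(world[:i + 1]) / marginal(world[:i])
    assert abs(product - p) < 1e-12
print("chain rule holds on all 8 worlds")
```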
Inference by enumeration
Start with the joint probability distribution:
For any proposition φ, sum the atomic events where it is
true: P(φ) = Σω:ω╞φ P(ω)
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008
+ 0.016 + 0.064 = 0.28
Inference by enumeration
Start with the joint probability distribution:
Can also compute conditional probabilities:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
= 0.4
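A minimal sketch of inference by enumeration in Python. The six probabilities quoted above are from the slides; the remaining two entries of the Figure 13.3 joint (0.144 and 0.576, the no-toothache, no-cavity rows) are taken from the textbook's table.

```python
# Full joint P(Toothache, Catch, Cavity), keyed by (toothache, catch, cavity).
joint = {
    (True, True, True): 0.108,   (True, False, True): 0.012,
    (False, True, True): 0.072,  (False, False, True): 0.008,
    (True, True, False): 0.016,  (True, False, False): 0.064,
    (False, True, False): 0.144, (False, False, False): 0.576,
}

def prob(phi):
    """P(phi): sum the atomic events (worlds) in which phi is true."""
    return sum(p for world, p in joint.items() if phi(*world))

p_toothache = prob(lambda t, c, cav: t)                    # 0.2
p_cavity_or_toothache = prob(lambda t, c, cav: cav or t)   # 0.28
p_not_cavity_given_toothache = prob(lambda t, c, cav: t and not cav) / p_toothache  # 0.4

print(p_toothache, p_cavity_or_toothache, p_not_cavity_given_toothache)
```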
Independence
Let us expand the full joint distribution in Figure 13.3 by
adding a fourth variable, Weather.
The full joint distribution then becomes P(Toothache,
Catch, Cavity, Weather), which has 2 × 2 × 2 × 4 = 32
entries. It contains four "editions" of the table shown in
Figure 13.3, one for each kind of weather.
What relationship do these editions have to each other and
to the original three-variable table?
For example, how are P(toothache, catch, cavity, cloudy)
and P(toothache, catch, cavity) related? We can use the
product rule:
P(toothache, catch, cavity, cloudy) = P(cloudy | toothache,
catch, cavity) P(toothache, catch, cavity).
Since the weather does not influence the dental variables,
P(cloudy | toothache, catch, cavity) = P(cloudy), so the
four-variable table factors into the dental table times P(Weather).
Independence
A and B are independent iff
P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A) P(B)
P(Toothache, Catch, Cavity, Weather)
= P(Toothache, Catch, Cavity) P(Weather)
32 entries reduced to 12; for n independent biased
coins, O(2^n) → O(n)
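A tiny sketch of the O(2^n) → O(n) saving for independent coins: n stored biases suffice to reconstruct any of the 2^n joint entries (the bias values below are made up).

```python
from math import prod

biases = [0.3, 0.6, 0.9, 0.5]   # hypothetical P(heads) for each of n = 4 coins

def p_outcome(outcome):
    """Joint probability of one outcome, using independence: a product of n factors."""
    return prod(b if heads else 1 - b for b, heads in zip(biases, outcome))

# e.g. P(H, T, H, H) = 0.3 * 0.4 * 0.9 * 0.5
print(p_outcome((True, False, True, True)))   # 0.054
```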
Bayes' Rule
P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
Useful for assessing diagnostic probability from
causal probability:
P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)
E.g., let M be meningitis, S be stiff neck:
◦ P(s|m) = 0.8, P(m) = 0.0001, P(s) = 0.1
◦ P(m|s) = P(s|m) P(m) / P(s) = 0.8 × 0.0001 / 0.1 = 0.0008
Note: posterior probability of meningitis still very small!
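The same calculation as a small helper function, using the numbers quoted above:

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)."""
    return p_effect_given_cause * p_cause / p_effect

# Meningitis example: P(m | s) = 0.8 * 0.0001 / 0.1
print(bayes(0.8, 0.0001, 0.1))   # 0.0008 -- the posterior is still very small
```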
Bayes' Rule and conditional independence
P(Cavity | toothache ∧ catch)
= αP(toothache ∧ catch | Cavity) P(Cavity)
= αP(toothache | Cavity) P(catch | Cavity) P(Cavity)
This is an example of a naïve Bayes model:
P(Cause, Effect1, …, Effectn) = P(Cause) ∏i P(Effecti | Cause)
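A minimal sketch of this naïve Bayes combination; the parameter values below are invented for illustration and are not taken from the slides.

```python
def posterior_cavity(toothache, catch):
    """P(Cavity | toothache, catch) via the naive Bayes factorization above."""
    p_cavity = 0.2                        # hypothetical prior P(cavity)
    p_tooth = {True: 0.6, False: 0.1}     # hypothetical P(toothache | Cavity)
    p_catch = {True: 0.9, False: 0.2}     # hypothetical P(catch | Cavity)

    def unnormalized(cav):
        pt = p_tooth[cav] if toothache else 1 - p_tooth[cav]
        pc = p_catch[cav] if catch else 1 - p_catch[cav]
        prior = p_cavity if cav else 1 - p_cavity
        return prior * pt * pc

    scores = {cav: unnormalized(cav) for cav in (True, False)}
    alpha = 1 / sum(scores.values())      # normalization constant
    return {cav: alpha * s for cav, s in scores.items()}

print(posterior_cavity(toothache=True, catch=True))   # Cavity=true gets ~0.87 here
```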
Bayesian networks
A simple, graphical notation for conditional
independence assertions and hence for compact
specification of full joint distributions
Syntax:
◦ a set of nodes, one per variable
◦ a directed, acyclic graph (link ≈ "directly influences")
◦ a conditional distribution for each node given its parents:
P(Xi | Parents(Xi))
In the simplest case, conditional distribution
represented as a conditional probability table
(CPT) giving the distribution over Xi for each
combination of parent values
Example
Topology of network encodes conditional
independence assertions:
Weather is independent of the other variables
Toothache and Catch are conditionally
independent given Cavity
Example
I'm at work, neighbor John calls to say my alarm is ringing,
but neighbor Mary doesn't call. Sometimes it's set off by
minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
◦ A burglar can set the alarm off
◦ An earthquake can set the alarm off
◦ The alarm can cause Mary to call
◦ The alarm can cause John to call
Example contd.
Compactness
A CPT for Boolean Xi with k Boolean parents has 2^k rows for
the combinations of parent values
Each row requires one number p for Xi = true
(the number for Xi = false is just 1-p)
If each variable has no more than k parents, the complete
network requires O(n · 2^k) numbers
I.e., grows linearly with n, vs. O(2^n) for the full joint
distribution
For burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1
= 31)
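A small sketch of the parameter count, assuming the parent counts implied by the burglary topology above (Burglary and Earthquake have no parents, Alarm has two, each call variable has one):

```python
# One number per CPT row; a Boolean node with k Boolean parents has 2^k rows.
num_parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
               "JohnCalls": 1, "MaryCalls": 1}

cpt_rows = {var: 2 ** k for var, k in num_parents.items()}
print(cpt_rows)                 # {'Burglary': 1, 'Earthquake': 1, 'Alarm': 4, 'JohnCalls': 2, 'MaryCalls': 2}
print(sum(cpt_rows.values()))   # 10 numbers, vs. 2**5 - 1 = 31 for the full joint
```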
Semantics
The full joint distribution is defined as the product of
the local conditional distributions:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))
e.g., P(j ∧ m ∧ a ∧ b ∧ e)
= P(j | a) P(m | a) P(a | b, e) P(b) P(e)
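A short sketch of this semantics for the burglary network, assuming the CPT values from the textbook's burglary example (Figure 14.2): the probability of any complete assignment is just the product of the local conditional probabilities.

```python
# CPTs for the burglary network (values as in the textbook's Figure 14.2).
P_b, P_e = 0.001, 0.002
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
P_j = {True: 0.90, False: 0.05}                      # P(j | A)
P_m = {True: 0.70, False: 0.01}                      # P(m | A)

def joint(j, m, a, b, e):
    """P(J, M, A, B, E) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)."""
    pj = P_j[a] if j else 1 - P_j[a]
    pm = P_m[a] if m else 1 - P_m[a]
    pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
    pb = P_b if b else 1 - P_b
    pe = P_e if e else 1 - P_e
    return pj * pm * pa * pb * pe

# e.g. the assignment in the formula above, with everything true:
print(joint(j=True, m=True, a=True, b=True, e=True))   # 0.9*0.7*0.95*0.001*0.002 ≈ 1.2e-6
```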
Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
◦ add Xi to the network
◦ select parents from X1, …, Xi-1 such that
P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)
This choice of parents guarantees:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1)   (chain rule)
= ∏_{i=1}^{n} P(Xi | Parents(Xi))   (by construction)
Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?
Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?
Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)?
P(B | A, J, M) = P(B)?
Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)?
P(E | B, A, J, M) = P(E | A, B)?
Example
Suppose we choose the ordering M, J, A, B, E
P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
Example contd.
Deciding conditional independence is hard in noncausal
directions
(Causal models and conditional independence seem
hardwired for humans!)
Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers
needed
Figure 14.4 (a): A node X is conditionally independent of its non-descendants (e.g., the Zij's) given its parents (the Ui's shown in the gray area).
Figure 14.4 (b): A node X is conditionally independent of all other nodes in the network given its Markov blanket (the gray area).
Summary
Probability is a rigorous formalism for
uncertain knowledge
Joint probability distribution specifies
probability of every atomic event
Queries can be answered by summing
over atomic events
For nontrivial domains, we must find a way
to reduce the joint size
Independence and conditional independence
provide the tools for doing so
Summary
Bayesian networks provide a natural
representation for (causally induced)
conditional independence
Topology + CPTs = compact representation
of joint distribution
Generally easy for domain experts to
construct