UNIT-4 New

An Introduction to Artificial Intelligence

Chapter 13 & 14.1-14.2: Uncertainty & Bayesian Networks


Ms. Nitu L. Pariyal
CSE Department
M.G. College of Engg.
Outline
• Uncertainty
• Probability
• Syntax and Semantics
• Inference
• Independence and Bayes' Rule
• Bayesian Networks
Uncertainty
• Agents may need to handle uncertainty, whether due to partial observability, nondeterminism, or a combination of the two. An agent may never know for certain what state it's in or where it will end up after a sequence of actions.
Let action At = leave for airport t minutes before flight
Will At get me there on time?
Problems:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic
Hence a purely logical approach either
1. risks falsehood: “A25 will get me there on time”,
or
2. leads to conclusions that are too weak for
decision making:
“A25 will get me there on time if there's no accident on
the bridge and it doesn't rain and my tires remain
intact etc etc.”
(A1440 might reasonably be said to get me there on time
but I'd have to stay overnight in the airport …)
Making decisions under uncertainty
Suppose I believe the following:

P(A25 gets me there on time | …) = 0.04
P(A90 gets me there on time | …) = 0.70
P(A120 gets me there on time | …) = 0.95
P(A1440 gets me there on time | …) = 0.9999

• Which action to choose?
• Depends on my preferences for missing the flight vs. time spent waiting, etc.
◦ Utility theory is used to represent and infer preferences
◦ Decision theory = probability theory + utility theory
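
As a rough illustration of decision theory in action, the sketch below scores each departure-time action by expected utility. The on-time probabilities are the ones above; the utility numbers (value of catching the flight, penalty for missing it, cost per minute of waiting) are made-up assumptions for illustration, not values from these slides.

```python
# A rough sketch of decision theory = probability + utility.
# The on-time probabilities come from the slide; the utility numbers are
# made-up assumptions for illustration only.

def expected_utility(p_on_time, minutes_early,
                     u_catch=1000.0, u_miss=-500.0, cost_per_minute=1.0):
    """EU = P(on time) * U(catch) + P(late) * U(miss) - cost of waiting."""
    return (p_on_time * u_catch
            + (1.0 - p_on_time) * u_miss
            - cost_per_minute * minutes_early)

# leave-t-minutes-early actions and their on-time probabilities
actions = {25: 0.04, 90: 0.70, 120: 0.95, 1440: 0.9999}

for t, p in actions.items():
    print(f"A{t}: EU = {expected_utility(p, t):.1f}")

best = max(actions, key=lambda t: expected_utility(actions[t], t))
print("Best action: leave", best, "minutes before the flight")
```

With these assumed utilities, A120 comes out best, matching the intuition that A1440 "works" but wastes a night at the airport.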
Syntax
• Basic element: random variable
• Similar to propositional logic: possible worlds defined by assignment of values to random variables.
• Boolean random variables
  e.g., Cavity (do I have a cavity?)
• Discrete random variables
  e.g., Weather is one of <sunny, rainy, cloudy, snow>
• Domain values must be exhaustive and mutually exclusive
• Elementary proposition constructed by assignment of a value to a random variable:
  e.g., Weather = sunny, Cavity = false (abbreviated as ¬cavity)
• Complex propositions formed from elementary propositions and standard logical connectives
  e.g., Weather = sunny ∧ Cavity = false
Syntax
• Atomic event: a complete specification of the state of the world about which the agent is uncertain
• E.g., if the world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic events:

  Cavity = false ∧ Toothache = false
  Cavity = false ∧ Toothache = true
  Cavity = true ∧ Toothache = false
  Cavity = true ∧ Toothache = true

• Atomic events are mutually exclusive (at most one can be the case) and exhaustive (at least one must be the case)

Axioms of probability
• For any propositions A, B:

◦ 0 ≤ P(A) ≤ 1
◦ P(true) = 1 and P(false) = 0
◦ P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

Prior probability
• Prior or unconditional probabilities of propositions
  e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72
  correspond to belief prior to arrival of any (new) evidence

• Probability distribution gives values for all possible assignments:
  P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)

• Joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables

• P(Weather, Cavity) = a 4 × 2 matrix of values:

                    Weather = sunny   rainy   cloudy   snow
    Cavity = true             0.144    0.02    0.016   0.02
    Cavity = false            0.576    0.08    0.064   0.08

• Every question about a domain can be answered by the joint distribution

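A minimal sketch of how such questions can be answered from the joint table above, using plain Python dictionaries. Summing out Cavity recovers the prior P(Weather) = <0.72, 0.1, 0.08, 0.1> given earlier.

```python
# Sketch: the P(Weather, Cavity) joint table above as a Python dict.
# Marginalizing out Cavity recovers the prior P(Weather).

joint = {
    # (weather, cavity): probability, values copied from the table
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Marginalization: P(Weather = w) = sum over cavity values of P(w, cavity)
p_weather = {}
for (w, c), p in joint.items():
    p_weather[w] = p_weather.get(w, 0.0) + p

print(p_weather)                        # ~{'sunny': 0.72, 'rainy': 0.1, 'cloudy': 0.08, 'snow': 0.1}
print(round(sum(joint.values()), 10))   # 1.0 -- a valid joint distribution
```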

Conditional probability
• Conditional or posterior probabilities
  e.g., P(cavity | toothache) = 0.8
  i.e., given that toothache is all I know

• (Notation for conditional distributions:
  P(Cavity | Toothache) = 2-element vector of 2-element vectors)

• If we know more, e.g., cavity is also given, then we have
  P(cavity | toothache, cavity) = 1

• New evidence may be irrelevant, allowing simplification, e.g.,
  P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8

• This kind of inference, sanctioned by domain knowledge, is critical
Conditional probability
• Definition of conditional probability:
  P(a | b) = P(a ∧ b) / P(b)   if P(b) > 0

• Product rule gives an alternative formulation:
  P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

• A general version holds for whole distributions, e.g.,
  P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
  (View as a set of 4 × 2 equations, not matrix multiplication.)

• Chain rule is derived by successive application of the product rule:
  P(X1, …, Xn) = P(X1, …, Xn-1) P(Xn | X1, …, Xn-1)
               = P(X1, …, Xn-2) P(Xn-1 | X1, …, Xn-2) P(Xn | X1, …, Xn-1)
               = …
               = ∏_{i=1..n} P(Xi | X1, …, Xi-1)
Inference by enumeration
• Start with the joint probability distribution:
• For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω: ω ⊨ φ} P(ω)

Inference by enumeration
• Start with the joint probability distribution:
• For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω: ω ⊨ φ} P(ω)
• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

Inference by enumeration
• Start with the joint probability distribution:
• For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω: ω ⊨ φ} P(ω)
• P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
• P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
Inference by enumeration
• Start with the joint probability distribution:
• Can also compute conditional probabilities:
  P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                         = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                         = 0.08 / 0.2 = 0.4
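
The same enumeration can be done mechanically. In the sketch below, the six entries used in the sums above come from the slides; the two remaining entries (no cavity, no toothache) are the standard AIMA Figure 13.3 values, assumed here so the table sums to 1.

```python
# Sketch of inference by enumeration over the dental domain.
# Six entries come from the sums on the slides; the last two are assumed
# (standard AIMA Figure 13.3 values) so the distribution sums to 1.

joint = {
    # (cavity, toothache, catch): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,  # assumed
}

def prob(phi):
    """P(phi) = sum of the probabilities of the atomic events where phi is true."""
    return sum(p for event, p in joint.items() if phi(*event))

p_toothache  = prob(lambda cavity, toothache, catch: toothache)            # 0.2
p_cav_or_too = prob(lambda cavity, toothache, catch: cavity or toothache)  # 0.28
p_notcav_and_too = prob(lambda cavity, toothache, catch: (not cavity) and toothache)

print(p_toothache, p_cav_or_too)
print(p_notcav_and_too / p_toothache)   # P(not cavity | toothache) = ~0.4
```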
Independence
• Let us expand the full joint distribution in Figure 13.3 by adding a fourth variable, Weather.
• The full joint distribution then becomes P(Toothache, Catch, Cavity, Weather), which has 2 × 2 × 2 × 4 = 32 entries. It contains four “editions” of the table shown in Figure 13.3, one for each kind of weather.
• What relationship do these editions have to each other and to the original three-variable table?
• For example, how are P(toothache, catch, cavity, cloudy) and P(toothache, catch, cavity) related? We can use the product rule:
  P(toothache, catch, cavity, cloudy) = P(cloudy | toothache, catch, cavity) P(toothache, catch, cavity)
Independence
• A and B are independent iff
  P(A | B) = P(A)   or   P(B | A) = P(B)   or   P(A, B) = P(A) P(B)

• P(Toothache, Catch, Cavity, Weather)
  = P(Toothache, Catch, Cavity) P(Weather)

• 32 entries reduced to 12; for n independent biased coins, O(2^n) → O(n)
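
A quick sketch of the independence test P(A, B) = P(A) P(B), applied to the P(Weather, Cavity) table from the prior-probability slide: every cell factors into the product of its marginals, so the 8-entry joint can be stored as a 4-entry P(Weather) plus a 2-entry P(Cavity).

```python
# Sketch: testing P(A, B) = P(A) P(B) on the P(Weather, Cavity) table.
# Every cell factors, so Weather and Cavity are independent in that table.

joint = {("sunny", True): 0.144, ("rainy", True): 0.02,
         ("cloudy", True): 0.016, ("snow", True): 0.02,
         ("sunny", False): 0.576, ("rainy", False): 0.08,
         ("cloudy", False): 0.064, ("snow", False): 0.08}

p_weather = {w: sum(p for (ww, _), p in joint.items() if ww == w)
             for w in ("sunny", "rainy", "cloudy", "snow")}
p_cavity = {c: sum(p for (_, cc), p in joint.items() if cc == c)
            for c in (True, False)}

independent = all(abs(joint[w, c] - p_weather[w] * p_cavity[c]) < 1e-9
                  for (w, c) in joint)
print(independent)   # True
```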


Bayes' Rule
• P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

• Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)

• Useful for assessing diagnostic probability from causal probability:
  P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)

• E.g., let M be meningitis, S be stiff neck:
  ◦ P(m | s) = P(s | m) P(m) / P(s) = 0.8 × 0.0001 / 0.1 = 0.0008

• Note: posterior probability of meningitis still very small!
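
A one-line sketch of the meningitis calculation above, as a reusable function.

```python
# Sketch of Bayes' rule with the meningitis / stiff-neck numbers from the slide:
# P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect).

def bayes(p_effect_given_cause, p_cause, p_effect):
    return p_effect_given_cause * p_cause / p_effect

print(bayes(p_effect_given_cause=0.8, p_cause=0.0001, p_effect=0.1))  # 0.0008
```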



Bayes' Rule and conditional independence

• P(Cavity | toothache ∧ catch)
  = α P(toothache ∧ catch | Cavity) P(Cavity)
  = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)

• This is an example of a naïve Bayes model:
  P(Cause, Effect1, …, Effectn) = P(Cause) ∏_i P(Effecti | Cause)
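
A sketch of the naïve Bayes computation above. The conditional probability values used here are illustrative assumptions (chosen to be consistent with the AIMA dental table), not numbers stated on these slides.

```python
# Sketch of the naive Bayes posterior P(Cavity | toothache, catch).
# All conditional probability values below are illustrative assumptions.

p_cavity          = {True: 0.2, False: 0.8}   # P(Cavity)             -- assumed
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity) -- assumed
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity)     -- assumed

# score(c) = P(Cavity = c) * P(toothache | c) * P(catch | c)
scores = {c: p_cavity[c] * p_toothache_given[c] * p_catch_given[c]
          for c in (True, False)}

# alpha normalizes the two scores into a distribution over Cavity
alpha = 1.0 / sum(scores.values())
posterior = {c: alpha * s for c, s in scores.items()}
print(posterior)   # {True: ~0.871, False: ~0.129}
```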

Bayesian networks
• A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions

• Syntax:
  ◦ a set of nodes, one per variable
  ◦ a directed, acyclic graph (link ≈ "directly influences")
  ◦ a conditional distribution for each node given its parents:
    P(Xi | Parents(Xi))

• In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
Example
• Topology of network encodes conditional independence assertions:

• Weather is independent of the other variables
• Toothache and Catch are conditionally independent given Cavity
Example
• I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

• Network topology reflects "causal" knowledge:


◦ A burglar can set the alarm off
◦ An earthquake can set the alarm off
◦ The alarm can cause Mary to call
◦ The alarm can cause John to call
Example contd.
Compactness
• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values

• Each row requires one number p for Xi = true
  (the number for Xi = false is just 1 − p)

• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers

• I.e., grows linearly with n, vs. O(2^n) for the full joint distribution

• For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
Semantics
The full joint distribution is defined as the product of the local conditional distributions:

P(X1, …, Xn) = ∏_{i=1..n} P(Xi | Parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
     = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
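
A sketch evaluating this product for the burglary network. The CPT entries are the standard ones from AIMA Figure 14.2, assumed here because the figure itself is not reproduced in these notes.

```python
# Sketch of the semantics formula on the burglary network.
# CPT values are assumed (standard AIMA Figure 14.2 numbers).

P_B = 0.001                                   # P(b): burglary prior
P_E = 0.002                                   # P(e): earthquake prior
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a | B, E)
P_J = {True: 0.90, False: 0.05}               # P(j | A)
P_M = {True: 0.70, False: 0.01}               # P(m | A)

# P(j and m and a and not b and not e)
#   = P(j|a) P(m|a) P(a | not b, not e) P(not b) P(not e)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_B) * (1 - P_E)
print(p)   # ~0.000628
```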
Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
   ◦ add Xi to the network
   ◦ select parents from X1, …, Xi-1 such that
     P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)

This choice of parents guarantees:

P(X1, …, Xn) = ∏_{i=1..n} P(Xi | X1, …, Xi-1)      (chain rule)
             = ∏_{i=1..n} P(Xi | Parents(Xi))       (by construction)
Example
Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?
Example
Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?
Example
Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)?
P(B | A, J, M) = P(B)?
Example
Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)?
P(E | B, A, J, M) = P(E | A, B)?
Example
Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?
No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
Example contd.

• Deciding conditional independence is hard in noncausal directions

• (Causal models and conditional independence seem hardwired for humans!)

• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

Figure 14.4(a): A node X is conditionally independent of its non-descendants (e.g., the Zij's) given its parents (the Ui's shown in the gray area).
Figure 14.4(b): A node X is conditionally independent of all other nodes in the network given its Markov blanket (the gray area).
Summary
• Probability is a rigorous formalism for uncertain knowledge
• Joint probability distribution specifies the probability of every atomic event
• Queries can be answered by summing over atomic events
• For nontrivial domains, we must find a way to reduce the joint size
• Independence and conditional independence provide the tools
Summary
• Bayesian networks provide a natural representation for (causally induced) conditional independence
• Topology + CPTs = compact representation of joint distribution
• Generally easy for domain experts to construct
