Uncertain Knowledge and Learning. Uncertainty: Acting under Uncertainty, Basic Probability Notation, Inference Using Full Joint Distributions, Independence, Bayes’ Rule and Its Use.
Probabilistic Reasoning: Representing Knowledge in an Uncertain Domain, The Semantics of Bayesian Networks, Efficient Representation of Conditional Distributions, Approximate Inference in Bayesian Networks, Relational and First-Order Probability, Other Approaches to Uncertain Reasoning: Dempster-Shafer Theory.
### *1. What is Uncertainty?*
*Uncertainty* means we're *not 100% sure* about something.
#### *Example Everyone Can Relate To:*
- You studied hard for an exam.
But you’re still unsure if you’ll get an A. Why?
Because the questions might be tricky, or you might have made a silly mistake.
*That’s uncertainty.*
In AI, uncertainty comes when the system doesn’t have *complete or reliable information*.
### *2. Why Does Uncertainty Happen in AI?*
Just like humans, AI faces uncertainty for many reasons:
#### *Examples:*
- *Unreliable info:* Your friend tells you the exam is postponed, but you’re not sure if they’re joking.
- *Sensor error:* A smart thermometer says the room is 50°C – clearly something's wrong!
- *Climate variation:* A smart irrigation system can’t predict rain properly due to sudden weather change.
- *Human errors:* Someone enters wrong patient info in a hospital database.
### *3. How Does AI Handle Uncertainty?*
Through something called *Probabilistic Reasoning* – like *thinking in chances*.
#### *Example:*
Let’s say you're deciding whether to carry an umbrella.
- You check your weather app – it says *60% chance of rain*.
- You look outside – clouds are gathering.
- You remember last time when it looked the same, and it did rain.
*Even though you’re not 100% sure, you carry the umbrella.*
That’s probabilistic reasoning. You made a decision based on *probabilities*, not certainties.
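A tiny sketch of how an AI agent might make the same call (all utility numbers below are invented for illustration): weigh the 60% rain chance against how good or bad each outcome is, and pick the action with the higher expected value.

```python
# Toy decision under uncertainty: carry an umbrella or not?
# All utility numbers are made up for illustration.
p_rain = 0.6  # chance of rain, from the weather app

utility = {
    ("carry", "rain"):     5,   # dry and prepared
    ("carry", "no_rain"): -1,   # lugged it around for nothing
    ("skip",  "rain"):   -10,   # soaked
    ("skip",  "no_rain"):  2,   # travelled light
}

def expected_utility(action):
    """Average utility of an action, weighted by how likely each outcome is."""
    return (p_rain * utility[(action, "rain")]
            + (1 - p_rain) * utility[(action, "no_rain")])

best = max(["carry", "skip"], key=expected_utility)
print(best, expected_utility(best))  # carry 2.6 -> take the umbrella
```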
### *4. What is a Certainty Factor (CF)?*
AI systems use something called a *Certainty Factor* to *measure belief*.
It ranges from:
- *+1* = Completely sure it’s true
- *0* = No idea
- *-1* = Completely sure it’s false
#### *Example:*
You're watching a cricket match.
India needs 10 runs in 1 ball.
You think there’s a *0.1 (10%) chance they’ll win*.
That’s your certainty factor.
AI uses this kind of thinking to make decisions under uncertainty.
### *5. Belief vs. Disbelief*
AI systems also measure *how much evidence increases or decreases belief*.
#### *Example:*
You’re suspicious your friend ate your chips.
- You find *empty packets* in their bag – belief increases!
- But your mom says she gave them chips too – belief decreases!
AI works the same way, updating *measure of belief (MB)* and *measure of disbelief (MD)* values with every new piece of information.
### *6. Final Decision*
AI combines MB and MD to get a final *Certainty Factor (CF)*.
#### *Example:*
- You have 0.8 belief and 0.2 disbelief → Final CF = 0.8 – 0.2 = *0.6*
- AI says: “There is a *60% confidence* in this diagnosis.”
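If you want to see that arithmetic in code, here is a minimal sketch (assuming the classic MYCIN-style rules: CF = MB − MD, and two supporting pieces of evidence combined as MB = MB₁ + MB₂ × (1 − MB₁)):

```python
def certainty_factor(mb, md):
    """Certainty factor: belief minus disbelief, ranging from -1 to +1."""
    return mb - md

def combine_belief(mb1, mb2):
    """Fold a new piece of supporting evidence into an existing belief."""
    return mb1 + mb2 * (1 - mb1)

print(certainty_factor(0.8, 0.2))  # 0.6 -> "60% confidence in this diagnosis"
print(combine_belief(0.8, 0.5))    # 0.9 -> new evidence pushes belief up
```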
### *7. Why is This Useful in Real Life?*
We use uncertainty reasoning *every day*:
| Situation | Your Thought Process |
|------------------------------|-----------------------------------------------------|
| Should I eat street food? | "Looks tasty, but I might get sick – 50/50 chance." |
| Should I invest in Bitcoin? | "High risk, high return – I’m 60% sure it’ll work." |
| Should I bunk class today? | "Teacher might take attendance – not sure." |
AI also makes decisions based on *risk, probability, and evidence* just like we do!
1. What is a Full Joint Distribution?
Imagine we have two things:
Weather (Sunny or Rainy)
Traffic (Light or Heavy)
Now, we want to list all the combinations of these and assign a probability to each. This table with all combinations and their
probabilities is called a Full Joint Probability Distribution.
Example:
| Weather | Traffic | Probability |
|---------|---------|-------------|
| Sunny   | Light   | 0.3         |
| Sunny   | Heavy   | 0.2         |
| Rainy   | Light   | 0.1         |
| Rainy   | Heavy   | 0.4         |
This tells us, for example, the chance that it’s sunny and traffic is light is 0.3 (or 30%).
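One simple way to hold this table in code is a dictionary keyed by each combination (a sketch; the variable names are mine):

```python
# Full joint probability distribution over (Weather, Traffic).
joint = {
    ("Sunny", "Light"): 0.3,
    ("Sunny", "Heavy"): 0.2,
    ("Rainy", "Light"): 0.1,
    ("Rainy", "Heavy"): 0.4,
}

assert abs(sum(joint.values()) - 1.0) < 1e-9  # all probabilities sum to 1
print(joint[("Sunny", "Light")])  # 0.3 -> P(Sunny and Light)
```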
2. What is Inference?
Inference means using known probabilities to find new probabilities.
For example, let’s say you want to know:
What’s the chance it’s rainy in general?
You’ll add up all rows where the weather is rainy.
P(Rainy) = P(Rainy, Light) + P(Rainy, Heavy) = 0.1 + 0.4 = 0.5 (or 50%).
What’s the chance it’s rainy given that the traffic is heavy?
This is conditional probability.
Formula:
P(Rainy | Heavy) = P(Rainy and Heavy) / P(Heavy)
= 0.4 / (0.2 + 0.4) = 0.4 / 0.6 ≈ 0.667, or 66.7%
This tells you: if traffic is heavy, there’s about a 66.7% chance that it’s also rainy.
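Both inference steps can be read straight off the table. A small sketch using the same numbers:

```python
joint = {
    ("Sunny", "Light"): 0.3, ("Sunny", "Heavy"): 0.2,
    ("Rainy", "Light"): 0.1, ("Rainy", "Heavy"): 0.4,
}

# Marginalization: P(Rainy) = sum of every row where the weather is Rainy.
p_rainy = sum(p for (w, _), p in joint.items() if w == "Rainy")

# Conditioning: P(Rainy | Heavy) = P(Rainy and Heavy) / P(Heavy).
p_heavy = sum(p for (_, t), p in joint.items() if t == "Heavy")
p_rainy_given_heavy = joint[("Rainy", "Heavy")] / p_heavy

print(p_rainy)              # 0.5
print(p_rainy_given_heavy)  # 0.666... -> about 66.7%
```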
3. Real-Life Example: Medical Diagnosis
Let’s say a person has a symptom, and we want to know the chance they have a disease.
| Disease | Symptom | Probability |
|---------|---------|-------------|
| Yes     | Present | 0.3         |
| Yes     | Absent  | 0.1         |
| No      | Present | 0.2         |
| No      | Absent  | 0.4         |
Now someone comes with the symptom. We want to find:
What’s the chance they actually have the disease?
Use the formula:
P(Disease = Yes | Symptom = Present) =
= P(Yes and Present) / (P(Yes and Present) + P(No and Present))
= 0.3 / (0.3 + 0.2) = 0.3 / 0.5 = 0.6 or 60%
So we say: If someone has the symptom, there’s a 60% chance they have the disease.
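The same 60% also falls out of Bayes’ rule (from the syllabus above), using marginals read off the table: P(Disease) = 0.3 + 0.1 = 0.4, P(Symptom | Disease) = 0.3 / 0.4 = 0.75, and P(Symptom) = 0.3 + 0.2 = 0.5.

```latex
P(\text{Disease} \mid \text{Symptom})
  = \frac{P(\text{Symptom} \mid \text{Disease}) \, P(\text{Disease})}{P(\text{Symptom})}
  = \frac{0.75 \times 0.4}{0.5} = 0.6
```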
4. Final Tip
You can tell your friends:
Full Joint Distribution is like a full table of probabilities for all combinations.
Inference means picking useful info from that table to answer real questions.
It works well for small problems, but for large ones, we use smarter methods like Bayesian Networks.
Semantics of Bayesian Networks
What is a Bayesian Network?
• A Bayesian Network is a powerful tool from the field of probabilistic reasoning and machine learning.
• It is a graph-based model that represents a set of random variables and their probabilistic relationships in a structured, compact, and intuitive form.
• Bayesian Networks are named after Thomas Bayes, due to their foundation in Bayes’ Theorem, which underpins how probabilities are updated when new evidence is observed.
Why are Bayesian Networks Important?
• They provide a graphical structure to represent dependencies between variables.
• They allow for efficient reasoning and probabilistic inference.
• They can be learned from data or manually constructed with expert knowledge.
• They can scale to large, complex domains while still being interpretable.
• They’re widely used in fields like AI, medicine, robotics, economics, and more.
What Do We Mean by "Semantics"?
• In Bayesian Networks, semantics refers to the meaning behind the structure: how the graph connects to probability and inference. It answers:
• What do variable connections represent?
• How is the joint probability defined?
• How are dependencies and independencies interpreted?
Understanding semantics helps with:
• Designing accurate models
• Performing correct inference
• Learning from data
1. Graphical Semantics
• The structure of a Bayesian network is a Directed Acyclic Graph (DAG).
• Each node represents a random variable.
• Each directed edge represents a direct probabilistic dependency between variables.
• No cycles are allowed; this ensures consistency in probability computations.
• Example:
• If we have nodes A → B → C:
• A influences B
• B influences C
• There is no direct influence from A to C, but A indirectly influences C.
2. Probabilistic Semantics
• The semantics define how the joint probability distribution (JPD) over all variables is constructed: each variable depends directly only on its parents in the graph, so
P(X1, X2, ..., Xn) = P(X1 | Parents(X1)) × P(X2 | Parents(X2)) × ... × P(Xn | Parents(Xn))
• For the chain A → B → C above, this gives P(A, B, C) = P(A) × P(B | A) × P(C | B).
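As a concrete sketch of this factorization (the chain A → B → C from above, with all numbers invented for illustration):

```python
# Hypothetical conditional probability tables for the chain A -> B -> C.
P_A = {True: 0.3, False: 0.7}                      # P(A)
P_B_given_A = {True: {True: 0.9, False: 0.1},      # P(B | A)
               False: {True: 0.2, False: 0.8}}
P_C_given_B = {True: {True: 0.6, False: 0.4},      # P(C | B)
               False: {True: 0.5, False: 0.5}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) using the Bayesian-network factorization."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

print(joint(True, True, True))  # 0.3 * 0.9 * 0.6 = 0.162
```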
3. Conditional Independence Semantics
• A Bayesian Network makes independence assumptions based on its structure.
• d-separation is the key concept:
• d-separation (short for "directional separation") is a rule used to determine whether two variables are conditionally independent, given some other variables, just by looking at the graph structure of a Bayesian Network.
• A path between two nodes is "blocked" (i.e., the variables are conditionally independent along that path) depending on the configuration of colliders and observed variables.
• If two variables are d-separated given a set of variables, they are conditionally independent in the probability distribution.
• d-separation helps in understanding which variables influence others.
• Useful in efficient inference and learning from data.
For example, Y (going to the park) and Z (carrying an umbrella) are conditionally independent given X (the weather): once we know whether it’s raining, knowing that a person is carrying an umbrella gives no extra information about whether they’ll go to the park.
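A quick numeric check of this claim (all probabilities invented): if the model makes Y and Z each depend only on X, then P(Y | X, Z) comes out identical to P(Y | X).

```python
# X = it's raining, Y = goes to the park, Z = carries an umbrella.
# Y and Z each depend only on X, so they are d-separated given X.
P_X = {True: 0.4, False: 0.6}          # P(raining)
P_Y_given_X = {True: 0.2, False: 0.8}  # P(park | raining?)
P_Z_given_X = {True: 0.9, False: 0.1}  # P(umbrella | raining?)

def joint(x, y, z):
    py = P_Y_given_X[x] if y else 1 - P_Y_given_X[x]
    pz = P_Z_given_X[x] if z else 1 - P_Z_given_X[x]
    return P_X[x] * py * pz

# Compare P(Y | X=raining, Z=umbrella) with P(Y | X=raining):
num = joint(True, True, True)
den = joint(True, True, True) + joint(True, False, True)
print(num / den)          # 0.2
print(P_Y_given_X[True])  # 0.2 -> the umbrella added no information
```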
What is Dempster-Shafer Theory?
The Dempster-Shafer Theory (DST) is a mathematical
framework for handling uncertainty and incomplete
information in decision-making. Unlike traditional
probability theory, which requires prior probabilities, DST
allows for degrees of belief based on available evidence.
This makes it highly useful in artificial intelligence, expert
systems, and data fusion applications.
NEED FOR DEMPSTER-SHAFER THEORY
This theory was developed for the following reasons:
⚫ Bayesian theory is only concerned with single pieces of evidence.
⚫ Bayesian probability cannot describe ignorance.
DST is an evidence theory: it combines all possible outcomes of the problem. Hence it is used to solve problems where different pieces of evidence may lead to different results.
Uncertainty in this model is captured by three steps:
1. Consider all possible outcomes (the frame of discernment).
2. Belief assigns support to some possibilities based on the available evidence.
3. Plausibility measures how far the evidence is compatible with each possible outcome.
Key Concepts
1. Belief (Bel): The degree of evidence directly supporting a hypothesis.
2. Plausibility (Pl): The degree to which a hypothesis could be true, given the evidence.
3. Evidence: Information that supports one or more hypotheses.
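In symbols (the standard DST definitions, stated here for reference): for a mass function m over subsets of the frame of discernment Ω,

```latex
\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B)
\qquad
\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - \mathrm{Bel}(\lnot A)
```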
Example of Dempster-Shafer Theory in Action
A doctor suspects a patient might have Disease A or Disease B based on symptoms and test
results. However, uncertainty exists due to incomplete information.
Assign Mass Functions (m)
◦ A blood test suggests Disease A with m(A) = 0.6 (60% confidence).
◦ A different test supports Disease B with m(B) = 0.3 (30% confidence).
◦ There is a 10% uncertainty where neither disease is confirmed: m(Ω)=0.1
Compute Belief (Bel) and Plausibility (Pl)
◦ Bel(A)=0.6, as that is the confirmed evidence supporting Disease A.
◦ Pl(A)=1-Bel(¬A)=1-0.3=0.7, meaning Disease A is at most 70% probable.
Dempster’s Rule of Combination
If new evidence emerges (e.g., an AI system analyzing X-ray images assigns its own mass values), DST allows us to combine these sources mathematically, increasing confidence in Disease A.
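Dempster’s rule computes the combined mass as m₁⊕m₂(A) = (Σ m₁(B)·m₂(C) over all B ∩ C = A) / (1 − K), where K = Σ m₁(B)·m₂(C) over all B ∩ C = ∅ measures the conflict between the two sources. A minimal sketch on the doctor example (the X-ray mass values below are invented for illustration):

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions on one frame."""
    combined, conflict = {}, 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2  # mass landing on the empty set
    # Renormalize by 1 - K to redistribute the conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

A, B = frozenset({"A"}), frozenset({"B"})
omega = A | B  # frame of discernment {Disease A, Disease B}

m1 = {A: 0.6, B: 0.3, omega: 0.1}  # blood test (numbers from the example)
m2 = {A: 0.7, omega: 0.3}          # hypothetical X-ray evidence

m12 = combine(m1, m2)
print(round(m12[A], 3))  # ~0.848 -> belief in Disease A rises from 0.6
```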
Characteristics of Dempster-Shafer Theory
•Ability to Handle Incomplete Information
•Independence from Prior Probability Assumptions
•Flexibility in Combining Evidence from Multiple Sources
Advantages
•Better uncertainty handling: Unlike classical probability, DST allows partial
belief assignment, making it useful in situations where complete probability
distributions are unavailable.
•Ideal for AI applications: DST is widely used in AI-driven decision-making,
sensor fusion, and expert systems, where multiple uncertain sources of
evidence need to be combined.
•No need for prior probabilities: Unlike Bayesian inference, DST does not require
strict prior probability distributions, making it flexible for real-world AI
applications.
Disadvantages
•Computational complexity: As the number of
possible hypotheses increases, DST’s calculations
become computationally expensive, making it
difficult to scale in large AI models.
•Defining mass functions: The effectiveness of
DST depends on properly defining mass
functions, which can be challenging without
sufficient domain expertise or reliable evidence
sources.