Chapter 4

The document discusses the importance of quantifying uncertainty in AI systems, which helps improve decision-making in uncertain environments by avoiding overconfidence and enhancing safety. It also covers various types of learning systems in AI, including supervised, unsupervised, and reinforcement learning, emphasizing their ability to adapt and improve over time. Additionally, it explores methods for representing and reasoning under uncertainty, such as Bayesian networks and fuzzy logic, which are crucial for effective AI applications in real-world scenarios.


AI-CH4

🔍 Quantifying Uncertainty in AI

Quantifying uncertainty refers to how an AI system estimates how confident it is in its predictions or decisions.

Why Is It Important?

AI systems often make decisions in situations where:

• Data is incomplete or noisy

• Outcomes are uncertain

• There are multiple possible interpretations

By quantifying uncertainty, the system can:

• Avoid overconfident mistakes

• Flag uncertain outputs for human review

• Improve safety in critical systems (e.g., healthcare, autonomous vehicles)

Types of Uncertainty:

1. Aleatoric Uncertainty (Statistical noise):

◦ Caused by randomness in the data (e.g., blurry image, measurement error).

◦ Cannot be reduced by more data.

2. Epistemic Uncertainty (Model ignorance):

◦ Comes from a lack of knowledge in the model.

◦ Can be reduced with more training data.

Techniques to Measure Uncertainty:

• Bayesian Methods: Use probability distributions over model parameters.

• Monte Carlo Dropout: Use dropout at test time to simulate an ensemble of models.

• Ensemble Learning: Train multiple models and analyze prediction variance.

• Confidence Scores: Output probabilities for classifications (e.g., 90% confidence it's a cat).
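The ensemble idea can be sketched in a few lines: several models predict a probability for the same input, and the spread of their answers serves as the uncertainty estimate. The prediction values below are invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical class probabilities for one input, produced by five
# independently trained models in an ensemble.
predictions = [0.91, 0.88, 0.95, 0.90, 0.86]

confidence = mean(predictions)     # the ensemble's average confidence
uncertainty = pstdev(predictions)  # spread = disagreement between models

# A large spread would flag this output for human review.
print(f"confidence={confidence:.2f}, uncertainty={uncertainty:.3f}")
```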

🤖 Learning Systems in AI

What Are They?

Learning systems are AI systems that improve their performance over time by learning from data or experiences
instead of being explicitly programmed.

Types of Learning in AI:


Learning Type – Description – Example

Supervised Learning – Learns from labeled data – Spam detection in emails
Unsupervised Learning – Finds patterns in unlabeled data – Customer segmentation
Reinforcement Learning – Learns by trial and error through rewards – Game-playing AI like AlphaGo
Semi-supervised – Learns from a mix of labeled and unlabeled data – Image classification with few labels
Self-supervised – Learns to label its own data – Pretraining language models like ChatGPT

Components of a Learning System:

1. Data: The foundation for training.

2. Model: The algorithm that learns patterns (e.g., neural network).

3. Training: The process of improving model accuracy.

4. Evaluation: Testing the model on new data to measure performance.

5. Feedback loop: Continuous learning and improvement.

🎯 Real-World Example (Combining Both)

In medical diagnosis AI:

• The AI learns from patient data (learning system).

• It also provides a confidence score (quantifying uncertainty) for each diagnosis.

• If uncertainty is high, it can alert doctors for further review.

Summary

• Quantifying uncertainty helps AI systems know how sure they are, improving trust and safety.

• Learning systems allow AI to adapt and get better over time using data.

Both are critical for building intelligent, safe, and reliable AI applications.

🤖 Acting Under Uncertainty in AI — Explained

Acting under uncertainty in Artificial Intelligence (AI) refers to the ability of an AI system to make decisions or take
actions when it doesn't have complete or perfect knowledge about the environment, future outcomes, or the
consequences of its actions.

🔍 Why Is It Important?

In real-world situations, perfect information is rare. AI systems need to act even when:

• Data is incomplete, noisy, or ambiguous

• Outcomes are unpredictable

• The environment changes dynamically

Despite these challenges, the AI must choose the best possible action to achieve its goal.

💡 Real-Life Examples

1. Self-driving cars
Don’t always know how pedestrians or other drivers will behave. They must predict and act accordingly.

2. Robots in search-and-rescue
Often work in unknown or partially destroyed environments and must make safe decisions without full
information.

3. Medical diagnosis systems


Face uncertain symptoms and test results, yet must recommend diagnoses or treatments.

🧠 How AI Handles Uncertainty

AI uses mathematical models and probabilistic reasoning to act under uncertainty:

1. Probability Theory

• Represents beliefs as probabilities (e.g., “there is a 70% chance the object is a cat”).

• Helps AI evaluate the likelihood of different outcomes.

2. Bayesian Networks

• A graphical model showing probabilistic relationships between variables.

• Useful in decision-making where many uncertain factors are involved.

3. Decision Theory

• Combines probabilities with the expected utility (value or benefit) of outcomes.

• AI chooses the action with the highest expected benefit.

4. Markov Decision Processes (MDPs)

• A framework for planning in uncertain environments.

• When the state is only partially known, it becomes a Partially Observable MDP (POMDP).

5. Exploration vs. Exploitation

• AI must balance trying new actions (exploration) and using known good ones (exploitation), especially in
reinforcement learning.
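The decision-theoretic step above can be illustrated with a toy expected-utility calculation; the actions, probabilities, and utilities below are all invented.

```python
# Each action maps to a list of (probability, utility) outcome pairs.
actions = {
    "take_highway": [(0.8, 30), (0.2, -60)],  # usually fast, small risk of a jam
    "take_backroad": [(1.0, 10)],             # slower but certain
}

def expected_utility(outcomes):
    """Probability-weighted sum of outcome utilities."""
    return sum(p * u for p, u in outcomes)

# Decision theory: choose the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)
```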

⚙ General Process
1. Sense the environment (may be imperfect or partial).

2. Reason using probabilities to estimate the current state and possible future outcomes.

3. Choose an action that maximizes the chance of success or minimizes risk.

4. Learn from the outcome to improve future decision-making.

🎲 What is Basic Probability Notation?

Probability notation is a standardized way to represent events, outcomes, and their likelihood in mathematics and AI.

Here are the most common basic probability notations you need to know:

🧮 1. Probability of an Event

• Notation: P(A)

• Meaning: Probability that event A happens.

• Example:
If A is "getting heads when flipping a coin", then
P(A) = 0.5

🔁 2. Complement of an Event

• Notation: P(A') or P(¬A)

• Meaning: Probability that event A does NOT happen.

• Formula:
P(A') = 1 - P(A)

• Example:
If P(A) = 0.7, then P(A') = 0.3

🔗 3. Joint Probability

• Notation: P(A ∩ B)

• Meaning: Probability that both events A and B happen.

• Example:
Probability of drawing a red card and a king from a deck.

🔀 4. Union of Events

• Notation: P(A ∪ B)

• Meaning: Probability that at least one of A or B occurs.

• Formula:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

🔄 5. Conditional Probability
• Notation: P(A | B)

• Meaning: Probability that event A happens given that B has already happened.

• Formula:
P(A | B) = P(A ∩ B) / P(B) (if P(B) ≠ 0)

• Example:
If it’s raining (B), what’s the probability you’ll carry an umbrella (A)?

⚖ 6. Bayes’ Theorem

• Formula:
P(A | B) = P(B | A) · P(A) / P(B)

• Meaning: A way to update probabilities based on new evidence.
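A worked example, with made-up numbers for a diagnostic test, shows how the formula updates a prior belief:

```python
# Invented numbers: a disease with 1% prevalence (the prior P(A)) and a
# test that is 90% sensitive with a 5% false-positive rate.
p_disease = 0.01            # P(A)
p_pos_given_disease = 0.90  # P(B | A)
p_pos_given_healthy = 0.05  # P(B | not-A)

# Total probability of the evidence: P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # 0.154: even after a positive test, disease is unlikely
```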

🧠 Summary Table

Notation – Meaning
P(A) – Probability of event A
P(A') or P(¬A) – Probability A does not happen
P(A ∩ B) – Probability A and B both happen
P(A ∪ B) – Probability A or B (or both) happen
P(A | B) – Probability of A given that B has happened

🧠 Inference Using Full Joint Distributions
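The idea behind inference with a full joint distribution is to answer queries by summing the relevant entries of the joint table (marginalization) and dividing by a normalizing sum (conditioning). A minimal sketch with invented numbers for two Boolean variables:

```python
# Full joint distribution over (Cavity, Toothache); the four
# probabilities are invented and sum to 1.
joint = {
    (True, True): 0.12, (True, False): 0.08,
    (False, True): 0.08, (False, False): 0.72,
}

# Marginalization: P(Cavity) sums out Toothache.
p_cavity = sum(p for (cav, _), p in joint.items() if cav)

# Conditioning: P(Cavity | Toothache) = P(Cavity, Toothache) / P(Toothache).
p_toothache = sum(p for (_, tooth), p in joint.items() if tooth)
p_cavity_given_toothache = joint[(True, True)] / p_toothache

print(p_cavity, p_cavity_given_toothache)
```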


Bayes' Rule (Bayes' Theorem)

Bayes' Rule, also known as Bayes' Theorem, is a fundamental principle in probability theory that allows us to update our
beliefs based on new evidence. It is widely used in statistics, machine learning, and AI for decision-making under
uncertainty.

Bayes' Theorem Formula:
P(A | B) = P(B | A) · P(A) / P(B)

Use Cases of Bayes' Rule

1. Medical Diagnosis – Helps determine the probability of a disease given symptoms.

2. Spam Filtering – Used in email classification to detect spam messages.

3. Machine Learning – Forms the basis of Bayesian inference in probabilistic models.

4. Autonomous Systems – Helps robots and AI agents make decisions under uncertainty.

5. Financial Risk Analysis – Assesses risks based on prior market trends and new data.

Bayes' Rule is essential for updating probabilities dynamically as new information becomes available.

Representing Knowledge in an Uncertain Domain

Representing knowledge in an uncertain domain is a crucial aspect of artificial intelligence, as real-world environments
often involve incomplete, ambiguous, or noisy information. AI systems must use specialized techniques to model and
reason about uncertainty effectively.

Key Approaches to Representing Uncertain Knowledge

1. Probabilistic Reasoning – Uses probability theory to quantify uncertainty and make informed decisions.

2. Bayesian Networks – Graphical models that represent probabilistic dependencies among variables.

3. Hidden Markov Models (HMMs) – Used for sequential data where states are uncertain, such as speech
recognition.

4. Markov Decision Processes (MDPs) – Help AI agents make optimal decisions in uncertain environments.

5. Fuzzy Logic – Allows reasoning with imprecise or vague information by assigning degrees of truth.
6. Dempster-Shafer Theory – A generalization of probability theory that handles uncertainty in evidence-based
reasoning.

7. Belief Networks – Represent uncertain knowledge using probability distributions over multiple variables.

8. Case-Based Reasoning – AI learns from past cases and applies them to new situations with uncertainty.

Applications

• Medical Diagnosis – AI systems predict diseases based on uncertain patient data.

• Autonomous Systems – Robots and self-driving cars navigate unpredictable environments.

• Natural Language Processing (NLP) – AI interprets ambiguous human language.

• Financial Forecasting – AI models predict market trends despite uncertain economic conditions.

These techniques enable AI to operate effectively in dynamic and unpredictable environments.

🤔 Other Approaches to Uncertain Reasoning in AI

In addition to Bayesian reasoning, AI uses several other methods to reason under uncertainty — especially when
data is incomplete, ambiguous, or noisy.

Here are the main approaches:

1. 🔁 Rule-Based Systems with Certainty Factors

• Used in: Early expert systems like MYCIN (medical diagnosis)

• Idea: Attach a certainty factor (CF) to each rule to reflect confidence.

• Example:
IF symptom = fever
THEN disease = flu [CF = 0.7]

• Limitation: Not based on formal probability theory; lacks consistency in combining uncertainties.
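As a sketch of how such factors were combined: MYCIN merged two positive certainty factors supporting the same conclusion with CF = CF1 + CF2·(1 − CF1). The rule CFs below are invented.

```python
def combine_cf(cf1, cf2):
    """MYCIN-style combination of two positive certainty factors
    supporting the same conclusion."""
    return cf1 + cf2 * (1 - cf1)

# Two hypothetical rules both conclude 'flu', with CF 0.7 and CF 0.4.
combined = combine_cf(0.7, 0.4)
print(round(combined, 2))  # 0.82: more evidence raises confidence, never past 1.0
```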

2. 🌫 Fuzzy Logic

• Used when: Concepts are vague or imprecise (not just uncertain).

• Key Idea: Allows partial truth (values between 0 and 1), not just true/false.

• Example:
"Temperature is high" → 0.8 (80% true)
Useful in control systems, like washing machines or air conditioning.

• Advantage: Great for modeling human-like reasoning.

3. 📈 Dempster–Shafer Theory (Belief Functions)

• Generalizes Bayesian probability.

• Allows expressing degrees of belief without needing precise probabilities.


• Distinguishes between belief and plausibility:

◦ Belief (Bel): Minimum certainty based on evidence

◦ Plausibility (Pl): Maximum certainty given all available evidence

• Advantage: Can represent ignorance explicitly.

4. 🧠 Markov Logic Networks (MLNs)

• Combine probabilistic graphical models with first-order logic.

• Handle complex domains with uncertain relationships (e.g., social networks).

• Each logical rule is assigned a weight to represent how strong the rule is under uncertainty.

5. 🎲 Probabilistic Graphical Models (PGMs)

• Includes:

◦ Bayesian Networks (directed)

◦ Markov Networks (undirected)

• Represent dependencies between variables using graphs.

• Used for complex reasoning tasks (e.g., medical diagnosis, NLP, vision).

6. 🧬 Possibility Theory

• Focuses on possibility and necessity instead of probability.

• Suited for qualitative uncertainty where assigning numerical probabilities is hard.

• Less precise but easier to apply in some expert systems.

7. 🤖 Non-monotonic Reasoning & Default Logic

• Allows conclusions to be revised when new information contradicts earlier assumptions.

• Example:
Assume “Birds fly” → conclude “Tweety flies”
But if you learn “Tweety is a penguin” → retract the earlier conclusion.

• Useful in commonsense AI and real-world reasoning.

Rule-based methods for uncertain reasoning


🌫 What is Representing Vagueness in AI?

Vagueness refers to situations where the boundaries of a concept are not clearly defined — unlike uncertainty (which
is about not knowing the truth), vagueness is about the blurry nature of the concept itself.

🤔 Example of Vagueness:

• What does "tall person" mean?

◦ Is someone 5'10" tall? 6'0"? 6'4"? There's no sharp cutoff.

• What is "hot weather"?

◦ It might feel hot at 30°C to one person, but not to another.

This kind of imprecision cannot be handled well with classic Boolean logic (true/false). Instead, AI uses fuzzy logic
and similar approaches to represent vagueness.

Key Approaches to Representing Vagueness

1. Fuzzy Logic – Allows reasoning with imprecise data by assigning degrees of truth between 0 and 1, rather
than binary true/false values.

2. Probability Theory – Models uncertainty by assigning likelihoods to different possibilities.

3. Possibility Theory – Focuses on degrees of possibility rather than strict probabilities.

4. Dempster-Shafer Theory – Handles uncertainty by assigning belief functions instead of precise probabilities.

5. Vague Predicates – Words like "tall" or "young" lack clear-cut boundaries, leading to borderline cases.

6. Non-Monotonic Reasoning – Allows AI to revise conclusions when new evidence contradicts previous
assumptions.

Applications

• Natural Language Processing (NLP) – Helps AI interpret ambiguous human language.

• Legal Reasoning – Addresses vague legal definitions, such as "reasonable doubt".

• Medical Diagnosis – Handles cases where symptoms don't fit neatly into predefined categories.

• AI Decision-Making – Enables systems to make flexible judgments in uncertain environments.

Vagueness is distinct from ambiguity, where a term has multiple meanings (e.g., "bank" referring to a financial
institution or a riverbank).

FUZZY SETS AND LOGIC

Fuzzy sets and fuzzy logic are mathematical frameworks designed to handle uncertainty and imprecision, making them
useful in AI, control systems, and decision-making.

Fuzzy Sets

A fuzzy set is an extension of classical set theory where elements have degrees of membership rather than a strict
binary classification (true or false). In classical sets, an element either belongs to a set or it doesn’t. In fuzzy sets,
membership is represented by a membership function that assigns values between 0 and 1, indicating the degree to
which an element belongs to the set.

For example, in a fuzzy set of "Tall People":

• A person 6'5" might have a membership value of 1 (fully tall).

• A person 5'10" might have a membership value of 0.7 (somewhat tall).

• A person 5'5" might have a membership value of 0.3 (barely tall).
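A membership function for "Tall People" can be sketched as a piecewise-linear ramp. The breakpoints below are assumptions chosen for illustration, not values taken from the examples above.

```python
def tall_membership(height_inches):
    """Degree of membership in the fuzzy set 'Tall People':
    0 at or below 5'5" (65 in), 1 at or above 6'5" (77 in),
    rising linearly in between."""
    low, high = 65.0, 77.0
    if height_inches <= low:
        return 0.0
    if height_inches >= high:
        return 1.0
    return (height_inches - low) / (high - low)

print(tall_membership(77))  # 1.0  (fully tall)
print(tall_membership(71))  # 0.5  (somewhat tall)
print(tall_membership(60))  # 0.0  (not tall)
```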

Fuzzy Logic

Fuzzy logic is a multi-valued logic system that allows reasoning with imprecise or vague information. Unlike classical
Boolean logic, which operates on strict true (1) or false (0) values, fuzzy logic allows intermediate values.

Key Components of Fuzzy Logic:

1. Fuzzification – Converts crisp inputs into fuzzy values using membership functions.

2. Fuzzy Rules – "If-Then" rules define relationships between fuzzy variables.

3. Inference Engine – Applies fuzzy rules to derive conclusions.

4. Defuzzification – Converts fuzzy outputs back into crisp values.

Applications of Fuzzy Logic

• Control Systems – Used in washing machines, air conditioners, and traffic control.

• AI & Robotics – Helps robots make flexible decisions in uncertain environments.

• Medical Diagnosis – Assists in handling vague symptoms for disease prediction.

• Natural Language Processing (NLP) – Improves AI's understanding of ambiguous language.

Fuzzy logic is widely used in systems where human-like reasoning is needed.

🧰 Components of Fuzzy Logic System

Component – Description
Fuzzifier – Converts crisp input into fuzzy values
Rule Base – Set of fuzzy if-then rules
Inference Engine – Applies logic to fuzzy inputs
Defuzzifier – Converts fuzzy output into a crisp decision

📦 Real-World Applications


Application – Use of Fuzzy Logic

Washing Machines – Adjust wash cycles based on dirt level
Air Conditioning Systems – Set temperature levels smoothly
Autonomous Cars – Make driving decisions with vague inputs
Natural Language Processing – Interpret fuzzy human expressions

Fuzzy Logic

Fuzzy logic is a mathematical framework that deals with imprecise or vague information. Unlike classical Boolean
logic, which operates on strict true (1) or false (0) values, fuzzy logic allows intermediate values between 0 and 1,
representing degrees of truth.

Key Components of Fuzzy Logic:

1. Fuzzy Sets – Elements have degrees of membership rather than binary classification.

2. Membership Functions – Define how input values belong to fuzzy sets.

3. Fuzzy Rules – "If-Then" statements that describe relationships between fuzzy variables.

4. Inference Engine – Applies fuzzy rules to derive conclusions.

5. Defuzzification – Converts fuzzy outputs back into crisp values.

Why Study Fuzzy Logic?

• Real-world concepts are often imprecise — e.g., "hot", "tall", "fast".

• Fuzzy logic helps computers reason and make decisions in these gray areas.

• It’s widely used in control systems, pattern recognition, natural language processing, and robotics.

How it Works:

• Define fuzzy sets for vague concepts (e.g., “hot” temperature).

• Use membership functions to assign a degree of membership (0 to 1) to inputs.

• Apply fuzzy rules (IF-THEN) that use fuzzy values.

• Combine and process these rules with an inference engine.

• Produce a crisp output using defuzzification.
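The steps above can be sketched as a toy fan-speed controller; every membership function, rule, and constant here is invented.

```python
def hot(temp_c):
    """Fuzzification: degree to which the temperature is 'hot'."""
    return max(0.0, min(1.0, (temp_c - 20) / 15))

def cold(temp_c):
    """Degree to which the temperature is 'cold'."""
    return max(0.0, min(1.0, (25 - temp_c) / 15))

def fan_speed(temp_c):
    """Rules: IF hot THEN fan = 100%; IF cold THEN fan = 10%.
    Defuzzification: weighted average of the rule outputs."""
    w_hot, w_cold = hot(temp_c), cold(temp_c)
    return (w_hot * 100 + w_cold * 10) / (w_hot + w_cold)

print(fan_speed(35))  # 100.0 (clearly hot)
print(fan_speed(5))   # 10.0  (clearly cold)
```

In between the two extremes, both rules fire partially and the output glides smoothly between 10 and 100 — exactly the behavior crisp true/false rules cannot give.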

🌳 Study of Decision Trees

What is a Decision Tree?

• A Decision Tree is a flowchart-like tree structure used for classification or regression.

• Each internal node tests an attribute (feature).

• Each branch corresponds to an outcome of the test.

• Each leaf node represents a class label or a value.

Why Study Decision Trees?

• They provide a simple, interpretable model for making decisions.

• Used in machine learning for classification problems like spam detection, medical diagnosis, etc.

• Easy to visualize and understand.

How it Works:

• The tree is built by splitting data on attributes that best separate the classes.

• Common splitting criteria include Information Gain, Gini Index, or Chi-square.

• Once the tree is trained, you classify new data by traversing the tree from root to a leaf.

🔄 Comparing Fuzzy Logic and Decision Trees

Feature – Fuzzy Logic – Decision Trees

Handles – Vagueness and partial truth – Clear yes/no decisions
Nature – Approximate reasoning – Rule-based exact branching
Output – Degrees of membership (0 to 1) – Discrete classes or values
Explainability – Moderate (requires understanding membership functions) – High (easy to visualize)
Typical Use Cases – Control systems, fuzzy decision-making – Classification, regression, pattern recognition

Implementing decision trees


Implementing decision trees involves several key steps, from data preprocessing to model training and evaluation.
Here’s a breakdown of the essential aspects:

1. Data Preparation

• Feature Selection – Identify relevant attributes that influence the decision-making process.

• Handling Missing Data – Use techniques like imputation or removal to deal with incomplete records.

• Categorical vs. Numerical Data – Convert categorical variables into numerical representations if needed.

2. Splitting Criteria

• Entropy & Information Gain (ID3 Algorithm) – Measures the reduction in uncertainty after a split.

• Gini Index (CART Algorithm) – Evaluates the purity of a node by measuring class distribution.

• Variance Reduction (Regression Trees) – Used for continuous target variables.
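The first two criteria can be sketched directly from their definitions: entropy in bits, and Gini as one minus the sum of squared class proportions.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(p * log2(1 / p) for p in probs)

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(entropy(["yes", "yes", "no", "no"]))  # 1.0 (maximally impure two-class node)
print(gini(["yes", "yes", "no", "no"]))     # 0.5
print(entropy(["yes", "yes", "yes"]))       # 0.0 (a pure node has zero entropy)
```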

3. Tree Construction

• Recursive Partitioning – Splits data iteratively based on the best attribute.

• Stopping Criteria – Limits tree depth to prevent overfitting.

• Pruning – Removes unnecessary branches to improve generalization.

4. Model Evaluation

• Accuracy Metrics – Use precision, recall, and F1-score for classification trees.

• Cross-Validation – Ensures robustness by testing on multiple data subsets.

• Overfitting Prevention – Techniques like pruning and limiting depth help avoid excessive complexity.

5. Implementation in Python

Libraries like Scikit-learn provide built-in functions for decision tree implementation. You can check out a detailed
guide on Python Decision Tree Implementation for practical examples.

Forms of Learning in Machine Learning

Machine learning involves different types of learning methods based on how data is provided and how models improve
over time. The primary forms of learning include:

1. Supervised Learning – The model learns from labeled data, where each input has a corresponding correct
output.

2. Unsupervised Learning – The model identifies patterns and structures in unlabeled data without predefined
outputs.

3. Semi-Supervised Learning – A mix of supervised and unsupervised learning, where some data is labeled and
some is not.

4. Reinforcement Learning – The model learns by interacting with an environment and receiving rewards or
penalties.

Supervised Learning

Supervised learning is one of the most widely used machine learning techniques. It involves training a model using
labeled data, meaning each input is paired with the correct output.

Key Characteristics:

• Requires labeled datasets.

• The model learns a mapping function from inputs to outputs.

• Used for classification and regression tasks.

Types of Supervised Learning:

1. Classification – Predicts discrete labels (e.g., spam detection, image recognition).
2. Regression – Predicts continuous values (e.g., stock price prediction, temperature forecasting).

Examples of Supervised Learning:

• Email Spam Detection – Classifies emails as spam or not based on labeled examples.

• Medical Diagnosis – Predicts diseases based on patient symptoms and historical data.

• Speech Recognition – Converts spoken words into text using labeled speech data.

Supervised learning is widely used in AI applications where labeled data is available.

Learning decision trees


Learning decision trees is a fundamental approach in machine learning where a model learns to classify or predict
outcomes based on a hierarchical structure of decisions. Decision trees split data into branches based on feature values,
making them easy to interpret and visualize.

How Decision Trees Learn

1. Feature Selection – The algorithm selects the best attribute to split the data at each node.

2. Splitting Criteria – Methods like entropy (ID3), Gini index (CART), or variance reduction determine the best
split.

3. Recursive Partitioning – The tree grows by repeatedly splitting data into subsets.

4. Stopping Conditions – Limits like maximum depth or minimum samples per node prevent overfitting.

5. Pruning – Removes unnecessary branches to improve generalization.

Types of Decision Trees

• Classification Trees – Used for categorical predictions (e.g., spam detection).

• Regression Trees – Used for continuous predictions (e.g., stock price forecasting).

🌳 What is Decision Tree Representation?

A Decision Tree is a tree-like model used to make decisions or predictions based on input data. It represents a flow of
decisions, breaking down a complex decision into simpler, sequential tests on attributes.

Key Components of Decision Tree Representation:

Component – Description
Root Node – The top node representing the entire dataset; where decision-making starts.
Internal Nodes – Nodes that test an attribute (feature) of the data. Each node corresponds to a decision rule (e.g., “Is Age > 30?”).
Branches (Edges) – Outcomes of the test leading to the next nodes or leaves. Each branch corresponds to a possible value or range of the attribute.
Leaf Nodes (Terminal Nodes) – Final nodes that assign a class label (classification) or value (regression). No further splits happen here.


How it Works

• Start at the root node.

• At each internal node, test the attribute specified.

• Follow the branch corresponding to the attribute value for the given input.

• Repeat until a leaf node is reached.

• Output the decision or class label at the leaf.
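This traversal can be sketched with a tree stored as a nested dict; the toy tree and attribute values below are invented.

```python
# A toy decision tree: internal nodes name the attribute they test,
# leaves are plain class labels.
tree = {
    "attr": "weather",
    "branches": {
        "sunny": {"attr": "temperature",
                  "branches": {"high": "no", "low": "yes"}},
        "rainy": "no",
    },
}

def classify(node, example):
    """Walk from the root, following the branch for each tested
    attribute, until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attr"]]]
    return node

print(classify(tree, {"weather": "sunny", "temperature": "low"}))  # yes
```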

🌟 Expressiveness of Decision Trees

Expressiveness refers to the ability of decision trees to represent complex decision boundaries and patterns in data.

What Does Expressiveness Mean for Decision Trees?

• It’s about how well a decision tree can model relationships between input features and output classes or values.

• A more expressive tree can capture complex, non-linear patterns.

• Less expressive trees represent only simple, linear, or easy-to-separate data.

Factors Affecting Expressiveness

Factor – Effect on Expressiveness

Depth of the Tree – Deeper trees can represent more complex decisions.
Number of Nodes – More nodes allow finer splits and detailed decision paths.
Type of Splits – Handling categorical and continuous variables increases expressiveness.
Handling Interactions – Trees can capture interactions between features by hierarchical splitting.

Strengths of Decision Trees in Expressiveness

• Can represent any Boolean function given enough depth.

• Handle non-linear decision boundaries naturally.

• Capture feature interactions via sequential splits.

• Intuitive representation of complex rules.

Limitations

• Trees that are too deep can overfit (model noise rather than signal).

• May require large trees for very complex functions, impacting interpretability.

• Sometimes struggle with very smooth or highly complex decision boundaries compared to other models (like neural
networks).

🌳 Inducing Decision Trees from Examples

Inducing a decision tree means building the tree automatically from a set of training examples (data points with
features and known labels).

What is the Goal?

• Given a dataset of examples with input features and output labels, create a decision tree that classifies or
predicts correctly.

• The tree should generalize well to unseen data, not just memorize training examples.

Step-by-Step Process of Inducing Decision Trees

1. Start with the Entire Dataset

• Begin at the root node with all training examples.

2. Select the Best Attribute to Split

• Choose the feature that best separates the data into distinct classes.

• Common selection criteria:

◦ Information Gain (based on entropy reduction)

◦ Gini Index (measure of impurity)

◦ Gain Ratio, Chi-square

3. Split the Dataset

• Divide data into subsets according to the chosen attribute’s possible values or split points.

• For continuous features, pick the best threshold.

4. Create Child Nodes

• Each subset becomes a child node.

5. Repeat Recursively

• For each child node, repeat:

◦ If all examples belong to the same class, make a leaf node with that class.

◦ Otherwise, select the best attribute to split further.

6. Stopping Conditions

• Stop splitting when:

◦ All examples have the same class.

◦ No attributes left.

◦ Number of examples is too small.

◦ Maximum tree depth reached.

7. Prune the Tree (Optional)

• To avoid overfitting, prune branches that do not improve accuracy on validation data.

Example Illustration

Suppose dataset with attributes: Weather (Sunny, Rainy), Temperature (High, Low), and label Play (Yes/No).

• Start: All examples at root.

• Choose attribute with highest information gain, say Weather.

• Split dataset into subsets: Weather=Sunny and Weather=Rainy.

• Recursively split subsets on next best attributes until pure leaves.
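The attribute choice in this illustration can be checked numerically. The four rows below are an invented dataset consistent with the example, in which Weather alone determines Play.

```python
from math import log2

# Invented rows: (Weather, Temperature, Play)
data = [
    ("Sunny", "High", "Yes"), ("Sunny", "Low", "Yes"),
    ("Rainy", "High", "No"),  ("Rainy", "Low", "No"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((labels.count(c) / n) * log2(n / labels.count(c))
               for c in set(labels))

def info_gain(rows, col):
    """Entropy reduction from splitting rows on attribute column `col`."""
    gain = entropy([r[-1] for r in rows])
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(info_gain(data, 0))  # Weather perfectly separates Play: gain = 1.0
print(info_gain(data, 1))  # Temperature tells us nothing: gain = 0.0
```

ID3 would therefore split on Weather first, and both resulting subsets are already pure leaves.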

Summary Table

Step – Description
Start – Use all training examples at root
Attribute Selection – Pick feature that best splits the data
Splitting – Partition data based on selected attribute
Recursion – Repeat for each subset
Stopping Criteria – Stop when nodes are pure or no further split possible
Pruning (optional) – Remove branches that cause overfitting

Common Algorithms for Decision Tree Induction

• ID3 (uses Information Gain)

• C4.5 (improvement of ID3, handles continuous attributes, pruning)

• CART (Classification and Regression Trees, uses Gini Index)

