DECISION TREES
• Decision Tree is a supervised learning technique that can be used for both classification and
regression problems, but it is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
• In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node.
• Decision nodes are used to make decisions and have multiple branches, whereas Leaf
nodes are the outputs of those decisions and do not contain any further branches.
• The decisions or the tests are performed on the basis of features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which
expands into further branches and constructs a tree-like structure.
Example:
• One of the reasons that decision trees are popular is that we can turn them into a set of
logical disjunctions (if ... then rules) that then go into program code very simply.
Ex: if there is a party, then go to it; if there is not a party and you have an urgent deadline,
then study.
Constructing Decision Trees:
1. Start with the Entire Dataset
Consider all training data as the root.
2. Choose the Best Feature to Split
• Evaluate each feature using a splitting criterion (explained below).
• Select the feature that best separates the data.
3. Split the Dataset
Based on the selected feature, divide the dataset into subsets.
4. Recurse on Each Subset
Repeat the process recursively on each subset, treating it as a new node.
5. Stopping Criteria
Stop if:
o All instances in a node belong to the same class.
o Maximum tree depth is reached.
o Minimum number of samples in a node is reached.
o No further gain from splitting.
6. Assign Output
• For classification: Assign the most common class in the node.
• For regression: Assign the average of the target values in the node (a minimal code
sketch of these steps follows this list).
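As a rough illustration of the steps above, a decision tree can be fitted in a few lines with scikit-learn (assuming scikit-learn is available; the tiny party/deadline dataset below is invented for this example, and the hyperparameters map onto the stopping criteria above):

# Minimal sketch: fitting and querying a decision tree classifier with scikit-learn.
from sklearn.tree import DecisionTreeClassifier

# Invented toy data: [party? (0/1), deadline urgency (0=none, 1=near, 2=urgent)]
X = [[1, 0], [1, 2], [1, 1], [0, 2], [0, 1], [0, 0]]
y = ["party", "party", "party", "study", "study", "relax"]

# max_depth and min_samples_leaf correspond to the stopping criteria above
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=1)
clf.fit(X, y)

print(clf.predict([[0, 2]]))  # no party, urgent deadline -> most likely "study"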
Main Criteria Used for Splitting Nodes
1. Gini Impurity
Gini Impurity measures how mixed the classes are in a node. A lower value means a purer node.
It is used in the CART algorithm.
2. Entropy / Information Gain
Information Gain tells us how much useful information a feature gives. A good feature split
reduces entropy. It is used in the ID3 algorithm.
3. Chi-Square
Chi-Square checks if the split between data groups is meaningful or just due to chance.
4. Reduction in Variance
This is used in regression trees. It chooses splits that reduce the difference (variance) within the
data.
5. Gain Ratio
Gain Ratio improves Information Gain by adjusting for features that have many different values.
It is used in the C4.5 algorithm.
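For concreteness, the first two criteria can be computed directly from their standard definitions; a small numpy sketch (the label array is made up for illustration):

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum(p * log2(p)) over class proportions p
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Entropy of the parent node minus the weighted entropy of the two children
    n = len(parent)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - child

labels = np.array(["yes", "yes", "yes", "no", "no"])
print(gini(labels), entropy(labels))
print(information_gain(labels, labels[:3], labels[3:]))  # a perfectly pure split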
CLASSIFICATION AND REGRESSION TREES (CART)
CART (Classification and Regression Trees) is a machine learning algorithm that builds decision trees.
It is used for two types of problems:
• Classification – when we want to predict a category or label
• Regression – when we want to predict a numerical value
CART uses a binary tree structure, meaning each node splits into two branches.
Classification in Trees
This is used when the output we want is a class label (like “yes” or “no”).
Example:
• Predicting whether an email is spam or not spam
• Diagnosing whether a person has disease A or not
Working
• At each node, CART checks all features and chooses the one that best separates the data.
• The goal is to group similar items together in each branch.
• This process continues until each branch (called a leaf node) has mostly one class.
Gini Impurity (for classification)
Gini Impurity is a measure used to find the best feature to split the data.
What it means:
• If all data at a node belongs to one class → Gini = 0 (perfectly pure)
• If data is equally mixed among classes → Gini is higher
• Formula: Gini = 1 − Σ p_i²
where p_i is the probability (proportion) of class i at that node.
CART chooses the split that has the lowest Gini impurity, meaning it creates purer
groups.
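A hypothetical sketch of how a CART-style classifier could score candidate splits with Gini impurity (written from the definition above, not from any library's internals; the data is invented):

import numpy as np

def gini(labels):
    # Gini impurity of one node: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_split_gini(x, y, threshold):
    # Weighted Gini impurity of the two children of the binary split x <= threshold
    left, right = y[x <= threshold], y[x > threshold]
    n = len(y)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array(["not spam", "not spam", "spam", "spam", "spam"])
# CART keeps the threshold with the lowest weighted Gini (here 2.5 gives 0.0)
for t in [1.5, 2.5, 3.5]:
    print(t, weighted_split_gini(x, y, t))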
Regression in Trees
This is used when the output is a numerical value (like marks, price, or temperature).
Example:
• Predicting the price of a house based on area, location, etc.
• Estimating the rainfall for next week
Working
• CART splits the data again and again to form smaller and smaller groups.
• It finds the best split by checking how much the error reduces after the split.
Residual Sum of Squares (for regression)
• This is the error measure used for regression trees.
• It calculates how far the predicted values are from the actual values.
• Formula: SSE = Σ (y_i − ŷ_i)²
Where:
• y_i is the actual value
• ŷ_i is the predicted value
CART tries to find a split that gives the lowest error (minimum SSE).
Residual Reduction
Residual Reduction is how much the error decreases after a split.
What it means:
• If a split greatly reduces the error → it is a good split
• If it doesn't help much → CART won’t split that way
CART keeps splitting until:
• The tree is too deep
• Or a node has very few data points
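As a sketch of the regression case, the SSE and the residual reduction of one candidate split can be computed directly from the definitions above (the toy numbers are invented):

import numpy as np

def sse(values):
    # Residual sum of squares around the node's prediction (the mean of the node)
    return np.sum((values - values.mean()) ** 2)

def residual_reduction(x, y, threshold):
    # How much the total error drops if the node is split at this threshold
    left, right = y[x <= threshold], y[x > threshold]
    return sse(y) - (sse(left) + sse(right))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # e.g. house area
y = np.array([10.0, 12.0, 30.0, 32.0, 35.0])  # e.g. house price
for t in [1.5, 2.5, 3.5]:
    print(t, residual_reduction(x, y, t))     # the split with the largest reduction is kept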
ENSEMBLE LEARNING
• Ensemble learning refers to the approach of combining multiple ML models to produce a
more accurate and robust prediction compared to any individual model.
• The conventional ensemble methods include bagging, boosting, and stacking-based
methods.
Boosting:
• Boosting is an ensemble technique that combines multiple weak learners to create a
strong learner.
• The weak models are trained in sequence, so that each successive model tries to correct
the errors of the previous one until the training dataset is predicted as accurately as
possible.
• One of the most well-known boosting algorithms is AdaBoost (Adaptive Boosting).
AdaBoost (Adaptive Boosting)
• AdaBoost (Adaptive Boosting) is an ensemble learning method used for classification and
regression tasks.
• It works by training weak classifiers one after another, giving more focus (weight) to the
misclassified data in each round.
• The final model is a weighted combination of all weak classifiers, where more accurate
models get higher weight.
Steps in AdaBoost:
1. Weight Initialization
Start by giving equal importance to all training examples.
2. Model Training
Train a simple model on the data, using the current importance (weights) of each example.
3. Weighted Error Calculation
Compute the weighted error: count how many examples were classified wrongly, with each
example weighted by its current importance.
4. Model Weight Calculation
If the model does well, give it more power (weight) in the final decision; if it does badly, give
it less.
5. Update Instance Weights
Increase the importance (weight) of wrongly classified examples, so the next model tries
harder on them.
6. Repeat
Repeat steps 2 to 5 multiple times, each time focusing more on the hard examples.
7. Final Model Creation
Combine all the models into one strong model, giving more say to the better ones.
8. Classification
For new data, the final model makes a prediction by taking a vote from all the simple
models.
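These steps are what scikit-learn's AdaBoostClassifier implements; a minimal usage sketch (the toy data is invented; by default the weak learners are depth-1 decision trees):

from sklearn.ensemble import AdaBoostClassifier

# Invented toy data
X = [[0, 0], [1, 1], [1, 0], [0, 1], [2, 2], [2, 1]]
y = [0, 1, 0, 1, 1, 1]

# n_estimators is the number of boosting rounds (steps 2-5 repeated)
model = AdaBoostClassifier(n_estimators=50)
model.fit(X, y)
print(model.predict([[1, 2]]))  # weighted vote of all the weak learners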
Bagging:
• Bagging (Bootstrap Aggregating) is an ensemble learning technique that builds multiple
models on different random subsets of the training data (with replacement).
• It helps to reduce variance and prevent overfitting, especially in high-variance models
like decision trees.
• Bagging is used for both classification and regression tasks and makes the final prediction
by combining outputs from all base models.
Steps:
1. Bootstrap Sampling
Randomly create multiple subsets of the training data with replacement.
2. Train Base Models
Train a separate model (usually the same type) on each subset independently.
3. Aggregate Predictions
Combine the results:
Use majority vote for classification
Use average for regression
4. Out-of-Bag Evaluation
Use the samples not included in the training subset (OOB samples) to estimate model
accuracy.
5. Final Prediction
Output the aggregated prediction from all models for the final result.
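A brief scikit-learn sketch of these steps (invented toy data; by default the base models are decision trees, and oob_score=True performs the out-of-bag evaluation of step 4):

from sklearn.ensemble import BaggingClassifier

# Invented toy data
X = [[0, 0], [1, 1], [1, 0], [0, 1], [2, 2], [2, 1], [3, 3], [3, 2]]
y = [0, 1, 0, 1, 1, 1, 1, 0]

# 25 base models, each trained on a bootstrap sample (drawn with replacement)
bag = BaggingClassifier(n_estimators=25, oob_score=True, random_state=0)
bag.fit(X, y)
print(bag.oob_score_)         # accuracy estimated on the out-of-bag samples
print(bag.predict([[2, 3]]))  # majority vote of the 25 base models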
Random Forest
Definition:
Random Forest is an ensemble method based on Bagging that builds a collection of Decision
Trees, using random subsets of data and features, to improve prediction and reduce overfitting.
Steps:
1. Bootstrap Sampling
Create random subsets of training data with replacement, like in Bagging.
2. Random Feature Selection
When building each Decision Tree, at each split, use a random subset of features (not all
features).
3. Train Multiple Decision Trees
Build many Decision Trees, each trained on different data and features.
4. Aggregate Predictions
Combine the predictions:
Use majority vote for classification
Use average for regression
5. Final Prediction
Predict the final output by combining the outputs of all the trees.
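A minimal scikit-learn sketch of these steps (toy data invented; max_features controls the random feature subset considered at each split):

from sklearn.ensemble import RandomForestClassifier

# Invented toy data with three features
X = [[0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 0], [2, 2, 1], [2, 1, 0]]
y = [0, 1, 0, 1, 1, 1]

# 100 trees, each grown on a bootstrap sample; at every split only a random
# subset of the features (here sqrt of the feature count) is considered
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.predict([[1, 2, 0]]))  # majority vote over the 100 trees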
Different Ways to Combine Classifiers
1. Majority Voting (for Classification)
If the number of classifiers is odd and they are independent, the label chosen by most
classifiers is used as the final output.
2. Mean (for Regression)
For regression tasks, we usually take the average (mean) of all model outputs.
3. Median (for Robust Regression)
Since the mean can be affected by outliers, using the median (middle value) is often
better for more stable predictions.
4. Bagging Uses Median
When the median is used to combine predictions, it leads to a more robust version of
bagging, sometimes called "robust bagging".
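A short numpy sketch of these combination rules, assuming the individual model predictions are already available (all numbers invented):

import numpy as np

# Hypothetical class predictions from three classifiers for four test points
class_preds = np.array([[0, 1, 1, 0],
                        [0, 1, 0, 0],
                        [1, 1, 1, 0]])
# 1. Majority vote: most frequent label in each column
majority = np.array([np.bincount(col).argmax() for col in class_preds.T])
print(majority)  # -> [0 1 1 0]

# Hypothetical regression outputs from three models for four test points
reg_preds = np.array([[2.0, 3.0, 10.0, 4.0],
                      [2.2, 2.8,  9.5, 4.1],
                      [2.1, 3.1, 50.0, 3.9]])  # note the outlier 50.0
print(reg_preds.mean(axis=0))        # 2. mean: pulled towards the outlier
print(np.median(reg_preds, axis=0))  # 3./4. median: robust to the outlier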
BASIC STATISTICS
1. Mean
The mean is the average value of a dataset.
It is calculated by adding all the values in the dataset and then dividing the total by the
number of values.
The mean is a useful measure of central tendency, but it is sensitive to outliers, meaning
extreme values can affect it significantly.
2. Median
The median is the middle value of a dataset when all the values are arranged in order.
If the number of values is odd, the median is the exact middle value.
If the number of values is even, the median is calculated by taking the average of the two
middle values.
The median is a reliable measure of central tendency because it is not affected by outliers or
extreme values.
3. Mode
The mode is the value that occurs most frequently in a dataset.
It is useful for identifying the most common value in the data.
If two values share the highest frequency, the dataset is called bimodal; with three it is called
trimodal, and with more than three it is called multimodal.
The mode may not be a helpful measure if all values are different or if the dataset has a wide
range of values.
4. Variance
Variance measures how much the values in a dataset differ from the mean.
A high variance indicates that the data points are spread out widely, while a low variance
means they are closely clustered around the mean.
5. Standard Deviation
The standard deviation is the square root of the variance.
It tells us how much the values in the dataset typically differ from the mean, and it is
expressed in the same units as the data.
A small standard deviation means the data is closer to the mean, while a large one means
the data is more spread out.
6. Covariance
Covariance shows the relationship between two variables, indicating how one variable
changes when the other changes.
A positive covariance means both variables tend to increase or decrease together.
A negative covariance means that as one variable increases, the other tends to decrease.
Covariance is scale-dependent, so its magnitude depends on the units of the variables and
cannot be compared directly across datasets.
7. Mahalanobis Distance
The Mahalanobis Distance is a statistical measure that shows how far a point is from the
mean of a distribution, considering the correlations between variables.
It is used in applications like outlier detection, clustering, and classification.
The formula is: D_M(x) = √((x − μ)ᵀ Σ⁻¹ (x − μ)),
where x is the data point, μ is the mean vector of the distribution, and Σ is its covariance matrix.
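A small numpy sketch of these statistics (the data is made up; the Mahalanobis distance follows the formula above):

import numpy as np

data = np.array([[2.0, 4.0], [3.0, 6.1], [4.0, 8.2], [5.0, 9.9], [100.0, 10.0]])
x = data[:, 0]

print(np.mean(x))             # mean: pulled up strongly by the outlier 100
print(np.median(x))           # median: unaffected by the outlier
print(np.var(x), np.std(x))   # variance and its square root, the standard deviation

vals, counts = np.unique([1, 2, 2, 3, 2], return_counts=True)
print(vals[counts.argmax()])  # mode: the most frequent value (2)

cov = np.cov(data, rowvar=False)  # 2x2 covariance matrix of the two columns
print(cov)

# Mahalanobis distance of the first point from the mean of the data
mu = data.mean(axis=0)
diff = data[0] - mu
print(np.sqrt(diff @ np.linalg.inv(cov) @ diff))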
GAUSSIAN MIXTURE MODELS (GMM)
Gaussian Mixture Model (GMM) is a probabilistic model that assumes data is generated from a
mixture of several Gaussian (normal) distributions, each with its own mean and variance.
Purpose:
GMM is mainly used for unsupervised learning, especially in clustering, where we don’t have
labelled data. It helps to group similar data points based on the likelihood that they belong to
the same Gaussian distribution.
How GMM Works:
1. Initialization
Choose the number of Gaussians (clusters) K and give each one an initial mean, variance, and weight.
2. Expectation Step (E-Step)
For each data point, calculate the probability that it belongs to each Gaussian.
3. Maximization Step (M-Step)
Update the parameters (mean, variance, and weight) of each Gaussian to maximize the
likelihood of the data.
4. Repeat
Repeat E-Step and M-Step until the model converges, i.e., changes are very small.
Notation:
• K: Number of Gaussian components
• N: Number of data points
• D: Dimensionality of the data
GMM Parameters
Means (μ) → The center of each Gaussian component
Covariance (Σ) → The shape and spread of each component
Weights (π) → The probability (importance) of each component
Model Training
• Training a GMM involves estimating its parameters (means, covariances, and weights) from the available data.
• The Expectation-Maximization (EM) technique is often employed, alternating between the
Expectation (E) and Maximization (M) steps until convergence.
Expectation-Maximization:
• During the E step, the model calculates the probability of each data point belonging to each
Gaussian component.
• The M step then adjusts the model’s parameters based on these probabilities.
Clustering and Density Estimation:
• Post-training, GMMs cluster data points based on the highest posterior probability.
• They are also used for density estimation, assessing the probability density at any point in the
feature space.
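A minimal scikit-learn sketch of fitting a GMM with EM and then using it for clustering and density estimation (the two blobs of points are generated artificially for illustration):

import numpy as np
from sklearn.mixture import GaussianMixture

# Two artificial blobs of 2-D points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
               rng.normal(loc=[3, 3], scale=0.5, size=(50, 2))])

# K = 2 Gaussian components, fitted with the EM algorithm described above
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

print(gmm.means_)    # the means (mu) of the two components
print(gmm.weights_)  # the mixing weights (pi)
labels = gmm.predict(X)                    # cluster by highest posterior probability
density = gmm.score_samples([[1.5, 1.5]])  # log probability density at a point
print(labels[:5], density)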
Applications:
• Customer segmentation
• Anomaly detection
• Image segmentation
• Speaker identification
Nearest Neighbour Methods:
K-Nearest Neighbors Algorithm:
What is KNN? – Simple Points
• K-Nearest Neighbors (KNN) is a supervised learning algorithm used for both classification
and regression.
• It is a lazy learning algorithm, meaning it does not build a model during training — it
simply stores the data.
• KNN makes predictions only when a new input is given.
• The algorithm assumes that similar data points are located close together in the feature
space.
To make a prediction:
• It finds the 'K' nearest neighbors (closest data points) to the input.
• For classification, it uses majority voting of the neighbors' labels.
• For regression, it takes the average of the neighbors' values.
Steps in KNN Algorithm:
Step 1: Choose the value of K
K is the number of nearest neighbors to look at when making a prediction.
A good value of K is chosen by testing for the best accuracy.
Step 2: Calculate Distance
Use Euclidean distance (or other distance measures) to find how close each training point is
to the new (target) point.
Step 3: Find Nearest Neighbors
Pick the K training points that are closest to the target point based on the calculated
distances.
Step 4: Make Prediction
For Classification:
→ Use majority voting — the class that appears the most among the K neighbors is selected.
For Regression:
→ Take the average of the values of the K nearest neighbors to predict the result.
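The four steps can be written almost directly in numpy; a minimal classification sketch (toy data invented):

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the K closest training points
    nearest = np.argsort(distances)[:k]
    # Step 4 (classification): majority vote among the K neighbours
    votes = y_train[nearest]
    values, counts = np.unique(votes, return_counts=True)
    return values[counts.argmax()]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["A", "A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # expected "A"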
K MEANS ALGORITHM
• K-Means is an unsupervised learning algorithm used to group unlabelled data into K
clusters based on similarity.
• The value of K is pre-defined and represents the number of clusters the algorithm should
form.
• It works by finding centroids (center points) and assigning data points to the nearest
centroid.
• The algorithm is iterative, repeatedly adjusting the centroids and cluster assignments until
the results stabilize.
• K-Means is widely used in tasks like customer segmentation, image compression, and
pattern recognition because it is simple, fast, and effective.
Main Purpose:
The K-Means algorithm tries to group similar data points together by minimizing the distance
between the points and their cluster center (called a centroid).
Working
1. Choose the number of clusters (K)
Decide how many groups (clusters) you want to divide the data into.
2. Initialize K centroids randomly
Select K random points from the dataset to serve as the initial centers of the clusters.
3. Assign each data point to the nearest centroid
Measure the distance (usually Euclidean distance) from each point to each centroid and
assign it to the closest one.
4. Update the centroids
Recalculate the centroid of each cluster by taking the average position of all the points in
that cluster.
5. Repeat steps 3 and 4
Continue reassigning points and updating centroids until the centroids do not change
much, or a maximum number of iterations is reached.
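The working steps above can be sketched in a few lines of numpy (a simplified illustration; a library implementation such as scikit-learn's KMeans adds better initialization and other refinements):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: move each centroid to the mean of the points assigned to it
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)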