5. Decision Trees
● Decision trees are non-parametric models used for both regression and classification tasks
● This notebook demonstrates how to use decision trees for classification
● Decision trees are constructed from only two elements - nodes and branches
● Decision trees are based on recursion
○ A function that calls itself until some exit condition is met
○ The algorithm is built by recursively evaluating different features
and using the best split feature and criteria at each node
○ You'll learn how the best split feature and criteria are calculated in
the math section
● There are multiple types of nodes a decision tree can have, all of which are presented in the following figure:
● Root node - node at the top of the tree, contains a feature that best splits the data
(a single feature that alone classifies the target variable most accurately)
● Decision nodes - nodes where the variables are evaluated. These nodes have
arrows pointing to them and away from them
● Leaf nodes - final nodes at which the prediction is made
How to determine the root node
● Check how every input feature classifies the target variable independently
● If none of the features classifies the target variable with 100% accuracy, we can consider the splits impure
● The entropy metric can be used to calculate impurity
○ Formula discussed later
○ For a binary target, values range from 0 (best) to 1 (worst)
● The feature with the lowest entropy (impurity) is used as the root node (see the sketch below)
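● A minimal sketch of this idea on a hypothetical toy dataset with two made-up binary features (the entropy helper mirrors the formula covered in the math section below):

import numpy as np

def entropy(labels):
    # Shannon entropy of an integer label array
    counts = np.bincount(labels)
    probs = counts[counts > 0] / len(labels)
    return -np.sum(probs * np.log2(probs))

# Hypothetical toy data: 8 samples with a binary target
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
feature_a = np.array([0, 0, 0, 0, 1, 1, 1, 0])  # separates the classes almost perfectly
feature_b = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # tells us nothing about the target

for name, feature in [('feature_a', feature_a), ('feature_b', feature_b)]:
    left, right = y[feature == 0], y[feature == 1]
    # Weighted average entropy (impurity) of the two groups the feature produces
    impurity = len(left) / len(y) * entropy(left) + len(right) / len(y) * entropy(right)
    print(name, np.round(impurity, 5))
# feature_a produces the lower impurity, so it would become the root node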
Training process
● Determine the root node (discussed earlier)
● Calculate the Information gain for a single split
○ Formula discussed later
○ The higher the gain, the better the split
● Do a greedy search (a minimal runnable sketch follows this list)
○ Go over all input features and their unique values (thresholds)
○ Calculate the information gain for every feature/threshold combination
○ Save the best split feature and best split threshold for every node
○ Build the tree recursively
○ Some stopping criteria should be applied when doing so
■ Think of it as an exit condition of a recursive function
■ This could be a maximum depth, a minimum number of samples at a node...
○ If at a leaf node, return the prediction (the most common value)
■ You'll know you're at a leaf node if a stopping criterion has been met or if the split is pure
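● A minimal, runnable sketch of the greedy, recursive training loop described above, simplified to a single input feature (all names here are made up for illustration; the full class-based implementation is developed in the Implementation section below):

import numpy as np
from collections import Counter

def entropy(labels):
    # Shannon entropy of an integer label array (formula covered in the math section)
    counts = np.bincount(labels)
    probs = counts[counts > 0] / len(labels)
    return -np.sum(probs * np.log2(probs))

def build_tree(x, y, depth=0, max_depth=2, min_samples=2):
    # Stopping criteria (the exit conditions of the recursion) -> leaf with the most common label
    if depth >= max_depth or len(y) < min_samples or entropy(y) == 0:
        return Counter(y).most_common(1)[0][0]
    # Greedy search: try every unique value of the feature as a threshold
    best_gain, best_threshold = 0, None
    for threshold in np.unique(x):
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        # Information gain = parent entropy minus weighted average entropy of the children
        weighted = len(left) / len(y) * entropy(left) + len(right) / len(y) * entropy(right)
        gain = entropy(y) - weighted
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    if best_threshold is None:
        # No split improves purity -> leaf node
        return Counter(y).most_common(1)[0][0]
    # Recurse into the two subsets created by the best threshold
    mask = x <= best_threshold
    return {
        'threshold': best_threshold,
        'left': build_tree(x[mask], y[mask], depth + 1, max_depth, min_samples),
        'right': build_tree(x[~mask], y[~mask], depth + 1, max_depth, min_samples),
    }

x = np.array([1, 2, 3, 10, 11, 12])
y = np.array([0, 0, 0, 1, 1, 1])
print(build_tree(x, y))  # a nested dict with the learned threshold and the two leaf predictions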
Prediction process
● Recursively traverse the tree
● At each node, determine the direction of the traversal (left or right) based on the input data
● When a leaf node is reached, the most common value stored in it is returned (see the sketch below)
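● A minimal sketch of the traversal, reusing the nested-dict tree format from the training sketch above (a hypothetical structure; the class-based version appears in the Implementation section):

def predict_one(tree, value):
    # Leaf nodes are stored as plain labels; internal nodes are dicts with a threshold
    if not isinstance(tree, dict):
        return tree
    if value <= tree['threshold']:
        return predict_one(tree['left'], value)  # go left
    return predict_one(tree['right'], value)     # go right

toy_tree = {'threshold': 3, 'left': 0, 'right': 1}  # shape produced by the training sketch
print(predict_one(toy_tree, 2), predict_one(toy_tree, 11))  # prints: 0 1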
Math behind
● Essentially, you only need to implement two formulas
○ Entropy
○ Information gain
● Entropy
○ Measures the impurity of a split
○ Calculated at the node level
○ For a binary target, ranges between 0 (pure) and 1 (impure)
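○ For reference, the formula the Python example below implements, where $p_i$ is the fraction of samples belonging to class $i$ at the node:

$$E(s) = -\sum_{i} p_i \log_2(p_i)$$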
● Example:
● Example in Python:
In [1]:
import numpy as np
from collections import Counter
In [2]:
def entropy(s):
    # Count occurrences of each class label
    counts = np.bincount(s)
    # Convert counts to class probabilities
    percentages = counts / len(s)
    # Sum p * log2(p) over the classes that are present
    entropy = 0
    for pct in percentages:
        if pct > 0:
            entropy += pct * np.log2(pct)
    return -entropy
In [3]:
s = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
print(f'Entropy: {np.round(entropy(s), 5)}')
Entropy: 0.88129
● Information gain:
○ The parent node's entropy minus the weighted average entropy of the child nodes created by a split
○ The higher the information gain, the better the decision split is
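○ For reference, the formula the Python example below implements, where $w_{left}$ and $w_{right}$ are the fractions of the parent's samples sent to the left and right child:

$$IG = E(parent) - \left( w_{left} \cdot E(left) + w_{right} \cdot E(right) \right)$$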
● Example:
● Example in Python:
In [4]:
def information_gain(parent, left_child, right_child):
    # Fractions of parent samples that end up in each child
    num_left = len(left_child) / len(parent)
    num_right = len(right_child) / len(parent)
    # Parent entropy minus the weighted average entropy of the children
    gain = entropy(parent) - (num_left * entropy(left_child) + num_right * entropy(right_child))
    return gain
In [5]:
parent = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
left_child = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
right_child = [0, 0, 0, 0, 1, 1, 1, 1]
print(f'Information gain: {np.round(information_gain(parent, left_child, right_child), 5)}')
Information gain: 0.18094
Recursion Crash Course
● A lot of the decision tree implementation boils down to recursion
● Put simply, a recursive function is a function that calls itself
● Some "exit condition" is required so that the function doesn't call itself indefinitely
○ It's common to write it at the top of the function
● Let's take a look at the simplest example - a recursive function that returns a
factorial of an integer
In [16]:
def factorial(x):
    # Exit condition - stops the recursion (also covers factorial(0))
    if x <= 1:
        return 1
    return x * factorial(x - 1)

print(f'Factorial of 5 is {factorial(5)}')
Factorial of 5 is 120
● Recursion is needed in decision tree classifiers to build additional nodes of a tree
until some condition (exit) is met
Implementation
● We'll need two classes
○ Node - implements a single node of a decision tree
○ DecisionTree - implements the algorithm
● The Node class is here to store the data about the feature, threshold, data going left
and right, information gain, and the leaf node value
○ All are initially set to None
○ The leaf node value is available only for leaf nodes
In [6]:
class Node:
    '''
    Helper class which implements a single tree node.
    '''
    def __init__(self, feature=None, threshold=None, data_left=None, data_right=None, gain=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.data_left = data_left
        self.data_right = data_right
        self.gain = gain
        self.value = value
● The DecisionTree class contains a bunch of methods
● The constructor holds values for min_samples_split and max_depth. These are
hyperparameters. The first one is used to specify a minimum number of samples
required to split a node, and the second one specifies a maximum depth of a tree.
Both are used in recursive functions as exit conditions
● The _entropy(s) function calculates the impurity of an input vector s
● The _information_gain(parent, left_child, right_child) calculates the information gain
value of a split between a parent and two children
● The _best_split(X, y) function calculates the best splitting parameters for input
features X and a target variable y
○ It does so by iterating over every column in X and every threshold
value in every column to find the optimal split using information
gain
● The _build(X, y, depth) function recursively builds a decision tree until the stopping criteria are met (the hyperparameters in the constructor)
● The fit(X, y) function calls the _build() function and stores the built tree to the
constructor
● The _predict(x) function traverses the tree to classify a single instance
● The predict(X) function applies the _predict() function to every instance in matrix X
In [7]:
class DecisionTree:
    '''
    Class which implements a decision tree classifier algorithm.
    '''
    def __init__(self, min_samples_split=2, max_depth=5):
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.root = None

    @staticmethod
    def _entropy(s):
        '''
        Helper function, calculates entropy from an array of integer values.

        :param s: list
        :return: float, entropy value
        '''
        # Convert to integers to avoid runtime errors
        counts = np.bincount(np.array(s, dtype=np.int64))
        # Probabilities of each class label
        percentages = counts / len(s)

        # Calculate entropy
        entropy = 0
        for pct in percentages:
            if pct > 0:
                entropy += pct * np.log2(pct)
        return -entropy

    def _information_gain(self, parent, left_child, right_child):
        '''
        Helper function, calculates information gain from a parent and two child nodes.

        :param parent: list, the parent node
        :param left_child: list, left child of a parent
        :param right_child: list, right child of a parent
        :return: float, information gain
        '''
        num_left = len(left_child) / len(parent)
        num_right = len(right_child) / len(parent)

        # One-liner which implements the previously discussed formula
        return self._entropy(parent) - (num_left * self._entropy(left_child) + num_right * self._entropy(right_child))

    def _best_split(self, X, y):
        '''
        Helper function, calculates the best split for given features and target.

        :param X: np.array, features
        :param y: np.array or list, target
        :return: dict
        '''
        best_split = {}
        best_info_gain = -1
        n_rows, n_cols = X.shape

        # For every dataset feature
        for f_idx in range(n_cols):
            X_curr = X[:, f_idx]
            # For every unique value of that feature
            for threshold in np.unique(X_curr):
                # Construct a dataset and split it to the left and right parts
                # Left part includes records lower or equal to the threshold
                # Right part includes records higher than the threshold
                df = np.concatenate((X, y.reshape(1, -1).T), axis=1)
                df_left = np.array([row for row in df if row[f_idx] <= threshold])
                df_right = np.array([row for row in df if row[f_idx] > threshold])

                # Do the calculation only if there's data in both subsets
                if len(df_left) > 0 and len(df_right) > 0:
                    # Obtain the value of the target variable for subsets
                    y = df[:, -1]
                    y_left = df_left[:, -1]
                    y_right = df_right[:, -1]

                    # Calculate the information gain and save the split parameters
                    # if the current split is better than the previous best
                    gain = self._information_gain(y, y_left, y_right)
                    if gain > best_info_gain:
                        best_split = {
                            'feature_index': f_idx,
                            'threshold': threshold,
                            'df_left': df_left,
                            'df_right': df_right,
                            'gain': gain
                        }
                        best_info_gain = gain
        return best_split

    def _build(self, X, y, depth=0):
        '''
        Helper recursive function, used to build a decision tree from the input data.

        :param X: np.array, features
        :param y: np.array or list, target
        :param depth: current depth of a tree, used as a stopping criterion
        :return: Node
        '''
        n_rows, n_cols = X.shape

        # Check to see if a node should be a leaf node
        if n_rows >= self.min_samples_split and depth <= self.max_depth:
            # Get the best split
            best = self._best_split(X, y)
            # If the split isn't pure
            if best['gain'] > 0:
                # Build a tree on the left
                left = self._build(
                    X=best['df_left'][:, :-1],
                    y=best['df_left'][:, -1],
                    depth=depth + 1
                )
                # Build a tree on the right
                right = self._build(
                    X=best['df_right'][:, :-1],
                    y=best['df_right'][:, -1],
                    depth=depth + 1
                )
                return Node(
                    feature=best['feature_index'],
                    threshold=best['threshold'],
                    data_left=left,
                    data_right=right,
                    gain=best['gain']
                )
        # Leaf node - value is the most common target value
        return Node(
            value=Counter(y).most_common(1)[0][0]
        )

    def fit(self, X, y):
        '''
        Function used to train a decision tree classifier model.

        :param X: np.array, features
        :param y: np.array or list, target
        :return: None
        '''
        # Call a recursive function to build the tree
        self.root = self._build(X, y)

    def _predict(self, x, tree):
        '''
        Helper recursive function, used to predict a single instance (tree traversal).

        :param x: single observation
        :param tree: built tree
        :return: float, predicted class
        '''
        # Leaf node
        if tree.value is not None:
            return tree.value
        feature_value = x[tree.feature]

        # Go to the left
        if feature_value <= tree.threshold:
            return self._predict(x=x, tree=tree.data_left)

        # Go to the right
        if feature_value > tree.threshold:
            return self._predict(x=x, tree=tree.data_right)

    def predict(self, X):
        '''
        Function used to classify new instances.

        :param X: np.array, features
        :return: list, predicted classes
        '''
        # Call the _predict() function for every observation
        return [self._predict(x, self.root) for x in X]
Testing
● We'll use the Iris dataset from Scikit-Learn
In [8]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris['data']
y = iris['target']
● The code below applies a train/test split to the dataset:
In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
● You can now initialize and train the model, and then make predictions:
In [10]:
model = DecisionTree()
model.fit(X_train, y_train)
preds = model.predict(X_test)
In [11]:
np.array(preds, dtype=np.int64)
In [12]:
y_test
● As you can see, the arrays are identical
● Let's calculate the accuracy to confirm this:
In [13]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, preds)
● As expected, a perfect score was obtained on the test set
Comparison with Scikit-Learn
● We already know our model works well, but let's compare it to the DecisionTreeClassifier from Scikit-Learn
In [14]:
from sklearn.tree import DecisionTreeClassifier
sk_model = DecisionTreeClassifier()
sk_model.fit(X_train, y_train)
sk_preds = sk_model.predict(X_test)
In [15]:
accuracy_score(y_test, sk_preds)
● Both perform the same, at least accuracy-wise