Decision Trees
Decision Tree Overview
●   The algorithm uses a tree structure to model the relationships among the
    features and the potential outcomes
●   It breaks the dataset down into smaller and smaller subsets as the depth
    of the tree increases
●   It is essentially a flowchart for deciding how to classify a new
    observation (see the sketch below)
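A minimal sketch of this flowchart view (not from the original slides; it assumes scikit-learn and its bundled iris dataset):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each indented level is one more split; the leaves carry the predicted class.
print(export_text(tree, feature_names=list(iris.feature_names)))
```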
Decision Tree Overview
●   Decision trees are simple and useful for interpretation
●   They have a nice graphical representation
●   They are typically not competitive with the best supervised learning
    approaches
●   Random forests and boosting are ways of modifying the approach that
    sometimes result in dramatic increases in prediction accuracy
●   We will focus on basic Classification and Regression Trees (CART)
Terminology in CART
●   Root node – first decision node starting from the original data
●   Internal nodes – points along the tree where we split the data
●   Branch – segment of the tree that connects the nodes
●   Leaf/Terminal node – holds final result of splitting the data
Decision Tree Overview
[Figure: example tree with the root node at the top, internal nodes joined by branches at each split, and terminal nodes at the bottom]
Classification vs. Regression
●   Classification
    ○   Spam / not spam
    ○   Admit to ICU or not
    ○   Lend money / deny
    ○   Intrusion detection
●   Regression
    ○   Predicting stock returns
    ○   Pricing a house or a car
    ○   Weather prediction (temperature, rainfall, etc.)
    ○   Economic growth predictions
Decision Tree Overview
●   The decision tree algorithm learns (i.e., creates the decision tree from
    the data set) through the optimization of an error function
●   CART is a technique that can be used to solve both regression and
    classification problems (see the sketch below)
●   In this lecture we will focus on how to solve a classification problem;
    the approach for a regression problem is similar
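A rough sketch of both uses (not from the slides; assumes scikit-learn, with made-up toy data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one toy feature

# Classification: each leaf predicts the majority class of its region.
clf = DecisionTreeClassifier().fit(X, [0, 0, 1, 1])
print(clf.predict([[2.5]]))   # class of the region containing x = 2.5

# Regression: each leaf predicts the mean response of its region.
reg = DecisionTreeRegressor().fit(X, [1.2, 1.9, 3.1, 3.9])
print(reg.predict([[2.5]]))   # mean response of that region
```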
Visualizing a Classification tree
●   Let’s look at a simple example
●   How do we classify the observations into red or blue?
Steps in a decision tree
[Figure: the 2D feature space is split step by step into RED CLASS and BLUE CLASS regions]
Steps in a decision tree
●   Once the final divisions are made, if a new data point comes in with
    value (0.2, 0.3), the classification tree predicts that the new
    observation is Red, as it falls in a region where the majority is Red
    (see the sketch below)
[Figure: final partition of the feature space into RED CLASS and BLUE CLASS regions]
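A sketch of that prediction (assumes scikit-learn; the coordinates below are made up to mimic the slide's red/blue example):

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0.1, 0.2], [0.3, 0.4], [0.2, 0.1],   # hypothetical red points
     [0.8, 0.9], [0.7, 0.8], [0.9, 0.7]]   # hypothetical blue points
y = ["red", "red", "red", "blue", "blue", "blue"]

tree = DecisionTreeClassifier().fit(X, y)

# The new point (0.2, 0.3) falls in the majority-red region.
print(tree.predict([[0.2, 0.3]]))          # -> ['red']
```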
Representation as a tree
[Figure: the same partition drawn as a tree, with leaves labeled Red, Blue, Blue, and Red]
How do we know where to split?
●   The dataset below was easy: the pattern was simple to visualize. What
    about complicated datasets where the patterns are more complex? How does
    the algorithm decide where to split?
[Figure: the partitioned feature space with its BLUE CLASS and RED CLASS regions]
How do we know where to split?
●   We choose the variable and the split point that result in the smallest
    weighted sum of the Gini indices of the two new regions (weighted by the
    number of observations in each region)
Maximum and minimum of the Gini index (2 classes)
●   With two classes in proportions p and 1 − p, G = p(1 − p) + (1 − p)p = 2p(1 − p)
●   Minimum (pure node, p = 0 or 1): G = 1(1 − 1) + 0(1 − 0) = 0
●   Maximum (50/50 mix, p = 0.5): G = 0.5(1 − 0.5) + 0.5(1 − 0.5) = 0.5
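The same extremes, checked numerically (a tiny sketch, not from the slides):

```python
def gini(p):
    """Two-class Gini index of a node where a fraction p is class 1."""
    return p * (1 - p) + (1 - p) * p   # = 2p(1 - p)

print(gini(0.0), gini(1.0))   # 0.0 0.0 -> pure nodes, minimum impurity
print(gini(0.5))              # 0.5     -> 50/50 mix, maximum impurity
```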
Simple example
[Figure: a small worked example of the Gini calculation]
How do we know where to split?
[Figure: two candidate splits compared via the Gini indices G1 + G2 versus G3 + G4 of the regions they create]
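To make the split search concrete, here is a rough pure-Python sketch (not the slides' code; binary 0/1 labels, and it tries observed values as thresholds where real CART implementations use midpoints):

```python
def gini(labels):
    """Two-class Gini index of a region; empty regions count as pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)      # fraction belonging to class 1
    return 2 * p * (1 - p)

def best_split(X, y):
    """Try every feature and threshold; keep the smallest weighted Gini sum."""
    best = (None, None, float("inf"))  # (feature index, threshold, score)
    n = len(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left  = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

X = [[0.1, 0.2], [0.3, 0.4], [0.8, 0.9], [0.7, 0.8]]   # made-up points
y = [0, 0, 1, 1]                                        # 0 = red, 1 = blue
print(best_split(X, y))   # -> (0, 0.3, 0.0): split feature 0 at 0.3
```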
Let’s build a small tree for a real-world problem
[Figure: root node split on Gender into a MALE node (Group 1) and a FEMALE node (Group 2)]
Let’s build a small tree for a real-world problem
●   A computer would calculate the Gini index for splits on the different
    independent variables – Occupation, Age, Gender, etc. – and find the
    optimal variable on which to split the data (as in the split-search
    sketch earlier)
Overfitting in Decision trees
●   Splitting non-stop eventually leads to each point being its own region
●   Such a decision tree model would perform very poorly on unseen test data
●   Our model is overfitting the data!
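A quick sketch of the symptom (assumes scikit-learn; the synthetic data and noise level are made up):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data, so a fully grown tree has noise to memorize.
X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no limits
print(deep.score(X_tr, y_tr))   # ~1.0: every training point is its own region
print(deep.score(X_te, y_te))   # noticeably lower on unseen test data
```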
How do we know when to stop?
●   One strategy is to stop once the decrease in the Gini index due to each
    split falls below some threshold
●   This is not ideal: a seemingly poor split early on may be followed by a
    very good one, and stopping early would miss it
●   A better strategy is to grow a very large tree and then prune it back to
    obtain a subtree (see the sketch below)
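One way to realize "grow large, then prune" is scikit-learn's cost-complexity pruning; a sketch (the choice of alpha below is arbitrary and would in practice be picked by cross-validation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

full = DecisionTreeClassifier(random_state=0).fit(X, y)  # very large tree
path = full.cost_complexity_pruning_path(X, y)           # candidate alphas

# Refit with a nonzero alpha to obtain a pruned subtree.
pruned = DecisionTreeClassifier(
    ccp_alpha=path.ccp_alphas[-2], random_state=0).fit(X, y)
print(full.tree_.node_count, "->", pruned.tree_.node_count)
```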
Pruning
●   Pruning is not available in MS Azure
●   How do we decide the optimal size of a tree?
●   Answer – tuning (trial and error) and cross-validation
●   Build different decision trees with different hyperparameter values, use
    cross-validation to observe their performance, and pick the best one (see
    the sketch below)
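A minimal sketch of that tuning loop (assumes scikit-learn; the depths tried are arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in [2, 4, 8, None]:           # None = grow the tree fully
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    score = cross_val_score(tree, X, y, cv=5).mean()
    print(depth, round(score, 3))       # pick the depth with the best score
```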
Hyperparameters for Decision Tree
●   Maximum depth – the longest path from the root to a leaf; we can cap how
    deep the decision tree grows
●   Minimum number of samples per leaf – a minimum for the number of samples
    we allow in each leaf
●   Minimum samples to split – the minimum number of samples required to
    split an internal node
●   Maximum features – the number of features considered when looking for
    each split (all four appear in the sketch below)
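These map directly onto scikit-learn's DecisionTreeClassifier arguments (the values below are arbitrary examples, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    max_depth=5,           # maximum depth: longest root-to-leaf path
    min_samples_leaf=10,   # minimum number of samples per leaf
    min_samples_split=20,  # minimum samples needed to split an internal node
    max_features=3,        # features considered at each split
)
```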
Advantages vs. Disadvantages
●   Advantages
    ○   Graphical representation of results: very easy to explain to people
        (probably even easier than linear regression)
    ○   Interpretation – the logic of the model is visible
    ○   Handles missing data
    ○   Handles numeric and categorical variables
    ○   Captures nonlinear effects
●   Disadvantages
    ○   Prediction accuracy is not as good as more complicated approaches
    ○   Computational issues with large categorical variables
    ○   The final tree is not very stable (a small change in the data can
        lead to a very different tree)