
Assignment 2: ELL 409

Arnav Raj
October 13, 2024

Abstract
In this report, I present the implementation and analysis of a Decision Tree
classifier that I built from scratch, including both pre-pruning and post-pruning
strategies. I focus on handling class imbalance, choosing appropriate pruning
methods, and evaluating the model’s performance before and after pruning. I also
discuss my rationale for not using certain techniques like SMOTE and detail how I
selected pruning parameters such as the alpha value.

1 Introduction
1.1 Problem Statement
The objective of this assignment was to implement a Decision Tree classifier from scratch to predict
whether a bank client will subscribe to a term deposit based on various attributes. The dataset was heavily imbalanced, with the ’no’ class forming a large majority of the samples. The key challenges included handling
class imbalance, selecting appropriate pruning strategies to prevent overfitting, and optimizing the
model’s performance.

2 Approach
2.1 Data Preprocessing
I began by loading the dataset and inspecting it for missing values and data types. I used Label
Encoding to convert categorical variables into numerical values suitable for the model. Then, I split
the data into training and validation sets using an 80-20 split with stratification to maintain the
class distribution.
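A minimal sketch of this preprocessing step is shown below, assuming the data sits in a pandas DataFrame loaded from a hypothetical bank.csv with a target column named y; the actual file name and column names in my code may differ.

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder
    from sklearn.model_selection import train_test_split

    # Load the data and label-encode every categorical (object) column
    df = pd.read_csv("bank.csv")  # illustrative file name
    for col in df.select_dtypes(include="object").columns:
        df[col] = LabelEncoder().fit_transform(df[col])

    X = df.drop(columns=["y"]).values  # "y" is the (assumed) target column
    y = df["y"].values

    # 80-20 split with stratification to preserve the class distribution
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )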

2.2 Handling Class Imbalance


Initially, I applied the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset.
However, it led to overfitting: the model began to predict the minority class (’yes’) so often that its ’yes’ predictions approached the number of ’no’ predictions. To mitigate this, I employed a custom resampling strategy (sketched in code after the list below):

• I oversampled the ’yes’ class to constitute one-third of the total training data.

• This approach ensured that the minority class was better represented without causing the
model to overfit.

• I left the ’no’ class unchanged to preserve the original data distribution as much as possible.
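A minimal sketch of this resampling strategy, assuming the arrays produced by the split above and the ’yes’ class encoded as 1 (the exact label encoding is an assumption):

    import numpy as np

    def oversample_to_one_third(X_train, y_train, pos_label=1, seed=42):
        """Oversample the minority class with replacement until it makes up
        roughly one third of the training data; the majority class is untouched."""
        rng = np.random.default_rng(seed)
        pos_idx = np.where(y_train == pos_label)[0]
        neg_idx = np.where(y_train != pos_label)[0]
        target_pos = len(neg_idx) // 2  # pos = neg / 2  =>  pos / (pos + neg) = 1/3
        extra = rng.choice(pos_idx, size=max(0, target_pos - len(pos_idx)), replace=True)
        keep = np.concatenate([neg_idx, pos_idx, extra])
        rng.shuffle(keep)
        return X_train[keep], y_train[keep]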

2.3 Decision Tree Implementation
I implemented the Decision Tree with the following considerations (the split-selection step is sketched in code after the list):

• Criteria for Splitting: I used entropy as the criterion for measuring the quality of a split.

• Handling Continuous Features: I handled continuous features by finding the best threshold
that maximizes information gain.

• Stopping Conditions: I controlled the tree growth using parameters like maximum depth,
minimum samples per leaf, and minimum gain.
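To make the split criterion concrete, here is a minimal sketch of the entropy computation and the threshold search for a single continuous feature; the function names are illustrative rather than taken directly from my implementation.

    import numpy as np

    def entropy(y):
        """Shannon entropy of a label vector."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def best_threshold(x, y):
        """Return the (threshold, information gain) pair that maximises the gain
        when splitting one continuous feature x against labels y."""
        parent = entropy(y)
        best_t, best_gain = None, 0.0
        for t in np.unique(x)[:-1]:  # every observed value except the largest
            left, right = y[x <= t], y[x > t]
            gain = parent - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best_gain:
                best_t, best_gain = t, gain
        return best_t, best_gain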

2.4 Pruning Strategy


2.4.1 Post-Pruning (Cost Complexity Pruning)
I chose post-pruning over pre-pruning for the following reasons:

• Full Tree Exploration: It allows the model to consider all possible splits before simplifying
the tree, potentially capturing more complex patterns.

• Better Generalization: By pruning after the tree is fully grown, I can use validation data
to decide which branches to prune, enhancing generalization.

2.4.2 Alpha Value Selection


The alpha value controls the complexity penalty in the cost complexity pruning algorithm. I selected
the alpha value as follows (see the sketch after this list):

• I generated a range of alpha values from the tree’s structure.

• I performed cross-validation to select the optimal alpha that minimizes the validation error.

• The optimal alpha value was found to be 0.25.

• The selected alpha balances the trade-off between tree complexity and model performance.
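Since my tree is implemented from scratch, the exact pruning code is not reproduced here. The sketch below illustrates the same selection procedure using scikit-learn's cost-complexity pruning API purely as a stand-in: candidate alphas are generated from the fully grown tree and scored by cross-validation (the scoring metric here is a choice, not necessarily the one I used).

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    # Candidate alphas derived from the structure of the fully grown tree
    full_tree = DecisionTreeClassifier(criterion="entropy", random_state=42)
    path = full_tree.cost_complexity_pruning_path(X_train, y_train)
    ccp_alphas = np.unique(path.ccp_alphas)

    # Score each candidate alpha by cross-validation and keep the best one
    scores = [
        cross_val_score(
            DecisionTreeClassifier(criterion="entropy", ccp_alpha=a, random_state=42),
            X_train, y_train, cv=5, scoring="f1",
        ).mean()
        for a in ccp_alphas
    ]
    best_alpha = ccp_alphas[int(np.argmax(scores))]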

2.5 Rationale for Not Using SMOTE


I decided not to use SMOTE for the following reasons:

• Overfitting Risk: SMOTE generated synthetic samples that caused the model to overfit, as
it started predicting the minority class excessively.

• Data Integrity: The synthetic samples might not represent realistic scenarios, potentially
introducing noise.

• Alternative Approach: My custom resampling provided better control over the class distribution without significantly altering the original data.

3 Results and Observations


3.1 Optimal Alpha and Pruning
After performing cross-validation, the optimal alpha value was determined to be 0.25. Pruning the
tree with this alpha value resulted in a significant reduction in tree size and improved generalization.

3.2 Model Performance Before and After Pruning
I evaluated the model’s performance on the validation set before and after pruning (the metric computation is sketched after the results below):

• Performance Before Pruning:

– Accuracy: 0.8552
– Precision: 0.4149
– Recall: 0.5784
– F1-Score: 0.4832

• Performance After Pruning:

– Accuracy: 0.8588
– Precision: 0.4292
– Recall: 0.6276
– F1-Score: 0.5098
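A minimal sketch of how these metrics can be computed on the validation set, assuming a fitted tree exposing a predict method and the ’yes’ class encoded as 1:

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    y_pred = model.predict(X_val)  # `model` is the tree before or after pruning
    print("Accuracy :", accuracy_score(y_val, y_pred))
    print("Precision:", precision_score(y_val, y_pred, pos_label=1))
    print("Recall   :", recall_score(y_val, y_pred, pos_label=1))
    print("F1-Score :", f1_score(y_val, y_pred, pos_label=1))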

Pruning produced a small but consistent improvement across all four metrics, with the largest gain in recall. In addition, the pruned tree is significantly less complex, which is beneficial for interpretability and may generalize better to unseen data.

3.3 Comparison of Tree Sizes

                 Before Pruning    After Pruning
Total Nodes      5989              523
Total Leaves     2995              262

Table 1: Comparison of Tree Sizes Before and After Pruning

As shown in Table 1, pruning reduced the total number of nodes from 5989 to 523 and the number
of leaves from 2995 to 262. This significant reduction in complexity indicates that many branches in
the unpruned tree were not contributing to better performance.
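The node and leaf counts in Table 1 can be obtained with a simple recursive walk over the tree. A minimal sketch, assuming each internal node stores left and right children and a leaf has both set to None (my actual node representation may differ):

    def count_nodes_and_leaves(node):
        """Recursively count (total nodes, total leaves) in a binary decision tree."""
        if node is None:
            return 0, 0
        if node.left is None and node.right is None:  # leaf node
            return 1, 1
        ln, ll = count_nodes_and_leaves(node.left)
        rn, rl = count_nodes_and_leaves(node.right)
        return 1 + ln + rn, ll + rl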

3.4 Visualizations

Figure 1: Feature Importance

Figure 2: Confusion Matrix of the Pruned Model on Validation Data

Figure 3: ROC Curve of the Pruned Model on Validation Data

4 Conclusion
In conclusion, implementing a Decision Tree classifier with post-pruning effectively addressed the
issues of overfitting and class imbalance. By adjusting the class distribution so that the ’yes’ class made up one-third of the training data, the model learned the patterns associated with the minority class more effectively without overfitting. Post-pruning with an optimal alpha value of 0.25 significantly simplified
the tree without compromising performance, enhancing the model’s generalization capabilities.

5 References
1. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.

2. Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques. Elsevier.

3. Imbalanced-learn Documentation. https://imbalanced-learn.org/stable/

4. Scikit-learn Documentation. https://scikit-learn.org/stable/modules/tree.html

5. SMOTE: Synthetic Minority Over-sampling Technique. https://arxiv.org/abs/1106.1813
