MODULE 3
SIMILARITY BASED LEARNING,
REGRESSION ANALYSIS, DECISION
TREE LEARNING
DR. SHIVA PRASAD KM
ASSOCIATE PROFESSOR
DEPARTMENT OF CSE
RYMEC BALLARI
M: 7899964163
OVERVIEW: CHAPTERS 4, 5 & 6 FROM TEXTBOOK 1
■ Introduction to Similarity-Based Learning
■ Nearest-Neighbor Learning
■ Weighted K-Nearest-Neighbor Algorithm
■ Nearest Centroid Classifier
■ Locally Weighted Regression (LWR).
■ Problems
■ Introduction to Regression
■ Introduction to Linear Regression
■ Multiple Linear Regression
■ Polynomial Regression
■ Logistic Regression.
INTRODUCTION TO SIMILARITY-BASED LEARNING
■ Similarity-based classifiers use similarity measures to locate the nearest neighbours of a test instance and classify it accordingly, in contrast with other learning mechanisms such as decision trees or neural networks.
■ Similarity-based learning is also called instance-based learning or just-in-time learning, since it does not build an abstract model of the training instances and performs lazy classification of new instances.
■ An instance is an entity or an example in the training dataset. It is described by a set of features or
attributes. One attribute describes the class label or category of the instance.
INSTANCE-BASED LEARNING VS MODEL-BASED LEARNING
NEAREST NEIGHBOUR LEARNING
■ A natural approach to similarity-based classification is K-Nearest Neighbour (K-NN), a non-parametric method used for both classification and regression problems.
■ KNN is one of the most basic yet essential classification algorithms in machine learning. It belongs to
the supervised learning domain and finds intense application in pattern recognition, data mining, and
intrusion detection.
■ It is a simple and powerful non-parametric algorithm that predicts the category of a test instance from the k training samples closest to it and assigns it to the category with the largest probability (majority vote).
■ Various distance metrics are used in the K-NN algorithm, such as Euclidean distance, Manhattan distance and Minkowski distance.
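■ As a rough illustration, the three distance metrics can be computed as below; this is a minimal Python sketch with made-up feature vectors, not an example from the textbook.

```python
import numpy as np

def euclidean(a, b):
    # L2 distance: square root of the sum of squared differences
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # L1 distance: sum of absolute differences
    return np.sum(np.abs(a - b))

def minkowski(a, b, p=3):
    # Generalisation: p=1 gives Manhattan, p=2 gives Euclidean
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

x = np.array([6.8, 9.0])   # hypothetical training instance, e.g. (CGPA, assessment marks)
q = np.array([7.5, 8.0])   # hypothetical query / test instance
print(euclidean(x, q), manhattan(x, q), minkowski(x, q, p=2))
```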
■ Consider the student performance training dataset of 8 instances shown in the table below, which describes the performance of individual students in a course along with the CGPA obtained in previous semesters. Based on this data, classify whether a student will pass or fail using the K-Nearest Neighbour algorithm.
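■ The 8-instance table itself is not reproduced in this text, so the sketch below uses hypothetical (CGPA, result) pairs; it simply finds the k closest training instances and takes a majority vote.

```python
from collections import Counter

# Hypothetical (CGPA, result) pairs standing in for the 8-instance training table;
# the values are illustrative only.
train = [(9.2, "Pass"), (8.0, "Pass"), (8.5, "Pass"), (6.0, "Fail"),
         (6.5, "Fail"), (8.2, "Pass"), (5.8, "Fail"), (8.9, "Pass")]

def knn_predict(query_cgpa, k=3):
    # Sort training instances by distance to the query and keep the k closest
    neighbours = sorted(train, key=lambda t: abs(t[0] - query_cgpa))[:k]
    # Majority vote over the neighbours' class labels
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_predict(7.8, k=3))   # -> "Pass" with this illustrative data
```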
WEIGHTED K-NEAREST NEIGHBOUR ALGORITHM
■ The weighted KNN is an extension of K-NN. It chooses the neighbors by using the weighted
distance.
■ The K-Nearest Neighbor algorithm has some serious limitations, as its performance is solely dependent on the choice of k and on the distance metric used for the decision rule.
■ In weighted K-NN, the nearest k points are given a weight using a function called a kernel function. The intuition behind weighted K-NN is to give more weight to the points which are nearby and less weight to the points which are further away.
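■ A minimal weighted K-NN sketch using an inverse-distance kernel (one common choice of kernel function; the data values are illustrative) is shown below.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, query, k=3):
    """Weighted k-NN with an inverse-distance kernel (one common choice)."""
    dists = np.linalg.norm(X_train - query, axis=1)
    idx = np.argsort(dists)[:k]                 # indices of the k nearest points
    weights = 1.0 / (dists[idx] + 1e-8)         # closer points get larger weights
    scores = {}
    for i, w in zip(idx, weights):
        scores[y_train[i]] = scores.get(y_train[i], 0.0) + w
    return max(scores, key=scores.get)          # class with the largest weighted vote

X_train = np.array([[9.2], [8.0], [6.0], [6.5], [8.5]])
y_train = ["Pass", "Pass", "Fail", "Fail", "Pass"]
# -> "Fail": the two nearer Fail points outweigh the single Pass point
print(weighted_knn_predict(X_train, y_train, np.array([7.0]), k=3))
```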
NEAREST CENTROID ALGORITHM
■ A simple alternative to k-NN classifiers for similarity-based classification is the Nearest Centroid Classifier (also called the Mean Difference Classifier).
■ The Nearest Centroid Classifier is a simple and intuitive classification method. It assigns a class label
to a new data point based on the closest class centroid (mean of features for each class).
■ Algorithm:
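■ A minimal Python sketch of the two steps — compute the centroid (mean feature vector) of each class from the training data, then assign a query to the class with the nearest centroid — is shown below with made-up data.

```python
import numpy as np

def fit_centroids(X, y):
    # Step 1: compute the mean feature vector (centroid) of each class
    return {c: X[np.array(y) == c].mean(axis=0) for c in set(y)}

def predict_nearest_centroid(centroids, query):
    # Step 2: assign the query to the class whose centroid is closest
    return min(centroids, key=lambda c: np.linalg.norm(query - centroids[c]))

X = np.array([[3, 1], [5, 2], [4, 2], [7, 6], [6, 8], [8, 7]])
y = ["A", "A", "A", "B", "B", "B"]
centroids = fit_centroids(X, y)
print(predict_nearest_centroid(centroids, np.array([6, 5])))   # -> "B"
```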
LOCALLY WEIGHTED REGRESSION (LWR)
■ Locally Weighted Regression (LWR) is a non-parametric supervised learning algorithm that performs local regression by combining a regression model with a nearest-neighbour model.
■ LWR is also referred to as a memory-based method, since it retains the training data at prediction time but uses only the training instances local to the point of interest.
■ Using the nearest-neighbour idea, we find the instances closest to the test instance and fit a linear function to those k nearest instances in a local regression model.
■ The key idea is to fit, for each query point, a linear function over its k neighbours that minimises the error, so that the overall prediction is no longer a single straight line but a smooth curve.
■ Ordinary linear regression, by contrast, finds a single linear relationship between the input x and the output y.
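■ A minimal LWR sketch is shown below: a Gaussian kernel (one common choice) weights the training instances around the query point, and a weighted least-squares line is fitted locally. The data and the bandwidth value are illustrative.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Locally weighted linear regression at a single query point.

    tau (bandwidth) controls how quickly weights decay with distance;
    the Gaussian kernel used here is one common choice.
    """
    Xb = np.c_[np.ones(len(X)), X]                    # add bias column
    xq = np.r_[1.0, x_query]
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)                                    # diagonal weight matrix
    # Weighted normal equations: theta = (X'WX)^-1 X'Wy
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 1.9, 3.4, 3.8, 5.5])
print(lwr_predict(X, y, np.array([2.5]), tau=0.8))
```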
REGRESSION AND TYPES OF REGRESSION
■ Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
■ Regression is a supervised learning technique which helps in finding the correlation between variables.
■ It is mainly used for prediction, forecasting, time series modeling, and determining the cause-and-effect relationship between variables.
■ Regression fits a line or curve to the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimised. This distance tells us whether the model has captured a strong relationship or not. The function of regression analysis is given by:
Y = f(x)
■ Here, y is called the dependent variable and x is called the independent variable.
TYPES OF REGRESSION
■ Linear Regression: A type of regression where a line is fitted to the given data to describe the linear relationship between one independent variable and one dependent variable.
■ Multiple Regression: A type of regression where a line is fitted to describe the linear relationship between two or more independent variables and one dependent variable.
■ Polynomial Regression: A non-linear regression method in which an Nth-degree polynomial is used to model the relationship between one dependent and one independent variable; it can also be extended to model two or more independent variables and one dependent variable.
■ Logistic Regression: Used for predicting a categorical dependent variable from one or more independent variables. When the outcome has two classes it is also called a binary classifier.
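■ As an illustration of the last type, a minimal logistic regression sketch using scikit-learn is shown below; the hours-studied/pass-fail data is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied vs. pass(1)/fail(0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# The model outputs P(y = 1 | x) through the sigmoid 1 / (1 + e^-(a0 + a1*x)),
# and the class label is obtained by thresholding that probability at 0.5.
print(model.predict_proba([[4.5]]))   # probability of each class
print(model.predict([[4.5]]))         # predicted class label
```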
LINEAR REGRESSION
■ In its simplest form, linear regression is created by fitting a line through the scattered data points.
■ The line takes the form of the equation:
y = a0 + a1x + e
■ Here a0 is the intercept, which represents the bias, and a1 represents the slope of the line. These are called regression coefficients; e is the error in prediction.
■ The assumptions of linear regression are listed below:
– The observations (y) are random and are mutually independent.
– The differences between the predicted and true values are called errors. The errors are mutually independent and identically distributed, for example normally distributed with zero mean and constant variance.
– The distribution of the error term is independent of the joint distribution of the independent variables, and the unknown regression parameters are constants.
LINEAR REGRESSION
■ The formulas used to calculate the values of a0 and a1 are given below:
a1 = ( n Σ(xi yi) − Σxi Σyi ) / ( n Σ(xi²) − (Σxi)² )
a0 = ȳ − a1 x̄
where x̄ and ȳ are the means of the x and y values respectively.
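■ A minimal Python sketch applying these formulas to a small set of made-up (x, y) observations:

```python
import numpy as np

# Hypothetical (x, y) observations
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.8, 3.1, 4.1, 4.9, 6.2])

n = len(x)
# Slope and intercept from the least-squares formulas above
a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
a0 = np.mean(y) - a1 * np.mean(x)
print(a0, a1)   # fitted line: y_hat = a0 + a1 * x
```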
LINEAR REGRESSION IN MATRIX FORM
■ Matrix notations can be used for representing the values of independent and dependent variables.
■ Each observation is written as yi = a0 + a1xi + ei, for i = 1, …, n, and stacking all n observations gives the matrix form:
– Y = Xa + e, where X is an n × 2 matrix, Y is an n × 1 vector, a is a 2 × 1 column vector and e is an n × 1 column vector.
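■ A minimal sketch of the matrix form, solving the standard least-squares normal equations a = (XᵀX)⁻¹XᵀY with NumPy on the same kind of made-up data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.8, 3.1, 4.1, 4.9, 6.2])

X = np.c_[np.ones(len(x)), x]          # n x 2 design matrix with columns [1, x_i]
# Normal equations: a = (X'X)^-1 X'Y
a = np.linalg.inv(X.T @ X) @ X.T @ y
print(a)                               # [a0, a1], the same values as the scalar formulas
```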
MULTIPLE LINEAR REGRESSION
■ A multiple regression model involves multiple predictors or independent variables and one dependent variable. It is an extension of the linear regression model.
■ The basic assumption of multiple linear regression is that the independent variables are not highly correlated and hence the multicollinearity problem does not exist.
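■ A minimal multiple linear regression sketch with scikit-learn, including a quick correlation check between the two (hypothetical) predictors; values near ±1 would signal a multicollinearity problem.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two predictors (e.g. study hours, attendance %) and one target
X = np.array([[2, 60], [4, 70], [6, 80], [8, 85], [10, 95]], dtype=float)
y = np.array([45.0, 55.0, 65.0, 72.0, 88.0])

# Quick check on the multicollinearity assumption: correlation between the predictors
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)    # a0 and the coefficient of each predictor
print(model.predict([[7, 82]]))
```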
MODULE 3
DECISION TREE AND BAYESIAN
LEARNING
OVERVIEW
• INTRODUCTION TO DECISION TREES
• ENTROPY AND INFORMATION GAIN
• DECISION TREE INDUCTION ALGORITHM: ID3
• DECISION TREE REPRESENTATION
• REGRESSION TREE AND ITS PROBLEMS
INTRODUCTION TO DECISION TREES
■ A decision tree is a supervised machine learning algorithm used for both classification
and regression tasks.
■ It's a predictive modeling tool that works by creating a tree-like structure, representing
decisions and their possible consequences. Decision tree learning involves building a
tree from a dataset by recursively splitting it based on different attributes or features.
■ The representation of a decision tree is in the form of a tree structure where each
internal node represents a feature, each branch represents a decision rule based on that
feature, and each leaf node represents the outcome or the class label.
PROBLEMS SOLVED BY DECISION TREES
■ Classification problems: Decision trees are excellent for classification tasks. Whether the classes are
binary (e.g., yes/no, true/false) or multiclass (categorizing into more than two classes), decision trees can
effectively handle these scenarios.
■ Predictive analysis: When you need to make predictions about continuous variables (regression
problems), decision trees can also be used. For instance, predicting the price of a house based on various
features like area, location, number of bedrooms, etc.
■ Feature selection: Decision trees can help identify important features in a dataset. They assess the
relevance of each feature by their placement in the tree, making them valuable for feature selection in
larger datasets.
■ Data exploration and understanding: Decision trees are great for understanding the relationships and
structures within the data. They provide a transparent, easily interpretable model that can be useful in
uncovering insights about the relationships between different attributes.
■ Handling non-linear relationships: Decision trees can model non-linear relationships between features
and the target variable, as they recursively split the data based on these relationships.
■ Medical diagnosis: Decision trees are used in the medical field for diagnostic purposes. They can be
employed to determine the likelihood of a disease based on symptoms or other medical test results.
■ Customer relationship management: For customer segmentation or churn prediction in business
applications, decision trees can help identify patterns that differentiate customer behaviors.
■ Anomaly detection: Decision trees can be utilized to identify anomalies or outliers in a dataset by
flagging cases that do not fit the patterns established by the tree.
■ Text and sentiment analysis: In natural language processing, decision trees can be applied for
sentiment analysis, spam filtering, and text categorization.
■ Multi-output problems: Decision trees can handle multiple outputs. For example, if a system needs to
make multiple decisions simultaneously based on various input features, decision trees can be structured
to handle such scenarios.
ENTROPY AND INFORMATION GAIN
■ Entropy and Information Gain are fundamental concepts used in decision tree learning, particularly in
the construction and selection of nodes during the tree-building process.
■ Entropy: Entropy is a measure of impurity or disorder in a set of examples. In the context of decision
trees, entropy is used to determine the homogeneity of a group of examples.
■ Lower entropy implies that the examples within a group are more uniform or pure, while higher entropy
indicates the presence of mixed or diverse examples.
■ The formula for entropy, often used in the context of decision trees, is given by:
Entropy = −∑ (pi · log2(pi))
where pi represents the proportion of examples in the dataset that belong to class i, and the summation runs over all classes in the dataset.
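■ A minimal Python sketch of this entropy formula; the label counts are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over the classes present in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 9 + ["no"] * 5))   # mixed set -> about 0.940
print(entropy(["yes"] * 4))                # pure set  -> 0.0
```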
■ Information Gain: Information Gain is the measure of the effectiveness of a particular attribute in
classifying the examples. When constructing a decision tree, the algorithm aims to choose the attribute
that best splits the dataset. Information Gain helps decide which attribute to use for this split by
determining the attribute that provides the most knowledge or the most significant reduction in entropy.
■ The formula for Information Gain is based on the entropy. For a given attribute A, the Information
Gain is calculated as follows:
Information Gain = Entropy before splitting − Weighted average of entropy after splitting
■ The attribute with the highest Information Gain is considered the most informative or the best attribute
for splitting the dataset at a particular node in the decision tree.
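■ A minimal Python sketch of Information Gain as the entropy before the split minus the weighted average entropy after the split; the 14-example labels and attribute values below are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain(S, A) = Entropy(S) - sum over values v of (|S_v|/|S|) * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(attribute_values):
        subset = [lab for lab, a in zip(labels, attribute_values) if a == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Illustrative 14-example set split by a two-valued attribute
labels = ["yes"] * 9 + ["no"] * 5
attr   = ["weak"] * 8 + ["strong"] * 6
print(information_gain(labels, attr))
```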
ITERATIVE DICHOTOMISER 3 (ID3) ALGORITHM
■ The ID3 (Iterative Dichotomiser 3) algorithm is a classic approach used for constructing decision trees
by using the training dataset with labels.
■ It was developed by Ross Quinlan and is specifically designed for learning decision trees from datasets.
■ The key steps of the ID3 algorithm are as follows:
■ Calculate Entropy: Compute the entropy of the target variable in the dataset.
■ Feature Selection: For each attribute in the dataset:
– Calculate the Information Gain for that attribute.
– Select the attribute that has the highest Information Gain as the best attribute to split the dataset.
■ Create a Node: Create a decision node based on the best attribute.
■ Split the Dataset: Split the dataset into subsets based on the values of the selected attribute.
■ Repeat: For each subset, recursively apply the above steps until one of the following conditions is met:
– All the instances in the subset belong to the same class (pure node).
– There are no more attributes left to split, or a predefined stopping criterion is met.
■ Let's consider an example to demonstrate the ID3 algorithm. Suppose we have a dataset that
contains weather and play tennis data, and we aim to construct a decision tree using the ID3
algorithm.
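■ The full worked example is not reproduced here, but a minimal recursive ID3 sketch on a small, made-up slice of weather/play-tennis style data illustrates the steps: compute entropy, pick the attribute with the highest Information Gain, split, and recurse.

```python
import math
from collections import Counter

def entropy(rows, target):
    n = len(rows)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(r[target] for r in rows).values())

def information_gain(rows, attr, target):
    total = entropy(rows, target)
    n = len(rows)
    for v in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == v]
        total -= (len(subset) / n) * entropy(subset, target)
    return total

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:              # pure node -> leaf with that class
        return labels[0]
    if not attrs:                          # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for v in set(r[best] for r in rows):   # one branch per value of the best attribute
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = id3(subset, [a for a in attrs if a != best], target)
    return tree

# A small, made-up slice of a weather/play-tennis style dataset
data = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Wind": "Strong", "Play": "No"},
]
print(id3(data, ["Outlook", "Wind"], "Play"))
```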
REGRESSION TREE
■ Regression trees are a variant of decision trees where the target feature is a continuous-valued variable. These trees can be constructed using an algorithm called reduction in variance, which uses standard deviation to choose the best splitting attribute.
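■ A minimal sketch of this splitting criterion — the reduction in (population) standard deviation achieved by splitting on an attribute — with made-up values:

```python
import statistics

def sd_reduction(targets, attribute_values):
    """Reduction in standard deviation obtained by splitting on an attribute."""
    n = len(targets)
    sd_before = statistics.pstdev(targets)
    sd_after = 0.0
    for v in set(attribute_values):
        subset = [t for t, a in zip(targets, attribute_values) if a == v]
        sd_after += (len(subset) / n) * statistics.pstdev(subset)
    return sd_before - sd_after

# Illustrative continuous target (e.g. marks) split by a categorical attribute
marks   = [7.8, 6.9, 8.5, 9.1, 5.2, 6.0, 8.8, 7.4]
project = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
print(sd_reduction(marks, project))   # the attribute with the largest reduction is chosen
```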
PROBLEM ON REGRESSION TREE
■ Construct a regression tree using the table below, which consists of 10 data instances and 3 attributes: "Assessment", "Assignment" and "Project". The target attribute is "Result", which is a continuous attribute.
VALIDATING AND PRUNING OF DECISION TREES
■ Validating and pruning decision trees is a crucial part of building accurate and robust machine learning
models.
■ Decision trees are prone to overfitting, which means they can learn to capture noise and details in the training data that do not generalize well to new, unseen data.
■ Validation and pruning are techniques used to mitigate this issue and improve the performance of
decision tree models.
■ Pre-pruning tunes the hyperparameters of the decision tree prior to training. It uses the heuristic known as 'early stopping', which halts the growth of the decision tree before it reaches its full depth, so that the tree-building process avoids producing leaves with very small samples.
■ Post-pruning does the opposite of pre-pruning: it allows the decision tree to grow to its full depth and then removes branches to prevent the model from overfitting. The algorithm continues to partition the data into smaller subsets until the final subsets are homogeneous in terms of the outcome variable. These final subsets consist of only a few data points, so the tree learns the training data to a T; when a new data point that differs from the learned data is introduced, it may not be predicted well.
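■ A minimal scikit-learn sketch contrasting pre-pruning (hyperparameters such as max_depth and min_samples_leaf) with post-pruning via cost-complexity pruning (ccp_alpha); the iris dataset and the parameter values are illustrative, and validation is done on a held-out split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early via hyperparameters such as max_depth / min_samples_leaf
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow the tree fully, then prune with cost-complexity pruning (ccp_alpha)
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_tr, y_tr)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
for name, m in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, m.get_depth(), m.score(X_te, y_te))   # depth and validation accuracy
```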