
Classification Techniques
Dr. Rabi Shaw
January 4, 2025

Outline

Data (Similarity Measures Between Data Objects)
Introduction to Classification
Decision Tree
Distance Based Classification Algorithms


Data
Collection of data objects and their attributes.
An attribute is a property or characteristic of an object.
A collection of attributes describes an object.

Table: (Source: Tan, Kumar and Steinbach, Introduction to Data Mining)


Tid  Refund  Marital Status  Income  Cheat (Class)
1    Yes     Single          125K    No
2    No      Married         100K    No
3    No      Single          70K     No
4    Yes     Married         120K    No
5    No      Divorced        95K     Yes
6    No      Married         60K     No
7    Yes     Divorced        220K    No
8    No      Single          85K     Yes
9    No      Married         75K     No
10   No      Single          90K     Yes


Classification

Definition
The task of classification is to assign an object to one of the
predefined categories.

Definition
Given a dataset D of objects and a set of classes C, the classification
problem is to define a mapping f : D → C, where each object is
assigned to exactly one class. A class Cj ∈ C contains precisely those
objects mapped to it: Cj = {oi | f(oi) = Cj}.


Classification

Working Principle

A set of objects with their categories (a Training Set) is provided.
A model capturing the relationship between objects and their
categories is found (model building).
The model assigns a class label (category) to an unknown object.


Classification Process

Figure: Working Principle of Classification Technique. A classifier
model is learned from the Training Set {(x_i, y_i)} and assigns a
label y_k to each test pattern (x_k, ?) in the Test Set.


Examples of Classification Task

Predicting tumor cells as cancerous or non-cancerous.

Categorizing news articles into one of the predefined categories,
e.g., Political News, Sports News, Entertainment News, etc.

Categorizing animals as Mammals or non-Mammals.

Classifying email into spam and non-spam.


Identifying Credit card transaction as legitimate or fraudulent.


Popular Classification Methods

Decision Tree based Approaches
Neighborhood based Approaches
Neural Networks
Bayes' Theorem and Naive Bayes Classifier
Support Vector Machines
Ensemble Classifiers


Decision Tree
Table: Training Set (Source: Tan, Kumar and Steinbach, Introduction to Data Mining)

Tid  Refund  Marital Status  Income  Cheat (Class)
1    Yes     Single          125K    No
2    No      Married         100K    No
3    No      Single          70K     No
4    Yes     Married         120K    No
5    No      Divorced        95K     Yes
6    No      Married         60K     No
7    Yes     Divorced        220K    No
8    No      Single          85K     Yes
9    No      Married         75K     No
10   No      Single          90K     Yes

(a) Decision Tree:
Refund?
  Yes → No
  No  → Marital Status?
          Married          → No
          Single, Divorced → Income?
                               < 80K → No
                               ≥ 80K → Yes


Test record:

Refund  Marital Status  Income  Class
No      Single          81K     ?

(c) Decision Tree:
Refund?
  Yes → No
  No  → Marital Status?
          Married          → No
          Single, Divorced → Income?
                               < 80K → No
                               ≥ 80K → Yes

Following the tree (Refund = No, Status = Single, Income ≥ 80K):

Refund  Marital Status  Income  Class
No      Single          81K     Yes
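Tracing the test record through the tree can be written in a few lines of Python. This is a hand-coded sketch of this particular tree, not a general implementation; the field names are made up for the example.

```python
# Hand-coded version of the decision tree above: each internal node
# tests one attribute, each leaf returns a class label.
def predict(record):
    if record["refund"] == "Yes":
        return "No"
    if record["marital_status"] == "Married":
        return "No"
    # Single or Divorced: test the continuous Income attribute (in K)
    return "Yes" if record["income"] >= 80 else "No"

test = {"refund": "No", "marital_status": "Single", "income": 81}
print(predict(test))  # → Yes
```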

Decision Tree

Very popular classification method.

Model is interpretable.

Simple algorithm for model building.

Prediction accuracy is adequate for many applications.

Prediction process is straightforward.

Can handle nominal, ordinal and continuous attributes.

Assumes the data fits in memory.


Decision Tree (DT)

Decision Tree is a classifier.


A decision tree (DT) corresponding to a given training set T
has the following properties.
1 Each internal node denotes a test on an attribute.
2 Each branch represents an outcome of the test.
3 Each leaf node holds a class label.


How to build a DT: Hunt's Algorithm

Let T_N be the training set associated with a node N.

If all records in T_N belong to a single class C_1, then node N is a
leaf node labeled with the class C_1.

If the records in T_N belong to more than one class, an attribute test
condition is selected to partition the records into smaller subsets.

Recursively apply the procedure to each subset.
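The recursion above can be sketched in Python. This is a minimal illustration, not the algorithm as given in the slides: the `best_split` interface and the demo helper `refund_split` are hypothetical stand-ins for a real attribute-selection procedure (e.g. one based on the impurity measures introduced later).

```python
from collections import Counter

def hunt(records, labels, best_split):
    """Grow a decision tree by Hunt's recursion (illustrative sketch)."""
    if not records:
        return {"leaf": None}            # empty partition: no decision
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}       # case 1: pure node -> leaf
    split = best_split(records, labels)  # case 2: pick a test condition
    if split is None:                    # no useful split: majority leaf
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    test, outcomes = split               # test: record -> branch outcome
    node = {"test": test, "children": {}}
    for outcome in outcomes:
        idx = [i for i, r in enumerate(records) if test(r) == outcome]
        node["children"][outcome] = hunt([records[i] for i in idx],
                                         [labels[i] for i in idx], best_split)
    return node

# Hypothetical attribute-selection helper: always test "Refund" (demo only)
def refund_split(records, labels):
    if len(set(r["Refund"] for r in records)) < 2:
        return None
    return (lambda r: r["Refund"]), ["Yes", "No"]

data = [{"Refund": "Yes"}, {"Refund": "No"}, {"Refund": "No"}]
tree = hunt(data, ["No", "Yes", "Yes"], refund_split)
print(tree["children"]["Yes"])  # → {'leaf': 'No'}
```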


Hunt's Algorithm (contd.)

Table: Training Set (Source: Tan, Kumar and Steinbach, Introduction to Data Mining)

Tid  Home Owner  Marital Status  Income  Defaulter (Class)
1    Yes         Single          125K    No
2    No          Married         100K    No
3    No          Single          70K     No
4    Yes         Married         120K    No
5    No          Divorced        95K     Yes
6    No          Married         60K     No
7    Yes         Divorced        220K    No
8    No          Single          85K     Yes
9    No          Married         75K     No
10   No          Single          90K     Yes



Issues in Decision Tree

How to partition the records?

How to determine the best split?


Splitting Criterion

The splitting criterion indicates the splitting attribute together
with a split point (test condition). The objective is to make each
resulting partition as pure (homogeneous) as possible.



How to specify test condition

Depends on attribute type:
1 Nominal
2 Ordinal
3 Continuous

Depends on number of ways to split:
1 2-way split
2 Multi-way split


Splitting Based on Continuous Attributes

Discretization to form an ordinal attribute (bucketing).

Binary decision: e.g., Income < 80K and Income ≥ 80K.


How to determine best split

Nodes with homogeneous class distribution are preferred.

We need a measure of node homogeneity (node impurity / node purity).


How to determine best split

A metric is used for selecting the best split.

Three popular metrics:


1 Gini Index (used in CART)
2 Entropy (used in C4.5)
3 Classification Error


Table: Before Split (Node M0)

Class  # objects
C0     N0
C1     N1

Using attribute A, M0 splits into M1 and M2.

Table: Split with A (M12)

Node  Class  # objects
M1    C0     N10
      C1     N11
M2    C0     N20
      C1     N21

Using attribute B, M0 splits into M3 and M4.

Table: Split with B (M34)

Node  Class  # objects
M3    C0     N30
      C1     N31
M4    C0     N40
      C1     N41

Gain = (M0 − M12) vs (M0 − M34)




Summary of the last Class

Working Principle of Classification Technique: a classifier model is
learned from the Training Set {(x_i, y_i)} and assigns a label y_k to
each test pattern (x_k, ?) in the Test Set.

Decision Tree as a Classifier.

How to determine the best split:
1 Gini Index (used in CART)
2 Entropy (used in C4.5)
3 Classification Error


Gini Index (used in CART)

The Gini Index for a given node t is computed as follows:

GINI(t) = 1 − Σ_j [p(j|t)]²

where p(j|t) is the relative frequency of class j at node t.

What is the maximum value of GINI(·)?

What is the minimum value?



GINI(t) = 1 − Σ_j [p(j|t)]²

Class  # objects
C1     0
C2     6          GINI = 1 − 0² − 1² = 0

Class  # objects
C1     1
C2     5          GINI = 1 − (1/6)² − (5/6)² = 0.278

Class  # objects
C1     3
C2     3          GINI = 1 − (1/2)² − (1/2)² = 0.5
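These values can be checked with a few lines of Python (a quick sketch, not part of the slides):

```python
def gini(counts):
    """Gini index of a node given per-class object counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

print(round(gini([0, 6]), 3))  # → 0.0
print(round(gini([1, 5]), 3))  # → 0.278
print(round(gini([3, 3]), 3))  # → 0.5
```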

GINI(t) = 1 − Σ_j [p(j|t)]²

When a node p is split into k children, the combined GINI of the
split is computed as

GINI_split = Σ_{i=1}^{k} (n_i/n) GINI(i)

n_i = # records/objects at child i.
n = # records at p.



Computing GINI Index

Node    Class  # records
Parent  C1     6
        C2     6

Using attribute A, the parent splits into N1 and N2:

Class  N1  N2
C1     5   1
C2     2   4

GINI(N1) = 1 − (5/7)² − (2/7)² ≈ 0.408
GINI(N2) = 1 − (1/5)² − (4/5)² = 0.320
GINI_split = (7/12) × 0.408 + (5/12) × 0.320 ≈ 0.371
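A short check of this computation (an illustrative sketch):

```python
def gini(counts):
    """Gini index of a node given per-class object counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """Weighted Gini of a split; children = list of per-class count lists."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

# N1 holds (5 C1, 2 C2); N2 holds (1 C1, 4 C2)
print(round(gini_split([[5, 2], [1, 4]]), 3))  # → 0.371
```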


Entropy

ID3 and C4.5 use Information Gain, based on Entropy, as the
attribute selection measure.

Entropy(t) = − Σ_j p(j|t) log₂ p(j|t)

where p(j|t) is the relative frequency of class j at node t.



Entropy(t) = − Σ_j p(j|t) log₂ p(j|t)

Class  # objects
C1     0
C2     6          Entropy = 0

Class  # objects
C1     1
C2     5          Entropy = −(1/6)log₂(1/6) − (5/6)log₂(5/6) ≈ 0.650

Class  # objects
C1     3
C2     3          Entropy = −(1/2)log₂(1/2) − (1/2)log₂(1/2) = 1
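The same check in Python (a sketch; 0·log 0 is taken as 0, as usual):

```python
from math import log2

def entropy(counts):
    """Entropy of a node given per-class object counts."""
    n = sum(counts)
    s = sum((c / n) * log2(c / n) for c in counts if c > 0)
    return -s if s else 0.0

print(round(entropy([0, 6]), 3))  # → 0.0
print(round(entropy([1, 5]), 3))  # → 0.65
print(round(entropy([3, 3]), 3))  # → 1.0
```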

Information Gain

Let attribute A split a node into v partitions (children), where
child i contains n_i of the node's n records.

Information Gain(A) = Entropy(parent) − Σ_{i=1}^{v} (n_i/n) Entropy(child i)

(Gain(A) tells how much entropy would be reduced by branching on A.)



Computing Information Gain

Node    Class  # records
Parent  C1     6
        C2     6

Using attribute A, the parent splits into N1 and N2:

Class  N1  N2
C1     5   1
C2     2   4

Entropy(N1) = −(5/7)log₂(5/7) − (2/7)log₂(2/7) ≈ 0.863
Entropy(N2) = −(1/5)log₂(1/5) − (4/5)log₂(4/5) ≈ 0.722
Entropy_split = (7/12) × 0.863 + (5/12) × 0.722 ≈ 0.804

Information Gain = Entropy(Parent) − Entropy_split = 1 − 0.804 = 0.196
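The computation above can be verified with a small sketch in Python:

```python
from math import log2

def entropy(counts):
    """Entropy of a node given per-class object counts."""
    n = sum(counts)
    s = sum((c / n) * log2(c / n) for c in counts if c > 0)
    return -s if s else 0.0

def information_gain(parent, children):
    """parent and each child are per-class count lists."""
    n = sum(parent)
    split = sum(sum(c) / n * entropy(c) for c in children)
    return entropy(parent) - split

gain = information_gain([6, 6], [[5, 2], [1, 4]])
print(round(gain, 3))  # → 0.196
```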


Gain Ratio

Information Gain is biased towards tests with many outcomes.

Gain Ratio overcomes this bias.

C4.5 uses Gain Ratio.

Information Gain(A) is normalized by SPLITINFO:

SPLITINFO(A) = − Σ_{i=1}^{v} (n_i/n) × log₂(n_i/n)

Gain Ratio(A) = Gain(A) / SPLITINFO(A)

The attribute with the maximum Gain Ratio is selected as the
splitting attribute.
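For the running example, attribute A splits the 12 records into partitions of sizes 7 and 5, with Information Gain ≈ 0.196; the Gain Ratio follows directly (an illustrative sketch):

```python
from math import log2

def split_info(sizes):
    """SPLITINFO for a split into partitions of the given sizes."""
    n = sum(sizes)
    return -sum((s / n) * log2(s / n) for s in sizes if s > 0)

si = split_info([7, 5])
print(round(si, 3))          # → 0.98
print(round(0.196 / si, 3))  # → 0.2
```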

Classification Error

Classification Error(t) = 1 − max_j p(j|t)

What are the maximum and minimum values of Classification Error?



Classification Error(t) = 1 − max_j p(j|t)

Class  # objects
C1     0
C2     6          Error = 1 − 6/6 = 0

Class  # objects
C1     1
C2     5          Error = 1 − 5/6 = 1/6 ≈ 0.167

Class  # objects
C1     3
C2     3          Error = 1 − 3/6 = 0.5
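A one-function check of these values (sketch):

```python
def classification_error(counts):
    """Classification error of a node given per-class object counts."""
    n = sum(counts)
    return 1 - max(counts) / n

print(classification_error([0, 6]))            # → 0.0
print(round(classification_error([1, 5]), 3))  # → 0.167
print(classification_error([3, 3]))            # → 0.5
```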

Comparison among Metrics

Figure: Comparison of the impurity measures for a two-class problem.
(Source: Tan, Kumar and Steinbach, Introduction to Data Mining)


Stopping Criteria

Stop expanding a node when all the records belong to the same class.

Stop expanding a node when all the records have similar attribute
values.


Decision Tree

Inexpensive to construct

Extremely fast at classifying unknown records


Accuracy is comparable to other classification techniques for
many simple data sets.



Underfitting and Overfitting

Training Error: error committed by a classification model on the
training set.

Generalization Error: expected error of the model on unseen objects.

A good classification model must fit the training data well and also
correctly classify unknown objects.

Model Overfitting: a classification model that commits low training
error but high generalization error.


Model Overfitting



Cause of Model Overfitting

Presence of Noise.

Lack of Representative Samples.



Estimating Generalization Error

Error on training set: e(T) = Σ_t e(t), summed over the leaf nodes t.

Generalization error: Σ_t e′(t).

Estimating Σ_t e′(t) (pessimistic estimate):
For each leaf node, e′(t) = e(t) + 0.5
Estimated Error = e(T) + N × 0.5,
N = number of leaf nodes in the DT.

For a tree with 40 leaf nodes and 10 errors on training (out of 1000
instances):
Estimated Error = 10 + 40 × 0.5 = 30 errors (30/1000 = 3%).


Minimum Description Length (MDL)

Cost(Model, Data) = Cost(Model) + Cost(Data|Model)

Cost(Data|Model) encodes the misclassification errors.

Cost(Model) uses node encoding (number of children) plus splitting
condition encoding.

Nearest Neighbor Classifier (Cover and Hart, 1967)

Let T = {(x_i, y_i)}_{i=1..n} be a set of labeled patterns (training
set), where x_i is a pattern and y_i its class label. Let x be a
pattern with unknown class label (test pattern).

NN Rule:
Let x′ ∈ T be the pattern nearest to the test pattern x.
Assign label(x) = label(x′).

Complexity:
Time: O(|T|)
Space: O(|T|)
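The NN rule fits in a few lines of Python (a minimal sketch over numeric feature vectors; Euclidean distance is assumed):

```python
def nn_classify(train, x):
    """1-NN: train is a list of (pattern, label) pairs; O(|T|) scan."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda t: dist2(t[0], x))
    return label

T = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]
print(nn_classify(T, (0.9, 1.1)))  # → A
```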


K-Nearest Neighbor Classifier

Let T = {(x_i, y_i)}_{i=1..n} be a training set.
Let x be a pattern with unknown class label (test pattern).

Algorithm:
KNN = ∅
For each t ∈ T
  1 if |KNN| < K
      KNN = KNN ∪ {t}
  2 else
      1 Find the x′ ∈ KNN farthest from x; if dist(x, x′) > dist(x, t):
      2 KNN = KNN − {x′}; KNN = KNN ∪ {t}
The pattern x is assigned to the class to which most of the patterns
in KNN belong (majority vote).
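A compact Python version (an illustrative sketch that sorts all of T instead of maintaining the incremental neighbor set described above; the result is the same):

```python
from collections import Counter

def knn_classify(train, x, k=3):
    """k-NN by majority vote over the k nearest training patterns."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbors = sorted(train, key=lambda t: dist2(t[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

T = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
     ((5, 5), "B"), ((5, 6), "B")]
print(knn_classify(T, (1.5, 1.5), k=3))  # → A
```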

How to find the value of K

r-fold Cross Validation:

Partition the training set into r blocks T_1, T_2, ..., T_r.
For i = 1 to r do
  1 Use T − T_i as the training set and T_i as the validation set.
  2 For a range of K values (say from 1 to m), find the error rates
    on the validation set.
  3 Let these error rates be e_i1, e_i2, ..., e_im.
Take ē_j = mean of {e_1j, e_2j, ..., e_rj}, for j = 1 to m.
The chosen value is K = argmin_j {ē_1, ē_2, ..., ē_m}.
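The procedure can be sketched on toy data. Everything below is made up for the demo: the 1-D data set, the simple 1-D k-NN, and the candidate K values; it only illustrates the block partitioning and the argmin over mean validation errors.

```python
from collections import Counter

def knn_predict(train, x, k):
    neighbors = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    return Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]

def choose_k(data, r=5, k_values=(1, 3, 5)):
    """Pick K by r-fold cross validation (illustrative sketch)."""
    blocks = [data[i::r] for i in range(r)]   # r validation blocks
    mean_err = {}
    for k in k_values:
        errs = []
        for i in range(r):
            train = [p for j, b in enumerate(blocks) if j != i for p in b]
            val = blocks[i]
            wrong = sum(knn_predict(train, x, k) != y for x, y in val)
            errs.append(wrong / len(val))
        mean_err[k] = sum(errs) / r           # mean error for this K
    return min(k_values, key=lambda k: mean_err[k])

# Toy 1-D data: class "A" near 0, class "B" near 10, one noisy point
data = [(i / 10, "A") for i in range(20)] + \
       [(10 + i / 10, "B") for i in range(20)]
data.append((0.05, "B"))  # noise: K=1 overfits to it, K>1 does not
print(choose_k(data))  # → 3
```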


Weighted k-NNC (Dudani, 1976)

k-NNC gives equal importance to the first NN and to the last (k-th)
NN.

Instead, a weight is assigned to each nearest neighbor of a query
pattern.

Let X = {x_1, ..., x_k} ⊆ T be the set of k nearest neighbors of q,
the pattern whose class label is to be determined.

Let D = {d_1, d_2, ..., d_k} be the ordered set of distances, where
d_i = ||x_i − q|| and d_i ≤ d_j for i < j.

The weight w_j assigned to the j-th nearest neighbor is:

w_j = (d_k − d_j) / (d_k − d_1)   if d_k ≠ d_1
w_j = 1                           if d_k = d_1

Calculate the weighted sum of the patterns belonging to each class.
Classify q to the class for which the weighted sum is maximum.
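A minimal sketch of the weighting scheme on 1-D patterns (illustrative; the data is made up):

```python
def weighted_knn(train, q, k=3):
    """Dudani-weighted k-NN: closer neighbors get larger weights."""
    neighbors = sorted(train, key=lambda t: abs(t[0] - q))[:k]
    d = [abs(x - q) for x, _ in neighbors]   # d[0] <= ... <= d[k-1]
    sums = {}
    for (x, label), dj in zip(neighbors, d):
        w = (d[-1] - dj) / (d[-1] - d[0]) if d[-1] != d[0] else 1.0
        sums[label] = sums.get(label, 0.0) + w
    return max(sums, key=sums.get)

T = [(1.0, "A"), (1.4, "A"), (2.0, "B"), (5.0, "B")]
print(weighted_knn(T, 1.1, k=3))  # → A
```

Note that the nearest neighbor always gets weight 1 and the k-th neighbor weight 0, so the farthest of the k neighbors never contributes to the vote.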

Model Evaluation

Count the test records correctly and incorrectly predicted by a
classifier: the Confusion Matrix.


Metrics for Performance Evaluation

Table: Confusion Matrix


ACTUAL CLASS
Class=Yes Class=No
PREDICTED Class=Yes a b
CLASS Class=No c d


Table: Confusion Matrix


ACTUAL CLASS
Class=Yes Class=No
PREDICTED Class=Yes TP FP
CLASS Class=No FN TN

Accuracy = (TP + TN) / (TP + FP + FN + TN)



Limitation of Accuracy

Consider a 2-class problem:
Class +ve: 9990 records
Class −ve: 10 records

If the model predicts everything to be the +ve class, accuracy is
9990/10000 = 99.9%.

Accuracy is misleading here.


Classification Cost

Table: Cost Matrix


ACTUAL CLASS
Class=Yes Class=No
PREDICTED Class=Yes Cost(Yes|Yes) Cost(Yes|No)
CLASS Class=No Cost(No|Yes) Cost(No|No)


Table: Cost Matrix

ACTUAL CLASS
Class=Yes Class=No
PREDICTED Class=Yes −1 100
CLASS Class=No 1 0

Table: Confusion Matrix

ACTUAL CLASS
Class=Yes Class=No
PREDICTED Class=Yes 150 60
CLASS Class=No 40 250

Accuracy = (150 + 250)/500 = 80%
Cost = 150 × (−1) + 60 × 100 + 40 × 1 + 250 × 0 = 5890
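Both quantities can be computed from the two matrices directly (a quick sketch):

```python
def accuracy_and_cost(conf, cost):
    """conf and cost are 2x2 matrices indexed [predicted][actual]."""
    total = sum(sum(row) for row in conf)
    correct = conf[0][0] + conf[1][1]
    total_cost = sum(conf[i][j] * cost[i][j]
                     for i in range(2) for j in range(2))
    return correct / total, total_cost

conf = [[150, 60], [40, 250]]   # rows: predicted Yes/No; cols: actual Yes/No
cost = [[-1, 100], [1, 0]]
acc, c = accuracy_and_cost(conf, cost)
print(acc, c)  # → 0.8 5890
```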


Cost-Sensitive Measures

Precision (P) = TP / (TP + FP)

Recall (R) = TP / (TP + FN)

F-measure = 2 × P × R / (P + R)
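Using the counts from the confusion matrix above (TP = 150, FP = 60, FN = 40), the three measures work out as follows (sketch):

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return p, r, f

p, r, f = precision_recall_f(150, 60, 40)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.714 0.789 0.75
```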


THANK YOU

Dr. Rabi Shaw Classification Techniques
